
Aakaash N

Bengaluru

Summary

  • Results-driven Data Engineer with 4.5 years of experience in the Big Data domain, delivering scalable and high-performance data solutions.
  • Strong expertise in big data processing, storage, and analytics, leveraging PySpark, SQL, Python, Hive, Hadoop, Sqoop, Apache Spark, and AWS services to efficiently handle large-scale datasets.
  • Proven experience in data orchestration and automation, utilizing AWS Airflow to design, schedule, and monitor complex data workflows, improving pipeline reliability and efficiency.
  • Proficient in advanced data management and optimization techniques, working with diverse data formats such as Parquet, CSV, JSON, and XML to ensure performance, scalability, and compatibility.
  • Demonstrated strong problem-solving and analytical skills, effectively identifying bottlenecks and implementing optimized solutions in distributed data environments.
  • Exhibited leadership and collaboration skills, working cross-functionally with architects, analysts, and stakeholders to design and implement scalable, business-aligned data models.
  • Delivered enterprise-grade Big Data solutions for global clients including Mercedes-Benz and Daimler Truck, optimizing AWS-based data pipelines and workflows to drive impactful business outcomes.
  • Seeking challenging and higher-responsibility assignments to apply advanced Big Data expertise and contribute to data-driven decision-making at scale.

Overview

5 years of professional experience
1 Certification

Work History

Technology Analyst - Big Data

Infosys Limited
Bengaluru
05.2022 - Current
  • Engineering scalable data pipelines using Spark, Hive, and Hadoop to ensure efficient data processing.
  • Orchestrating end-to-end Spark workflows on AWS Airflow, Amazon S3, and Amazon Athena, integrating data from Snowflake to automate and streamline data pipelines.
  • Integrating Spark SQL with Hadoop, Hive, and Kafka to enhance data processing performance and workflow efficiency.
  • Configuring Sqoop for incremental data transfers, leveraging its import features to maintain data consistency.
  • Designing and implementing data lake architectures on Amazon S3, utilizing partitioning and columnar formats like Parquet to boost query performance and minimize storage costs.
  • Executing data import/export operations with Sqoop, managing various formats including CSV, Avro, and Parquet.
  • Applying Spark DataFrame transformations and actions to process large-scale structured and semi-structured datasets, including filtering, mapping, reducing, grouping, and aggregating.
  • Leveraging Spark DataFrame caching and persistence to reduce processing overhead and improve query execution speed.
  • Managing Spark DataFrame schemas by adding, renaming, and dropping columns, casting data types, and handling null values.
  • Implementing Spark DataFrame optimization techniques like predicate pushdown, column pruning, and vectorized execution to maximize performance and resource utilization.
  • Troubleshooting and resolving data processing issues, performance bottlenecks, and scalability challenges in production environments.
  • Developing performance monitoring and alerting mechanisms to proactively identify and address potential issues.
  • Collaborating with data architects and analysts to design scalable and efficient data models.
  • Utilizing Spark libraries and frameworks, including Spark SQL, MLlib, and GraphFrames, for advanced processing tasks.
  • Addressing technical challenges in Big Data processing and storage to maintain reliability and efficiency.
  • Clients: Mercedes-Benz and Daimler Truck
  • Boosted data processing efficiency by 35% by re-engineering PySpark jobs with partition pruning, broadcast joins, and optimized shuffle strategies.
  • Designed highly scalable data lake architectures, dramatically improving query performance and lowering storage costs.
  • Automated ETL pipelines with Airflow DAGs and EMR operators, reducing manual interventions by 70% and cutting average job completion time by 2 hours per run.
  • Optimized AWS resources and achieved substantial cost savings while maintaining robust system performance.
  • Scaled data pipelines to handle 5+ TB/day of structured (Snowflake tables) and semi-structured (API JSON, S3 logs) data, enabling downstream analytics at scale.
  • Automated Spark workflows, saving significant manual effort.
  • Ensured 99%+ data consistency with Sqoop-based incremental transfers.
  • Reduced query runtimes and compute costs through strategic Spark optimizations, improving overall pipeline efficiency.
  • Minimized downtime through proactive monitoring and reduced mean time to recovery (MTTR).

Big Data Engineer

Nayagara Technologies LTD
Bengaluru
04.2021 - 05.2022
  • Developed and enhanced Spark DataFrame-based ETL pipelines, applying operations such as select, filter, group by, and join for large datasets.
  • Integrated Spark DataFrames with diverse data sources, including relational databases, CSV, and JSON files, to enable seamless data transformation.
  • Authored custom UDFs and leveraged Spark window functions to execute advanced analytics and implement complex business logic at scale.
  • Conducted performance tuning and optimization of Spark DataFrame applications using partitioning, caching, and compression strategies.
  • Implemented real-time data processing pipelines by integrating Spark DataFrames with streaming sources, delivering timely insights.
  • Created custom serializers and deserializers to optimize storage and retrieval for large-scale applications.
  • Utilized Spark DataFrame APIs for data cleaning, normalization, and validation, enhancing data quality and reliability.
  • Applied DataFrame-based Machine Learning pipelines for feature engineering, contributing to scalable predictive modeling workflows.
  • Performed in-depth data profiling and exploratory data analysis to derive actionable insights for business decisions.
  • Integrated DataFrames with distributed file systems like HDFS and S3 to ensure high-performance, fault-tolerant data storage and processing.
  • Client: Hector Beverages

Education

B.E. - Computer Science

SRM Easwari Engineering College
Chennai
01.2021

XII

SRV Matriculation Higher Secondary School
Trichy
01.2017

X

Veludaiyar Higher Secondary School
Thiruvarur
01.2015

Skills

  • Data Lake Architecture
  • Cloud Data Management
  • Performance Optimization
  • Data Integration
  • Advanced Analytics
  • ETL Process Automation
  • Real-time Data Processing
  • Scalable Data Modeling
  • Data Orchestration & Automation
  • Large-Scale Data Processing
  • PySpark
  • Spark SQL
  • Hadoop
  • Hive
  • Sqoop
  • AWS
  • Glue
  • S3
  • Athena
  • Redshift
  • Lambda
  • MySQL
  • Python
  • Communication
  • Problem-solving

Certification

AWS Certified Cloud Practitioner

Accomplishments

Insta Rise Award, 2024
