Results-driven Data Engineer with 4.5 years of experience in the Big Data domain, delivering scalable and high-performance data solutions.
Strong expertise in big data processing, storage, and analytics, leveraging PySpark, SQL, Python, Hive, Hadoop, Sqoop, Apache Spark, and AWS services to efficiently handle large-scale datasets.
Proven experience in data orchestration and automation, utilizing Apache Airflow on AWS (Amazon MWAA) to design, schedule, and monitor complex data workflows, improving pipeline reliability and efficiency.
Proficient in advanced data management and optimization techniques, working with diverse data formats such as Parquet, CSV, JSON, and XML to ensure performance, scalability, and compatibility.
Demonstrated strong problem-solving and analytical skills, effectively identifying bottlenecks and implementing optimized solutions in distributed data environments.
Exhibited leadership and collaboration skills, working cross-functionally with architects, analysts, and stakeholders to design and implement scalable, business-aligned data models.
Delivered enterprise-grade Big Data solutions for global clients including Mercedes-Benz and Daimler Truck, optimizing AWS-based data pipelines and workflows to drive impactful business outcomes.
Seeking challenging and higher-responsibility assignments to apply advanced Big Data expertise and contribute to data-driven decision-making at scale.
Overview
5 years of professional experience
1 Certification
Work History
Technology Analyst - Big Data
Infosys Limited
Bengaluru
05.2022 - Current
Engineering scalable data pipelines using Spark, Hive, and Hadoop to ensure efficient data processing.
Orchestrating end-to-end Spark workflows with Apache Airflow on AWS, Amazon S3, and Amazon Athena, integrating Snowflake data sources to automate and streamline pipelines (sketched in the first example after this list).
Integrating Spark SQL with Hadoop, Hive, and Kafka to enhance data processing performance and workflow efficiency (a Kafka read sketch follows this list).
Configuring Sqoop for incremental data transfers, leveraging its import features to maintain data consistency.
Designing and implementing data lake architectures on Amazon S3, utilizing partitioning and columnar formats like Parquet to boost query performance and minimize storage costs (illustrated after this list).
Executing data import/export operations with Sqoop, managing various formats including CSV, Avro, and Parquet.
Applying Spark DataFrame transformations and actions to process large-scale structured and semi-structured datasets, including filtering, mapping, reducing, grouping, and aggregating.
Leveraging Spark DataFrame caching and persistence to reduce processing overhead and improve query execution speed.
Managing Spark DataFrame schemas by adding, renaming, and dropping columns, casting data types, and handling null values (these DataFrame patterns, together with caching, are combined in one sketch after this list).
Implementing Spark DataFrame optimization techniques like predicate pushdown, column pruning, and vectorized execution to maximize performance and resource utilization (read-side sketch after this list).
Troubleshooting and resolving data processing issues, performance bottlenecks, and scalability challenges in production environments.
Developing performance monitoring and alerting mechanisms to proactively identify and address potential issues.
Collaborating with data architects and analysts to design scalable and efficient data models.
Utilizing Spark libraries and frameworks, including Spark SQL, MLlib, and GraphFrames, for advanced processing tasks.
Addressing technical challenges in Big Data processing and storage to maintain reliability and efficiency.
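A minimal sketch of the Airflow orchestration described above, assuming Airflow 2.x with the Snowflake and Amazon provider packages installed; the DAG id, connection id, SQL, stage, and S3/Athena names are illustrative placeholders, not the production setup.

```python
# Hedged sketch: Snowflake unload to S3, then an Athena partition refresh.
# All ids, table names, and paths below are assumptions for illustration.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.athena import AthenaOperator
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="snowflake_to_athena_daily",   # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                    # Airflow >= 2.4 keyword
    catchup=False,
) as dag:
    # Unload the day's Snowflake data to S3 as Parquet via an external stage.
    unload = SnowflakeOperator(
        task_id="unload_to_s3",
        snowflake_conn_id="snowflake_default",
        sql="COPY INTO @lake_stage/events/ FROM analytics.events "
            "FILE_FORMAT = (TYPE = PARQUET)",   # assumed stage and table
    )

    # Surface the new files to Athena by repairing table partitions.
    refresh = AthenaOperator(
        task_id="refresh_athena",
        query="MSCK REPAIR TABLE lake.events",
        database="lake",
        output_location="s3://my-bucket/athena-results/",  # placeholder
    )

    unload >> refresh
```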
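A minimal batch-read sketch of the Spark-Kafka integration, assuming the spark-sql-kafka connector is on the classpath; the broker address and topic name are placeholders.

```python
# Hedged sketch: read a Kafka topic as a batch DataFrame and query it
# with Spark SQL. Broker and topic are assumed, not real endpoints.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka_batch").getOrCreate()

raw = (spark.read.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")  # assumed broker
    .option("subscribe", "vehicle-events")               # assumed topic
    .load())

# Kafka delivers key/value as binary; cast before querying with Spark SQL.
events = raw.select(F.col("value").cast("string").alias("payload"))
events.createOrReplaceTempView("raw_events")
spark.sql("SELECT COUNT(*) AS n FROM raw_events").show()
```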
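A minimal sketch of the partitioned Parquet data lake write, assuming raw JSON events on S3; bucket names, prefixes, and partition columns are placeholders.

```python
# Hedged sketch: land raw JSON as partitioned Parquet on S3.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lake_writer").getOrCreate()

events = spark.read.json("s3a://my-bucket/raw/events/")  # assumed source

(events
    .repartition("event_date")            # co-locate rows per partition value
    .write
    .mode("overwrite")
    .partitionBy("event_date", "region")  # enables partition pruning on reads
    .parquet("s3a://my-bucket/curated/events/"))
```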
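One combined sketch of the DataFrame transformation, caching, and schema-management bullets above; the source path, column names, and the UNKNOWN default are assumed for illustration.

```python
# Hedged sketch: rename, cast, null-handling, and drop, then cache the
# cleaned frame before filtering, grouping, and aggregating.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("order_transforms").getOrCreate()

orders = spark.read.parquet("s3a://my-bucket/curated/orders/")  # assumed path

cleaned = (orders
    .withColumnRenamed("amt", "amount")                     # rename a column
    .withColumn("amount", F.col("amount").cast("double"))   # cast a data type
    .fillna({"region": "UNKNOWN"})                          # handle nulls
    .drop("legacy_id"))                                     # drop a column

cleaned.cache()  # reused by multiple actions, so persist once in memory

daily_revenue = (cleaned
    .filter(F.col("amount") > 0)               # filter
    .groupBy("order_date", "region")           # group
    .agg(F.sum("amount").alias("revenue"),     # aggregate
         F.count("*").alias("order_count")))

daily_revenue.show()
```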
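A read-side optimization sketch, assuming a Parquet source partitioned by event_date; selecting only needed columns drives column pruning, and the partition filter is pushed down to the scan. Paths and column names are placeholders.

```python
# Hedged sketch: column pruning + predicate/partition pushdown on Parquet.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("read_opt").getOrCreate()

# Vectorized Parquet reads are on by default; set explicitly for clarity.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "true")

trips = (spark.read.parquet("s3a://my-bucket/curated/trips/")
    .select("trip_id", "event_date", "distance_km")   # column pruning
    .filter(F.col("event_date") == "2024-06-01"))     # pushed to the scan

# The physical plan should show ReadSchema limited to three columns and
# partition/pushed filters on event_date.
trips.explain()
```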
Clients: Mercedes-Benz and Daimler Truck
Boosted data processing efficiency by 35% by re-engineering PySpark jobs with partition pruning, broadcast joins, and optimized shuffle strategies (broadcast-join sketch after this list).
Designed highly scalable data lake architectures, dramatically improving query performance and lowering storage costs.
Automated ETL pipelines with Airflow DAGs and EMR operators, reducing manual interventions by 70% and cutting average job completion time by 2 hours per run (EMR-operator sketch after this list).
Optimized AWS resources and achieved substantial cost savings while maintaining robust system performance.
Scaled data pipelines to handle 5+ TB/day of structured (Snowflake tables) and semi-structured (API JSON, S3 logs) data, enabling downstream analytics at scale.
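A hedged sketch of the broadcast-join re-engineering noted above: a small dimension table is broadcast to every executor so the large fact table avoids a shuffle. Table paths and the join key are assumptions.

```python
# Hedged sketch: broadcast a small dimension table into a join with a
# large fact table to eliminate the shuffle on the fact side.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("broadcast_join").getOrCreate()

facts = spark.read.parquet("s3a://my-bucket/curated/telemetry/")  # large, assumed
dims = spark.read.parquet("s3a://my-bucket/curated/vehicles/")    # small, assumed

enriched = facts.join(F.broadcast(dims), on="vehicle_id", how="left")
enriched.write.mode("overwrite").parquet("s3a://my-bucket/marts/telemetry/")
```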
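A minimal sketch of the EMR-operator automation, assuming the Airflow Amazon provider package; the cluster id, step definition, and script path are placeholders.

```python
# Hedged sketch: submit a Spark step to a running EMR cluster and wait
# for it to finish. Cluster id and paths below are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

SPARK_STEP = [{
    "Name": "nightly_etl",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": ["spark-submit", "s3://my-bucket/jobs/nightly_etl.py"],
    },
}]

with DAG(
    dag_id="emr_nightly_etl",            # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    add_step = EmrAddStepsOperator(
        task_id="add_step",
        job_flow_id="j-XXXXXXXXXXXXX",   # placeholder cluster id
        steps=SPARK_STEP,
    )

    # EmrAddStepsOperator pushes the new step ids to XCom; wait on the first.
    wait_step = EmrStepSensor(
        task_id="wait_step",
        job_flow_id="j-XXXXXXXXXXXXX",
        step_id="{{ task_instance.xcom_pull(task_ids='add_step')[0] }}",
    )

    add_step >> wait_step
```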
Executive Assistant Business Partner – Senior Specialist at Intuit India – Product Development Centre Pvt. Ltd.