IT professional with 10+ years of experience, including 7 years in Big Data technologies and 3 years in Automation Testing (Selenium with Java).
Expertise in Hadoop ecosystem, HDFS, Big Data, PySpark, SparkSQL, and Spark Architecture.
Strong hands-on experience in PySpark RDDs, DataFrames, and SparkSQL for data processing and analytics.
Proficient in AWS Cloud Services (S3, EC2, EKS) and Hadoop Cluster Management.
Designed, developed, and deployed scalable ingestion and ETL frameworks using PySpark, Hive, and Airflow.
Implemented data validation and reconciliation in Spark to ensure high data quality.
Optimized Spark applications, reducing job execution time and improving efficiency.
Extensive experience in Python programming, SQL, Hive, Sqoop, MapReduce, Impala, Presto, and SAS.
Strong knowledge of data warehousing concepts, partitioning & bucketing in Hive, and workflow orchestration using Airflow.
Led PoCs and database migrations from traditional systems to Hadoop-based data lakes.
Adept at automation testing using Selenium WebDriver with Java, ensuring robust software quality.
Excellent problem-solving, analytical, and communication skills; a quick learner who adapts readily to new tools and technologies.
Overview
11
years of professional experience
Work History
Senior Big Data Engineer
MACQUARIE GLOBAL SERVICES
Gurugram
11.2021 - Current
Company Overview: Ingestion and ETL Framework
Developed a configuration-based ingestion framework to create scalable ETL pipelines for extracting, transforming, and loading data from multiple sources
Implemented pre-validation and post-validation checks (e.g., data reconciliation) to ensure 99% data accuracy before and after processing
Deployed and executed 50+ production jobs on AWS EKS Spark clusters, ensuring high availability and scalability
Extracted and processed data from multiple sources (JDBC, Postgres, Sybase, Hive) using PySpark and Python, storing it in a Hive warehouse through ETL pipelines
Developed & deployed PySpark applications on AWS
Optimized Spark Core & Spark SQL processing, reducing job execution time by 40%
Managed Hadoop clusters and ensured efficient data storage on AWS S3 and HDFS
Automated job orchestration and scheduling using Apache Airflow, reducing manual effort
Monitored and optimized production jobs, minimizing failures and improving system uptime to 99.9%
Ensured code reusability, reducing redundant development efforts by 25%
Led a team of 3 engineers, providing technical guidance and mentoring
Managed weekly production releases, ensuring zero downtime deployments
Followed Agile methodology, participating in sprint planning and retrospectives to enhance project efficiency
Big Data Engineer
UNITED HEALTH GROUP
Gurugram
09.2018 - 10.2021
Developed a dashboard providing comprehensive analysis, trends, and comparisons of new members joining the UHC insurance plan in the current and previous years
Extracted data from source systems using Sqoop and stored it in the Hive data warehouse
Designed and developed PySpark applications based on business requirements
Processed large datasets using PySpark, leveraging Spark Core and Spark SQL for analytics
Worked extensively with Python and its data-processing libraries
Managed and optimized data processing on a Hadoop cluster
Stored and managed data in HDFS for scalable and efficient access
Performed data analytics using PySpark to generate insights
Orchestrated and scheduled ETL workflows using Apache Airflow
Monitored production jobs to ensure smooth execution and system reliability
Followed Agile methodology for iterative development and continuous improvement
Big Data Engineer and Automation Engineer
ORANGE BUSINESS SERVICES
Gurugram
05.2016 - 09.2018
Company Overview: Smart Data Bundle is an application developed by Orange for users to share various types of datasets
Data is systematically organized by Smart Cities, and registered users have permission to upload data to the platform
Users can download datasets uploaded by others for analysis and insights
The system includes different user roles, such as Data Owners, Data Publishers, and Smart Users, each with distinct access permissions
Managed file storage for the Smart Data Bundle project in HDFS
Processed and transformed data using Pig and Hive
Worked extensively on Hadoop clusters for data management
Monitored production jobs to ensure seamless execution
Participated in daily scrum meetings to align with Agile best practices
Designed and implemented job orchestration and script automation
Followed Agile methodology with a three-week sprint cycle for iterative development
Software Engineer
ADEPTIA INDIA PVT. LIMITED
Noida
03.2014 - 05.2016
Company Overview: Adeptia is a self-service integration solution that helps B2B data onboarding companies upload customer information faster into multi-enterprise business ecosystems
Conducted automation testing using Selenium WebDriver with Java for efficient test execution
Performed manual testing, including functional, integration, and regression testing to ensure software quality
Designed and executed test scenarios and test cases, ensuring comprehensive test coverage
Logged and tracked defects using JIRA, facilitating efficient bug resolution
Education
B.Tech/B.E. -
Uttar Pradesh Technical University (UPTU)
Mathura
07-2014
XIIth - English
Gyan Sthaly Public School
Jhansi
06-2010
Xth - English
Gyan Sthaly Public School
Jhansi
07-2008
Skills
Data Warehousing
ETL development
Big Data Engineer
Big Data Analytics
PySpark RDD
Python
SparkSQL
SQL
Apache Kafka
JIRA
PySpark
Spark
Hive
Data pipeline design
Data quality
Agile methodology
Data engineering
Cloudera distribution
Hadoop ecosystem
Pig
Java
Airflow
AWS
AWS Cloud
S3 bucket
Languages
English
Hindi
Affiliations
Led a team of 3 interns at TATA CMC during Summer Training.