Prashant Kumar

Gurugram

Summary

  • IT professional with 10+ years of experience, including 7 years in Big Data technologies and 3 years in Automation Testing (Selenium with Java).
  • Expertise in Hadoop ecosystem, HDFS, Big Data, PySpark, SparkSQL, and Spark Architecture.
  • Strong hands-on experience in PySpark RDDs, DataFrames, and SparkSQL for data processing and analytics.
  • Proficient in AWS Cloud Services (S3, EC2, EKS) and Hadoop Cluster Management.
  • Designed, developed, and deployed scalable ingestion and ETL frameworks using PySpark, Hive, and Airflow.
  • Implemented data validation and reconciliation in Spark to ensure high data quality.
  • Optimized Spark applications, reducing job execution time and improving efficiency.
  • Extensive experience in Python programming, SQL, Hive, Sqoop, MapReduce, Impala, Presto, and SAS.
  • Strong knowledge of data warehousing concepts, partitioning & bucketing in Hive, and workflow orchestration using Airflow.
  • Led PoCs and database migrations from traditional systems to Hadoop-based data lakes.
  • Adept at automation testing using Selenium WebDriver with Java, ensuring robust software quality.
  • Excellent problem-solving, analytical, and communication skills; a quick learner who adapts readily to new tools and technologies.

Overview

11 years of professional experience

Work History

Senior Big Data Engineer

MACQUARIE GLOBAL SERVICES
Gurugram
11.2021 - Current
  • Project: Ingestion and ETL Framework
  • Developed a configuration-based ingestion framework to create scalable ETL pipelines for extracting, transforming, and loading data from multiple sources
  • Implemented pre-validation and post-validation checks (e.g., data reconciliation) to ensure 99% data accuracy before and after processing
  • Deployed and executed 50+ production jobs on AWS EKS Spark clusters, ensuring high availability and scalability
  • Extracted and processed data from multiple sources (JDBC, Postgres, Sybase, Hive) using PySpark and Python, storing it in a Hive warehouse through ETL pipelines
  • Developed & deployed PySpark applications on AWS
  • Optimized Spark Core & Spark SQL processing, reducing job execution time by 40%
  • Managed Hadoop clusters and ensured efficient data storage on AWS S3 and HDFS
  • Automated job orchestration and scheduling using Apache Airflow, reducing manual effort
  • Monitored and optimized production jobs, minimizing failures and improving system uptime to 99.9%
  • Ensured code reusability, reducing redundant development efforts by 25%
  • Led a team of 3 engineers, providing technical guidance and mentoring
  • Managed weekly production releases, ensuring zero downtime deployments
  • Followed Agile methodology, participating in sprint planning and retrospectives to enhance project efficiency

Big Data Engineer

UNITED HEALTH GROUP
Gurugram
09.2018 - 10.2021
  • Developed a dashboard providing comprehensive analysis, trends, and comparisons of new members joining the UHC insurance plan in the current and previous years
  • Extracted data from source systems using Sqoop and stored it in the Hive data warehouse
  • Designed and developed PySpark applications based on business requirements
  • Processed large datasets using PySpark, leveraging Spark Core and Spark SQL for analytics
  • Worked extensively with Python and its data-processing libraries
  • Managed and optimized data processing on a Hadoop cluster
  • Stored and managed data in HDFS for scalable and efficient access
  • Performed data analytics using PySpark to generate insights
  • Orchestrated and scheduled ETL workflows using Apache Airflow
  • Monitored production jobs to ensure smooth execution and system reliability
  • Followed Agile methodology for iterative development and continuous improvement

Big Data Engineer and Automation Engineer

ORANGE BUSINESS SERVICES
Gurugram
05.2016 - 09.2018
  • Project Overview: Smart Data Bundle is an application developed by Orange for users to share various types of datasets
  • Data is systematically organized by Smart Cities, and registered users have permission to upload data to the platform
  • Users can download datasets uploaded by others for analysis and insights
  • The system includes different user roles, such as Data Owners, Data Publishers, and Smart Users, each with distinct access permissions
  • Managed file storage for the Smart Data Bundle project in HDFS
  • Processed and transformed data using Pig and Hive
  • Worked extensively on Hadoop clusters for data management
  • Monitored production jobs to ensure seamless execution
  • Participated in daily scrum meetings to align with Agile best practices
  • Designed and implemented job orchestration and script automation
  • Followed Agile methodology with a three-week sprint cycle for iterative development

Software Engineer

ADEPTIA INDIA PVT. LIMITED
Noida
03.2014 - 05.2016
  • Company Overview: Adeptia is a self-service integration solution that helps B2B data onboarding companies upload customer information faster into multi-enterprise business ecosystems
  • Conducted automation testing using Selenium WebDriver with Java for efficient test execution
  • Performed manual testing, including functional, integration, and regression testing to ensure software quality
  • Designed and executed test scenarios and test cases, ensuring comprehensive test coverage
  • Logged and tracked defects using JIRA, facilitating efficient bug resolution

Education

B.Tech/B.E.

Uttar Pradesh Technical University (UPTU)
Mathura
07-2014

XIIth - English

Gyan Sthaly Public School
Jhansi
06-2010

Xth - English

Gyan Sthaly Public School
Jhansi
07-2008

Skills

  • Data Warehousing
  • ETL development
  • Big Data Engineer
  • Big Data Analytics
  • PySpark RDD
  • Python
  • SparkSQL
  • SQL
  • Apache Kafka
  • JIRA
  • PySpark
  • Spark
  • Hive
  • Data pipeline design
  • Data quality
  • Agile methodology
  • Data engineering
  • Cloudera distribution
  • Hadoop ecosystem
  • Pig
  • Java
  • Airflow
  • AWS
  • AWS Cloud
  • S3 bucket

Languages

  • English
  • Hindi

Affiliations

• Led a team of 3 interns at TATA CMC during summer training.

• Won a table tennis tournament at Macquarie.
