Prashant Kumar

Gurugram

Summary

IT professional with 10+ years of experience, including 7 years in Big Data technologies and 3 years in Automation Testing (Selenium with Java).
Expertise in Hadoop ecosystem, HDFS, Big Data, PySpark, SparkSQL, and Spark Architecture.
Strong hands-on experience in PySpark RDDs, DataFrames, and SparkSQL for data processing and analytics.
Proficient in AWS Cloud Services (S3, EC2, EKS) and Hadoop Cluster Management.
Designed, developed, and deployed scalable ingestion and ETL frameworks using PySpark, Hive, and Airflow.
Implemented data validation and reconciliation in Spark to ensure high data quality.
Optimized Spark applications, reducing job execution time and improving efficiency.
Extensive experience in Python programming, SQL, Hive, Sqoop, MapReduce, Impala, Presto, and SAS.
Strong knowledge of data warehousing concepts, partitioning & bucketing in Hive, and workflow orchestration using Airflow.
Led PoCs and database migrations from traditional systems to Hadoop-based data lakes.
Adept at automation testing using Selenium WebDriver with Java, ensuring robust software quality.
Excellent problem-solving, analytical, and communication skills; quick to learn and adapt to new tools and technologies.

Work History

Senior Big Data Engineer

MACQUARIE GLOBAL SERVICES

Gurugram

11.2021 - Current

Project: Ingestion and ETL Framework
Developed a configuration-based ingestion framework to create scalable ETL pipelines for extracting, transforming, and loading data from multiple sources
Implemented pre-validation and post-validation checks (e.g., data reconciliation) to ensure 99% data accuracy before and after processing
Deployed and executed 50+ production jobs on AWS EKS Spark clusters, ensuring high availability and scalability
Extracted and processed data from multiple sources such as JDBC, Postgres, Sybase, and Hive using PySpark and Python, storing it in a Hive warehouse through ETL pipelines
Developed & deployed PySpark applications on AWS
Optimized Spark Core & Spark SQL processing, reducing job execution time by 40%
Managed Hadoop clusters and ensured efficient data storage on AWS S3 and HDFS
Automated job orchestration and scheduling using Apache Airflow, reducing manual effort
Monitored and optimized production jobs, minimizing failures and improving system uptime to 99.9%
Ensured code reusability, reducing redundant development efforts by 25%
Led a team of 3 engineers, providing technical guidance and mentoring
Managed weekly production releases, ensuring zero downtime deployments
Followed Agile methodology, participating in sprint planning and retrospectives to enhance project efficiency

Big Data Engineer

UNITED HEALTH GROUP

Gurugram

09.2018 - 10.2021

Developed a dashboard providing comprehensive analysis, trends, and comparisons of new members joining the UHC insurance plan in the current and previous years
Extracted data from source systems using Sqoop and stored it in the Hive data warehouse
Designed and developed PySpark applications based on business requirements
Processed large datasets using PySpark, leveraging Spark Core and Spark SQL for analytics
Worked extensively with Python and its data-processing libraries
Managed and optimized data processing on a Hadoop cluster
Stored and managed data in HDFS for scalable and efficient access
Performed data analytics using PySpark to generate insights
Orchestrated and scheduled ETL workflows using Apache Airflow
Monitored production jobs to ensure smooth execution and system reliability
Followed Agile methodology for iterative development and continuous improvement

Big Data Engineer and Automation Engineer

ORANGE BUSINESS SERVICES

Gurugram

05.2016 - 09.2018

Company Overview: Smart Data Bundle is an application developed by Orange for users to share various types of datasets
Data is systematically organized by Smart Cities, and registered users have permission to upload data to the platform
Users can download datasets uploaded by others for analysis and insights
The system includes different user roles, such as Data Owners, Data Publishers, and Smart Users, each with distinct access permissions
Managed file storage for the Smart Data Bundle project in HDFS
Processed and transformed data using Pig and Hive
Worked extensively on Hadoop clusters for data management
Monitored production jobs to ensure seamless execution
Participated in daily scrum meetings to align with Agile best practices
Designed and implemented job orchestration and script automation
Followed Agile methodology with a three-week sprint cycle for iterative development

Software Engineer

ADEPTIA INDIA PVT. LIMITED

Noida

03.2014 - 05.2016

Company Overview: Adeptia is a self-service integration solution that helps B2B data onboarding companies upload customer information faster into multi-enterprise business ecosystems
Conducted automation testing using Selenium WebDriver with Java for efficient test execution
Performed manual testing, including functional, integration, and regression testing to ensure software quality
Designed and executed test scenarios and test cases, ensuring comprehensive test coverage
Logged and tracked defects using JIRA, facilitating efficient bug resolution

Education

B.Tech/B.E.

Uttar Pradesh Technical University (UPTU)

Mathura

07-2014

XIIth - English

Gyan Sthaly Public School

Jhansi

06-2010

Xth - English

Gyan Sthaly Public School

Jhansi

07-2008

Skills

Data Warehousing
ETL development
Big Data Engineer
Big Data Analytics
PySpark RDD
Python
SparkSQL
SQL
Apache Kafka
JIRA
PySpark
Spark
Hive

Data pipeline design
Data quality
Agile methodology
Data engineering
Cloudera distribution
Hadoop ecosystem
Pig
Java
Airflow
AWS
AWS Cloud
S3 bucket

Languages

English
Hindi

Affiliations

• Led a team of 3 interns at TATA CMC during summer training.

• Won a table tennis tournament at Macquarie.
