Data Engineer with 4 years of experience designing and optimizing data pipelines and managing big data workflows on the AWS cloud platform. Proficient in leveraging cloud services, PySpark ETL scripting, MongoDB, Python, and SQL to build scalable solutions across all project phases. Expertise in designing, testing, and maintaining data management systems, leading to faster data retrieval and improved system efficiency.
Overview
4 years of professional experience
Work History
Data Engineer
Accenture Solutions
06.2024 - Current
Designed and implemented ETL pipelines with AWS Glue to extract data from S3 and load it into MongoDB (a minimal sketch of such a job follows this list).
Developed custom transformation logic in AWS Glue scripts to populate fields with fixed values defined by business rules.
Executed data validation through count checks to ensure integrity between staging and main collections in MongoDB.
Optimized AWS Glue jobs using partitioning strategies in S3 to enhance ETL job performance.
Wrote transformation scripts to convert S3 data into structured JSON arrays for MongoDB ingestion.
Validated record counts and dates to ensure accurate transfer between S3 and MongoDB collections.
Implemented exception handling and logging within AWS Glue for effective error reporting via AWS CloudWatch.
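
A minimal sketch of such a Glue job, assuming the Spark MongoDB connector is available to the job; the bucket, connection URI, and collection names below are placeholders:

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

try:
    # Read date-partitioned raw data from S3 (bucket and prefix are placeholders)
    df = spark.read.json("s3://example-bucket/raw/orders/dt=2024-06-01/")

    # Business-rule transformation: populate a field with a fixed value
    df = df.withColumn("source_system", F.lit("LEGACY_FEED"))

    # Count check before load; this count can later be compared against the
    # staging collection to validate the transfer
    print(f"Staging {df.count()} records for MongoDB")  # lands in CloudWatch logs

    # Write to a MongoDB staging collection via the Spark MongoDB connector
    (df.write.format("mongodb")
        .option("connection.uri", "mongodb://example-host:27017")
        .option("database", "analytics")
        .option("collection", "orders_stage")
        .mode("append")
        .save())
except Exception as exc:
    print(f"Glue job failed: {exc}")  # surfaced through CloudWatch for error reporting
    raise

job.commit()

Reading from date-partitioned prefixes such as dt=2024-06-01 lets the job prune its input at the source, which is the S3 partitioning strategy behind the performance gains noted above.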
Data Engineer
Tata Consultancy Services (TCS)
03.2022 - 05.2024
Developed an efficient data integration pipeline using AWS Glue, AWS Lambda, and Amazon S3.
Retrieved data through APIs and processed it with Pandas for insertion, manipulation, and filtering.
Implemented the API retrieval and processing code in AWS Lambda.
Ingested the retrieved data into Amazon S3 through Lambda (a minimal handler sketch follows this list).
Performed proofs of concept locally on client machines in PyCharm before implementing them on Lambda; extended existing Lambda modules and tested them separately in PyCharm using Git version control.
Applied Spark optimization techniques, using repartition and coalesce in PySpark to improve data processing performance (see the PySpark sketch after this list).
Created AWS IAM roles for Lambda deployments, then batch-processed newly arrived data in existing S3 buckets through AWS Glue.
Created Glue crawlers for the given databases and data locations; performed feature testing on AWS Glue and AWS Lambda before merging VCS branches into the main tree in Git.
Performed ETL operations on raw client data using PySpark.
Merged DataFrames, filtered out the desired results, and provided the output to the ML team.
Wrote output to S3 buckets in Parquet format.
Transformed transaction data to increase the value of the data received, and provided the transformed datasets back to ML engineers for analysis.
Used Amazon SNS to email troubleshooting details and error notifications to the source-code developer, ML engineers, and the client (a short SNS snippet follows this list).
Converted source files from one format to another as required.
Transformed column data types and provided the datasets to the ML team for processing; such requests were received as incident (Inc) tickets.
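
A minimal sketch of the Lambda handler pattern described above; the API endpoint and bucket name are hypothetical, and urllib3 is bundled with the Lambda Python runtime:

import json
import urllib3
import boto3

http = urllib3.PoolManager()
s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Retrieve data from the upstream API (endpoint is a placeholder)
    response = http.request("GET", "https://api.example.com/transactions")
    records = json.loads(response.data)

    # Land the raw payload in S3 for downstream batch processing by Glue
    key = f"raw/transactions/{context.aws_request_id}.json"
    s3.put_object(
        Bucket="example-ingest-bucket",
        Key=key,
        Body=json.dumps(records).encode("utf-8"),
    )
    return {"statusCode": 200, "ingested": len(records)}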
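
A condensed sketch of the PySpark transformation pattern, with hypothetical paths and column names:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("client-etl").getOrCreate()

raw = spark.read.json("s3://example-bucket/raw/transactions/")

# Repartition by a well-distributed key so wide transformations parallelize
txns = raw.repartition(64, "customer_id")

# Cast column types and filter down to the records the ML team needs
txns = (txns
        .withColumn("amount", F.col("amount").cast("double"))
        .filter(F.col("status") == "SETTLED"))

# Merge DataFrames: enrich transactions with customer attributes
customers = spark.read.parquet("s3://example-bucket/dim/customers/")
enriched = txns.join(customers, on="customer_id", how="left")

# Coalesce before writing so S3 receives a few large Parquet files
# instead of many small ones
enriched.coalesce(8).write.mode("overwrite").parquet(
    "s3://example-bucket/curated/transactions/")

The design trade-off: repartition shuffles data to spread work evenly across executors, while coalesce narrows partitions without a full shuffle, making it the cheaper choice just before the Parquet write.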
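
And a small helper showing how SNS can mail error details to topic subscribers (developer, ML engineer, client); the topic ARN is a placeholder:

import boto3

sns = boto3.client("sns")

def notify_failure(job_name, error):
    # Email subscribers on the topic (developer, ML engineer, client) receive this
    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:etl-alerts",  # placeholder ARN
        Subject=f"ETL failure: {job_name}",
        Message=str(error),
    )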