Sainath Thorve

Data Engineer
Aurangabad

Summary

Results-driven Data Engineer with nearly 4.5 years of experience in the IT industry, focused on Hadoop ecosystem development and Spark SQL. Skilled in developing efficient data extraction logic in Python and in using AWS services, including S3, EC2, and Redshift, for data management. Proven track record of overseeing data migration projects, handling 4-5 GB of incremental data daily while building and managing data pipelines and executing ETL processes. Strong analytical skills, with a focus on optimizing data processing workflows to improve operational efficiency.

Overview

4 years of professional experience

Work History

DATA ENGINEER

Aspirant Technologies PVT LTD.
10.2021 - 03.2026
  • Overall 4.5 years of experience across different domains in the IT industry, mainly in Hadoop ecosystem development, Spark SQL, PySpark APIs, and cloud services (Amazon Web Services).
  • Experience writing data extraction logic in Python and Databricks.
  • Hands-on experience with Amazon Web Services, mainly S3, EC2, EMR, Redshift, IAM roles, and RDS.
  • Experience with data flow, data pipeline, and workflow management tools.
  • Experience processing large volumes of structured and unstructured data, including integrating data from multiple sources.
  • Perform tasks such as writing scripts, calling APIs, and writing SQL queries.
  • Develop batch processing and integration solutions for structured and unstructured data.
  • Good experience with Spark architecture, including Spark Core and Spark SQL.
  • Experience using Hadoop distributions such as Cloudera.
  • Experience transferring data from RDBMS to HDFS and Hive tables using Sqoop and Spark APIs.
  • Experience creating, partitioning, bucketing, and loading tables in Hive.
  • Gather and process raw data at scale and analyze the processed data.

Education

B.Tech/B.E. -

MGM's Jawaharlal Nehru Engineering College
Aurangabad, India
09-2021

XIIth -

Sardar Dalipsingh Commerce And Science College
Aurangabad, India
05-2017

Xth -

St.Lawrence High School
Aurangabad, India
06-2015

Skills

  • Data Pipeline
  • PySpark
  • AWS
  • Python
  • Data Flow
  • Data Migration
  • AWS Glue

Additional Information

End-to-End Retail Data Pipeline using AWS (POS to Data Warehouse)

Designed and implemented a scalable end-to-end data pipeline for retail POS data using AWS services, enabling real-time analytics through CDC and batch processing. Built ETL workflows, automated orchestration, and implemented data warehousing with SCD logic in Redshift.

My Responsibilities:

  • Designed an end-to-end data pipeline ingesting POS data from SQL Server using AWS Database Migration Service with Full Load + CDC (Change Data Capture).
  • Built a data lake on Amazon S3 with a partitioned storage strategy to optimize query performance.
  • Automated metadata discovery using AWS Glue Crawler and maintained a centralized Data Catalog.
  • Performed data validation and ad-hoc analysis using Amazon Athena to ensure data quality and integrity.
  • Developed ETL pipelines using AWS Glue for:
      • Data cleansing (null handling, deduplication)
      • Data transformation (joins across sales, product, and customer datasets)
      • Data enrichment (CLV, gross margin calculations)
  • Implemented event-driven orchestration using Amazon EventBridge and AWS Lambda to trigger downstream workflows.
  • Designed and executed second-stage ETL (GlueJob2) to load processed data into Amazon Redshift.
  • Implemented Slowly Changing Dimensions (SCD Type 1 & Type 2) for dimensional modeling in Redshift.
  • Used AWS Secrets Manager for secure handling of database credentials.
  • Built staging and final data models (fact & dimension tables) for analytics and reporting.

Tools & Technologies:
Cloud: AWS (S3, DMS, Glue, Athena, Lambda, EventBridge, Redshift, Secrets Manager)
Databases: SQL Server, Amazon Redshift
Concepts: ETL, Data Lake, Data Warehousing, CDC, SCD Type 1 & 2, Data Modeling
Languages: SQL, Python
