

Results-driven Data Engineer with nearly 4.5 years of experience in the IT industry, specializing in Hadoop ecosystem development and Spark SQL. Skilled in developing efficient data extraction logic in Python and in using AWS services, including S3, EC2, and Redshift, for data management. Proven track record of overseeing data migration projects, handling 4-5 GB of incremental data daily while building and managing data pipelines and executing ETL processes. Strong analytical skills, applied to solving complex problems and optimizing data processing workflows.
End-to-End Retail Data Pipeline using AWS (POS to Data Warehouse)
Designed and implemented a scalable end-to-end data pipeline for retail POS data using AWS services, enabling real-time analytics through CDC and batch processing. Built ETL workflows, automated orchestration, and implemented data warehousing with SCD logic in Redshift.
My Responsibilities:
Designed an end-to-end data pipeline ingesting POS data from SQL Server using AWS Database Migration Service with Full Load + CDC (Change Data Capture).
Built a data lake on Amazon S3 with partitioned storage strategy to optimize query performance.
Automated metadata discovery using AWS Glue Crawler and maintained a centralized Data Catalog.
Performed data validation and ad-hoc analysis using Amazon Athena to ensure data quality and integrity.
Developed ETL pipelines using AWS Glue for:
Data cleansing (null handling, deduplication)
Data transformation (joins across sales, product, and customer datasets)
Data enrichment (CLV, gross margin calculations)
Implemented event-driven orchestration using Amazon EventBridge and AWS Lambda to trigger downstream workflows.
Designed and executed second-stage ETL (GlueJob2) to load processed data into Amazon Redshift.
Implemented Slowly Changing Dimensions (SCD Type 1 & Type 2) for dimensional modeling in Redshift.
Used AWS Secrets Manager for secure handling of database credentials.
Built staging and final data models (fact & dimension tables) for analytics and reporting.
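As an illustration of the SCD Type 2 logic mentioned above, here is a minimal sketch in plain Python. It is not the Redshift implementation used in the project, which is not described here. The `DimRow` class, the `apply_scd2` function, and the customer/city columns are hypothetical names chosen for the example. The idea is that a changed attribute closes the current row and opens a new version, while an unseen business key is inserted as a new current row.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a record in a hypothetical customer dimension."""
    customer_id: int           # business key
    city: str                  # tracked attribute
    valid_from: date
    valid_to: Optional[date]   # None means the version is still open
    is_current: bool


def apply_scd2(dim: List[DimRow], incoming: Dict[int, str],
               load_date: date) -> List[DimRow]:
    """Return a new dimension list after applying SCD Type 2 changes.

    `incoming` maps each business key to its latest attribute value.
    """
    result: List[DimRow] = []
    keys_with_current_row = set()

    for row in dim:
        if row.is_current:
            keys_with_current_row.add(row.customer_id)
        new_city = incoming.get(row.customer_id)
        if row.is_current and new_city is not None and new_city != row.city:
            # Attribute changed: expire the old version, then open a new one.
            result.append(replace(row, valid_to=load_date, is_current=False))
            result.append(DimRow(row.customer_id, new_city, load_date, None, True))
        else:
            # Historical rows and unchanged current rows are kept as they are.
            result.append(row)

    # Business keys that have no current row yet are inserted as new rows.
    for key, city in incoming.items():
        if key not in keys_with_current_row:
            result.append(DimRow(key, city, load_date, None, True))

    return result


if __name__ == "__main__":
    dim = [DimRow(1, "Pune", date(2023, 1, 1), None, True)]
    for row in apply_scd2(dim, {1: "Mumbai", 2: "Delhi"}, date(2024, 1, 1)):
        print(row)
```

By contrast, SCD Type 1 would simply overwrite `city` on the existing row and keep no history.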
Tools & Technologies
Cloud: AWS (S3, DMS, Glue, Athena, Lambda, EventBridge, Redshift, Secrets Manager)
Databases: SQL Server, Amazon Redshift
Concepts: ETL, Data Lake, Data Warehousing, CDC, SCD Type 1 & 2, Data Modeling
Languages: SQL, Python