Sanjay Nadarajan

Coimbatore, TN

Summary

Results-driven Senior Data Engineer with over 7 years of experience in designing and implementing scalable analytics and data processing solutions on AWS. Expertise in PySpark, Amazon Redshift, and data lakehouse architectures, supported by a strong background in ETL modernization, performance optimization, and large-scale data integration. Proven success in migrating legacy pipelines, optimizing costs, and delivering near real-time analytics through the creation of well-architected, high-performance data platforms. Dedicated to leveraging advanced technologies to drive business insights and improve decision-making processes.

Overview

7+ years of professional experience

Certification

Work History

Senior Data Engineer

Wavicle Data Solutions

07.2022 - Current

Role: Senior Data Engineer
Domain: Cars
Project: PySpark Forge conversion, API calls, third-party development, GA4.
Timeline:
Led the conversion of legacy Spark EMR jobs to Forge EC2 Spot instances, optimizing memory usage, improving job run-time, and delivering significant cost savings.
Designed and implemented PySpark-based ETL pipelines to pull GA4 data from BigQuery, perform data transformations, and load into Amazon Redshift, enabling advanced marketing analytics.
Applied performance tuning techniques (partitioning, caching, broadcast joins) to GA4 ETL pipelines, achieving substantial improvements in query latency and processing throughput.
Executed UC4 to Airflow migration, modernizing pipeline orchestration, improving maintainability, and ensuring more reliable job scheduling.
Contributed to new API development and third-party integrations for product launches, building scalable data ingestion frameworks in PySpark and Python.
Built and maintained data warehouse and lakehouse designs on AWS, leveraging S3, Glue Data Catalog, Athena, Redshift, and Delta Lake for analytical use cases.
Implemented data quality checks, schema evolution, and dimensional modeling principles to support downstream analytics and BI reporting.
Collaborated with cross-functional teams (analytics, dev, product) to deliver high-quality, production-grade data pipelines.
Actively engaged in cost optimization initiatives through intelligent resource allocation, EC2 Spot usage, and Lakehouse architectural improvements.
Delivered a Redshift monolithic-to-multi-cluster migration POC, validating workload isolation, performance gains, and scalable architecture using Redshift data sharing.
Conducted dependency analysis, permission migration, and pilot workload replay to ensure seamless transition with no access regression.
Environment: AWS, PySpark, EC2, Airflow, Jenkins.

Data Engineer

04.2019 - 06.2022

Role: Data Engineer
Project: Data Migration - Talend on-prem to cloud, Talend to PySpark conversion
Domain: Cars
Timeline:
Converted DataStage jobs to Talend jobs in the cloud using the Talend Big Data tool.
Executed the jobs in a Spark environment and used Talend Management Console (TMC) as the console.
Used Hive and Redshift for DDL creation and query optimization.
Created DDLs and views on Redshift for MasterData tables.
Worked on data registration and data ingestion using API Gateway.
Worked with AWS Cloud services including EC2, Amazon S3, AWS IAM, AWS RDS, EMR, Glue, Athena, Redshift, and Amazon API Gateway.
Worked on UC4 and Automic jobs to schedule backfill and repartition jobs.
Analysed datasets using Python, SQL, and Jupyter Notebook.
Created Talend jobs for one-time loads, enriched the data, moved it to an S3 bucket, and published the jobs to TMC.
Converted various Talend jobs to PySpark for faster performance.
Environment: Talend, Spark, EMR.

Jr. Data Integration Developer

11.2018 - 03.2019

Role: Jr. Data Integration Developer
Project: Data Migration - C360
Domain: C360
Timeline:
Developed ETL jobs in Talend to cleanse, process, and load data, modeled on the provided Talend jobs.
Managed SQL scripts to execute shell commands for the provided connections and database tables.
Imported data from MSSQL to HDFS and then to Redshift using ETL jobs.
Performed data analysis using Amazon Redshift and MSSQL, with good knowledge of Hive querying.
Participated in implementation testing and managed the offshore team assisting with development and testing on a daily basis.
Interacted with clients and support teams to keep track of job scheduling and data loads.
Environment: Talend

Education

Master of Business Administration - Information Systems

Bharathiar University

Coimbatore, Tamil Nadu

01.2021

Bachelor of Technology - Information Technology

Sri Ramakrishna Engineering College