Sanjay Nadarajan

Coimbatore, TN

Summary

Results-driven Senior Data Engineer with over 7 years of experience in designing and implementing scalable analytics and data processing solutions on AWS. Expertise in PySpark, Amazon Redshift, and data lakehouse architectures, supported by a strong background in ETL modernization, performance optimization, and large-scale data integration. Proven success in migrating legacy pipelines, optimizing costs, and delivering near real-time analytics through the creation of well-architected, high-performance data platforms. Dedicated to leveraging advanced technologies to drive business insights and improve decision-making processes.

Overview

7 years of professional experience
1 Certification

Work History

Senior Data Engineer

Wavicle Data Solutions
07.2022 - Current
  • Role: Senior Data Engineer
  • Domain: Cars
  • Project: PySpark Forge conversion, API calls, third-party development, GA4.
  • Timeline:
  • Led the conversion of legacy Spark EMR jobs to Forge EC2 Spot instances, optimizing memory usage, improving job run-time, and delivering significant cost savings.
  • Designed and implemented PySpark-based ETL pipelines to pull GA4 data from BigQuery, perform data transformations, and load into Amazon Redshift, enabling advanced marketing analytics.
  • Applied performance tuning techniques (partitioning, caching, broadcast joins) to GA4 ETL pipelines, achieving substantial improvements in query latency and processing throughput.
  • Executed UC4 to Airflow migration, modernizing pipeline orchestration, improving maintainability, and ensuring more reliable job scheduling.
  • Contributed to new API development and third-party integrations for product launches, building scalable data ingestion frameworks in PySpark and Python.
  • Built and maintained data warehouse and Lakehouse designs on AWS, leveraging S3, Glue Data Catalog, Athena, Redshift, and Delta Lake for analytical use cases.
  • Implemented data quality checks, schema evolution, and dimensional modeling principles to support downstream analytics and BI reporting.
  • Collaborated with cross-functional teams (analytics, dev, product) to deliver high-quality, production-grade data pipelines.
  • Actively engaged in cost optimization initiatives through intelligent resource allocation, EC2 Spot usage, and Lakehouse architectural improvements.
  • Environment: AWS, PySpark, EC2, Airflow, Jenkins.
  • Delivered a Redshift monolithic-to-multi-cluster migration POC, validating workload isolation, performance gains, and scalable architecture using Redshift data sharing.
  • Conducted dependency analysis, permission migration, and pilot workload replay to ensure seamless transition with no access regression.

Data Engineer

04.2019 - 06.2022
  • Role: Data Engineer
  • Project: Data Migration - Talend On-prem to Cloud, Talend to PySpark conversion
  • Domain: Cars
  • Timeline:
  • Converted DataStage jobs to Talend jobs in the cloud using Talend Big Data.
  • Ran the jobs in a Spark environment and used Talend Management Console (TMC) as the management console.
  • Used Hive and Redshift for DDL creation and query optimization.
  • Created DDLs and views on Redshift for MasterData tables.
  • Worked on data registration and data ingestion using API Gateway.
  • Worked with AWS Cloud services including EC2, Amazon S3, AWS IAM, AWS RDS, EMR, Glue, Athena, Redshift, and Amazon API Gateway.
  • Worked on UC4 and Automic jobs to schedule backfill and repartitioned jobs.
  • Analysed datasets using Python, SQL, and Jupyter Notebook.
  • Created Talend jobs for one-time loads and enrichment, moved the output to an S3 bucket, and published the jobs to TMC.
  • Converted various Talend jobs to PySpark for faster performance.
  • Environment: Talend, Spark, EMR.

Jr. Data Integration Developer

11.2018 - 03.2019
  • Role: Jr. Data Integration Developer
  • Project: Data Migration - C360
  • Domain: C360
  • Timeline:
  • Developed ETL jobs in Talend to cleanse, process, and load data, modeled on the provided Talend jobs.
  • Managed SQL scripts to execute shell commands for the provided connections and database tables.
  • Worked on importing data from MSSQL to HDFS to Redshift with ETL jobs.
  • Performed data analysis using Amazon Redshift and MSSQL; good working knowledge of Hive querying.
  • Took part in implementation testing and coordinated the offshore team assisting with development and testing on a daily basis.
  • Interacted with clients and support teams to keep track of job scheduling and data loads.
  • Environment: Talend

Education

Master of Business Administration - Information Systems

Bharathiyar University
Coimbatore, Tamil Nadu
01.2021

Bachelor of Technology - Information Technology

Sri Ramakrishna Engineering College
Coimbatore, Tamil Nadu
04.2018

Skills

  • Big Data Technologies: Apache Spark, HDFS, Hive, AWS EMR, Athena, AWS Lambda
  • Cloud & Data Lakehouse: AWS (S3, EC2, Glue, Redshift, Athena), Delta Lake, Serverless
  • Scheduling & Workflow: UC4, Airflow, Jenkins
  • Programming Languages: PySpark, Python, SQL
  • Databases / NoSQL: SQL, Redshift, Teradata
  • Version Control & DevOps: Git, Jenkins, Jira
  • File Formats & Data Processing: JSON, Parquet, Delta, Avro
  • Platforms: Windows, Unix, Linux
  • ETL development
  • Data migration

Certification

  • AWS Certified Solutions Architect - Associate (valid through 2028)
  • Databricks Certified Associate Developer for Apache Spark 3.
  • Talend Data Integration Developer
