ANUP SIGEDAR

Pune

Summary

Results-driven Data Engineer and ETL Developer with 5+ years of experience designing, developing, and optimizing large-scale data pipelines and ETL workflows. Proven expertise in migrating legacy systems to modern cloud platforms (AWS, Azure) and converting code to PySpark in Hadoop and Databricks environments. Strong background in data integration, data warehousing, and business intelligence, with a track record of improving data processing performance and ensuring data quality across enterprise applications.

Overview

7 years of professional experience
1 Certification

Work History

Senior Software Engineer

IMPETUS TECHNOLOGIES INDIA PVT. LTD.
12.2022 - Current
  • Foresters Financial

- Migrated Informatica workflows to Azure Databricks Workflows by converting the code to PySpark, processing 10+ million financial records and improving data load performance by 40%.

- Performed data validation and file validation against the source Informatica tables and files, achieving 99% accuracy.

  • Banco General

- Migrated legacy Sybase database code to PySpark on the Azure Databricks platform, processing 10+ million records and validating the migrated data with 98% accuracy.

  • Customer Relationship Hub (CRH) – Bank of America

- Converted Vertica and DataStage code to PySpark in a Hadoop environment as part of a migration project, reducing query execution time by 40%

- Engineered and executed the Spark 2 to Spark 3 upgrade with comprehensive testing, ensuring zero data loss and 100% compatibility across all modules

- Designed and implemented a data validation framework ensuring data integrity across DataStage and PySpark workflows with a 99.9% accuracy rate

- Managed Autosys job scheduling for converted PySpark pipelines, successfully scheduling and monitoring 50+ daily ETL jobs

- Collaborated in an Agile environment using JIRA, delivering weekly sprints with zero critical bugs in production

  • Capgemini Vanguard Migration Project

- Orchestrated migration of DataStage workflows to PySpark, creating 50+ AWS Glue jobs for data pipeline replication

- Performed comprehensive data validation comparing DataStage and PySpark outputs, identifying and resolving 25+ data inconsistencies

- Reduced data processing time by 50% through PySpark optimization and parallel processing techniques

  • USPS (United States Postal Service) Migration Project

- Migrated Teradata (BTEQ) and Ab Initio workflows to PySpark on the Azure Databricks platform

- Designed ETL pipelines handling 10+ million records daily with 99.95% success rate

- Implemented data quality checks and error handling mechanisms, reducing post-processing issues by 35%

  • Leap Logic Tool Project

- Converted Teradata (BTEQ) queries to optimized PySpark code, improving query performance by 45%

- Developed and deployed 15+ AWS Glue jobs for automated data ingestion and transformation

Associate R&D Engineer – Data Engineering

ABB INDIA LIMITED
03.2020 - 12.2022
  • Genix Ability Analytics Suite Product Development

- Architected and developed optimized data ingestion pipelines serving 5+ enterprise applications, processing 500GB+ monthly data volume

- Engineered data integration for Opportunity Loss Manager and System Anomaly Detection applications using Azure Data Factory, reducing data load time by 60%

- Designed multi-source data integration consolidating Azure SQL Database, Azure Cosmos DB, and Azure Data Lake, ensuring unified data governance

- Implemented a Proof of Concept for Talend ETL tool evaluation, documenting integration patterns and performance benchmarks

- Deployed Docker-containerized Flask microservices on Azure Kubernetes Service (AKS), exposing REST APIs for 8+ data applications

- Independently managed the Rundeck scheduler application supporting 12+ teams, scheduling and monitoring 200+ REST API calls daily with 99.8% uptime

- Led data validation and debugging for complex ETL flows, identifying and resolving 40+ data quality issues across production pipelines

- Collaborated with international teams (Italy) on ENEL Project data loading activities, ensuring compliance with strict SLAs

Intern – Smart City Automation

TECHBEAN SYSTEMS PVT. LTD.
06.2018 - 11.2018
- Contributed to IoT-based smart city automation projects, gaining foundational knowledge in data collection and integration

Education

Post Graduate Diploma - Big Data Analytics

Centre for Development of Advanced Computing (C-DAC)
Bengaluru, Karnataka, India
02.2020

Bachelor of Technology - Electronics and Telecommunications

Symbiosis Institute of Technology
Pune, Maharashtra, India
03.2019

Post Graduate Diploma - Business Management

Symbiosis Institute of Business Management
Pune, Maharashtra, India
01.2017

Skills

  • Programming Languages: Python, SQL, Shell Scripting
  • Database & Data Management: SQL Server, Azure SQL Database, Azure Cosmos DB, Azure Data Lake, Teradata, Vertica, Data Integration, Data Warehousing, Extract Transform Load (ETL), Job Scheduling
  • Big Data & Processing: PySpark, Hadoop (HDFS), Hive, Apache Spark
  • Python Libraries: Pandas, NumPy
  • Cloud Platforms: Amazon Web Services (AWS Glue, S3, Lambda, EMR), Microsoft Azure (Azure Databricks, Azure Data Factory, Azure Kubernetes Service, Azure Data Lake)
  • Tools & Frameworks: Apache Airflow, Tableau, Power BI, Rundeck, Talend, Ab Initio, DataStage, Autosys, JIRA, Docker, Kubernetes, Flask
  • Data Science: Machine Learning, Deep Learning, Convolutional Neural Networks (CNN)

Certification

  • Databricks Certified Data Engineer Associate
  • Business Intelligence with Power BI – Skill Nation
  • Analyst Program – Able Jobs
  • Cricket Analytics – Mad About Sports
  • SQL Intermediate Certificate – HackerRank

PROJECTS & ACHIEVEMENTS

  • Projects

Data Analytics Projects | Python, SQL, Tableau, Excel

- Expense Tracker Application (Excel with Macros)

- Stock Market Performance Analysis and Forecasting (Python)

- Data Science Job Salaries Analysis and Visualization (Python)

- Sales Data Analysis and Reporting (SQL)

- British Airways Dashboard with Business Intelligence Insights (Tableau)

- Video Games Market Dashboard and Trend Analysis (Tableau)

KEY COMPETENCIES

Azure Databricks | ETL Development | Cloud Migration | PySpark Optimization | Data Pipeline Architecture | AWS Glue | Data Warehousing | SQL Query Optimization | Data Validation | Apache Spark | Agile Methodology | JIRA | REST API Development
