
Yashadatt Sawant

Pune

Summary

Software engineering professional with deep expertise in developing robust, scalable applications. Strong focus on team collaboration, driving projects to successful completion, and adapting to evolving requirements. Proficient in multiple programming languages, frameworks, and tools. Committed to delivering high-quality results and fostering a productive work environment.

Overview

7 years of professional experience
1 certification

Work History

Senior Software Engineer

Velotio Technologies
03.2019 - Current

Client: Sight Machine (Jan 2020 – Present)
Role: Core Architect
Frameworks Developed:

  • SMWorkspace CLI Tool: Designed and developed a CLI with 30+ features (pipeline listing, dashboard updating, Git-style diffing/reporting, data summarizing, etc.), integrated into the product and used by all customers.
  • Operator Testing Framework: Validates changes instantly without pipeline execution, reducing test iteration time by ~80%.
  • Data Quality Check Framework: Built for General Mills to replace manual notebooks with dynamic, no-code analytics for factory teams.
  • Spike and Sudden Drop Detection System: Built a Z-score and IQR-based anomaly detection engine to flag spikes in Kafka topic streams; optimized ingestion and processing using concurrent Python threads (a sketch of the detection logic follows this list).
  • SEEQ Integration Tool: Built a fully automated Python tool that extracts and parses analytical sheet data from SEEQ APIs, performs dependency resolution using NetworkX graphs and topological sorting, and translates it into Sight Machine-compatible data dictionaries using ANTLR4 parsing. Eliminated manual dictionary creation and streamlined ingestion into analytics pipelines.
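
To illustrate the detection approach (not the production code), here is a minimal single-threaded sketch combining a rolling Z-score test with Tukey IQR fences over a Kafka consumer. The topic name, payload field, window size, and thresholds are hypothetical, and the concurrent ingestion threads are omitted for brevity.

    # Minimal sketch of the Z-score + IQR spike detector (illustrative only).
    import json
    from collections import deque
    from statistics import mean, stdev

    from kafka import KafkaConsumer  # kafka-python client (assumed)

    WINDOW = 200       # rolling window of recent samples (assumed size)
    Z_THRESHOLD = 3.0  # flag points more than 3 sigma from the rolling mean
    IQR_FACTOR = 1.5   # classic Tukey fence multiplier

    def is_spike(window, value):
        """Flag a value only when the Z-score and IQR fences both agree."""
        if len(window) < 30:  # not enough history to judge yet
            return False
        mu, sigma = mean(window), stdev(window)
        z_flag = sigma > 0 and abs(value - mu) / sigma > Z_THRESHOLD
        ordered = sorted(window)
        q1 = ordered[len(ordered) // 4]
        q3 = ordered[(3 * len(ordered)) // 4]
        iqr = q3 - q1
        iqr_flag = value < q1 - IQR_FACTOR * iqr or value > q3 + IQR_FACTOR * iqr
        return z_flag and iqr_flag

    consumer = KafkaConsumer(
        "sensor-readings",  # hypothetical topic name
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda b: json.loads(b.decode()),
    )
    window = deque(maxlen=WINDOW)
    for msg in consumer:
        value = msg.value["reading"]  # assumed payload field
        if is_spike(window, value):
            print(f"spike detected: {value}")  # real system alerts instead
        window.append(value)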

Client: Orora via Sight Machine | Stack: Python, Kafka, Spark, GCP, Azure
Role: Lead Data Engineer

  • Owned and executed end-to-end migration of all 7 Orora facilities from ETL2 to ETL3 pipelines.
  • Built roadmap, scoped all data sources, performed transformation mapping, Kafka topic integration, and visualization alignment.
  • Developed and validated high-efficiency pipelines tailored for each site with optimized data processing and minimized latency.
  • Supported the full GCP-to-Azure migration, collaborating with the infra team on coordination, validation, and UAT for production cutovers.

Client: Chamberlain Group | Stack: Azure Data Factory, Databricks, PySpark, DBT, Delta Live Tables (DLT)
Role: Data Engineer

  • Developed end-to-end DLT pipelines on Azure Databricks, transforming source data into modeled tables based on customer-defined star schemas (see the sketch after this list).
  • Worked across the pipeline lifecycle: identifying source tables, designing inter-table dependencies, and building robust transformations aligned with business requirements.
  • Applied PySpark and DBT for modular, testable transformation logic and semantic layer creation.
  • Participated in Alpha and Unit Testing of data pipelines, validating complex business logic and ensuring correctness across stages before deployment.
  • Gained hands-on experience with Azure Data Factory, orchestrating workflows and data movement in a scalable environment.
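
For context, a bronze-to-silver Delta Live Tables pipeline of the kind described above might look like the sketch below. Table names, the landing path, and columns are hypothetical, not the customer schema; the code runs only inside a Databricks DLT pipeline, where the spark session object is injected automatically.

    # Minimal Delta Live Tables sketch; names and paths are placeholders.
    import dlt
    from pyspark.sql import functions as F

    @dlt.table(comment="Raw events landed from the source system")
    def bronze_events():
        return (
            spark.readStream.format("cloudFiles")  # Auto Loader ingestion
            .option("cloudFiles.format", "json")
            .load("/mnt/landing/events")           # assumed landing path
        )

    @dlt.table(comment="Cleaned fact table feeding the star schema")
    @dlt.expect_or_drop("valid_id", "device_id IS NOT NULL")  # quality gate
    def silver_events():
        return (
            dlt.read_stream("bronze_events")
            .withColumn("event_date", F.to_date("event_ts"))
            .select("device_id", "event_date", "metric", "value")
        )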


Client: LeoLabs | Stack: AWS Lambda, S3, SNS, SQS, Redshift, EMR, PySpark, Deequ, CloudWatch
Role: Data Engineer / Solution Architect
Near Real-Time CDM Pipeline:

  • Architected and implemented a real-time ETL workflow that ingests, transforms, and loads CDM JSON files from zip bundles via event-driven AWS services.
  • New zip files are uploaded to S3, triggering SNS → SQS messages.
  • A Lambda function processes messages in batches, extracts and consolidates the CDM JSON files, and loads them into Redshift using COPY from an intermediate S3 bucket (sketched below).
  • Designed for low-latency, auto-triggered ingestion without manual intervention.
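
A simplified sketch of such a batch Lambda handler is shown below, using the boto3 Redshift Data API. The bucket, table, role ARN, and cluster names are placeholders, and the unzip/consolidation step is elided.

    # Simplified SQS-batch Lambda handler (illustrative placeholders only).
    import json
    import boto3

    s3 = boto3.client("s3")  # used by the elided unzip/consolidate step
    redshift_data = boto3.client("redshift-data")

    COPY_SQL = (
        "COPY cdm_events FROM 's3://cdm-staging/{key}' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy' "
        "FORMAT AS JSON 'auto'"
    )

    def handler(event, context):
        # Each SQS record wraps an SNS notification for a new zip upload to S3
        for record in event["Records"]:
            sns_envelope = json.loads(record["body"])
            s3_event = json.loads(sns_envelope["Message"])
            for rec in s3_event["Records"]:
                key = rec["s3"]["object"]["key"]
                # ... download zip, extract and consolidate CDM JSON,
                #     write the consolidated file to the staging bucket ...
                redshift_data.execute_statement(
                    ClusterIdentifier="cdm-cluster",  # placeholder
                    Database="analytics",
                    DbUser="loader",
                    Sql=COPY_SQL.format(key=key),
                )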

External CDM ETL using Spark:

  • Developed a scheduled Spark job on AWS EMR, triggered via Lambda + CloudWatch, that reads hourly data from RDS, performs transformations, and loads partitioned output to S3.
  • Data is validated using Amazon Deequ, enforcing strict quality checks (datatypes, permissible values, null columns); a sketch of this validation gate follows the list.
  • Only clean, validated data proceeds to Redshift, while failure cases are logged and reported through CloudWatch metrics.
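
The sketch below shows what such a Deequ validation gate can look like via the PyDeequ bindings. Column names, permissible values, and paths are illustrative, and the deequ jar version must match the Spark version in use.

    # Hedged PyDeequ sketch of the validation gate (illustrative only).
    from pyspark.sql import SparkSession
    from pydeequ.checks import Check, CheckLevel
    from pydeequ.verification import VerificationSuite, VerificationResult

    spark = (
        SparkSession.builder
        .config("spark.jars.packages", "com.amazon.deequ:deequ:2.0.3-spark-3.3")
        .getOrCreate()
    )

    df = spark.read.parquet("s3://external-cdm/hourly/")  # assumed input

    check = (
        Check(spark, CheckLevel.Error, "external CDM checks")
        .isComplete("object_id")                           # no null IDs
        .isContainedIn("risk_class", ["LOW", "MED", "HIGH"])
        .isNonNegative("miss_distance_km")
    )

    result = VerificationSuite(spark).onData(df).addCheck(check).run()

    if result.status != "Success":
        # in the real job, failures feed CloudWatch metrics and reports
        VerificationResult.checkResultsAsDataFrame(spark, result).show()
    # only a clean result lets the Redshift load step proceed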

Backfill Support & Scaling:

  • Built robust backfill jobs for both CDM and external CDM data, supporting date-based and ID-based ranges.
  • Designed EMR steps to run parallel, non-overlapping backfills with full metadata logging and progress tracking (see the sketch after this list).
  • Ensured Redshift load handling, concurrency control, and no data duplication.
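
As a rough illustration, date-ranged, non-overlapping backfill steps can be submitted to EMR along these lines; the cluster ID, script path, and date range are placeholders.

    # Illustrative date-ranged backfill submission via boto3 EMR.
    from datetime import date, timedelta
    import boto3

    emr = boto3.client("emr")

    def backfill_steps(start, end, days_per_step=7):
        """Split [start, end] into disjoint windows, one EMR step each."""
        steps, cursor = [], start
        while cursor <= end:
            window_end = min(cursor + timedelta(days=days_per_step - 1), end)
            steps.append({
                "Name": f"cdm-backfill {cursor} to {window_end}",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                        "spark-submit", "s3://jobs/cdm_backfill.py",
                        "--start", cursor.isoformat(),
                        "--end", window_end.isoformat(),
                    ],
                },
            })
            cursor = window_end + timedelta(days=1)  # no window overlap
        return steps

    emr.add_job_flow_steps(
        JobFlowId="j-XXXXXXXXXXXXX",  # placeholder cluster ID
        Steps=backfill_steps(date(2022, 1, 1), date(2022, 3, 31)),
    )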


Python Developer

Velotio Technologies
03.2019 - 01.2020

Client: AirPR
Role: Python Developer

  • Developed scalable and maintainable code, ensuring long-term stability of the software.
  • Improved software performance by identifying and resolving bottlenecks in the code.
  • Built backend and automation features using Django, Selenium, and AWS.
  • Developed a Twitter Search API deployed on AWS Lambda.
  • Implemented client-requested features including UI dropdown filters and data handling logic.
  • Wrote unit test cases and developed scrapers for multiple websites to feed PR analytics dashboards.

Android Developer

Raja Software Labs
05.2018 - 09.2019

Client: LinkedIn Android App
Role: Software Engineer

  • Worked on the company account section of LinkedIn’s Android app, contributing new features such as the People tab.
  • Developed unit test cases using JUnit and UI test cases using Espresso to ensure a seamless user experience.
  • Contributed to Android app enhancements, including authentication modules.
  • Participated in UI/UX improvements and performance optimizations.

Education

B.Tech - Information Technology

Walchand College of Engineering, Sangli
Sangli, India
05.2018

Skills

    Languages: Python, SQL, Bash
    Data Engineering: Apache Spark, PySpark, Kafka, Airflow, DBT, ETL/ELT, Data Warehousing
    Cloud: AWS (S3, Lambda, Glue, Redshift, EMR, SNS, SQS), Azure Data Factory
    Databases: PostgreSQL, MySQL, MongoDB, Snowflake, Redshift, SQLite
    Tools & Frameworks: Databricks, Docker, Git, Terraform, Pandas, NumPy, REST APIs, Selenium, Django
    Other: Data Modeling, CI/CD, Unit Testing, Monitoring & Alerting (CloudWatch)

Certification

Databricks Certified Data Engineer
