Sandip Balisha Kamble

Kothrud, Pune

Summary

Data Engineer with 3+ years of experience in data engineering, including ensuring ACA compliance for clients by managing, validating, and analyzing large-scale data from multiple sources.

Core skills and tools: Python, SQL, PySpark, Databricks, AWS (S3, Glue, Redshift, Athena, Lambda), Azure (Data Lake, SQL Database), Apache Kafka, Apache Airflow, Docker, Kubernetes, Terraform, ETL/ELT, data warehousing, data lakes, data pipelines, data migration, data validation, data governance, data modeling, REST API development (Django, Flask, FastAPI), API testing with Postman, integration, Pandas, NumPy, Git, Jenkins, CI/CD, Tableau, Power BI, Snowflake, MongoDB, PostgreSQL, MySQL, Hadoop HDFS, Parquet, Avro, JSON, XML, BigQuery, and cloud storage.

Overview

5 years of professional experience
1 certification

Work History

Data Engineer

Universe Logistics
01.2023 - Current
  • Skill sets: Cloud computing, data engineering, data staging, data anonymization, Couchbase, NoSQL databases, machine learning, predictive analytics, logic app development, API integration, real-time tracking, Python, Java, automation scripting, SQL, Power BI, Tableau, problem-solving, algorithm design, cross-functional collaboration, Agile methodology, Scrum, communication, customer experience optimization
  • Shipping Transformation Project Summary
  • The project aims to modernize logistics by automating processes, reducing costs, and improving customer satisfaction
  • It uses cloud technology and advanced analytics to optimize shipping, enhance real-time tracking, and improve operational efficiency
  • Objectives:
  • Automate logistics processes and reduce manual work
  • Provide real-time tracking and predictive delivery updates
  • Optimize performance and reduce shipping costs
  • Process Flow:
  • 1. Customers send emails with logistics data
  • 2. Logic app stores data in staging
  • 3. Data is anonymized and stored in Couchbase
  • 4. Orchestration pipeline identifies the customer and processes the data
  • 5. ML pipeline predicts logistics metrics
  • Challenge: Identifying the correct customer from incoming data
  • Solution: Use an algorithm to match email details and data with customer records (a fuzzy-matching sketch follows this list)
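
Below is a minimal Python sketch of the customer-identification step described above. The customer records and email fields are hypothetical placeholders, and it uses the standard-library difflib for fuzzy matching; the production pipeline's actual matching rules and data sources are not shown here.

    from difflib import SequenceMatcher

    # Hypothetical customer master records (placeholder data, not real clients).
    CUSTOMERS = [
        {"id": "C001", "name": "Acme Freight", "email_domain": "acmefreight.com"},
        {"id": "C002", "name": "Blue Ocean Shipping", "email_domain": "blueocean.io"},
    ]

    def similarity(a: str, b: str) -> float:
        """Return a 0..1 similarity ratio between two strings."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def identify_customer(sender_email: str, company_hint: str, threshold: float = 0.6):
        """Match an incoming email to a customer record.

        Exact sender-domain matches win; otherwise fall back to fuzzy matching
        on the company name mentioned in the email subject or body.
        """
        domain = sender_email.split("@")[-1].lower()
        for customer in CUSTOMERS:
            if domain == customer["email_domain"]:
                return customer["id"]

        best_id, best_score = None, 0.0
        for customer in CUSTOMERS:
            score = similarity(company_hint, customer["name"])
            if score > best_score:
                best_id, best_score = customer["id"], score
        return best_id if best_score >= threshold else None

    print(identify_customer("ops@acmefreight.com", "ACME Freight Pvt Ltd"))  # -> C001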

Data Engineer

Costco
09.2021 - 11.2022
  • Company Overview: PeopleSoft ERP
  • Skill sets: AWS S3, AWS Glue, Amazon Athena, AWS Lambda, RDS/Redshift, QuickSight, Python, PySpark, SharePoint API
  • Overview of the Process:
  • Problem: The client faced difficulties in calculating project budgets manually, leading to inaccurate estimates (over or underestimation)
  • Solution: We implemented a machine learning system to automate the budgeting process
  • Data Collection: We gathered historical project data from the client's ERP system, including budget, resources, and costs
  • Data Preparation: The data was cleaned and processed by checking data quality, filling null values, removing duplicates, and eliminating outliers
  • Data Analysis: We analyzed the data to find patterns and relationships
  • Feature Engineering: We used techniques like the F-statistic score and SelectKBest to select the most relevant features, and created new features like project duration (a simplified scikit-learn sketch follows this list)
  • Modeling: We applied regression models (Linear Regression, Random Forest) to predict project costs
  • The Linear Regression model performed best
  • Integration: The best model was integrated into the ERP system and evaluated using metrics like Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Adjusted R²
  • User Interface: We developed an API or dashboard for users to input project details and get budget predictions
  • Automation: The process was automated, providing regular reports and continuous monitoring to ensure accuracy
  • Outcome: The solution optimized project budgets by 12% - 15%, reduced cost inefficiencies, improved resource utilization, and led to smoother project execution
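
Below is a simplified scikit-learn sketch of the feature-selection and model-comparison steps described above, run on synthetic data; the actual ERP columns, feature counts, and model settings are assumptions for illustration only.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.feature_selection import SelectKBest, f_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error, mean_squared_error
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for historical project data (budget, resources, costs).
    X, y = make_regression(n_samples=500, n_features=12, noise=10.0, random_state=42)

    # Keep the k features with the highest F-statistic scores.
    selector = SelectKBest(score_func=f_regression, k=6)
    X_selected = selector.fit_transform(X, y)

    X_train, X_test, y_train, y_test = train_test_split(
        X_selected, y, test_size=0.2, random_state=42
    )

    # Compare the two regression models on held-out data using MAE and RMSE.
    for name, model in [
        ("Linear Regression", LinearRegression()),
        ("Random Forest", RandomForestRegressor(n_estimators=200, random_state=42)),
    ]:
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        mae = mean_absolute_error(y_test, pred)
        rmse = np.sqrt(mean_squared_error(y_test, pred))
        print(f"{name}: MAE={mae:.2f}, RMSE={rmse:.2f}")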

Data Engineer

Mercury Insurance Group
06.2020 - 08.2021
  • Skill sets: AWS S3, AWS Glue, Amazon Athena, AWS Lambda, RDS/Redshift, QuickSight, Python, PySpark, SharePoint API
  • Objective:
  • Ensure ACA compliance for clients by managing, validating, and analyzing large-scale data from multiple sources
  • Key Challenges & Solutions:
  • Large Data Handling: Managed 90GB of monthly data via SharePoint API and email ingestion
  • Dynamic Data Matching: Used fuzzy matching for data accuracy
  • Cost Optimization: Leveraged Amazon Athena and AWS Glue for serverless querying and minimized infrastructure costs
  • Responsibilities:
  • Data Ingestion: Used AWS Lambda and SharePoint API for file processing
  • Data Storage: Managed data in S3 and RDS/Redshift
  • Data Processing: Built ETL pipelines with AWS Glue and Python/PySpark (a condensed PySpark sketch follows this list)
  • Querying & Reporting: Optimized Athena queries and visualized data in QuickSight
  • Data Quality & Governance: Implemented quality checks for HIPAA/ACA compliance
  • Cost Optimization: Optimized S3 storage and used Glue/Athena for cost savings
  • Technology Stack:
  • AWS S3, Glue, Athena, Lambda, RDS/Redshift, QuickSight
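
Below is a condensed PySpark sketch in the spirit of the Glue ETL pipelines described above; the bucket names, file layout, and column names are hypothetical placeholders, and the real jobs run inside AWS Glue rather than a standalone SparkSession.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("aca-compliance-etl").getOrCreate()

    # Hypothetical S3 locations; the real pipeline reads files landed via Lambda/SharePoint API.
    RAW_PATH = "s3://example-raw-bucket/aca/enrollment/"
    CURATED_PATH = "s3://example-curated-bucket/aca/enrollment/"

    raw = spark.read.option("header", True).csv(RAW_PATH)

    # Basic quality checks: drop exact duplicates and rows missing key identifiers.
    clean = (
        raw.dropDuplicates()
        .filter(F.col("employee_id").isNotNull())
        .withColumn("coverage_month", F.to_date(F.col("coverage_month"), "yyyy-MM"))
    )

    # Write partitioned Parquet so downstream Athena queries scan only the months they need.
    clean.write.mode("overwrite").partitionBy("coverage_month").parquet(CURATED_PATH)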

Education

Microsoft Python Certification - Machine Learning with Python: From Linear Models to Deep Learning

Microsoft
01.2023

M.E. Mechanical Engineering

Savitribai Phule Pune University
Pune, India
08.2014

Skills

  • Python
  • SQL
  • Scripting languages
  • AWS (S3, Lambda, RDS, Redshift)
  • Cloud computing
  • Big data
  • Apache Spark
  • Apache Hadoop
  • Data warehousing
  • ETL
  • Data pipeline design and control
  • Data modeling
  • Data migration
  • Data integration
  • Data governance
  • NoSQL databases
  • Data mining
  • Data analysis
  • Data visualization
  • Machine learning
  • Business intelligence
  • CI/CD
  • Jenkins
  • Docker

Awards

Employee of the Month, 06/01/24

Hobbies and Interests

  • Reading
  • Traveling
  • Bike riding


Personal Information

  • Date of Birth: 03/31/01
  • Nationality: Indian
  • Driving License: Yes

Certification

  • AWS Certified Cloud Practitioner

Languages

  • English: First language, Proficient (C2)
  • Hindi: Advanced (C1)
  • Marathi: Proficient (C2)
  • Japanese

Accomplishments

  • Best Employee Award
