Shubham Kharate

Summary

Google Cloud Certified Professional Data Engineer with over 5 years of hands-on experience in architecting and optimizing scalable data pipelines on the Google Cloud Platform. Proficient in Python, SQL, PySpark, and Big Data technologies such as Apache Spark and Hive. Adept at leveraging GCP services such as BigQuery, Dataproc, Cloud Composer (Airflow), Dataform, and Google Cloud Storage to build end-to-end data workflows that are robust, cost-efficient, and performance-tuned.

Skilled in transforming raw data into actionable insights through rigorous data validation, ETL optimization, and workflow automation. Proven ability to ensure data integrity, enhance pipeline reliability, and improve system performance. Experienced in collaborating with cross-functional teams to align technical solutions with business goals and to drive data-driven decision-making across the organization.

Overview

6 years of professional experience
1 Certification

Work History

Data Engineer

Globant
Pune
09.2024 - 06.2025

Automated Data Workflows – Used Apache Airflow to schedule and manage ingestion, transformation, and aggregation jobs, ensuring timely data availability.

Performed data validation and quality checks – Implemented null checks, row counts, and validation rules to detect and handle data issues early.

Built Spark jobs for data transformation – Processed raw data in Dataproc using Spark, optimizing performance with repartitioning, broadcasting, and caching.

Executed large-scale data processing – Ran joins, aggregations, and transformations while fine-tuning Spark configurations for efficiency.

Data Storage & Querying – Leveraged BigQuery partitioning, clustering, and query optimization for cost-effective and faster analytics.

Airflow DAGs for orchestration – Managed dependencies across ingestion, transformation, and aggregation stages for seamless execution.

Monitoring & Alerts – Set up GCP Cloud Monitoring to track pipeline performance, detect failures, and trigger alerts.

Enhanced Pipeline Reliability – Integrated retry mechanisms, error handling, and logging in Airflow for robust execution and troubleshooting.

Optimized resource allocation – Leveraged cost-effective compute options and fine-tuned configurations to improve performance while controlling cost.

Ensured Data Integrity & Consistency – Validated data at each stage using checksums, row counts, and sample comparisons while reconciling source and target data.

Cost-Effective Strategies – Compressed data stored in GCS, optimized BigQuery queries, and monitored resource usage to reduce costs.

Collaborated with Stakeholders – Defined data requirements, shared pipeline insights, and documented designs for easy maintenance and onboarding.

Investigated and resolved pipeline issues – Analyzed logs, used performance monitoring tools to identify bottlenecks, troubleshot failures, and implemented optimizations.

Data Engineer

Datametica Solutions Pvt. Ltd.
Pune
10.2024 - 01.2025

Project: Noom

Built end-to-end data pipelines – Developed STS jobs to extract batch data, load it into GCS, and further into BigQuery using scheduled Dataform load jobs.
Automated Orchestration – Created individual Dataform DAGs per table, and implemented unit testing to validate pipeline integrity.
Python-Based Utilities – Designed Python scripts to dynamically extract source metadata and automate batch data ingestion into BigQuery.
Streamlined Batch Processing – Ensured reliability and scheduling of batch data flows, improving maintainability and performance of the pipeline.

Data Engineer

SmartMatrix Digital Services Pvt. Ltd.
Pune
10.2019 - 10.2023

Project: Linfox

Efficient Data Querying – Crafted complex Oracle PL/SQL queries to fetch, update, and manage logistics data with optimal performance.
Unit Testing for Reliability – Designed and executed test cases to validate SQL logic, ensuring accuracy before deployment.
Pattern-Based Extraction – Developed regular expressions for precise validation and extraction of structured data.
Process Documentation – Recorded code logic and workflows for smoother team collaboration and easier maintenance.
Database Object Management – Maintained tables, views, and indexes to enhance query performance and data consistency.
Version Control with Git – Tracked code changes, collaborated with team members, and maintained project history effectively.
Maintained Data Integrity – Enforced validation rules, constraints, and consistency checks within SQL queries to ensure data quality.

Project: HCA

Cross-System Connectivity – Built Python utilities to connect source and target systems, enabling seamless data processing using PySpark.
Generic PySpark Framework – Enhanced and modularized existing PySpark code for reuse across multiple workflows and platforms.
Data Retrieval via SQL – Developed efficient SQL queries to extract relevant datasets from databases for downstream processing.
Version Control & Collaboration – Managed code through GitHub, including pull, push, and commit operations for seamless teamwork.
Data Debugging and Validation – Investigated and fixed data issues by sampling, sorting, and analyzing extracted records in Oracle.
Troubleshooting and Optimization – Identified pipeline failures and performance bottlenecks, applying fixes to improve reliability.

Education

Bachelor of Engineering

MIT Academy of Engineering
Pune
01.2019

Skills

  • SQL
  • Python
  • PySpark
  • Google Cloud Storage (GCS)
  • BigQuery
  • Dataproc
  • Dataflow
  • Google Cloud SDK Shell
  • Cloud Composer (Airflow)
  • S3 Bucket
  • Redshift
  • Glue
  • Lambda
  • Athena
  • IAM
  • JIRA
  • Git / GitHub
  • Agile

Certification

  • Google Cloud Certified Professional Data Engineer

Languages

Marathi
First Language
English
Advanced (C1)
Hindi
Advanced (C1)