Himanchal Jain

Data Engineer

Summary

Driven and experienced professional with around 8 years of expertise in data engineering, big data, analytics, and MLOps. Proficient in designing comprehensive technical solutions that encompass data storage, processing, and infrastructure for consumption, facilitating seamless data operations.

Overview

8 years of professional experience
2 Certifications

Work History

Data Engineer II

Amazon
Noida
12.2022 - Current
  • Established CI/CD pipelines with Python and TypeScript for automated AWS deployments using CDK.
  • Created separate repositories for infrastructure, Spark/SQL scripts, and Python Airflow DAGs, ensuring that all code changes are deployed to AWS accounts only after passing Amazon's code review process.
  • Outlined data pipeline structures and flows for the Tax data warehouse; published SOPs for migration management and AWS outage recovery.
  • Mentored 7-8 junior team members, facilitating their onboarding and offering feedback through code and design reviews to support their career growth.
  • Implemented security measures: AWS Secrets Manager for credential rotation, restricted AWS console access in production, and CloudTrail logging to track developer interventions.

Data Engineer I

Amazon
Gurugram
03.2020 - 11.2022
  • Streamlined data ingestion and access for tax reporting using AWS EMR, Glue, and Redshift, resolving ETL issues and offering on-call customer support.
  • Transitioned to Airflow to utilize open-source libraries, crafting custom Python operators for EMR cluster management and ETL job submissions.
  • Upgraded to the latest Graviton-based instances, newer EMR versions, and Spark 3.x to improve job performance, and modified PySpark scripts for compatibility with the newer Spark versions.
  • Leveraged EMR managed scaling with flexible instance fleets to improve resource utilization and ensure resource availability across multiple regions.
  • Modified the data flow of existing pipelines built on Amazon's internal ETL manager tool to avoid unnecessary bulk data scans, reducing Redshift Spectrum costs.
  • Delivered summarized results, analysis, and conclusions to stakeholders during the Prime Day and Black Friday/Cyber Monday peak sale events.

Senior Data Engineer

Mindtree Limited
Bangalore
07.2017 - 02.2020
  • Analyzed and processed data for the consumer packaged goods (CPG) industry.
  • Wrote advanced SQL queries on AWS Redshift to implement complex logic over large datasets.
  • Migrated client datasets from an on-premises Windows server to an AWS S3 data lake and scaled up the existing ETL process using Spark on AWS EMR.
  • Developed multiple Spark jobs in Scala to compute variables for machine learning modelling.
  • Leveraged multiple Python libraries for data pre-processing and parsing.
  • Orchestrated the analytical life cycle, from data ingestion to machine learning modelling, using shell scripting, and consolidated it into a single working framework.
  • Migrated from on-demand to Spot EC2 instances, reducing the overall data-processing cost by around 70%.

Education

Bachelor of Technology - Computer Science

The LNMIIT
Jaipur
07.2013 - 04.2017

Skills

Big Data Ecosystem (MapReduce, HDFS, Spark)

AWS Stack (EMR, Redshift, MWAA, CDK, S3, Glue, Lambda)

Airflow

SQL

Python

Bash/Shell Scripting, TypeScript

Certification

AWS Certified Data Analytics - Specialty

Timeline

Data Engineer II

Amazon
12.2022 - Current

Data Engineer I

Amazon
03.2020 - 11.2022

Senior Data Engineer

Mindtree Limited
07.2017 - 02.2020

Bachelor of Technology - Computer Science

The LNMIIT
07.2013 - 04.2017