GAYATRI GAUTAM

Data Engineer and Data Scientist
Indore

Work Preference

Important To Me

Work-life balance, Company culture, Work-from-home option, Paid sick leave, Career advancement, Personal development programs, Paid time off, Healthcare benefits, 401k match, 4-day work week, Flexible work hours

Summary

Results-driven Lead Data Engineer and Data Science/Machine Learning Engineer with 10+ years of experience, specializing in scalable data pipelines and Lakehouse architectures. Developed and deployed production ML/AI solutions on Azure using Azure Databricks, Azure Data Factory, and MLOps practices. Delivered analytics and business insights through effective collaboration and innovative problem-solving.

Overview

1 Certification
11 years of professional experience

Work History

Data Science & Machine Learning Engineer

Swachh Bharat Abhiyaan
03.2025 - Current
  • Designed and implemented scalable ETL and data pipelines using Azure Data Factory, Azure Databricks, PySpark, and Delta Lake to ingest and process structured and unstructured enterprise datasets for analytics and ML workloads.
  • Built and optimized end-to-end machine learning workflows for predictive analytics, forecasting, anomaly detection, and recommendation systems to accelerate model development and delivery.
  • Developed reproducible feature engineering pipelines using Python and Spark SQL to improve model accuracy and enable consistent training across environments.
  • Built and deployed ML models using Azure Machine Learning and Databricks ML; implemented MLOps practices, including managed endpoints, batch inference, monitoring, and CI/CD for production reliability.
  • Developed and integrated large language model solutions and generative AI workflows for automated reporting, conversational analytics, intelligent summarization, and business insight generation.
  • Implemented Retrieval-Augmented Generation (RAG) architectures combining semantic search and vector-based retrieval over enterprise data to improve the relevance of LLM responses.
  • Designed Medallion Architecture (Bronze, Silver, and Gold) data pipelines to standardize, transform, validate, and curate enterprise data for analytics and ML.
  • Leveraged Delta Lake and Lakehouse Architecture to support scalable analytics and AI workloads across the data platform.
  • Optimized PySpark transformations, partitioning strategies, joins, caching, and distributed processing to improve performance on high-volume datasets.
  • Developed real-time and batch processing pipelines using Azure Event Hubs, Spark Structured Streaming, and ADF orchestration to support streaming analytics.
  • Performed data cleansing, preprocessing, anonymization, and quality validation to ensure data consistency, compliance, and reliable model inputs.
  • Collaborated with cross-functional stakeholders, analysts, and engineering teams to gather requirements and deliver scalable, AI-driven solutions aligned to business objectives.
  • Developed dashboards, analytical reports, and KPI monitoring systems to surface model and business performance for operational decision-making.
  • Conducted root-cause analysis and performance optimization for production pipelines, ML jobs, and distributed data systems to maintain reliability.
  • Worked with cloud-native technologies and distributed computing environments to process large-scale enterprise data efficiently.
  • Led data preprocessing and feature engineering efforts to enhance model performance.
  • Decreased critical medicine stockouts by 22% across rural Health and Wellness Centers by engineering predictive demand-supply optimization.
  • Identified high-risk disease outbreak hotspots 15 days in advance by developing a cross-domain feature engineering pipeline linking Swachh Bharat waste accumulation metrics with Ayushman Bharat health records.

Sr Data Engineer

Johnson & Johnson
10.2021 - 12.2024
  • Implemented CI/CD pipelines and resolved SonarQube issues to improve code quality and accelerate delivery.
  • Used BusinessObjects, BI tools, and reporting platforms to extract data from data solutions and data warehouses for stakeholder analytics.
  • Developed new business metrics and automated metric generation to standardize reporting and reduce manual effort.
  • Performed time-series analysis and predictive modeling to inform forecasting and operational planning.
  • Performed root-cause analysis on data-related system issues and recommended corrective actions to restore data reliability.
  • Interpreted business questions and collaborated with engineering teams to identify and integrate appropriate data sources.
  • Updated and developed scripts and queries to extract and analyze data from multiple sources for ad-hoc and scheduled analyses.
  • Deployed predictive analytics models to forecast future trends and support decision-making.
  • Collaborated on ETL tasks, maintaining data integrity and verifying pipeline stability during deployments.
  • Designed and developed analytical data structures to support reporting and advanced analytics.
  • Documented and communicated database schemas using accepted notations to support developer onboarding and maintenance.
  • Designed data models to support complex analysis needs and reporting requirements.
  • Work location: Remote.
  • Designed and implemented scalable data pipelines to enhance data processing efficiency.
  • Accelerated big data pipeline execution by 40% by building optimized PySpark data transformation scripts and tuning auto-scaling cluster configurations within Azure Databricks.

Sr Analyst

Eagle Constructions
08.2020 - 07.2021
  • Performed system analysis, documentation, testing, implementation, and user support during platform transitions to reduce downtime.
  • Validated results and performed quality assurance to assess accuracy of data and reporting outputs.
  • Cultivated relationships with industry and internal stakeholders to share best practices and drive improvements.
  • Conducted workplace compliance training to reduce liability risks and improve operational effectiveness.
  • Guided acquisition processes to capture projected cost and revenue synergies during integrations.
  • Identified and resolved issues through root-cause analysis and research to improve system reliability.
  • Queried databases to gather information required for report processing and decision support.
  • Created data models to support decision-making and reporting needs.

Analyst

Tata Communication LTD
12.2015 - 06.2020
  • Performed system analysis, documentation, testing, implementation, and user support for platform transitions.
  • Validated results and performed quality assurance to assess accuracy of data and reporting.
  • Built relationships across the organization to share insights and improve processes.
  • Optimized core processes to improve business performance and operational agility.
  • Guided acquisition-related analysis to capture cost and revenue synergies.
  • Identified and resolved problems through root-cause analysis and research.
  • Recommended process improvements to identify, analyze, and remediate constraints.
  • Queried databases to support report processing and data requests.
  • Mapped connections between policies and business results to reduce confusion and improve outcomes.
  • Assessed data modeling and statistics to align business processes with data rules.
  • Developed and maintained data warehouses and data marts to support business operations.
  • Identified patterns and trends in large datasets and provided actionable insights.
  • Utilized data visualization techniques to present complex data clearly to stakeholders.
  • Deployed predictive analytics models to forecast trends and support planning.
  • Developed complex dashboards and reporting tools to track business performance metrics.
  • Collaborated with stakeholders to identify business needs and data sources required for analysis.
  • Analyzed data to identify root causes and recommend corrective actions.

Sr Process Analyst

Infosys
01.2015 - 11.2015
  • Created documentation detailing process improvement solutions to enhance operational efficiency.
  • Conducted quality assurance checks on transactions and account actions to ensure market compliance.

Education

Master of Science - Computer Science

IST College
Bhopal, India
06.2019

Bachelor of Computing - Computer + Commerce

BSSS Bhopal
Bhopal, India
04.2001 -

Skills

Tools

Spark SQL, Databricks, PySpark, Confluence, JIRA, ADF, Azure Synapse, ADF pipeline, Logic Apps, Docker, GitHub, Vercel, Railway, Claude, Codex, OpenAI, n8n pipeline integration, SQL, Power BI, MS PowerPoint, Minitab, MS Word, Trend Analysis, Data Warehousing, Advanced Excel, ETL Processes, Data Modeling, Variance Analysis, Data Mining

Natural language processing

Feature engineering

ML Model deployment

Clustering algorithms

Random forests

Decision trees

Statistical modeling

Data analytics

Data exploration

Dimensionality reduction

Support vector machines

K-nearest neighbors

Agile methodologies

Reinforcement learning

Unsupervised learning

Neural networks

Ensemble methods

Semi-supervised learning

Big data analytics

Probabilistic models

Supervised learning

Optimization techniques

Bayesian inference

Gradient boosting machines

Time series analysis

Data storage

Predictive modeling

Data cleaning

Accomplishments

  • RNR Award 2015
  • RNR Award 2017
  • Company Best Employee of the Year 2020

Certification

DP-203

Work Availability

Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
Morning, Afternoon, Evening

Affiliations

  • Microsoft Learn Community, Databricks Community, Scaler Community

Quote

There is a powerful driving force inside every human being that, once unleashed, can make any vision, dream, or desire a reality.
Tony Robbins

Software

PySpark, SQL, pandas, NumPy

Languages

English
Advanced (C1)

Interests

Photography
