Jyoti Verma

Data Engineer III
Bengaluru

Summary

Dynamic professional with over 6.5 years of experience in Big Data technologies, including Python, Scala, Unix scripting, AWS, and GCP. As a Data Engineer III at Walmart Global Tech India, I have enhanced and supported various projects, optimizing data pipelines and reducing processing times. My key contributions include automating GCP bucket cleaning, improving SLA metrics, and developing cost-efficient data pipeline processes. Previously, at Publicis Sapient and Cognizant, I managed data models and data pipelines and ensured data quality.

Overview

6 years of professional experience
4 years of post-secondary education
5 Certifications

Work History

Data Engineer III

Walmart Global Tech India
06.2022 - Current
  • Proficient in Walmart's internal frameworks for data pipeline management.
  • Enhanced and supported projects including NPD - Marketshare & Pricing Beats, IRI, and Outbound Milkyway.
  • Designed and built pipelines for new data sources to determine full market share.
  • Improved SLA metrics by optimizing Airflow execution and benchmarking it against Automic.
  • Automated GCP bucket cleaning after data loads for the Wplus, Experian ACC, and recycling processes, improving traceability and reducing storage costs.
  • Reduced processing time by over 50% for the CPO and CFH data pipelines by eliminating data duplication and unnecessary intermediate tables (a sketch follows this list).
  • Migrated the Plastics Recycle Output process in CIF, reducing storage requirements and improving run time by 57%, and optimized the Plastics Sync process in CIF using Spark, improving run time by 75%.
  • Migrated dataflows to ephemeral clusters, reducing cloud spend.
  • Supported multiple 3P datasets, ensuring business SLAs were met and improving data quality.
  • Developed the High Volume High Frequency (HV/HF) solution, enhancing the identity team's traceability by 1%.
  • Streamlined the tracker table for better traceability and performance.
  • Integrated QA testing into Data Onboarding for 3P pipelines and developed new pipelines using Spark 3.3.
  • Explored the CRQ Coach agent on DX AI Assistant and ArchiText Assistance, gaining end-to-end knowledge of the Experian pipeline.
  • Attended a Generative AI workshop and ensured code quality and coverage.
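
A minimal Spark (Scala) sketch of the intermediate-table elimination referenced above. The table names, columns, and aggregation are hypothetical placeholders, not the actual Walmart pipeline code.

```scala
// Hypothetical sketch: chain transformations in one DataFrame plan and persist only
// the final curated table, instead of writing and re-reading staging tables.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object IntermediateTableElimination {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cpo-pipeline-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val loadDate = args(0) // e.g. "2024-01-15"

    // Previously each step materialized an intermediate table that the next step re-read;
    // here the steps run as one lazy plan and only the final table is written.
    val curated = spark.table("cpo_raw")                 // hypothetical source table
      .filter(col("dt") === loadDate)
      .dropDuplicates("store_id", "item_id", "dt")       // remove duplicated feed rows
      .groupBy("store_id", "dt")
      .agg(sum("sales_amt").alias("total_sales"))

    curated.write
      .mode("overwrite")
      .saveAsTable("cpo_curated")                        // single persisted output

    spark.stop()
  }
}
```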

Data Engineer

Publicis Sapient
04.2021 - 06.2022

Description: The Trade Surveillance System analyzes complex data sets to detect anomalies in trading behavior and generate alerts for further investigation. It integrates trade and order data from exchanges, enriched trade, position, and PNL data from Endure, and additional market data from Refinitiv. This project supports the ETS Compliance teams and evolves to meet user needs.

Technical Responsibilities:

  • Served as Data Engineer for the Trade Surveillance System project for Eni.
  • Experienced with Big Data Ecosystem, Spark, and Scala.
  • Managed end-to-end deliverables, including client interactions and requirement gathering.
  • Involved in data modeling to develop models for Hive tables.
  • Developed, tested, validated, and supported releases, analyzing issues as needed.
  • Fetched data from various sources (JSON, Excel, fixed format) using Scala API.
  • Transformed data according to the data model using Scala and Spark, moving it from layer 1 (JSON/Excel/CSV/fixed-format files) to layer 2 (Avro) and from layer 2 to layer 3 (Hive tables); a sketch follows this list.
  • Created frameworks to process data from source to Hive table in Data Lake using Scala and Spark.
  • Utilized Kafka framework to monitor counts for each layer and receive notifications of failures.
  • Used Hive and Impala for validating, testing, and analyzing table data.
  • Conducted daily checks for production jobs and fixed failures.
  • Employed Hue to store files for each layer and table data, and Oozie for job scheduling.
  • Documented technical requirements and prepared test result documents for team reference.
  • Validated end-to-end data for all possible test cases to ensure data quality.
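
A minimal Spark (Scala) sketch of the layered load described above (layer 1 JSON to layer 2 Avro to layer 3 Hive). The paths, columns, and table name are hypothetical, and this is a simplified illustration rather than the project's framework code; writing Avro assumes the spark-avro module is on the classpath.

```scala
// Hypothetical sketch of the layer 1 -> layer 2 -> layer 3 load.
// Requires spark-avro, e.g. --packages org.apache.spark:spark-avro_2.12:3.3.0
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object TradeLayerLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("trade-layer-load")
      .enableHiveSupport()
      .getOrCreate()

    // Layer 1 -> Layer 2: read raw trade JSON and persist it as Avro.
    val rawTrades = spark.read.json("/data/layer1/trades/")
    rawTrades.write.mode("overwrite").format("avro").save("/data/layer2/trades/")

    // Layer 2 -> Layer 3: apply the data-model transformations and load the Hive table.
    val trades = spark.read.format("avro").load("/data/layer2/trades/")
      .withColumn("trade_date", to_date(col("trade_timestamp")))
      .select("trade_id", "instrument", "quantity", "price", "trade_date")

    trades.write.mode("overwrite").saveAsTable("surveillance.trades_l3")

    spark.stop()
  }
}
```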

Data Engineer

Cognizant Technology Solutions
03.2020 - 04.2021

Description: The Supply Chain Data Lake - Cloud Platform Migration project manages and delivers Walmart business data to the user community. It involves migrating supply chain application data from various sources (Oracle, Informix, Teradata, DB2, SQL, Mainframe) to the Google Cloud Platform and loading it into Hive tables. The goal is to enhance data accessibility, usability, and quality.

Technical Responsibilities:

  • Worked as a Data Engineer for the Supply Chain - GCP Migration project for Walmart.
  • Experienced with Google Cloud Platform and Big Data Ecosystem.
  • Created managed and external tables to store processed data.
  • Improved performance using Hive features such as partitioning and bucketing (a sketch follows this list).
  • Developed data migration strategies from databases such as MySQL, Oracle, Informix, Teradata, DB2, and Mainframe.
  • Imported data from various sources to GCP raw location using Sqoop.
  • Executed ETL processes using Spark/Beeline Hive framework and shell scripts to transform and load raw data into catalog tables.
  • Migrated 500+ supply chain application tables from HDFS to GCP.
  • Performed unit testing of developed tables against defined test cases.
  • Implemented a data quality framework and integrated it into all migrated tables.
  • Ensured data availability and quality for downstream users and data analysts.
  • Participated in periodic meetings with the onsite team to discuss system status.
  • Documented technical requirements, prepared checklists, and created Confluence pages for team reference.
  • Worked on performance tuning, job optimization, script creation, and automation.
  • Developed and automated Spark and UNIX shell scripts for data transformation and loading into GCP storage or data lake.
  • Provided knowledge transfer to team members and other teams as needed.
  • Validated data for all possible test cases using Presto.
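
A minimal Spark (Scala) sketch of loading Sqoop-landed raw data into a partitioned, bucketed catalog table, as referenced in the list above. The GCS paths, table, and column names are hypothetical, and reading gs:// paths assumes the GCS connector is available (as it is by default on Dataproc).

```scala
// Hypothetical sketch: raw zone CSV (landed by Sqoop) -> partitioned, bucketed Hive table.
import org.apache.spark.sql.SparkSession

object CatalogTableLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("catalog-table-load")
      .enableHiveSupport()
      .getOrCreate()

    // Raw data landed in the GCS raw zone; columns load_dt and order_id are assumed.
    val raw = spark.read
      .option("header", "true")
      .csv("gs://sc-raw-zone/orders/")

    // Partition by load date and bucket by order_id so downstream Hive/Presto queries
    // prune partitions and join faster on the bucketed key.
    raw.write
      .mode("overwrite")
      .partitionBy("load_dt")
      .bucketBy(32, "order_id")
      .sortBy("order_id")
      .saveAsTable("supply_chain.orders")

    spark.stop()
  }
}
```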

Data Engineer

Cognizant Technology Solutions
05.2019 - 02.2020

Description: Quantum Migration handles the batch migration of data from 42 regional DCs to GCP, covering applications such as PO download, receiving, order filling, and shipping invoicing. The goal is to enhance data accessibility, usability, and quality.

Technical Responsibilities:

  • Imported data from Informix to HDFS raw location using Sqoop.
  • Used Spark framework and shell scripts for ETL processes to transform and load raw data into catalog tables.
  • Moved raw data to HDFS archive location.
  • Ensured project deliverables met requirements.
  • Processed real-time data feeds from Kafka topics (a sketch follows this list).
  • Segregated history load from incremental load data obtained from Kafka feed.
  • Performed ETL processes and loaded processed data into catalog tables.
  • Built jobs and workflows to automate code script runs using an automation tool.
  • Validated data against test cases and prepared validation reports.
  • Migrated 120 tables from regional DCs to GCP under supply chain.
  • Participated in periodic meetings with the onsite team to discuss system status.
  • Prepared required documents and Confluence pages.
  • Developed and automated UNIX shell scripts for data transformation and loading into GCP storage or data lake.
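
A minimal Spark Structured Streaming (Scala) sketch of consuming the Kafka feed referenced above. The broker, topic, schema, and output paths are hypothetical, and the Kafka source assumes the spark-sql-kafka package is on the classpath; the one-time history load mentioned above would be handled separately as a batch read.

```scala
// Hypothetical sketch: incremental DC events from Kafka written to the raw zone.
// Requires e.g. --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object DcFeedStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dc-feed-stream")
      .getOrCreate()

    val schema = new StructType()
      .add("dc_id", StringType)
      .add("po_number", StringType)
      .add("event_ts", TimestampType)

    // Consume new messages from the topic and parse the JSON payload.
    val parsed = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "dc-po-events")
      .option("startingOffsets", "latest")
      .load()
      .select(from_json(col("value").cast("string"), schema).alias("evt"))
      .select("evt.*")

    // Land the parsed events as Parquet with checkpointing for exactly-once file output.
    parsed.writeStream
      .format("parquet")
      .option("path", "gs://sc-raw-zone/dc_po_events/")
      .option("checkpointLocation", "gs://sc-raw-zone/_checkpoints/dc_po_events/")
      .start()
      .awaitTermination()
  }
}
```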

Developer

Cognizant Technology Solutions
10.2018 - 05.2019

Description: The project focuses on enhancing ECE Ab Initio on the cloud platform and migrating data for existing workflows (WFs) to an S3 bucket by adding dropdown components. The DDE platform is being migrated to the cloud, requiring location changes for source files.

Technical Responsibilities:

  • Enhanced, monitored, and fixed existing jobs migrated from the on-premises environment to the AWS cloud.
  • Conducted critical analysis, stream catchup, and resolved incidents within SLA, including proactive monitoring and status updates for critical streams.
  • Addressed data quality issues in Snowflake tables.
  • Optimized jobs, identified long-running jobs in production, and implemented necessary changes.
  • Performed requirement analysis and defect fixing.
  • Made the required changes when files did not land in the S3 bucket.
  • Identified and escalated issues to concerned teams for resolution.
  • Documented and analyzed problem details, performed root cause analysis (RCA).
  • Participated in periodic meetings with the onsite team to discuss system status.
  • Monitored Arow Jobs (Ab Initio Jobs) as per client requirements to meet SLA.
  • Provided immediate fixes to production issues and made scheduling changes as per client requirements, including modifying existing jobs, calendar updates, and adding required jobs in existing WFs.

Big Data Developer

Cognizant Technology Solutions
07.2018 - 09.2018

Description: The project aims to implement Slowly Changing Dimensions Type 2 using Hive, updating historical data changes automatically.

Responsibilities:

  • Developed scripts to load data from RDBMS to HDFS based on project requirements.
  • Implemented Slowly Changing Dimensions Type 2 in Hive (a sketch follows this list).
  • Participated in data modeling sessions for Hive tables.
  • Worked on Hive partitioning and created both external and internal tables.
  • Implemented joins and hash functions on staging and history tables to identify updated data.
  • Used Oozie workflow to coordinate Hive QL scripts periodically.
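
A minimal Spark (Scala) sketch of the SCD Type 2 load described above, using the staging-to-history join with a hash of the tracked columns to detect changes. Table and column names are hypothetical, and the original implementation was written in Hive QL and scheduled with Oozie.

```scala
// Hypothetical sketch of an SCD Type 2 refresh: compare a hash of tracked columns
// between staging and the current dimension rows, close out changed rows, and
// append new versions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object Scd2LoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("scd2-load-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val tracked = Seq("name", "address", "segment")          // columns whose history is kept
    def rowHash(prefix: String) =
      md5(concat_ws("|", tracked.map(c => col(prefix + c)): _*))

    val staging = spark.table("stg.customer")                // hypothetical staging table
    val history = spark.table("dim.customer")                // customer_id, tracked cols, eff_start, eff_end, is_current
    val current = history.filter(col("is_current"))

    val joined = staging.alias("s")
      .join(current.alias("c"), col("s.customer_id") === col("c.customer_id"), "left")

    // New keys, or keys whose tracked-column hash changed, become fresh current versions.
    val newVersions = joined
      .filter(col("c.customer_id").isNull || rowHash("s.") =!= rowHash("c."))
      .select((col("s.customer_id") +: tracked.map(c => col("s." + c))): _*)
      .withColumn("eff_start", current_date())
      .withColumn("eff_end", lit(null).cast("date"))
      .withColumn("is_current", lit(true))

    // Close out the superseded current rows and carry everything else forward unchanged.
    val supersededKeys = newVersions.select("customer_id")
    val closedOut = current
      .join(supersededKeys, Seq("customer_id"), "left_semi")
      .withColumn("eff_end", current_date())
      .withColumn("is_current", lit(false))
    val untouched = history
      .join(supersededKeys, Seq("customer_id"), "left_anti")
      .unionByName(history.filter(!col("is_current"))
        .join(supersededKeys, Seq("customer_id"), "left_semi"))

    // Rebuild into a new table and swap it in after validation, since Spark cannot
    // overwrite a table it is reading from within the same job.
    untouched.unionByName(closedOut).unionByName(newVersions)
      .write.mode("overwrite").saveAsTable("dim.customer_scd2_new")

    spark.stop()
  }
}
```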

Education

Bachelor of Technology - Electronics & Communication Engineering

Lovely Professional University
Phagwara, India
04.2014 - 01.2018

Skills

Python

Certification

Data Science Foundations: Data Engineering, LinkedIn Learning, 2021

Accomplishments

  • Awarded 'Bravo Award' by Wal-Mart for critical pipeline creation.
  • Awarded 'Best Project Team' and 'Passion' awards by Wal-Mart for excellence in project delivery.
  • Awarded 'Best Employee' by Capital One in 2019.

Work Availability

Monday through Sunday: morning, afternoon, and evening.

Timeline

Data Engineer III

Walmart Global Tech India
06.2022 - Current

Data Engineer

Publicis Sapient
04.2021 - 06.2022

Data Engineer

Cognizant Technology Solutions
03.2020 - 04.2021

Data Engineer

Cognizant Technology Solutions
05.2019 - 02.2020

Developer

Cognizant Technology Solutions
10.2018 - 05.2019

Big Data Developer

Cognizant Technology Solutions
07.2018 - 09.2018

Bachelor of Technology - Electronics & Communication Engineering

Lovely Professional University
04.2014 - 01.2018