MANISHA SRIVASTAVA

Hyderabad

Summary

IT professional with seven years of experience, five of which are dedicated to Data Engineering in cloud environments. Skilled in developing scalable data pipelines and ETL processes using Azure Databricks, PySpark, and SQL. Strong background in creating data solutions on Azure Data Lake and Data Factory, with hands-on experience in both batch and streaming data for real-time analytics. Expertise in data transformation and optimization, along with successful migration of legacy systems to cloud-based architectures.

Overview

7 years of professional experience

Work History

Data Engineer

Capgemini
08.2024 - Current
  • Datahub Expansion
  • Developed and maintained scalable and efficient ETL pipelines on Azure Databricks using PySpark for processing large datasets.
  • Migrated legacy ETL workflows to Azure Data Factory, improving reliability and reducing pipeline runtime by 40%.
  • Implemented Delta Lake for handling batch data in a unified format.
  • Built reusable notebook templates in Databricks, promoting standardization across data engineering teams.
  • Worked closely with analysts and data scientists to provide clean, curated, and accessible data.
  • Tech stack: Azure Databricks, PySpark, SQL, ADF, Delta Lake, Azure Data Lake

Data Engineer

IBM
08.2023 - 08.2024
  • Delta Lake Repository
  • Designed and implemented robust data ingestion frameworks from multiple on-prem and cloud sources into Azure Data Lake.
  • Created data transformation pipelines using PySpark and scheduled jobs using ADF triggers.
  • Implemented data quality checks and logging mechanisms for improved data integrity.
  • Supported Power BI teams by providing curated data models in Azure SQL.
  • Tech stack: Azure Data Factory, PySpark, SQL Server, Azure Blob Storage

Big Data Developer

HCL Technologies
07.2020 - 08.2023
  • Retail DataMart
  • Processed and analyzed point-of-sale data, structured into dimension and fact tables to support sales analysis.
  • Designed and implemented an incentive program that rewarded the salespeople with the highest sales volume in each store, to improve employee motivation and performance.
  • Processed approximately 100 GB of data daily with PySpark, applying optimizations such as data caching and broadcast joins to speed up processing.
  • These optimizations improved both pipeline speed and the efficiency of downstream data analysis.
  • Implemented a customer engagement strategy that identified infrequent buyers and offered them incentives in the form of coupons.
  • This initiative improved customer retention and contributed to overall business growth.
  • Ran Spark queries to perform aggregations on the data.
  • Partitioned input data by date and output data by date, country, and source system.
  • Developed a CI/CD pipeline using Jenkins, connected to Bitbucket.
  • Used Git for version control.
  • Tech stack: PySpark, Jenkins, Git, Bitbucket, Azure Databricks

AS400 Developer

HCL Technologies
05.2018 - 06.2020
  • AS400 Project
  • Collaborated with the team to gather, analyse, and understand business requirements and design specifications in support of application development on the AS/400 OS using the RPGLE programming language.
  • Developed and implemented RPGLE programs according to technical specifications and established coding standards, and planned unit and integration testing for various application projects.
  • Identified and fixed programming issues, and analysed, estimated, and made modifications and enhancements to existing AS/400 applications and systems.
  • Tested all fixes and enhancements to ensure code quality and proper implementation of system improvements.
  • Applied DB2 SQL expertise to write and optimize queries.
  • Drafted Standard Operating Procedures for the team.

Education

B.Tech - Computer Science And Engineering

Shri Ramswaroop Memorial College Of Engineering And Management

Skills

  • Azure Data Lake
  • Azure Data Factory
  • Azure Databricks
  • Azure Blob Storage
  • Databricks
  • Apache Spark
  • Delta Lake
  • Python
  • SQL
  • MySQL
  • Azure SQL
  • SQL Server
  • ADF Pipelines
  • Git
  • Azure DevOps

Disclaimer

I hereby declare that the information provided by me is true to the best of my knowledge and understanding.

Personal Information

Date of Birth: 11/24/95

Professional Snapshot

  • 7 years of total IT experience, including 5 years as a Data Engineer with a focus on cloud-based big data solutions.
  • Expertise in building scalable data pipelines and ETL workflows using Azure Databricks, PySpark, and SQL.
  • Proficient in designing end-to-end data solutions on Azure Data Lake, Data Factory, and Delta Lake.
  • Strong experience in developing batch and streaming data solutions for real-time analytics.
  • Skilled in data wrangling, transformation, and performance optimization of large datasets.
  • Worked closely with data analysts, scientists, and business teams to deliver curated and clean data.
  • Experience in migrating legacy systems to modern cloud-based architectures for enhanced efficiency.
  • Hands-on experience working with structured data (SQL-based) and semi-structured data (JSON, Parquet, Avro).
  • Designed data ingestion frameworks capable of handling diverse data formats from varied source systems.
  • Solid understanding of Spark internals.
