
Sourabh Maheshwari

Pune, MH

Summary

Dynamic IT professional with over 10 years of experience, including more than 7.5 years specializing in Data Engineering and Data Analytics. Proven expertise in building robust data pipelines and workflows, covering critical phases such as data analysis, transformation, enrichment, and modeling. Demonstrated proficiency in data processing techniques, including validation, masking, and sanitization, complemented by a solid understanding of Azure and AWS environments. Skilled in application program planning, requirement gathering, and developing functional solutions and technical specifications to meet diverse business needs across multiple projects.

Overview

  • 10 years of professional experience
  • 1 Certificate

Work History

Senior Data Engineer

Ernst & Young LLP
11.2021 - Current

Client: HSBC [Job Role: Senior Data Engineer]

Description: HSBC Corporate and Institutional Banking (CIB) offers financing, global payments, trade solutions, and a range of wholesale banking services to corporations, financial institutions, and governments.

Roles & Responsibilities:

  • Owned requirements end to end, from gathering through development, designing architectural solutions, and deploying applications.
  • Developed Contextual Datasets (CDAs) for multiple use cases, enriching and transforming Generic Datasets (GDAs) and Master Datasets (MDAs) per business requirements.
  • Built and maintained robust Data Pipeline workflows within the Databricks environment to streamline data operations.
  • Created, managed, and monitored Airflow DAGs.
  • Designed and developed data transformation scripts following the Medallion Architecture, covering data ingestion, preprocessing, and validation, and loading enriched and transformed data into the gold layer.


Client: Northern Trust [Job Role: Data Engineer]

Description: Northern Trust Corporation is a global financial services company that provides wealth management, asset servicing, banking and liquidity services to large corporations.

Roles & Responsibilities:

  • Worked as a Data Engineer with Azure services such as Synapse Analytics and Databricks, creating end-to-end data pipelines spanning data ingestion, transformation and enrichment, and data delivery.
  • Pre-processed data from varied sources and formats, including files over SFTP, JSON from APIs, source data from Oracle and MS SQL databases, and PSV, Parquet, and Avro files.
  • Transferred data from source to target, applying operations per business requirements.
  • Exposed enriched gold-layer data to a downstream Data Mesh architecture built on Snowflake and consumed by a wider group.
  • Implemented Agile development and CI/CD from scratch using Jira, Azure DevOps, GitHub, and Git; enabled continuous delivery through deployments to Test, Pre-Prod, and Production environments.

Data Engineer

Infosys Limited
07.2019 - 11.2021

Client: Bank of America [Job Role: Data Engineer]

Description: Bank of America is a multinational investment bank and financial services holding company that offers commercial banking, wealth management, and investment banking services, operating through a network of financial centers, ATMs, and digital banking platforms.

Roles & Responsibilities:

  • Created and implemented rules and transformations on datasets for data masking per business requirements.
  • Worked on a Spark-based Data Quality component with integrated rule engines to transform data per business requirements.
  • Worked on a sensitive-data discovery feature supporting nearly all file formats (text, CSV, PDF, XLS, Parquet, Avro) via line-by-line and chunked processing.
  • Hands-on experience running data discovery, data validation, and data sanitization through the IEDPS (Infosys Enterprise Data Privacy Suite) tool.
  • Performed data sampling during ingestion and developed algorithms and pipelines for data masking, data sanitization, and synthetic data generation.

Senior Software Engineer

Attra Infotech Pvt. Ltd.
01.2016 - 07.2019
  • Worked as a Senior Software Engineer in a Python Developer role.

Education

PGP course - Machine Learning & AI

IIIT, Bangalore

B.Tech. - Computer Science

Arya College of Engineering & I.T.
Jaipur
01.2015

Skills

  • Azure services including Synapse Analytics for building data pipelines, Data Lake Storage, Blob Storage, message queues, Azure Functions, Container Registry, and Azure DevOps, plus integration of Azure Databricks with pipeline activities
  • GCP services such as BigQuery, Cloud Composer, Dataproc, and GCS buckets
  • PySpark framework, Spark SQL, Python, Pandas, NumPy, SQL, and Hive queries
  • Hadoop ecosystem and big data technologies: Hadoop, HDFS, Hive, Apache Spark
  • Airflow DAG creation; handling and managing Jenkins pipelines
  • GitHub and Git for version control
  • Familiar with Agile methodology using JIRA and Azure Boards

Certification

  • PG Diploma in Big Data Engineering from BITS Pilani.
  • Infosys Global Agile Certification.

Tools & Technologies

  • PySpark, Spark SQL, Python, Pandas, NumPy, SQL, Hive.
  • Synapse Analytics, Azure Databricks, Data Pipeline, Airflow DAG creation, Job Scheduling.
  • AWS Platform: S3 Buckets, Amazon RDS, AWS Glue, Lambda functions, EMR.
  • GCP Platform: GCP Cloud Storage, BigQuery, Cloud Composer, Dataproc.
  • MS-SQL, Oracle 10g.
  • Version control and CI/CD integration: Git, GitHub, Azure DevOps, Jenkins.
  • JIRA and Azure Boards for Agile methodology.
  • PyCharm, IntelliJ, Putty, WinSCP, VMware Client, Cloudera.
  • Data modelling, data transformation, and synthetic data generation through IEDPS (Infosys tool).
  • Data pre-processing and data scanning through Infosys Data Workbench, integrated with Spark pipelines.