Sonam Mahankudo

GCP Data Engineer
Bangalore

Summary

Driven Data Engineer with almost 4 years of experience wrangling big datasets, tackling challenging architectural and scalability problems, and a passion for learning and implementing new concepts. Eager to build robust data solutions that lay the groundwork for revealing game-changing insights. Worked on various big data and cloud technologies such as GCP, Dataflow, Pub/Sub, Dataproc, HDFS, YARN, Hive, Scala and PySpark.

Overview

6 years of professional experience
3 years of post-secondary education
5 Certifications

Work History

Module Lead

LTIMINDTREE
Bangalore
02.2022 - Current
  • Data onboarding and Data Migration:
  • Description: The client requirement was to effectively handle data onboarding and migration projects: designing and implementing data migration strategies and ensuring the seamless transfer of batch and NRT data from on-premises systems to the cloud using GCP services
  • Also responsible for monitoring the data pipelines and proactively identifying and resolving issues to ensure uninterrupted data flow
  • Responsibilities:
  • Collaborated with cross-functional teams to design and implement data onboarding and migration strategies on the Google Cloud Platform (GCP)
  • Developed and maintained data pipelines using BigQuery, Databricks, and Kafka, ensuring efficient data ingestion and processing
  • Utilized SQL and Python to transform and cleanse data, ensuring data quality and accuracy
  • Implemented version control using GitLab, enabling seamless collaboration and code management
  • Worked closely with stakeholders to understand business requirements and translate them into technical solutions
  • Actively participated in Agile ceremonies, including daily stand-ups, sprint planning, and retrospectives, to ensure timely delivery of projects
  • Assisted in the migration of on-premises data to the cloud, leveraging GCP services
  • Developed and optimized SQL queries to extract, transform, and load data from various sources into BigQuery (a sketch of this pattern follows this list)
  • Collaborated with the development team to design and implement data models for efficient data storage and retrieval
  • Conducted performance tuning and optimization of data pipelines to enhance overall system efficiency
  • Assisted in troubleshooting and resolving data-related issues, ensuring minimal downtime.
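A minimal, hypothetical sketch of the BigQuery load-and-transform pattern referenced above, using the google-cloud-bigquery Python client; the project, dataset and table names are placeholders, not the client's actual resources.

# Hypothetical sketch: aggregate onboarded raw data into a curated BigQuery table.
from google.cloud import bigquery

client = bigquery.Client()  # uses Application Default Credentials

# CTAS statement that rolls raw events up into a daily curated table.
sql = """
CREATE OR REPLACE TABLE `my-project.curated_zone.daily_events` AS
SELECT customer_id,
       DATE(event_ts) AS event_date,
       COUNT(*) AS events
FROM `my-project.raw_zone.events`
GROUP BY customer_id, event_date
"""

query_job = client.query(sql)   # submit the job
query_job.result()              # block until the transform completes
print("Processed bytes:", query_job.total_bytes_processed)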

Data Engineer

Capgemini Technology Services India Limited
Bangalore
08.2017 - 02.2022
  • Lake ETL Projects:
  • Description: The client wanted to use an evolutive, configurable big data infrastructure to improve the data workflow
  • A Hadoop data lake is used as a central location to hold large amounts of data in its native, raw format and to organize large volumes of highly diverse data
  • Compared to a hierarchical data warehouse, which stores data in files or folders, the data lake uses a flat architecture to store the data; ETL operations are then performed on it to generate business outputs for data science and reporting purposes
  • Responsibilities:
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns
  • Implemented the complete Big data pipeline with real-time processing
  • Analyzed client data using PySpark and Spark SQL and presented an end-to-end data lake design to the team
  • Worked as a Big Data Engineer in the Data Lake team for an automobile client, where the client wanted to store, process and manage the huge volume of daily operational data collected from various sources such as XML, JSON and CSV
  • Experienced in performance tuning of Spark applications: setting the right batch interval time, choosing the correct level of parallelism and tuning memory
  • Wrote UDFs in PySpark to meet specific business requirements (a minimal sketch follows this list)
  • Involved in ETL architecture enhancements to increase performance using the query optimizer
  • Analyzed and processed client data in Parquet format using PySpark and Spark SQL, defining the end-to-end data lake flow for the team
  • Designed and developed Spark jobs with PySpark to implement end-to-end data pipelines for batch processing
  • Responsible for maintaining quality reference data in the data lake by performing operations such as cleaning and transformation and by ensuring integrity.
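A minimal, hypothetical sketch of the kind of PySpark UDF mentioned above; the column name and normalization rule are illustrative assumptions rather than details from the actual project.

# Hypothetical sketch of a PySpark UDF used to normalize a free-text column.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

# Toy rows standing in for raw client data (XML/JSON/CSV sources in practice).
df = spark.createDataFrame([("  Sedan ",), ("SUV",), (None,)], ["vehicle_type"])

@udf(returnType=StringType())
def normalize_text(value):
    # Trim and upper-case the value, tolerating nulls in the source data.
    return value.strip().upper() if value is not None else None

df.withColumn("vehicle_type_clean", normalize_text("vehicle_type")).show()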

Education

Bachelor of Computer Applications

Imperial College, Berhampur University
06.2014 - 05.2017

Skills

Proficient in big data techniques (Hadoop, PySpark, MapReduce, Scala, Hive, MySQL)

Certification

2023 - Data Engineering with Google Cloud Professional Certificate from Coursera

Timeline

Module Lead

LTIMINDTREE
02.2022 - Current

Data Engineer

Capgemini Technology Services India Limited
08.2017 - 02.2022

Bachelor of Computer Applications

Imperial College, Berhampur University
06.2014 - 05.2017