Sonam Mahankudo

GCP Data Engineer
Bangalore

Summary

Driven Data Engineer with almost 4 years of experience wrangling big datasets, tackling challenging architectural and scalability problems, and a passion for learning and implementing new concepts. Eager to build robust data solutions that lay the groundwork for revealing game-changing insights. Worked on various big data and cloud technologies such as GCP, Dataflow, Pub/Sub, Dataproc, HDFS, YARN, Hive, Scala and PySpark.

Overview

6 years of professional experience
3 years of post-secondary education
5 Certifications

Work History

Module Lead

LTIMINDTREE
Bangalore
02.2022 - Current
  • Data onboarding and Data Migration:
  • Description: The client requirement was to effectively handle data onboarding and migration projects: designing and implementing data migration strategies and ensuring the seamless transfer of batch and NRT data from on-premises systems to the cloud using GCP services
  • Also responsible for monitoring the data pipelines and proactively identifying and resolving issues to ensure uninterrupted data flow
  • Responsibilities:
  • Collaborated with cross-functional teams to design and implement data onboarding and migration strategies on the Google Cloud Platform (GCP)
  • Developed and maintained data pipelines using BigQuery, Databricks, and Kafka, ensuring efficient data ingestion and processing
  • Utilized SQL and Python to transform and cleanse data, ensuring data quality and accuracy
  • Implemented version control using GitLab, enabling seamless collaboration and code management
  • Worked closely with stakeholders to understand business requirements and translate them into technical solutions
  • Actively participated in Agile ceremonies, including daily stand-ups, sprint planning, and retrospectives, to ensure timely delivery of projects
  • Assisted in the migration of on-premises data to the cloud, leveraging GCP services
  • Developed and optimized SQL queries to extract, transform, and load data from various sources into BigQuery (a sketch of this pattern follows this list)
  • Collaborated with the development team to design and implement data models for efficient data storage and retrieval
  • Conducted performance tuning and optimization of data pipelines to enhance overall system efficiency
  • Assisted in troubleshooting and resolving data-related issues, ensuring minimal downtime.
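A minimal, hypothetical sketch of the BigQuery load-and-transform pattern referenced above, using the google-cloud-bigquery Python client; the project, dataset and table names are placeholders, not the client's actual resources.

# Hypothetical sketch: aggregate onboarded raw data into a curated BigQuery table.
from google.cloud import bigquery

client = bigquery.Client()  # uses Application Default Credentials

# CTAS statement that rolls raw events up into a daily curated table.
sql = """
CREATE OR REPLACE TABLE `my-project.curated_zone.daily_events` AS
SELECT customer_id,
       DATE(event_ts) AS event_date,
       COUNT(*) AS events
FROM `my-project.raw_zone.events`
GROUP BY customer_id, event_date
"""

query_job = client.query(sql)   # submit the job
query_job.result()              # block until the transform completes
print("Processed bytes:", query_job.total_bytes_processed)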

Data Engineer

Capgemini Technology Services India Limited
Bangalore
08.2017 - 02.2022
  • Lake ETL Projects:
  • Description: The client wanted to use an evolutive, configurable big data infrastructure to improve the data workflow
  • A Hadoop data lake is used as a central location to hold large amounts of data in its native, raw format and to organize large volumes of highly diverse data
  • Compared to a hierarchical data warehouse, which stores data in files or folders, the data lake uses a flat architecture to store the data; ETL operations are then performed on it to generate business outputs for data science and reporting purposes
  • Responsibilities:
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns
  • Implemented the complete Big data pipeline with real-time processing
  • Analyzed client data using PySpark and Spark SQL and presented an end-to-end data lake design to the team
  • Worked as a Big Data Engineer in the Data Lake team for an automobile client, where the client wanted to store, process and manage the huge volume of daily operational data collected from various sources such as XML, JSON and CSV
  • Experienced in performance tuning of Spark applications: setting the right batch interval time, choosing the correct level of parallelism and tuning memory
  • Wrote UDFs in PySpark to meet specific business requirements (a minimal sketch follows this list)
  • Involved in ETL architecture enhancements to increase performance using the query optimizer
  • Analyzed and processed client data in Parquet format using PySpark and Spark SQL, defining the end-to-end data lake flow for the team
  • Designed and developed Spark jobs with PySpark to implement end-to-end data pipelines for batch processing
  • Responsible for maintaining quality reference data in the data lake by performing operations such as cleaning and transformation and by ensuring integrity.
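A minimal, hypothetical sketch of the kind of PySpark UDF mentioned above; the column name and normalization rule are illustrative assumptions rather than details from the actual project.

# Hypothetical sketch of a PySpark UDF used to normalize a free-text column.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

# Toy rows standing in for raw client data (XML/JSON/CSV sources in practice).
df = spark.createDataFrame([("  Sedan ",), ("SUV",), (None,)], ["vehicle_type"])

@udf(returnType=StringType())
def normalize_text(value):
    # Trim and upper-case the value, tolerating nulls in the source data.
    return value.strip().upper() if value is not None else None

df.withColumn("vehicle_type_clean", normalize_text("vehicle_type")).show()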

Education

Bachelor of Computer Applications

Imperial College, Berhampur University
06.2014 - 05.2017

Skills

Proficient in big data techniques (Hadoop, PySpark, MapReduce, Scala, Hive, MySQL)

Certification

2023 - Data Engineering with Google Cloud Professional Certificate from Coursera

Timeline

Module Lead

LTIMINDTREE
02.2022 - Current

Data Engineer

Capgemini Technology Services India Limited
08.2017 - 02.2022

Bachelor of Computer Applications

Imperial College, Berhampur University
06.2014 - 05.2017