
Madhu Sudhan Patti

Summary

Data engineering professional with 8 years of expertise in Hadoop, Spark, and their ecosystems, mainly Hadoop, Sqoop, Hive, Spark, PySpark, Python, Scala, AWS, and Elasticsearch

Overview

8 years of professional experience

Work History

Data Engineer

IBM
06.2024 - Current
  • Migrated the India jobs from SG on-prem servers to the GCP India server per RBI guidelines
  • Developed new jobs in the UAT environment and performed data enrichments such as filtering and aggregation using Spark, PySpark, and Hive per business requirements in the Sparkola tool (a sketch of this pattern follows this list)
  • Enabled the jobs in Airflow and validated their regular runs in Jobserver
  • Deployed the jobs to the UAT, QA, and PROD environments
  • Validated the jobs in Jobserver and their dependencies in Airflow
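
A minimal PySpark sketch of the filter-and-aggregate enrichment pattern described above; the database, table, and column names are hypothetical stand-ins, not the actual job code:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("enrichment-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Read a source Hive table (hypothetical name) into a DataFrame
    txns = spark.table("uat_db.transactions")

    # Filter to the records of interest, then aggregate per account
    enriched = (
        txns.filter(F.col("txn_status") == "SETTLED")
            .groupBy("account_id")
            .agg(F.sum("amount").alias("total_amount"),
                 F.count("*").alias("txn_count"))
    )

    # Write the enriched output back to Hive for downstream validation
    enriched.write.mode("overwrite").saveAsTable("uat_db.account_summary")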

Senior Project Engineer

Wipro Limited
10.2021 - 06.2024
  • Created RDDs and DataFrames for the required input data and performed transformations and actions using Spark Core and Spark DataFrames
  • Built data pipelines that are scalable, repeatable, and secure, and can serve multiple purposes
  • Constructed a state-of-the-art data lake on AWS using EMR, Spark, Step Functions, and CloudWatch Events
  • Used Amazon EMR to process big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3)
  • Experienced with Spark architecture, including Spark Core, RDDs, DataFrames, Datasets, Spark SQL, and Spark Streaming; imported data from HDFS into Spark RDDs for in-memory computation
  • Hands-on experience accessing Hive tables from Spark, performing transformations, and creating DataFrames over Hive tables using Spark SQL
  • Converted Hive/SQL queries into RDD transformations using Spark and Scala
  • Worked with Apache Spark components, which provide a fast, general engine for large-scale data processing
  • Migrated an existing on-premises application to AWS
  • Designed, built, and deployed multiple applications on the AWS stack (EC2, S3, EMR) with a focus on high availability, fault tolerance, and auto-scaling
  • Designed and built custom ETL processes in AWS using Lambda functions and EMR clusters, reducing cost overhead for the client
  • Developed and maintained an automated CI/CD pipeline for code deployment using Terraform, Jenkins, GitHub, and AWS CI/CD services; this made existing infrastructure easy to manage and allowed complex change sets to be applied with minimal human interaction, avoiding many possible human errors
  • Provided daily monitoring, management, troubleshooting, and issue resolution for systems and services hosted on cloud resources
  • Developed Spark code to perform data enrichments and calculations per business requirements
  • Worked on performance optimization of ecosystems such as Hive, Sqoop, Spark, Elasticsearch, and Kibana
  • Performed data enrichments such as filtering, sorting, and aggregation using Spark and Hive
  • Loaded fact tables into Elasticsearch for visualization through Kibana (a sketch follows this list)
  • Created dashboards and visualizations in Kibana per business requirements to monitor day-to-day changes in the data
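
A minimal sketch of loading a fact table into Elasticsearch from Spark, assuming the elasticsearch-hadoop connector is on the classpath; the endpoint, index, and table names are hypothetical:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("facts-to-es-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Build a fact table from a Hive source (hypothetical schema)
    facts = spark.sql("""
        SELECT region, product, SUM(revenue) AS total_revenue
        FROM sales_db.orders
        GROUP BY region, product
    """)

    # Load the fact table into an Elasticsearch index for Kibana dashboards
    (facts.write
          .format("org.elasticsearch.spark.sql")
          .option("es.nodes", "es-host:9200")    # hypothetical ES endpoint
          .option("es.resource", "sales_facts")  # hypothetical target index
          .mode("overwrite")
          .save())

Kibana dashboards then read directly from the sales_facts index, so visualizations refresh with no extra export step.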

Software Engineer

EPAM Systems
05.2021 - 10.2021
  • Created RDDs and DataFrames for the required input data and performed transformations and actions using Spark Core
  • Worked closely with business customers on requirements gathering
  • Designed a Hive repository with external tables, internal tables, buckets, partitions, and ORC compression for incremental loads of parsed data
  • Worked on performance optimization of ecosystems such as Hive, Sqoop, and Spark
  • Performed data enrichments such as filtering, sorting, and aggregation using Spark
  • Built scripts for creating AWS resources such as Step Functions, Glue jobs, and Lambda handlers (a sketch follows this list)
  • Hands-on experience building pipelines that implement business use cases through the required transformations
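
A sketch of scripting AWS resource creation with boto3, here a Step Functions state machine that runs a Glue job; the names, ARNs, and region are hypothetical placeholders:

    import json
    import boto3

    sfn = boto3.client("stepfunctions", region_name="us-east-1")

    # State machine definition: run one Glue job and wait for it to finish
    definition = {
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": "enrich-orders"},  # hypothetical Glue job
                "End": True,
            }
        },
    }

    sfn.create_state_machine(
        name="orders-pipeline",
        definition=json.dumps(definition),
        roleArn="arn:aws:iam::123456789012:role/sfn-exec-role",  # hypothetical role
    )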

Software Engineer

Optum Global Solutions India PVT Ltd
10.2019 - 05.2021
  • Experienced in designing Hadoop and Spark applications and recommending the right solutions and technologies
  • Imported and exported RDBMS tables using Sqoop
  • Used Apache Hive to run MapReduce jobs on top of HDFS data
  • Built distributed in-memory applications using Spark Core and Spark SQL to run analytics efficiently on huge data sets
  • Created RDDs and DataFrames for the required input data and performed transformations and actions using Spark Core
  • Worked closely with business customers on requirements gathering
  • Developed Sqoop jobs with incremental loads from a heterogeneous RDBMS (Oracle) using native DB connectors
  • Designed a Hive repository with external tables, internal tables, buckets, partitions, and ORC compression for incremental loads of parsed data (a sketch follows this list)
  • Experienced in developing Hive queries on data formats such as text, CSV, and log files
  • Leveraged time-based partitioning in HiveQL to improve performance
  • Created Hive external tables for data in HDFS and moved data from the archive layer to the business layer with Hive transformations
  • Worked on performance optimization of ecosystems such as Hive and Sqoop
  • Improved query tuning using Hive features such as partitioning, bucketing, indexes, and the cost-based optimizer (CBO)
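
A minimal sketch of the Hive repository design above, expressed as Spark SQL DDL; the database, table, column, and path names are hypothetical:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-repo-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # External, partitioned, bucketed table stored as ORC (hypothetical schema)
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS business.claims (
            claim_id  BIGINT,
            member_id BIGINT,
            amount    DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        CLUSTERED BY (member_id) INTO 16 BUCKETS
        STORED AS ORC
        LOCATION '/data/business/claims'
    """)

    # Incremental load: move one day of parsed data from the archive layer
    spark.sql("""
        INSERT OVERWRITE TABLE business.claims PARTITION (load_date = '2021-01-15')
        SELECT claim_id, member_id, amount
        FROM archive.claims_raw
        WHERE load_date = '2021-01-15'
    """)

Partitioning by load date keeps each incremental load isolated, while bucketing on the join key speeds up joins and sampling.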

Software Engineer

HCL Technologies
05.2017 - 10.2019
  • Used Apache Hive to run MapReduce jobs on top of HDFS data
  • Built distributed in-memory applications using Spark and Spark SQL to run analytics efficiently on huge data sets
  • Built these applications with the Spark Scala API, using YARN as the resource manager
  • Created RDDs and DataFrames for the required input data and performed transformations and actions using Spark Core
  • Performed data enrichment, cleansing, and common aggregations through RDD transformations (a sketch follows this list)
  • Performed interactive analysis of Hive tables through DataFrame operations using Spark SQL
  • Involved in performance optimization of Spark jobs and designed efficient queries
  • Imported and exported data to and from HDFS using Sqoop
  • Handled heterogeneous data sources such as Oracle and various file formats
  • Created Sqoop jobs with incremental loads to populate Hive external tables
  • Performed data enrichments such as filtering, sorting, and aggregation using Hive
  • Worked on performance optimization of ecosystems such as Hive and Sqoop
  • Improved query tuning using Hive features such as partitioning, bucketing, indexes, and the cost-based optimizer (CBO)
  • Experienced in developing Hive queries on data formats such as text, CSV, and ORC files; leveraged time-based partitioning in HiveQL to improve performance
  • Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract data on a schedule
  • Connected with the onshore team for code reviews and validation of final results
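
The applications above were written with the Spark Scala API; the same RDD enrichment-and-aggregation pattern is sketched here in PySpark for consistency with the other examples. File paths and fields are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-enrichment-sketch").getOrCreate()
    sc = spark.sparkContext

    # Hypothetical CSV lines in HDFS: id,category,amount
    lines = sc.textFile("hdfs:///data/input/orders.csv")

    # Cleansing: split fields and drop malformed rows
    parsed = (lines.map(lambda line: line.split(","))
                   .filter(lambda parts: len(parts) == 3))

    # Common aggregation: total amount per category
    totals = (parsed.map(lambda p: (p[1], float(p[2])))
                    .reduceByKey(lambda a, b: a + b))

    totals.saveAsTextFile("hdfs:///data/output/category_totals")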

Education

B.Tech - Mechanical Engineering

JNTU Anantapur
IN
01.2014

Intermediate - M.P.C

Sri Chaitanya J.R College
IN
01.2010

SSC

Sri Sai Baba E.M High School
IN
01.2008

Skills

  • Hadoop
  • Sqoop
  • Hive
  • Spark
  • PySpark
  • Python
  • Scala
  • AWS
  • Elasticsearch

Languages

  • English, Very Good
  • Telugu, Fluent
