Big Data Engineer with 5 years of experience in Apache Spark (RDD, DataFrame, SQL), Hive, Sqoop, and HDFS. Expert in PySpark development, Hive performance tuning, AWS EMR deployment, and schema evolution with Avro. Skilled in integrating data from relational systems using Sqoop, automating ETL pipelines, and debugging distributed workflows. Experienced with file formats like Parquet, ORC, Avro, and orchestrating workflows using AWS Step Functions.
Experienced data engineer specializing in Apache Spark, with strong proficiency in the RDD, DataFrame, and Spark SQL APIs in Python (see the API sketch after this summary).
Almost 5 years of experience in the design and development of big data ecosystems.
Skilled at optimizing Spark performance through memory tuning, efficient partitioning, and advanced serialization techniques (see the tuning sketch after this summary).
Proficient in integrating Spark with big data ecosystems such as Hadoop, Hive, and Kafka to build end-to-end scalable data solutions.
Adept at processing a variety of data formats, including Avro, Parquet, ORC, and JSON, in batch workflows over both structured and semi-structured data (see the format sketch after this summary).
Hands-on experience deploying data pipelines on AWS EMR and Amazon S3, with a focus on fault tolerance, cost efficiency, and large-scale processing.
Strong track record in implementing caching, persistence, and transformation strategies to support real-time analytics and machine learning pipelines.
Experienced in production troubleshooting, performance monitoring, and applying Spark best practices to maintain reliable distributed applications.
Built scalable, distributed data pipelines using PySpark on AWS EMR to process high-volume datasets stored in AWS S3.
Integrated AWS Step Functions to orchestrate multi-stage PySpark workflows, enabling automation, monitoring, and error handling across batch jobs (see the orchestration sketch after this summary).
Proficient in SQL queries and scripting for data validation and verification during ETL testing.
Knowledgeable about metadata management and testing metadata-driven ETL processes.
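A minimal API sketch covering the three Spark interfaces named above. Paths, column names, and app names here and in the following sketches are illustrative placeholders, not production values:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("api-demo").getOrCreate()

    # RDD API: low-level transformations on raw text.
    lines = spark.sparkContext.textFile("events.txt")
    counts = (lines.flatMap(lambda l: l.split())
                   .map(lambda w: (w, 1))
                   .reduceByKey(lambda a, b: a + b))
    print(counts.take(5))

    # DataFrame API: the same kind of data, but with a schema.
    df = spark.read.json("events.json")
    df.groupBy("event_date").count().show()

    # SQL API: register a view and query it declaratively.
    df.createOrReplaceTempView("events")
    spark.sql(
        "SELECT event_type, COUNT(*) AS n FROM events "
        "GROUP BY event_type ORDER BY n DESC"
    ).show()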
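A tuning sketch showing the memory, partitioning, and serialization levers from the summary. The specific values are assumptions that would be sized to the actual cluster and workload:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("tuned-etl")
        # Memory tuning: hypothetical executor sizing.
        .config("spark.executor.memory", "8g")
        # Partitioning: raise shuffle parallelism for large joins/aggregations.
        .config("spark.sql.shuffle.partitions", "400")
        # Serialization: Kryo is faster and more compact than Java serialization.
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .getOrCreate()
    )

    df = spark.read.parquet("s3://example-bucket/input/")  # placeholder bucket
    df = df.repartition(200, "customer_id")  # co-locate rows by the join key
    df.cache()  # persist for reuse across multiple downstream actions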
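A format-handling sketch for the Avro/Parquet/ORC/JSON work mentioned above; reading the "avro" format assumes the spark-avro package is available on the cluster classpath:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("format-demo").getOrCreate()

    # Placeholder paths; each reader infers or applies the format's schema.
    json_df = spark.read.json("s3://example-bucket/raw/events_json/")
    avro_df = spark.read.format("avro").load("s3://example-bucket/raw/events_avro/")
    orc_df = spark.read.orc("s3://example-bucket/raw/events_orc/")

    # Normalize to partitioned Parquet for efficient downstream queries.
    json_df.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3://example-bucket/curated/events/"
    )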
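An orchestration sketch for the Step Functions integration above. The state-machine ARN, account number, and payload are hypothetical; the retry and error-handling behavior referred to would live in the state machine's Retry and Catch blocks:

    import json
    import boto3

    sfn = boto3.client("stepfunctions", region_name="us-east-1")

    # Kick off one execution of a multi-stage PySpark workflow.
    response = sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:nightly-etl",
        name="nightly-etl-2024-01-01",
        input=json.dumps({
            "input_path": "s3://example-bucket/raw/",
            "run_date": "2024-01-01",
        }),
    )
    print(response["executionArn"])  # usable for monitoring the run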
Overview
5 years of professional experience
Work History
Big Data Developer
Wipro
03.2023 - Current
Optimized Spark jobs for performance and resource utilization.
Implemented Spark SQL queries for data querying and aggregation (see the aggregation sketch at the end of this section).
Collaborated with data scientists to integrate machine learning models into Spark pipelines.
Developed custom Spark functions for complex data transformations.
Managed ETL processes with PySpark running on AWS EMR, utilizing AWS S3 for storage.
Configured Spark jobs on AWS EMR to efficiently read data from and write data to AWS S3.
Experienced in identifying and resolving performance bottlenecks in Hive, such as data skew, inefficient joins, and excessive shuffling.
Expertise in using Hive explain plans, query profiling, and metrics monitoring to diagnose query performance issues and optimize query execution (see the join-tuning sketch at the end of this section).
Proficient in validating data transfers with Sqoop's built-in validation option and cleansing records as part of ingestion.
Adept at scheduling and automating Sqoop jobs for incremental runs (see the Sqoop sketch at the end of this section).
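An aggregation sketch for the EMR/S3 pattern in the bullets above. Bucket and table names are placeholders; on EMR, s3:// paths are served by EMRFS:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("orders-aggregation").getOrCreate()

    # Read curated data from S3 and expose it to SQL.
    orders = spark.read.parquet("s3://example-bucket/orders/")
    orders.createOrReplaceTempView("orders")

    daily_revenue = spark.sql("""
        SELECT order_date,
               SUM(amount) AS revenue,
               COUNT(*)    AS order_count
        FROM orders
        GROUP BY order_date
    """)

    # Write the aggregate back to S3 for downstream consumers.
    daily_revenue.write.mode("overwrite").parquet(
        "s3://example-bucket/marts/daily_revenue/"
    )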
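The Hive tuning bullets above (skew, inefficient joins, excessive shuffling, explain plans) have a direct analogue in PySpark, the primary stack here. This join-tuning sketch, with placeholder paths, shows the broadcast-join fix for a shuffle-heavy join and the plan inspection that parallels reading a Hive EXPLAIN plan; in Hive itself the equivalent tools are EXPLAIN and join-optimization settings:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("join-tuning").getOrCreate()

    facts = spark.read.parquet("s3://example-bucket/facts/")  # large, possibly skewed
    dims = spark.read.parquet("s3://example-bucket/dims/")    # small dimension table

    # Broadcasting the small side replaces a shuffle-heavy join with a map-side join.
    joined = facts.join(broadcast(dims), "dim_id")

    # Inspect the physical plan to confirm the broadcast and spot residual shuffles.
    joined.explain()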
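A sketch of the incremental Sqoop automation described above; connection details and names are hypothetical. A saved Sqoop job persists --last-value between runs, so each scheduled execution imports only rows added since the previous run:

    import subprocess

    # One-time setup: create a saved job that does an incremental append import.
    create_job = [
        "sqoop", "job", "--create", "orders_incremental", "--",
        "import",
        "--connect", "jdbc:mysql://db.example.com/sales",
        "--username", "etl_user",
        "--password-file", "/user/etl/.db_password",
        "--table", "orders",
        "--target-dir", "/data/raw/orders",
        "--incremental", "append",
        "--check-column", "order_id",
        "--last-value", "0",
    ]
    subprocess.run(create_job, check=True)

    # Scheduled re-execution (e.g. via cron) resumes from the stored last value.
    subprocess.run(["sqoop", "job", "--exec", "orders_incremental"], check=True)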