NAGIREDDY RAVIKUMAR

Bengaluru

Summary

  • 2.5+ years of IT experience, including Big Data technologies, the Hadoop ecosystem, and the Spark framework; currently working extensively with Spark, using Scala as the primary programming language.
  • Good understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, and DataNode.
  • Developed Spark applications using Scala, Spark SQL, Spark RDD, and the Spark DataFrame API for data cleaning and processing tasks.
  • Good knowledge of loading data from SQL Server and MySQL databases into HDFS using Sqoop (structured data).
  • Experience with AWS components such as EMR, EC2 instances, and S3 buckets.
  • Decent familiarity with UNIX/Linux systems, including an understanding of how applications interact with the operating system.
  • Hands-on experience in application development with Scala, Spark, Hive, and Sqoop, with basic proficiency in Python for specific tasks within Spark jobs.
  • Detailed understanding of the Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies, including Scrum and Agile.


Apache Spark

  • Created Data Frames and performed analysis using Spark SQL.
  • Hands-on expertise in writing RDD (Resilient Distributed Dataset) transformations and actions using Scala and Python.
  • Excellent understanding of the Spark architecture and framework: SparkSession, SparkContext, APIs, RDDs, Spark SQL, and DataFrames.
  • Experienced in optimizing Spark Jobs performance by tuning various configuration settings, such as memory allocation, caching, and serialization.
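
The Spark work above can be sketched as a short Scala job; the application name, paths, and column names are illustrative, and the snippet assumes a Spark runtime on the classpath:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CleansingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cleansing-sketch").getOrCreate()
    import spark.implicits._

    // DataFrame API: read, clean, aggregate with Spark SQL functions
    val events = spark.read.json("hdfs:///data/raw/events")
    val daily = events
      .filter($"status".isNotNull)           // drop incomplete records
      .groupBy($"event_date")
      .agg(count("*").as("event_count"))
    daily.write.mode("overwrite").parquet("hdfs:///data/curated/daily_counts")

    // RDD API: chained transformations followed by an action
    val wordCounts = spark.sparkContext
      .textFile("hdfs:///data/raw/lines.txt")
      .flatMap(_.split("\\s+"))              // transformation
      .map(word => (word, 1))                // transformation
      .reduceByKey(_ + _)                    // transformation
      .take(10)                              // action: materializes results

    spark.stop()
  }
}
```

Tuning such a job typically means adjusting executor memory, caching hot DataFrames with `cache()`, and switching to Kryo serialization via `spark.serializer`.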

Apache Sqoop

  • Used Sqoop to import data from relational databases (RDBMS) into HDFS and Hive, storing it in formats such as Text, Avro, Parquet, SequenceFile, and ORC, with compression codecs like Snappy and Gzip.
  • Performed transformations on the imported data and exported it back to the RDBMS.
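
A typical import of the kind described above can be expressed as a single Sqoop invocation; the connection string, table, and target path are illustrative:

```shell
# Illustrative Sqoop import: MySQL table into HDFS as Snappy-compressed Parquet
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl -P \
  --table customers \
  --target-dir /data/raw/customers \
  --as-parquetfile \
  --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --num-mappers 4
```

The reverse direction uses `sqoop export` with `--export-dir` pointing at the transformed HDFS data.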

Apache Hive

  • Experience in writing HQL (Hive Query Language) queries to perform data analysis.
  • Created Hive External and Managed Tables.
  • Implemented Partitioning and Bucketing on Hive tables for Hive Query Optimization.
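
The partitioning and bucketing described above might look like this in HQL; the table, columns, and location are illustrative:

```sql
-- Illustrative HQL: external table with daily partitions and bucketing
CREATE EXTERNAL TABLE IF NOT EXISTS sales_events (
  event_id BIGINT,
  user_id  STRING,
  amount   DOUBLE
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (user_id) INTO 16 BUCKETS
STORED AS ORC
LOCATION '/data/warehouse/sales_events';

-- Partition pruning limits the scan to a single date's directory
SELECT user_id, SUM(amount) AS total
FROM sales_events
WHERE event_date = '2023-01-15'
GROUP BY user_id;
```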

Overview

2.5+ years of professional experience

Work History

Data Engineer

Ipsos Research Pvt. Ltd
03.2023 - 09.2023
  • Collaborated with data modeling teams, stakeholders, and data analysts to comprehend data requirements and translate them into technical specifications and structured data representations.
  • Developed Spark applications in Scala for performing data cleansing, event enrichment, data aggregation, and data preparation to meet business requirements.
  • Implemented data quality checks and validation processes to ensure accuracy, consistency, and completeness of data.
  • Worked on various data formats like AVRO, Sequence File, JSON, Parquet and XML.
  • Fine-tuned Spark applications to improve overall pipeline processing times.
  • Created Hive tables, loaded with data, and wrote Hive queries to process the data. Created Partitions and used Bucketing on Hive tables and used required parameters to improve performance.
  • Debugged common issues with Spark RDDs and Data Frames, resolved production issues, and ensured seamless data processing in production environments.
  • Stored Spark-processed data in HDFS/S3 using appropriate file formats, as per business requirements.
  • Performed Import and Export of data into HDFS and Hive using Sqoop and managed data within the environment.
  • Created EC2 instances and EMR clusters for Spark Code development and testing.
  • Performed step execution in EMR clusters for the job deployment as per requirements.
  • Followed the Agile Scrum (Scrum Alliance) methodology for development.

Environment: Apache Spark, Spark SQL, Scala, Hive, Hadoop, Sqoop, HDFS, AWS.
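
The fine-tuning and EMR step execution mentioned above usually come down to `spark-submit` configuration; the class name, flag values, and jar are illustrative examples, not settings taken from the project:

```shell
# Illustrative spark-submit tuning flags for an EMR/YARN deployment
spark-submit \
  --class com.example.Pipeline \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 4g \
  --executor-cores 4 \
  --conf spark.sql.shuffle.partitions=200 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  pipeline.jar
```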

Big Data Engineer

TATA Consultancy Services Pvt Ltd
06.2021 - 02.2023
  • Loaded data into HDFS from different sources, such as SQL Server and AWS S3, using Sqoop, and loaded it into Hive tables.
  • Created Hive tables and loaded data from various sources, HDFS locations, and other Hive tables.
  • Created Sqoop jobs and scheduled them to handle incremental loads from RDBMS into HDFS, then applied Spark transformations.
  • Created Hive external tables to perform ETL on data generated on a daily basis.
  • Developed Spark code in Scala and Python (PySpark) and deployed it on AWS EMR.
  • Optimized Spark SQL and Hive queries, helping reduce project costs.
  • Monitored, managed, and troubleshot Hadoop and Spark log files.
  • Worked on Hadoop within the Cloudera Data Platform, running services through Cloudera Manager.
  • Participated in Agile practices, including daily Scrum meetings and sprint planning.

Environment: Apache Spark, Spark SQL, Scala, Hive, Hadoop, Sqoop, HDFS, AWS, Python, Pyspark.
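
The scheduled incremental loads mentioned above can be set up as a saved Sqoop job; the connection details and column names are hypothetical:

```shell
# Hypothetical saved Sqoop job for incremental append loads from SQL Server
sqoop job --create daily_orders_load -- import \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=sales" \
  --table orders \
  --target-dir /data/raw/orders \
  --incremental append \
  --check-column order_id \
  --last-value 0

# Each scheduled run picks up only rows past the stored last-value
sqoop job --exec daily_orders_load
```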

Education

Bachelor of Science - Computer Science

Kalasalingam University
Krishnankoil, Srivilliputhur
05.2021

Skills

Big Data Technologies: Hadoop, HDFS, Hive, Sqoop, Apache Spark, Spark SQL, PySpark (basic)

Hadoop Distribution: Cloudera, Apache, AWS

Languages: Scala, SQL, Python

Operating Systems: Windows, LINUX

Version Control: Git, Bitbucket

IDE & Build Tools: Eclipse, IntelliJ, Notepad

Databases: MySQL, SQL Server

Quote

I have not failed. I’ve just found 10,000 ways that won’t work.
Thomas Edison
