
Ramesh Sunkaraboina

Hyderabad

Summary

  • Data Processing: Expertise in constructing and maintaining large-scale data pipelines, with hands-on experience handling petabyte-scale batch and streaming data using Apache Spark and Apache Pulsar.
  • Cloud Platforms: Skilled in Google Cloud Platform (GCP) services including BigQuery, Dataflow, Pub/Sub, and Cloud Storage, with proficiency in ETL/ELT processes, data modeling, and real-time data streaming.
  • Data Pipeline Architecture: Proven track record of designing and building data pipelines, optimizing data through cleaning, transformation, and management across cloud sources such as AWS, GCP, Redshift, and Azure.
  • Data Storage Solutions: Proficient in designing and implementing Data Warehouses, Data Lakes, and Data Lakehouses, adhering to data warehousing principles.
  • Core Competencies: Well-versed in data engineering and SQL development, with particular emphasis on ELT/ETL processes and Apache Spark.
  • Project Execution and Management: Built end-to-end pipelines integrating sources such as Teradata and Oracle, following Scrum for agile delivery.
  • Data Quality and Performance Optimization: Implemented data quality checks at the ingestion stage, validated data consistency across systems, and fine-tuned dataset performance.
  • Transformation and Flow Management: Designed and operated ELT pipelines, overseeing 10+ pipelines and 20+ workflows that streamline cleaning and transformation and improve data accessibility.
  • Continuous Integration and Deployment: Managed release cycles using Jenkins, resolving build issues promptly and ensuring on-time delivery of commitments.
  • Technological Agility: Adept at leveraging programming languages and cloud technologies for data processing, analytics, orchestration, and quality assurance, including Python, PySpark, ELT/ETL, and Hadoop.

Overview

7 years of professional experience

Work History

Senior Data Engineer

Synechron Technologies Pvt Ltd
08.2024 - Current
  • The project involved managing massive volumes of data from retail source systems such as Teradata, Oracle, and CDP, originating from POS systems, e-commerce platforms, supply chains, and customer interactions. Efficient management of this data was critical for inventory optimization, which required building scalable, reliable, and cost-effective pipelines for processing and analyzing data with Google Cloud Dataproc (managed Apache Spark and Hadoop) and Apache Airflow (Cloud Composer) for orchestration.
  • Built pipelines from various sources, including Teradata, Oracle, Minerva, and CDP.
  • Implemented data quality checks at the initial stage while bringing raw data to GCP storage buckets.
  • Applied schema changes and validated data consistency across systems.
  • Implemented data cleaning, deduplication, and enrichment logic for retail datasets (e.g., sales, inventory, customer behavior).
  • Managed the Portfolio table across all platforms and core IDs.
  • Performed performance tuning on necessary datasets.
  • Built end-to-end pipelines to bring data from CDP to GCP and made necessary transformations.
  • Created BigQuery views on top of Hive tables and adjusted them accordingly.
  • Managed backdated updates from source to target in Hive tables and performed data validations.
  • Provided production support for the pipeline builds during month-end processes for a smooth transition to the derived layer team.
  • Made necessary changes in the standard layer according to requirements.
  • Resolved TeamCity and Jenkins build issues.
  • Participated in continuous releases using Jenkins.
  • Environment: Teradata, Oracle, CDP, Google Cloud Dataproc, Apache Airflow, BigQuery, Hive, Jenkins, TeamCity.
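The ingestion-stage quality checks and deduplication described above can be sketched as follows. This is an illustrative stand-in in plain Python, not the actual PySpark jobs; the record fields and rules are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SaleRecord:
    txn_id: str
    store_id: str
    amount: float

def quality_check(rec: SaleRecord) -> bool:
    # Reject rows with missing keys or non-positive amounts
    return bool(rec.txn_id) and bool(rec.store_id) and rec.amount > 0

def clean(records: list[SaleRecord]) -> list[SaleRecord]:
    """Drop invalid rows, then deduplicate on txn_id (first occurrence wins)."""
    seen: set[str] = set()
    out: list[SaleRecord] = []
    for rec in records:
        if not quality_check(rec) or rec.txn_id in seen:
            continue
        seen.add(rec.txn_id)
        out.append(rec)
    return out
```

In the real pipeline these rules would run as Spark transformations on raw files landed in the GCP storage buckets, before promotion to the standard layer.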

Data Engineer

The Modern Data Company
07.2022 - 07.2024
  • Developed and optimized a data processing platform for a retail company using Apache Spark, handling both batch and real-time data. Streamlined the processing of transactional and customer data from various sources, including databases, cloud storage, and APIs. Designed and implemented scalable ETL pipelines to ensure efficient data transformation and loading into analytics platforms. Improved data accuracy and reporting speed, supporting better decision-making and enhancing customer experience across the retail network.
  • Designed and constructed an end-to-end Data Lakehouse independently, following Medallion Architecture principles, utilizing tools powered by Apache Iceberg.
  • Processed a large volume of batch and streaming data daily from diverse sources, including Amazon Redshift, Amazon S3, and REST APIs, for the data pipeline.
  • Designed and developed ELT pipelines for processing extensive volumes of both batch and streaming data, reaching into the petabyte range, leveraging the capabilities of Apache Spark and Apache Pulsar.
  • Established over 10 pipelines and 20 workflows to streamline data cleaning and transformation processes, channeling output into Amazon S3 buckets and making it accessible to the Search Portal serving over 1 million customers.
  • Analyzed logs in Splunk to debug issues.
  • Addressed and implemented 50+ change requests and bug fixes, ensuring the seamless operation of the data pipeline.
  • Participated in continuous releases using Jenkins.
  • Responsible for requirements analysis, technical design, implementation, testing, and documentation.
  • Ensured on-time delivery of reports as per defined timelines.
  • Environment: PySpark, Python, Scala, Data Integration, Metadata Management, Data Lineage, Teradata, Oracle, SQL, Spark, Hive, Git, CI/CD, Hadoop, ETL, Data Modeling, Data Quality, Extraction, Performance Tuning.
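The Medallion Architecture behind the Lakehouse build layers data as bronze (raw), silver (cleaned and conformed), and gold (business aggregates). A simplified stand-in for that layering, using plain Python rather than the actual Spark-and-Iceberg pipelines (field names are illustrative):

```python
import json

def to_bronze(raw_lines: list[str]) -> list[dict]:
    """Bronze: land raw events as-is, parsed but unvalidated."""
    return [json.loads(line) for line in raw_lines]

def to_silver(bronze: list[dict]) -> list[dict]:
    """Silver: cleaned and conformed -- drop malformed rows, normalize types."""
    silver = []
    for row in bronze:
        if "sku" not in row or "qty" not in row:
            continue
        silver.append({"sku": str(row["sku"]), "qty": int(row["qty"])})
    return silver

def to_gold(silver: list[dict]) -> dict:
    """Gold: business-level aggregate (units sold per SKU)."""
    totals: dict[str, int] = {}
    for row in silver:
        totals[row["sku"]] = totals.get(row["sku"], 0) + row["qty"]
    return totals
```

Each layer is persisted separately so that downstream consumers (here, the Search Portal) read only curated gold tables, while silver and bronze remain available for replay and auditing.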

Associate Data Engineer

National Institute of Indian Medical Heritage (NIIMH)
07.2018 - 07.2022
  • Developed a real-time data processing system for a healthcare organization using Apache Spark and Kafka, enabling the integration and analysis of patient data from various sources, including electronic health records (EHR), IoT devices, and lab results. Designed ETL pipelines to ensure timely and accurate data transformation, supporting predictive analytics and real-time monitoring of patient health. Implemented robust data cleaning and validation processes to maintain data integrity and compliance with healthcare regulations. Enhanced decision-making capabilities, improving patient outcomes and operational efficiency across the healthcare network.
  • Involved in requirement gathering, design, and deployment of the application using Scrum (Agile) as the development methodology.
  • Developed Hive SQL queries, mappings, and tables for analysis across different banners, and worked on partitioning, optimization, compilation, and execution.
  • Implemented Spark using Scala for faster processing of data.
  • Utilized batch processing in Spark to improve performance.
  • Imported data from various sources into Spark RDD for processing.
  • Used Sqoop to import data from RDBMS to Hadoop.
  • Created Hive target tables to hold the data after all ETL operations using HQL.
  • Employed Cloudera Manager for the installation and management of the Hadoop cluster.
  • Processed data in Hive tables using Spark SQL.
  • Worked with Amazon Web Services (AWS) EMR and S3 for data processing and storage.
  • Environment: Apache Spark, Kafka, Hive SQL, Scala, Sqoop, Cloudera Manager, AWS EMR, S3.
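The real-time patient monitoring described above amounts to windowed aggregation over a stream of readings. A toy sketch of that idea, with plain Python standing in for the Spark/Kafka streaming jobs (the metric and window size are assumptions for illustration):

```python
from collections import deque

class SlidingAverage:
    """Streaming window aggregate: mean of the last `size` readings,
    updated as each event arrives (e.g., heart-rate samples from an IoT device)."""

    def __init__(self, size: int):
        # deque(maxlen=...) evicts the oldest reading automatically
        self.window: deque = deque(maxlen=size)

    def push(self, value: float) -> float:
        self.window.append(value)
        return sum(self.window) / len(self.window)
```

In a production stream processor the same computation would be expressed as a sliding-window aggregation keyed by patient, with alerts fired when the windowed value crosses a clinical threshold.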

Technical Consultant

Symbioun Technologies
06.2018 - 07.2019
  • Developed and maintained a user-friendly support interface, enhancing customer interaction and issue-resolution efficiency. Implemented real-time chat features, automated ticketing systems, and integrated knowledge bases to streamline support processes and improve user satisfaction.
  • Gathered functional and technical requirements for the application.
  • Made necessary changes to the application that improved its performance, modifying workflow components like forms and links as needed.
  • Adjusted the user interface according to requirements.
  • Developed the client-side using Angular 6 and the server-side using PHP and MySQL.
  • Hired technical personnel for various projects.

Education

Bachelor of Technology - Computer Science & Engineering

Jawaharlal Nehru Technological University

Skills

  • SQL
  • PL/SQL
  • BigQuery
  • Teradata
  • Oracle
  • Python
  • PySpark
  • Apache Spark
  • DataProc
  • ELT/ETL
  • Data Modeling
  • Hadoop
  • GCP
  • Microsoft Azure
  • CDP (Cloudera Data Platform)
  • Airflow
  • Docker
  • Apache Pulsar
  • Data Lakehouse
  • Databases
  • FastAPI
  • Soda (data quality framework)
  • OpenMetadata
  • Benthos
  • Git
  • Unix

Personal Information

Title: Data Engineer

Accomplishments

Received Star Award at Modern Data

Domains

  • Banking
  • Insurance
  • HealthCare
  • Retail

Overall Experience

6 Years 9 Months

Timeline

Senior Data Engineer

Synechron Technologies Pvt Ltd
08.2024 - Current

Data Engineer

The Modern Data Company
07.2022 - 07.2024

Associate Data Engineer

National Institute of Indian Medical Heritage (NIIMH)
07.2018 - 07.2022

Technical Consultant

Symbioun Technologies
06.2018 - 07.2019

Bachelor of Technology - Computer Science & Engineering

Jawaharlal Nehru Technological University