
Chaithra N V

Hyderabad, Telangana

Summary

Software Engineer with 5 years of experience in Big Data, specializing in the Hadoop ecosystem: PySpark, Python, SQL, HDFS, Hive, Impala, Sqoop, and Oozie. Domain experience spans the BFSI and pharmaceutical industries.

Overview

  • 5 years of professional experience
  • 2 certifications

Work History

Data Engineer

Deloitte
01.2024 - Current

Developed a comprehensive data pipeline using PySpark in Azure Databricks to streamline the process of loading, transforming, and managing data. The pipeline was designed to load CSV file data into Delta tables, perform member matching to identify active records, implement Change Data Capture (CDC) techniques, and finally, load the refined data into SQL Server for downstream analytics and use cases.

Roles and Responsibilities:
  • Pipeline Development: Designed and implemented a robust data pipeline using PySpark in Azure Databricks to handle large-scale data processing tasks.
  • Data Ingestion: Loaded CSV file data into Delta tables, ensuring data consistency, reliability, and optimized storage.
  • Data Transformation: Conducted member matching to filter and extract active member records, ensuring data relevance and accuracy.
  • Change Data Capture (CDC): Applied CDC techniques to Delta tables to efficiently track and manage incremental data changes, keeping the dataset up to date.
  • Data Integration: Loaded the refined data into SQL Server, enabling seamless integration for downstream analytics and business intelligence use cases.
  • Optimization and Performance Tuning: Tuned the pipeline to handle large datasets efficiently and minimize processing time.
  • Collaboration: Worked closely with data analysts, engineers, and stakeholders to understand requirements and deliver solutions that meet business needs.
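The member-matching and CDC steps above can be sketched in plain Python. This is a minimal illustration only: the actual pipeline used PySpark with Delta tables on Azure Databricks, and the record fields (`member_id`, `status`, `name`) are hypothetical.

```python
# Pure-Python sketch of member matching + change data capture (CDC).
# The real pipeline ran in PySpark/Delta; field names are assumptions.

def match_active_members(records):
    """Member matching step: keep only records flagged as active."""
    return [r for r in records if r.get("status") == "active"]

def apply_cdc(current, incoming):
    """Upsert incoming rows into the current snapshot, keyed by member_id.

    New keys are inserted; existing keys are updated only when the
    incoming row actually differs (tracking incremental changes).
    """
    snapshot = {r["member_id"]: r for r in current}
    changes = {"inserted": 0, "updated": 0}
    for row in incoming:
        key = row["member_id"]
        if key not in snapshot:
            snapshot[key] = row
            changes["inserted"] += 1
        elif snapshot[key] != row:
            snapshot[key] = row
            changes["updated"] += 1
    return list(snapshot.values()), changes

current = [{"member_id": 1, "name": "A", "status": "active"}]
incoming = [
    {"member_id": 1, "name": "A2", "status": "active"},   # changed -> update
    {"member_id": 2, "name": "B", "status": "inactive"},  # new -> insert
]
merged, changes = apply_cdc(current, incoming)
active = match_active_members(merged)
```

In the production pipeline this upsert corresponds to a Delta-table MERGE, which keeps the target table current without full reloads.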

Data Engineer

Accenture
08.2022 - 01.2024
  • Ingested data from SharePoint into Foundry using PySpark, Python, and SQL for efficient file transformation
  • Converted raw files into structured datasets, enabling downstream analysis
  • Applied transformations that improve data quality and surface insights for data-driven decision-making.
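The raw-file-to-structured-dataset step can be illustrated with a small pure-Python sketch. The actual ingestion used PySpark in Foundry; the column names and types here are assumptions for the example.

```python
import csv
import io

# Illustrative sketch: turn raw delimited text into a structured, typed
# dataset, quarantining malformed rows. The real pipeline used PySpark;
# columns (id, amount, region) are hypothetical.

RAW = """id,amount,region
1,10.50,APAC
2,not_a_number,EMEA
3,7.25,APAC
"""

def to_structured(raw_text):
    """Parse raw CSV text, cast types, and separate out bad records."""
    rows, rejected = [], []
    for rec in csv.DictReader(io.StringIO(raw_text)):
        try:
            rows.append({"id": int(rec["id"]),
                         "amount": float(rec["amount"]),
                         "region": rec["region"]})
        except ValueError:
            rejected.append(rec)  # keep bad records for review, don't drop silently
    return rows, rejected

dataset, bad = to_structured(RAW)
```

Keeping rejected rows separate (rather than discarding them) is what makes the downstream dataset trustworthy for analysis.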

Associate System Engineer

TCS
02.2020 - 09.2021
  • Ingested data from multiple databases (Oracle, DB2, MySQL, Netezza) using incremental and truncate load methods
  • Truncate load used Sqoop to ingest full chunks of data into the raw layer, followed by transformations and loading into Hive, orchestrated with Oozie
  • Incremental load captured the maximum value of the incremental column and ingested only newer records
  • Automated processing handled varying data volumes and source-system changes.
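The incremental-load pattern above (remember the maximum value of the incremental column as a watermark, then pull only rows beyond it) can be sketched in plain Python. The real jobs used Sqoop into HDFS/Hive; the table rows and column name here are illustrative.

```python
# Pure-Python sketch of watermark-based incremental loading.
# Real implementation: Sqoop ingestion into the raw layer; the
# "updated_id" check column is a hypothetical example.

def incremental_load(source_rows, last_value, check_column="updated_id"):
    """Return rows newer than the watermark, plus the new watermark."""
    new_rows = [r for r in source_rows if r[check_column] > last_value]
    new_watermark = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, new_watermark

table = [{"updated_id": 1}, {"updated_id": 2}, {"updated_id": 3}]
batch1, wm = incremental_load(table, last_value=0)   # first pull: everything
table.append({"updated_id": 4})                      # a new record arrives
batch2, wm = incremental_load(table, last_value=wm)  # next pull: only the new row
```

Persisting the watermark between runs is what lets the job ingest only newer records instead of re-reading the full source table.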

Education

B.E., Electronics and Communication

Dr. Ambedkar Institute of Technology, Bangalore
06.2019

Skills

  • Big Data Ecosystems: PySpark, HDFS, Hive, Sqoop, Oozie, Impala, BigQuery
  • Databases: MySQL
  • Programming Languages: Python
  • Cloud: Azure, Databricks

Certification

  • Data Engineer Associate from Databricks
  • Azure Fundamentals from Microsoft
