I am a Software Engineer with 5 years of experience in Big Data, specializing in the Hadoop ecosystem, including PySpark, Python, SQL, HDFS, Hive, Impala, Sqoop, and Oozie. My expertise spans key verticals, including BFSI and pharmaceuticals.
Developed a comprehensive data pipeline using PySpark in Azure Databricks to streamline loading, transforming, and managing data. The pipeline loads CSV file data into Delta tables, performs member matching to identify active records, applies Change Data Capture (CDC) techniques, and loads the refined data into SQL Server for downstream analytics and use cases.
Roles and Responsibilities:
Pipeline Development: Designed and implemented a robust data pipeline using PySpark in Azure Databricks to handle large-scale data processing tasks.
Data Ingestion: Loaded CSV file data into Delta tables, ensuring data consistency, reliability, and optimized storage.
Data Transformation: Conducted member matching processes to filter and extract active member records, ensuring data relevance and accuracy.
Change Data Capture (CDC): Applied CDC techniques to Delta tables to efficiently track and manage incremental data changes, maintaining an up-to-date dataset.
Data Integration: Loaded the processed and refined data into SQL Server, enabling seamless integration for downstream analytics and business intelligence use cases.
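The SQL Server load typically goes through Spark's JDBC writer. A minimal sketch, with hypothetical host, database, table, and credential values (in practice credentials would come from a Databricks secret scope); the write call is shown commented out because it needs a reachable SQL Server and the Microsoft JDBC driver jar:

```python
# Hypothetical connection details for the SQL Server sink.
host, port, db = "sqlprod.example.com", 1433, "analytics"
jdbc_url = f"jdbc:sqlserver://{host}:{port};databaseName={db}"

connection_properties = {
    "user": "etl_user",                                        # hypothetical
    "password": "***",                                         # from a secret scope in practice
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",  # requires the MS JDBC driver jar
}

# curated_df.write.jdbc(
#     url=jdbc_url,
#     table="dbo.members_curated",   # hypothetical target table
#     mode="overwrite",
#     properties=connection_properties,
# )
print(jdbc_url)
```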
Optimization and Performance Tuning: Tuned the pipeline to process large datasets efficiently and minimize end-to-end processing time.
Collaboration: Worked closely with data analysts, engineers, and stakeholders to understand requirements and deliver solutions that meet business needs.