Data engineer with expertise in ETL development, Big Data technologies, and Databricks, and a strong command of SQL.
Overview
9 years of professional experience
Work History
Data Engineer
HSBC
Pune
12.2023 - Current
Designed and implemented a real-time data streaming pipeline using Apache Kafka and Databricks
Read data from Kafka topics, applied data transformations, and fed the processed records to an ML model to predict hardware failures (see the illustrative sketch below)
Enabled real-time decision-making by integrating the pipeline with machine learning prediction workflows
Worked on a data democratization initiative to make data accessible, secure, and reusable across tenants
Created reusable data pipelines using Databricks and Delta Lake, providing self-service capabilities for users
Designed role-based access controls and governance policies to ensure data privacy and compliance
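Below is a minimal sketch of the streaming flow described in this role, assuming PySpark Structured Streaming on Databricks with a Delta Lake sink; the broker address, topic name, event schema, model path, and checkpoint/output locations are hypothetical placeholders rather than production values.

```python
# Minimal sketch, not the production job: topic, schema, model path, and
# storage locations below are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("hw-failure-stream").getOrCreate()

# Hypothetical schema for hardware telemetry events
schema = StructType([
    StructField("device_id", StringType()),
    StructField("temperature", DoubleType()),
    StructField("fan_speed", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read raw events from a Kafka topic (broker and topic name assumed)
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "hardware-telemetry")
       .load())

# Parse the JSON payload into typed columns
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", schema).alias("e"))
          .select("e.*"))

# Score events with a pre-trained Spark ML pipeline (path assumed);
# transform() is lazy, so it composes with the streaming plan
model = PipelineModel.load("/models/hw_failure")
predictions = model.transform(events)

# Persist predictions to a Delta table for downstream consumers
query = (predictions.writeStream
         .format("delta")
         .option("checkpointLocation", "/chk/hw_failure")
         .outputMode("append")
         .start("/delta/hw_failure_predictions"))
query.awaitTermination()
```

The Delta sink at the end reflects the same pattern as the reusable Delta Lake pipelines mentioned above: downstream consumers read the predictions table directly and never touch the Kafka source.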
Data Engineer
Amdocs Development Center India (DVCI)
Pune
04.2018 - 12.2023
Designed and implemented data pipelines using Big Data technologies such as Hadoop, Sqoop, Hive, and HDFS
Developed ETL processes to efficiently extract, transform, and load data from multiple sources into Hadoop-based data lakes
Utilized Python and PySpark to perform complex data transformations, data cleansing, and aggregations
Implemented Kafka for real-time data streaming and event-driven architectures, enabling efficient data ingestion and processing
Collaborated with data scientists and analysts to provide curated datasets and support data-driven insights and analytics
Conducted data validation and verification to ensure data accuracy, consistency, and compliance with business rules
Implemented data governance practices and security measures to protect sensitive data and maintain data privacy
Designed and developed a custom tool for Informatica workflow comparison, enabling efficient tracking and identification of differences between workflow versions
Created a data validation and reconciliation tool using PySpark, automating consistency checks between Oracle/MySQL/PostgreSQL databases and Hive/HBase in the Big Data ecosystem (see the reconciliation sketch below)
Developed an extract tool in PySpark that reads data from HDFS files based on epoch time, providing a flexible and efficient way to extract data and load it into target systems (see the extract sketch below)
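A minimal sketch of the reconciliation idea, assuming Spark with Hive support and JDBC access to the source database; the connection URL, credentials, table names, and comparison columns are hypothetical. Each side is fingerprinted with a row count plus an order-independent checksum, and any mismatch is flagged.

```python
# Minimal sketch, not the production tool: JDBC URL, credentials, table
# names, and comparison columns are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("recon")
         .enableHiveSupport()
         .getOrCreate())

def summarize(df, key_cols):
    """Row count plus an order-independent checksum over the key columns."""
    return df.agg(
        F.count(F.lit(1)).alias("row_count"),
        F.sum(F.crc32(F.concat_ws("|", *key_cols))).alias("checksum"),
    ).collect()[0]

# Source side: RDBMS over JDBC (the matching driver must be on the classpath)
src = (spark.read.format("jdbc")
       .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
       .option("dbtable", "billing.invoices")
       .option("user", "recon_user")
       .option("password", "***")
       .load())

# Target side: the corresponding Hive table
tgt = spark.table("big_data.invoices")

key_cols = ["invoice_id", "amount"]  # hypothetical comparison columns
src_sum, tgt_sum = summarize(src, key_cols), summarize(tgt, key_cols)

if (src_sum["row_count"], src_sum["checksum"]) == (tgt_sum["row_count"], tgt_sum["checksum"]):
    print("MATCH: source and Hive copies are consistent")
else:
    print(f"MISMATCH: source={src_sum}, hive={tgt_sum}")
```

Counts and checksums alone will not localize a bad row, but they are cheap to compute on both sides and catch drift early.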
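A minimal sketch of the epoch-window extract tool, assuming Parquet files on HDFS with an epoch-seconds column; paths and the column name are placeholders.

```python
# Minimal sketch, not the production tool: paths, file format, and the
# epoch column name are illustrative assumptions.
import sys
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("epoch-extract").getOrCreate()

# Window boundaries arrive as epoch seconds, e.g. 1700000000 1700086400
start_epoch, end_epoch = int(sys.argv[1]), int(sys.argv[2])

# Source data on HDFS (assumed to be Parquet with an epoch column)
df = spark.read.parquet("hdfs:///data/events/")

# Keep only records inside the half-open window [start, end)
window = df.filter((F.col("event_epoch") >= start_epoch) &
                   (F.col("event_epoch") < end_epoch))

# Hand the slice to the target system; a Parquet drop zone is assumed here
window.write.mode("overwrite").parquet(
    f"hdfs:///extracts/{start_epoch}_{end_epoch}/")
```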
ETL Developer
Alignbiz Technologies
Bangalore
04.2016 - 05.2018
Company Overview: Contract engagement with IBM for the Idea project
Collaborated with cross-functional teams to gather requirements and design ETL processes using Informatica and DataStage
Created and optimized SQL queries to extract, transform, and load data from various sources into target data warehouses
Implemented shell scripts for job automation, scheduling, and error handling to improve ETL process efficiency
Troubleshot and resolved data quality issues, ensuring data integrity and consistency
Actively participated in code reviews, ensuring adherence to best practices and quality standards
Designed and developed a custom tool for Informatica workflow comparison, enabling efficient tracking and identification of differences between workflow versions for retrofit purposes (see the comparison sketch below)
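A minimal sketch of the comparison approach, assuming the workflows are exported to XML (for example via pmrep) and that tasks appear as TASK elements keyed by a NAME attribute; the file names are placeholders, and real exports carry more structure than this handles.

```python
# Minimal sketch, not the production tool: element/attribute handling is
# simplified and the input file names are illustrative assumptions.
import xml.etree.ElementTree as ET

def index_tasks(path):
    """Map each TASK name to its attribute dict from a workflow XML export."""
    root = ET.parse(path).getroot()
    return {t.get("NAME"): dict(t.attrib) for t in root.iter("TASK")}

old, new = index_tasks("wf_v1.xml"), index_tasks("wf_v2.xml")

# Report added, removed, and changed tasks between the two versions
for name in sorted(old.keys() | new.keys()):
    if name not in old:
        print(f"ADDED   task {name}")
    elif name not in new:
        print(f"REMOVED task {name}")
    elif old[name] != new[name]:
        changed = {k: (old[name].get(k), new[name].get(k))
                   for k in old[name].keys() | new[name].keys()
                   if old[name].get(k) != new[name].get(k)}
        print(f"CHANGED task {name}: {changed}")
```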