
Ankush Patil

Pune

Summary

Data engineer with expertise in ETL development, Big Data technologies, and Databricks, and excellent knowledge of SQL.

Overview

9 years of professional experience

Work History

Data Engineer

HSBC
Pune
12.2023 - Current
  • Designed and implemented a real-time data streaming pipeline using Apache Kafka and Databricks
  • Read data from Kafka topics, performed data transformations, and provided the processed data as input to an ML model to predict hardware failures
  • Enabled real-time decision-making by integrating the pipeline with machine learning prediction workflows
  • Worked on a data democratization initiative to make data accessible, secure, and reusable across tenants
  • Created reusable data pipelines using Databricks and Delta Lake, providing self-service capabilities for users
  • Designed role-based access controls and governance policies to ensure data privacy and compliance
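The streaming work above can be illustrated with a minimal sketch. This is not the production HSBC pipeline: the message fields, thresholds, and the stub predictor are assumptions, and in production the transform and scoring ran as PySpark jobs on Databricks reading from Kafka topics.

```python
import json

def extract_features(raw_message: bytes) -> dict:
    """Parse one Kafka-style JSON payload into model features.
    Field names are illustrative, not the real telemetry schema."""
    event = json.loads(raw_message)
    return {
        "device_id": event["device_id"],
        "temp_c": float(event["temp_c"]),
        "disk_errors": int(event.get("disk_errors", 0)),
    }

def predict_failure(features: dict) -> bool:
    """Stand-in for the ML model: flag devices running hot with disk errors.
    The real pipeline fed these features to a trained model instead."""
    return features["temp_c"] > 70.0 and features["disk_errors"] > 5

msg = b'{"device_id": "srv-42", "temp_c": 81.5, "disk_errors": 9}'
feats = extract_features(msg)
print(feats["device_id"], predict_failure(feats))  # srv-42 True
```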

Data Engineer

Amdocs Development Center India (DVCI)
Pune
04.2018 - 12.2023
  • Designed and implemented data pipelines using Big Data technologies like Hadoop, Sqoop, Hive, and HDFS
  • Developed ETL processes to efficiently extract, transform, and load data from multiple sources into Hadoop-based data lakes
  • Utilized Python and PySpark to perform complex data transformations, data cleansing, and aggregations
  • Implemented Kafka for real-time data streaming and event-driven architectures, enabling efficient data ingestion and processing
  • Collaborated with data scientists and analysts to provide curated datasets and support data-driven insights and analytics
  • Conducted data validation and verification to ensure data accuracy, consistency, and compliance with business rules
  • Implemented data governance practices and security measures to protect sensitive data and maintain data privacy
  • Designed and developed a custom tool for Informatica workflow comparison, enabling efficient tracking and identification of differences between different workflow versions
  • Created a data validation reconciliation tool using PySpark, enabling seamless validation of data consistency between ORACLE/MYSQL/PG databases and HIVE/HBase in the Big Data ecosystem
  • Developed an extract tool utilizing PySpark to read data from HDFS files based on epoch time, providing a flexible and efficient method for data extraction and loading into target systems
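The reconciliation tool mentioned above can be sketched in miniature. The table names and the count-comparison rule here are illustrative assumptions; the real tool gathered these figures via PySpark and JDBC reads against Oracle/MySQL/PostgreSQL on one side and Hive/HBase on the other.

```python
def reconcile(rdbms_counts: dict, hive_counts: dict) -> dict:
    """Return per-table mismatches between source and target row counts.
    A table missing on either side is reported with a count of None."""
    mismatches = {}
    for table in sorted(set(rdbms_counts) | set(hive_counts)):
        src = rdbms_counts.get(table)
        tgt = hive_counts.get(table)
        if src != tgt:
            mismatches[table] = {"source": src, "target": tgt}
    return mismatches

source = {"orders": 1000, "customers": 250}
target = {"orders": 1000, "customers": 248}
print(reconcile(source, target))
# {'customers': {'source': 250, 'target': 248}}
```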

ETL Developer

Alignbiz Technologies
Bangalore
04.2016 - 05.2018
  • Company overview: contract engagement with IBM for the Idea project
  • Collaborated with cross-functional teams to gather requirements and design ETL processes using Informatica and DataStage
  • Created and optimized SQL queries to extract, transform, and load data from various sources into target data warehouses
  • Implemented shell scripts for job automation, scheduling, and error handling to improve ETL process efficiency
  • Troubleshot and resolved data quality issues, ensuring data integrity and consistency
  • Actively participated in code reviews, ensuring adherence to best practices and quality standards
  • Designed and developed a custom tool for Informatica workflow comparison, enabling efficient tracking and identification of differences between different workflow versions for retrofit purposes
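The job-automation and error-handling bullet can be sketched as a retry wrapper. The real automation was written as shell scripts around Informatica/DataStage jobs; the Python form, command, and retry policy here are assumptions for illustration.

```python
import subprocess
import time

def run_with_retries(cmd: list, attempts: int = 3, delay_s: float = 1.0) -> str:
    """Run an ETL job command, retrying on nonzero exit codes.
    Returns stdout on success; raises after exhausting attempts."""
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        time.sleep(delay_s)
    raise RuntimeError(f"job failed after {attempts} attempts: {cmd}")

# Placeholder command standing in for a real ETL job invocation.
print(run_with_retries(["echo", "load complete"]).strip())  # load complete
```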

Education

Bachelor of Engineering

Skills

  • Python
  • Apache Spark
  • Databricks
  • Hadoop
  • Hive
  • Apache Kafka
  • MySQL
  • HBase
  • Oracle
  • PostgreSQL
  • DataStage
  • Informatica
  • Git
  • Jira
  • Confluence
  • Azure Databricks
  • Delta Lake

Timeline

Data Engineer

HSBC
12.2023 - Current

Data Engineer

Amdocs Development Center India (DVCI)
04.2018 - 12.2023

ETL Developer

Alignbiz Technologies
04.2016 - 05.2018

Bachelor of Engineering

Certification

  • Microsoft Certified: Azure Data Engineer Associate
  • Databricks Certified Data Engineer Professional
  • Databricks Certified Data Engineer Associate