A seasoned Lead Data Engineer with 9+ years of hands-on experience and a proven track record of delivering results, seeking to apply expertise in data engineering and cloud technologies to drive innovation and deliver impactful solutions.
Overview
9 years of professional experience
5 certifications
Work History
Lead Data Engineer
phData
12.2022 - Current
Created automations and frameworks for the seamless migration and automated testing of thousands of IICS task flows and Teradata BTEQs to Snowflake ETL.
Designed architecture and data flow for modern tech stack in Snowflake.
Presented regular demos and POC enhancements to stakeholders.
Built and optimized a Snowflake, dbt, and Fivetran ecosystem from the ground up.
Developed a governance framework for data masking and RBAC controls on PII/PHI datasets.
Played an integral role in establishing security policies and private links, and in implementing a Project-Administration framework for efficiently managing project resources on Snowflake.
Sr. Data Engineer
phData
01.2021 - 11.2022
Assisted in the migration of several business streams from CDH to Azure.
Auto-generated and validated thousands of metadata files for an internal framework.
Extensively utilized ADF and Databricks for ingestion and processing of discrete datasets from various sources.
Employed Python to design customized graphs for analyzing and migrating Control-M schedules and their dependencies as a part of automation efforts.
Data Engineer
phData
03.2019 - 01.2021
Built end-to-end data pipelines with StreamSets to efficiently consume Kafka messages, perform appropriate transformations, and publish the processed data to destinations such as HDFS and Kudu.
Utilized Deequ, a data quality verification tool, to establish data constraints and metrics.
Utilized Scala-Spark to develop efficient bulk load and transformation jobs.
Implemented and optimized CI/CD processes utilizing Jenkins and Rundeck.
Fine-tuned Hadoop applications to enhance performance and throughput, while assisting in debugging StreamSets and cluster-related issues.
I.T. Analyst
Tata Consultancy Services
10.2015 - 03.2019
Implemented and productionized different machine learning paradigms in Spark.
Collaborated closely with solution architects and data scientists to optimize project efficiency.
Streamlined the operations of complex distributed systems by applying best practices in Spark, Hive, Impala, and MapReduce.
Demonstrated expertise in handling complex data arrangements in HBase and Teradata.
Worked with administration and application teams to troubleshoot and debug Hadoop ecosystem runtime issues.