Data Engineer with expertise in designing, developing, and optimizing data pipelines, ensuring efficient data flow and high-quality insights. Proficient in SQL, Python, cloud platforms, and big data technologies to support data-driven decision-making.
Overview
6 years of professional experience
1 certification
Work History
Data Engineer (HP)
Mphasis
Bengaluru
08.2023 - Current
Built and maintained ETL/ELT pipelines in AWS Databricks, ingesting structured and semi-structured data from S3, SQL/Oracle to Delta tables, Redshift, and Unity Catalog.
Developed common, reusable code to extract data from various sources.
Developed incremental data processing in Databricks using merge, partitioning, and optimized file formats (Delta) to improve performance and reduce costs.
Tuned Spark configurations (shuffle partitions, caching, and auto-scaling clusters) for optimized job execution in AWS Databricks.
Set up real-time monitoring using Splunk dashboards to track data pipeline failures and performance issues.
Debugged and resolved S3 permission issues, job failures, and Databricks notebook errors, ensuring seamless data processing.
Optimized long-running Spark jobs by tuning shuffle partitions, broadcast joins, and caching strategies, reducing execution time and resource consumption.
Collaborated with source teams to handle late-arriving data and schema evolution, ensuring smooth data ingestion.
Migrated Redshift data to Unity Catalog, deprecating legacy Redshift-based access controls and implementing fine-grained permissions using Unity Catalog for enhanced security and governance.
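The merge-driven incremental loads above rely on upsert semantics: matched records are updated, new records are inserted, and everything else is left untouched. A minimal sketch of that logic, with plain Python standing in for Delta Lake's MERGE and purely hypothetical table data:

```python
def upsert(target: dict, updates: dict) -> dict:
    """Toy illustration of MERGE (upsert) semantics: keys matched in the
    incremental batch are updated, unmatched keys are inserted, and all
    other target rows pass through unchanged."""
    merged = dict(target)   # copy so the original "table" is not mutated
    merged.update(updates)  # matched keys overwritten, new keys inserted
    return merged

# Hypothetical target table keyed by order_id.
target = {1: {"status": "open"}, 2: {"status": "open"}}
# Incremental batch: update order 2, insert order 3.
updates = {2: {"status": "closed"}, 3: {"status": "open"}}

result = upsert(target, updates)
```

In Databricks this corresponds to `DeltaTable.merge(...).whenMatchedUpdateAll().whenNotMatchedInsertAll()`, which avoids rewriting unchanged data on each run.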
Data Engineer
Cognizant Technology Solutions
Bengaluru
03.2022 - 05.2023
Worked on migrating data from on-premises SQL Server to Azure cloud databases.
Created ADF pipelines to extract data from on-premises source systems to Azure Data Lake Storage.
Worked extensively with Copy activities, implementing copy behaviours such as flatten hierarchy, preserve hierarchy, and merge hierarchy; implemented error handling through the Copy activity.
Developed Spark notebooks to transform and partition data and organize files in ADLS; used Azure Databricks to run PySpark notebooks through ADF pipelines.
Created linked services for multiple source systems, including Azure SQL Server, ADLS, and Blob Storage.
Configured Logic Apps to send email notifications to end users and key stakeholders via the Web activity.
Created a dynamic pipeline to handle extraction from multiple sources to multiple targets; used Azure Key Vault extensively to configure connections in linked services.
Developed Databricks notebooks to perform data cleaning and transformation on various tables using Spark SQL and PySpark.
Developed and maintained CI/CD pipelines for Azure Data Factory (ADF) using Azure DevOps and Git.
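The cleaning-and-transformation notebooks described above typically enforce a few simple rules per table: reject rows with a null key, cast fields to their expected types, and trim strings. A toy illustration of that logic in plain Python (the real work used PySpark/Spark SQL; all field names here are hypothetical):

```python
def clean_rows(rows):
    """Sketch of typical cleaning rules: drop rows with a null key,
    cast amounts to float with a default, and trim string fields."""
    cleaned = []
    for row in rows:
        if row.get("id") is None:
            continue  # null-key rows are rejected
        cleaned.append({
            "id": int(row["id"]),
            "amount": float(row.get("amount") or 0.0),
            "name": (row.get("name") or "").strip(),
        })
    return cleaned

raw = [
    {"id": "1", "amount": "10.5", "name": " Alice "},
    {"id": None, "amount": "3.0", "name": "Bob"},   # dropped: null key
    {"id": "2", "amount": None, "name": None},      # defaults applied
]
result = clean_rows(raw)
```

In PySpark the same rules map onto `dropna`, `cast`, and `trim` over DataFrame columns.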
Data Engineer
Syren Technologies
Bengaluru
06.2021 - 02.2022
Worked on data ingestion from Excel files and emails into Azure Blob Storage and processed the data using Azure Databricks.
Developed Spark notebooks in Databricks to clean, transform, and partition the data before loading it into the final SQL database in Azure.
Ensured schema validation and data consistency by performing data profiling activities, including checking data types, null values, and anomalies before final loading.
Optimized data transformation logic by implementing efficient joins, caching strategies, and partitioning techniques in Databricks to improve performance.
Monitored and troubleshot Databricks job failures, ensuring smooth data processing and resolving schema-related issues during ingestion and transformation.
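The profiling step mentioned above (checking data types, null values, and anomalies before final loading) amounts to scanning each column against an expected schema and counting violations. A minimal sketch, with an illustrative schema and column names that are not from the original:

```python
def profile(rows, schema):
    """Count nulls and type mismatches per column against an expected
    schema, so bad batches can be flagged before the final load."""
    report = {col: {"nulls": 0, "type_errors": 0} for col in schema}
    for row in rows:
        for col, expected_type in schema.items():
            value = row.get(col)
            if value is None:
                report[col]["nulls"] += 1
            elif not isinstance(value, expected_type):
                report[col]["type_errors"] += 1
    return report

# Hypothetical batch: one null score, one grade with the wrong type.
rows = [{"score": 91, "grade": "A"}, {"score": None, "grade": 7}]
report = profile(rows, {"score": int, "grade": str})
```

A batch whose report exceeds agreed thresholds can then be rejected or routed to a quarantine location instead of the final SQL database.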
Analyst
Embibe
08.2019 - 06.2021
Gathered data from various sources, such as student information systems, learning management systems, and external data providers, ensuring data integrity and accuracy.
Developed and maintained data ingestion processes to upload raw data from different sources to Azure Blob storage, ensuring timely and reliable data availability.
Performed data mapping from raw data to master data structures, ensuring consistent and standardized data across different systems and databases.
Implemented data cleansing techniques to identify and rectify data quality issues, including data validation, outlier detection, and data transformation.
Conducted data aggregations and summarizations to generate meaningful insights and reports for key stakeholders, supporting data-driven decision-making processes.
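The aggregations and summarizations above follow a standard group-by pattern: bucket records by a key and reduce each bucket to a summary statistic. A toy group-by-average in plain Python (field names are hypothetical):

```python
from collections import defaultdict

def summarize(records, key, value):
    """Group records by `key` and return the average of `value`
    per group, as in a simple reporting aggregation."""
    sums, counts = defaultdict(float), defaultdict(int)
    for rec in records:
        sums[rec[key]] += rec[value]
        counts[rec[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}

records = [
    {"course": "math", "score": 80},
    {"course": "math", "score": 90},
    {"course": "physics", "score": 70},
]
summary = summarize(records, "course", "score")
```

At scale the same shape is expressed as a SQL `GROUP BY` or a Spark `groupBy().agg()`.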