Results-oriented Data Engineer with 2.8 years of experience specializing in PySpark, SQL, and Azure technologies. Skilled in developing efficient frameworks for data ingestion, replication, and extraction using PySpark, and leveraging Azure Function Apps to enhance data processing. Proficient in designing and optimizing data pipelines with Azure Data Factory and Spark Framework, with a strong background in data migration, query optimization, and analysis. A self-driven problem solver, quick learner, and collaborative team player, committed to continuous improvement and delivering innovative solutions in dynamic environments.
Project - Teradata Exit
Key Responsibilities:
- Implement tailored migration strategies, including:
  - Lift-and-Shift: Replicate schemas and processes in Azure.
  - Re-Architect: Optimize workflows for cloud-native architectures.
  - Hybrid Migration: Gradually transition workloads, retaining on-premises systems where needed.
- Migrate schemas using Azure Database Migration Service or custom scripts and manage data transfers via Azure Data Factory.
- Conduct rigorous testing to validate data integrity, completeness, and performance by comparing Teradata and Azure results.
- Refactor application queries to support Azure SQL or Synapse-compatible SQL.
- Rebuild and optimize ETL/ELT pipelines using Azure Data Factory, integrating with Azure Databricks for big data processing.
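As an illustration of the validation step above (comparing Teradata and Azure results for integrity and completeness), the core check can be sketched in plain Python. The table data here is a hypothetical stand-in for actual query results; a real run would fetch rows from both systems:

```python
import hashlib

def table_fingerprint(rows):
    """Row count plus an order-independent checksum of the rows."""
    count = 0
    digest = 0
    for row in rows:
        count += 1
        # Hash each row; XOR keeps the result independent of row order.
        h = hashlib.sha256(repr(row).encode("utf-8")).hexdigest()
        digest ^= int(h, 16)
    return count, digest

def validate_migration(source_rows, target_rows):
    """True only if both sides have identical row counts and contents."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)

# Hypothetical result sets fetched from Teradata and Azure SQL:
teradata_rows = [(1, "alice"), (2, "bob")]
azure_rows = [(2, "bob"), (1, "alice")]  # same data, different order
print(validate_migration(teradata_rows, azure_rows))  # True
```

The order-independent checksum matters because the two engines may return rows in different orders for the same query.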
Project - Synapse Exit
Key Responsibilities:
- Migrate from Synapse to Azure DataCore using lift-and-shift, re-architecting, or hybrid strategies.
- Ensure schema compatibility and manage data transfers with Azure tools.
- Validate data accuracy and refactor queries and ETL pipelines for Azure DataCore.
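The schema-compatibility check described above can be sketched as a simple type-mapping scan. The type mapping below is a hypothetical example; the real mapping depends on the target platform's type system:

```python
# Hypothetical mapping from Synapse SQL column types to target types;
# the actual mapping is platform-specific.
TYPE_MAP = {
    "NVARCHAR": "STRING",
    "DATETIME2": "TIMESTAMP",
    "DECIMAL": "DECIMAL",
    "BIGINT": "BIGINT",
}

def check_schema_compatibility(columns):
    """Return the column names whose types have no mapping on the target side."""
    return [name for name, sql_type in columns if sql_type not in TYPE_MAP]

schema = [("customer_id", "BIGINT"), ("name", "NVARCHAR"), ("payload", "VARBINARY")]
print(check_schema_compatibility(schema))  # ['payload']
```

Columns flagged here would need a conversion rule before the transfer pipeline runs.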
Project - Datacore PaaS Upliftment
Key Responsibilities:
- Design and optimize end-to-end ETL workflows using Azure Data Factory and PySpark, ensuring efficient, secure, and compliant data processing at all stages.
- Build, manage, and enhance ADF pipelines with precise scheduling and integration into Azure DevOps for robust version control.
- Deploy and maintain Synapse SQL resources for stable and scalable production environments.
- Manage data migration activities from Cloudera Kudu to Azure Data Storage, ensuring data integrity and seamless transition.
- Administer PostgreSQL tables to support structured data storage and efficient querying.
- Design and implement secure, compliant frameworks for effective data governance and risk mitigation.
Key Achievements:
- Improved query performance by around 20% for migrated tables, compared to the previous infrastructure.
- Ensured 99.9% uptime for uplifted tables throughout the migration process in Production.
- Reduced infrastructure costs by approximately 10% after migrating 400 tables to the Datacore PaaS framework.
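Migrating 400 tables safely usually means moving them in controlled waves rather than all at once. A minimal sketch of that batching logic, with an assumed wave size of 50 tables and illustrative table names:

```python
def migration_batches(tables, batch_size=50):
    """Split a table inventory into fixed-size migration waves."""
    return [tables[i:i + batch_size] for i in range(0, len(tables), batch_size)]

# Hypothetical inventory of 400 tables to uplift to the Datacore PaaS framework:
inventory = [f"schema.table_{n:03d}" for n in range(400)]
waves = migration_batches(inventory, batch_size=50)
print(len(waves))      # 8 waves
print(len(waves[-1]))  # 50 tables in the final wave
```

Validating each wave before starting the next limits the blast radius of any failure and helps sustain the uptime figure cited above.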
Business Domain: Telecommunications
Operating System: Windows
Big Data Analytics: Apache Hive, Apache Spark, Databricks, Azure Synapse
Cloud Platforms: Microsoft Azure
Languages: Databricks SQL, Python, T-SQL
IDE/Development Tools: PyCharm, Visual Studio
ETL Tools: Azure Data Factory
Databases: Azure Synapse SQL, PostgreSQL, Hive Metastore
Agile Tools: Jira
CI/CD: GitLab, Azure DevOps
With a strong penchant for learning, I am committed to expanding my skills and staying current with the latest technologies.
I affirm that the details mentioned in this resume are accurate and complete to the best of my knowledge.