Over 6.8 years of IT experience in data-driven application design and development, with 5 years of relevant hands-on experience across various Azure data services.
Proficient in Azure technologies such as Azure Data Factory (ADF), Azure Databricks (ADB), Azure Synapse Analytics, Azure Active Directory, Azure Storage, Azure Data Lake Storage (ADLS), Azure Key Vault, Azure SQL DB, and Azure HDInsight.
Good hands-on experience with Azure DevOps (ADO) services such as Repos, Boards, and Build Pipelines (CI/CD), and with Ansible (YAML scripting) for resource orchestration and code deployment.
Hands-on experience developing data engineering frameworks and notebooks in Azure Databricks using Spark SQL, Scala, and PySpark.
Experience with Apache Hadoop components such as HDFS, MapReduce, and Hive.
Experience in Microsoft Azure Cloud with Data Factory, Linked Services, HDInsight clusters, Data Lake Gen2, and Databricks.
Good knowledge of Azure Synapse Analytics.
Proficient in big data ingestion tools such as Kafka, Spark Streaming, and Sqoop for streaming and batch data ingestion.
Worked with big data distributions such as Hortonworks 2.1 with Ambari.
Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
Hands-on experience with development tools such as Eclipse, NetBeans, and Maven.
Worked on Tableau to generate reports.
Overview
8 years of professional experience
Work History
Senior Data Engineer
PureSoftware Technologies Pvt Ltd
03.2023 - Current
Azure Data Developer
ACERT IT SOLUTIONS PRIVATE LIMITED
01.2017 - 03.2023
Education
B.Tech - ECE
National Institute of Technology, Warangal (NITW)
Skills
Technical Skills:
Languages
Python, SQL, PySpark
Technologies
Azure, Azure Functions, Azure Data Factory, Data Flows
The goal of the Data Science Technical Delivery Platform ingestion process is to efficiently connect to and ingest data from both on-premises and Azure/external systems of record. This data is captured in its original format and landed in the Enterprise Azure Data Lake (ADLS Gen2). The ingestion process generates lifecycle events and captures data provenance information, which is then sent to the Data Science Technical Delivery Platform Orchestrator for further processing via event-based integration.
Roles and Responsibilities
Pipeline Development
Azure Data Factory (ADF): Designed and implemented data pipelines to extract data from various sources and load it into Azure Synapse Analytics
Data Transformation: Utilized PySpark for data transformation tasks and pushed the processed data into Azure Data Lake Storage (ADLS)
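An illustrative sketch of the kind of PySpark transformation and ADLS Gen2 write described above; the storage account, container, and column names are placeholders, not the project's actual objects.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("adls-transform").getOrCreate()

# Read raw data landed in the lake (path is an illustrative placeholder)
raw = spark.read.parquet("abfss://raw@examplestore.dfs.core.windows.net/sales/")

# Typical cleanup: de-duplicate, standardise types, derive a load date
curated = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_amount", F.col("order_amount").cast("decimal(18,2)"))
       .withColumn("load_date", F.current_date())
)

# Write curated output back to ADLS Gen2, partitioned by load date
(curated.write
        .mode("overwrite")
        .partitionBy("load_date")
        .parquet("abfss://curated@examplestore.dfs.core.windows.net/sales/"))
```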
Infrastructure and Deployment
Azure DevOps: Set up infrastructure, built, and deployed applications using Azure DevOps
Incremental Load Strategy: Implemented strategies for daily incremental data loads to ensure efficient and timely updates
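A minimal sketch of a watermark-based daily incremental load, assuming a control table that records the last loaded timestamp; all database, table, and column names here are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("incremental-load").getOrCreate()

# Last successfully loaded timestamp, kept in a control table (illustrative)
last_watermark = (
    spark.read.table("control.load_watermarks")
         .filter(F.col("source_table") == "sales_orders")
         .agg(F.max("watermark_value"))
         .collect()[0][0]
)

# Pull only rows changed since the previous run
incremental = (
    spark.read.table("staging.sales_orders")
         .filter(F.col("modified_date") > F.lit(last_watermark))
)

# Append the new slice to the curated table
incremental.write.mode("append").saveAsTable("curated.sales_orders")

# Advance the watermark for the next scheduled run (write-back omitted here)
new_watermark = incremental.agg(F.max("modified_date")).collect()[0][0]
```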
Testing and Releases
Release Management: Managed deployment releases, including unit and integration testing, to ensure quality and functionality
Data Factory Management
Linked Services and Datasets: Created and managed linked services, datasets, and pipelines within Azure Data Factory
Stored Procedures: Developed and optimized stored procedures using T-SQL
Activities: Configured and managed copy activities, lookup activities, and metadata activities in ADF
Monitoring: Monitored pipelines, identified issues, and implemented fixes as necessary
Data Transformation and Workflow
Data Flows: Designed and implemented data flows for transforming and moving data to Azure using Azure Data Factory
End-to-End Framework: Developed a comprehensive project framework, ensuring timely delivery and alignment with customer requirements
Key Achievements
Efficient Data Integration: Streamlined the ingestion process from multiple data sources into Azure, ensuring data integrity and availability
Robust Transformation Processes: Leveraged PySpark for scalable data transformations, enhancing data processing capabilities
Effective Deployment: Successfully managed application deployment and testing, ensuring reliable and smooth operation
Proactive Problem-Solving: Demonstrated proactive problem-solving skills by addressing issues promptly and meeting project deadlines
Project 2:
Project Name: Data Lake Data Engineering
Client: Communication and Media
Environment: Azure Data Factory, Azure Databricks, Azure SQL DB, ADLS Gen2
Role: Sr. Azure Data Engineer
The Data Lake Technology Platform represents a modern technological foundation within a secure, hosted ecosystem. This platform integrates client data with industry-specific data feeds, leveraging Media's unique capabilities in data analytics and advanced AI. It aims to deliver enhanced opportunities throughout the customer lifecycle.
Roles and Responsibilities
ETL Workflow Management
Azure ADF & Databricks: Developed and managed ETL workflows using Azure Data Factory (ADF) and Databricks with PySpark, extracting data from relational databases and loading it into Azure SQL Database
Data Transformation: Extensively transformed data using PySpark and pushed the processed data into Azure Data Lake Storage (ADLS) Gen2
Data Storage: Stored transformed data into Azure SQL Database for consumption by Power BI and Spotfire
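A hedged sketch of the Databricks-side load into Azure SQL Database via JDBC, as described above; the server, database, table, and secret-scope names are placeholders, and dbutils is only available inside Databricks notebooks.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("load-to-azure-sql").getOrCreate()

# Transformed output produced earlier in the notebook (path is a placeholder)
transformed = spark.read.parquet(
    "abfss://curated@examplestore.dfs.core.windows.net/customers/"
)

# Connection details would normally come from Key Vault via a Databricks secret scope
jdbc_url = (
    "jdbc:sqlserver://example-server.database.windows.net:1433;database=exampledb"
)
connection_props = {
    "user": dbutils.secrets.get("kv-scope", "sql-user"),        # placeholder scope/keys
    "password": dbutils.secrets.get("kv-scope", "sql-password"),
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

# Load into Azure SQL Database for Power BI / Spotfire to consume
(transformed.write
            .mode("overwrite")
            .jdbc(url=jdbc_url, table="dbo.customers", properties=connection_props))
```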
Data Migration
On-Prem to Azure: Led data migration projects from on-premises systems to Azure Cloud using Databricks and Spark APIs, ensuring seamless transition and data integrity
Daily Operations
Scrum Participation: Attended daily scrum meetings and provided updates on Azure DevOps (ADO) user stories
Pipeline Monitoring: Monitored pipeline jobs for performance and reliability, promptly addressing and fixing any issues that arose
Additional Information
SCD Type 2: Apply Slowly Changing Dimension (SCD) Type 2 techniques to track and preserve historical changes in data, maintaining a complete history.
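One common way to implement SCD Type 2 on Databricks is a Delta Lake MERGE that closes out changed rows, followed by an insert of the new versions. The sketch below assumes a dimension keyed on customer_id with is_current/start_date/end_date columns and a precomputed customer_hash for change detection; all names are illustrative.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("scd2-merge").getOrCreate()

updates = spark.read.table("staging.customer_updates")    # incoming changes (placeholder)
dim = DeltaTable.forName(spark, "curated.dim_customer")   # existing SCD2 dimension

# Step 1: close out current rows whose tracked attributes have changed
(dim.alias("d")
    .merge(updates.alias("u"),
           "d.customer_id = u.customer_id AND d.is_current = true")
    .whenMatchedUpdate(
        condition="d.customer_hash <> u.customer_hash",   # hash column assumed for change detection
        set={"is_current": "false", "end_date": "current_date()"},
    )
    .execute())

# Step 2: append the new versions as current rows
# (a fuller version would insert only new or changed keys)
new_rows = (
    updates.withColumn("start_date", F.current_date())
           .withColumn("end_date", F.lit(None).cast("date"))
           .withColumn("is_current", F.lit(True))
)
new_rows.write.mode("append").saveAsTable("curated.dim_customer")
```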
Incremental Load and Delta Processing
Delta Pipeline: Create and manage a delta pipeline for handling incremental loads. Set up jobs to run at hourly or daily intervals to keep the data current.
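A minimal Delta Lake upsert sketch for such an incremental run; table and key names are assumptions. A job like this can be scheduled hourly or daily from Databricks Jobs or an ADF trigger.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-incremental").getOrCreate()

# Incoming batch for this run (path is an illustrative placeholder)
batch = spark.read.parquet("abfss://raw@examplestore.dfs.core.windows.net/orders/latest/")

target = DeltaTable.forName(spark, "curated.orders")

# Upsert the batch: update existing keys, insert new ones
(target.alias("t")
       .merge(batch.alias("b"), "t.order_id = b.order_id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())
```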
Data Quality and Governance
Quality Rules: Define and enforce data quality rules to ensure data accuracy and integrity throughout the pipeline.
Unity Catalog & Lineage: Utilize Unity Catalog for data governance and lineage graphs to monitor data flow, transformations, and quality.
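A simple rule-based quality gate of the kind described above, sketched in PySpark; the rule names, columns, and failure behaviour are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

df = spark.read.table("curated.orders")   # table name is a placeholder

# Declarative rules: each maps a rule name to its violation condition
rules = {
    "order_id_not_null": F.col("order_id").isNull(),
    "amount_non_negative": F.col("order_amount") < 0,
    "order_date_not_in_future": F.col("order_date") > F.current_date(),
}

# Count violations per rule and fail the run if any rule is breached
violations = {name: df.filter(cond).count() for name, cond in rules.items()}
failed = {name: n for name, n in violations.items() if n > 0}

if failed:
    raise ValueError(f"Data quality checks failed: {failed}")
```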
Best Practices
Documentation: Keep comprehensive documentation of queries, encryption rules, and historical tracking processes.
Testing: Implement robust testing strategies for queries and data transformations to ensure performance and accuracy.
Monitoring & Alerts: Set up real-time monitoring and alerts for pipeline jobs to quickly address any issues.
Version Control: Use version control systems to manage and track changes to queries and rules.
Compliance: Regularly review and update encryption practices to meet evolving compliance standards.