
Sandeep Sai

Bengaluru

Summary

  • Senior Data Engineer with over 6 years of professional experience, skilled in Python, MySQL, Azure Data Factory, Azure Databricks, Azure Synapse, PySpark, RPA, and Power BI.
  • Builds scalable data pipelines and modern data warehouse solutions to support growing data volume and complexity; collaborates effectively with cross-functional teams to deliver business-driven analytics solutions.
  • Designs, develops, and implements data pipelines in Azure Data Factory, moving and transforming data from sources such as databases, files, APIs, and cloud services, and delivering it to destinations such as data lakes, data warehouses, and databases.
  • Creates and maintains database objects such as tables, views, indexes, triggers, and sequences, applying best practices for database design and optimization to improve performance and scalability.
  • Transforms structured and semi-structured data from various sources using Mapping Data Flow transformations in Azure Data Factory, including Join, Conditional Split, Lookup, Union, Sort, Aggregate, Derived Column, Pivot, Parse, Rank, and Window, to standardize and enrich data for downstream analytics and reporting.
  • Implemented Slowly Changing Dimensions (SCD Types 1, 2, 3, and 4) in Azure Data Factory to capture and load delta changes into the data warehouse, designing strategies for historical and incremental updates that preserve data consistency and integrity across the data lifecycle.
  • Created ADF pipelines using linked services, datasets, and activities to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse.
  • Implemented email notifications using Azure Logic Apps to automate pipeline alerts, ensuring timely communication and efficient monitoring of data workflows.
  • Managed Azure Key Vault services, ensuring secure storage and management of sensitive data in accordance with industry best practices.
  • Led the implementation of CI/CD pipelines for Azure Data Factory (ADF), streamlining deployment and enabling continuous integration and delivery of data pipelines.
  • Implemented incremental pipelines with watermarks, improving data processing efficiency and minimizing resource utilization.
  • Managed the publication, scheduling, and triggering of pipelines on daily and weekly cadences, orchestrating data workflows for timely execution and delivery in line with client business requirements.
  • Scheduled Databricks notebooks from Azure Databricks and Azure Data Factory, optimizing data processing workflows for efficiency and reliability.
  • Used Azure Data Factory for ETL processes, including delta loads and insert-update loads, and automated them to improve productivity and accuracy.
  • Used Azure DevOps to manage Azure Data Factory and Azure Databricks across multiple environments, applying best practices for version control, deployment, and monitoring.
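The SCD handling described above was implemented in ADF Mapping Data Flows; purely as an illustrative sketch of the SCD Type 2 logic (the `customer_id`/`city` schema and the bookkeeping columns here are hypothetical, not from any actual project), the same upsert can be expressed in plain Python:

```python
from datetime import date

def apply_scd2(dimension, incoming, today=date(2024, 1, 1)):
    """Close out changed rows and append new versions (SCD Type 2 sketch).

    dimension: list of dicts with valid_from / valid_to / is_current
    bookkeeping columns; incoming: list of source rows (hypothetical schema).
    """
    # Index the current version of each business key.
    current = {r["customer_id"]: r for r in dimension if r["is_current"]}
    for row in incoming:
        existing = current.get(row["customer_id"])
        if existing is None:
            # Brand-new key: insert as the current version.
            dimension.append({**row, "valid_from": today,
                              "valid_to": None, "is_current": True})
        elif existing["city"] != row["city"]:
            # Tracked attribute changed: expire the old row, add a new one.
            existing["valid_to"] = today
            existing["is_current"] = False
            dimension.append({**row, "valid_from": today,
                              "valid_to": None, "is_current": True})
        # Unchanged rows are left untouched.
    return dimension
```

In ADF this same compare/expire/insert pattern is built from Lookup, Derived Column, and Alter Row transformations rather than hand-written code.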

Overview

6 years of professional experience
1 Certification

Work History

Azure Data Engineer

HCL Technologies
10.2020 - Current

Project Title: Data Integration and Analysis for Apple Green Company's Fast Food Chains.

Project Overview: Apple Green Company operates several fast-food chains including Burger King, KFC, and Pizza Hut across the US and UK. The project aims to collect, integrate, and analyze data from these locations to provide valuable insights to the client for better decision-making and operational efficiency.

Roles & Responsibilities:

  • Design and develop data pipelines to collect and integrate data from Burger King, KFC, and Pizza Hut locations into Azure.
  • Implement data cleaning and transformation processes to ensure data quality and consistency.
  • Work with Azure services to store and manage the collected data securely.
  • Optimize data processing and storage for performance and scalability.
  • Collaborate with stakeholders to understand analytical requirements and translate them into technical solutions.
  • Develop and deploy analytical models and algorithms to analyze the integrated data and extract insights.
  • Create interactive dashboards and reports using Power BI to visualize and communicate findings to stakeholders.
  • Monitor and maintain the data pipelines and analytical solutions to ensure they meet performance and reliability requirements.
  • Provide support and training to end-users on using the analytical solutions effectively.

  • Created complex stored procedures and performance-tuned SQL queries.
  • Developed Azure Data Factory pipelines to move data from staging to the data warehouse using an incremental load process.
  • Used Azure Databricks to perform transformations on the data; ADF pipelines invoke the Databricks jobs.
  • Used PySpark for Databricks jobs, including consuming Parquet files generated by AKS jobs.
  • Designed ADF pipelines using a lift-and-shift approach; designed and developed ADF and SSIS packages to load data from sources into the Azure database.
  • Designed ADF pipelines to move data from six different sources to Azure Data Lake Gen2 and then to the Azure data warehouse.
  • Implemented Copy, Execute Pipeline, Get Metadata, If Condition, Lookup, Set Variable, Filter, and ForEach activities for on-cloud ETL processing.
  • Primarily involved in data migration using SQL, Azure SQL, Azure Data Lake, and Azure Data Factory.
  • Experienced in data warehouse creation, extraction and loading design, testing, and data modeling, ensuring the smooth running of applications.
  • Extracted data from OLTP and OLAP systems to the Data Lake using Azure Data Factory and Databricks.
  • Developed pipelines that extract data from various sources and merge it into single datasets in the Data Lake using Databricks.
  • Created linked services for source and target connectivity based on requirements; pipelines and datasets are then triggered based on the load type (HISTORY or DELTA).
  • Created mount points for the Data Lake and extracted data in different formats such as CSV and Parquet.
  • Created DataFrames and transformed them using PySpark.
  • Loaded data from on-premises sources to the Data Lake and Azure SQL tables using SSIS and Azure Data Factory.
  • Extracted data from CSV, Excel, and SQL Server sources into staging tables dynamically using ADF pipelines.
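The HISTORY/DELTA loads above were orchestrated in ADF using a watermark; a minimal Python sketch of the watermark pattern, assuming a hypothetical `modified_at` column on the source rows:

```python
def incremental_load(source_rows, watermark):
    """Pull only rows changed after the last watermark, then advance it.

    source_rows: list of dicts with a 'modified_at' value (hypothetical
    schema); watermark: the high-water mark recorded by the previous run.
    """
    # DELTA = everything newer than the last successful run's watermark.
    delta = [r for r in source_rows if r["modified_at"] > watermark]
    # Advance the watermark to the newest row we just loaded;
    # keep the old one if nothing changed.
    new_watermark = max((r["modified_at"] for r in delta), default=watermark)
    return delta, new_watermark
```

Running with an initial watermark of zero (or the epoch) yields the full HISTORY load; subsequent runs pick up only the delta, which is what keeps resource utilization low.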

Data Engineer

Accenture
09.2018 - 08.2020

Software Engineer

EXL Services
01.2018 - 09.2018
  • Leveraged text, charts and graphs to communicate findings in understandable format.
  • Analyzed large amounts of data to identify trends and find patterns, signals and hidden stories within data.
  • Assessed large datasets, drew valid inferences and prepared insights in narrative or visual forms.
  • Identified, reviewed and evaluated data management metrics to recommend ways to strengthen data across enterprise.
  • Led recruitment and development of strategic alliances to maximize utilization of existing talent and capabilities.
  • Aggregated and cleaned data from TransUnion on thousands of customers' credit attributes.
  • Performed missing-value imputation using the population median; checked population distributions for numerical and categorical variables to screen outliers and ensure data quality.
  • Used a binning algorithm to calculate the information value of each attribute, evaluating its separation strength for the target variable.
  • Checked variable multicollinearity by calculating VIF across predictors.
  • Built a logistic regression model to predict the probability of default; used stepwise selection to choose model variables.
  • Tested multiple models by switching variables and selected the best model using performance metrics including KS, ROC, and Somers' D.
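The binning/information-value step above can be sketched in plain Python; the good/bad bin counts here are hypothetical, and this is only an illustration of the WOE/IV arithmetic, not the production implementation:

```python
import math

def information_value(bins):
    """Compute information value (IV) from pre-binned good/bad counts.

    bins: list of (goods, bads) tuples per bin (hypothetical counts).
    For each bin: WOE = ln(good% / bad%); IV = sum((good% - bad%) * WOE).
    """
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    iv = 0.0
    for goods, bads in bins:
        good_pct = goods / total_good
        bad_pct = bads / total_bad
        woe = math.log(good_pct / bad_pct)  # weight of evidence for this bin
        iv += (good_pct - bad_pct) * woe
    return iv
```

An attribute whose bins separate goods from bads strongly gets a high IV (a common rule of thumb treats IV above roughly 0.3 as strong), while an attribute with identical good/bad distributions gets an IV of zero.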

Education

Master of Engineering

Osmania University
Hyderabad, India

B.Tech

JNTU
Hyderabad, India

Skills

  • Programming languages: SQL, Python
  • Databases and Azure cloud tools: Microsoft SQL Server, Azure Data Lake, Azure Blob Storage Gen2, Azure Synapse, Azure Event Hubs, Azure Data Factory, Azure Databricks, Azure Monitor, Azure Functions, Azure Logic Apps
  • Frameworks: Spark (Structured Streaming, SQL), Kafka Streams
  • Databases: Azure SQL Database, Azure SQL Data Warehouse, Microsoft SQL Server
  • Visualization: Power BI
  • Cloud platforms: Azure
  • 4 years of data engineering experience in the cloud
  • Strong in Azure services, including Azure Databricks (ADB) and Azure Data Factory (ADF)
  • Experienced in building real-time streaming analytics data pipelines; confident in connecting Event Hubs to Stream Analytics
  • Team-oriented; experienced with GitHub for version control
  • Skilled in data mining using Databricks notebooks, NumPy, and Pandas; data visualization using Power BI
  • Writing complex regular expressions to find patterns in data
  • Experienced in Agile methodology
  • Eager to take on challenges and curious to learn new things
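As a small illustration of the pattern-matching skill above (the log line format here is hypothetical, not from an actual project):

```python
import re

# Hypothetical example: pull the pipeline name, status, and duration
# out of ADF-style log lines such as "Pipeline [CopySales] succeeded in 42s".
PATTERN = re.compile(
    r"Pipeline \[(?P<name>\w+)\] (?P<status>\w+) in (?P<secs>\d+)s"
)

def parse_run(line):
    """Return the named groups as a dict, or None if the line doesn't match."""
    m = PATTERN.search(line)
    return m.groupdict() if m else None
```

Named groups (`(?P<name>...)`) keep the extraction readable when patterns grow more complex.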

Certification

DP-900 Azure Data Fundamentals

DP-203 Azure Data Engineer Associate

Timeline

Azure Data Engineer

HCL Technologies
10.2020 - Current

Data Engineer

Accenture
09.2018 - 08.2020

Software Engineer

EXL Services
01.2018 - 09.2018

Master of Engineering

Osmania University

B.Tech

JNTU