§ 5+ years of experience as an Azure Data Engineer, focused on building robust data-intensive applications and skilled in tackling significant architectural and scalability challenges.
§ Proficient in constructing data pipelines via Azure Data Factory and Azure Databricks to load data into Azure Data Lake, Azure SQL Database, and Azure SQL Data Warehouse, with a strong focus on controlling and granting database access.
§ Expertise in Microsoft SQL Server and a broad range of Azure PaaS components, including Azure Data Factory, Azure Databricks, Azure Logic Apps, Azure Data Lake Analytics (U-SQL), Azure App Services, Geo-Replication, Azure Application Insights, and Azure SQL Data Warehouse.
§ Developed Spark applications using Spark SQL in Databricks to extract, transform, and aggregate data from diverse file formats, enabling analysis and insights into customer usage patterns (see the first sketch after this summary).
§ Hands-on experience utilizing a comprehensive suite of Hadoop tools, including HDFS, Hive, Apache Spark, Apache Sqoop, Flume, Oozie, Apache Kafka, Apache Storm, YARN, Impala, Zookeeper, and Hue (Hadoop User Experience).
§ Deep understanding and practical experience with Azure Data Factory V2, designing and implementing end-to-end data integration and processing solutions with diverse sources, pipelines, parameters, activities, and various scheduling methods (manual, window-based, and event-based).
§ Experienced in Azure transformation projects and Azure architecture decision-making. Architected and implemented ETL and data movement solutions using Azure Data Factory (ADF).
§ Experienced in implementing ETL and ELT solutions for large data sets.
§ Extracted, transformed, and loaded data from source systems into Azure Data Storage services using Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics.
§ Proficient in Database Design and BI Development, leveraging SQL Server, SSIS, DTS Packages, SSAS, DAX, OLAP Cubes, Star Schema, and Snowflake Schema.
§ Designed and deployed Power BI reports and dashboards by connecting to Azure Data Lake Gen2, Azure SQL Database, and Databricks to provide actionable insights for business users.
§ Expertise with Power BI for developing interactive dashboards, KPIs, and self-service BI solutions by connecting to Azure Data Lake, Azure SQL Database, and Databricks.
§ Ensured secure and role-based access to sensitive datasets by implementing Row-Level Security (RLS) and comprehensive workspace governance within Power BI Service.
§ Automated Power BI dataset refresh schedules and integrated with Azure Data Factory for near real-time reporting across financial and operational domains.
§ Experienced in writing complex DAX queries, creating calculated measures, managing data models, and publishing reports to Power BI Service.
§ Hands-on expertise in Hive data partitioning and bucketing (see the second sketch after this summary), with experience developing MapReduce jobs to automate data ingestion from HBase.
§ Solid understanding of Spark Architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, Driver Node, Worker Node, Stages, Executors, and Tasks.
§ Designed and implemented robust ADF data pipelines, proficiently using activities like Get Metadata, Lookup, For Each, Wait, Execute Pipeline, Set Variable, Filter, and Until to ensure efficient data integration.
§ Extensively used Spark and Scala APIs to compare Spark's performance with Hive and SQL, and leveraged Spark SQL to manipulate DataFrames in Scala.
§ Extensively worked with Teradata utilities (FastExport and MultiLoad) to manage large-scale data transfers across flat files and various source systems.
§ Knowledge of the Amazon Web Services (AWS) Cloud Platform, encompassing services such as EC2, S3, VPC, ELB, IAM, DynamoDB, CloudFront, CloudWatch, Route 53, Elastic Beanstalk, Auto Scaling, Security Groups, EC2 Container Service (ECS), CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Redshift, CloudFormation, CloudTrail, OpsWorks, Kinesis, SQS, SNS, and SES.
§ Good knowledge of Data Marts, OLAP, and Dimensional Data Modeling with Ralph Kimball Methodology (Star Schema and Snowflake Schema Modeling for FACT and Dimension Tables) using Analysis Services.
§ Developed Python-based regression and automation scripts to validate ETL processes and data integrity across Oracle, SQL Server, Hive, and MongoDB (see the third sketch after this summary).
§ Solid understanding of Big Data Hadoop and YARN architecture, along with various Hadoop daemons such as JobTracker, TaskTracker, NameNode, DataNode, Resource/Cluster Manager, and Kafka (distributed stream processing).
§ Experienced in creating, managing, and maintaining CI/CD pipelines to drive efficient and reliable data engineering workflows. Implemented CI/CD best practices, security scanning/monitoring, and pipeline integration.
§ Strong understanding of Data Modeling and ETL processes in Data Warehouse environments, including Star Schema and Snowflake Schema.
§ Involved in all phases of the Software Development Life Cycle (SDLC), from Requirements Analysis to Support, and adept at agile methodologies.
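A minimal PySpark sketch of the multi-format extract-and-aggregate pattern described above. The paths, the usage schema, and column names (customer_id, bytes_used) are hypothetical placeholders, not production values.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

# Hypothetical landing paths; real pipelines read from ADLS-mounted storage.
json_df = spark.read.json("/mnt/raw/usage/json/")
parquet_df = spark.read.parquet("/mnt/raw/usage/parquet/")

# Align both sources on a common projection before combining them.
usage = json_df.select("customer_id", "event_ts", "bytes_used").unionByName(
    parquet_df.select("customer_id", "event_ts", "bytes_used"))

# Aggregate daily usage per customer for downstream analysis.
daily_usage = (usage
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("customer_id", "event_date")
    .agg(F.sum("bytes_used").alias("total_bytes")))

daily_usage.write.mode("overwrite").parquet("/mnt/curated/daily_usage/")
```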
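A sketch of the partitioning and bucketing approach, shown here with Spark's native bucketBy writer rather than hand-written Hive DDL. The database, table, and column names are assumptions for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

txns = spark.read.parquet("/mnt/staging/transactions/")  # placeholder path

# Partition by date for partition pruning; bucket by account for faster
# joins and scans on the account key. bucketBy requires saveAsTable.
(txns.write
    .partitionBy("txn_date")
    .bucketBy(32, "account_id")
    .sortBy("account_id")
    .format("parquet")
    .mode("overwrite")
    .saveAsTable("curated.txn_history"))  # hypothetical database.table
```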
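A simplified sketch of the kind of ETL validation check described above, assuming ODBC connectivity via pyodbc; connection strings and table names are placeholders supplied by the caller.

```python
import pyodbc

def table_count(conn_str: str, table: str) -> int:
    """Return the row count of a table over an ODBC connection."""
    conn = pyodbc.connect(conn_str)
    try:
        # Table names come from an internal, trusted config in this sketch.
        return conn.cursor().execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    finally:
        conn.close()

def validate_table(src_conn: str, tgt_conn: str, table: str) -> bool:
    """Reconcile source vs. target row counts and report the result."""
    src, tgt = table_count(src_conn, table), table_count(tgt_conn, table)
    status = "OK" if src == tgt else "MISMATCH"
    print(f"{status} {table}: source={src}, target={tgt}")
    return src == tgt
```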
Description: Worked on end-to-end data engineering projects, developing reliable data pipelines and reporting solutions. Successfully migrated data from legacy systems to modern platforms, improved data quality and security, and created impactful dashboards for enhanced decision-making. Also supported automation and performance tuning, and collaborated effectively with teams in agile environments.
Responsibilities:
§ Developed scalable data engineering solutions in Azure utilizing services like Azure Data Factory (ADF), Data Lake Analytics, HDInsight, Azure Synapse, and Azure Databricks for both batch and streaming data pipelines.
§ Designed and implemented sophisticated ETL/ELT solutions with ADF v2, orchestrating data integration from both on-premises and cloud endpoints (Blob Storage, Azure SQL, Oracle, REST APIs) using a comprehensive suite of activities and Databricks notebooks.
§ Designed and maintained ADF Linked Services, Datasets, and Data Flows for efficient ETL operations, moving data into Azure Data Lake Storage Gen2, Azure SQL, and Synapse Analytics.
§ Automated data ingestion and transformation processes in ADF through the strategic use of Schedule, Tumbling Window, and Event-Based Triggers.
§ Developed PySpark/Scala/Spark SQL notebooks in Azure Databricks for the cleansing, transformation, and enrichment of large-scale datasets stored in ADLS Gen2 (see the cleansing sketch after this list).
§ Implemented robust real-time data pipelines with Azure Stream Analytics, Event Hub, and Service Bus, delivering processed data to Power BI and SQL Data Warehouse for dynamic insights.
§ Designed and implemented Slowly Changing Dimension (SCD) Type 1 & Type 2 data integration frameworks within Azure Data Lake using U-SQL, PySpark, and Databricks workflows (see the SCD Type 2 sketch after this list).
§ Performed data cleansing, feature engineering, and missing value imputation on structured and semi-structured datasets using Python and Spark (see the imputation sketch after this list).
§ Successfully migrated legacy ETL workloads to Azure by developing SSIS packages and integrating them through the ADF SSIS Integration Runtime (SSIS IR).
§ Developed and optimized Azure Synapse views, stored procedures, triggers, and partitioned tables to enhance performance and query efficiency.
§ Implemented end-to-end observability using Azure Monitor, Log Analytics, and Spark UI to track pipeline health, performance, and data quality.
§ Developed interactive Power BI dashboards for banking executives and operations teams, enabling real-time monitoring of loan disbursements, customer onboarding funnels, account activity trends, and compliance SLAs using data from Azure SQL, Synapse, and ADLS.
§ Designed and deployed Row-Level Security (RLS) in Power BI to enforce strict data access policies for various banking roles, such as branch managers, financial advisors, and audit teams, directly contributing to enhanced data confidentiality and adherence to regulatory standards.
§ Created DAX measures, calculated columns, and custom visuals in Power BI Desktop to address complex business logic and data modeling needs.
§ Automated Power BI dataset refreshes via Dataflows and Power BI Service, integrating real-time data from Azure Event Hubs to monitor banking transactions, fraud detection alerts, and customer interactions across digital channels.
§ Utilized Power BI Embedded to integrate Power BI reports directly into SharePoint and custom applications, ensuring seamless data access for various business units.
§ Collaborated with business teams to gather visualization requirements and delivered self-service BI solutions using Power BI, Excel Power Query, and SSRS.
§ Successfully transitioned on-premises data pipelines to Azure, implementing solutions with ARM Templates, ADF, and Databricks.
§ Developed and optimized Spark jobs for efficient batch processing of large datasets in various formats, including JSON, CSV, ORC, and Parquet.
§ Optimized data storage and query performance by designing effective partitioning and bucketing strategies in Hive and Spark.
§ Maintained data governance by integrating solutions with Azure Key Vault, RBAC, and managed identities.
§ Actively participated in Agile/Scrum sprints, contributing to backlog grooming and effort estimation for new data platform enhancements.
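A condensed sketch of the Databricks cleanse-and-enrich step referenced above. It assumes the spark session provided by the notebook runtime; the ADLS Gen2 containers, account name, and column names are hypothetical.

```python
from pyspark.sql import functions as F

# Hypothetical ADLS Gen2 paths; credentials are assumed to be configured.
raw = spark.read.parquet("abfss://raw@datalake.dfs.core.windows.net/accounts/")

# Deduplicate, normalize, and drop records missing the business key.
clean = (raw
    .dropDuplicates(["account_id"])
    .withColumn("email", F.lower(F.trim("email")))
    .filter(F.col("account_id").isNotNull()))

# Enrich with a small branch reference table via a broadcast join.
branches = spark.read.parquet("abfss://ref@datalake.dfs.core.windows.net/branches/")
enriched = clean.join(F.broadcast(branches), "branch_id", "left")

enriched.write.mode("overwrite").parquet(
    "abfss://curated@datalake.dfs.core.windows.net/accounts/")
```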
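A sketch of the SCD Type 2 pattern as a two-step Delta Lake expire-and-append. The dimension name (dim_customer), tracked attribute (address), and staging path are assumptions, not the project's actual objects.

```python
from delta.tables import DeltaTable
from pyspark.sql import functions as F

dim = DeltaTable.forName(spark, "dim_customer")
updates = spark.read.parquet("/mnt/staging/customer_updates/")

# Keep only new keys or rows whose tracked attribute changed
# (null-unsafe compare kept simple for this sketch).
current = dim.toDF().filter("is_current = true")
changed = (updates.alias("s")
    .join(current.alias("t"),
          F.col("s.customer_id") == F.col("t.customer_id"), "left")
    .filter(F.col("t.customer_id").isNull() |
            (F.col("s.address") != F.col("t.address")))
    .select("s.*"))
changed.persist()
changed.count()  # materialize before the dimension table is modified

# Step 1: expire the existing current row for each changed key.
(dim.alias("t")
    .merge(changed.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(set={"is_current": "false",
                            "end_date": "current_date()"})
    .execute())

# Step 2: append the new current versions with fresh effective dates.
(changed
    .withColumn("is_current", F.lit(True))
    .withColumn("start_date", F.current_date())
    .withColumn("end_date", F.lit(None).cast("date"))
    .write.format("delta").mode("append").saveAsTable("dim_customer"))
```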
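A small sketch of the missing-value imputation step using pyspark.ml's Imputer; the input path and column names (balance, credit_score, segment) are placeholders.

```python
from pyspark.ml.feature import Imputer

df = spark.read.parquet("/mnt/curated/accounts/")  # hypothetical path

# Fill numeric gaps with the column mean into new *_filled columns.
imputer = Imputer(
    inputCols=["balance", "credit_score"],
    outputCols=["balance_filled", "credit_score_filled"],
    strategy="mean")
imputed = imputer.fit(df).transform(df)

# Categorical gaps get an explicit sentinel rather than a statistic.
imputed = imputed.fillna({"segment": "UNKNOWN"})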
Description: Developed and deployed enterprise-grade data pipelines for a large banking initiative, facilitating the migration of legacy systems to Azure and establishing real-time analytics capabilities for financial operations. Managed multi-system data integration, implemented real-time risk analytics processing, and built Power BI dashboards for executive insights. Actively contributed to data solutions supporting KYC/AML, Basel reporting, credit risk modeling, and customer behavior analysis.
Responsibilities:
§ Designed and implemented end-to-end data pipelines using Azure Data Factory, Azure Data Lake Gen2, and Azure SQL to support scalable data ingestion and transformation.
§ Built and deployed ELT pipelines for batch and real-time processing using PySpark, Azure Databricks, and Azure Stream Analytics.
§ Successfully transitioned data from legacy systems and on-premises SQL Servers to Azure SQL Database and Synapse Analytics, utilizing ADF and SSIS.
§ Developed Power BI dashboards to track credit utilization, risk exposure, loan performance, and regulatory compliance trends.
§ Designed and implemented robust real-time data pipelines leveraging Kafka, Event Hubs, and Spark Structured Streaming to power dynamic tax computation engines (see the streaming sketch after this list).
§ Created and scheduled ADF pipelines to extract data from heterogeneous sources (RDBMS, flat files, APIs) into Azure Data Lake.
§ Developed and executed data transformations within Databricks notebooks using PySpark, incorporating parameterized control through ADF-supplied notebook widgets (see the widgets sketch after this list).
§ Developed highly optimized dimensional models and star schemas to enhance Power BI reporting for credit risk scoring, transaction history, and customer segmentation.
§ Built Hive tables and executed HiveQL for pre-aggregated datasets, significantly improving reporting performance (see the HiveQL sketch after this list).
§ Utilized Informatica for ETL development and integrated it with Azure resources for legacy-to-cloud migration.
§ Designed and orchestrated CI/CD pipelines using Git, Azure DevOps, and environment-specific configurations.
§ Utilized HDInsight, HDFS, HBase, Pig, Hive, and Sqoop to manage hybrid big data workloads and facilitate cloud migration efforts.
§ Implemented data validation rules to ensure accuracy of financial data pipelines supporting KYC, AML, Basel reporting, and regulatory compliance.
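A sketch of the Kafka-to-Spark-Structured-Streaming hop described above; the broker address, topic name, event schema, and Delta sink paths are hypothetical.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

# Assumed event schema for the illustration.
schema = (StructType()
    .add("txn_id", StringType())
    .add("amount", DoubleType())
    .add("event_ts", TimestampType()))

# Read from Kafka and parse the JSON payload into typed columns.
events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "tax-events")                  # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*"))

# Land the parsed stream in Delta with a checkpoint for fault tolerance.
(events.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/chk/tax_events/")
    .outputMode("append")
    .start("/mnt/curated/tax_events/"))
```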
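A sketch of the ADF-parameterized notebook pattern: ADF passes base parameters that the notebook reads through dbutils.widgets (available in the Databricks runtime). Widget names and paths are assumptions.

```python
# Declare widgets so the notebook also runs standalone with defaults;
# when triggered by ADF, the pipeline's base parameters override them.
dbutils.widgets.text("run_date", "")
dbutils.widgets.text("source_path", "")

run_date = dbutils.widgets.get("run_date")
source_path = dbutils.widgets.get("source_path")

# Process only the requested slice and write a per-run output partition.
df = spark.read.parquet(source_path).filter(f"load_date = '{run_date}'")
df.write.mode("overwrite").parquet(f"/mnt/curated/daily/{run_date}/")
```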
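A sketch of a pre-aggregated reporting table built with HiveQL-style CTAS issued from Spark (Hive support enabled); table and column names are placeholders.

```python
# Pre-aggregate once so BI queries scan a small summary table
# instead of the full transaction history.
spark.sql("""
    CREATE TABLE IF NOT EXISTS rpt_daily_utilization
    STORED AS PARQUET AS
    SELECT account_id,
           CAST(txn_ts AS DATE) AS txn_date,
           SUM(amount)          AS total_amount,
           COUNT(*)             AS txn_count
    FROM curated.transactions
    GROUP BY account_id, CAST(txn_ts AS DATE)
""")
```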
Description: As a Data Engineer, I contributed to a project focused on modernizing and improving data systems. I built solutions for data movement and cleansing, created visual reports for better understanding, and assisted with real-time data processing, delivering clearer insights.
Responsibilities:
§ Assisted in developing ETL pipelines using SSIS and T-SQL for data ingestion into the enterprise Data Warehouse.
§ Contributed to data cleansing and file format conversion efforts through PySpark and Spark SQL transformations.
§ Participated in the migration of on-premises data (SQL Server, Oracle) to Azure Data Lake using ADF (V2).
§ Created PySpark scripts focused on fundamental data extraction, transformation, and aggregation tasks.
§ Assisted in the configuration and management of Databricks clusters to support batch processing workloads.
§ Designed and built straightforward dashboards in Power BI and Tableau to support internal business reporting requirements.
§ Utilized Azure SQL and Azure Storage Explorer for managing and validating ingested datasets.
§ Gained exposure to Kafka-based data streaming into Databricks using JSON-formatted data.
§ Created real-time Power BI dashboards integrating with Kafka-streamed data to track product movement and inventory fluctuations across regional warehouses.
§ Developed Power BI visuals based on Talend-processed data to monitor data ingestion success rates and error patterns.
§ Collaborated with the team to handle data stored in Avro and Parquet formats.
§ Used Python libraries (Pandas, NumPy) to support small data exploration tasks (see the exploration sketch after this list).
§ Maintained thorough documentation of data lineage and addressed data quality issues identified during ETL job execution.
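A small sketch of the Pandas/NumPy exploration workflow mentioned above; the CSV extract and column names (event_ts, amount) are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.read_csv("sample_extract.csv", parse_dates=["event_ts"])

# Quick profile of every column, then null ratio per column.
print(df.describe(include="all"))
print(df.isna().mean().sort_values(ascending=False))

# Flag values more than 3 standard deviations from the mean as outliers.
z = np.abs((df["amount"] - df["amount"].mean()) / df["amount"].std())
print(df.loc[z > 3, ["event_ts", "amount"]].head())
```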
DP-203 - Data Engineering on Microsoft Azure