Mohd Arif Ansari

Noida, UP

Summary

Data Engineer with 3.6 years of IT experience, including 3 years in Big Data and cloud projects. Skilled in designing scalable Big Data applications and migrating data warehouses to the cloud. Proficient in the Big Data ecosystem, including Spark, Hive, and Kafka. Experienced in Azure data services such as Azure SQL Database, Cosmos DB, Data Lake Storage, and Data Factory. Strong in ETL/ELT processes, data modeling, and analysis. Developed RESTful APIs using Spring Boot and real-time data analytics with Spark Streaming and Kafka. Proficient in Python and Azure Databricks.

Work History

Azure Data Engineer

Provana India Pvt. Ltd.

09.2023 - Current

Gathered requirements from end customers to design and develop a common architecture for storing retail data across the enterprise and to build a data lake in the Azure cloud.
Developed Geo Tracker applications in PySpark to integrate data from sources such as FTP and CSV files, processed in Azure Databricks and written to Snowflake.
Developed Spark applications for data extraction, transformation, and aggregation from multiple systems, with output stored on Azure Data Lake Storage using Azure Databricks notebooks.
Converted Spark code written in Scala to PySpark for Geo Tracker.
Wrote unzip and decode functions in Spark with Scala and parsed XML files into Azure Blob Storage.
Developed PySpark scripts to ingest data from source systems such as Azure Event Hubs into Delta tables in Databricks in reload, append, and merge modes.
Optimized PySpark applications on Databricks, significantly reducing costs.
Created pipelines in ADF to copy Parquet files from an ADLS Gen2 location to the Azure Synapse Analytics data warehouse.
Environment: Azure Data Factory (ADF), PySpark, SQL, Snowflake, Databricks, GitHub, Azure Git, Kafka, ADLS Gen2, Azure Blob Storage.

Data Engineer

Infotech

01.2021 - 09.2023

Developed Spark applications for data extraction, transformation, and aggregation from multiple systems, with output stored on Azure Data Lake Storage using Azure Databricks.
Created pipelines in ADF to copy Parquet files from an ADLS Gen2 location to the Azure Synapse Analytics data warehouse.
Generated weekly reports, ops reports, customer goal reports, and mobile scan-and-pay goal and usage reports from sales data using Power BI.
Environment: Azure Data Factory (ADF), Scala, PySpark, Spark, SQL, Snowflake, Databricks, GitHub, Azure Git, Kafka, ADLS Gen2, Azure Blob Storage, Azure Synapse, Power BI.

Education

Post Graduation in Machine Learning - Computer Science

Indian Institute of Technology Madras

Chennai

03.2024

Bachelor of Technology - Computer Science

Lovely Professional University

Phagwara, India

06.2020

Skills

Big Data Ecosystem: HDFS, Hive, PySpark, Spark SQL
Cloud Ecosystem: Azure (Databricks, ADF, Synapse, ADLS Gen2), AWS (EC2, EMR, Lambda, Athena, Glue, Redshift, and S3)
Languages: Python, SQL
Tools: Power BI
Databases: Microsoft SQL Server, PostgreSQL
CI/CD: Azure, Git
Streaming: Spark Streaming, Kafka