Mohd Arif Ansari

Noida, UP

Summary

Data Engineer with 3.6 years of IT experience, including 3 years on Big Data and cloud projects. Skilled in designing scalable Big Data applications and migrating data warehouses to the cloud. Proficient in the Big Data ecosystem, including Spark, Hive, and Kafka. Experienced with Azure data services such as Azure SQL Database, Cosmos DB, Data Lake Storage, and Data Factory. Strong in ETL/ELT processes, data modeling, and data analysis. Developed RESTful APIs using Spring Boot and built real-time data analytics with Spark Streaming and Kafka. Proficient in Python and Azure Databricks.

Overview

3 years of professional experience
1 Certification

Work History

Azure Data Engineer

Provana India Pvt. Ltd.
09.2023 - Current
  • Interacted with end customers to gather requirements for designing and developing a common architecture for storing retail data across the enterprise and for building a data lake in the Azure cloud
  • Developed Geo Tracker applications in PySpark to integrate data from other sources, such as FTP and CSV files, processed in Azure Databricks and written to Snowflake
  • Developed Spark applications in Azure Databricks notebooks to extract, transform, and aggregate data from multiple systems and store it in Azure Data Lake Storage
  • Converted Spark Scala code to PySpark for the Geo Tracker application
  • Wrote unzip and decode functions in Spark with Scala and parsed XML files into Azure Blob Storage
  • Developed PySpark scripts to ingest data from source systems such as Azure Event Hubs into Delta tables in Databricks using reload, append, and merge modes
  • Optimized PySpark applications on Databricks, significantly reducing costs
  • Created pipelines in Azure Data Factory (ADF) to copy Parquet files from ADLS Gen2 to the Azure Synapse Analytics data warehouse
  • Environment: Azure Data Factory, PySpark, SQL, Snowflake, Databricks, GitHub, Azure Git, Kafka, ADLS Gen2, Azure Blob Storage.

Data Engineer

Infotech
01.2021 - 09.2023
  • Developed Spark applications in Azure Databricks to extract, transform, and aggregate data from multiple systems and store it in Azure Data Lake Storage.
  • Created pipelines in ADF to copy Parquet files from ADLS Gen2 to the Azure Synapse Analytics data warehouse.
  • Generated weekly reports, ops reports, customer goal reports, and reports on mobile scan-and-pay goals and usage in sales data using Power BI.
  • Environment: Azure Data Factory, Scala, PySpark, Spark, SQL, Snowflake, Databricks, GitHub, Azure Git, Kafka, ADLS Gen2, Azure Blob Storage, Azure Synapse, Power BI.

Education

Post Graduation in Machine Learning - Computer Science

Indian Institute of Technology Madras
Chennai
03.2024

Bachelor of Technology - Computer Science

Lovely Professional University
Phagwara, India
06.2020

Skills

  • Big Data Ecosystem: HDFS, Hive, PySpark, Spark SQL
  • Cloud Ecosystem: Azure (Databricks, ADF, Synapse, ADLS Gen2), AWS (EC2, EMR, Lambda, Athena, Glue, Redshift, and S3)
  • Languages: Python, SQL
  • Tools: Power BI
  • Databases: Microsoft SQL Server, PostgreSQL
  • CI/CD: Azure, Git
  • Streaming: Spark Streaming, Kafka

Certification

  • Microsoft Certified: Azure Data Engineer Associate
  • Microsoft Certified: Data Analyst Associate
  • Microsoft Certified: Azure Fundamentals
