Md Irfan

Darbhanga

Summary

Data Engineer with a Computer Science and Engineering background and 3+ years of hands-on experience. Proficient in using Azure (Databricks, IoT Hub, Data Factory, Data Explorer), AWS (Glue), Python, PySpark, Spark, and SQL to architect, develop, and optimize data pipelines. Experienced in Generative AI and in machine learning algorithms such as ARIMA and Logistic Regression for forecasting, enabling intelligent automation and predictive analytics. Adept at translating complex business requirements into scalable, efficient data solutions that keep data flowing for actionable insights. A committed problem-solver, passionate about data-driven decision-making and business intelligence.

Overview

3 years of professional experience

Work History

Data Engineer

DAILOQA SOLUTIONS INDIA PVT LTD
Noida
07.2025 - Current

Data Interaction Tool – NLP-driven Database and Data Warehouse Querying:

  • Developed an intelligent data interaction layer enabling natural language queries across multiple databases and data warehouses.
  • Designed and implemented Spark engine endpoints to process AI-converted SQL queries on Delta format data.
  • Optimized performance by dynamically activating only the required tables parsed from queries, and auto-deactivating views after use, ensuring high system efficiency and cost optimization.

FinTech Data Pipeline – Automated OCR, Validation, and Reporting:

  • Built a data engineering + GenAI pipeline to process client financial reports uploaded to SharePoint.
  • Implemented OCR pipelines to extract raw text, followed by custom business-rule–driven field extraction into structured JSON.
  • Developed validation logic to cross-check and merge fields across multiple reports, ensuring data consistency and compliance.
  • Automated generation of final client-ready PDF reports, streamlining reporting workflows for a FinTech client.

Data Engineer

Madgical Techdom(OPC) Pvt. Ltd.
Noida
11.2022 - 07.2025

Working in Azure Data Factory:

  • Designed and implemented ETL pipelines using Azure Data Factory to extract data from Blob storage, load it into a database table, and vice versa.
  • Designed and implemented ETL pipelines using Azure Data Factory to fetch data from the API and load it into Blob storage or a database table.

Working in GenAI:

  • Developed a model that reads custom documents and answers questions based on their content.
  • Developed a model that reads multiple articles from the provided news APIs, categorizes them, and prepares a unique article (header and content) on a topic.
  • Developed a telecom collection bot that reminds and motivates users to pay before the due date, stays polite, and handles abusive language appropriately.
  • Fine-tuned a large language model (LLaMA 13B) to classify emails into nine categories, achieving 94% accuracy.

Working in Azure IoT Hub and PySpark in Databricks:

  • Wrote a PySpark Streaming program that collects data sent by multiple IoT devices through Azure IoT Hub, then processes and stores it in an Azure Delta table for visualization.

Working in ML/AI:

  • Wrote a Python program that uses the ARIMA and Logistic Regression algorithms to train a model for time-series forecasting.
  • Worked on a Text-to-Speech model for Indian local languages such as Hindi, Odia, and Marathi.

Working in AWS Glue:

  • Designed and executed an ETL pipeline to extract data from an AWS S3 bucket, define the schema, transform the data using PySpark, and load it into OpenSearch.

Graduate Software Trainee

Milestone Online Services PVT LTD
04.2022 - 10.2022

Gained 8 months of experience as a Software Trainee at Milestone OS, contributing to projects involving REST APIs, React.js, Node.js, Bootstrap, JavaScript, HTML5, and CSS3. Responsibilities included:

  • Database Design & API Development: Designed databases and created efficient REST APIs using Node.js and PostgreSQL.
  • SQL Optimization: Tuned SQL queries and analyzed execution plans to enhance database performance.
  • Frontend Development: Built responsive UIs using the React framework, ensuring a seamless user experience.
  • Performance Optimization: Reduced API calls at the frontend by leveraging local storage, improving application performance.
  • Collaboration & Training: Assisted team members with support tasks and provided training and consultation to new and less-experienced colleagues.
  • Technical Communication: Communicated technical topics clearly, both verbally and in writing, enabling effective collaboration at all levels.

Education

B.Tech. - Computer Science and Engineering

Sharda University
Greater Noida, UP
01.2020

Skills

PROGRAMMING

  • Python
  • Java
  • C
  • Node.js
  • SQL

DATABASES

  • MySQL
  • PostgreSQL
  • DynamoDB
  • AWS RDS
  • MongoDB

CLOUD SERVICES

  • Azure Databricks
  • Azure Data Factory
  • Azure IoT Hub
  • AWS EC2
  • AWS Glue
  • AWS SageMaker
  • AWS S3

LIBRARIES/FRAMEWORKS

  • PySpark
  • Spark
  • LangChain
  • LlamaIndex
  • Pandas
  • Streamlit
  • React.js
  • LangGraph

TOOLS/PLATFORMS

  • Git
  • Docker
  • Visual Studio

OTHERS

  • Data Structures
  • Algorithm Design
  • REST APIs
  • Data Analytics

Disclaimer

I hereby declare that the information mentioned above is true to the best of my knowledge and belief.
