Payel Panja

Data Engineer
Almere

Summary

Skilled data professional with 9 years of experience in the IT industry, specializing in end-to-end solutions that enable businesses to unlock the full potential of their data. I have extensive experience developing production-ready streaming ingestion pipelines that ingest terabytes of data using Azure Databricks, Azure Delta Lake, Azure Data Factory, and Spark Streaming, along with data transformation techniques. I am adept at processing large volumes of data efficiently while ensuring the security and privacy of sensitive information.

Overview

10 years of professional experience
4 years of post-secondary education

Work History

Data Engineer

NN Group
12.2023 - Current
  • Designed and implemented an ingestion pipeline in Azure Databricks, incorporating logical modeling techniques to extract valuable insights from the data.
  • Developed and productionized a streaming solution that dynamically masks sensitive data and is capable of processing terabytes of data efficiently.
  • Proposed and carried out refactoring of the existing ingestion pipelines in Databricks and Azure Data Factory to improve performance.

Data Engineer

Accenture Netherlands
10.2021 - 11.2023

Project Name: ASML

Responsibilities:

  • Designed and developed a production-ready streaming ingestion pipeline to ingest terabytes of data from on-premise systems to Azure Cloud using Azure Databricks, Azure Data Factory, and Spark Streaming, along with data transformation. This solution helped ASML migrate data from on-premise Hadoop to Azure and opened the door to exploratory data analysis on various types of machine data.
  • Fine-tuned and enhanced ingestion pipelines in Azure Databricks by optimizing Spark code and tuning Spark parameters.
  • Developed an Azure Databricks pipeline to extract data from an on-premise Impala database to ADLS and transformed it into trusted datasets for end users.
  • Developed automated Databricks jobs to mitigate data quality issues.
  • Designed and developed an MLOps CI/CD pipeline using Azure DevOps and Azure Machine Learning Studio that automates data ingestion and the machine learning lifecycle, from data preprocessing, model training, and model evaluation to model deployment in various environments, enabling businesses to accelerate the time-to-market of their ML models while ensuring quality and compliance.
  • Developed an MLOps framework to automate data preprocessing, training, and testing of machine learning models using Azure Databricks and MLflow.
  • Developed a dashboard in Azure Monitor to monitor ingestion pipelines running on Azure Databricks and Azure Data Factory.
  • Created CI/CD solutions to deploy ingestion pipelines using Terraform and Azure DevOps.
  • Hands-on working experience with Hadoop and Kubernetes.
  • Exposure to working with agile methodologies and DevOps.

Machine Learning Engineer

Tata Consultancy Services
11.2018 - 08.2021

Project Name: TCS Digital Twin

Project Description: TCS Digital Twin provides capabilities for data mining, data analysis, model building, and data processing, combining artificial intelligence, the physics of the phenomena involved, and domain knowledge of the business area. The product gives customers across different domains access to powerful advanced analytics capabilities.

Responsibilities:

  • Developed a machine learning application using Django REST Framework and Python that provides automated model building. The application minimizes the time needed for model building, including data pre-processing, on any dataset uploaded by the user. Models are selected based on metrics such as accuracy, error, precision, recall, and F1 score; after model building, models can be deployed across various platforms.
  • Developed a framework to predict the order lifecycle for a telecom giant using various machine learning and deep learning algorithms and Django REST Framework.
  • Performed modeling to predict order milestone duration and milestone SLA duration, along with data extraction from the source system, feature generation for modeling, orchestration, and deployment of models in various environments.
  • Implemented fallout predictions for order milestones using NLP and statistical approaches.
  • Analyzed ML use cases for churn prediction and fraud order prediction, and developed models using supervised and unsupervised algorithms.
  • Strong understanding of various machine learning and deep learning algorithms: Regression, Logistic Regression, Support Vector Machine, Naive Bayes Classification, Decision Tree, Bagging, Boosting, PCA, Clustering, RNN, and CNN.
  • Skilled in libraries such as scikit-learn, NumPy, pandas, Matplotlib, Seaborn, TensorFlow, and Keras.
  • Experienced in creating APIs using Django REST Framework and Flask based on requirements.
  • Experience with Azure data storage services such as Azure Blob Storage, Azure SQL, Azure Data Lake Storage, and Azure Cosmos DB.
  • Experienced in data preprocessing techniques using Azure Databricks.
  • Experience in all phases of the software development life cycle in an agile process (requirements, design, development, testing, release, support).
  • Exposure to working with agile methodologies and DevOps.

Database Administrator & Data Engineer

Accenture
09.2014 - 11.2018

Project Name: H&M and Carrefour

Responsibilities:

  • Developed a machine learning model for database growth estimation and archive area estimation.
  • Developed and performance-tuned ETL code for populating multiple layers (Stage, Integration (IDS), Data Mart, and Aggregate Data Mart) using OWB.
  • Installed and configured Oracle Grid, 11g/12c databases (standalone, RAC, Data Guard), and Oracle Enterprise Manager 13c.
  • Performed database restoration to create new environments using RMAN and the Data Pump utility, and upgraded from Standard to Enterprise Edition in 11g.
  • Managed database structure, storage allocation, table/index segments, database access, and roles and privileges for RAC as well as standalone databases.
  • Worked extensively on tuning problematic queries by creating baselines and DB profiles and reconstructing DB objects; also experienced in handling performance issues.
  • Experienced in RDBMS (Oracle, SQL Server, PostgreSQL) and NoSQL databases (Cassandra, Redis).

Education

Bachelor of Science - Electrical, Electronics And Communications Engineering

West Bengal University Of Technology
Kolkata
06.2010 - 06.2014

Skills

Azure Databricks

