I am an Azure Big Data Engineer with a strong background in Big Data Analytics, ETL, Data Analytics and the Azure Cloud Platform. I have a proven track record of 8+ years across diverse facets of software development, including the design and execution of business applications. I have extensive experience in Big Data processing pipeline design, Data Lake design, Cloud Migration, Data Quality and Data Management, and working knowledge of Data Mining, Machine Learning and Data Science.
Overview
13
years of professional experience
1
Certification
Work History
Cloud Data Engineer
Accenture Applied Intelligence
03.2018 - Current
Working as tech lead, managing a team of 10 Data Engineers.
Creating Azure Data Factory pipelines for ingestion and processing of data.
Contributed to gathering requirements from the client.
Pre-processing and processing data in Azure Databricks using PySpark and Python pandas.
Inserting processed data into Azure SQL DW using Azure Databricks and Azure Data Factory.
Working with Synapse Analytics for data loading and with its Spark pools.
Creating views in Azure SQL DW that are used in Power BI.
Worked on POCs for different analytics use cases, including Machine Learning.
Worked on creating Power BI charts for end users.
Exploring data sets that can be used for analytics.
Product Lead
Accenture
03.2018 - Current
Company Overview: Patented ETL and Machine Learning Automation Product of Accenture (Internal Project)
Work as a Lead Data Engineer, designing and implementing end-to-end data solutions to solve complex business problems.
As a technology consultant, provide valuable data insights to customers and help them improve their business effectiveness.
Engage in customer interactions to understand their business problems and provide best-in-class data solutions.
Perform data analysis, design ETL architecture and data models, and implement robust data pipelines.
Use Azure Machine Learning models for real-time and batch inference.
Implement Azure Cognitive Services for different AI solutions.
Work with Synapse Analytics for data loading and with its Spark pools.
Create PySpark scripts to automate the ETL process.
Create stored procedures and SQL functions in Azure SQL.
Lead Big Data Engineer
American multinational enterprise information technology company
03.2018 - Current
Working as lead big data engineer on the Hortonworks platform.
Creating MongoDB collections and ingesting data into them.
As part of the BA team, creating design workflows (HLD/LLD).
AWS Data Engineer
Publicly-owned corporation of the Australian Government
03.2018 - Current
Working as lead big data engineer on the Hortonworks platform.
Creating MongoDB collections and ingesting data into them.
As part of the BA team, creating design workflows (HLD/LLD).
Extracting data from Oracle GoldenGate and storing it in Kafka topics.
Using DataStax Cassandra to get the data from Kafka, applying Spark SQL transformations, and making the data ready for Tableau visualization.
For Phase 2, reading the data from an AWS S3 bucket and applying Spark transformations to the files.
Configuring an EMR cluster to deploy the Spark job and deploying an AWS Lambda function for scheduling jobs in AWS.
Ingesting the data into AWS Athena for further filtering and analysis.
Creating a Jenkins pipeline for code deployment.
Big Data Engineer
Indian nationalized banking and financial services company
03.2018 - Current
Working as lead big data engineer on the Hortonworks platform.
Building a Kafka streaming pipeline that takes data from Oracle Database and feeds it to Cassandra.
Developed the Kafka-to-Cassandra streaming and the PySpark and Scala Spark jobs that transform the data into the required format.
Creating the Staging Layer and Raw Layer as per the ETL process.
Creating the Big Data environment for the process.
Creating encryption logic for customer data using Hive UDFs (Java).
Big Data Engineer
Mashreq Global Services
10.2016 - 03.2018
Developing a Big Data platform for Supply Chain using a Stg -> Trans -> Hub layered architecture.
Working on different Supply Chain modules and creating Spark jobs and Hive scripts for the Inventory, Purchase Order, Work in Process, and other modules.
Automation of Hive Scripts using PySpark.
Oozie Workflows & Oozie Coordinators.
Survey data analysis using Spark and Hive in the HR domain.
Replacement of Kinaxis and implementation of Oracle Cloud Planning.
Migration of Race and implementation of Oracle Cloud Planning in Big Data Platform.
Data grooming in Spark.
Creating a data repository to support Machine Learning and reporting.
Acted as a member of a team automating the end-to-end data lifecycle process across the layered architecture.
Using Spark and Hive for transformations on the Hortonworks Data Platform.
Collaborating with business users on requirement gathering.
Preparing models for classification problems, including data gathering, cleaning, validation, quality checks, exploratory data analysis, and missing value and outlier treatment.
Performing data pre-processing using NLTK (Natural Language Toolkit).
Contributing to requirement analysis and solution discussions.
System Engineer
Tata Consultancy Services
10.2012 - 10.2016
Analyzed log data from different sockets & files and provided insights about the risk associated with the user.
Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis.
Solved performance issues in Hive through an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
Developed simple to complex Spark jobs using Scala; contributed to the requirement and analysis phase.
Managed the importing of data from various data sources; performed transformations using Spark, Hive and MapReduce.
Engaged in collecting data from different data sources using Sqoop.
Migrated MapReduce jobs to Spark.
Wrote Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
Developed Hive and Spark SQL for the business logic.
Transformed structured data using DataFrames and HiveQL.
Education
B.Tech. - ECE
Asansol Engineering College
Asansol, West Bengal, India
01.2012
Skills
Big Data Analytics and Cloud
Project Execution
Requirement Gathering
Stakeholder Management
Customer Engagement
Machine Learning
Change Requests
Relationship Management
Reporting & Documentation
Liaison & Coordination
Big Data Analytics: Apache Hadoop, Apache Oozie, Hive, HDFS, Kafka, Sqoop and Airflow
Machine Learning: Linear Regression, Naïve Bayes, Logistic Regression, Decision Trees, KNN and NLP
Spark/PySpark: Spark Core, Spark SQL, RDD and Spark ML
AI Decision Science Consultant - Data & AI Engineering Product Lead at Accenture Applied Intelligence