Sourav Banerjee

Bengaluru

Summary

I am an Azure Big Data Engineer with a strong background in Big Data Analytics, ETL, Data Analytics, and the Azure Cloud Platform. I have a proven track record of 8+ years across diverse facets of software development, design, and execution of business applications. I have extensive experience in big data processing pipeline design, data lake design, cloud migration, data quality, and data management. I have working knowledge of data mining, machine learning, and data science.

Work History

Cloud Data Engineer

Accenture Applied Intelligence

03.2018 - Current

Working as tech lead, managing a team of 10 data engineers.
Creating Azure Data Factory pipelines for data ingestion and processing.
Contributed to gathering requirements from the client.
Pre-processing and processing data in Azure Databricks using PySpark and Python pandas.
Inserting processed data into Azure SQL DW using Azure Databricks and Azure Data Factory.
Working on Synapse for data loading and on the Spark pool of Synapse Analytics.
Creating views in Azure SQL DW that are used in Power BI.
Worked on POCs for various analytics use cases, including machine learning.
Worked on creating Power BI charts for end users.
Exploring data sets that can be used for analytics.

Product Lead

Accenture

03.2018 - Current

Company Overview: Patented ETL and Machine Learning Automation Product of Accenture (Internal Project)
Work as a Lead Data Engineer, designing and implementing end-to-end data solutions to solve complex business problems.
As a technology consultant, provide valuable data insights to customers and help them improve their business effectiveness.
Engage in customer interactions to understand their business problems and provide best-in-class data solutions.
Perform data analysis, design ETL architecture, model data, and implement robust data pipelines.
Use Azure Machine Learning models for real-time and batch inference.
Implement Azure Cognitive Services for different AI solutions.
Work on Synapse for data loading and on the Spark pool of Synapse Analytics.
Create PySpark scripts to automate the ETL process.
Create stored procedures and SQL functions in Azure SQL.

Lead Big Data Engineer

American multinational enterprise information technology company

03.2018 - Current

Working as lead big data engineer on the Hortonworks platform.
Creating MongoDB collections and ingesting data into them.
As part of the BA team, creating design workflows (HLD/LLD).

AWS Data Engineer

Publicly-owned corporation of the Australian Government

03.2018 - Current

Working as lead big data engineer on the Hortonworks platform.
Creating MongoDB collections and ingesting data into them.
As part of the BA team, creating design workflows (HLD/LLD).
Extracting data from Oracle GoldenGate and storing it in a Kafka topic.
Using DataStax Cassandra to get the data from Kafka, applying Spark SQL transformations, and preparing the data for Tableau visualization.
For Phase 2, reading data from an AWS S3 bucket and applying Spark transformations to the files.
Configuring an EMR cluster to deploy the Spark job and deploying an AWS Lambda function to schedule jobs in AWS.
Ingesting the data into AWS Athena for further filtering and analysis.
Creating a Jenkins pipeline for code deployment.

Big Data Engineer

Indian nationalized banking and financial services company

03.2018 - Current

Working as lead big data engineer on the Hortonworks platform.
Building a Kafka streaming pipeline that takes data from an Oracle database and feeds it to Cassandra.
Developed Kafka streaming to Cassandra, plus PySpark and Scala Spark jobs to transform the data into the required format.
Creating staging and raw layers as per the ETL process.
Creating the big data environment for the process.
Creating encryption logic for customer data using a Hive UDF (Java).

Big Data Engineer

Mashreq Global Services

10.2016 - 03.2018

Developing a big data platform for Supply Chain using a Stg -> Trans -> Hub layered architecture.
Working on different Supply Chain modules and creating Spark jobs and Hive scripts for the Inventory, Purchase Order, and Work in Process modules, among others.
Automating Hive scripts using PySpark.
Working with Oozie workflows and Oozie coordinators.
Analyzing survey data using Spark and Hive in the HR domain.
Replacement of Kinaxis and implementation of Oracle Cloud Planning.
Migration of Race and implementation of Oracle Cloud Planning in the big data platform.
Data grooming in Spark.
Creating a data repository to support machine learning and reporting.
Acted as a member of a team automating the end-to-end data lifecycle across the layered architecture.
Using Spark and Hive for transformation on the Hortonworks Data Platform.
Collaborating with business users on requirement gathering.
Preparing a model for a classification problem: data gathering, cleaning, validation, quality checks, exploratory data analysis, and missing-value and outlier treatment.
Performing data pre-processing using NLTK (Natural Language Toolkit).
Contributing to requirement analysis and solution discussions.

System Engineer

Tata Consultancy Services

10.2012 - 10.2016

Analyzed log data from different sockets & files and provided insights about the risk associated with the user.
Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis.
Solved performance issues in Hive through an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
Developed simple to complex Spark jobs using Scala; contributed to the requirement and analysis phase.
Managed the importing of data from various data sources; performed transformations using Spark, Hive, and MapReduce.
Collected data from different data sources using Sqoop.
Migrated MapReduce jobs to Spark.
Wrote Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
Developed Hive and Spark SQL for the business logic.
Transformed structured data using DataFrames and HiveQL.

Education

B.Tech. - ECE

Asansol Engineering College

Asansol, West Bengal, India

01.2012

Skills

Big Data Analytics and Cloud
Project Execution
Requirement Gathering
Stakeholder Management
Customer Engagement
Machine Learning
Change Requests
Relationship Management
Reporting & Documentation
Liaison & Coordination
Big Data Analytics: Apache Hadoop, Apache Oozie, Hive, HDFS, Kafka, Sqoop, Airflow
Machine Learning: Linear Regression, Naïve Bayes, Logistic Regression, Decision Trees, KNN, NLP
Spark/PySpark: Spark Core, Spark SQL, RDD, Spark ML
Azure Cloud: Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Functions, Azure Data Lake, Azure App Service, Azure Web Apps, Azure Cognitive Services, Azure VM
Cloud Platforms: AWS, Azure
Deep Learning: Computer Vision, Keras
NoSQL Database: MongoDB, Cassandra, HBase
Data Science: NumPy, pandas, Seaborn, Matplotlib, scikit-learn
DevOps / Continuous Integration: Jira, Bitbucket, Jenkins, Confluence, GitHub, GitLab
Languages: Python, Scala, R (Basic)
Environment: Linux, Windows 10/7, Guardian
Database: Oracle, MySQL, MS SQL, Snowflake
Developer Tools: Jupyter, PuTTY, WinSCP, RoboMongo, MongoDB Compass, Beyond Compare, STORQM, Notepad, MS Excel, UltraEdit, Expeditor, VMware
IDE: Eclipse, IntelliJ, VS Code, Spyder, Anaconda, PyCharm, Atom
Visualization Tool: Power BI (Basic)

Certification

Microsoft Certified Azure Fundamentals
Microsoft Certified Azure Data Engineer
Microsoft Certified AI Fundamentals
Master Program Certification for Data Science
IBM – Python for Data Science
Coursera: Machine Learning Specialization

Languages

English
Hindi
Bengali

Personal Information

Date of Birth: 03/14/91
Nationality: Indian
