Sourav Banerjee

Bengaluru

Summary

I am an Azure Big Data Engineer with a strong background in Big Data Analytics, ETL, Data Analytics, and the Azure Cloud Platform. I have a proven track record of 8+ years across diverse facets of software development, design, and execution of business applications. I have extensive experience in Big Data processing pipeline design, Data Lake design, cloud migration, data quality, and data management. I have working knowledge of data mining, machine learning, and data science.

Overview

13 years of professional experience
1 Certification

Work History

Cloud Data Engineer

Accenture Applied Intelligence
03.2018 - Current
  • Working as Tech Lead, managing a team of 10 Data Engineers.
  • Creating Azure Data Factory Pipeline for ingestion and processing of data.
  • Contributed to gathering requirements from the client.
  • Pre-processing and processing data in Azure Databricks using PySpark and Python pandas.
  • Loading processed data into Azure SQL DW using Azure Databricks and Azure Data Factory.
  • Working on Synapse for data loading and on the Spark pool of Synapse Analytics.
  • Create views in the Azure SQL DW which are used in Power BI.
  • Worked on POCs for different analytics use cases, including machine learning.
  • Created Power BI charts for end users.
  • Exploring data sets which can be used for analytics.

Product Lead

Accenture
03.2018 - Current
  • Company Overview: Patented ETL and Machine Learning Automation Product of Accenture (Internal Project)
  • Work as a Lead Data Engineer designing and implementing end-to-end data solutions to solve complex business problems.
  • As a technology consultant, provide valuable data insights to customers and help them improve their business effectiveness.
  • Take part in customer interactions to understand their business problems and provide best-in-class data solutions.
  • Perform data analysis, design ETL architecture, model data, and implement robust data pipelines.
  • Use Azure Machine Learning models for real-time and batch inference.
  • Implement Azure Cognitive Services for different AI solutions.
  • Work on Synapse for data loading and on the Spark pool of Synapse Analytics.
  • Create PySpark scripts to automate the ETL process.
  • Create stored procedures and SQL functions in Azure SQL.

Lead Big Data Engineer

American multinational enterprise information technology company
03.2018 - Current
  • Working as lead big data engineer on the Hortonworks platform.
  • Creating MongoDB collections and ingesting data into them.
  • As part of the BA team, create design workflows (HLD/LLD).

AWS Data Engineer

Publicly-owned corporation of the Australian Government
03.2018 - Current
  • Working as lead big data engineer on the Hortonworks platform.
  • Creating MongoDB collections and ingesting data into them.
  • As part of the BA team, create design workflows (HLD/LLD).
  • Extract data from Oracle GoldenGate and store it in a Kafka topic.
  • Using DataStax Cassandra, get the data from Kafka, apply Spark SQL transformations, and make the data ready for Tableau visualization.
  • For Phase 2, read the data from an AWS S3 bucket and apply Spark transformations to the files.
  • Configure an EMR cluster to deploy the Spark job, and deploy an AWS Lambda function for scheduling jobs in AWS.
  • Ingest the data into AWS Athena for further filtering and analysis.
  • Create a Jenkins pipeline for code deployment.

Big Data Engineer

Indian nationalized banking and financial services company
03.2018 - Current
  • Working as lead big data engineer on the Hortonworks platform.
  • As a Data Engineer, build a Kafka streaming pipeline that takes data from Oracle Database and feeds it to Cassandra.
  • Developed Kafka streaming to Cassandra and the PySpark and Scala Spark jobs that transform the data into the required format.
  • Create the staging layer and raw layer as per the ETL process.
  • Create the big data environment for the process.
  • Create encryption logic for customer data using a Hive UDF (Java).

Big Data Engineer

Mashreq Global Services
10.2016 - 03.2018
  • Developing a Big Data platform for Supply Chain using a Stg -> Trans -> Hub layered architecture.
  • Working on different Supply Chain modules and creating Spark jobs & Hive scripts for Inventory, Purchase Order, Work in Process, and other modules.
  • Automation of Hive Scripts using PySpark.
  • Oozie Workflows & Oozie Coordinators.
  • Survey data analysis using Spark and Hive in the HR domain.
  • Replacement of Kinaxis and implementation of Oracle Cloud Planning.
  • Migration of Race and implementation of Oracle Cloud Planning in Big Data Platform.
  • Data grooming in Spark.
  • Creating a data repository to support machine learning and reporting.
  • Acted as a member of a team automating the end-to-end data lifecycle process across the layered architecture.
  • Using Spark & Hive for transformation and working on Hortonworks Data Platform for the same.
  • Collaborating with business users on requirement gathering.
  • Preparing a model for a classification problem: data gathering, cleaning, validation, quality checks, exploratory data analysis, and missing-value and outlier treatment.
  • Performing data pre-processing using NLTK (Natural Language Toolkit).
  • Contributing to requirement analysis and solution discussions.

System Engineer

Tata Consultancy Services
10.2012 - 10.2016
  • Analyzed log data from different sockets & files and provided insights about the risk associated with the user.
  • Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis.
  • Solved performance issues in Hive through an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Developed simple to complex Spark jobs using Scala; contributed to the requirement and analysis phase.
  • Managed the importing of data from various data sources; performed transformations using Spark, Hive & MapReduce.
  • Engaged in collecting data from different data sources using Sqoop.
  • Migrated MapReduce jobs to Spark.
  • Wrote Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
  • Developed Hive and Spark SQL for the business logic.
  • Transformed structured data using DataFrames and HiveQL.

Education

B.Tech. - ECE

Asansol Engineering College
Asansol, West Bengal, India
01.2012

Skills

  • Big Data Analytics and Cloud
  • Project Execution
  • Requirement Gathering
  • Stakeholder Management
  • Customer Engagement
  • Machine Learning
  • Change Requests
  • Relationship Management
  • Reporting & Documentation
  • Liaison & Coordination
  • Big Data Analytics: Apache Hadoop, Apache Oozie, Hive, HDFS, Kafka and Sqoop, Airflow
  • Machine Learning: Linear Regression, Naïve Bayes, Logistic Regression, Decision Trees and KNN, NLP
  • Spark/PySpark: Spark Core, Spark SQL, RDD and Spark ML
  • Azure Cloud: Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Functions, Azure Data Lake, Azure App Service, Azure Web Apps, Azure Cognitive Services, Azure VM
  • Cloud Platforms: AWS and Azure
  • Deep Learning: Computer Vision, Keras
  • NoSQL Database: MongoDB, Cassandra, HBase
  • Data Science: NumPy, Pandas, Seaborn, Matplotlib, scikit-learn
  • DevOps / Continuous Integration: Jira, Bitbucket, Jenkins and Confluence, GitHub, GitLab
  • Languages: Python, Scala, R (Basic)
  • Environment: Linux and Windows 10/7, Guardian
  • Database: Oracle and MySQL, MS SQL, Snowflake
  • Developer Tools: Jupyter, PuTTY, WinSCP, RoboMongo, MongoDB Compass, Beyond Compare, STORQM, Notepad, MS Excel, UltraEdit, Expeditor, VMware
  • IDE: Eclipse, IntelliJ, VS Code, Spyder, Anaconda, PyCharm, Atom
  • Visualization Tool: Power BI (Basic)

Certification

  • Microsoft Certified Azure Fundamentals
  • Microsoft Certified Azure Data Engineer
  • Microsoft Certified AI Fundamentals
  • Master Program Certification for Data Science
  • IBM – Python for Data Science
  • Coursera: Machine Learning Specialization

Languages

  • English
  • Hindi
  • Bengali

Personal Information

  • Date of Birth: 03/14/91
  • Nationality: Indian
