Sourav Banerjee

Bengaluru

Summary

Resident Solution Architect with over 12 years of experience in Big Data Analytics, ETL, and Cloud Data Engineering. Proven track record in software development, enterprise application design, and implementation, with a strong emphasis on building scalable Big Data pipelines and Data Lake architectures.

Demonstrated expertise in Cloud Migration, Data Quality, and Data Management, with deep technical proficiency in technologies such as Scala, Java, Apache Spark, AWS, Azure, GCP, and Databricks.

Key contributor to Overwatch, a Databricks observability solution that generated $2.2M in annual recurring revenue (ARR). Drove strategic partnerships to enhance service offerings and accelerate growth.

Skilled in developing agentic applications with leading agentic AI frameworks such as LangGraph, AutoGen, and CrewAI to build intelligent, autonomous solutions.

Overview

12 years of professional experience
9 Certifications

Work History

Resident Solution Architect

Leading Data and AI Company
Bengaluru
01.2025 - Current
  • Planned various product configurations to meet diverse customer needs.
  • Defined strategies for migrating existing workloads into the cloud environment.
  • Documented account activities and generated sales reports.
  • Resolved complex system issues related to multi-cloud deployments in production environments.
  • Optimized costs by right-sizing resources based on current usage trends in the cloud environment.
  • Collaborated with the sales team to understand customer requirements, boost product sales, and provide sales support.
  • Assisted customers with troubleshooting issues related to their applications running on public clouds.
  • Assisted account executives with prospect evaluation and qualification.
  • Researched and implemented new technology platforms to support development initiatives.
  • Developed cloud-based solutions using AWS, Azure, and Google Cloud Platform technologies.
  • Worked with cross-functional teams to achieve goals.
  • Troubleshot complex technical problems and implemented solutions.

Senior Solution Consultant

Leading Data and AI Company
Bengaluru
08.2022 - 12.2024
  • Lead open-source contributor to Overwatch, Databricks' own observability tool, which generates $2.2M ARR.
  • Performed advanced analytics on structured and unstructured data using SQL, Python, and R.
  • Optimized existing queries to improve performance and reduce server load.
  • Deployed machine learning models to the production environment for real-time predictions.
  • Built complex reports utilizing multiple sources of information from different systems.
  • Designed, built, and maintained high-performance databases for reporting and analysis purposes.
  • Created ETL scripts to move and transform data from various sources into a centralized repository.
  • Implemented new database technologies, such as NoSQL databases, to store large volumes of data efficiently.
  • Automated manual processes by creating custom scripts and programs in scripting languages such as Bash and PowerShell.
  • Cleaned and manipulated raw data.
  • Exceeded customer expectations by finding creative solutions to problems.
  • Collaborated closely with team members to achieve project objectives and meet deadlines.
  • Developed tools and applications for monitoring data quality and integrity across all applications and databases.

Senior Big Data Engineer

World's Second-Largest Logistics Company
Bengaluru
03.2022 - 07.2022
  • Provided technical support to business users on using Big Data tools for their analytical needs.
  • Integrated existing systems with new platforms such as AWS S3 or Azure Blob Storage.
  • Developed new functions and applications to conduct analyses.
  • Cleaned and manipulated raw data.
  • Developed ETL jobs to extract, transform, and load data from various sources into the target system.
  • Recommended data analysis tools to address business issues.

Big Data and Cloud Engineer

Global Multinational Professional Services Company
Bengaluru
04.2018 - 03.2022
  • Developed a Kafka streaming pipeline to ingest data from an Oracle database into Cassandra.
  • Engineered PySpark and Scala Spark jobs to transform data into the required formats.
  • Configured an Amazon EMR cluster and an AWS Lambda function for job scheduling and deployment.
  • Created an Azure Data Factory pipeline for efficient data ingestion and processing.
  • Led cross-functional teams in systems integration projects to enhance collaboration.
  • Executed debugging and automation scripts using Python to improve efficiency.
  • Created MongoDB collections for effective data management and ingestion.
  • Communicated with clients to gather requirements for accurate implementation.

Senior Software Engineer

Leading Banking Institutions in the United Arab Emirates
Bengaluru
10.2016 - 03.2018
  • Developed and maintained scalable software applications for various platforms.
  • Optimized existing software systems for improved performance and scalability.
  • Developed a big data platform for the supply chain using a layered Stg -> Trans -> Hub architecture.
  • Documented software designs and architecture for future reference and maintenance.
  • Implemented robust code in multiple programming languages, including C++ and Python.
  • Coordinated with quality assurance teams to ensure software met all testing criteria.
  • Created a data repository to support machine learning and reporting.
  • Helped automate the end-to-end data lifecycle across the layered architecture as a member of the team.
  • Prepared a classification model, covering data gathering, cleaning, validation, quality checks, exploratory data analysis, and missing-value and outlier treatment.
  • Performed data pre-processing using NLTK (Natural Language Toolkit).
  • Contributed to requirement analysis and solution discussions.

System Engineer

Leading Multinational Technology Company
Mumbai
10.2013 - 09.2016

  • Analyzed log data from various sockets and files and provided insights into the risk associated with each user.
  • Developed Hive scripts for ad hoc analysis based on end-user and analyst requirements.
  • Resolved Hive performance issues by understanding how joins, grouping, and aggregation translate into MapReduce jobs.
  • Developed simple to complex Spark jobs in Scala; contributed to the requirements and analysis phase.
  • Managed data imports from various sources and performed transformations using Spark, Hive, and MapReduce.
  • Collected data from different sources using Sqoop.
  • Migrated MapReduce jobs to Spark.
  • Wrote Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
  • Developed Hive and Spark SQL queries to implement business logic.
  • Transformed structured data using DataFrames and HiveQL.

Education

B.Tech - Electronics and Communication

Asansol Engineering College
06.2012

Skills

  • Big data analytics: Hadoop, Hive, Kafka, and Spark/PySpark
  • Machine learning: supervised, unsupervised, time series forecasting
  • Deep learning: ANN, CNN, and RNN
  • Cloud platforms: Azure, Databricks, and Snowflake
  • NoSQL databases: MongoDB, Cassandra
  • Data science tools: NumPy, Pandas, Scikit-learn, PyTorch
  • DevOps tools: Jira, Jenkins, GitHub, Docker
  • Programming languages: Python and Scala
  • Orchestration tools: Airflow, Azure Data Factory
  • Visualization tools: Power BI
  • ETL tools: dbt
  • Generative AI: Prompt Engineering, LangChain, RAG
  • Agentic AI frameworks: LangGraph, AutoGen, CrewAI
  • MLOps: MLflow

Certification

  • Microsoft Certified: Azure Fundamentals
  • Microsoft Certified: Azure Data Engineer Associate
  • Microsoft Certified: Azure AI Fundamentals
  • Databricks Certified Associate Developer for Apache Spark (Python)
  • Databricks Certified Data Engineer Professional
  • Databricks Certified Machine Learning Professional
  • Databricks Certified SQL Analyst Professional
  • Databricks Certified Generative AI Engineer Associate
  • Databricks Certified Machine Learning Associate

Languages

Bengali
First Language
English
Proficient (C2)
Hindi
Advanced (C1)
