Apratim Paul

Kolkata

Summary

  • Seasoned Data Architect and data engineering professional with over 17 years of experience in architecting and delivering scalable, end-to-end data solutions.
  • Presently with Intuitive Technology Partners, specializing in Data Architecture Design, Data Warehousing, and Big Data Solutions on the AWS and Azure platforms.
  • Adept at designing data models using relational, dimensional, and NoSQL techniques, with a proven track record of leading teams, managing project delivery, and mitigating risks.
  • Expert in developing robust, scalable data pipelines, leveraging distributed processing engines like Spark and PySpark, across multi-cloud and on-premises environments.
  • Strong understanding of large-scale data architectures, including data warehouses, data lakes, and advanced analytics platforms.
  • Proficient in data quality, governance, metadata management, data lineage, and Data Vault 2.0 methodologies.
  • Demonstrated experience in driving RFPs, POCs, internal initiatives, and mentoring junior colleagues.
  • Well-versed in modern software engineering best practices, including agile methodologies, code reviews, version control, and testing, ensuring high-quality deliverables throughout the development lifecycle.
  • A strategic thinker with hands-on expertise in AWS and Azure-based cloud data projects and in distributed systems for data storage and processing.

Overview

12 years of professional experience

1 Certification

Work History

Senior Cloud Data Architect

Intuitive Technology Partners
Kolkata
09.2024 - Current

Project Summary:

The project focuses on building Assisted Data Engineering (ADE), a framework designed to streamline and automate the data transformation process. It empowers users to define transformation job steps using predefined blocks. These blocks serve as modular components that encapsulate common data manipulation tasks such as filtering, aggregating, and joining datasets. By combining these blocks, users can construct complex data workflows without needing extensive coding knowledge.
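
As an illustration of the block-based approach, the sketch below shows how modular transformation blocks might be composed over Spark DataFrames. The block names (FilterBlock, AggregateBlock), the run_pipeline helper, and the S3 path are hypothetical and are not taken from the actual ADE framework.

  # Minimal sketch of block-style transformations on Spark DataFrames.
  # FilterBlock, AggregateBlock, run_pipeline, and the S3 path are illustrative only.
  from pyspark.sql import DataFrame, SparkSession
  from pyspark.sql import functions as F

  class FilterBlock:
      def __init__(self, condition: str):
          self.condition = condition

      def apply(self, df: DataFrame) -> DataFrame:
          return df.filter(self.condition)

  class AggregateBlock:
      def __init__(self, group_cols, agg_col, agg_func="sum"):
          self.group_cols, self.agg_col, self.agg_func = group_cols, agg_col, agg_func

      def apply(self, df: DataFrame) -> DataFrame:
          agg_expr = getattr(F, self.agg_func)(self.agg_col).alias(self.agg_col)
          return df.groupBy(*self.group_cols).agg(agg_expr)

  def run_pipeline(df: DataFrame, blocks) -> DataFrame:
      # Each block is a self-contained, reusable step; the user only picks and orders them.
      for block in blocks:
          df = block.apply(df)
      return df

  spark = SparkSession.builder.appName("ade-sketch").getOrCreate()
  orders = spark.read.parquet("s3://example-bucket/raw/orders/")
  result = run_pipeline(orders, [
      FilterBlock("order_status = 'COMPLETE'"),
      AggregateBlock(["region"], "order_total", "sum"),
  ])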

Roles:

Currently serving as a Senior Cloud Data Architect (individual contributor), with a strong emphasis on hands-on development, coding, and DevOps within AWS environments. Collaborating closely with AWS U.S. teams (AWS ProServ engagement), I contribute to architecting and engineering scalable, secure, and high-performing data solutions.

Key responsibilities include:

  • Ensuring adherence to data regulatory standards and security controls during data architecture design.
  • Actively involved in requirement gathering, translating business needs into detailed technical specifications.
  • Preparing comprehensive High-Level Design (HLD) and Low-Level Design (LLD) documents.
  • Leading solution presentations, demos, and coordination efforts with various client stakeholders.
  • Supporting project management activities, including effort estimation, and assisting Scrum Masters in sprint planning.
  • Designing scalable data platforms utilizing AWS data services based on specific technical requirements.
  • Developing end-to-end event-driven and batch data ingestion pipelines using AWS Glue, S3, Athena, DynamoDB, Step Functions, PySpark, and Python (see the sketch after this list).
  • Creating CloudFormation templates and integrating them into CI/CD pipelines using Jenkins.
  • Performing Root Cause Analysis (RCA) on performance issues and providing timely resolutions while balancing considerations of security, cost-efficiency, and performance.
  • Documenting best practices, architectural decisions, and development techniques, and maintaining them in Confluence for organizational knowledge sharing.
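
A simplified sketch of the event-driven ingestion trigger referenced above: an S3 object-created event invokes a Lambda handler that starts a Glue job and records run metadata in DynamoDB. The job name, audit table, and argument names are assumptions made for illustration, not the project's actual resources.

  # Sketch: an S3 object-created event triggers a Lambda that starts a Glue job
  # and records run metadata in DynamoDB. Job, table, and bucket names are placeholders.
  import boto3

  glue = boto3.client("glue")
  dynamodb = boto3.resource("dynamodb")
  audit_table = dynamodb.Table("ingestion_audit")      # hypothetical audit table

  def handler(event, context):
      for record in event["Records"]:
          bucket = record["s3"]["bucket"]["name"]
          key = record["s3"]["object"]["key"]
          run = glue.start_job_run(
              JobName="curate-landing-data",            # hypothetical Glue job
              Arguments={"--source_path": f"s3://{bucket}/{key}"},
          )
          audit_table.put_item(Item={
              "object_key": key,
              "job_run_id": run["JobRunId"],
              "status": "STARTED",
          })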

Assistant Vice President (Cloud Data)

Barclays Global Service Center Private Limited
Pune
04.2024 - 08.2024

Project Summary:

The TTRT Self-Service Engine project centers on the development of a self-service engine: an intuitive user interface designed for business users to execute queries using customizable filters. The engine empowers users to launch queries easily, without needing to write complex SQL, streamlining their ability to gather data insights.

At the core of the system, a Python-based query engine dynamically forms the actual queries based on the user inputs. These queries are then sent to AWS Athena for execution. The results are subsequently fetched back to an on-premises UI server for display to the user.
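
A minimal sketch of this pattern using boto3's Athena client is shown below; the table, database, filter columns, and S3 result location are placeholders rather than the actual TTRT objects, and a production query engine would also validate and sanitize the user-supplied filter values.

  # Sketch: build a query from user-selected filters and run it on Athena.
  # Table, database, and S3 locations are placeholders, not the real TTRT objects.
  import time
  import boto3

  def build_query(filters: dict) -> str:
      # Dynamically assemble a WHERE clause from the UI filter selections.
      where = " AND ".join(f"{col} = '{val}'" for col, val in filters.items())
      return f"SELECT * FROM trades WHERE {where} LIMIT 1000"

  def run_athena_query(sql: str) -> list:
      athena = boto3.client("athena")
      qid = athena.start_query_execution(
          QueryString=sql,
          QueryExecutionContext={"Database": "analytics_db"},
          ResultConfiguration={"OutputLocation": "s3://example-results-bucket/athena/"},
      )["QueryExecutionId"]
      # Poll until the query finishes, then fetch rows back for the UI layer.
      while True:
          state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
          if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
              break
          time.sleep(1)
      if state != "SUCCEEDED":
          raise RuntimeError(f"Athena query ended in state {state}")
      return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]

  rows = run_athena_query(build_query({"region": "UK", "book": "RATES"}))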

In addition to the self-service functionality, the project includes the creation of a Business-As-Usual (BAU) pipeline, built using AWS Glue and PySpark. This pipeline is responsible for establishing a data lake in Amazon S3, curating the data, and preparing it for reporting and ad hoc analysis. The curated data is made available to the data science team for further processing and insights generation.

The overall solution ensures both efficient, user-friendly query execution for business users and a robust, scalable infrastructure to support advanced analytics and reporting. The combination of AWS services and Python-driven query generation allows the project to deliver a powerful, high-performance data processing and reporting solution.

Role:

I served as an individual contributor AWS Data Architect while also leading the delivery team at Barclays.

  • Designed and implemented RESTful APIs using AWS API Gateway to securely expose and manage backend services, ensuring smooth communication between various components of the system.
  • Developed Python-based solutions for dynamic query generation and data processing, enhancing automation and flexibility in query execution for self-service applications.
  • Supported ad hoc requests related to regulatory inquiries and audits from various U.S. and U.K. regulators.
  • Utilized SQL to write efficient queries for data extraction, transformation, and analysis, enabling effective reporting and ad hoc analysis on large datasets.
  • Managed and maintained the AWS Glue Catalog to organize and structure data, facilitating efficient data discovery, schema management, and integration across the data lake.
  • Leveraged Amazon Athena for serverless query execution on large datasets stored in Amazon S3, providing fast, cost-effective analytics without the need for infrastructure management.
  • Developed a self-service query engine that allowed business users to interact with data via an intuitive user interface, enabling them to launch complex queries based on customizable filters without the need for technical expertise.
  • Built and optimized ETL workflows using AWS Glue and Python, ensuring data is curated, transformed, and available for reporting and analysis in real time.
  • Optimized the execution of queries in Athena and improved overall system performance by fine-tuning the data pipeline and query generation process, ensuring quick results for end-users.
  • Worked closely with business and technical teams to gather requirements and deliver a seamless user experience for querying large datasets, supporting both business intelligence and data science initiatives.

Advisory Application Architect

IBM India Pvt Ltd
Kolkata
09.2022 - 04.2024

Project Summary:

The OneReg project aims to modernize an on-premises Rainstor application archival data store by migrating it to AWS, with a focus on processing large-scale data for reporting and predictive analytics. The primary objectives include transforming the data ingestion process using PySpark, Python, and AWS Glue, creating a centralized data lake on Amazon S3, and leveraging AWS Lake Formation for data governance. Additionally, the project utilizes Amazon Athena for ad hoc query execution, and Amazon Redshift as the data warehouse to handle large datasets efficiently.

For reporting purposes, Tableau will be integrated as the reporting layer, providing end users with interactive dashboards and insights. The project also includes training machine learning models for predictive analytics, ensuring that data processing, storage, and analytics capabilities are optimized in the cloud environment.

This modernization will enable the organization to harness the power of AWS services for scalable, efficient, and secure data management and analytics, ultimately empowering data-driven decision-making and advanced predictive insights.
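
For illustration, a condensed AWS Glue (PySpark) job following the ingestion approach described above: read a raw table from the Glue Catalog, apply basic curation, and write partitioned Parquet to the S3 data lake. The database, table, column, and bucket names are assumed for the sketch and are not the actual OneReg objects.

  # Sketch of a Glue PySpark job: read a raw catalog table, curate it, and write
  # partitioned Parquet to the S3 data lake. All object names below are placeholders.
  import sys
  from awsglue.context import GlueContext
  from awsglue.job import Job
  from awsglue.utils import getResolvedOptions
  from pyspark.context import SparkContext
  from pyspark.sql import functions as F

  args = getResolvedOptions(sys.argv, ["JOB_NAME"])
  glue_context = GlueContext(SparkContext.getOrCreate())
  job = Job(glue_context)
  job.init(args["JOB_NAME"], args)

  raw = glue_context.create_dynamic_frame.from_catalog(
      database="onereg_raw", table_name="archived_records"
  ).toDF()

  curated = (
      raw.dropDuplicates()
         .withColumn("load_date", F.to_date("ingest_ts"))   # assumes an ingest_ts column
         .filter(F.col("record_status") == "ACTIVE")         # assumed status column
  )

  curated.write.mode("overwrite").partitionBy("load_date") \
         .parquet("s3://example-onereg-lake/curated/records/")

  job.commit()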

Role:

  • Led the end-to-end data architectural review for new engagements within the IBM Data Analytics account, ensuring compliance with technical and business requirements.
  • Designed scalable and secure data architectures, with a focus on regulatory controls and security best practices.
  • Contributed to RFPs and POCs involving AWS and Azure data services to evaluate and propose suitable solutions.
  • Actively involved in business requirement gathering, translating these into detailed technical requirements for implementation.
  • Managed sprint planning, estimation, and delivery to ensure alignment with project timelines and goals.
  • Developed end-to-end data ingestion pipelines utilizing AWS services such as Glue, S3, Athena, Redshift, and Step Functions, with PySpark and Python.
  • Coordinated with the Data Science team to support the training and deployment of machine learning models for predictive analytics.
  • Built CloudFormation templates and collaborated with the CI/CD team for deployment to higher environments.
  • Conducted extensive research to streamline the user experience and enhance platform performance.
  • Performed a detailed cost analysis before provisioning services to ensure cost efficiency across the platform.
  • Led Root Cause Analysis (RCA) on performance issues, providing resolutions that balanced security, cost-efficiency, and performance optimization.
  • Documented best practices, implementation strategies, and technical techniques on Confluence, ensuring efficient knowledge sharing and team collaboration.

Data Architect

Tata Consultancy Services
Kolkata
09.2021 - 09.2022

Project Summary:

The project focused on designing and implementing a centralized Data Lake solution on the Azure Cloud Platform for multiple business verticals of Johnson & Johnson (JnJ), including Medical Devices (MD), Pharmaceuticals, Supply Chain, and Consumer divisions.

Key objectives included:

  • Ingesting and curating data from various verticals into a unified, scalable Azure-based Data Lake.
  • Establishing a centralized Data & Analytics platform to streamline data management and access across the organization.
  • Ensuring data standardization, quality, and governance to support accurate analytics and reporting.
  • Forming a Centre of Excellence (CoE) to drive best practices, promote efficient data handling, and foster collaboration across different JnJ units.

The project enabled JnJ to leverage a robust data foundation, ensuring faster insights, improved decision-making, and optimized business processes across all verticals.
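
As a sketch of the ingestion-and-curation pattern described above, the snippet below shows a Databricks (PySpark) step that curates raw data and writes it to a Delta Lake layer on Azure Data Lake Storage. The storage account, container, and column names are illustrative assumptions, not the actual JnJ datasets.

  # Sketch of a Databricks (PySpark) curation step writing to a Delta Lake layer on ADLS.
  # Storage account, container, path, and column names are illustrative only.
  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.getOrCreate()   # provided automatically on Databricks

  raw = spark.read.format("parquet").load(
      "abfss://raw@exampledatalake.dfs.core.windows.net/supply_chain/orders/"
  )

  curated = (
      raw.dropDuplicates(["order_id"])
         .withColumn("order_date", F.to_date("order_ts"))
         .filter(F.col("order_amount") > 0)
  )

  # Delta format provides ACID writes and time travel on the curated zone.
  curated.write.format("delta").mode("overwrite").partitionBy("order_date").save(
      "abfss://curated@exampledatalake.dfs.core.windows.net/supply_chain/orders/"
  )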

Role:

I served as the Data Architect with hands-on expertise on the Azure platform, leading a team of 16+ data engineers on this JnJ project.

  • Architected and delivered scalable, efficient, and secure data solutions on the Azure cloud platform, focusing on building centralized data lakes, data warehouses, and Delta Lake architectures.
  • Led a team of 16+ data engineers, overseeing project delivery, task allocation, and performance mentoring.
  • Designed and implemented end-to-end data ingestion pipelines using Azure Data Factory (ADF), Azure Databricks (PySpark, Python), and integrated data models optimized for both performance and cost.
  • Designed and implemented a centralized Data Lake and Delta Lake architecture on Azure Cloud to consolidate data across multiple JnJ business units.
  • Managed the provisioning and maintenance of Azure Synapse Analytics and Snowflake environments, ensuring high availability and compliance with data governance standards.
  • Collaborated closely with stakeholders to translate business requirements into technical solutions, prepared High-Level and Low-Level Designs (HLD/LLD), and actively participated in RFPs and POCs.
  • Led efforts in Azure DevOps to automate deployments, integrate CI/CD pipelines, and ensure adherence to software engineering best practices.
  • Conducted root cause analysis (RCA) on performance bottlenecks, implemented cost-saving measures, and documented solutions and best practices on Confluence.
  • Coordinated daily stand-ups and sprint planning sessions, ensuring alignment across cross-functional teams and timely issue resolution.
  • Provided technical leadership and mentorship to junior data engineers and offshore teams.
  • Conducted performance tuning, cost optimization, and troubleshooting of data pipelines and cloud infrastructure.

Application Development Specialist

Accenture Solutions Pvt Ltd
Kolkata
11.2014 - 08.2021

Project Summary:

The iHub v1 program was initiated by Vodafone UK to develop a centralized and consolidated database solution aimed at enhancing KPI (Key Performance Indicator) reporting across Vodafone’s diverse local market landscape.

Key objectives of the solution included:

  • Automating the collation of monthly KPIs to significantly reduce manual effort and minimize human errors.
  • Improving data freshness and availability, allowing business performance issues to be quickly identified and addressed.
  • Ensuring accuracy, consistency, and reliability of KPIs reported across different local markets.
  • Facilitating transparent and standardized comparisons of performance metrics across all local markets, ensuring consistent and fair evaluations.

The implementation of iHub v1 empowered Vodafone to drive better business insights, promote transparency, and streamline performance reporting processes across its operations.

Roles:

Served as Senior Data Engineer and Offshore Lead Data Engineer, with strong expertise in Data Warehousing, ETL Development, AWS Cloud Services, and Oracle Data Integrator (ODI). Experienced in managing end-to-end data solutions, from requirement analysis to production deployment, across both onshore (London) and offshore (India) environments.

Key achievements and responsibilities include:

  • Led the development and maintenance of data warehouses in Amazon Redshift and designed scalable data marts to support business intelligence needs.
  • Built and managed complex ETL pipelines using AWS Glue workflows, ingested transactional data from multiple regions, and generated cleansed datasets in Parquet and CSV formats, storing them securely in S3.
  • Constructed and maintained a robust Data Lake architecture, integrated it with Redshift via Spectrum, and developed external tables, late binding views, stored procedures, and UDFs.
  • Designed efficient ETL logic and implemented intricate KPI calculations in PL/SQL, steered the ODI development team, and ensured smooth project execution.
  • Gathered business requirements, crafted detailed data models, and prepared comprehensive technical design documents (HLD and LLD).
  • Created and customized ODI interfaces, packages, and knowledge modules; managed deployments, migrations, and version control through Tortoise SVN and GitLab.
  • Performed thorough unit, integration, and UAT testing; conducted root cause analyses and resolved complex production issues.
  • Guided team members, shared technical expertise, and collaborated closely with end users and stakeholders during UAT and ongoing support phases.
  • Tracked issue logs, maintained status reports, led daily client calls, and escalated critical issues as needed.
  • Implemented continuous improvements by adhering to AWS architectural best practices, optimizing performance, cost, and security.

Application Developer

IBM India Pvt Ltd
Kolkata
01.2013 - 11.2014

Project Summary:

The LOAD Engine is a comprehensive financial information system built on the Oracle E-Business Suite (OeBS) R12 platform, designed to support three major carriers—CNC, ANL, and DELMAS—along with several agents. The system operates within an Oracle-based environment, comprising four key databases: Engine, OeBS, Reporting, and DEA.

  • OeBS serves as the core accounting system, handling financial transactions and records.
  • The Engine database is dedicated to calculating critical financial metrics known as Estimates, which are essential for management reporting and decision-making.
  • The Reporting and DEA databases support data extraction, analytics, and reporting functionalities to provide insights and ensure data consistency across various business processes.

The system ensures seamless financial data management, robust accounting operations, and real-time reporting capabilities, enabling efficient financial oversight across multiple carriers and agents.

Roles:

Worked as an experienced BI Engineer with a strong focus on ODI (Oracle Data Integrator) development and Load Engine application support, playing a dual role across development and AMS (Application Management Support) activities. Developed and updated ODI scenarios to meet evolving business requirements, while ensuring smooth and efficient support of critical applications.

Key responsibilities include:

  • Handled day-to-day AMS activities, including ticket analysis, issue resolution, gap analysis, and follow-ups to ensure SLA compliance.
  • Actively worked on Request for Fixes (RFF) and Request for Services (RFS) for both Load and Ocean Engine applications, translating business needs into technical solutions.
  • Performed root cause analysis in complex areas of the application and delivered effective resolutions.
  • Conducted thorough testing across Unit Testing, Integration Testing, and UAT environments to ensure solution stability.
  • Mentored and guided team members by sharing functional and technical knowledge to enhance team efficiency.
  • Continuously monitored ticket backlogs, ensured adherence to SLAs, and improved overall application performance.
  • Prepared KPI reports, GID metrics, and productivity measurements to support audit and quality assurance activities.

A reliable contributor to both the ODI development team and the AMS team, consistently delivering high-quality solutions and ensuring client satisfaction.

Education

Master of Computer Applications - Information Technology

Sikkim Manipal University
Kolkata, India
11-2011

Bachelor of Commerce (Major) in Computer Application - Commerce, Information Technology

University of Calcutta
Kolkata, India
07-2007

Skills

Data Architecture

Data Engineering

Data Warehousing and Data Modeling

Redshift

Oracle

Azure Synapse

Snowflake

Spark

PySpark

AWS Glue

Databricks, EMR

Kinesis

Azure Data Factory (ADF)

Athena

Python

SQL and PL/SQL

PostgreSQL

GitHub, Bitbucket, Jira

Airflow

Autosys

AWS CloudFormation

Infrastructure as Code (IaC)

Certification

  • AWS Certified Solutions Architect - Associate
  • AWS Certified Data Analytics - Specialty
  • AWS Certified Data Engineer - Associate
  • Data Engineering on Microsoft Azure (DP-203)
  • MIT Certified Data Architect Associate
  • Google Cloud Certified Cloud Architect
  • Snowflake SnowPro Core Certified
  • Oracle Database SQL Certified Expert

Timeline

Senior Cloud Data Architect

Intuitive Technology Partners
09.2024 - Current

Assistant Vice President (Cloud Data)

Barclays Global Service Center Private Limited
04.2024 - 08.2024

Advisory Application Architect

IBM India Pvt Ltd
09.2022 - 04.2024

Data Architect

Tata Consultancy Services
09.2021 - 09.2022

Application Development Specialist

Accenture Solutions Pvt Ltd
11.2014 - 08.2021

Application Developer

IBM India Pvt Ltd
01.2013 - 11.2014

Master of Computer Applications - Information Technology

Sikkim Manipal University

Bachelor of Commerce (Major) in Computer Application - Commerce, Information Technology

University of Calcutta