Experienced Big Data Solution Architect with over 13 years in the IT industry, including 7+ years specializing in designing and implementing scalable big data architectures. Proven expertise in leveraging modern technologies such as Hadoop, Spark, Kafka, and cloud platforms (AWS, Azure) to build robust, high-performance data solutions. Skilled in translating complex business requirements into actionable data strategies, with a strong focus on data integration, migration, transformation, and advanced data modeling.
Demonstrated ability to lead cross-functional teams, optimize data pipelines, and implement best practices in data engineering and analytics. Adept at architecting cost-effective, future-proof systems that streamline operations, enhance decision-making, and ensure data security. Experienced in data lakehouse platforms, system modernization, and performance tuning. Proficient in machine learning and deep learning concepts, with a solid foundation in data mining, warehousing, and analytics. Recognized for strong communication, leadership, and project management capabilities.
Overview
13 years of professional experience
3 Certifications
Work History
Big Data Architect
Ernst & Young - EY GDS
10.2024 - Current
Led an end-to-end data modernization initiative for a leading insurance company, focusing on report rationalization by identifying key business metrics and attributes across multiple reporting functions.
Conducted comprehensive report usage analysis to identify redundancies and obsolete reports, resulting in the consolidation of reporting assets and the standardization of KPIs and business metrics.
Collaborated closely with business stakeholders and subject matter experts to capture reporting requirements, streamline insights delivery, and define robust, scalable data structures.
Designed and developed Conceptual, Logical, and Physical Data Models aligned with client business requirements, ensuring scalability, consistency, and optimized performance for analytics and reporting.
Automated the report rationalization process using Generative AI by mapping the data lineage of each metric and attribute, enabling faster consolidation while ensuring strict adherence to data privacy and compliance standards.
Architected a cloud-based data lakehouse platform by meticulously gathering client requirements, ensuring alignment with enterprise standards, and fulfilling reporting objectives.
Led the Data Engineering team in designing and developing a comprehensive, generic, and reusable Microsoft Fabric PySpark microservice framework, optimized for seamless deployment across multiple cloud platforms and scalable to support future enhancements. The framework encompasses data cleansing, profiling, quality validation, processing, transformation, reconciliation, orchestration, and CI/CD integration using GitHub Actions.
Designed and architected a self-service semantic data model using Power BI Direct Lake and DirectQuery modes to enhance data accessibility and consistency across systems.
Big Data Solution Architect
Labcorp Drug Development India Private Ltd
12.2020 - 10.2024
Handled diverse structured and semi-structured data, including JSON, XML, CSV, and flat files delivered via batch and API streams, using Apache Spark for efficient processing.
Designed and implemented end-to-end big data solutions across multiple projects by leveraging Databricks (PySpark), Snowflake (SnowPark), Spark, and Kafka, achieving a 30% improvement in data processing efficiency.
Collaborated with cross-functional teams to define data architecture strategies aligned with business objectives, enhancing decision-making capabilities.
Architected scalable data pipelines on cloud platforms such as AWS and Azure, ensuring seamless data ingestion, transformation, and storage.
Developed a reusable and robust PySpark framework composed of Python classes and functions for data ingestion, processing, transformation, and loading. The framework incorporates an Audit Balance & Control (ABC) mechanism to validate data loads and automatically send reconciliation reports via O365 Graph API upon completion.
Led a team of Python and PySpark developers responsible for designing and building data pipelines.
Specialized in performance tuning by leveraging Spark and Databricks features such as clustering and partitioning to optimize processing efficiency.
Designed dimensional data models based on business requirements, including fact and dimension tables, and managed various dimension types such as slowly changing, late-arriving, and role-playing dimensions.
Managed data operations by configuring Databricks workflows, orchestrating job executions, and creating CI/CD pipelines for streamlined code migration.
Senior Big Data Engineer (PySpark & AWS EMR)
Deloitte US India
07.2017 - 12.2020
Created data pipelines for various clients that send data in structured and semi-structured formats.
Handled clinical trial JSON data from patients' IoT devices by flattening it and loading it into relational tables using Spark (Scala).
Developed a schema evolution framework based on the Parquet file format, in which Spark writes data as Parquet and Hive tables are created on top of it for OLAP (online analytical processing).
Created a data pipeline to load raw data into Snowflake using the Snowflake Python connector.
Worked with business teams to onboard new clinical trial clients by reviewing and applying each data transfer agreement.
Developed Spark code and scheduled it using Databricks Workflows.
Maintained and monitored Spark clusters on AWS EMR, ensuring high availability and fault tolerance.
Created a framework using Python and the Microsoft Graph API to read data from email and store it in an RDBMS.
Managed a team of four engineers with varied skill sets to support business requirements.
Accomplishments
Streamlined report rationalization using Generative AI by automating the mapping of data lineage for each metric and attribute, accelerating report consolidation and reducing manual effort by 350+ FTE hours.
Designed and implemented a scalable data lakehouse platform using Databricks and Snowflake for a leading clinical research organization, enabling seamless onboarding of 100+ clinical trials for global pharmaceutical clients, delivering over 1,000 FTE hours in effort savings, and accelerating time-to-insight.
Senior Lead Consultant – Automation, Monitoring and Self-Healing
Allstate India