Experienced Big Data Solution Architect with over 13 years in the IT industry, including 7+ years specializing in designing and implementing scalable big data architectures. Proven expertise in leveraging modern technologies such as Hadoop, Spark, Kafka, and cloud platforms (AWS, Azure) to build robust, high-performance data solutions. Skilled in translating complex business requirements into actionable data strategies, with a strong focus on data integration, migration, transformation, and advanced data modeling.
Demonstrated ability to lead cross-functional teams, optimize data pipelines, and implement best practices in data engineering and analytics. Adept at architecting cost-effective, future-proof systems that streamline operations, enhance decision-making, and ensure data security. Experienced in data lakehouse platforms, system modernization, and performance tuning. Proficient in machine learning and deep learning concepts, with a solid foundation in data mining, warehousing, and analytics. Recognized for strong communication, leadership, and project management capabilities.
Overview
13 years of professional experience
3 Certifications
Work History
Big Data Architect
Ernst & Young - EY GDS
10.2024 - Current
Led an end-to-end data modernization initiative for a leading insurance company, focusing on report rationalization by identifying key business metrics and attributes across multiple reporting functions.
Conducted comprehensive report usage analysis to identify redundancies and obsolete reports, resulting in the consolidation of reporting assets and the standardization of KPIs and business metrics.
Collaborated closely with business stakeholders and subject matter experts to capture reporting requirements, streamline insights delivery, and define robust, scalable data structures.
Designed and developed Conceptual, Logical, and Physical Data Models aligned with client business requirements, ensuring scalability, consistency, and optimized performance for analytics and reporting.
Automated the report rationalization process using Generative AI by mapping the data lineage of each metric and attribute, enabling faster consolidation while ensuring strict adherence to data privacy and compliance standards.
Architected a cloud-based data lakehouse platform by meticulously gathering client requirements, ensuring alignment with enterprise standards, and fulfilling reporting objectives.
Led the Data Engineering team in designing and developing a comprehensive, generic, and reusable Microsoft Fabric PySpark microservice framework, optimized for seamless deployment across multiple cloud platforms and scalable to support future enhancements. The framework encompasses data cleansing, profiling, quality validation, processing, transformation, reconciliation, orchestration, and CI/CD integration using GitHub Actions.
Designed and architected a self-service semantic data model using Power BI Direct Lake and DirectQuery modes to enhance data accessibility and consistency across systems.
Big Data Solution Architect
Labcorp Drug Development India Private Ltd
12.2020 - 10.2024
Handled diverse structured and semi-structured data, including JSON, XML, CSV, and flat files delivered via batch and API streams, using Apache Spark for efficient processing.
Designed and implemented end-to-end big data solutions across multiple projects by leveraging Databricks (PySpark), Snowflake (SnowPark), Spark, and Kafka, achieving a 30% improvement in data processing efficiency.
Collaborated with cross-functional teams to define data architecture strategies aligned with business objectives, enhancing decision-making capabilities.
Architected scalable data pipelines on cloud platforms such as AWS and Azure, ensuring seamless data ingestion, transformation, and storage.
Developed a reusable and robust PySpark framework composed of Python classes and functions for data ingestion, processing, transformation, and loading. The framework incorporates an Audit Balance & Control (ABC) mechanism to validate data loads and automatically send reconciliation reports via O365 Graph API upon completion.
Led a team of Python and PySpark developers responsible for designing and building data pipelines.
Specialized in performance tuning by leveraging Spark and Databricks features such as clustering and partitioning to optimize processing efficiency.
Designed dimensional data models based on business requirements, including fact and dimension tables, and managed various dimension types such as slowly changing, late-arriving, and role-playing dimensions.
Managed data operations by configuring Databricks workflows, orchestrating job executions, and creating CI/CD pipelines for streamlined code migration.
Senior Big Data Engineer (PySpark & AWS EMR)
Deloitte US India
07.2017 - 12.2020
Created data pipelines for various clients that send data in structured and semi-structured formats.
Handled clinical trial JSON data from patients' IoT devices by flattening it and loading it into relational tables using Spark (Scala).
Developed a schema evolution framework based on the Parquet file format, in which Spark writes data as Parquet and Hive tables are created on top of it for OLAP (online analytical processing).
Created a data pipeline to load raw data into Snowflake using the Snowflake Python connector.
Worked with business teams to onboard new clinical trial clients by reviewing and applying each data transfer agreement.
Developed Spark code and scheduled it using Databricks Workflows.
Maintained and monitored Spark clusters on AWS EMR, ensuring high availability and fault tolerance.
Created a framework using Python and the Microsoft Graph API to read data from email and store it in an RDBMS.
Managed a team of four engineers with varied skill sets to support business requirements.
Accomplishments
Streamlined report rationalization using Generative AI by automating the mapping of data lineage for each metric and attribute, accelerating report consolidation and reducing manual effort by 350+ FTE hours.
Designed and implemented a scalable data lakehouse platform using Databricks and Snowflake for a leading clinical research organization, enabling seamless onboarding of 100+ clinical trials for global pharmaceutical clients, delivering over 1,000 FTE hours in effort savings, and accelerating time-to-insight.
Senior Lead Consultant – Automation, Monitoring and Self-Healing
Allstate India