Sreenath S

Bangalore

Summary

Dynamic Lead Data Engineer with a proven track record at Hexaware/IQVIA, enhancing big data pipelines and architectures using Spark, Scala, and Hive. Achieved a 3x speed increase in data processing, demonstrating exceptional problem-solving and collaboration skills. Expert in data warehousing and advanced analytics, committed to delivering high-quality, efficient solutions.

Overview

11 years of professional experience

Work History

Lead Data Engineer

Hexaware/IQVIA
06.2022 - Current

Client: IQVIA

Project: Phoenix

Project overview: US Phoenix migrates IQVIA's US core business processes to an open-source big data platform built on Apache Spark, Scala, Hive, and SQL.

Roles and Responsibilities:

  • Built and optimized big data pipelines, architectures, and data sets.
  • Enhanced Spark job performance through advanced optimizations, including efficient partitioning, caching, and leveraging Catalyst optimizations, achieving a 3x speed improvement over the previous pipeline while reducing processing time and resource utilization (a rough sketch of this kind of tuning follows this list).
  • Designed and implemented scalable data pipelines on Azure Cloud using Azure Databricks for big data processing, Azure Data Lake Storage (ADLS) for efficient data storage, and Azure Data Factory for seamless orchestration and automation of data workflows.
  • Undertook a Spark Scala project from inception to implementation, based on requirements provided by the statistics (stats) team.
  • Gained hands-on experience with Camunda for workflow management and orchestration.
  • Leveraged Git for version control, ensuring traceability and collaboration among team members.
  • Developed robust code in Spark Scala to handle large-scale data processing tasks efficiently.
  • Created shell scripts to automate the consolidation of small HDFS files, optimizing storage and improving data processing performance.
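
A rough illustration of the partitioning, caching, and output tuning mentioned above is sketched below in Spark Scala; the table name, column names, and partition counts are assumptions for illustration, not details of the Phoenix pipeline.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object PipelineOptimizationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("phoenix-optimization-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Prune partitions early: filter on the Hive partition column so only the
    // required partitions are read (hypothetical table and column names).
    val claims = spark.table("core_db.claims")
      .filter(col("load_month") === "2024-01")

    // Repartition on the join/group key to avoid skewed shuffles, and cache the
    // result because it is reused by several downstream aggregations.
    val byPatient = claims.repartition(400, col("patient_id")).cache()

    val counts  = byPatient.groupBy("patient_id").count()
    val summary = byPatient.groupBy("patient_id").agg(Map("amount" -> "sum"))

    // Write a sensible number of output files instead of thousands of small ones.
    counts.coalesce(50).write.mode("overwrite").parquet("/data/out/claim_counts")
    summary.coalesce(50).write.mode("overwrite").parquet("/data/out/claim_summary")

    spark.stop()
  }
}
```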

Technologies used:

PySpark, Apache Kafka, Scala, Hive, and SQL.

Application: Business Event Monitoring

Business events are IQVIA's data-processing capability language. This event source provides an overview of what is going on in the enterprise that IQVIA manages with respect to its data assets and processes. A BEM Logger library was created that can be integrated into any other application. The logger module generates unique IDs, constructs the JSON payload, and posts it to the BEM REST endpoint.
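
A minimal sketch of what such a logger call could look like is shown below; the endpoint URL, JSON fields, and object names are illustrative assumptions, not the actual BEM library or contract.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.time.Instant
import java.util.UUID

// Hypothetical sketch of a BEM-style logger call: generate a unique event ID,
// build a JSON payload, and POST it to a REST endpoint.
object BemLoggerSketch {
  private val client   = HttpClient.newHttpClient()
  private val endpoint = "https://bem.example.internal/events" // placeholder URL

  def logEvent(application: String, eventType: String, detail: String): Int = {
    val eventId = UUID.randomUUID().toString
    // Hand-built JSON keeps the sketch dependency-free; a real implementation
    // would use a JSON library and escape the field values.
    val payload =
      s"""{"eventId":"$eventId","application":"$application","eventType":"$eventType","detail":"$detail","timestamp":"${Instant.now()}"}"""

    val request = HttpRequest.newBuilder(URI.create(endpoint))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(payload))
      .build()

    // Return the HTTP status code so callers can verify delivery.
    client.send(request, HttpResponse.BodyHandlers.ofString()).statusCode()
  }
}
```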

Roles & Responsibilities:

  • Worked on identifying business/application events for each application in the project.
  • Created a logging library that acts as a central system to provide unique IDs and share them between different applications using the Kafka module.
  • Responsible for verifying the event attributes that are being generated by the BEM logger library.
  • Integrated the module into multiple applications and provided support from development to production.

Senior Data Engineer

TCS
08.2019 - 04.2022

Client: CBA

Project : FCT

Application data migration to Azure:

The on-premises data was transferred to Azure for further analysis and reporting. The data was moved to the ADLS Gen 2 location using the Azure Data Factory service, and transformations were done on the Azure Databricks platform. Databricks notebooks were created and shared with data analysts for ad hoc requests.
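
The read-transform-write step run on Databricks could look roughly like the sketch below. The project itself used PySpark, so this Scala equivalent, along with the storage account, containers, paths, and column names, is purely illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date}

// Rough sketch of a Databricks transformation step over ADLS Gen 2 data.
object AdlsTransformSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("adls-transform-sketch").getOrCreate()

    // Raw files landed by Azure Data Factory in ADLS Gen 2 (hypothetical paths).
    val rawPath     = "abfss://raw@examplestorage.dfs.core.windows.net/fct/transactions/"
    // Curated zone consumed by analysts' notebooks.
    val curatedPath = "abfss://curated@examplestorage.dfs.core.windows.net/fct/transactions/"

    val raw = spark.read.option("header", "true").csv(rawPath)

    // Illustrative cleansing: parse dates and drop rows without an amount.
    val curated = raw
      .withColumn("txn_date", to_date(col("txn_date"), "yyyy-MM-dd"))
      .filter(col("amount").isNotNull)

    curated.write.mode("overwrite").partitionBy("txn_date").parquet(curatedPath)

    spark.stop()
  }
}
```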

Technologies used:

Azure Data Factory, ADLS Gen 2, Azure Databricks, PySpark.

Application: In-Cycle QC Report Module

The reporting module is designed to generate reports based on various requirements from data analysts and the data management team on an ad hoc basis. It is a Spark application written in Scala; reports are generated from Hive using Spark SQL. Some of the reports include Frequency Distribution, Drill Down Report, etc. The tool reads the metadata and generates a report accordingly; users set up metadata based on their reporting requirements. It is used by different asset teams to generate reports on top of millions of records.
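
A highly simplified sketch of this metadata-driven approach is shown below; the metadata table layout, column names, and the frequency-distribution report are assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Simplified sketch of a metadata-driven report: read report metadata from a
// Hive table, then build the requested report with Spark SQL.
object QcReportSketch {
  // Frequency distribution of a single column (hypothetical report type).
  def frequencyDistribution(spark: SparkSession, sourceTable: String, groupColumn: String): DataFrame =
    spark.sql(
      s"""SELECT $groupColumn, COUNT(*) AS frequency
         |FROM $sourceTable
         |GROUP BY $groupColumn
         |ORDER BY frequency DESC""".stripMargin)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("qc-report-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Each metadata row describes one report requested by an analyst
    // (hypothetical metadata schema).
    val metadata = spark.table("qc_db.report_metadata")
      .select("report_name", "source_table", "group_column")
      .collect()

    metadata.foreach { row =>
      val report = frequencyDistribution(spark, row.getString(1), row.getString(2))
      report.write.mode("overwrite").saveAsTable(s"qc_db.report_${row.getString(0)}")
    }

    spark.stop()
  }
}
```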

Technologies used:

Apache Spark, Scala, Hive, SQL

Big Data Developer

Legato Health Technologies (Carelon Global Solutions)
09.2018 - 08.2019

Application: COAF Cloud Migration

The overall purpose of the project is to assist state regulators in the performance of their regulatory oversight function with regard to the insurance industry. To rebuild the report, data is extracted from the source CDL layer and transformed across all layers to generate the report as per the mapping document.

Roles & Responsibilities:

  • Responsible for the data analysis.
  • Worked extensively on Hadoop components such as HDFS, Spark, Kafka, Job Tracker, Task Tracker, Name Node, Data Node, and YARN.
  • Analyzed data that needed to be loaded into Hadoop and contacted the respective source teams to get table information and connection details.
  • Created Hive tables and partitioned data for better performance.
  • Implemented Hive UDFs and tuned queries for better results (see the sketch after this list).
  • Analyzed Hive tables using Hue and Zeppelin notebooks.
  • Performed unit testing.
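
A minimal sketch of a Hive UDF of the sort mentioned in this list is shown below; the function name, masking logic, and the example table in the usage comment are hypothetical, not taken from the project.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical Hive UDF written in Scala using the classic
// org.apache.hadoop.hive.ql.exec.UDF API.
class MaskAccountId extends UDF {
  // Keep the last four characters of an ID and mask the rest.
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val value  = input.toString
    val masked = "*" * math.max(0, value.length - 4) + value.takeRight(4)
    new Text(masked)
  }
}

// Registered and used from Hive (or Spark SQL with Hive support) roughly like:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION mask_account AS 'MaskAccountId';
//   SELECT mask_account(member_id)
//   FROM claims_db.claims
//   WHERE load_date = '2019-06-01';   -- partition pruning on a partitioned table
```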

Hadoop Developer

Infosys Pvt. Ltd
02.2014 - 09.2018

Application: Atlas datalake

A new Default Management platform that enables seamless workflow, automated control, and real-time tracking. It creates efficiencies throughout consumer default by eliminating multiple systems and related maintenance, eliminating ancillary databases, and facilitating multi-product contact strategies. It enables the use of decision engines to drive practices with greater regulatory compliance. The project provides the required data from various sources to the Collection 360 (CM3) application, ingests CACS data into the ATLAS data lake, and loads it into the PDOA database for reporting purposes.

Roles & Responsibilities:

  • Developed ETL processes, scripts, and workflows to automate data extraction, transformation, and loading tasks.
  • Responsible for the design, development, build, maintenance, and performance tuning of the ETL pipeline.
  • Used ETL tools and programming languages (e.g., SQL, Ab Initio, Spark) to create and maintain ETL jobs.
  • Involved in the ETL phase of the project.
  • Created packages on the SQL Server Integration Services (SSIS) platform.
  • Performed SQL tuning for the project.

Education

EPGDM (PG) - Business Analyst

Alliance University
Bangalore
11.2024

BTech - IT

Government College of Trivandrum (Barton Hill)
Kerala
01.2013

12th

Govt Higher Secondary School
Kulathummel, Kerala
01.2008

10th

GHSS Plavoor
Kerala
01.2006

Skills

  • Spark
  • PySpark
  • Scala
  • Azure
  • Databricks
  • Kafka
  • Azure Data Lake Gen2
  • Azure Data Factory
  • Hive
  • Hadoop
  • Data warehousing
  • Data analysis
  • Advanced analytics
  • Big data technologies
  • Data integration
  • Python
  • SQL
  • Bash

Languages

  • English
  • Malayalam
  • Hindi

Personal Information

Date of Birth: 02/03/90

Professional Summary

Azure, Apache Spark, Kafka, SQL, Data warehousing, Data modelling, Python, Scala
