Keerthana Annur

Chennai

Summary

Seasoned Senior Data Engineer with 7.5+ years of experience in the Big Data/Hadoop technology stack, specializing in Apache Spark (PySpark, Scala, Spark Core, Spark SQL, Spark Streaming), Hive, Sqoop, Pig, HBase, and Kafka. Proficient in AWS (Glue, S3, Redshift), GCP (BigQuery, Dataproc), Python, Talend Data Integration, and Talend Data Quality. Skilled in optimizing Spark applications and creating complex ETL mappings using Talend tools.

Proven track record of developing, testing, and maintaining data architectures, with strong skills in database management systems, Big Data processing frameworks, data modeling, and warehousing. Successfully led teams in creating data solutions that improved system efficiency and business decision-making, with demonstrated gains in data availability and accuracy.

Overview

8 years of professional experience

Work History

Senior Data Engineer

Walmart Global Tech India
Chennai
03.2024 - Current
  • Designed and developed ETL pipelines using Spark Scala to efficiently process large datasets, handling data volumes of 90+ million records.
  • Built and managed cloud-native solutions on Google Cloud Platform (GCP), leveraging BigQuery for analytics and Dataproc for batch data processing.
  • Integrated Kafka with Spark Streaming to support real-time ingestion and analytics on streaming data sets.
  • Optimized Spark jobs with partitioning, caching, and dynamic resource allocation to ensure high performance and scalability.
  • Automated data workflows using Automic Automation and established CI/CD pipelines for continuous integration and seamless deployments in cloud environments.
  • Managed version control and deployment of data applications using Git, ensuring traceable code changes and repository hygiene.
  • Developed data pipelines that interact with Apache Hive and Apache Hudi, ensuring data is efficiently stored, queried, and version-controlled in data lakes.
  • Worked on Hudi datasets for incremental data ingestion, providing near-real-time updates for business reporting.
  • Implemented monitoring, logging, and alerting mechanisms for real-time data pipelines to ensure high availability and timely issue resolution.
  • Developed tools and applications for monitoring data quality and integrity across all applications.
  • Collaborated closely with cross-functional teams, including data scientists and analysts, to gather requirements, translate business needs into technical specifications, and design robust data models and data products.
  • Analyzed user requirements and designed and developed ETL processes to load enterprise data into the data warehouse.
  • Developed and implemented data models, database designs, and data access and table maintenance code.
  • Streamlined data flow from diverse sources using orchestration tools such as Automic and Airflow.
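The data-quality monitoring mentioned above can be illustrated with a minimal plain-Python sketch (field names and the 5% threshold are hypothetical examples, not details of the actual pipelines):

```python
# Minimal sketch of a data-quality check over a batch of records.
# All names and thresholds below are illustrative assumptions.

def null_rate(records, field):
    """Fraction of records where `field` is missing or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)

def run_checks(records, required_fields, max_null_rate=0.05):
    """Return (field, rate) pairs whose null rate breaches the threshold."""
    return [(f, null_rate(records, f))
            for f in required_fields
            if null_rate(records, f) > max_null_rate]

batch = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 7.5},
]
# "amount" is null in 1 of 3 records (~33%), breaching the 5% threshold
alerts = run_checks(batch, ["order_id", "amount"], max_null_rate=0.05)
```

In a production pipeline the same checks would typically run inside Spark and feed an alerting channel rather than return a list.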

Data Engineer

E&Y Global Delivery Services India LLP
Hyderabad
06.2021 - 01.2024
  • Built scalable distributed data solutions using Spark for on-premises and AWS Cloud data sources
  • Developed Spark programs to perform data transformations, creating Spark DataFrames and running Spark SQL in Scala
  • Developed Sqoop jobs to ingest customer and product data into HDFS data lakes
  • Performed both full and incremental Sqoop loads from RDBMS to HDFS (landing zone)
  • Developed HQL scripts to create external tables and analyze incoming and intermediate data for analytics applications in Hive
  • Applied Hive optimizations such as partitioning, bucketing, vectorization, and indexing
  • Developed custom UDFs, UDAFs, and UDTFs in Hive
  • Worked with various file formats, including CSV, JSON, ORC, Avro, and Parquet
  • Optimized Spark jobs to process millions of records in minutes, including joins with massive tables
  • Captured change data (CDC) from sources based on the primary key field using Spark and loaded it into data lake Hive tables
  • Tuned Spark jobs using optimization techniques such as broadcasting, executor tuning, and persisting
  • Performed data profiling on source data and published data quality scores using Talend ETL across six DQ dimensions: Completeness, Accuracy, Validity, Consistency, Uniqueness, and Timeliness
  • Orchestrated Hadoop and Spark jobs using Airflow to define job dependencies and run multiple jobs in sequence
  • Provided work guidance and technical assistance to team members
  • Created technical design documents showcasing the end-to-end flow of project modules
  • Interacted with business teams to understand business problems and designed applications with the Hadoop ecosystem.
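The primary-key-based CDC step above can be sketched in plain Python (the snapshot data and `id` key are illustrative; the real work ran in Spark against Hive tables):

```python
# Hedged sketch of change-data capture keyed on a primary key:
# compare the previous snapshot with the new extract and classify rows.

def capture_changes(previous, current, pk="id"):
    """Return (inserts, updates, deletes) between two snapshots keyed by `pk`."""
    prev = {r[pk]: r for r in previous}
    curr = {r[pk]: r for r in current}
    inserts = [r for k, r in curr.items() if k not in prev]
    updates = [r for k, r in curr.items() if k in prev and prev[k] != r]
    deletes = [r for k, r in prev.items() if k not in curr]
    return inserts, updates, deletes

old = [{"id": 1, "city": "Chennai"}, {"id": 2, "city": "Pune"}]
new = [{"id": 1, "city": "Chennai"},
       {"id": 2, "city": "Mumbai"},
       {"id": 3, "city": "Delhi"}]
# id 3 is an insert, id 2 is an update, nothing was deleted
ins, upd, dele = capture_changes(old, new)
```

In Spark the same classification is usually expressed as joins on the key column rather than in-memory dictionaries.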

Data Engineer

Capgemini Technology Services Pvt Limited
Bangalore
09.2019 - 06.2021
  • Developed a Spark framework to ingest data from sources into HDFS.
  • Developed code based on business requirements to process large volumes of data effectively and efficiently.
  • Aggregated the processed data and stored it in the data warehouse.
  • Used Power BI for reporting and visualization in business analysis.
  • Imported and exported data between HDFS and Hive using Sqoop and managed data within the environment.
  • Created Hive tables, loaded data, and wrote Hive queries.
  • Optimized Hive queries, saving costs for the project.
  • Developed code to validate data against payment details, applied join operations to obtain the required data, and inserted it into Hive tables.
  • Implemented complex logic and conditions in Spark to validate data.
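The join-and-validate step against payment details can be sketched as follows (record shapes and the `PAID` status value are assumed for illustration):

```python
# Illustrative sketch: enrich orders with payment details on a shared key
# and keep only orders whose payment succeeded (an inner join plus a filter).

def validate_with_payments(orders, payments):
    """Join orders to successful payments on order_id; drop the rest."""
    paid = {p["order_id"]: p for p in payments if p.get("status") == "PAID"}
    return [{**o, "payment": paid[o["order_id"]]}
            for o in orders if o["order_id"] in paid]

orders = [{"order_id": 1, "amount": 100}, {"order_id": 2, "amount": 50}]
payments = [{"order_id": 1, "status": "PAID"},
            {"order_id": 2, "status": "FAILED"}]
# only order 1 survives the join with PAID payments
valid = validate_with_payments(orders, payments)
```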

ETL Developer

KPMG India
Bangalore
09.2016 - 09.2019
  • Developed, optimized, and managed data pipelines using the Hadoop Ecosystem to handle large-scale data ingestion, storage, and processing.
  • Designed and implemented ETL workflows using Talend to extract data from multiple sources, transform complex datasets, and load into data warehouses.
  • Used SQL for data extraction, transformation, and querying, ensuring data quality and correctness.
  • Implemented Apache Hive queries for efficient querying on large datasets stored in HDFS and created partitioned tables to enhance query performance.
  • Automated data ingestion workflows using Apache Sqoop to import/export data between relational databases and Hadoop clusters.
  • Conducted performance tuning and troubleshooting of data pipelines to reduce latency and optimize resource utilization.
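The automated incremental ingestion above follows the watermark pattern behind Sqoop's `--incremental lastmodified` mode (with `--check-column` and `--last-value`); a plain-Python sketch with invented sample rows:

```python
# Rough sketch of watermark-based incremental import: pull only rows
# modified after the last saved watermark, then advance the watermark.
# Column names and dates are illustrative assumptions.

def incremental_import(rows, last_watermark):
    """Return rows updated after `last_watermark` and the new watermark."""
    new_rows = [r for r in rows if r["updated_at"] > last_watermark]
    new_mark = max((r["updated_at"] for r in new_rows), default=last_watermark)
    return new_rows, new_mark

source = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-02-10"},
    {"id": 3, "updated_at": "2024-03-05"},
]
# rows 2 and 3 are newer than the watermark; it advances to "2024-03-05"
delta, mark = incremental_import(source, "2024-01-15")
```

ISO-8601 date strings compare correctly as plain strings, which keeps the sketch dependency-free.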

Education

Bachelor of Technology

Jawaharlal Nehru Technological University
06-2016

High School Diploma

Board Of Intermediate Public Examination
06-2011

Skills

  • BIG DATA TECHNOLOGIES : Hadoop, Spark, Hive, Hudi, Sqoop, Kafka, HBase, Scala Spark, Spark Streaming
  • LANGUAGES : Scala, Python, Spark SQL, Spark Core, SQL, HiveQL, Shell Scripting
  • CLOUD TECHNOLOGIES: Amazon Web Services (AWS) - S3, Redshift, Glue; Google Cloud Platform (GCP) - BigQuery, Dataproc
  • ETL TOOLS: Talend Data Integration, Talend Data Quality
  • IDE: Eclipse & IntelliJ
  • VERSION CONTROL: Git, Nexus & SVN
  • DATABASE: MySQL, Vertica, Oracle, MS SQL Server
  • SCHEDULING: Airflow, Automic, Autosys

Timeline

Senior Data Engineer

Walmart Global Tech India
03.2024 - Current

Data Engineer

E&Y Global Delivery Services India LLP
06.2021 - 01.2024

Data Engineer

Capgemini Technology Services Pvt Limited
09.2019 - 06.2021

ETL Developer

KPMG India
09.2016 - 09.2019

Bachelor of Technology

Jawaharlal Nehru Technological University

High School Diploma

Board Of Intermediate Public Examination