Keerthana Annur

Chennai

Summary

Seasoned Senior Data Engineer with 7.5+ years of experience in the Big Data/Hadoop technology stack, specializing in Apache Spark (PySpark, Scala, Spark Core, Spark SQL, Spark Streaming), Hive, Sqoop, Pig, HBase, and Kafka. Proficient in AWS (Glue, S3, Redshift), GCP (BigQuery, Dataproc), Python, Talend Data Integration, and Talend Data Quality. Skilled in optimizing Spark applications and creating complex ETL mappings using Talend tools.

Proven track record of developing, testing, and maintaining data architectures, with strong skills in database management systems, Big Data processing frameworks, data modeling, and warehousing. Successfully led teams in creating data solutions that improved system efficiency and business decision-making, with demonstrated gains in data availability and accuracy.

Overview

8 years of professional experience

Work History

Senior Data Engineer

Walmart Global Tech India
Chennai
03.2024 - Current
  • Designed and developed ETL pipelines using Spark Scala to efficiently process large datasets, handling data volumes of 90+ million records.
  • Built and managed cloud-native solutions on Google Cloud Platform (GCP), leveraging BigQuery for analytics and Dataproc for batch data processing.
  • Integrated Kafka with Spark Streaming to support real-time ingestion and analytics on streaming data sets.
  • Optimized Spark jobs with partitioning, caching, and dynamic resource allocation to ensure high performance and scalability.
  • Automated data workflows using Automic Automation and established CI/CD pipelines for continuous integration and seamless deployments in cloud environments.
  • Managed version control and deployment of data applications using Git, ensuring traceable code changes and repository hygiene.
  • Developed data pipelines that interact with Apache Hive and Apache Hudi, ensuring data is efficiently stored, queried, and version-controlled in data lakes.
  • Worked on Hudi datasets for incremental data ingestion, providing near-real-time updates for business reporting.
  • Implemented monitoring, logging, and alerting mechanisms for real-time data pipelines to ensure high availability and timely issue resolution.
  • Developed tools and applications for monitoring data quality and integrity across all applications.
  • Collaborated closely with cross-functional teams, including data scientists and analysts, to gather requirements, translate business needs into technical specifications, and design robust data models and data products.
  • Analyzed user requirements and designed and developed ETL processes to load enterprise data into the data warehouse.
  • Developed and implemented data models, database designs, and data access and table maintenance code.
  • Streamlined data flow from diverse sources using orchestration tools such as Automic and Airflow.
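The data-quality monitoring mentioned above can be illustrated with a minimal plain-Python sketch (field names and the 5% threshold are hypothetical examples, not details of the actual pipelines):

```python
# Minimal sketch of a data-quality check over a batch of records.
# All names and thresholds below are illustrative assumptions.

def null_rate(records, field):
    """Fraction of records where `field` is missing or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)

def run_checks(records, required_fields, max_null_rate=0.05):
    """Return (field, rate) pairs whose null rate breaches the threshold."""
    return [(f, null_rate(records, f))
            for f in required_fields
            if null_rate(records, f) > max_null_rate]

batch = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 7.5},
]
# "amount" is null in 1 of 3 records (~33%), breaching the 5% threshold
alerts = run_checks(batch, ["order_id", "amount"], max_null_rate=0.05)
```

In a production pipeline the same checks would typically run inside Spark and feed an alerting channel rather than return a list.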

Data Engineer

E&Y Global Delivery Services India LLP
Hyderabad
06.2021 - 01.2024
  • Built scalable distributed data solutions using Spark for on-premises and AWS Cloud data sources
  • Developed Spark programs to perform data transformations, creating Spark DataFrames and running Spark SQL in Scala
  • Developed Sqoop jobs to ingest customer and product data into HDFS data lakes
  • Performed both full and incremental Sqoop loads from RDBMS to HDFS (landing zone)
  • Developed HQL scripts to create external tables and analyze incoming and intermediate data for analytics applications in Hive
  • Applied Hive optimizations such as partitioning, bucketing, vectorization, and indexing
  • Developed custom UDFs, UDAFs, and UDTFs in Hive
  • Worked with various file formats, including CSV, JSON, ORC, Avro, and Parquet
  • Optimized Spark jobs to process millions of records in minutes, including joins with massive tables
  • Captured change data (CDC) from sources based on the primary key field using Spark and loaded it into data lake Hive tables
  • Tuned Spark jobs using optimization techniques such as broadcasting, executor tuning, and persisting
  • Performed data profiling on source data and published data quality scores using Talend ETL across six DQ dimensions: Completeness, Accuracy, Validity, Consistency, Uniqueness, and Timeliness
  • Orchestrated Hadoop and Spark jobs using Airflow to define job dependencies and run multiple jobs in sequence
  • Provided work guidance and technical assistance to team members
  • Created technical design documents showcasing the end-to-end flow of project modules
  • Interacted with business teams to understand business problems and designed applications with the Hadoop ecosystem.
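The primary-key-based CDC step above can be sketched in plain Python (the snapshot data and `id` key are illustrative; the real work ran in Spark against Hive tables):

```python
# Hedged sketch of change-data capture keyed on a primary key:
# compare the previous snapshot with the new extract and classify rows.

def capture_changes(previous, current, pk="id"):
    """Return (inserts, updates, deletes) between two snapshots keyed by `pk`."""
    prev = {r[pk]: r for r in previous}
    curr = {r[pk]: r for r in current}
    inserts = [r for k, r in curr.items() if k not in prev]
    updates = [r for k, r in curr.items() if k in prev and prev[k] != r]
    deletes = [r for k, r in prev.items() if k not in curr]
    return inserts, updates, deletes

old = [{"id": 1, "city": "Chennai"}, {"id": 2, "city": "Pune"}]
new = [{"id": 1, "city": "Chennai"},
       {"id": 2, "city": "Mumbai"},
       {"id": 3, "city": "Delhi"}]
# id 3 is an insert, id 2 is an update, nothing was deleted
ins, upd, dele = capture_changes(old, new)
```

In Spark the same classification is usually expressed as joins on the key column rather than in-memory dictionaries.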

Data Engineer

Capgemini Technology Services Pvt Limited
Bangalore
09.2019 - 06.2021
  • Developed a Spark framework to ingest data from sources into HDFS.
  • Developed code based on business requirements to process large volumes of data effectively and efficiently.
  • Aggregated the processed data and stored it in the data warehouse.
  • Used Power BI for reporting and visualization in business analysis.
  • Imported and exported data between HDFS and Hive using Sqoop and managed data within the environment.
  • Created Hive tables, loaded data, and wrote Hive queries.
  • Optimized Hive queries, saving costs for the project.
  • Developed code to validate data against payment details, applied join operations to obtain the required data, and inserted it into Hive tables.
  • Implemented complex logic and conditions in Spark to validate data.
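The join-and-validate step against payment details can be sketched as follows (record shapes and the `PAID` status value are assumed for illustration):

```python
# Illustrative sketch: enrich orders with payment details on a shared key
# and keep only orders whose payment succeeded (an inner join plus a filter).

def validate_with_payments(orders, payments):
    """Join orders to successful payments on order_id; drop the rest."""
    paid = {p["order_id"]: p for p in payments if p.get("status") == "PAID"}
    return [{**o, "payment": paid[o["order_id"]]}
            for o in orders if o["order_id"] in paid]

orders = [{"order_id": 1, "amount": 100}, {"order_id": 2, "amount": 50}]
payments = [{"order_id": 1, "status": "PAID"},
            {"order_id": 2, "status": "FAILED"}]
# only order 1 survives the join with PAID payments
valid = validate_with_payments(orders, payments)
```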

ETL Developer

KPMG India
Bangalore
09.2016 - 09.2019
  • Developed, optimized, and managed data pipelines using the Hadoop Ecosystem to handle large-scale data ingestion, storage, and processing.
  • Designed and implemented ETL workflows using Talend to extract data from multiple sources, transform complex datasets, and load into data warehouses.
  • Used SQL for data extraction, transformation, and querying, ensuring data quality and correctness.
  • Implemented Apache Hive queries for efficient querying on large datasets stored in HDFS and created partitioned tables to enhance query performance.
  • Automated data ingestion workflows using Apache Sqoop to import/export data between relational databases and Hadoop clusters.
  • Conducted performance tuning and troubleshooting of data pipelines to reduce latency and optimize resource utilization.
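The automated incremental ingestion above follows the watermark pattern behind Sqoop's `--incremental lastmodified` mode (with `--check-column` and `--last-value`); a plain-Python sketch with invented sample rows:

```python
# Rough sketch of watermark-based incremental import: pull only rows
# modified after the last saved watermark, then advance the watermark.
# Column names and dates are illustrative assumptions.

def incremental_import(rows, last_watermark):
    """Return rows updated after `last_watermark` and the new watermark."""
    new_rows = [r for r in rows if r["updated_at"] > last_watermark]
    new_mark = max((r["updated_at"] for r in new_rows), default=last_watermark)
    return new_rows, new_mark

source = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-02-10"},
    {"id": 3, "updated_at": "2024-03-05"},
]
# rows 2 and 3 are newer than the watermark; it advances to "2024-03-05"
delta, mark = incremental_import(source, "2024-01-15")
```

ISO-8601 date strings compare correctly as plain strings, which keeps the sketch dependency-free.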

Education

Bachelor of Technology

Jawaharlal Nehru Technological University
06-2016

High School Diploma

Board Of Intermediate Public Examination
06-2011

Skills

  • BIG DATA TECHNOLOGIES : Hadoop, Spark, Hive, Hudi, Sqoop, Kafka, HBase, Scala Spark, Spark Streaming
  • LANGUAGES : Scala, Python, Spark SQL, Spark Core, SQL, HiveQL, Shell Scripting
  • CLOUD TECHNOLOGIES: Amazon Web Services (AWS) - S3, Redshift, Glue; Google Cloud Platform (GCP) - BigQuery, Dataproc
  • ETL TOOLS: Talend Data Integration, Talend Data Quality
  • IDE: Eclipse & IntelliJ
  • VERSION CONTROL: Git, Nexus & SVN
  • DATABASE: MySQL, Vertica, Oracle, MS SQL Server
  • SCHEDULING: Airflow, Automic, Autosys

Timeline

Senior Data Engineer

Walmart Global Tech India
03.2024 - Current

Data Engineer

E&Y Global Delivery Services India LLP
06.2021 - 01.2024

Data Engineer

Capgemini Technology Services Pvt Limited
09.2019 - 06.2021

ETL Developer

KPMG India
09.2016 - 09.2019

Bachelor of Technology

Jawaharlal Nehru Technological University

High School Diploma

Board Of Intermediate Public Examination