Bhavitha N

Bangalore

Summary

Results-driven Big Data Engineer with 3.8 years of hands-on experience using Hadoop, Spark, Hive, and related ecosystem tools to process and manage large datasets efficiently. Strong expertise in Spark RDD transformations, preprocessing, caching, and performance optimization, along with a solid understanding of Spark's integration with Hadoop and Hive. Proficient in Hive schema evolution, SQL-style querying, and working with serialized data formats such as Avro, Parquet, and ORC. Extensive experience using Sqoop to move data between Hadoop and relational databases. Skilled in developing large-scale PySpark pipelines on AWS EMR that integrate Snowflake, Web APIs, and S3. Experienced in orchestrating workflows with Apache Airflow to improve operational efficiency.

Overview

4 years of professional experience

Work History

Big Data Engineer

Tata Consultancy Services
Bangalore
07.2022 - Current
  • Designed and developed large-scale PySpark data pipelines on AWS EMR to ingest and process data from S3, Snowflake, and multiple Web APIs, ensuring consistent and reliable data delivery for downstream teams.
  • Built and ran more than 8 PySpark extraction jobs reading from S3, Web APIs, and Snowflake, applying transformations such as filtering, aggregations, window functions, joins, and flattening to prepare intermediate datasets.
  • Developed a final master PySpark job that consolidates all intermediate outputs, applies business logic, and writes curated datasets to S3, with integration into Snowflake and Elasticsearch planned.
  • Optimized PySpark job performance by tuning Spark configurations, caching strategies, partitioning, and shuffle operations and by resolving memory issues, achieving 30% faster runtimes and improving overall pipeline stability.
  • Handled semi-structured and serialized data formats, including JSON, Avro, and Parquet, ensuring efficient serialization and compatibility across processing stages.
  • Managed data ingestion from Snowflake and addressed schema evolution issues by updating PySpark logic to align with source-side structural changes.
  • Used Sqoop to move data between Hadoop clusters, relational databases, and cloud storage, supporting both full and incremental loads.
  • Automated end-to-end pipeline orchestration with Apache Airflow, designing DAGs that manage EMR cluster creation, job execution, and cluster termination, scheduled to run daily at 5 AM EST.
  • Enabled EMR auto-scaling to handle memory-intensive workloads and improve cluster reliability during peak processing.
  • Resolved data skew issues using salting techniques to ensure balanced partition distribution and smoother Spark execution.
  • Used Airflow's clear-task functionality to restart failed pipelines, reducing rerun effort and cutting infrastructure costs by 15%.
  • Collaborated with data scientists, analysts, and business teams to gather requirements and deliver robust, production-ready data infrastructure.
  • Authored documentation and best practices for Spark workflows, improving maintainability and consistency across the team.
  • Followed Agile methodology with two-week sprints, tracked work in Jira, participated in daily stand-ups, and ensured timely delivery of assigned stories.
  • Committed code to GitHub, raised pull requests, and coordinated with DevOps teams on production deployments.
  • Technologies: Spark, Python, HDFS, Hive, Sqoop, SQL, AWS

Education

Bachelor's

Vignan's Foundation For Science, Technology & Research
Guntur
06.2022

High School

Sri Chaitanya Junior College
Chittoor
06.2018

High School

R.K. Model School
Chittoor
06.2016

Skills

  • Data Ecosystem: Hadoop, Sqoop, Hive, Apache Spark, and AWS
  • Cloud Skills: AWS
  • Distribution: Cloudera 5.12
  • Databases: MySQL
  • Languages: Python, SQL
  • Operating Systems: Linux, Windows
  • IDE and Build Tools: IntelliJ IDEA
  • Project Management: Jira, Agile methodology
  • Version Control: Git
