Bhavitha N - Big Data Engineer - Tata Consultancy Services

Summary

Results-driven Big Data Engineer with 3.8 years of hands-on experience leveraging Hadoop, Spark, Hive, and associated ecosystem tools to process and manage large datasets efficiently. Strong expertise in Spark RDD transformations, preprocessing, caching, and performance optimization, along with a solid understanding of Spark's integration with Hadoop and Hive. Proficient in Hive schema evolution, SQL-style querying, and managing various serialized data formats such as Avro, Parquet, and ORC. Extensive experience using Sqoop for data movement between Hadoop and relational databases. Skilled in developing large-scale PySpark pipelines on AWS EMR while integrating Snowflake, Web APIs, and S3. Experienced in orchestrating workflows using Apache Airflow to enhance operational efficiency.

Overview

4 years of professional experience

Work History

Big Data Engineer

Tata Consultancy Services

Bangalore

07.2022 - Current

Designed and developed large-scale PySpark data pipelines on AWS EMR to ingest and process data from S3, Snowflake, and multiple Web APIs, ensuring consistent and reliable data delivery for downstream teams.
Built and executed over 8 PySpark extraction jobs to read from S3, Web APIs, and Snowflake, applying transformations such as filtering, aggregations, window functions, joins, and flattening to prepare intermediate datasets.
Developed a final master PySpark job to consolidate all intermediate outputs, apply business logic, and write curated datasets to S3, with future integration planned for Snowflake and Elasticsearch.
Optimized PySpark job performance by tuning Spark configurations, caching strategies, partitioning, and shuffle operations, which resolved memory issues, cut runtime by 30%, and improved overall pipeline stability.
Handled JSON, Avro, and Parquet data formats, ensuring efficient serialization and compatibility across processing stages.
Managed data ingestion from Snowflake and addressed schema evolution issues by updating PySpark logic to align with source-side structural changes.
Utilized Sqoop for data movement between Hadoop clusters, relational databases, and cloud storage, supporting both full and incremental loads.
Automated end-to-end pipeline orchestration using Apache Airflow, designing DAGs to manage EMR cluster creation, job execution, termination, and daily scheduling at 5 AM EST.
Enabled EMR auto-scaling to handle memory-intensive workloads and improve cluster reliability during peak processing.
Resolved data skew issues using salting techniques to ensure balanced partition distribution and smoother Spark execution.
Used Airflow's clear-task functionality to restart failed pipelines, reducing rerun effort and cutting infrastructure costs by 15%.
Collaborated with data scientists, analysts, and business teams to gather requirements and deliver robust, production-ready data infrastructure.
Authored documentation and best practices for Spark workflows, improving maintainability and consistency across the team.
Followed Agile methodology with 2-week sprints, tracked work in JIRA, participated in daily standups, and ensured timely delivery of assigned stories.
Committed code to GitHub, raised pull requests, and worked with DevOps teams for production deployments.
Technologies: Spark, Python, HDFS, Hive, Sqoop, SQL, AWS

Education

Bachelor's -

Vignan's Foundation For Science, Technology & Research

Guntur

06-2022

High school -

Sri Chaitanya Junior College

Chittoor

06-2018

High school -

R.K.Model School

Chittoor

06-2016

Skills

Data Ecosystem: Hadoop, Sqoop, Hive, Apache Spark, and AWS
Cloud Skills: AWS
Distribution: Cloudera 5.12
Databases: MySQL
Languages: Python, SQL
Operating Systems: Linux and Windows
IDE and Build Tools: IntelliJ IDEA
Project Management: Jira, Agile methodology
Version Control: Git

Websites

https://www.linkedin.com/in/bhavitha-n
