SURUTHI GOBIKA S

Data Engineer
Pollachi, TN

Summary

Dynamic Data Engineer with over three years of experience in building and optimizing data solutions using AWS services. Proficient in Python, SQL, and a suite of AWS tools such as S3, EC2, Lambda, EMR, Glue, and Airflow, with a strong emphasis on designing and enhancing complex data pipelines and ETL processes. Recognized for delivering impactful contributions through seamless data integration and robust solution development. A collaborative team player thriving in fast-paced environments, adept at adapting to evolving requirements while driving collective success.

Overview

3 years of professional experience
3 Certifications

Work History

Data Engineer

AECO Energy
08.2024 - Current
  • Developed custom PySpark (Python) functions to process semi-structured data (Avro, Parquet, ORC) using Hive, handling over 2 TB of data daily from S3, web APIs, and Snowflake.
  • Wrote and optimized complex SQL queries (joins, CTEs, window functions, and aggregations) for data validation, reconciliation, and analytics across Snowflake and downstream datasets.
  • Worked extensively with Snowflake as a cloud data warehouse source, integrating data via the Snowflake Spark Connector, handling schema changes, NULL logic, and performing SQL-based validations.
  • Improved PySpark job completion time by 20% by identifying and resolving Spark performance bottlenecks, data skew, and EMR resource contention issues.
  • Reduced AWS EMR costs by 30% by leveraging EC2 Spot Instances, enabling dynamic allocation, and optimizing executor memory and cluster sizing for long-running Spark jobs.
  • Automated 8 Spark jobs using Apache Airflow DAGs (instead of Step Functions) to orchestrate extraction, transformation, and master aggregation jobs on EMR, with proper dependency handling.
  • Designed and managed Hive external tables on AWS S3, supporting over 15 TB of data and enabling S3 as an intermediate data lake for downstream analytics teams.
  • Increased data processing efficiency by 25% by optimizing joins, window functions, partitioning, and anti-join logic in PySpark when processing relational and semi-structured data from S3 and Snowflake.
  • Supported the modernization of legacy data platforms by assisting in on-premises to cloud migration, resulting in a 40% reduction in processing time and improved overall system scalability.

Data Engineer

Tata Consultancy Services
07.2022 - 07.2024
  • Transformed large-scale datasets using Spark DataFrame transformations, reducing query time by 30%, and improving data accessibility for downstream analytics teams.
  • Managed and automated 8+ ETL data pipelines using PySpark on AWS EMR and AWS S3, improving data pipeline efficiency and reliability by 40%.
  • Used Git-based version control and CI/CD pipelines for Airflow DAGs, reducing deployment errors by 25% and enhancing collaboration among team members.
  • Optimized Sqoop imports and exports between AWS EMR and Snowflake, increasing data transfer speed by 35%, and ensuring timely ingestion and delivery of analytics data.
  • Governed Hive partitions and buckets, improving query performance by 45% through optimized data organization, aligned with reporting requirements.
  • Debugged and resolved performance bottlenecks in Spark jobs, reducing overall data processing time by more than 10 hours per week, and enhancing system reliability and throughput.
  • Applied Spark DataFrame transformations and actions to process large-scale structured and semi-structured datasets, including filtering, mapping, reducing, grouping, and aggregating data.
  • Conducted root cause analysis on data discrepancies, providing timely resolutions to enhance reliability.
  • Client: Phoenix Group Holdings, UK

Education

B.E. - Electronics and Communication

P.A. College of Engineering and Technology
Pollachi
07.2022

Skills

AWS (S3, EC2, Lambda, EMR, Glue, Redshift, Airflow)

Certification

Data manipulation using Pandas, Udemy

PUBLICATIONS

  • IoT enabled paddy field monitoring and disease detection system, ICADSIS 2022 Conference, Indian Journal of Natural Sciences, Vol. 13, Issue 73, 08/2022
  • A Review of Advancements in Battery Technologies for Electric Vehicles, Future Electric Vehicular Mobility and Its Challenges (ICFEVMC-2021), AICTE

Accomplishments

Demonstrated strong proficiency in Big Data technologies by clearing the iON Proctored Assessment, earning a ₹40K incentive.

Work Availability

Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
Morning, afternoon, evening
