SURUTHI GOBIKA S

Data Engineer

Pollachi, TN

Summary

Dynamic Data Engineer with over three years of experience in building and optimizing data solutions using AWS services. Proficient in Python, SQL, and AWS tools such as S3, EC2, Lambda, EMR, and Glue, as well as Apache Airflow, with a strong emphasis on designing and enhancing complex data pipelines and ETL processes. Recognized for delivering impactful contributions through seamless data integration and robust solution development. A collaborative team player who thrives in fast-paced environments and adapts to evolving requirements while driving collective success.

Work History

Data Engineer

AECO Energy

08.2024 - Current

Developed custom PySpark (Python) functions to process semi-structured data (Avro, Parquet, ORC) using Hive, handling over 2 TB of data daily from S3, web APIs, and Snowflake.
Wrote and optimized complex SQL queries (joins, CTEs, window functions, and aggregations) for data validation, reconciliation, and analytics across Snowflake and downstream datasets.
Worked extensively with Snowflake as a cloud data warehouse source, integrating data via the Snowflake Spark Connector, handling schema changes, NULL logic, and performing SQL-based validations.
Improved PySpark job completion time by 20% by identifying and resolving Spark performance bottlenecks, data skew, and EMR resource contention issues.
Reduced AWS EMR costs by 30% by leveraging EC2 Spot Instances, enabling dynamic allocation, and optimizing executor memory and cluster sizing for long-running Spark jobs.
Automated 8 Spark jobs using Apache Airflow DAGs (instead of Step Functions) to orchestrate extraction, transformation, and master aggregation jobs on EMR, with proper dependency handling.
Designed and managed Hive external tables on AWS S3, supporting over 15 TB of data, and enabling S3 as an intermediate data lake for downstream analytics teams.
Increased data processing efficiency by 25% by optimizing joins, window functions, partitioning, and anti-join logic in PySpark when processing relational and semi-structured data from S3 and Snowflake.
Supported the modernization of legacy data platforms by assisting in on-premises to cloud migration, resulting in a 40% reduction in processing time, and improved overall system scalability.

Data Engineer

Tata Consultancy Services

07.2022 - 07.2024

Transformed large-scale datasets using Spark DataFrame transformations, reducing query time by 30%, and improving data accessibility for downstream analytics teams.
Managed and automated 8+ ETL data pipelines using PySpark on AWS EMR and AWS S3, improving data pipeline efficiency and reliability by 40%.
Utilized version control and CI/CD pipelines using Git for Airflow DAGs, reducing deployment errors by 25%, and enhancing collaboration among team members.
Optimized Sqoop imports and exports between AWS EMR and Snowflake, increasing data transfer speed by 35%, and ensuring timely ingestion and delivery of analytics data.
Governed Hive partitions and buckets, improving query performance by 45% through optimized data organization, aligned with reporting requirements.
Debugged and resolved performance bottlenecks in Spark jobs, reducing overall data processing time by more than 10 hours per week, and enhancing system reliability and throughput.
Applied Spark DataFrame transformations and actions to process large-scale structured and semi-structured datasets, including filtering, mapping, reducing, grouping, and aggregating data.
Conducted root cause analysis on data discrepancies, providing timely resolutions to enhance reliability.
Client: Phoenix Group Holdings, UK

Education

B.E. - Electronics and Communication

P.A. College of Engineering and Technology

Pollachi

07.2022

Skills

AWS (S3, EC2, Lambda, EMR, Glue, Redshift, Airflow)

AWS Step Functions

AWS CloudWatch

EC2 Spot Instances

Snowflake

Python

SQL

Linux

Cloudera

Apache Hadoop

HDFS

Hive

Sqoop

Apache Spark

PySpark

Avro

ORC

Visual Studio Code

JIRA

Databricks

PyCharm

IntelliJ IDEA

Certification

Data manipulation using Pandas, Udemy

Publications

IoT-Enabled Paddy Field Monitoring and Disease Detection System, ICADSIS 2022 Conference, Indian Journal of Natural Sciences, Vol. 13, Issue 73, 08/22
A Review of Advancements in Battery Technologies for Electric Vehicles, Future Electric Vehicular Mobility and Its Challenges (ICFEVMC-2021), AICTE

Accomplishments

Demonstrated strong proficiency in Big Data technologies by clearing the iON Proctored Assessment, earning a ₹40K incentive.
