Sathish Kumar Kutikela

Hyderabad

Summary

  • 5+ years of IT experience in Big Data development using Hadoop, Hive, Spark, SQL, PySpark, and Python.
  • Experienced with AWS cloud services, including S3, EMR, Glue, and Lambda.
  • Proficient with the Snowflake cloud data warehouse.
  • Skilled in writing Spark SQL code and developing Spark scripts with the necessary optimizations.
  • Designed and created Hive external and managed tables with performance tuning.
  • Built pipelines to transfer data between Snowflake and AWS S3 storage.
  • Worked with various file formats: Avro, ORC, Parquet, JSON, and CSV.
  • Maintained code in GitHub and triggered jobs through AWS Step Functions.
  • Experienced with the Cloudera CDP platform.
  • Strong problem-solving and troubleshooting skills; effective team player.
  • Proficient in Agile methodology, actively participating in daily stand-ups, sprint planning, and retrospectives.

Overview

  • 5 years of professional experience
  • 1 certification

Work History

DATA ENGINEER

Neostats Analytics Solution Pvt Ltd
08.2024 - 02.2025
  • Developed and optimized Spark applications for data processing.
  • Developed Spark and HiveQL jobs for data ingestion, transformation, and analytics.
  • Designed and developed ETL workflows using Spark SQL to create denormalized tables.
  • Wrote and optimized SQL queries for comprehensive data analysis.
  • Performed complex data processing using Apache Spark SQL.

CONSULTANT

Capgemini Technology Services India Limited
12.2021 - 05.2022
  • Developed Spark SQL scripts for data transformation and analysis.
  • Migrated DataStage jobs to Spark SQL and Snowflake SQL.
  • Optimized resource utilization for data processing jobs.

SENIOR ASSOCIATE

Wipro Pvt Ltd
01.2020 - 01.2022
  • Designed and developed ETL workflows using Spark SQL and Hive.
  • Created Hive external tables with dynamic partitions.
  • Performed data integrity and quality checks in Hadoop.
  • Loaded and transformed structured data in various file formats using Hive.

Education

B. Tech - Electronics and Communication Engineering

Jawaharlal Nehru Technological University
Hyderabad

Skills

  • Hadoop
  • Hive
  • Spark
  • MySQL
  • Oracle
  • Cloudera CDP
  • AWS
  • S3
  • Glue
  • Lambda
  • EMR
  • Step Functions

Certification

  • Snowflake Training - Tutorialspoint: Completed comprehensive training on Snowflake's cloud data platform, including architecture, data loading, and query optimization.
  • SQL Practice and Challenges - HackerRank: Engaged in SQL challenges and practice problems to enhance query writing and optimization skills.

Projects

Project 3: SCB-Athena RB Data Hub

Duration: 08/01/24 to 02/28/25

Technologies: Spark, Hive, SQL, HQL, Python, PySpark, Control-M, PyCharm, S3, Glue, Lambda

Description: Athena is a strategic program to create a centralized data hub for business analytics, starting with the retail unit and expanding to other business areas. Built on the Hive Big Data platform, it manages large-scale data processing while following strict guidelines to ensure data accuracy, performance, and compliance.

Roles and responsibilities:

  • Design and implement scalable ETL workflows using Spark SQL to handle large datasets and complex transformations.
  • Develop and optimize Spark and PySpark jobs for data ingestion, transformation, aggregation, and analytics.
  • Write and fine-tune SQL and HiveQL queries to ensure high-performance data analysis and reporting.
  • Implement data partitioning, bucketing, and dynamic partitions in Hive for efficient query execution and data storage (illustrated in the sketch after this list).
  • Create and manage Hive external tables and HDFS directories for streamlined data organization.
  • Leverage AWS services such as S3, Glue, and Lambda to automate and enhance data pipelines.
  • Monitor, troubleshoot, and optimize Spark job performance, cluster utilization, and ETL pipelines.
  • Use Control-M to schedule, monitor, and maintain workflows for seamless operations.
  • Collaborate with business teams to gather requirements and deliver scalable, analytics-ready data solutions.
  • Ensure data governance, security, and compliance across all workflows.
  • Document processes, workflows, and best practices to support knowledge sharing and team onboarding.
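
The following is a minimal, illustrative PySpark sketch of the Spark SQL ETL pattern described above: a denormalizing aggregation written into a dynamically partitioned Hive table. The database, table, column, and S3 path names (rb_datahub.retail_txn_agg, retail_txn_raw, business_date, s3://example-bucket/...) are hypothetical placeholders, not actual Athena project objects.

# Minimal sketch of a Spark SQL ETL step with Hive dynamic partitioning;
# all table, column, and path names are illustrative placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("rb-datahub-etl-sketch")
    .enableHiveSupport()   # needed to read/write Hive tables from Spark SQL
    .getOrCreate()
)

# Allow dynamic partition inserts in Hive.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

# Read raw landed files (CSV here) and expose them to Spark SQL.
raw = spark.read.option("header", "true").csv("s3://example-bucket/raw/retail_txn/")
raw.createOrReplaceTempView("retail_txn_raw")

# Denormalize and aggregate with Spark SQL, writing into a partitioned Hive table;
# the partition column (business_date) comes last, so partitions are created dynamically.
spark.sql("""
    INSERT OVERWRITE TABLE rb_datahub.retail_txn_agg PARTITION (business_date)
    SELECT customer_id,
           product_id,
           SUM(amount) AS total_amount,
           COUNT(*)    AS txn_count,
           business_date
    FROM retail_txn_raw
    GROUP BY customer_id, product_id, business_date
""")

With dynamic partitioning enabled, a single insert fans the output into one partition per business_date without enumerating the dates by hand.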

Project 2: Cloud Data Matrix

Duration:

Technologies: Spark, Snowflake, SQL, Python, PyCharm, S3, Glue, Lambda, Step Functions.

Description: The Cloud Data Matrix is the central platform for data exchange within the organization, facilitating data transformation and processing. DataStage jobs have been migrated to Spark SQL and Snowflake SQL, enhancing the platform's efficiency and adaptability for optimal data management and exchange.

Roles and responsibilities:

  • Write and optimize SQL queries for comprehensive data analysis.
  • Perform complex data processing using Apache Spark SQL.
  • Perform data transformations in Snowflake using SQL (joins, aggregations).
  • Use AWS Glue to run Spark jobs for data transformation and processing through SnowSQL.
  • Create ETL jobs with AWS Glue to extract data from various sources and store it in Amazon S3.
  • Create S3 buckets with data lake storage, lifecycle policies, and secure access controls.
  • Build pipelines to load and unload data between Snowflake and AWS S3 storage (see the sketch after this list).
  • Process data into Snowflake for analytical querying.
  • Integrate Lambda functions with AWS services and monitor their performance.
  • Ensure pipelines have error handling and monitoring mechanisms using AWS CloudWatch.
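
Below is a hedged sketch of the Snowflake/S3 load-and-unload step described above, using the snowflake-connector-python client. The account locator, credentials, warehouse, database, table, bucket, and the MY_S3_INT storage integration are illustrative assumptions, not actual Cloud Data Matrix objects.

# Illustrative Snowflake <-> S3 load/unload sketch; connection details, bucket,
# table, and storage-integration names are placeholders, not real project objects.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.us-east-1",   # placeholder account locator
    user="etl_user",
    password="***",
    warehouse="ETL_WH",
    database="CDM_DB",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()

    # Load Parquet files from S3 into a Snowflake table (ingest direction).
    cur.execute("""
        COPY INTO CDM_DB.PUBLIC.ORDERS
        FROM 's3://example-cdm-bucket/orders/'
        STORAGE_INTEGRATION = MY_S3_INT
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)

    # Unload a transformed result set back to S3 (export direction).
    cur.execute("""
        COPY INTO 's3://example-cdm-bucket/exports/orders_daily/'
        FROM (
            SELECT order_date, region, SUM(amount) AS total_amount
            FROM CDM_DB.PUBLIC.ORDERS
            GROUP BY order_date, region
        )
        STORAGE_INTEGRATION = MY_S3_INT
        FILE_FORMAT = (TYPE = PARQUET)
        OVERWRITE = TRUE
    """)
finally:
    conn.close()

In a pipeline like the one described above, these statements would typically run from a Glue or Lambda step, with CloudWatch providing the error handling and monitoring.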

Project 1: Unilever Insight Hub

Duration:

Technologies: Apache Spark, Hive, SQL, HDFS, Sqoop, IntelliJ IDE, and Shell Scripting

Description: Unilever analyzes customer data from various sources, including social media, online reviews, loyalty programs, and sales transactions, to understand consumer preferences, behavior patterns, and sentiment. This data is used to tailor marketing campaigns, develop targeted product offerings, and improve the overall customer experience, while also identifying inefficiencies and potential risks for better decision-making and enhanced supply chain management.

Roles and responsibilities:

  • Design and develop ETL workflows using Spark SQL to create denormalized tables.
  • Create Hive external tables with appropriate dynamic partitions.
  • Create HDFS directories to store data and Hive tables.
  • Implement data integrity and data quality checks in Hadoop using Hive and Linux scripts.
  • Collect, aggregate, and move data from servers to HDFS.
  • Create Hive tables, load data, and write Hive queries that run as MapReduce jobs.
  • Load and transform structured data into various file formats (Avro, Parquet) in Hive (illustrated in the sketch after this list).
  • Use Sqoop to export analyzed data to relational databases for report generation.
  • Perform actions and transformations (wide and narrow transformations) based on project requirements.
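
A small, illustrative PySpark sketch of the Hive external table and file-format work listed above: an external table defined over HDFS, with Avro input rewritten as Parquet. The insight_hub database, table, columns, and HDFS paths are hypothetical placeholders, not actual project objects.

# Sketch: define a Hive external table over HDFS, then load structured Avro
# input into it as Parquet; all names and paths are illustrative placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("insight-hub-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# External table: Hive tracks only metadata; the data files stay at the HDFS location.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS insight_hub.customer_feedback (
        customer_id STRING,
        channel     STRING,
        review_text STRING,
        score       DOUBLE
    )
    STORED AS PARQUET
    LOCATION 'hdfs:///data/insight_hub/customer_feedback'
""")

# Read Avro input landed in HDFS (requires the spark-avro package) and
# rewrite it into the Parquet-backed Hive table.
raw = spark.read.format("avro").load("hdfs:///staging/customer_feedback_avro/")

(raw.select("customer_id", "channel", "review_text", "score")
    .write
    .mode("overwrite")
    .insertInto("insight_hub.customer_feedback"))

The Sqoop export of the analyzed tables to downstream relational databases would sit outside this script as a scheduled command-line step.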
