5+ years of IT experience in Big Data development using Hadoop, Hive, Spark, SQL, PySpark, and Python.
Experienced with AWS cloud services, including S3, EMR, Glue, and Lambda.
Proficient with the Snowflake cloud data warehouse.
Skilled in writing Spark SQL code and developing Spark scripts with the necessary optimizations.
Designed and created Hive external and managed tables with performance tuning.
Built pipelines for data transfer between Snowflake and AWS S3 storage.
Worked with various file formats: Avro, ORC, Parquet, JSON, and CSV.
Maintained code in GitHub and triggered jobs through AWS Step Functions.
Experienced with the Cloudera Data Platform (CDP).
Strong problem-solving and troubleshooting skills; effective team player.
Proficient in Agile methodology, actively participating in daily stand-ups, sprint planning, and retrospectives.
#Project 3: SCB-Athena RB Data Hub
Duration: 08/01/24 to 02/28/25
Technologies: Spark, Hive, SQL, HQL, Python, PySpark, Control-M, PyCharm, S3, Glue, Lambda
Description: Athena is a strategic program to create a centralized data hub for business analytics, starting with the retail unit and expanding to other business areas. Built on the Hive Big Data platform, it manages large-scale data processing while following strict guidelines to ensure data accuracy, performance, and compliance.
Roles and responsibilities:
#Project 2: Cloud Data Matrix
Duration:
Technologies: Spark, Snowflake, SQL, Python, PyCharm, S3, Glue, Lambda, Step Functions.
Description: The Cloud Data Matrix is the central platform for data exchange within the organization, facilitating transformation and processing. DataStage jobs have been migrated to Spark SQL and Snowflake SQL, enhancing platform efficiency and adaptability for optimal data management and exchange.
Roles and responsibilities:
Write and optimize SQL queries for comprehensive data analysis.
Perform complex data processing using Apache Spark SQL.
Perform data transformations in Snowflake using SQL (joins, aggregations).
Run Spark jobs on AWS Glue for data transformation and processing via Snowflake SQL (a Glue sketch follows this list).
Create ETL jobs with AWS Glue to extract data from various sources and store it in Amazon S3.
Create S3 buckets with data lake storage, lifecycle policies, and secure access controls.
Build pipelines to load and unload data between Snowflake and AWS S3 storage (see the Snowflake sketch after this list).
Process data into Snowflake for analytical querying.
Integrate Lambda functions with AWS services and monitor their performance (see the Lambda sketch after this list).
Ensure pipelines have error handling and monitoring mechanisms using AWS CloudWatch.
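Below is a minimal PySpark sketch of the kind of Glue transformation job described above; the job arguments, S3 paths, view name, and columns are hypothetical placeholders rather than the actual project code.

    import sys
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext

    # Hypothetical job arguments; real names and paths differed per pipeline.
    args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path", "target_path"])

    sc = SparkContext()
    glue_context = GlueContext(sc)
    spark = glue_context.spark_session

    # Read raw CSV data landed in S3 and register it for Spark SQL.
    raw_df = spark.read.option("header", "true").csv(args["source_path"])
    raw_df.createOrReplaceTempView("orders_raw")

    # Spark SQL transformation: a simple aggregation as an illustration.
    daily_totals = spark.sql("""
        SELECT order_date,
               customer_id,
               SUM(CAST(amount AS DOUBLE)) AS total_amount
        FROM orders_raw
        GROUP BY order_date, customer_id
    """)

    # Write curated output back to S3 as Parquet for downstream consumers.
    daily_totals.write.mode("overwrite").parquet(args["target_path"])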
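The Snowflake load and unload pipelines were typically driven by COPY INTO statements; the sketch below uses the snowflake-connector-python library, with the connection details, external stage, and table names as placeholders.

    import snowflake.connector

    # Placeholder connection details; real values would come from a secrets store.
    conn = snowflake.connector.connect(
        account="my_account",
        user="etl_user",
        password="********",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="PUBLIC",
    )

    cur = conn.cursor()
    try:
        # Load: copy Parquet files from an external S3 stage into a Snowflake table.
        cur.execute("""
            COPY INTO ANALYTICS.PUBLIC.DAILY_TOTALS
            FROM @ext_s3_stage/daily_totals/
            FILE_FORMAT = (TYPE = PARQUET)
            MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
        """)

        # Unload: export query results back to S3 for downstream systems.
        cur.execute("""
            COPY INTO @ext_s3_stage/exports/daily_totals/
            FROM (SELECT * FROM ANALYTICS.PUBLIC.DAILY_TOTALS)
            FILE_FORMAT = (TYPE = PARQUET)
            OVERWRITE = TRUE
        """)
    finally:
        cur.close()
        conn.close()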
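For the Lambda integration and CloudWatch monitoring items, a simplified handler sketch is shown below; the event fields, bucket, and key are assumptions for illustration only.

    import logging

    import boto3

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    s3 = boto3.client("s3")


    def lambda_handler(event, context):
        """Triggered by an orchestrator; validates that an expected S3 object exists."""
        bucket = event.get("bucket", "my-data-lake-bucket")        # hypothetical default
        key = event.get("key", "curated/daily_totals/_SUCCESS")    # hypothetical default
        try:
            s3.head_object(Bucket=bucket, Key=key)
            logger.info("Validation passed for s3://%s/%s", bucket, key)
            return {"status": "SUCCEEDED", "object": f"s3://{bucket}/{key}"}
        except Exception:
            # Errors logged here surface in CloudWatch Logs and can drive alarms.
            logger.exception("Validation failed for s3://%s/%s", bucket, key)
            raise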
#Project 1: Unilever Insight Hub
Duration:
Technologies: Apache Spark, Hive, SQL, HDFS, Sqoop, IntelliJ IDE, and Shell Scripting.
Description: Unilever analyzes customer data from various sources, including social media, online reviews, loyalty programs, and sales transactions, to understand consumer preferences, behavior patterns, and sentiment. This data is used to tailor marketing campaigns, develop targeted product offerings, and improve overall customer experience, while also identifying inefficiencies and potential risks for better decision-making and enhanced supply chain management.
Roles and responsibilities:
Design and develop ETL workflows using Spark SQL to create denormalized tables.
Create Hive external tables with appropriate dynamic partitions (see the Hive sketch after this list).
Create HDFS directories to store data and Hive tables.
Implement data integrity and data quality checks in Hadoop using Hive and Linux scripts.
Collect, aggregate, and move data from servers to HDFS.
Create Hive tables, load data, and write Hive queries that execute as MapReduce jobs.
Load and transform structured data into various file formats (Avro, Parquet) in Hive.
Use Sqoop to export analyzed data to relational databases for report generation.
Perform actions and transformations (wide and narrow transformations) based on project requirements (see the transformation sketch after this list).
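A sketch of the Hive external table and dynamic-partition load pattern referenced above, issued through Spark SQL; the database, table, columns, staging table, and HDFS location are illustrative assumptions.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("insight-hub-load")      # hypothetical application name
        .enableHiveSupport()
        .getOrCreate()
    )

    # External table over an HDFS directory, partitioned by load date, stored as Parquet.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS retail.customer_feedback (
            customer_id STRING,
            channel     STRING,
            sentiment   STRING,
            rating      INT
        )
        PARTITIONED BY (load_dt STRING)
        STORED AS PARQUET
        LOCATION '/data/retail/customer_feedback'
    """)

    # Allow dynamic partitioning so each load_dt value creates its own partition.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    # Insert from a staging table, letting Hive resolve partitions dynamically.
    spark.sql("""
        INSERT OVERWRITE TABLE retail.customer_feedback
        PARTITION (load_dt)
        SELECT customer_id, channel, sentiment, rating, load_dt
        FROM retail.customer_feedback_stg
    """)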
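To illustrate the narrow versus wide transformation work, a small PySpark example follows; the input path and column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("transform-demo").getOrCreate()

    sales = spark.read.parquet("/data/retail/sales")   # hypothetical input path

    # Narrow transformations: filter and select operate partition-by-partition, no shuffle.
    recent = sales.filter(F.col("sale_date") >= "2024-01-01").select("store_id", "amount")

    # Wide transformation: groupBy triggers a shuffle to bring each store's rows together.
    store_totals = recent.groupBy("store_id").agg(F.sum("amount").alias("total_amount"))

    store_totals.show(10)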