Venkata Sai Kalyan Mahankali

Hyderabad

Summary

Data Engineer with 3+ years of experience specializing in cloud technologies, data integration, and automation using AWS Cloud Services, PySpark, and Apache Airflow. Passionate about building scalable data solutions and automating processes to improve data flow and accessibility. Strong background in ETL pipeline design, data curation, and real-time processing, with a commitment to enhancing data quality and usability. Adept at problem-solving and supporting data-driven decision-making, with a focus on creating high-performance systems that support business growth.

Overview

3 years of professional experience

Work History

Data Engineer

Modak Analytics
Hyderabad
03.2025 - Current
  • Performed data normalization using various data cleaning and transformation techniques to ensure consistency and quality.
  • Performed knowledge extraction to standardize and structure data for efficient loading into the Neo4j graph database.

Data Engineer

Modak Analytics
Hyderabad
05.2023 - 03.2025
  • Implemented data modeling for tables in RDS, designing architecture for scalability, efficiency, and schema consistency.
  • Developed and optimized crawlers for structured and unstructured data, automating metadata extraction, schema validation, and ingestion workflows, and storing metadata dynamically in RDS.
  • Extracted metadata at multiple levels, ensuring schema-aware ingestion pipelines and seamless data tracking.
  • Designed and optimized Apache Airflow pipelines, building DAGs to automate workflows, managing dependencies, and ensuring fault tolerance.
  • Ingested structured and unstructured data, ensuring proper handling, storage, and processing while maintaining data integrity. Mapped source data types to Hive for accurate schema tracking at ingestion.
  • Developed robust data validation and monitoring, including source drift detection, schema drift analysis, and automated data quality checks.
  • Performed data profiling and threshold verification, using Python for small datasets and Spark for large datasets, reducing CPU utilization while ensuring data quality.
  • Implemented S3-based data quality checks before Hive ingestion, reducing data inconsistencies by 90% and cutting reprocessing by 70%.
  • Optimized cost efficiency in Apache Airflow, reducing infrastructure costs by 40% through resource-aware execution and parallelization.
  • Resolved inefficiencies in dynamic task mapping within task groups, where excessive workers were created for each dataset. Developed a custom worker operator to limit active workers, dynamically select datasets from RDS, and prevent duplicate processing, reducing resource overhead and improving performance.
  • Designed a template-based DAG flow in Apache Airflow (with templates such as HTTP and S3) to standardize workflows, enabling automatic DAG generation from RDS inserts based on template type. Reduced manual effort and enabled bulk updates for consistency and easier maintenance.
  • Monitored and troubleshot Apache Airflow pipelines, implementing orchestration strategies for efficient processing and dynamic resource allocation.

Data Engineer

Modak Analytics
Hyderabad
08.2022 - 05.2023
  • Crawled multiple data sources, established connections, extracted metadata, and used it to build pipelines that ingested the data into the system.
  • Diagnosed and resolved access and network issues while ingesting data from structured and unstructured sources, such as FTP, SFTP, NFS File Share, SMB, Unix Shares, and AWS S3 Buckets.
  • Performed data cleaning and transformation based on data science requirements, developing pipelines to meet analytical and operational needs.

Education

Bachelor of Technology - Information Technology

Sree Vidyanikethan Engineering College
Tirupati
05.2022

Skills

  • PySpark
  • Python
  • SQL
  • ETL
  • Airflow
  • Databricks
  • AWS (S3, IAM, EC2, Lambda, Glue, Athena, Kinesis, SNS)
  • Git
  • Linux

Hobbies and Interests

  • Watching Movies/Series
  • Playing Cricket and Badminton
  • Playing Mobile Games

Languages

  • English, Full Professional Proficiency
  • Hindi, Professional Working Proficiency
  • Telugu, Native or Bilingual Proficiency

Timeline

Data Engineer

Modak Analytics
03.2025 - Current

Data Engineer

Modak Analytics
05.2023 - 03.2025

Data Engineer

Modak Analytics
08.2022 - 05.2023

Bachelor of Technology - Information Technology

Sree Vidyanikethan Engineering College