Harshanth DS

Chennai

Summary

8+ years of IT experience, including 5.8 years in the Big Data ecosystem, with expertise in Big Data technologies – Hadoop, Hive, Spark, SQL, Sqoop, and Oozie. Adept at leveraging Google Cloud Platform (GCP) for scalable data solutions spanning data ingestion, migration, processing, and storage. Productive engineer with a proven track record of successful project delivery and quality outcomes through leadership and team motivation.


  • Proficient in creating and managing Hive tables including managed, external, and partitioned tables, with support for schema evolution and handling of Avro, Parquet, and ORC file formats.
  • Skilled in data migration from on-premise RDBMS and mainframe systems to GCP BigQuery using Sqoop, including full and incremental loads with built-in data validation and cleansing.
  • Developed scalable data pipelines on GCP using BigQuery, Dataflow, Dataproc, Pub/Sub, and Cloud Composer (Airflow) to support large-scale data ingestion, transformation, and enrichment.
  • Adept at building pipelines involving data munging, wrangling, curation, enrichment, and applying analytical and windowing functions to generate business-ready datasets.
  • Expertise in Apache Spark using RDDs and DataFrames to process structured and unstructured data, with strong proficiency in Spark SQL for writing complex transformations and business logic.
  • Optimized Spark and Hive execution using partitioning, bucketing, compression, predicate pushdown, memory tuning, and broadcast variables (a brief sketch follows this list).
  • Experienced in automating and scheduling data workflows using Oozie and Cloud Composer, supporting both batch and near real-time processing requirements.
  • Hands-on experience handling semi-structured data (CSV, JSON, XML) in Spark and Hive, including serialization/deserialization, and managing storage with HDFS and GCS.
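
A brief PySpark sketch of the windowing and broadcast-join patterns above; table, column, and bucket names are illustrative, not from any client engagement.

    # Enrich a fact table with a small dimension and rank rows per customer.
    from pyspark.sql import SparkSession, Window, functions as F

    spark = (SparkSession.builder
             .appName("enrichment-sketch")
             .enableHiveSupport()
             .getOrCreate())

    sales = spark.table("staging.sales_raw")        # hypothetical source table
    regions = spark.table("reference.region_dim")   # small dimension table

    # Broadcast the small dimension so the join avoids a full shuffle.
    enriched = sales.join(F.broadcast(regions), "region_id")

    # Windowing: rank each customer's transactions by amount.
    w = Window.partitionBy("customer_id").orderBy(F.col("amount").desc())
    ranked = enriched.withColumn("txn_rank", F.row_number().over(w))

    # Partitioned Parquet output keeps downstream scans selective.
    (ranked.write.mode("overwrite")
           .partitionBy("txn_date")
           .parquet("gs://example-bucket/curated/sales_enriched"))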

Overview

9 years of professional experience

Work History

Senior Data Engineer

Tiger Analytics
12.2024 - Current

Client: PepsiCo

Domain: Data Governance

Team Size: 10

Environment: Python, FastAPI, OpenAI, Azure OpenAI, LangChain, LangGraph, Pandas, Pydantic, REST APIs, Git, GitHub Actions.

  • Developed an Assisted Data Module for PepsiCo to automate metadata generation and source-to-target mapping using Generative AI (LLM) with FastAPI, LangChain, and LangGraph, enhancing data lineage traceability and accelerating onboarding of new data assets.
  • Built LLM-powered agents to extract schema intelligence and semantic relationships from source systems, enabling smart metadata enrichment and integration workflows.
  • Designed and optimized data ingestion workflows using Cloud Dataflow to process and load streaming data from sources like Apache Kafka or Cloud Pub/Sub into BigQuery in near real-time.
  • Implemented Change Data Capture (CDC) mechanisms to capture real-time data changes from relational databases and stream them into BigQuery for up-to-date analytics and reporting (a streaming sketch follows this list).
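
A hedged sketch of the streaming pattern described above, using the Apache Beam Python SDK (which Dataflow runs); topic, table, and schema names are placeholders.

    # Stream JSON change events from Pub/Sub into BigQuery in near real-time.
    import json
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)  # add --runner=DataflowRunner for Dataflow

    with beam.Pipeline(options=options) as p:
        (p
         | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
               topic="projects/example-project/topics/cdc-events")
         | "ParseJson" >> beam.Map(json.loads)
         | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
               "example-project:analytics.cdc_events",
               schema="id:STRING,op:STRING,updated_at:TIMESTAMP",
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))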

Principal Software Engineer - Data

Maveric Systems Private Limited
08.2023 - 11.2024

Client: Citibank

Domain: Banking

Team Size: 20

Environment: Hadoop, HDFS, Hive, PySpark, Spark SQL, Cloudera, GCP, Airflow, Terraform, Qlik Replicate, DDIT

Project Details:

  • Built end-to-end data pipelines using munging, wrangling, curation, enrichment, and windowing techniques; created views for Data Governance and handled SCD Types 1, 2, and 3.
  • Migrated Hadoop Spark jobs to GCS and DataProc; designed optimized Spark jobs for large-scale processing.
  • Used Cloud Composer (Airflow) for orchestration; integrated with BigQuery, GCS, and Pub/Sub; stored outputs in the curated layer (see the DAG sketch after this list).
  • Used Qlik Replicate to replicate Oracle/MySQL data into BigQuery; provisioned GCP resources with Terraform; built an ingestion framework for Rocket data into EAP, creating three key tables.
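
An illustrative Cloud Composer (Airflow) DAG in the spirit of the orchestration described above; project, dataset, and query are hypothetical.

    # Daily refresh of a curated-layer table via a BigQuery job.
    from datetime import datetime
    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

    with DAG(
        dag_id="curated_layer_refresh",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        refresh = BigQueryInsertJobOperator(
            task_id="refresh_curated_table",
            configuration={
                "query": {
                    "query": ("SELECT * FROM `example.raw.events` "
                              "WHERE event_date = CURRENT_DATE()"),
                    "destinationTable": {
                        "projectId": "example",
                        "datasetId": "curated",
                        "tableId": "events_daily",
                    },
                    "writeDisposition": "WRITE_TRUNCATE",
                    "useLegacySql": False,
                }
            },
        )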

Big Data Developer

Optimum Infosystems Pvt. Ltd.
12.2021 - 08.2023

Client: Standard Chartered

Domain: Banking

Team Size: 20

Environment: Hadoop, Hive, SQL, PySpark, Airflow, Jupyter, PuTTY, Oracle, Unix, Git, Control-M

Project Details:

  • Conducted TPSA-MFU security risk assessments to evaluate Information and Cyber Security (ICS) controls for third parties with access to the bank's non-public information.
  • Managed MOSAIC | HR Feed Sourcing, extracting HR data from the COO Data Lake and loading it into the MOSAIC system for streamlined data integration.
  • Led LK ITRS regulatory project for the Central Bank of Sri Lanka, ensuring compliance by reporting all cross-border and domestic foreign currency transactions using ITRS-specified formats.
  • Implemented ACC Dashboard, integrating scan and schedule data to provide a high-level compliance view of CAAT controls across applications.

Big Data Developer

Cube45 ECommerce Pvt. Ltd
11.2019 - 12.2021

Client: ICICI Lombard

Domain: Ecommerce, Yield, Price Management

Team Size: 10

Environment: Hadoop, Hive, SQL, Spark (Scala), Airflow, Jupyter, PuTTY, Oracle, Unix, Git

Project Details:

  • Collaborated on a sales analysis application for Co-operative Group Ltd., one of the UK's largest retailers, analyzing sales and profits across major brands and promotional schemes using historical data in Hadoop.
  • Skilled in Hive script development, including creating and altering databases and tables, with a strong understanding of Hive execution for performance optimization.
  • Handled various file formats such as CSV, Avro, XML, JSON, and Parquet for efficient data processing (see the sketch after this list).
  • Developed and optimized Sqoop jobs for incremental data loads from heterogeneous RDBMS to HDFS.
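
A short PySpark sketch of the multi-format handling above; paths are placeholders, and the Avro reader assumes the spark-avro package is on the classpath.

    # Read heterogeneous landing-zone files and standardize on Parquet.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("format-sketch").getOrCreate()

    csv_df = spark.read.option("header", "true").csv("hdfs:///landing/sales.csv")
    json_df = spark.read.json("hdfs:///landing/events.json")
    avro_df = spark.read.format("avro").load("hdfs:///landing/feed.avro")

    # Parquet's columnar layout makes downstream aggregations cheaper.
    csv_df.write.mode("overwrite").parquet("hdfs:///staging/sales_parquet")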

Database Developer

Cube45 ECommerce Pvt. Ltd
09.2018 - 10.2019

MWatch, IT Infrastructure Management Platform

Client: Fullerton

Domain: Retail

Team Size: 20

Environment: Oracle, Sqoop, Hive, SQL, Oozie, Hadoop, HDFS

Project Details:

  • Contributed to MWatch, an integrated IT infrastructure management platform, by writing DDL and DML scripts for data transformation and populating target tables.
  • Automated data loading tasks in HDFS using Control-M workflows and shell scripts, and developed Oozie workflows for scheduled batch processing and reporting.
  • Experienced in HiveQL, creating external tables, moving data between layers, and performing data enrichment tasks like filtering, sorting, and aggregation (a brief sketch follows this list).
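
A minimal sketch of the HiveQL layer movement described above, driven from a script wrapper as the automation bullet suggests; database, table, and path names are illustrative.

    # Create an external table over the landing zone, then aggregate into a curated table.
    import subprocess

    hql = """
    CREATE EXTERNAL TABLE IF NOT EXISTS raw.events (
      event_id STRING, amount DOUBLE, event_date STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/landing/events';

    INSERT OVERWRITE TABLE curated.daily_totals
    SELECT event_date, SUM(amount) AS total_amount
    FROM raw.events
    WHERE amount > 0        -- filtering
    GROUP BY event_date     -- aggregation
    ORDER BY event_date;    -- sorting
    """
    subprocess.run(["hive", "-e", hql], check=True)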

Software Executive

Cube45 ECommerce Pvt. Ltd
07.2016 - 08.2018

Talisma (CRM) Management System

Client: ICICI Lombard

Package: Customer Relationship Management (CRM)

Team Size: 10

Environment: VB.NET, WinForms, SQL Server 2000, Oracle 9i, CRM tool

Project Details:

  • Worked on Talisma CRM to customize and extend its capabilities for the insurance domain, utilizing ActiveX technologies and the Talisma SDK to meet specific business requirements.
  • Encapsulated core business logic within components to ensure seamless integration and functionality across the CRM platform.

Education

Bachelor of Science (B.Sc.) - Computer Science

Annamalai University
2016

Skills

    Big Data Ecosystem: Hadoop, Spark, MapReduce, HDFS, Hive, Oozie, Sqoop, NiFi

    Cloud Platforms: GCP (BigQuery, Dataproc, GCS, Cloud Composer, Pub/Sub, Dataflow), Azure (App Service), Azure OpenAI

    Programming Languages: Python

    Frameworks & Tools: FastAPI, LangChain, LangGraph, Pandas, Pydantic, REST APIs, Git, GitHub Actions

    Databases: Oracle, SQL Server

    Workflow Orchestration: Airflow, Cloud Composer, LangGraph, Oozie

    Methodologies: Agile

    Operating Systems: Linux/Unix, Windows
