ADITYA VERMA

Gurgaon

Summary

Practical database engineer with in-depth knowledge of data manipulation techniques and programming, paired with expertise in integrating new software packages and products into existing systems. Eight-year background managing the development, design, and delivery of database solutions. Tech-savvy, independent professional with outstanding communication and organizational abilities.

Overview

8 years of professional experience

Work History

Senior Data Engineer

Gupshup Technologies
Gurgaon
09.2022 - Current
  • Building scalable, real-time Customer Data Platform (CDP) pipelines that ingest and process large datasets into an Apache Hudi data lake on AWS Kubernetes, using Flink (Java), Hudi, PySpark, MongoDB, and GitLab/GitHub CI/CD
  • Implementing a delta lake architecture with open-source Apache Hudi
  • Reduced ETL pipeline lag from 6 hours to 15 minutes with bucket-level partitioning, making real-time campaign triggers more effective (sketched after this list)
  • Accelerated ingestion into NoSQL stores such as MongoDB through efficient indexing strategies
  • Supporting 800 million profiles in the CDP through efficient scaling and coding practices
  • Scaling the entire pipeline optimally by load-testing to identify bottlenecks and thresholds, then partitioning Kafka topics and Flink jobs accordingly
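
A minimal PySpark sketch of the Hudi bucket-partitioned upsert pattern behind the lag reduction above; the table name, key fields, bucket count, and paths are illustrative assumptions, not the production configuration:

# Sketch of an Apache Hudi upsert from PySpark using a bucket index.
# Table name, record/precombine keys, bucket count, and paths are assumed.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hudi-upsert-sketch")
    # Hudi needs its Spark bundle on the classpath plus the Kryo serializer.
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

df = spark.read.json("s3a://cdp-landing/user_events/")  # hypothetical source

hudi_options = {
    "hoodie.table.name": "user_events",
    "hoodie.datasource.write.recordkey.field": "profile_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.partitionpath.field": "event_date",
    "hoodie.datasource.write.operation": "upsert",
    # Bucket index: keys hash to a fixed bucket per partition, avoiding
    # per-record index lookups and cutting end-to-end ingestion lag.
    "hoodie.index.type": "BUCKET",
    "hoodie.bucket.index.num.buckets": "256",
}

df.write.format("hudi").options(**hudi_options).mode("append").save(
    "s3a://cdp-lake/user_events/"
)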

Senior Data Engineer/Architect

Greenlight Planet
Gurgaon
03.2021 - 09.2022
  • Designed and developed big data pipelines on the AWS stack for real-time and batch processing using PySpark, AWS Glue, EMR, Lambda, S3, Redshift, Python, and Airflow
  • Optimized Redshift queries with appropriate distribution and sort keys, reducing CPU usage from 100% to 65-70%
  • Fixed major issues such as Redshift table locking using Apache Airflow, and implemented Lambda functions that trigger AWS Glue jobs on near-real-time mini-batches landing in S3 (sketched after this list)
  • Sped up data availability to field agents for business decisions by optimizing Redshift queries, contributing to 20% higher profit in the field
  • Implemented EMR spot instances as an alternative to Glue for large batch jobs, cutting cost and reducing Redshift utilization and dependency
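
A hedged sketch of the Lambda-to-Glue trigger described above, using boto3; the Glue job name and argument key are hypothetical:

# Lambda handler that starts an AWS Glue job for each object landing in S3,
# giving near-real-time mini-batch processing. Names are illustrative.
import boto3

glue = boto3.client("glue")

def handler(event, context):
    # S3 put-event notifications arrive as a list of records.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        glue.start_job_run(
            JobName="minibatch-etl",  # hypothetical Glue job name
            Arguments={"--input_path": f"s3://{bucket}/{key}"},
        )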

Data Engineer

Sirionlabs
Gurgaon
02.2020 - 03.2021
  • Built an end-to-end data pipeline powering region-based interactive dashboards for multiple clients using Kafka, Apache NiFi, Apache Druid, and Apache Superset
  • Implemented row-level security in the Apache Superset backend to prevent data leakage and give each client access only to its own data
  • Tuned Apache Druid segment sizes for faster queries, enabling real-time dashboard refreshes and better performance
  • Benchmarked Apache Druid against Imply to understand query execution in depth and improve query performance (a query sketch follows this list)
  • Analyzed application and Nginx server logs with Filebeat and the ELK stack, and built visualizations in Kibana
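
A minimal sketch of the kind of Druid SQL query such dashboards issue, posted to Druid's standard /druid/v2/sql endpoint; the broker host, datasource, and columns are assumptions:

# Query Druid's SQL endpoint the way a dashboard backend might.
import requests

DRUID_SQL = "http://druid-broker:8082/druid/v2/sql"  # hypothetical broker host

query = """
SELECT region, COUNT(*) AS events
FROM client_events                     -- hypothetical datasource
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY region
"""

resp = requests.post(DRUID_SQL, json={"query": query}, timeout=30)
resp.raise_for_status()
for row in resp.json():  # Druid returns a JSON array of row objects
    print(row["region"], row["events"])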

Premier Field Engineer (Data & AI)

Microsoft
WA
03.2019 - 06.2019
  • Supported MIP Azure Databricks labs and demos, finding bugs and updating outdated material
  • Played a key contributor role in on-site big data consulting projects requiring up to 75% travel
  • Designed scripts automating the workflow for loading data into Hive tables for dashboard creation (sketched below)
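
A minimal sketch, assuming hypothetical source paths and target tables, of the kind of automated Hive load script referenced above:

# Automated load of landing-zone files into Hive tables for dashboards.
# Paths and table names are illustrative; target tables are assumed to exist.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hive-load-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Each (source path, target table) pair feeds one dashboard.
loads = [
    ("/data/landing/sales", "dashboards.sales_daily"),
    ("/data/landing/usage", "dashboards.usage_daily"),
]

for path, table in loads:
    spark.read.parquet(path).write.mode("overwrite").insertInto(table)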

Data Engineer

AT&T Big Data LA/SF
CA
01.2017 - 01.2019
  • Built a complete ETL pipeline on the Hadoop ecosystem using Sqoop, Hive, Pig, Flume, Linux, Kafka, and HBase
  • Generated Spark event logs to diagnose and fix a memory leak, improving efficiency by 947%
  • Developed Sqoop scripts for incremental data ingestion from relational sources into HDFS
  • Optimized Hive tables with ORC format and Snappy compression for up to 5x faster execution (sketched after this list)
  • Tuned Pig scripts for faster execution, transferring output back to HDFS and onward to HBase and other NoSQL databases
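
A short sketch of the ORC-plus-Snappy rewrite behind the Hive speedup above; source and target table names are assumptions:

# Rewrite a Hive table as ORC with Snappy compression for faster scans.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("orc-snappy-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

df = spark.table("raw.events")  # hypothetical source table

(
    df.write.format("orc")
    .option("compression", "snappy")  # columnar ORC + Snappy: smaller files
    .mode("overwrite")
    .saveAsTable("curated.events_orc")  # hypothetical target table
)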

Education

M.S. in Information Systems

California State University
Los Angeles
03.2016

B.S. in Electronics Communications

MMU, Mullana
07.2013

Skills

  • Big Data Ecosystems: Hadoop, Sqoop, Hive, Pig, Flume, Oozie, Kafka, Kinesis, MapReduce, Spark SQL
  • Programming Languages: Java, Python, Shell
  • Big Data Platforms: Microsoft Azure, Cloudera CDH 5.x, Hortonworks Sandbox, AWS S3, AWS Glue
  • Relational Databases: MySQL, SQL
  • NoSQL Databases: HBase, Cassandra, Apache Druid, Amazon Redshift
  • Business Intelligence Tools: Tableau, MS Excel, Power Query, GIS tools, Apache Superset
  • Tools: GitHub, GitLab, Jenkins

Accomplishments

  • Pathfinder Award for optimizing the end-to-end flow with efficient partitioning and load testing
