ADITYA VERMA

Gurgaon

Summary

Practical database engineer with in-depth knowledge of data manipulation techniques and programming, paired with expertise in integrating new software packages and products into existing systems. Eight-year background managing the development, design, and delivery of database solutions. Tech-savvy, independent professional with outstanding communication and organizational abilities.

Overview

8 years of professional experience

Work History

Senior Data Engineer

Gupshup Technologies
Gurgaon
09.2022 - Current
  • Building scalable, real-time Customer Data Platform (CDP) pipelines that ingest and process large datasets into an Apache Hudi data lake on AWS Kubernetes, using Flink (Java), Hudi, PySpark, MongoDB, and GitLab/GitHub CI/CD
  • Implementing a delta lake architecture with open-source Apache Hudi
  • Reduced ETL pipeline lag from 6 hours to 15 minutes with bucket-level partitioning, making real-time campaign triggers more effective (sketched after this list)
  • Accelerated ingestion into NoSQL stores such as MongoDB through efficient indexing strategies
  • Supporting 800 million profiles in the CDP through efficient scaling and coding practices
  • Scaling the entire pipeline optimally by load-testing to identify bottlenecks and thresholds, then partitioning Kafka topics and Flink jobs accordingly
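
A minimal PySpark sketch of the Hudi bucket-partitioned upsert pattern behind the lag reduction above; the table name, key fields, bucket count, and paths are illustrative assumptions, not the production configuration:

# Sketch of an Apache Hudi upsert from PySpark using a bucket index.
# Table name, record/precombine keys, bucket count, and paths are assumed.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hudi-upsert-sketch")
    # Hudi needs its Spark bundle on the classpath plus the Kryo serializer.
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

df = spark.read.json("s3a://cdp-landing/user_events/")  # hypothetical source

hudi_options = {
    "hoodie.table.name": "user_events",
    "hoodie.datasource.write.recordkey.field": "profile_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.partitionpath.field": "event_date",
    "hoodie.datasource.write.operation": "upsert",
    # Bucket index: keys hash to a fixed bucket per partition, avoiding
    # per-record index lookups and cutting end-to-end ingestion lag.
    "hoodie.index.type": "BUCKET",
    "hoodie.bucket.index.num.buckets": "256",
}

df.write.format("hudi").options(**hudi_options).mode("append").save(
    "s3a://cdp-lake/user_events/"
)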

Senior Data Engineer/Architect

Greenlight Planet
Gurgaon
03.2021 - 09.2022
  • Designed and developed big data pipelines on the AWS stack for real-time and batch processing using PySpark, AWS Glue, EMR, Lambda, S3, Redshift, Python, and Airflow
  • Optimized Redshift queries with appropriate distribution and sort keys, reducing CPU usage from 100% to 65-70%
  • Fixed major issues such as Redshift table locking using Apache Airflow, and implemented Lambda functions that trigger AWS Glue jobs on near-real-time mini-batches landing in S3 (sketched after this list)
  • Sped up data availability to field agents for business decisions by optimizing Redshift queries, contributing to 20% higher profit in the field
  • Implemented EMR spot instances as an alternative to Glue for large batch jobs, cutting cost and reducing Redshift utilization and dependency
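
A hedged sketch of the Lambda-to-Glue trigger described above, using boto3; the Glue job name and argument key are hypothetical:

# Lambda handler that starts an AWS Glue job for each object landing in S3,
# giving near-real-time mini-batch processing. Names are illustrative.
import boto3

glue = boto3.client("glue")

def handler(event, context):
    # S3 put-event notifications arrive as a list of records.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        glue.start_job_run(
            JobName="minibatch-etl",  # hypothetical Glue job name
            Arguments={"--input_path": f"s3://{bucket}/{key}"},
        )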

Data Engineer

Sirionlabs
Gurgaon
02.2020 - 03.2021
  • Built an end-to-end data pipeline powering region-based interactive dashboards for multiple clients using Kafka, Apache NiFi, Apache Druid, and Apache Superset
  • Implemented row-level security in the Apache Superset backend to prevent data leakage and give each client access only to its own data
  • Tuned Apache Druid segment sizes for faster queries, enabling real-time dashboard refreshes and better performance
  • Benchmarked Apache Druid against Imply to understand query execution in depth and improve query performance (a query sketch follows this list)
  • Analyzed application and Nginx server logs with Filebeat and the ELK stack, and built visualizations in Kibana
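
A minimal sketch of the kind of Druid SQL query such dashboards issue, posted to Druid's standard /druid/v2/sql endpoint; the broker host, datasource, and columns are assumptions:

# Query Druid's SQL endpoint the way a dashboard backend might.
import requests

DRUID_SQL = "http://druid-broker:8082/druid/v2/sql"  # hypothetical broker host

query = """
SELECT region, COUNT(*) AS events
FROM client_events                     -- hypothetical datasource
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY region
"""

resp = requests.post(DRUID_SQL, json={"query": query}, timeout=30)
resp.raise_for_status()
for row in resp.json():  # Druid returns a JSON array of row objects
    print(row["region"], row["events"])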

Premier Field Engineer (Data & AI)

Microsoft
WA
03.2019 - 06.2019
  • Supported MIP Azure Databricks labs and demos, finding bugs and updating outdated material
  • Played a key contributor role in on-site big data consulting projects requiring up to 75% travel
  • Designed scripts automating the workflow for loading data into Hive tables for dashboard creation (sketched below)
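
A minimal sketch, assuming hypothetical source paths and target tables, of the kind of automated Hive load script referenced above:

# Automated load of landing-zone files into Hive tables for dashboards.
# Paths and table names are illustrative; target tables are assumed to exist.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hive-load-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Each (source path, target table) pair feeds one dashboard.
loads = [
    ("/data/landing/sales", "dashboards.sales_daily"),
    ("/data/landing/usage", "dashboards.usage_daily"),
]

for path, table in loads:
    spark.read.parquet(path).write.mode("overwrite").insertInto(table)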

Data Engineer

AT&T Big Data LA/SF
CA
01.2017 - 01.2019
  • Built a complete ETL pipeline on the Hadoop ecosystem using Sqoop, Hive, Pig, Flume, Linux, Kafka, and HBase
  • Generated Spark event logs to diagnose and fix a memory leak, improving efficiency by 947%
  • Developed Sqoop scripts for incremental data ingestion from relational sources into HDFS
  • Optimized Hive tables with ORC format and Snappy compression for up to 5x faster execution (sketched after this list)
  • Tuned Pig scripts for faster execution, transferring output back to HDFS and onward to HBase and other NoSQL databases
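
A short sketch of the ORC-plus-Snappy rewrite behind the Hive speedup above; source and target table names are assumptions:

# Rewrite a Hive table as ORC with Snappy compression for faster scans.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("orc-snappy-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

df = spark.table("raw.events")  # hypothetical source table

(
    df.write.format("orc")
    .option("compression", "snappy")  # columnar ORC + Snappy: smaller files
    .mode("overwrite")
    .saveAsTable("curated.events_orc")  # hypothetical target table
)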

Education

M.S. in Information Systems

California State University
Los Angeles
03.2016

B.S. in Electronics Communications

MMU, Mullana
07.2013

Skills

  • Big Data Ecosystems: Hadoop, Sqoop, Hive, Pig, Flume, Oozie, Kafka, Kinesis, MapReduce, Spark SQL
  • Programming Languages: Java, Python, Shell
  • Big Data Platforms: Microsoft Azure, Cloudera CDH 5.x, Hortonworks Sandbox, AWS S3, AWS Glue
  • Relational Databases: MySQL, SQL
  • NoSQL Databases: HBase, Cassandra, Apache Druid, Amazon Redshift
  • Business Intelligence Tools: Tableau, MS Excel, Power Query, GIS tools, Apache Superset
  • Tools: GitHub, GitLab, Jenkins

Accomplishments

  • Pathfinder Award for optimizing the end-to-end flow with efficient partitioning and load testing
