Kunal Gautam

Principal Data Architect

Delhi

Summary

Kunal Gautam is a seasoned Principal Data Architect at AWS with over 16 years of global experience. He brings an analytical mindset and a diverse blend of experience spanning Big Data development, entrepreneurship, and consulting, and has driven digital transformation for 45+ enterprises across India, Europe, the US, and the UK, helping them become truly data-driven organizations. He has built Big Data platforms and set up teams for industry-leading unicorns such as InMobi and Glispa, expanded Big Data consulting practices for Cloudera and AWS Germany, and co-founded the data-driven startup Windx. Kunal is a recognized thought leader who has delivered keynote speeches at major events, including the Apache Big Data summit in Budapest (2015, 100+ participants), LinuxCon Korea (2015, 150+ participants), Kaggle Munich (2018, 50+ participants), Analyticon Seattle (2022 and 2023, 150+ attendees), AWS re:Invent in Las Vegas (2022 and 2024, 400+ attendees), and AWS AKO Seattle (2019, 100+ participants).

He is also a prolific writer, with 9 AWS Big Data blogs and 140 personal Big Data blogs amassing around 1M views. Known for delivering high-performance, scalable data solutions, Kunal has helped organizations worldwide unlock the full potential of their data.

Work History

Principal Data Architect

Amazon Web Services

Munich, Gurgaon

01.2019 - Current

Automotive Connected Vehicle Platform [~$8M, 200+ developers, greenfield project]: Pioneered the software-defined vehicle revolution in India. Designed, architected, and led end-to-end delivery of a Connected Vehicle platform built on AWS using services such as API Gateway, Apache Flink, Spark, Hudi, Athena, Redshift, Kafka, DynamoDB, Lambda, Step Functions, SageMaker, and CDK for Hero Motors and Tata Motors, serving 1M+ end users. Managed collaboration across a geo-distributed team [India, Germany, and the US] of 7 engineering managers, 30+ AWS consultants, 10+ AWS service teams, 8 customer business stakeholders, and ~200 partner developers.
Manufacturing MSIL [$1M+, greenfield project]: Built a unique, extensible, dynamic and parallel MLOps framework for MSIL. The solution enabled an MLOps pipeline to build, train, and run inference on 100K ML models at scale, covering data cleaning, feature set creation, training, evaluation, model cataloging, inference, and model monitoring. Led the initiative with 5+ customer stakeholders, 10+ partner developers, and 5+ AWS consultants to architect, implement, test, and deploy the solution, cutting the customer's time to production from a month to a week.
Telecommunication Indus Tower [$1.5M+, greenfield project]: Designed the data platform for Indus Tower, collaborating with 20+ partner developers and 3 stakeholders to build a petabyte-scale data lake enabling batch and real-time reporting on data ingested from 3.5 lakh towers (a billion+ events daily), supporting 500+ end consumers. The solution reduced RCA time by 30% and partner cost attribution reports by 70%.
Continental Advanced Driver Assistance System [~$1.5M]: Designed and built the solution, leading a team of 10+ AWS consultants, 20+ partners, and 5+ customer stakeholders spread across the globe. The solution delivered a SaaS-ified, single-click deployable, geo-distributed data lake able to ingest 100TB+ of video data daily, then store, catalog, and run simulations over that data in an elastic and scalable fashion. It saved more than $1M annually in on-premises hardware costs and reduced time to market.
Migration of Snapdeal (e-commerce, $2M+) data platform: Led the migration of the Snapdeal data platform from on-premises to AWS with zero downtime in a record time of less than 2 months. Led a team comprising the partner team (15+), the Snapdeal data platform team (15+), and the AWS service team (10+). Implemented the POC, defined the cutover strategy, and supported the successful migration of a 500+ node Hadoop cluster [to EMR], 1,700+ Apache Kafka topics [6 TB of data to MSK], 1,000+ custom batch Spark jobs, and 500 TB of data from on-premises to AWS. The migration cut hardware procurement time by 10X and delivered cost and performance gains of 30%.
Hindustan Unilever Limited :

Systems Architect

Hortonworks GmbH

10.2017 - 12.2018

Worked across a diverse range of industries and projects, enabling clients to embark on their Big Data journey.

Worked with the world's largest reinsurance and insurance clients in EMEA.
Conducted global workshops on Spark, MapReduce, and Hive to accelerate adoption of Big Data solutions.
Played a key role in stabilizing the Hortonworks Data Platform at a client by identifying, after a deep-dive analysis, a memory leak in the Hive Thrift server (HIVE-20192); this directly contributed to the success of a ~10 million euro data lake project.
Stabilizing the Hive Thrift server benefited all Hortonworks Data Platform customers using Hive as well as the open source community, with an indirect multi-billion-euro business impact globally.
Trained teams on the Big Data tech stack, its intricacies, and the stages from architecture, development, and testing to implementation of complex distributed production environments.
Use case development and job performance optimization.
Troubleshooting, optimization and administration of core Hadoop/HDP services, incl. Ranger, Ambari, HDFS, YARN, ZooKeeper, Oozie, Zeppelin, Hive, MapReduce, Tez, Sqoop, Spark, HBase

Big Data Engineer

Glispa GmbH

06.2016 - 09.2017

Glispa Global Group is a mobile ad tech pioneer, empowering clients to activate global audiences and move markets. What sets Glispa apart is its ability to work in the performance domain and still generate revenue of over 100 million euros.

Deep dive into Spark, MapReduce, Druid, HBase, Elasticsearch, YARN, Oozie, and HDFS.
Delivered multiple Big Data talks within the organization to accelerate adoption of Big Data.
Helped the team design and optimize the Glispa Audience Platform.
Performed data analysis to detect fraud and assess data quality.
Decreased the overall latency of the data feedback loop from 10 days to 6 hours.
Reduced the Aerospike cluster from 12 nodes to 2 nodes, resulting in substantial financial savings.
Introduced the idea of data-driven development, which involved "analyzing data to validate the effectiveness of a product or campaign, both in real time and with the ability to forecast".

Technical Lead

Inmobi

02.2012 - 07.2016

Built the next-generation, near real-time, geo-distributed user platform that ingests ~10 billion events daily from 1.5 billion unique users interacting with the InMobi network. Solved multiple complex problems around user hygiene and identifying high-quality user signals from noisy data.

Designed and architected multiple critical components of the user platform with direct revenue impact in the online consumption path.

Revamped user targeting profile generation, replication, and population across multiple clusters by introducing an incremental mode, improving performance by orders of magnitude.
Performed a comparative evaluation of NoSQL stores (Aerospike, Cassandra, HBase) for low-latency, high-throughput use cases at scale. Key influencer in architectural decision making.
As the first developer on the User team at InMobi, had the unique opportunity to directly impact business outcomes using Big Data solutions. Worked extensively on architecting, developing, testing, and deploying a recommendation solution at petabyte scale.
Excellent understanding of the Hadoop architecture, the MapReduce paradigm, HBase, Thrift services, Spark, and Kafka.
Worked extensively on distributed key-value stores (HBase, Aerospike) at InMobi.
Worked on optimization of the batch user profiling MapReduce job.
Took part in the design and execution of a data processing pipeline handling ~100 TB of data per day.
Designed the User Profile Extraction data pipeline to extract "user information" from ad-serve logs. The pipeline extracted information for 1.5 billion users daily using tools such as MapReduce, Pig, Kafka, Aerospike, and Oozie.
InMobi was in its startup phase at the time, which required dedication, on-demand work, team driving, team bonding, and a leadership attitude to deliver data-driven products. I demonstrated each of these qualities whenever required to deliver the product to customers.

System Software Developer

Akamai Technologies

07.2010 - 01.2012

Worked in the Performance Analytics team, which builds Site-analyzer, a geo-distributed website performance metrics analyzer. Site-analyzer is a geo-distributed web crawler that gathers performance metrics for given websites as it crawls. The complete code base is written in C/C++.

Architected and developed a "YARN"-like module for Akamai used for distributed compression of video files, enabling seamless streaming to mobile devices.
Worked on the load balancing module and optimized the code flow to support affinity among the load-balanced agents.
Modified the DNS resolver (Akamai's in-house implementation) to support IPv6: modified the DNS packet structure for AAAA queries and parsed IPv6 addresses in responses.
Understood the architecture of Site-analyzer and how an object-oriented language like C++ enables one to map the architecture to a working product efficiently.
Exposure to time-critical code such as load balancing and the IPv6 DNS resolver helped me understand the memory layout of C/C++ programs and how to perform memory optimizations.

Education

Bachelor of Engineering - Information Science And Engineering

Rashtreeya Vidyalaya College Of Engineering

03.2006 - 03.2010

Skills

Big Data Platforms : AWS Cloud, Hortonworks

Technology : Amazon EMR, Glue, MSK, MSF, Redshift, QuickSight, DynamoDB, Lambda, Step Functions, Hadoop, Spark, Flink, MapReduce, HBase

Pre-sales : Negotiation, BOM creation, Effort estimation, Delivery pricing and timeline, Creating Statement of Work, Business Value Proposition

Tech : Architecting, Implementation, Cross Team collaboration

NoSQL : Apache HBase, Amazon DynamoDB

Data Warehouse : Amazon Redshift, Apache Hive

Programming Language : Java, C, C++, Python

Distributed Processing : Apache Spark, Apache MapReduce

Streaming : Apache Kafka, Apache Spark Streaming, Apache Flink, Amazon Kinesis

Machine Learning : Amazon SageMaker, Amazon Rekognition

Automation : CDK

Certification

AWS Solution Architect

Accomplishments

Blogs

BytePadding.com : http://bytepadding.com/
Amazon Transportation Service and Hudi : https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/
Amazon Keyspaces integration with AWS Glue : https://aws.amazon.com/blogs/big-data/how-william-hill-migrated-nosql-workloads-at-scale-to-amazon-keyspaces/
Apache Hudi on EMR : https://aws.amazon.com/blogs/big-data/new-features-from-apache-hudi-0-9-0-on-amazon-emr/
Facebook page : https://www.facebook.com/bytepadding/

Open Source Code

Aerospike Integration Test Framework : https://github.com/maverickgautam/Aerospike-unit

Profile

Stack Overflow : https://stackoverflow.com/users/4839157/krazygautam
Hortonworks Community : https://community.hortonworks.com/users/44302/kgautam.html

Keynote Speaker at Multiple Data Conferences Globally

Keynote Speaker at AWS Worldwide Tech Kickoff @Chicago : Jan 2020 Creating a modern data lake using Apache Hudi on EMR
Keynote Speaker at Kaggle @Munich : Sep 2018 Using distributed processing for speeding up Data Science algorithms https://www.meetup.com/Kaggle-Munich/events/251620630/
Keynote Speaker at Kaggle @Munich : Jul 2018 Spark for Data Science https://www.meetup.com/Kaggle-Munich/events/250963570
Keynote Speaker at LinuxCon @Korea : Oct 2015 https://korealinuxforum2015.sched.com/speaker/kunal_gautam.6ltm0ru
Keynote Speaker at Big Data Conference @Budapest : Sep 2015 https://apachebigdata2015.sched.com/speaker/kunal_gautam.6ltm0ru

Education

Bangalore
