Software Engineer with 10+ years of industry experience, currently working as SDE 2 in the EMR Open Data Analytics Hive team at AWS, driving performance and feature enhancements for Apache Hive on EMR. Previously contributed to large-scale data platforms at BookMyShow, Moonfrog Labs, and 1MG. Experienced in designing and optimizing distributed systems, big data pipelines, and cloud infrastructure. Skilled in Java, Golang, and Python, and passionate about big data, distributed systems, software architecture, scalability, and open source.
Working on the data plane team for Apache Hive in the Open Data Analytics org of AWS EMR.
User Profiles - Designed and developed scalable gRPC APIs for aggregate profiles such as behavior, transaction, and content profiles, powering features like user personalization and targeted ads. Currently operating under a load of about 40k rpm.
User Segmentation Framework - Contributed to the in-house user segmentation framework, which supports features like user bucketing and funneling.
Developed a query engine over Elasticsearch that transforms logical combinations of business funnels into Elasticsearch queries.
Data Pipeline - Key member of the team responsible for adding features to, improving, and maintaining Moonfrog's data pipeline cluster, which handles ~40M unique data events per minute and ~3.2 TB/day of data by volume.
Major features contributed include:
Design and development of autoscaling capability for the stateful, distributed data pipeline, which helped handle the traffic surge efficiently when pipeline traffic suddenly grew nearly 4x during lockdown (from 15B events/day to ~58B events/day).
Leagues Service as a Platform - Developed the leagues service (a feature for increasing user engagement) as a platform, then integrated and released it for the TPG game; currently serving ~4M DAU (Daily Active Users).
Migration of Stat Server to Kubernetes - Containerised the centralised stat server cluster (handling ~20k requests per second) and deployed it on Kubernetes using AWS EKS, saving redundant maintenance effort and cost.
Data Pipeline - Responsible for adding features to, improving, and maintaining Moonfrog's high-scale data pipeline cluster, which handles ~40M unique data events per minute and ~3.2 TB/day of data by volume, maintained in Redshift and backed by S3.
Data Lake Query Capability - Migrated existing CSV data to Parquet using AWS EMR and added direct query capability on the data lake (S3) using AWS Athena.
SDKs and Dashboards - Developed various SDKs, including the Stats client SDK, League's SDK, stat server SDK, and RTS SDK (for tracking game concurrents), as well as in-house dashboards for tracking business metrics.
Worked in the Preorder team (responsible for all backend functionality up to order placement) in a microservice environment, as the sole owner of major business units or services.
Major projects developed or contributed to:
Backend Services for Apps - Worked on various services for the 1MG app, responsible for serving initial configs and articles, handling push notifications, etc.
Payments - Worked on payments and the 1MG wallet, involving third-party wallet integrations and payment handling on the 1MG app and website.
Microservice Framework - Enhanced the in-house microservice framework Vyked by adding features such as graceful service restarts, improved logging, and timeouts.
Catalog - Worked on the 1MG catalog, the service responsible for serving OTC categories and products; also developed a portal for category managers to add and modify categories and products.
Databases - Postgres, SQL, MemSQL, Influx, Couchbase
Queuing & Scheduling - NSQ, Redis, Kafka, AWS SQS
Big Data - Hive, Tez, Trino, Spark, AWS Redshift, ES-Hadoop, YARN
Infra, Build & Deployment Tools - Good exposure to the AWS stack (EC2, EKS, ECS, ELBs, EBS, Route 53, AMIs, Security Groups, IAM, Lambda, etc.), Kubernetes and Docker (responsible for introducing them and setting up the cluster from scratch using EKS), Terraform (responsible for introducing it and using it for pipeline autoscaling), Jenkins
Monitoring & Alerting - Familiarity with Grafana, ELK stack, Prometheus, Nagios, Monit, Supervisor, AWS SNS, AWS CloudWatch