
TARANPREET SINGH

Bengaluru

Summary

Results-driven IT professional with over 5 years of experience across the telecom, social media, and GenAI domains. Proficient with Terraform for infrastructure management, Kubernetes for container orchestration, and the LGTM (Loki, Grafana, Tempo, Mimir) observability stack. Proven track record of improving system reliability, reducing costs, and enhancing application performance. Adept at cross-functional collaboration, building efficient CI/CD pipelines, and scaling AI/ML infrastructure to meet demanding performance targets. Committed to high availability and efficiency across diverse technology stacks.

Overview

5 years of professional experience
1 Certification

Work History

Senior Site Reliability Engineer

Asapp
Bengaluru
12.2022 - Current
  • Executed a seamless migration from Datadog to Grafana Cloud, cutting annual observability costs by 60% and saving the company $500K.
  • Led the migration of infrastructure from a single-tenant to a multi-tenant architecture, improving performance by 40% and reducing costs by 33% while enhancing system stability.
  • Implemented GPU-based scaling for ML models, achieving a 30% reduction in inference time.
  • Ensured high availability of services by developing comprehensive disaster recovery plans and backup procedures.
  • Optimized performance by tuning system parameters and troubleshooting issues.
  • Implemented CI/CD pipelines for seamless ML model deployment and updates, improving deployment time and performance by 30%.
  • Monitored and maintained AI models in production environments, enabling proactive intervention when issues arose.
  • Built and maintained a robust data pipeline infrastructure for ML model training and inference.
  • Collaborated closely with data scientists to understand requirements and deliver infrastructure solutions.
  • Developed monitoring and logging solutions to track AI model performance and infrastructure health.
  • Scaled AI model performance by 40% using GPU-based scaling.
  • Developed disaster recovery plans, ensuring high service availability.
  • Built CI/CD pipelines for ML models, and collaborated with data scientists to optimize infrastructure.

DevOps Engineer

SHARECHAT
Bengaluru
06.2021 - 11.2022
  • Reduced observability costs by $50K per year by implementing cost-optimized, highly available solutions.
  • Reduced inter- and intra-network costs by enabling compression and Kubernetes features such as topology-aware routing, cutting monthly bills by 20%.
  • Developed an observability stack from scratch to monitor and log the health of production systems, reducing downtime by 50%.
  • Implemented pipeline-driven alerting in the production environment, resulting in a 90% decrease in downtime.
  • Successfully automated software deployments using Helm template rendering, reducing deployment time by 25%.
  • Implemented an in-house logging solution capable of streaming and storing millions of logs, filtering them, and converting them into a readable format with meaningful information.
  • Set up Confluent Kafka as an event streaming platform.
  • Created the Reliability Dashboard for the organization.
  • Created the SLO dashboard for the organization.

DevOps Engineer

Jio
Mumbai
07.2019 - 07.2021
  • Migrated a monolith application into a microservices architecture.
  • Successfully containerized a 12-service monolith application using Docker, easing the code development and deployment pipeline.
  • Automated Spark application deployments across multiple environments.
  • Developed a CI/CD pipeline using Jenkins to ensure robust and scalable deployments.
  • Worked closely with teams to ensure high-quality, timely delivery of builds and releases.
  • Successfully scaled the big data platform to support 10X growth in data volume, while maintaining performance levels.
  • Performed in-depth post-mortem analyses and proposed solutions to improve system reliability.
  • Reduced system downtime by implementing a proactive monitoring and alerting system that identified potential issues before they became problems.
  • Reduced the time for each build by 30% by optimizing the existing process and infrastructure.
  • Monitored and maintained 99.99% uptime for all systems under my purview.
  • Automated server provisioning and configuration management, resulting in faster deployment and easier maintenance.
  • Upgraded and maintained system software, resulting in improved system performance and security.

Education

Bachelor of Technology - Computer Science

Guru Nanak Dev Engineering College
Ludhiana
05.2019

Diploma - Computer Science

Guru Nanak Dev Polytechnic College
Ludhiana
06.2016

Skills

  • Amazon AWS
  • GCP
  • GenAI
  • Kubernetes
  • Terraform
  • Observability
  • Kafka
  • Infrastructure as Code
  • Continuous Integration and Deployment (CI/CD)
  • MLOps
  • Prometheus
  • Python
  • Data infrastructure
  • Scripting languages
  • Linux administration

Languages

  • English
  • Hindi

Certification

AWS Certified Solutions Architect - Associate

Certified Kubernetes Administrator
