RISHI KAUSHAL

Senior DevOps Manager
Noida

Summary

Strategic and results-driven MLOps Engineer, Data Engineer, and DevOps Manager with 13 years of experience in Cloud, DevOps, and Data Engineering, including 5 years of expertise in MLOps & Data Engineering. Adept at optimizing AI/ML models, automating deployments, and building scalable cloud infrastructures. Seeking a leadership role to drive AI-driven automation, scalable data solutions, and efficient DevOps processes.

  • MLOps & Data Engineer with 5+ years of experience designing end-to-end ML pipelines, deploying AI models, and building scalable data architectures
  • DevOps & Cloud Engineering expert with 8+ years of experience in AWS, Azure, Kubernetes, Terraform, CI/CD, and automation
  • AI/ML model deployment and monitoring expertise with Kubeflow, MLflow, Airflow, Databricks, and Unity Catalog
  • Proven track record in migrating on-premises infrastructure to the cloud, building scalable data pipelines, and optimizing AI workloads
  • Leadership experience managing DevOps, Data Engineering, and MLOps teams, driving automation and efficiency

Overview

13 years of professional experience
3 certificates
Post-secondary education: 2009

Work History

Senior DevOps Manager | Cloud & MLOps Leader

EPAM Anywhere
10.2023 - 03.2024
  • Reduced deployment times by 70% through automation
  • Achieved zero-downtime releases using blue-green and canary deployments
  • Improved microservices performance through distributed tracing and log analytics
  • Reduced infrastructure provisioning time by 80% with automated IaC workflows
  • Ensured 99.99% uptime by implementing automatic failover and disaster recovery
  • Migrated infrastructure from on-premises to Azure using Terraform & PowerShell scripts
  • Deployed applications using Azure Web Apps & Azure DevOps
  • Deployed applications on Amazon EKS clusters
  • Wrote Ansible modules for configuration management of infrastructure
  • Led the implementation of a cloud-based data warehousing solution on Azure Synapse Analytics
  • Designed and developed data pipelines to extract, transform, and load data from multiple sources into the data warehouse
  • Project: CI/CD Pipeline Automation - Developed a fully automated CI/CD pipeline using Jenkins and Terraform, enabling rapid and reliable deployments to AWS
  • Integrated Kubernetes for container orchestration, improving deployment efficiency by 40%
  • Implemented monitoring dashboards with New Relic and Sumo Logic to track build status and performance metrics
  • Project: Kubernetes-Based Microservices Architecture - Designed and deployed a microservices architecture on AWS EKS, ensuring high availability and scalability
  • Utilized Python and Terraform for automating infrastructure setup and maintenance
  • Enhanced security by implementing VPC peering, VPN connectivity, and AWS WAF
  • Project: Automated Model Drift Detection and Retraining Pipeline - Built a drift detection system to monitor ML models in production
  • Integrated data distribution monitoring and automated retraining triggers
  • Used MLflow for model versioning and Unity Catalog for metadata management
  • Implemented a feature store for consistent data preprocessing across training and inference
  • Project: Scalable Automated Training and Deployment Pipeline for NLP Models - Built an end-to-end automated training pipeline for large-scale NLP models
  • Leveraged Databricks AutoML for hyperparameter tuning and MLflow for tracking metrics
  • Used Airflow DAGs to automate feature extraction, model training, and deployment
  • Deployed models via Databricks Model Serving with versioning and rollback capabilities
  • Project: Reinforcement Learning for Personalized Recommendations in Streaming Services - Developed a Q-learning-based reinforcement learning model for personalized content recommendations
  • Used Unity Catalog to manage datasets across multiple teams
  • Optimized training with Spark for distributed processing and tracked RL policies using MLflow
  • Integrated GitLab CI/CD for model deployment and rollback strategies

Associate Technical Delivery Manager

Accolite Digital Pvt. Ltd.
02.2021 - 08.2023
  • Improved deployment speed by 80%, enabling faster feature rollouts
  • Reduced AWS cloud spending by 30% through intelligent cost optimization strategies
  • Increased system reliability & uptime, ensuring faster recovery during outages
  • Reduced MTTR by 50% by automating incident resolution
  • Ensured compliance with industry standards (SOC 2, HIPAA, GDPR)
  • Managed the DevOps team and the Qubole Data Lake system on cloud platforms
  • Deployed the latest Qubole Data Lake versions using Jenkins, the CLI, and the UI, and resolved outages
  • Resolved customer queries about Big Data clusters (Hadoop/Hive/Presto/Spark/Airflow)
  • Optimized performance and scalability of data solutions by tuning SQL queries, partitioning data, and implementing caching mechanisms
  • Migrated infrastructure from on-premises to Azure using Terraform & PowerShell scripts
  • Deployed applications using Azure Web Apps & Azure DevOps
  • Deployed applications on Amazon EKS clusters
  • Led the implementation of a cloud-based data warehousing solution on Azure Synapse Analytics
  • Designed and developed data pipelines to extract, transform, and load data from multiple sources into the data warehouse
  • Separated non-production & production environments, optimizing costs and applying security policies
  • Reduced costs by replacing existing monitoring tools with the ELK stack and Snowflake ETL with the Spark big data engine
  • Assisted developers in optimizing code and services using the MERN stack
  • Developed a website for sending referral emails to customers using Ruby on Rails, Material UI, and React.js
  • Improved efficiency and performance of healthcare platforms through code and service optimization
  • Project 1: Infrastructure as Code with Terraform - Created and managed AWS infrastructure using Terraform, reducing manual configuration errors by 50%
  • Developed Python scripts for automating AWS resource management and monitoring tasks
  • Implemented disaster recovery solutions, ensuring 99.99% uptime for critical applications
  • Project 2: Continuous Delivery and Monitoring - Led the design and implementation of a continuous delivery pipeline using AWS CodePipeline and Jenkins
  • Integrated security and compliance checks into the pipeline, improving system reliability
  • Set up monitoring tools such as OpenSearch/Kibana and CloudWatch to provide real-time insights into system performance
  • Project 3: Kubernetes and AWS Networking - Deployed and managed Kubernetes clusters on AWS EKS, focusing on networking and security
  • Implemented load balancing, auto-scaling, and VPN connectivity for secure, scalable application deployments
  • Automated infrastructure provisioning using Terraform and integrated monitoring tools for proactive issue detection
  • Project 4: Application and Database Reliability - Ensured application and database reliability through automated backup and recovery processes
  • Developed Python-based tools for monitoring and alerting, reducing downtime by 30%
  • Collaborated with development teams to optimize database performance and security
  • MLOps Project: Healthcare Model Integration - Collaborated with data scientists to deploy machine learning models for healthcare analytics
  • Used MLflow to manage the end-to-end machine learning lifecycle, including experimentation, reproducibility, and deployment
  • Ensured models were integrated into the existing infrastructure seamlessly, maintaining high availability and performance
  • MLOps Project: AI Model Deployment Pipeline - Developed and implemented a CI/CD pipeline for deploying AI models using Kubeflow
  • Automated model training, validation, and deployment processes, ensuring reliable and reproducible results
  • Integrated monitoring and logging using Prometheus and Grafana to track model performance and detect anomalies

Senior Software Systems Engineer

GSPANN Technologies Pvt. Ltd.
02.2018 - 08.2020
  • Increased scalability and fault tolerance with Kubernetes-based microservices
  • Reduced mean time to detect (MTTD) and resolve (MTTR) incidents with real-time monitoring and alerts
  • Achieved data residency compliance by restricting regional traffic using GeoDNS and Traffic Manager policies
  • Eliminated 80% of repetitive alerts by filtering non-critical issues
  • Reduced security vulnerabilities by 70% by integrating security into CI/CD
  • Conducted production deployment activities with Jenkins, uDeploy, and Zeus
  • Automated production server activities using Python
  • Used Jenkins pipelines to move code through successive deployment stages
  • Updated vendor properties files and tracked the changes in Git
  • Developed/upgraded software for casino gaming clients
  • Wrote a microservices deployer using Ansible, including custom Ansible modules in Python
  • Managed builds using Jenkins; worked with Docker Swarm & Docker overlay networks
  • Conducted a proof of concept on deploying apps with OpenShift's oc new-app command
  • Project 1: Continuous Integration and Deployment - Architected a CI/CD pipeline using Jenkins, Terraform, and AWS CodeDeploy, reducing deployment times by 50%
  • Developed automation scripts in Python and PowerShell to streamline build and deployment tasks
  • Implemented containerization strategies with Docker, enabling consistent application environments
  • Project 2: Cloud Security and Compliance - Designed and implemented security measures using AWS Config, CloudTrail, and GuardDuty
  • Conducted regular security assessments and vulnerability management, improving compliance by 20%
  • Integrated security monitoring with Sumo Logic and CrowdStrike, enhancing threat detection and response
  • Project 3: Model Calibration and Explainability Framework for Fraud Detection - Designed a calibration framework to improve model confidence scores for fraud detection models
  • Used Platt Scaling and Isotonic Regression to enhance probability estimates
  • Integrated MLflow for tracking experiments and automated hyperparameter tuning
  • Deployed the solution in Kubeflow to scale across multiple fraud detection pipelines
  • Project 4: CI/CD Pipeline Automation - Developed a fully automated CI/CD pipeline using Jenkins and Terraform, enabling rapid and reliable deployments to AWS
  • Integrated Kubernetes for container orchestration, improving deployment efficiency by 40%
  • Implemented monitoring dashboards with New Relic and Sumo Logic to track build status and performance metrics
  • Project 5: Kubernetes-Based Microservices Architecture - Designed and deployed a microservices architecture on AWS EKS, ensuring high availability and scalability
  • Utilized Python and Terraform for automating infrastructure setup and maintenance
  • Enhanced security by implementing VPC peering, VPN connectivity, and AWS WAF

Linux Security Specialist

IBM India Pvt. Ltd.
03.2015 - 01.2018
  • Achieved a zero-downtime migration to AWS with a 99.99% availability SLA
  • Cut storage expenses by 60% by automating data lifecycle policies
  • Improved application scalability by 5x using Kubernetes & auto-scaling
  • Achieved 99.99% compliance with automated policy checks & audit reporting
  • Improved API response times by 40% by optimizing queries & enabling caching (Redis)
  • Provided application support on UNIX & Linux variants
  • Implemented automation, security, and DevOps tools to coordinate development & operations
  • Optimized UNIX servers in the banking domain for performance, security, and cost
  • Configured DevOps environment with Packer, Consul, Elastic Beanstalk, and Terraform
  • Supported Elasticsearch & Cassandra databases, tracking issues in JIRA
  • Managed version control with Git and built & tested code with Jenkins
  • Automated tasks using Python scripting and Docker

Associate Systems Administrator

Safenet Infotech Pvt. Ltd.
04.2011 - 11.2014
  • Reduced time-to-market by 40% through automated CI/CD pipelines
  • Reduced on-call escalations by 50% through intelligent alerting & auto-healing
  • Achieved 99.98% uptime, reducing incidents & improving business continuity
  • Saved $500K+ per quarter via cloud optimizations
  • Reduced deployment time from 1 hour to 5 minutes
  • Provided technical support for VMware features such as HA, FT, DRS, Update Manager, and vSwitches in the production environment
  • Managed VMware ESXi servers in production using NetApp storage
  • Diagnosed hardware faults in Dell servers and machines; tracked inventory, resource allocation, and software license usage
  • Administered the NetBackup server and maintained the tape repository
  • Monitored and maintained the lab network
  • Set up alert triggers for all critical servers whenever they went down
  • Managed inventory of all lab infrastructure
  • Ensured maximum uptime and reduced escalations
  • Supported all lab-related tickets, including AIX, Linux, and Solaris servers
  • Produced weekly status reports on performance, escalated calls, and open tickets, and created action plans to close pending tickets
  • Ensured all critical data was backed up and performed restores where necessary to validate the backups
  • Maintained clear communication with engineering teams about downtime, backups, and pending issues across the engineering labs
  • Project: IPv6 Connectivity Setup between Solaris and Linux Servers - Enabled IPv6 packet forwarding on a RHEL IPv6 router, enabled IPv6 interfaces on the target servers, and confirmed communication among them
  • Project: Nagios Monitoring of UNIX Servers - Configured the Nagios server and installed plugins on the servers to be monitored remotely
  • Project: Klocwork Code Review Server Configuration - Virtualized the physical license server, the Klocwork server, and its database server so users could review their code
  • Project: Inventory Manager Java Application - Built and deployed an in-house application in Java and Google Web Toolkit to manage the organization's inventory

Education

Bachelor's Degree - Electronics

Punjab Technical University
Jalandhar, Punjab

Skills

MLOps & DevOps Engineering

Certification

Certified Data Engineer, MANGTAS

Languages

English
Hindi

Personal Information

Date of Birth: 09/27/81
