Vivek Chandrakant Mohite

Pune

Summary

A highly organized individual with 11+ years of overall experience, including 5+ years as an Application Support Engineer covering basic performance checks of databases and applications. Seeking to work at full potential in a challenging, dynamic environment, with the opportunity to work with a diverse group of people and to grow my professional skills through continuous learning.

Professional engineer with strong foundation in system reliability and optimization. Known for delivering robust solutions that enhance system performance and reduce downtime. Collaborative team player focused on achieving results and adapting to changing environments. Skilled in automation, incident management, and continuous improvement.

Overview

13 years of professional experience
1 Certification

Work History

Site Reliability Engineer

WHILEONE TECHSOFT PVT LTD.
04.2025 - Current
  • Company Overview: Cerebras Systems Inc. is an American Artificial Intelligence company. Cerebras builds computer systems for complex AI deep learning applications. Today, Cerebras stands alone as the World's fastest AI inference and training platform. Organizations across fields like Medical Research, Cryptography, Energy, and Agentic AI use our CS-2 and CS-3 systems to build on-premises Supercomputers.
  • Oversaw comprehensive monitoring of diverse AI models to guarantee optimal performance and reliability.
  • Performed ongoing evaluations of system performance and infrastructure reliability to facilitate proactive issue identification.
  • Utilized Grafana to monitor queue length, time to first token (TTFT), and real-time latency metrics for enhanced system performance.
  • Analyzed queue time anomalies at p50, p90, p95, and p100 to enhance identification and resolution of inference delays.
  • Addressed urgent alerts to facilitate timely incident resolution through systematic tracking and analysis.
  • Evaluated and rectified technical issues, including replica restart loops, load balancer access problems, and queue spikes, to maintain optimal system performance and reliability.
  • Managed on-call responsibilities, driving incident resolution efforts and conducting thorough post-mortem analyses to mitigate future risks.
  • Collaborated with on-call engineers to identify and isolate malfunctioning systems, ensuring minimal disruption during tainting and system swapping operations.
  • Coordinated efforts during incident windows to detect suspicious nodes by leveraging cluster tools, user node logs, and movement dashboards.
  • Managed large-scale inference workloads utilizing Cerebras AI inference system across advanced models, including OpenAI's ChatGPT, Meta's LLaMA-4, Scout, DeepSeek, LLaMA 3.1-8B, LLaMA 3.3-70B, DeepSeek-R1, Distil-LLaMA-70B, Mistral AI, and Perplexity AI.
  • Utilized cluster movement dashboard to ensure precise tracking of CG locations and oversee resource redistribution during job restarts or replica failures.
  • Performed in-depth system diagnostics to maintain peak performance through the application of Linux utilities and tools including cs event show, cs system show, csctl, journalctl, and systemctl status.
  • Conducted live testing with curl to ensure precise routing and preserve the integrity of model responses.
  • Identified and addressed end-to-end test failures to ensure system reliability, focusing on API connection errors, timeouts, and session threshold breaches exceeding 120.
  • Collaborated with teams to develop runbooks, automation scripts, and standard operating procedures for streamlined operations.
  • Developed and implemented strategies aimed at driving continuous improvements in reliability and observability to support operational excellence for critical AI/ML workloads and distributed systems.
  • Implemented monitoring solutions using Prometheus and Grafana for proactive incident management.
  • Oversaw incident response efforts, ensuring efficient resolution of critical outages through strategic team coordination.
  • Performed comprehensive root cause analysis to drive improvements in system design and documentation processes.
  • Facilitated knowledge transfer to junior engineers on effective site reliability techniques and operational excellence strategies.
  • Facilitated ongoing improvement of site reliability engineering practices by conducting regular reviews and sharing insights with team members.
  • Designed and implemented a streamlined onboarding process for new engineers, fostering increased productivity and collaboration within the team.
  • Worked with stakeholders and team members on quality assurance efforts for hardware components.
  • Collaborated with cross-functional teams for identification and resolution of validation issues.
  • Improved deployment efficiency, automating processes using CI/CD pipelines.

Senior Software Analyst

APISERO INDIA PVT LTD.
04.2023 - 10.2023
  • Facilitated acquisition process by NTT Data to enhance organizational capabilities.
  • Delivered comprehensive L1 and L2 support to enhance operational efficiency of Mule Manage project in healthcare and cloud computing domains.
  • Implemented and maintained 24/7 surveillance protocols for cardiovascular system APIs, safeguarding operational integrity and functionality.
  • Assisted in support activities for healthcare-related products at L1/L2 level in cardiovascular systems. Aided in API design and debugging to improve system functionality. Supported troubleshooting of Mule flows and performed DataWeave transformation analysis. Helped manage reporting service operations for timely delivery of insights.
  • Achieved seamless operation of Autolus process APIs and system APIs through effective monitoring. Enhanced system performance by resolving issues promptly. Improved user experience by maintaining high API availability and functionality.
  • Oversaw data transfer processes between systems utilizing Mulesoft scheduler. Collaborated with teams to verify successful file and table transfers on Anypoint platform. Led troubleshooting initiatives to resolve data transfer issues effectively.
  • Directed backup monitoring operations for Rackspace servers, leveraging MuleSoft APIs and Foundry for AI to optimize data protection workflows.
  • Analyzed MuleSoft API logs alongside FAIR diagnostics to assess server backup performance, enabling predictive maintenance and anomaly detection.
  • Utilized FAIR diagnostics integrated with MuleSoft oversight to confirm successful cloud backups, proactively detecting incomplete transfers and triggering automated alerts.
  • Leveraged FAIR's AI capabilities alongside MuleSoft API logs to verify backup occurrences in real-time, minimizing data loss risks and improving operational efficiency.

Application Support Engineer

MONTCREST SOFTWARE PVT. LTD.
01.2020 - 02.2023
  • Delivered technical assistance for data applications at L1 and L2 levels to enhance operational efficiency.
  • Achieved consistent application availability by delivering daily reports and health checks to clients. Enhanced job monitoring processes using Control-M tool to optimize performance.
  • Utilized SQL and PL/SQL skills to design and enforce joins and constraints for optimized database performance.
  • Executed basic shell scripting in Unix/Linux environments to create and modify crontab entries for automated data cleanup.
  • Coordinated efforts with Linux, development, middleware, network administration, and application support teams to address and troubleshoot database and application-related challenges.

Senior Executive

TATA COMMUNICATIONS LTD.
08.2018 - 12.2019
  • Executed management of Linux, Unix, and Windows operating systems, facilitating uninterrupted service delivery and improved user satisfaction.
  • Implemented troubleshooting protocols for server issues to ensure compliance with SLA requirements.
  • Executed comprehensive evaluations of server performance utilizing health check commands.
  • Oversaw user access management and executed log reviews to uphold system integrity across CLI and GUI servers.
  • Executed troubleshooting and resolution strategies for server-related issues, enhancing operational efficiency for field engineers.
  • Managed over 20 tickets per day.

Executive

TATA COMMUNICATIONS LTD.
03.2015 - 08.2018
  • Monitored and managed the EMS/NMS covering the entire SDH transmission network of the Maharashtra & Goa circle, working on Linux systems.
  • Planned and implemented network restructuring for network optimization.
  • Performed thorough fault analysis and troubleshooting to avoid repeat outages.

Executive

SHRI SIDDHIVINAYAK ENTERPRISES
04.2013 - 03.2015
  • Performed detailed assessments of switch performance and implemented required checks to maintain operational integrity.
  • Oversaw real-time alarm monitoring to ensure optimal performance of access network infrastructure.
  • Executed daily and periodic health checks for switch nodes to ensure optimal performance.

Education

Master of Engineering - Communication Networks

SAVITRIBAI PHULE PUNE UNIVERSITY
Pune, MH
06.2016

Bachelor of Engineering - Electrical, Electronics And Communications Engineering

SHIVAJI UNIVERSITY
Kolhapur, MH
06.2012

Diploma in Electronics - Digital Electronics

MSBTE
Mumbai, MH
01.2007

Secondary School Certificate - Marathi

Z.P. School
Satara
01.2003

Skills

  • Artificial Intelligence: AI Inference, Jira Service Management, HAProxy (Load Balancer), New Relic
  • Cloud-native AWS: EC2, API Gateway, Lambda, AWS SNS, SQS
  • Networking on AWS: VPC, Route 53, Direct Connect, Site-to-Site VPN, CCNA Routing and Switching, CCNA Security
  • Storage on AWS: S3, EBS, EFS
  • Monitoring: CloudWatch, CloudTrail, Prometheus, Grafana, PagerDuty, Slack
  • Security on AWS: Security group, NACL, IAM, Certificate Manager, KMS, AWS Firewall Manager, Secrets Manager, Security Hub
  • Containers on AWS: Microservices-ECS Fargate, ECR, Kubernetes, Helm Charts
  • Databases: RDS, Redis, Influx
  • Cost on AWS: Service Quotas, Cost Explorer
  • DevOps and IaC: Chef, CloudFormation, Service Catalog, Jenkins CI/CD, Bitbucket, Git, JIRA, Confluence
  • Migration on AWS: Application Discovery Service, AWS Application Migration Service
  • ITIL: Incident Management, Problem Management, Change Management, ITIL v4 Foundation, Prince 2 Foundation
  • Operating Systems and Tools: Linux, Windows, OpenShift

Certification

  • CCNA R & S, 2016-06-01
  • ITIL FOUNDATION (V2), 2017-02-01
  • CCNA SECURITY, 2018-05-01
  • Prince 2 Foundation, 2019-08-01
  • MCD - Level 1, 2023-08-01

Languages

Marathi
English
Hindi

Disclaimer

I hereby declare that all the information given above is true to the best of my knowledge and belief.
