Tausif Shaikh

Pune

Summary

IT professional with 13 years of comprehensive experience in infrastructure management and operations, including 4 years specializing in Site Reliability Engineering (SRE). Proven expertise in designing, implementing, and maintaining highly available, scalable, and resilient systems. Skilled in automation, monitoring, incident management, and performance optimization to ensure seamless service delivery and improved system reliability.

Overview

13 years of professional experience
1 Certification

Work History

Senior Technical Solutions Engineer

Persistent Systems
Pune
01.2025 - Current
  • Containerized applications using Docker and orchestrated deployments with Kubernetes, improving scalability and resource utilization.
  • Implemented monitoring solutions with Prometheus and Grafana, enhancing system visibility and reducing downtime by 40%.
  • Managed CI/CD deployment pipelines using Jenkins and Docker, reducing deployment time by 50%.
  • Troubleshot Docker, Kubernetes, and Apache/Tomcat issues.
  • Integrated Git for version control in CI/CD workflows, enabling seamless collaboration and code review processes, which improved code quality and team productivity.
  • Managed scalable Kubernetes clusters on AWS EKS, ensuring high availability and performance for production applications.
  • Managed source code repositories in Git, handling branching, commits, and merges to ensure smooth development workflows.

Cloud Engineer

Cisco
Pune
08.2022 - 11.2024
  • Served as on-call SRE for production support.
  • Used Jenkins for code and application deployments on Linux-based VMs.
  • Troubleshot Docker, Kubernetes, and Apache/Tomcat issues.
  • Resolved API-related issues and ensured smooth deployments in production environments.
  • Used Grafana, Splunk, and AppDynamics dashboards to troubleshoot API-related issues.
  • Automated and streamlined deployment tasks using shell scripts and Ansible playbooks.
  • Managed CI/CD pipelines using Jenkins.
  • Deployed, managed, and troubleshot Amazon Elastic Kubernetes Service (EKS) clusters on AWS.
  • Created Grafana dashboards and integrated them with Prometheus.
  • Set up alerts in Prometheus for Kubernetes clusters.
  • Created Splunk dashboards for production-related activities.
  • Analyzed transaction snapshots, diagnostic sessions, and performance metrics in AppDynamics to isolate root causes of slow response times and application errors.

Technical Support Lead

Persistent Systems
Pune
04.2019 - 07.2022
  • Managed over 20,000 Linux servers across multiple data centers, ensuring high availability and performance.
  • Provided 24x7 SRE on-call support for critical production issues, ensuring high availability and reliability of the application.
  • Migrated Linux servers from on-premises to the cloud.
  • Managed user permissions, file systems, and security policies in accordance with organizational standards.
  • Supported the migration of on-premise infrastructure to OpenStack, improving scalability and reducing costs.
  • Created volumes and managed file systems using Logical Volume Manager (LVM), including extending and reducing file systems.
  • Led bridge calls with global IT teams and business partners to resolve critical business incidents and outages.
  • Collaborated with development teams to automate CI/CD pipelines, integrating Linux-based solutions with Jenkins and Git.
  • Drove knowledge management across the supported applications and ensured full compliance.
  • Used Splunk and AppDynamics to troubleshoot API issues and perform log analysis.
  • Managed a Kubernetes cluster using Rancher.
  • Continuously improved the reliability, scalability, and performance of systems and applications through root cause analysis, code and architecture review, and proactive monitoring.
  • Responded to critical incidents promptly and efficiently, performing troubleshooting and incident management as needed.
  • Used Grafana, Splunk and AppDynamics to monitor metrics in the production environment.
  • Monitored systems and applications for performance, availability, and security, and responded to issues quickly and efficiently.
  • Collaborated with development and product teams to ensure that applications and systems were designed and implemented with reliability, scalability, and performance in mind.
  • Managed Prometheus-based alerting systems for Linux, Kubernetes, and application-level issues in production environments.
  • Implemented configuration management using Ansible for efficient system administration.
  • Supported the resolution of incidents and problems within the team and assisted with complex incidents.
  • Created and managed GitHub repositories, overseeing branching, code merges, tagging, and user permissions.

Technical Solutions Engineer

Mojo Networks
Pune
07.2017 - 01.2019
  • Monitored and provisioned wireless devices, including sensors and access points, on the Mojo Cloud Server.
  • Actively monitored cloud servers using Nagios for performance and uptime.
  • Generated reports from the cloud server to track active and inactive device counts.
  • Managed cloud infrastructure upgrades and maintenance services.
  • Troubleshot Linux server issues and performed Linux administration tasks.
  • Provided Level 1 customer support via calls and emails, resolving technical issues promptly.
  • Collaborated with the product management team to improve products based on customer feedback, addressing issues, and implementing new feature requests.
  • Provided application support for projects such as the Comcast Wi-Fi Pro business and WatchGuard Technologies, ensuring that all incidents were managed robustly and effectively and that any business impact was identified and minimized.

Application Support Analyst

ADP (Automatic Data Processing)
Pune
04.2014 - 01.2017
  • Provided Level 1 customer support via calls and emails, resolving technical issues promptly.
  • Held weekly calls with engineering/L3 teams to discuss pending issues.
  • Monitored application performance using Splunk and in-depth log analysis.
  • Deployed applications on enterprise servers.
  • Troubleshot web and application issues in production and reported bugs to the L3 team.

Sr. Associate

WNS
Pune
02.2012 - 04.2014
  • Answered employees' inquiries in person, by email, and by telephone.
  • Installed, configured, and maintained Windows operating systems.
  • Troubleshot WLAN and LAN issues.
  • Diagnosed and resolved PC problems and software issues.

Education

Bachelor's degree

Pune University
01.2011

Skills

  • CentOS (Linux)
  • KVM
  • Basic Shell scripting
  • YAML
  • CI/CD (Jenkins)
  • Ansible Tower
  • Docker
  • Kubernetes
  • AWS
  • OpenStack
  • Apache Tomcat
  • Git
  • Prometheus
  • Grafana
  • Splunk
  • AppDynamics

Certification

  • Azure Fundamentals, I330-363, 2022
  • AWS Solutions Architect, NLJR3SH1F1Q11YKW, 2020
  • Linux Administration, 2020
  • Generative AI from Udemy
  • DevOps Project 1 - CI/CD with Git Jenkins Ansible

Timeline

Senior Technical Solutions Engineer

Persistent Systems
01.2025 - Current

Cloud Engineer

Cisco
08.2022 - 11.2024

Technical Support Lead

Persistent Systems
04.2019 - 07.2022

Technical Solutions Engineer

Mojo Networks
07.2017 - 01.2019

Application Support Analyst

ADP (Automatic Data Processing)
04.2014 - 01.2017

Sr. Associate

WNS
02.2012 - 04.2014

Bachelor's degree

Pune University