
Suvajit Pradhan

Bangalore

Summary

Over 10 years of IT experience across service delivery, project automation, operations and maintenance, product implementation, and production support. In-depth experience in the full DevOps lifecycle, including Continuous Integration (CI), Continuous Delivery (CD), configuration management, and continuous monitoring, to optimize deployment pipelines and improve the quality and speed of software delivery. Led end-to-end DevOps initiatives, automating workflows and deploying microservices across AWS and on-premises infrastructure. Expertise in creating dashboards and alerts in monitoring tools such as Splunk and Dynatrace to meet application monitoring requirements. Designed and implemented Ansible playbooks for infrastructure provisioning, application deployment, and configuration management, reducing deployment time by 50%. Experience installing and configuring Tomcat web servers. Spearheaded the configuration and management of Docker containers and Kubernetes clusters, enabling efficient orchestration and scaling of containerized applications. Proactively monitored application and infrastructure health using Splunk, Dynatrace, and Grafana, setting up real-time alerts to detect issues before they impact users. Proficient in shell scripting and Python for automating routine tasks, reducing manual effort and improving operational efficiency. Experience with version control using Git and with maintaining source code repositories on GitHub. Hands-on expertise in SCM strategy and IT infrastructure administration, delivering IT solutions and services to a high standard. Experienced in change and release management and in the enhancement and maintenance of software applications. Experience managing AWS cloud services such as EC2, EBS, AMI, S3, CloudWatch, CloudFormation, Auto Scaling, and Route53.
Implemented and managed Kafka and Redis clusters for real-time messaging and caching, improving system performance and reliability. Configured Grafana dashboards for continuous monitoring. Skilled in configuring Elastic Load Balancers (ELB) to distribute incoming traffic efficiently across multiple servers, ensuring high availability and optimal resource utilization. Experience handling hundreds of servers in the application's production environment.

Overview

11
years of professional experience

Work History

Senior Site Reliability Engineer

Publicis Sapient
07.2022 - Current
  • Led and managed a high-performing team of Site Reliability Engineers (SREs) in an offshore location, overseeing end-to-end production support operations
  • Ensured seamless service reliability, minimized downtime, and implemented proactive monitoring and incident response processes
  • Streamlined deployment processes with the introduction of CI/CD pipelines via Azure DevOps, improving deployment frequency by 100% and reducing lead time for changes
  • Oversaw the planning, execution, and validation of deployments to ensure seamless and successful application releases
  • Conducted rigorous post-deployment validation checks to guarantee system integrity and adherence to quality standards
  • Performed configuration and change management activities, including version control, system builds, and release management
  • Implemented comprehensive, scalable monitoring solutions leveraging Splunk and Dynatrace to deliver robust, real-time application performance insights
  • Enabled proactive issue detection, optimized system health, and enhanced incident resolution through advanced data analytics and visualization, ensuring high availability and operational efficiency
  • Automated routine and manual tasks using Python and Shell scripting, eliminating 100% of manual effort while significantly improving operational efficiency and accuracy
  • Optimized alert management, reducing over 700 alerts to fewer than 100, significantly minimizing noise and improving alert accuracy
  • Managed the ServiceNow incident queue, ensuring timely resolution and closure of incidents within defined SLA targets
  • Led prioritization calls with the product team to evaluate and prioritize open defects and enhancement requests, driving efficient issue resolution and continuous product improvement

Site Reliability Engineer

L&T Tech
03.2020 - 07.2022
  • Handled the complete operations and maintenance of the application
  • Performed deployments, version upgrades, and version releases of various applications
  • Developed and supported day-to-day release builds and deployments
  • Provided technical leadership in managing a highly available, fault-tolerant production environment of over 200 servers, handling both cloud and on-premise resources
  • Coordinated, communicated, and facilitated deployment plans
  • Configured and maintained Jenkins to implement the CI process and integrated it with Maven to automate the build process
  • Coordinated branching and merging and maintained the branching strategy using Git
  • Created Ansible playbooks to deploy packages and automate the installation process
  • Provisioned the complete application infrastructure during new service installations
  • Configured and maintained ELB, SLB, web servers, and middleware for services and microservices
  • Debugged build issues and troubleshot and fixed problems in a timely manner
  • Handled installations and configurations and supported developers in debugging source code
  • Responsible for provisioning and releasing VMs and for applying OS upgrades and security patches
  • Wrote Bash shell scripts to automate tasks
  • Acted on security vulnerabilities in the services
  • Acted as the single point of contact for all queries about the assigned service
  • Worked closely with the R&D team to consistently review and improve the deployment process
  • Built Docker images from Dockerfiles
  • Managed and configured AWS services (ELB, EC2, Route53, IAM, VPC, Auto Scaling) as per business needs
  • Configured and maintained middleware clusters such as Redis and Kafka
  • Wrote shell scripts to automate tasks and processes
  • Set up network infrastructure such as ACLs, subnets, and VPCs, along with middleware for applications
  • Used Prometheus and Grafana to monitor application performance and fetch performance reports for the VMs

Implementation Engineer

Appnomic Systems Pvt Ltd
09.2016 - 03.2020
  • Implemented performance monitoring tools in client environments on the Linux platform
  • Created and deleted databases, added database users, took dump backups, and restored from dump files
  • Managed Docker containers and volumes
  • Logged in to Docker and managed all Docker and Nomad services
  • Wrote Bash shell scripts to capture various system parameters and configure them based on client requirements
  • Ensured the required versions of MySQL, Python, Apache Tomcat, and other packages were installed, installing or updating them where needed
  • Configured the DR environment
  • Performed DC-DR database sync and database migration
  • Installed the required version of Java and set the Java home path
  • Created users in MySQL and granted them privileges
  • Generated SSL certificates and imported them for secure connections among the services
  • Checked port status in the firewall and added the required ports
  • Updated the MySQL schema version and the Tomcat version to the latest released by the tool
  • Created non-root users in RHEL, added them to groups, and granted sudo access based on requirements
  • Created a non-root WebSphere Application Server (WAS) user in the WAS admin console and added it to the monitor group to enable WAS KPI data collection from the WAS server
  • Repeated the same steps for other application servers such as WebLogic and JBoss
  • Created passwordless connections among servers for easy access and created aliases
  • Configured several services such as Cassandra, HAProxy, Keycloak, and reporting services
  • Troubleshot reported data collection issues, running services in debug mode to check the logs for errors
  • Modified the httpd.conf file of the IHS/Apache server to get the desired log output

System Engineer

Teamware Solutions
01.2015 - 01.2016
  • Company Overview: Contract to hire for Tata Consultancy Services
  • Worked with the ServiceNow ticketing tool
  • Handled tickets escalated from L1
  • Created and set up mailboxes in Exchange Management Console (EMC)
  • Created distribution lists and shared mailboxes in Outlook
  • Backed up and migrated mailboxes
  • Created, updated, and deleted AD accounts
  • Reset and unlocked AD accounts
  • Worked with RSA tokens and Symantec Encryption Server
  • Worked on issues related to SCCM, Lotus Notes, Good App, IE, Outlook, MS Office, Lync, etc.
  • Handled VPN-related issues
  • Created incidents and service requests with proper documentation
  • Tracked tickets to record progress
  • Worked on issues related to macros and add-ins
  • Handled network connectivity issues
  • Created GOOD accounts and troubleshot related issues
  • Updated incidents and service requests regularly while keeping track of SLAs
  • Stayed in constant contact with different teams to close cases on time and maintain SLAs

Technical Service Associate

Minacs Ltd
01.2014 - 01.2015
  • Worked with a CRM tool
  • Troubleshot network, Outlook, and MS Office issues
  • Worked incidents and service requests and ensured closure within the stipulated time
  • Audited open cases and tracked them to closure
  • Provided support for a wide range of applications
  • Performed root cause analysis of issues based on logs and dumps

Education

B.Tech - Computer Science and Engineering

Biju Patnaik University
Odisha
01.2013

Higher Secondary (10+2 STD) - PCMB Stream

CHSE
Baleshwer, Odisha
05.2009

High school

HSE Board
Baleshwer, Odisha
05.2007

Skills

  • Azure DevOps
  • Jenkins
  • Ansible
  • Terraform
  • Shell script
  • Python
  • Grafana
  • Splunk
  • Dynatrace
  • Git
  • Docker
  • Kubernetes
  • AWS
  • APIs
  • WebSphere
  • Apache Tomcat
  • Nginx
  • Linux

Disclaimer

I hereby declare that all the details furnished above are true to the best of my knowledge and belief.
