Souvik Sarkhel

SDE II
Bangalore

Summary

Seasoned DevOps and SRE professional with 10+ years of experience driving reliability, automation, and scalability
across large-scale platforms like Disney+ Hotstar. Proven track record of building end-to-end CI/CD pipelines
(Jenkins, Spinnaker, Istio), automating infrastructure with Terraform, and integrating DevSecOps practices. Expert
in observability solutions using ELK, Grafana, CloudWatch, and custom telemetry tools to enhance system reliability
and incident response. Deep expertise in Big Data ecosystems (HDP, CDH, Ambari, Cloudera Manager), including
deployment automation and marketplace product development. Adept at optimizing operational efficiency, disaster
recovery planning, and performance tuning in hybrid cloud and containerized environments.

Overview

10 years of professional experience

Work History

SDE II

Disney+ Hotstar
08.2021 - Current
  • Working on the Engineering Productivity team to build observability and monitoring solutions for Disney+ Hotstar.

Lead Engineer

Informatica
03.2021 - 07.2021
  • Led a team of 4 engineers implementing the CI/CD processes for an upcoming cloud-based microservices product.

Senior Software Engineer

Informatica
07.2018 - 03.2021
  • Part of the DevOps team that implemented the CI/CD processes for an upcoming cloud-based microservices product.
  • Worked on various R&D issues for a Big Data product.

Technical Lead

Nokia
03.2017 - 07.2018
  • Worked as a DevOps Engineer with various open-source technologies and Big Data distributions such as CDH and HDP, in both bare-metal and containerized environments.

Senior Systems Engineer

Infosys Limited
01.2015 - 02.2017
  • Worked primarily as an AWS Developer, and also worked with Python, Docker, Big Data technologies, and configuration management tools such as Chef.

Education

Bachelor of Technology (Computer Science)

MCKV
Howrah, West Bengal

Senior Secondary Education

D.A.V Model School
Kharagpur, West Bengal

Secondary Education

S.H.H.S
Kharagpur, West Bengal

Skills

CI/CD Tools: Jenkins, GitHub Actions, Spinnaker, Harness, GoCD, Argo Workflows

Personal Information

Total Experience: 10 years 5 months

Projects

  • SRE:
    - Automated the observability and monitoring stack and set up alerting along with an incident management process to reduce MTTD and MTTR.
    - Built an event processing pipeline using Flink to process real-time events and store them in a TSDB for anomaly-detection-based alerting.
    - Designed and implemented a system to centralise alert definitions.
  • Capacity Planning for AWS Managed ELK:
    - Determined instance types and the number of data nodes and master nodes.
    - Calculated required disk space and shards for a PROD ELK cluster.
    - Implemented index lifecycle management based on performance team inputs.
  • AWS Infrastructure DR:
    - Designed a zero-downtime switchover plan for AWS services in case of a regional outage.
    - Utilized Terraform, AWS CLI, and Jenkins for automation and implementation.
  • Jenkins DevSecOps Pipeline:
    - Implemented static code analysis using SonarQube.
    - Conducted container vulnerability scans with StackRox.
    - Ensured source code security with Veracode and Black Duck scans.
    - Conducted REST API testing using Burp and automated Jira ticket creation on failure.
  • Jenkins as a Service on Kubernetes:
    - Deployed Jenkins on demand in Kubernetes with EBS storage.
    - Enabled dynamic agent launching as needed.
    - Executed service tests on Kubernetes via Spinnaker pipeline.
  • CD Phase:
    - Created complex Spinnaker pipelines using Dinghyfile.
    - Orchestrated landscape deployments with Spinnaker and Terraform.
    - Utilized Helm 3 for microservices deployment.
    - Implemented Canary Deployments and Istio custom gateway deployment.
  • CI Phase:
    - Developed Jenkins pipelines as code and shared libraries.
    - Monitored Jenkins node metrics and created alerts.
    - Visualized build telemetry using ELK stack.
  • Infrastructure Automation using Terraform and Monitoring/Visualization:
    - Developed Terraform modules for deploying AWS resources.
    - Set up CloudWatch alerts and dashboards in Kibana.
    - Ensured monitoring and visualization with CloudWatch and Grafana.
  • Solving R&D Issues:
    - Improved Metadata Scanner performance using OrientDB Graph.
    - Enhanced API performance with multi-threading in Java.
    - Developed Java API with Spring for product operations.
  • Kerberos Setup in Kubernetes:
    - Developed Docker images for KDC master-slave setup in Kubernetes.
  • Custom Service for CDH Stack:
    - Developed Spark Thrift Server service for CDH with Kerberos authentication (https://github.com/cloudera/cm_csds/pull/5).
    - Handled performance tuning in Spark Thrift Server.
  • Datalake Development:
    - Deployed CDH distribution on bare metal and containerized environments.
    - Implemented Kerberization and service discovery.
    - Developed automated test scripts using Python.
  • Custom Stack for Apache Ambari:
    - Installed and distributed services in a cluster with custom services in Apache Ambari.
    - Implemented continuous monitoring and one-click installation with Chef.
  • Big Data Stack on AWS:
    - Created AWS resources and managed Hadoop clusters using CloudFormation and Chef.
    - Developed end-to-end solutions for deploying big data stacks on AWS Marketplace and TestDrive.

Accomplishments

  • Achieved 50% cost savings for metrics collection using OpenTelemetry Collector.
  • Achieved a 20% decrease in incident volume and continuous improvement within the Site Reliability Engineering (SRE) domain by automating and refining internal processes and conducting regular reviews.
  • Reduced MTTR by 33% by deploying automated alerting systems and refining communication protocols, ensuring rapid issue detection and resolution.
  • Reduced deployment failures by 25% by using automated gates in Jenkins, E2E tests, and canary verification.

Work Availability

monday
tuesday
wednesday
thursday
friday
saturday
sunday
morning
afternoon
evening

Work Preference

Work Type

Full Time

Work Location

Hybrid

Important To Me

Career advancement
Work-life balance
Flexible work hours

Languages

English
Bilingual or Proficient (C2)
