Technologies Worked With:
Scala, IntelliJ IDEA, RabbitMQ, Kafka, Elasticsearch (ES), ObjectStore (ECS), Grafana, Prometheus, Logstash, GitLab (project management), Kubernetes, Docker, Splunk, S3 Browser, Offset Explorer.
Telemetry Data Processing & DevOps Experience
- I was part of the Triumph ISG team, where I primarily worked on telemetry data processing across multiple servers.
- The workflow began with receiving collections/payloads via Kafka or RabbitMQ. Some data files arrived already extracted and others did not; unextracted files were extracted first, and the log files were then processed. The log files were transformed into Parquet, JSON, or XML, depending on system requirements.
- Once the log files were processed, their metadata was published to the Elasticsearch meta index and a Kafka meta topic so that downstream API teams could consume the data. The metadata included key fields such as systemID, startTime, endTime, triumphProcessingTime, overallCollectionProcessingTime, and parquetFilePath.
- To track processing performance, we implemented metrics capturing the individual processing times and ensured compliance with an SLA of 5 minutes per collection. Grafana dashboards monitored SLA adherence in real time over recent windows (e.g., the last 1, 2, or 3 hours), and Prometheus tracked RabbitMQ lag, displayed as line charts.
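
As an illustration only, the SLA check over the published metadata could be sketched as below. This is Python rather than the Scala used on the project. The field names come from the metadata listed above, but the units (seconds), the sample values, and the helper name `sla_adherence` are assumptions, not the team's implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

SLA_SECONDS = 5 * 60  # the 5-minute per-collection SLA


@dataclass(frozen=True)
class CollectionMeta:
    """One metadata record, using the field names listed above."""
    systemID: str
    startTime: str
    endTime: str
    triumphProcessingTime: float            # assumed to be in seconds
    overallCollectionProcessingTime: float  # assumed to be in seconds
    parquetFilePath: str


def sla_adherence(records: Sequence[CollectionMeta]) -> Optional[float]:
    """Return the fraction of collections processed within the SLA.

    Returns None when there are no records, so an empty window is not
    reported as 0% or 100%.
    """
    if not records:
        return None
    within = sum(
        1 for r in records
        if r.overallCollectionProcessingTime <= SLA_SECONDS
    )
    return within / len(records)


# Hypothetical sample records; every value here is made up.
sample = [
    CollectionMeta("sys-a", "2024-01-01T00:00:00Z", "2024-01-01T00:02:00Z",
                   90.0, 120.0, "/example/a.parquet"),
    CollectionMeta("sys-b", "2024-01-01T00:00:00Z", "2024-01-01T00:06:00Z",
                   300.0, 360.0, "/example/b.parquet"),
    CollectionMeta("sys-c", "2024-01-01T00:00:00Z", "2024-01-01T00:05:00Z",
                   250.0, 300.0, "/example/c.parquet"),
]

print(f"SLA adherence: {sla_adherence(sample):.0%}")
```

A collection that takes exactly 5 minutes is counted as compliant here; whether the real SLA treated the boundary that way is not stated.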
DevOps & CI/CD Implementation
We followed DevOps principles and deployed our applications to Kubernetes (K8s) clusters, where I worked extensively on the CI/CD pipelines. The CI/CD stages included:
Security & Quality Checks
- Snyk Fixes – Addressing vulnerabilities in POM dependencies
- SonarQube – Ensuring code quality and test case coverage
- BlackDuck Scan – Capturing critical vulnerabilities not detected by Snyk
- Unit Testing – Validating application functionality
- Container Scanning – Identifying container-based security issues
Build & Deployment Stages
- Compile Package
- Verify Build (including BlackDuck, Checkmarx, Code Quality, SAST, SCA, Unit Testing, and Xray Scans)
- Docker Image Build & Push
- Container Scanning
- Deployment & Verification
Additional Contributions Beyond Day-to-Day Work:
- Conducted benchmarking to compare write speeds between HDD-backed and SSD-backed ECS.
- Ensured a minimum maturity score of 81% across various applications. Addressed vulnerabilities using tools such as BlackDuck, SCA, SAST, and Checkmarx. Improved unit testing, code quality, container scanning, and Xray scanning.
- Led the migration of applications from the PKS cluster to the KOB cluster.
- Onboarded Splunk for all squad applications in both non-prod and prod environments. Implemented Splunk alerting for proactive monitoring.
- Developed a liveness script to monitor RabbitMQ lag and enable automatic restarts every hour.
- Analyzed the journalctl command for generating JSON from system.journal log files.
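
For illustration, the journalctl-to-JSON step could be sketched as follows. `journalctl --file=<path> -o json` prints one JSON object per line, and `MESSAGE`, `PRIORITY`, `_SYSTEMD_UNIT`, and `__REALTIME_TIMESTAMP` (microseconds since the epoch) are standard journal fields. The helper names, the choice of fields to keep, and the sample entry are assumptions, and the code is Python rather than the project's own tooling.

```python
import json
import subprocess
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def read_journal_lines(journal_path: str) -> List[str]:
    """Run journalctl on a journal file and return its JSON lines.

    Requires journalctl on the host; not called in the example below.
    """
    result = subprocess.run(
        ["journalctl", f"--file={journal_path}", "-o", "json", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def parse_journal_lines(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Convert journalctl JSON lines into simplified records."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        micros = int(entry["__REALTIME_TIMESTAMP"])
        records.append({
            "timestamp": datetime.fromtimestamp(
                micros / 1_000_000, tz=timezone.utc
            ).isoformat(),
            "unit": entry.get("_SYSTEMD_UNIT"),
            "priority": entry.get("PRIORITY"),
            # MESSAGE may be a byte array for non-UTF-8 data; kept as-is.
            "message": entry.get("MESSAGE"),
        })
    return records


# Hypothetical sample entry in the shape journalctl -o json emits.
sample_lines = [
    '{"__REALTIME_TIMESTAMP": "1700000000000000", "PRIORITY": "6", '
    '"_SYSTEMD_UNIT": "example.service", "MESSAGE": "service started"}',
    "",
]

for record in parse_journal_lines(sample_lines):
    print(json.dumps(record))
```

Keeping the parsing separate from the `subprocess` call lets the conversion be exercised on captured output without access to a journal file.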