Professional Summary
KEY RESPONSIBILITIES
Batch Monitoring & Automation
- Job Health Tracking: Monitoring batch schedules and proactively managing job failures to maintain system stability.
- Script-based Automation: Creating Shell and Python scripts for batch job automation and alert-based job recovery.
- Schedule Management: Managing batch job lifecycles in Autosys and Control-M for predictable execution flow.
- Failure Diagnostics: Investigating batch failures using log traces and automating response steps for faster resolution.
Stakeholder & Deployment Coordination
- Production Change Handling: Implementing scheduled production releases and ensuring validation of deployed components.
- Deployment Alignment: Coordinating deployments with QA and business users to ensure scope alignment and zero-defect delivery.
- Operational Readiness: Collaborating with teams to validate hybrid infrastructure readiness before go-live events.
- Escalation Management: Communicating incident impacts to stakeholders and escalating unresolved issues for triage.
Agile & ITIL Practices
- Agile Delivery Participation: Contributing to sprint goals and attending daily stand-ups for cross-functional alignment.
- ITIL Compliance: Handling tickets through ServiceNow and JIRA while adhering to defined change and incident workflows.
- Knowledge Transition: Sharing platform knowledge through walkthroughs and documentation to aid new team onboarding.
KEY HIGHLIGHTS
Root Cause & Triage Excellence
- Issue Diagnostics: Diagnosed recurring production failures using scripting and log-based analysis techniques.
- Anomaly Detection: Identified system behavior deviations by correlating logs, alerts, and server performance metrics.
Efficiency & Process Improvements
- Manual Effort Reduction: Eliminated routine support tasks by implementing reusable automation utilities.
- SLA Adherence: Improved resolution timelines and SLA metrics through faster issue classification and automation.
Reporting & Analysis
- Data Reconciliation: Validated transactional records across systems using SQL queries for exception tracking.
- Operational Analytics: Compiled support data into performance summaries to aid strategic process refinement.
- Business Impact Summary: Summarized technical incidents into executive-level impact reports for client visibility.
System Stability & Recovery
- Post-Failure Handling: Executed missed cron jobs and validated system consistency during recovery operations.
- Environment Sync: Validated downstream systems post-deployment to ensure data propagation and sync integrity.
- Disk Utilization Alerts: Monitored server disk usage alerts and acted on them to prevent outages.