Birlasoft Private Limited, India - Noida, March 2023 - present.
As Senior Software Developer - Admin - L3 (R&D Datalake) / Data Management, Noida, U.P.
- Advanced Troubleshooting & Incident Management: Lead resolution of complex, high-impact system issues, including service failures, data corruption, and cluster-wide performance problems.
- Cluster Architecture & Performance Optimization: Design and optimize cluster architecture, including capacity planning, resource allocation, and tuning Hadoop, HDFS, YARN, and other services for peak performance.
- Disaster Recovery, Security & Compliance: Implement and test comprehensive disaster recovery strategies, enforce security policies (Kerberos, encryption), and ensure compliance with regulatory requirements across the platform.
As Software Developer - Admin - L1, L2 (R&D Datalake) / Data Management, Noida, U.P., September 2021 - March 2023.
- Monitor & Respond to Alerts: Monitor Cloudera services for alerts and basic issues (e.g., service downtime, resource utilization), escalating to higher levels when needed.
- Service & Job Management: Start, stop, and restart services; schedule and track routine jobs; assist with job failures or service health checks.
- Basic Maintenance & Security: Perform routine backups, apply patches, and ensure basic security (user management).
- Advanced Troubleshooting & Performance Tuning: Resolve complex service failures, optimize cluster performance (CPU, memory, disk), and troubleshoot logs for root cause analysis.
- Cluster Scaling & Configuration: Scale and manage clusters (adding/removing nodes), configure services for high availability, and ensure efficient resource allocation.
- Backup, Recovery & Security Management: Design and implement backup strategies, conduct disaster recovery drills, and manage security configurations (Kerberos, encryption).
As Trainee Associate - Admin - L0 (R&D Datalake) / Data Management, Noida, U.P., April 2020 - September 2021.
- Perform cluster maintenance, troubleshooting, and capacity planning.
- Respond to routine system alerts and minor issues.
- Assist with identifying hardware or network failures affecting services.
- Monitor system logs and escalate issues to higher-level support as needed.