
Sushant Waghmode

Pune

Summary

Accomplished Hadoop administrator with 6+ years of expertise in managing, optimizing, and scaling complex Hadoop clusters for high-performance big data environments. Currently excelling at Teradata India Pvt Ltd., Pune, overseeing critical components such as HDFS, YARN, Hive, Impala, and Spark for maximum reliability and performance. Highly skilled in system administration, performance tuning, security management, and automation using advanced tools like Cloudera Manager, Splunk, and RUNDECK. Known for identifying and resolving complex technical issues, optimizing resource utilization, and driving significant improvements in cluster performance. Passionate about leveraging deep technical expertise to contribute to a high-impact organization that values innovation, problem-solving, and continuous improvement. Dedicated to building a long-term career driving operational excellence and technological advancement.

Overview

6 years of professional experience
1 certification

Work History

Hadoop Administrator

TERADATA INDIA PVT LTD
03.2022 - Current
  • Deployed, maintained, and administered Hadoop clusters
  • Responsible for implementation and support of the enterprise Hadoop environment
  • Configured and deployed Hadoop clusters integrated with system hardware in AWS
  • Set up Cloudera Hadoop prerequisites on Linux servers
  • Installed and managed Cloudera Hadoop for POC environments
  • Upgraded Cloudera Manager and CDH for POC clusters
  • Worked with AWS components such as S3 and EC2
  • Monitored Hadoop cluster health, connectivity, and performance
  • Analyzed Hadoop log files
  • Managed and monitored file systems
  • Set up new Hadoop users
  • Administered clusters using Cloudera Manager
  • Responsible for administration of new and existing Hadoop infrastructure
  • Upgraded Cloudera Manager and CDH for production clusters
  • Configured user and group management in Hue
  • Configured NameNode High Availability (HA) in production with the onshore team
  • Provided daily support to clients, resolving their queries effectively
  • Worked closely with Hadoop developers, system administrators, and Hive, Spark, and Impala developers on Hadoop support
  • Involved in end-to-end Hadoop cluster setup, covering installation, configuration, and monitoring
  • Added and removed nodes in existing Hadoop clusters
  • Configured property files based on job requirements
  • Strong understanding of core Hadoop components and architecture
  • Served on the POC team, helping build and deploy new Hadoop clusters
  • Troubleshot, diagnosed, tuned, and resolved Hadoop issues
  • Worked with a range of big data components, including HDFS, YARN, MapReduce, Tez, Hive, Impala, Spark, Sqoop, Oozie, and Hue
  • Commissioned and decommissioned nodes as per requirements
  • Fine-tuned applications and systems for high performance and higher volume throughput
  • Tuned YARN performance for faster data processing
  • Tuned Hive performance for efficient data processing
  • Maintained the ZooKeeper quorum in the cluster per Cloudera standards

Hadoop Administrator

Defacto Veritas
10.2018 - 03.2022

Detail-oriented Cloudera Hadoop Administrator with over 3 years of hands-on experience deploying, configuring, and managing Hadoop clusters on Cloudera Distribution (CDH) and CDP. Proficient in monitoring and maintaining Hadoop components, ensuring high availability and performance, troubleshooting issues, and optimizing cluster performance. Strong background in Linux/Unix administration, security management, and scripting for automation. Adept at collaborating with cross-functional teams and ensuring smooth operations in big data environments.

Core Competencies

  • Hadoop Ecosystem (HDFS, YARN, Hive, Impala, HBase, Spark)
  • Cloudera Manager and Cluster Monitoring
  • Linux/Unix System Administration (RedHat, CentOS)
  • Performance Tuning and Optimization
  • Security Management (Kerberos, Ranger, Sentry)
  • Data Backup, Recovery, and Disaster Recovery
  • Scripting and Automation (Bash, Python)
  • Troubleshooting and Problem Resolution
  • Cluster Installation and Configuration
  • Job Scheduling and Resource Management
  • Capacity Planning and System Scaling
  • Documentation and Reporting


Professional Experience

  • Administered and maintained Cloudera Hadoop (CDH) clusters, ensuring high availability and performance of components such as HDFS, YARN, Hive, Impala, and Spark.
  • Monitored cluster performance using Cloudera Manager and proactively identified issues related to resource utilization, storage, and job execution.
  • Conducted performance tuning to optimize resource allocation, memory usage, and job scheduling, improving overall cluster efficiency by 20%.
  • Implemented and managed security measures, including Kerberos authentication and Apache Ranger authorization, ensuring compliance with internal security policies.
  • Automated routine tasks such as backups and system health checks.
  • Collaborated with development teams to ensure smooth integration of big data workflows and troubleshoot Hadoop-related issues in real-time.
  • Managed cluster upgrades and patching processes, minimizing downtime and ensuring the system was up to date with the latest Cloudera features.
  • Supported data backup, recovery strategies, and disaster recovery plans, ensuring quick recovery of data in case of failure.
  • Assisted in the setup, configuration, and maintenance of Cloudera Hadoop clusters in a production environment.
  • Worked with senior admins to monitor cluster performance, troubleshoot system issues, and manage resource allocation within YARN.
  • Performed routine backups, conducted recovery tests, and ensured data consistency within HDFS.
  • Developed Python scripts to automate data workflows, cluster health checks, and resource provisioning tasks.
  • Assisted in managing Hadoop security using Kerberos and Ranger for user authentication and authorization control.
  • Collaborated with cross-functional teams to provide technical support for data-related issues and optimize Hadoop cluster performance.

Education

B.E -

Pune University
Pune, Maharashtra
01.2018

DIPLOMA -

MSBTE
01.2012

S.S.C -

PUNE
PUNE, Maharashtra
01.2009

Skills

  • Hadoop Ecosystem: HDFS, YARN, Hive, Impala, HBase, Spark
  • Cluster Management: Cloudera Manager, Apache Ambari, RUNDECK, Splunk
  • Operating Systems: CentOS, RedHat, Windows 7, Windows 8, Windows 10
  • Cloud Platforms: AWS, GCP
  • Programming/Scripting: Bash, Python, SQL
  • Security Tools: Kerberos, Apache Ranger, Sentry
  • Database: MySQL
  • Monitoring Tools: Cloudera Manager, Pepperdata, Splunk
  • Version Control: Git
  • Data Formats: Avro, Parquet, ORC, JSON
  • Other Tools: Pepperdata, Nlyte, RSA, JIRA, ServiceNow

Accomplishments

  • SPT Award: Recognized for outstanding performance in driving system reliability and optimizing Hadoop cluster operations.
  • Quarterly Performance Award: Awarded for consistently exceeding performance targets and contributing to the overall success of the team.
  • Performance Improvement: Spearheaded the optimization of a 50-node Hadoop cluster, resulting in a 30% improvement in resource utilization and a 25% reduction in job execution time.
  • Innovation Award: Successfully developed automation scripts to streamline Hadoop administration tasks, cutting manual intervention by 40%.

Projects

Hadoop Platform Support (Banking Domain)

Technologies: HDP, CDP, Hive, Impala, Spark, YARN, HDFS, Oozie, Hue, Pepperdata, Splunk, Nlyte, NiFi, Rundeck, Cloudera Monitoring Tool, Profile Check Tool
Responsibilities:

  • Maintained overall health of the Hadoop clusters, ensuring the efficient execution of data processing tasks.
  • Monitored and resolved cluster-related issues daily, including disk unmount problems, HDFS quota management, and disk-space exhaustion.
  • Regularly conducted health checks and performance monitoring of the clusters, submitting weekly and monthly reports for ongoing improvements.
  • Utilized Cloudera Monitoring Tools to identify and resolve performance bottlenecks and optimize resource allocation for better system stability.
  • Managed the configuration of Hadoop ecosystem components including Hive, Impala, and Spark, as well as YARN and HDFS for effective data processing.
  • Automated cluster management tasks using Rundeck, reducing manual intervention and increasing operational efficiency.
  • Provided timely and efficient support for incidents and tickets in line with SLA, ensuring minimal downtime and smooth operations.



Big Data Production Support (Telecom Domain)

Technologies: AWS, MySQL, Sqoop, Hive, Impala, Spark, YARN, HDFS, Oozie, Hue
Responsibilities:

  • Maintained and monitored the health of the Hadoop clusters for performing efficient data processing, ensuring smooth integration across various components like Hive, Impala, and Spark.
  • Responsible for the commissioning and decommissioning of nodes, including load balancing and ensuring High Availability on YARN for seamless job processing.
  • Configured Hue for user and group management, ensuring secure and efficient user access to the Hadoop ecosystem.
  • Coordinated with teams including Hadoop developers, system administrators, Hive, Spark, and Impala developers to provide timely support and troubleshooting for cluster issues.
  • Monitored and resolved issues related to MapReduce, YARN, and Impala, optimizing performance for more effective data processing.
  • Implemented performance tuning on Hive and YARN, improving query execution times and cluster efficiency.
  • Regularly worked with AWS services and MySQL for integrating external systems into the Hadoop ecosystem and supporting Sqoop for data imports/exports.
  • Managed and maintained daily reports and incident logs for timely resolution and efficient service delivery.

Personal Information

  • Date of Birth: 12/02/94
  • Nationality: Indian
  • Marital Status: Single

Certification

  • Microsoft Certified: Azure Fundamentals (AZ-900) – 2024
  • Microsoft Certified: Azure Administrator Associate (AZ-104) – 2024
  • Vantage Associate 2.3 Certification (Teradata) – May 2024
  • Big Data Foundation Level 1 – January 2022
  • Hadoop Foundation Level 1 – January 2022
  • Hadoop Programming Level 1 – January 2022

Disclaimer

I hereby declare that the information in this document is accurate and true to the best of my knowledge.
