Milan Vaibhav

Bengaluru

Summary

Technology professional with over 10 years of experience, including 5 years automating Hadoop service testing and 5 years building and optimizing big data pipelines using PySpark, Kafka, and cloud technologies. Skilled in Python, SQL, and workflow automation.

Overview

11 years of professional experience

Work History

Staff SDET

Acceldata Technologies Pvt Limited
11.2024 - Current
  • Wrote scripts to deploy Cloudera Manager together with the cluster, reducing manual effort and cutting deployment time from 1 day to under 40 minutes.
  • Wrote scripts for standalone deployment of Kafka, Spark, Hadoop, ZooKeeper, and Trino, reducing manual effort and cutting deployment time from half a day to under 5 minutes.
  • Conducted regression testing for various services, analyzed the results, and reported observations to management.
  • Owned Cloudera and ODP deployment and management.
  • Optimized test cases to improve the effectiveness of manual software testing.
  • Owned deployment and management of standalone Kafka and Spark services.

Staff Engineer

Conviva
11.2020 - 11.2024
  • Designed and implemented scalable ETL pipelines using PySpark, optimizing data processing across large datasets in Hadoop and Hive environments.
  • Developed and maintained Python and Bash scripts to automate data ingestion, transformation, and validation workflows.
  • Architected cost-effective AWS solutions by leveraging IAM roles, Kinesis streams, and Redshift clusters for real-time and batch processing.
  • Collaborated with cross-functional teams to estimate development efforts, conduct peer code reviews, and enforce coding standards.
  • Orchestrated workflows using Apache Airflow, improving job scheduling efficiency and pipeline reliability.
  • Implemented real-time data ingestion and processing using Apache Kafka, ensuring low latency and high throughput.
  • Administered and optimized MySQL and MongoDB databases, including schema design, indexing, and query performance tuning.
  • Designed and implemented data models and schemas suited to different storage and retrieval needs in MongoDB and MySQL.
  • Ensured data consistency and integrity across storage systems, including Hadoop's HDFS.

Data Engineer

Bounceshare
01.2020 - 07.2020
  • Developed data pipelines using PySpark and Hive to support reporting and analytics, improving query performance by 30%.
  • Wrote SQL queries and built data models that simplified reporting and reduced dashboard load times.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Validated data flow through Kafka, ensuring messages were correctly produced, consumed, and processed.
  • Tested the Kafka Schema Registry for accurate schema management and compatibility.
  • Verified data integrity and accessibility in Athena tables, ensuring accurate query results.
  • Automated pipeline tests to streamline validation processes and improve efficiency.
  • Collaborated with development and data engineering teams to address issues and optimize pipeline performance.

SDET

Conviva
05.2018 - 01.2020
  • Developed and maintained a comprehensive test framework, including utility functions and automation of test scenarios.
  • Created and managed regression test suites, ensuring thorough coverage of features and functionality.
  • Automated deployment processes for various services, improving efficiency and consistency.
  • Deployed and managed the complete Quality Engineering (QE) infrastructure, establishing a reliable testing environment that included Cloudera and Hadoop clusters.
  • Automated Cloudera Manager and cluster deployment using Terraform, Python, and Ansible.
  • Executed regression tests, analyzed results, and identified issues, providing actionable insights to improve software quality.
  • Collaborated with development teams to refine test scenarios based on results and feedback, driving continuous improvement.
  • Helped build the Jenkins pipeline that runs the automated suite every 24 hours.
  • Created a Docker image for running the test suite.

SDET

Nokia
06.2014 - 05.2018
  • Developed an automation framework using Python and PyTest for efficient testing.
  • Conducted daily stand-up meetings to review progress and plan tasks.
  • Designed and executed test cases for sanity checks based on requirement documents.
  • Performed manual testing on services such as HDFS and Hive to ensure functionality.
  • Automated sanity test scenarios, improving the efficiency of the testing process.
  • Prepared and analyzed reports on failed automated scripts, providing insights and recommendations.
  • Documented GUI components and installation procedures to keep records accurate and comprehensive.
  • Automated UI test cases using Selenium (Java) and TestNG.

Education

Bachelor of Technology - Electronics and Communication Engineering

Bengal College Of Engineering & Technology, Durgapur
08.2012

Skills

  • Experienced in multiple programming languages, including Python, Java, and Bash

  • Experienced with big data technologies including Hadoop, Spark, Kafka, and Airflow

  • Experienced with MySQL and MongoDB databases

  • Experienced with containerization and orchestration technologies such as Docker and Kubernetes (k8s)

  • Others - Maven, Git, Prometheus, Grafana

Timeline

Staff SDET - Acceldata Technologies Pvt Limited
11.2024 - Current
Staff Engineer - Conviva
11.2020 - 11.2024
Data Engineer - Bounceshare
01.2020 - 07.2020
SDET - Conviva
05.2018 - 01.2020
SDET - Nokia
06.2014 - 05.2018
Bengal College Of Engineering & Technology - Bachelor of Technology, Electronics and Communication Engineering