Milan Vaibhav

Summary

Technology professional with over 10 years of experience, including 5 years automating Hadoop service testing and 5 years building and optimizing big data pipelines using PySpark, Kafka, and other cloud technologies. Skilled in Python, SQL, and workflow automation.

Overview

11 years of professional experience

Work History

Staff SDET

Acceldata Technologies Pvt Limited

11.2024 - Current

Wrote scripts to deploy Cloudera Manager along with the cluster, reducing manual effort and cutting deployment time from 1 day to under 40 minutes.
Wrote scripts for standalone deployment of Kafka, Spark, Hadoop, ZooKeeper, and Trino, reducing manual effort and cutting deployment time from half a day to under 5 minutes.
Conducted regression testing for various services, analyzed the results, and reported observations to management.
Owned Cloudera and ODP deployment and management.
Optimized test cases to improve the effectiveness of manual software testing.
Owned deployment and management of standalone Kafka and Spark services.

Staff Engineer

Conviva

11.2020 - 11.2024

Designed and implemented scalable ETL pipelines using PySpark, optimizing data processing across large datasets in Hadoop and Hive environments.
Developed and maintained Python and Bash scripts to automate data ingestion, transformation, and validation workflows.
Architected cost-effective AWS solutions by leveraging IAM roles, Kinesis streams, and Redshift clusters for real-time and batch processing.
Collaborated with cross-functional teams to estimate development efforts, conduct peer code reviews, and enforce coding standards.
Orchestrated workflows using Apache Airflow, improving job scheduling efficiency and pipeline reliability.
Implemented real-time data ingestion and processing using Apache Kafka, ensuring low latency and high throughput.
Administered and optimized MySQL and MongoDB databases, including schema design, indexing, and query performance tuning.
Designed and implemented data models and schemas suited to different storage and retrieval needs in MongoDB and MySQL.
Ensured data consistency and integrity across different storage systems, including Hadoop's HDFS.

Data Engineer

Bounceshare

01.2020 - 07.2020

Developed data pipelines using PySpark and Hive to support reporting and analytics, improving query performance by 30%.
Wrote SQL queries and built data models that simplified reporting and reduced dashboard load times.
Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
Validated data flow through Kafka, ensuring messages were correctly produced, consumed, and processed.
Tested the Kafka Schema Registry for correct schema management and compatibility.
Verified data integrity and accessibility in Athena tables, ensuring accurate query results.
Automated pipeline tests to streamline validation processes and improve efficiency.
Collaborated with development and data engineering teams to address issues and optimize pipeline performance.

SDET

Conviva

05.2018 - 01.2020

Developed and maintained a comprehensive test framework, including utility functions and automation of test scenarios.
Created and managed regression test suites, ensuring thorough coverage of features and functionalities.
Automated deployment processes for various services, improving efficiency and consistency.
Deployed and managed the complete Quality Engineering (QE) infrastructure, establishing a reliable testing environment that included Cloudera and Hadoop clusters.
Automated Cloudera Manager and cluster deployment using Terraform, Python, and Ansible.
Executed regression tests, analyzed results, and identified issues, providing actionable insights to improve software quality.
Collaborated with development teams to refine test scenarios based on results and feedback, driving continuous improvement.
Helped create a Jenkins pipeline that runs the automated test suite every 24 hours.
Created a Docker image for running the test suite.

SDET

Nokia

06.2014 - 05.2018

Developed an automation framework using Python and PyTest for efficient testing.
Conducted daily stand-up meetings to review progress and plan tasks.
Designed and executed test cases for sanity checks based on requirement documents.
Performed manual testing on services such as HDFS and Hive to ensure functionality.
Automated sanity test scenarios, enhancing the efficiency of the testing process.
Prepared and analyzed reports on failed automated scripts, providing insights and recommendations.
Documented GUI components and installation procedures to support accurate and comprehensive records.
Automated UI test cases using Selenium (Java) and TestNG.

Education

Bachelor of Technology - Electronics And Communications Engineering

Bengal College Of Engineering & Technology, Durgapur

08.2012

Skills

Experienced in programming languages including Python, Java, and Bash

Experienced with big data technologies including Hadoop, Spark, Kafka, and Airflow

Experienced with MySQL and MongoDB databases

Experienced with containerization and orchestration technologies such as Docker and Kubernetes (k8s)

Other tools: Maven, Git, Prometheus, Grafana

Timeline

Staff SDET - Acceldata Technologies Pvt Limited

11.2024 - Current

Staff Engineer - Conviva

11.2020 - 11.2024

Data Engineer - Bounceshare

01.2020 - 07.2020

SDET - Conviva

05.2018 - 01.2020

SDET - Nokia

06.2014 - 05.2018

Bengal College Of Engineering & Technology - Bachelor of Technology, Electronics And Communications Engineering
