LAXMAN VAISHNAV - Data Engineer - Anand Rathi Wealth Services

Summary

Data Engineer with 4+ years of experience building and optimizing scalable ETL pipelines for large-scale financial datasets. Proficient in PySpark, Apache Spark, and AWS EMR, with strong expertise in distributed data processing. Reduced ETL processing time from 9 hours to 1 hour through Spark performance optimization. Skilled in data modeling and in delivering high-quality data solutions that support business insights.

Overview

4 years of professional experience

Work History

Data Engineer

Anand Rathi Wealth Services

Jodhpur, Rajasthan

12.2021 - Current

Tech Stack: AWS EMR, S3, Lambda, CloudWatch, Airflow, Hadoop, Spark, PySpark, SQL, Hive (Tez), HDFS, Sqoop, Oozie, QuickSight

Designed and implemented a scalable ETL pipeline on AWS EMR transient clusters, transforming data with PySpark and writing the processed output to target storage with Spark write operations.
Built a Hive-based ETL pipeline on AWS EMR transient clusters, using Tez as the execution engine for optimized query execution.
Automated cluster provisioning and job execution, with AWS Lambda launching EMR clusters and Amazon CloudWatch handling scheduling.
Orchestrated workflows and managed job dependencies with Apache Oozie across Hive, Sqoop, and PySpark jobs.
Performed additional data processing and optimization in PySpark on data stored in S3 and Hive tables.

Education

Bachelor of Engineering -

M.B.M. Engineering College

Jodhpur

2018

Skills

Programming & Data Processing:
Python, SQL, PySpark, Shell Scripting, XML

Big Data Technologies:
Apache Spark, Apache Hadoop, HDFS, AWS EMR

Cloud & AWS Services:
Amazon S3, AWS IAM (Roles & Policies), AWS Lambda, Amazon CloudWatch

Data Warehousing & Databases:
Snowflake, Apache Hive, MySQL, SQL Server

Workflow Orchestration:
Apache Airflow, Apache Oozie

Analytics & Visualization:
Tableau, Power BI, Amazon QuickSight

Accomplishments

Improved ETL workflow performance by ~89%, reducing execution time from 9 hours to 1 hour through PySpark optimizations, efficient data partitioning, and distributed processing on AWS EMR transient clusters.
Received an outstanding performance award in Q2 of 2022-23 for delivering high-quality ETL solutions and process optimizations.
Recognized as ‘Star Performer’ on multiple projects for optimizing data pipelines and reducing processing time.
Received a Spot Award for implementing an efficient PySpark-based data processing framework that improved data accuracy and performance.
Reduced data processing time by 30% through Hive partitioning and bucketing, improving query efficiency.
Designed and implemented Apache Airflow DAGs, reducing manual job execution and providing success monitoring for 100% of jobs.
