Currently working at Persistent Systems Limited as a Lead Software Engineer with 4.7+ years of experience in designing scalable PySpark solutions and developing high-performance scripts for large datasets. Proven track record in optimising data workflows, enhancing vehicle transportation efficiency, and minimising costs through automation. Expertise in AWS services including Lambda, S3, DynamoDB, Glue, and Step Functions for data validation, integration processing, and machine learning applications. Skilled in data warehousing, ETL processes, cloud computing, and data visualisation to facilitate data-driven decision-making. Committed to delivering structured documentation and conducting thorough data quality assessments to ensure accuracy and alignment with project goals. Career focus on leveraging advanced technologies to drive innovation and improve operational efficiency.
Amazon – Data Spark Automation Projects (Offshore_1)
Duration: February 2025 – May 2025
Client: Amazon
Role: PySpark Developer
Designed and implemented scalable PySpark solutions to support the training of a language model that automates the analysis of migration changes between Apache Spark versions.
Built modular, high-performance PySpark scripts for processing large-scale datasets used in model training and evaluation.
Authored clear, structured documentation detailing script logic, data pipelines, and output formats for client reference and maintenance.
Conducted comprehensive data quality assessments to ensure output accuracy, consistency, and alignment with project requirements.
Client: Corten Logistics
Key Skills:
Python
SQL
Data Warehousing
ETL
Cloud Computing
Data Mining
Data Visualization
Data Analysis
Data Modeling
Apache Spark
AWS
Data Pipeline
Data Migrations
Databricks
Pandas
AWS Lambda
Amazon S3
Amazon DynamoDB
AWS Glue