

Goal-oriented IT professional with 15+ years of experience, specializing in Data Engineering. Designed and implemented data warehouses and scalable, high-performance data pipelines using Azure Databricks and PySpark, focusing on ETL and ELT workflows. Designed end-to-end MLOps lifecycle implementations using Databricks, MLflow, and GitHub Actions for automated model testing, evaluation, and deployment across environments.
•Developed and implemented a Linear Regression model to predict Airbnb customer pricing, covering data preprocessing (data cleaning, handling missing values, and normalization), feature engineering (encoding categorical variables and deriving key features), model training, and evaluation. Leveraged Databricks and MLflow for experiment tracking and model management, and implemented CI/CD pipelines using GitHub Actions for automated testing, evaluation, and deployment across Dev, Staging, and Production environments.
•Conducted data preprocessing, feature engineering, and model
evaluation to optimize model performance and accuracy
•Collaborated with cross-functional teams to understand business
requirements and translate them into actionable insights and
recommendations
•Designed, developed, and implemented technical data and analytic
solutions for HSBC Wholesale Banking Group using Azure cloud services
•Integrated solutions for continuous deployment
•Helped create scalable CI/CD pipelines
•Architected a robust ELT data pipeline using Azure Data Factory and Azure Data Lake Storage, implementing a medallion-style architecture (Landing, Curated, and Processed layers). Designed and built dimensional data models (Fact and Dimension tables) for sales and orders, optimized with full and incremental loading strategies, and delivered analytics-ready data into Azure SQL Warehouse.
•Collaborated with cross-functional teams to define data modeling
standards and best practices, ensuring alignment with business
objectives and scalability of the data solutions architecture
•Developed Databricks notebooks to create Dimension tables with PySpark and Fact tables with Spark SQL
•Implemented validation notebooks to validate data across the different layers
•Used AWS Glue to create ETL pipelines and import data from PostgreSQL
•Proposed, designed, and implemented data pipelines for ETL on AWS and
Azure
•Analyzed large datasets, developed Spark APIs, and converted Hive/SQL queries into Spark transformations
•Imported tables from RDBMS to HDFS using Sqoop and utilized Kafka for real-time streaming
•Designed and proposed data pipelines for data ingestion into Hadoop and Data Lake