Project: AI-Driven Shelf & Stock Management Integration
Description: Implemented an AI-powered shelf and inventory monitoring pipeline by integrating retail data with third-party computer vision platforms. Enabled automated shelf analytics, including stock availability, planogram compliance, and pricing accuracy, using scheduled data transfers orchestrated through Azure Databricks.
Roles and Responsibilities:
- Designed and developed end-to-end data pipelines to support AI-driven shelf and stock management across retail stores using Azure Databricks.
- Coordinated secure daily/weekly transfers of inventory, pricing, planogram, sales, and metadata datasets to third-party AI vendors for model training and real-time analytics.
- Integrated API-based and SFTP-based data exchange processes, validating authentication workflows using Postman (API) and FileZilla (SFTP) during POC/testing.
- Implemented production-grade data delivery workflows in Databricks, including scheduling, automated retries, monitoring, and exception handling (see the delivery sketch after this list).
- Deployed workflows using Databricks Asset Bundles to ensure version-controlled, reproducible, and CI/CD-aligned releases.
- Collaborated with engineering, vendors, and store operations teams to ensure high-quality data feeds, improving shelf-availability insights and reducing manual audits.
- Enhanced operational efficiency by automating data handoff processes, enabling faster and more consistent AI-based shelf monitoring across pilot stores.
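A minimal sketch of the retry-and-exception-handling pattern behind the delivery workflows above, assuming the hand-off is an SFTP upload performed with paramiko. The vendor host, remote directory, retry settings, and credential handling are illustrative placeholders; in Databricks the credentials would come from a secret scope and job parameters rather than literals.

```python
# Hypothetical sketch: retry-wrapped SFTP delivery of a daily extract to an AI vendor.
# Host, directory, and retry settings are placeholders, not the actual configuration.
import time

import paramiko

VENDOR_HOST = "sftp.vendor.example.com"   # assumed endpoint
VENDOR_PORT = 22
REMOTE_DIR = "/inbound/shelf_analytics"   # assumed drop folder
MAX_RETRIES = 3
BACKOFF_SECONDS = 60


def deliver_file(local_path: str, remote_name: str, username: str, password: str) -> None:
    """Upload one extract file, retrying on transient failures."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            transport = paramiko.Transport((VENDOR_HOST, VENDOR_PORT))
            transport.connect(username=username, password=password)
            sftp = paramiko.SFTPClient.from_transport(transport)
            sftp.put(local_path, f"{REMOTE_DIR}/{remote_name}")
            sftp.close()
            transport.close()
            print(f"Delivered {remote_name} on attempt {attempt}")
            return
        except Exception as exc:
            # Exception handling: log, back off, and retry; surface the error on the final attempt.
            print(f"Attempt {attempt} failed: {exc}")
            if attempt == MAX_RETRIES:
                raise
            time.sleep(BACKOFF_SECONDS * attempt)
```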
Project: Unified Customer Matching Engine
Description: Developed a unified customer matching engine by integrating and reconciling data from diverse sources, applying rule-based logic to assign a unique customer identifier for consistent entity resolution across systems.
Roles and Responsibilities:
- Designed and implemented a unified customer matching engine to streamline customer data extraction and classification across physical and digital sales channels.
- Integrated encrypted transactional data from online and offline sources using PySpark and Azure Databricks, enabling secure and scalable data unification.
- Applied deterministic matching logic and rule-based algorithms to assign unique customer identifiers for consistent cross-platform recognition (see the matching sketch after this list).
- Collaborated with cross-functional teams to improve customer data accuracy, resulting in a 35% increase in unified customer identification.
- Enhanced personalization strategies for targeted marketing campaigns, contributing to a 20% improvement in marketing effectiveness through enriched customer profiles.
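A simplified sketch of the deterministic matching rule described above, assuming PySpark on Databricks. The source tables, match-key columns (loyalty card, email, phone), and their precedence order are illustrative assumptions rather than the actual rule set.

```python
# Illustrative sketch of deterministic customer matching in PySpark.
# Table names, match keys, and precedence are assumptions for demonstration only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

online = spark.table("silver.online_customers")
offline = spark.table("silver.offline_customers")


def normalise(df):
    """Normalise match keys so equivalent values compare equal across sources."""
    return (df
            .withColumn("email_norm", F.lower(F.trim(F.col("email"))))
            .withColumn("phone_norm", F.regexp_replace(F.col("phone"), r"\D", "")))


customers = normalise(online).unionByName(normalise(offline), allowMissingColumns=True)

# Deterministic rule: prefer loyalty card, then email, then phone; the winning key
# is hashed into a stable unified customer identifier.
match_key = F.coalesce(F.col("loyalty_card_id"), F.col("email_norm"), F.col("phone_norm"))
unified = customers.withColumn("unified_customer_id", F.sha2(match_key.cast("string"), 256))

unified.write.format("delta").mode("overwrite").saveAsTable("gold.unified_customers")
```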
Project: Rewards Data Ingestion
Description: Developed a streaming data warehousing solution for mobile rewards data by ingesting Event Hub streams via Databricks, handling schema evolution, and implementing a medallion architecture, working with stakeholders to deliver business-specific insights.
Roles and Responsibilities:
- Designed and implemented a real-time data pipeline using Azure Databricks to ingest 150 million daily records from a mobile application, supporting a rapidly growing retail network with 125,000 daily active users across multiple stores.
- Ingested streaming rewards data from a mobile application via Azure Event Hub and APIM, and processed it in Azure Databricks by decoding binary payloads into JSON-formatted strings in the Bronze layer, then parsing relevant fields in the Silver layer to build structured Delta tables for downstream consumption (see the streaming sketch after this list).
- Collaborated with business stakeholders to identify and extract key payload elements relevant to customer rewards and personalization.
- Applied the medallion architecture to build Bronze, Silver, and Gold streaming tables, ensuring scalable and organized data transformation layers.
- Migrated raw and processed data to Azure Data Lake Storage (ADLS), improving data accessibility and marginally reducing processing costs.
- Optimized pipeline throughput and streaming model performance, resulting in a 25% boost in decision-making speed and unlocking new revenue opportunities.
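A condensed sketch of the Bronze/Silver streaming flow described above, assuming the Azure Event Hubs Spark connector (the "eventhubs" source) is attached to the cluster. The connection configuration, checkpoint paths, table names, and payload fields are illustrative placeholders.

```python
# Sketch of the rewards streaming ingestion: Bronze decodes the binary body to JSON,
# Silver parses the business-relevant fields. Names and schema are assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.getOrCreate()

# Placeholder connection config; the real connector expects an encrypted connection string.
eh_conf = {"eventhubs.connectionString": "<encrypted-connection-string>"}

# Bronze: land the raw stream, decoding the binary payload into a JSON string as-is.
raw = spark.readStream.format("eventhubs").options(**eh_conf).load()
bronze = (raw
          .withColumn("body_json", F.col("body").cast("string"))
          .withColumn("ingested_at", F.current_timestamp()))

(bronze.writeStream
 .format("delta")
 .option("checkpointLocation", "/mnt/checkpoints/rewards_bronze")   # illustrative path
 .outputMode("append")
 .toTable("bronze.rewards_events"))

# Silver: parse only the payload fields identified with the business as relevant.
payload_schema = StructType([
    StructField("customer_id", StringType()),
    StructField("reward_type", StringType()),
    StructField("points", DoubleType()),
    StructField("event_time", TimestampType()),
])

silver = (spark.readStream.table("bronze.rewards_events")
          .withColumn("payload", F.from_json("body_json", payload_schema))
          .select("payload.*", "ingested_at"))

(silver.writeStream
 .format("delta")
 .option("checkpointLocation", "/mnt/checkpoints/rewards_silver")   # illustrative path
 .outputMode("append")
 .toTable("silver.rewards_events"))
```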
Project: Sales Adjustments Pipeline
Description: Built a scalable sales adjustments pipeline using 15 source tables to generate basket- and item-level sales data, enabling accurate supplier billing, SAP integration, and daily finance reporting for evaluating reward-driven sales across stores.
Roles and Responsibilities:
- Designed and developed a scalable sales adjustments pipeline using 15 source tables to compute rewards-based sales at both the basket and item levels.
- Implemented campaign-type-specific logic (e.g., repeatable/non-repeatable missions, coupon-based earnings, and star product rewards) to accurately derive sales adjustments across varied promotional structures (see the campaign-logic sketch after this list).
- Modeled item-level sales adjustment data to generate SAP-ready outputs, streamlining integration with ERP systems for financial reconciliation.
- Automated daily file generation capturing store-level sales deltas with debit/credit adjustments, supporting market sheet reporting for the finance team.
- Collaborated with finance stakeholders to ensure adjusted sales data met audit and reporting requirements for evaluating rewards program performance.
- Built and maintained downstream tables to support the supplier billing portal, enabling accurate tracking of product utilization and supplier settlements.
- Orchestrated the end-to-end pipeline using Databricks Asset Bundles, enabling scalable deployment and optimizing performance through dynamic overwrite partitioning; handled late-arriving data by running D-2 logic and supported backfills across multiple dates when sources were delayed (see the partition-overwrite sketch after this list).
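A simplified sketch of the campaign-type-specific adjustment logic referenced in the list above, assuming PySpark. The campaign-type values, column names, and adjustment formulas are illustrative assumptions, not the production rules.

```python
# Illustrative derivation of sales adjustments per campaign type.
# Campaign-type values, columns, and formulas are assumptions for demonstration only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

redemptions = spark.table("silver.reward_redemptions")   # assumed source table

adjustments = redemptions.withColumn(
    "adjustment_amount",
    F.when(F.col("campaign_type") == "repeatable_mission",
           F.col("reward_value") * F.col("completion_count"))
     .when(F.col("campaign_type") == "non_repeatable_mission",
           F.col("reward_value"))
     .when(F.col("campaign_type") == "coupon",
           F.least(F.col("coupon_value"), F.col("basket_amount")))
     .when(F.col("campaign_type") == "star_product",
           F.col("item_discount") * F.col("item_qty"))
     .otherwise(F.lit(0.0)),
)
```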
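A minimal sketch of the D-2 run with dynamic partition overwrite, assuming a Delta target partitioned by a business-date column. Table names and the date column are assumptions; a backfill would pass a wider list of dates to the same function.

```python
# D-2 processing with dynamic partition overwrite: only the partitions present in
# the recomputed batch are replaced. Table and column names are assumptions.
import datetime

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()


def load_sales_adjustments(process_dates):
    """Recompute and overwrite only the affected business-date partitions."""
    batch = (spark.table("silver.sales_adjustments_input")
             .where(F.col("business_date").isin([d.isoformat() for d in process_dates])))
    (batch.write
     .format("delta")
     .mode("overwrite")
     .option("partitionOverwriteMode", "dynamic")   # leave untouched partitions in place
     .saveAsTable("gold.sales_adjustments"))        # assumed target, partitioned by business_date


# The daily run processes D-2 to absorb late-arriving source data; backfills pass more dates.
today = datetime.date.today()
load_sales_adjustments([today - datetime.timedelta(days=2)])
```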
Project: Data Ingestion from Third-Party Sources
Description: Developed a metadata-driven, reusable ingestion pipeline in Azure Data Factory to onboard legacy third-party data into the new Azure data platform. Integrated dbt for scalable transformation, data modeling, and quality checks across Delta Lake layers, enabling end-to-end orchestration and improved governance during the platform migration.
Roles and Responsibilities:
- Designed and implemented a reusable, metadata-driven ingestion pipeline in Azure Data Factory to onboard legacy third-party data feeds, ensuring seamless migration and business continuity during the transition to the new Azure data platform.
- Developed support for multiple file formats (CSV and Parquet) with flexible load types, including incremental, full, partition overwrite, and upsert logic, driven entirely by metadata configuration (see the load-dispatch sketch after this list).
- Integrated Event Hub triggers to orchestrate the ingestion flow based on file drop notifications, automating job execution and reducing operational overhead.
- Performed schema-driven data quality checks and encrypted PII columns before loading curated data into the staging and enriched Delta Lake layers (see the encryption sketch after this list).
- Incorporated dbt to build modular, maintainable SQL transformation pipelines on top of Delta Lake, including layered modeling (staging → intermediate → mart models); automated schema tests, uniqueness checks, and data constraints; reusable macros for applying business rules and standard transformations; and documentation and lineage tracking using dbt docs.
- Enabled scalable and structured data storage using separate raw, staging, and enriched zones in ADLS, improving traceability, maintainability, and downstream usability.
- Built audit tracking capabilities in Synapse Serverless SQL pools to monitor pipeline health and step-level status across all ingestion workflows.
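A hypothetical sketch of the metadata-driven load dispatch described in the list above, assuming PySpark with Delta Lake. The metadata fields (file_format, load_type, target_table, merge_keys) are illustrative, not the actual configuration schema.

```python
# Metadata-driven load dispatch: one reusable function applies the load strategy
# declared for each feed. Metadata keys and table names are assumptions.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()


def load_feed(meta: dict, source_path: str) -> None:
    """Read one feed and load it according to its metadata record."""
    reader = spark.read.format(meta["file_format"])            # "csv" or "parquet"
    if meta["file_format"] == "csv":
        reader = reader.option("header", "true")
    df = reader.load(source_path)

    load_type = meta["load_type"]
    if load_type == "full":
        df.write.format("delta").mode("overwrite").saveAsTable(meta["target_table"])
    elif load_type == "incremental":
        df.write.format("delta").mode("append").saveAsTable(meta["target_table"])
    elif load_type == "partition_overwrite":
        (df.write.format("delta").mode("overwrite")
           .option("partitionOverwriteMode", "dynamic")
           .saveAsTable(meta["target_table"]))
    elif load_type == "upsert":
        target = DeltaTable.forName(spark, meta["target_table"])
        condition = " AND ".join(f"t.{k} = s.{k}" for k in meta["merge_keys"])
        (target.alias("t")
               .merge(df.alias("s"), condition)
               .whenMatchedUpdateAll()
               .whenNotMatchedInsertAll()
               .execute())
    else:
        raise ValueError(f"Unknown load_type: {load_type}")
```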
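A minimal sketch of the schema check and PII encryption step, assuming symmetric encryption with a Fernet key from the cryptography package. The column contract, PII list, paths, and table names are illustrative, and the key would be fetched from a secret scope rather than generated per run.

```python
# Schema-driven quality check plus PII encryption before data leaves the raw zone.
# Expected columns, PII list, paths, and key handling are assumptions.
from cryptography.fernet import Fernet
from pyspark.sql import SparkSession, functions as F, types as T

spark = SparkSession.builder.getOrCreate()

expected_columns = {"customer_id", "email", "order_amount", "order_date"}   # assumed contract
pii_columns = ["email"]                                                     # assumed PII list

df = spark.read.format("parquet").load("/mnt/raw/thirdparty/orders")        # illustrative path

# Fail fast if the feed drops or renames contracted columns.
missing = expected_columns - set(df.columns)
if missing:
    raise ValueError(f"Schema check failed, missing columns: {missing}")

# In practice the key is read from a secret scope so encrypted values stay decryptable.
key = Fernet.generate_key()


def _encrypt(value):
    if value is None:
        return None
    return Fernet(key).encrypt(value.encode()).decode()


encrypt_udf = F.udf(_encrypt, T.StringType())

for col_name in pii_columns:
    df = df.withColumn(col_name, encrypt_udf(F.col(col_name)))

df.write.format("delta").mode("append").saveAsTable("staging.thirdparty_orders")
```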