Project: Product Content and Intelligence (Client: Wayfair)
- Data Engineer with almost 2 years of experience building and maintaining scalable, automated data pipelines in cloud-native environments.
- Owned the automation of product tagging pipelines using Google Cloud Pub/Sub, BigQuery, and Airflow (Cloud Composer), reducing manual intervention by 60%.
- Improved product attribute accuracy (color, shape, material, subject, style, lifestage) by 30% by integrating ML inference into enrichment workflows.
- Developed SQL-based enrichment processes, improving data preparation and reducing latency in ADS endpoint submissions.
- Used Python (pandas) for data transformation and data quality improvements.
- Created unified schemas and validation logic across multiple Pub/Sub topics, ensuring consistent and reliable data exchange between systems.
- Automated data uploads using Python, reducing operational overhead and increasing pipeline reliability.
- Integrated Datadog dashboards and alerting to monitor performance metrics such as job failures, latencies, row-level updates, and endpoint success rates.
- Enabled proactive issue resolution by integrating PagerDuty with Airflow for real-time alerts via email, SMS, and phone.
- Collaborated closely with the ML and UFS teams to deliver production-ready data workflows with clear schema documentation and lifecycle management.
Project: Review Summarization System (Client: Wayfair)
- Collaborated with the UFS teams to design and implement a GenAI review summarization system. Defined and documented the schemas for four Pub/Sub topics, covering the inputs and outputs of aspect extraction and review summarization, to keep data exchange between workflows clear.
- Developed a review summarization pipeline that processes live review events, extracts relevant aspects, and generates concise summaries for SKU reviews based on defined thresholds.
- Established data constraints and validation rules, including minimum review-count thresholds and incremental updates, to maintain data integrity and optimize the summarization process, improving processing efficiency by 40%.