
Senior Data Engineer with 12+ years of experience in data architecture and analytics solutions. Expertise in data engineering, data warehousing, and data lakes, using Python, SQL, AWS, ETL, BI tools, and LLMs. Applies DevOps practices and Agile methodologies to data modeling and database design.
Client: Sanofi (CHC, BioPharma), Regeneron
DATA & AI ENGINEERING / DATA QUALITY PLATFORM (DATAORION)
1. Built DataOrion, an AI-powered data platform on Snowflake enabling automated, end-to-end AI-ready data preparation.
2. Developed LLM-driven metadata generation (BMG) using Snowflake Cortex, integrated with Informatica CDGC.
3. Designed Qualii, an LLM-driven data quality engine that generates SQL rules from natural-language descriptions by combining metadata, sample data, and user context.
4. Engineered scalable pipelines using Snowflake, Python, and Streamlit to enable agentic workflow orchestration.
5. Implemented a config-driven DQ framework (EDG-DQ) supporting validation of data both in motion and at rest.
6. Built a SQL-based rule engine with error logging and failed-record tracking for enterprise datasets.
7. Automated DQ score computation and publishing to CDGC, improving data governance and catalog insights.
8. Integrated pipelines with Informatica CDI/IICS for orchestration and scheduling.
9. Built a Streamlit UI and backend services for rule generation, deduplication, and SQL normalization.
10. Integrated Snowflake Cortex models for intelligent rule recommendations.
AWS SERVERLESS DQ & ANALYTICS
1. Built a serverless DQ Score Utility using AWS Lambda, API Gateway, DynamoDB, S3, and SNS.
2. Developed API-driven DQRO processing (GET/POST) with CDGC integration.
3. Implemented secure APIs (OAuth2/JWT via Layer7) and job tracking with DynamoDB.
4. Automated CI/CD deployments using Terraform and GitHub Actions.
5. Engineered serverless pipelines for structural DQ checks integrated with Informatica ingestion flows.
POWER BI DATA QUALITY DASHBOARD
1. Built enterprise DQ dashboards on Snowflake and Power BI for real-time insights.
2. Developed KPIs, DAX measures, and data models for DQ score tracking and trend analysis.
3. Implemented RBAC and reusable templates for scalable deployments.
Client: TERADYNE
MELISSA ADDRESS VALIDATION
1. Built a batch address validation pipeline using the Melissa API, Python, and Snowflake.
2. Implemented parallel processing, batching, and caching for performance optimization.
3. Mapped Melissa result codes (AE/AC/AV/AS) into structured outputs.
4. Built dynamic validation classification (Verified, Partial, Failed).
5. Enabled address standardization with error handling and retries.
Provided consulting services to optimize clients' technology infrastructure.
SKILLS
Business Intelligence Tools
ETL Tools
Data Pipeline Management
Database Management Systems
Programming Languages: PySpark, Python, SQL, Unix Shell Scripting
Cloud and OS Platforms
Azure Services
Container Orchestration
Data Modeling
Pharmaceutical Industry Expertise
Agentic AI: Snowflake Cortex LLM, Streamlit AI, Claude