Project: Data Validation Firewall
Expertise in Python, AWS, and Master Data Management (MDM)
- Automated CRO data ingestion pipelines with AWS Step Functions orchestrating Lambda functions that validate XML files, transform them into JSON, and store the results in Postgres.
- Built a Python-based matching system using FuzzyWuzzy and the Reltio APIs to identify healthcare professionals (HCPs) and healthcare organizations (HCOs), retrieve their golden identifiers, and record match scores, improving data accuracy.
- Delivered enriched, pre-validated data that let data stewards update or create profiles in Reltio with less manual effort.
- Delivered a scalable, serverless solution on AWS (S3, Lambda, Step Functions) that streamlined data processing and optimized resource utilization.
Key Skills:
AWS (S3, Lambda, Step Functions) | Python | REST API Integration | Postgres | Reltio | Data Automation
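Illustrative sketch of the matching step: the snippet below shows one way an incoming name could be scored against candidate profiles and a golden identifier returned only above a threshold. It is a minimal sketch, not the project code. It swaps in the standard library's `difflib.SequenceMatcher` for FuzzyWuzzy so that it runs without extra dependencies, and the candidate list, the `golden_id` field, and the threshold of 85 are hypothetical stand-ins for whatever a Reltio search would return.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def match_score(a: str, b: str) -> int:
    """Return a 0-100 similarity score (difflib stand-in for a fuzzy ratio)."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return round(100 * ratio)


def best_match(incoming_name: str, candidates: list, threshold: int = 85) -> dict:
    """Pick the highest-scoring candidate; withhold the ID below the threshold.

    `candidates` is a hypothetical list of dicts with "golden_id" and "name".
    """
    scored = [(match_score(incoming_name, c["name"]), c) for c in candidates]
    score, candidate = max(scored, key=lambda pair: pair[0], default=(0, None))
    if candidate is None or score < threshold:
        # No confident match: leave the record for a data steward to review.
        return {"golden_id": None, "score": score}
    return {"golden_id": candidate["golden_id"], "score": score}
```

Recording the score alongside the identifier, as the bullet above describes, lets a reviewer see how confident each match was instead of only seeing the outcome.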
Project: Regulatory Data Product
Expertise in Python, Snowflake, dbt, and Data Integration
- Developed and optimized Python UDFs using Snowflake and Snowpark, packaged as stored procedures and scheduled with Snowflake Tasks to automate data extraction from Veeva Vault.
- Integrated 450+ tables through multiple API endpoints, implementing full and delta load strategies for backend jobs.
- Implemented a data pipeline that requests regulatory data via API, checks job status, and retrieves the results as CSV for storage in the RAW Snowflake layer.
- Used dbt models to apply business logic, transforming data into intermediate models in the WORK layer and final consumable views in the PUBLISH layer.
- Connected Snowflake to Starburst, giving business users query and analytics access to the final views.
Key Skills:
Python (UDF, Snowpark) | Snowflake | dbt | API Integration | Veeva Vault | Data Automation | Starburst
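Illustrative sketch of the full and delta load strategy: the snippet below shows one common way to choose between the two, using a stored watermark (the latest modification time already loaded). It is a sketch under stated assumptions, not the project code. The request fields (`mode`, `modified_since`) and the `modified_at` row key are hypothetical and do not represent the Veeva Vault API.

```python
from datetime import datetime, timezone
from typing import Optional


def build_extract_request(table: str, last_loaded_at: Optional[datetime]) -> dict:
    """Choose a full load when no watermark exists, otherwise a delta load.

    `last_loaded_at` is assumed to be a timezone-aware datetime.
    """
    if last_loaded_at is None:
        # First run for this table: extract everything.
        return {"table": table, "mode": "full", "modified_since": None}
    # Later runs: extract only rows changed after the watermark.
    since = last_loaded_at.astimezone(timezone.utc).isoformat()
    return {"table": table, "mode": "delta", "modified_since": since}


def next_watermark(rows: list, previous: Optional[datetime]) -> Optional[datetime]:
    """Advance the watermark to the newest `modified_at` seen in this batch.

    Falls back to the previous watermark when the batch is empty.
    """
    stamps = [row["modified_at"] for row in rows]
    return max(stamps, default=previous)
```

Keeping one watermark per table means a failed job can be rerun from the last successful point without reloading that table in full.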