Project - USMCA Feb 2024 - Apr 2025
- Used Google Cloud Storage (GCS) buckets as the initial storage layer for raw data arriving from sources such as flat files, RPA outputs, file servers, and SAP.
- Implemented ETL pipelines by converting Alteryx workflows into Spark Scala jobs that process raw data into structured formats (see the Spark sketch after this list).
- Conducted performance analysis and optimization of existing Scala and SQL code, improving pipeline efficiency.
- Monitored and debugged Dataproc jobs using logs and metrics, identifying and resolving bottlenecks in data pipelines.
- Automated job orchestration and loaded transformed data into BigQuery using Airflow.
- Ensured accurate and timely availability of data for Power BI dashboards by monitoring and optimizing ETL pipelines.
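Below is a minimal sketch of the kind of Spark Scala job described above, reading raw files from a GCS staging bucket and writing structured output to BigQuery. The bucket, dataset, table, and column names are hypothetical placeholders; the actual workflows migrated from Alteryx involved additional sources and transformation steps.

```scala
import org.apache.spark.sql.{SparkSession, functions => F}

object RawToBigQueryJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("usmca-raw-to-bq")
      .getOrCreate()

    // Read raw flat files landed in the GCS staging bucket (hypothetical path).
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("gs://usmca-raw-landing/flat_files/*.csv")

    // Example transformations: cast dates, drop null keys, deduplicate records.
    val structured = raw
      .withColumn("load_date", F.to_date(F.col("load_date"), "yyyy-MM-dd"))
      .filter(F.col("record_id").isNotNull)
      .dropDuplicates("record_id")

    // Write to BigQuery via the spark-bigquery connector; the temporary GCS
    // bucket is required by the connector's indirect write path.
    structured.write
      .format("bigquery")
      .option("table", "analytics_dataset.usmca_structured")
      .option("temporaryGcsBucket", "usmca-bq-staging")
      .mode("overwrite")
      .save()

    spark.stop()
  }
}
```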
Project - Trade Vault Aug 2022 - Jan 2024
- Built scalable data warehousing solutions using Snowflake, integrating diverse data sources, including APIs, third-party tools, and on-premises systems, to centralize business intelligence reporting.
- Implemented SQL-based ETL processes directly in Snowflake using native features such as Streams, Tasks, and Stored Procedures to ingest and transform trade data (see the Streams/Tasks sketch after this list).
- Created scalable data ingestion pipelines to load structured and semi-structured trade data (e.g., JSON, CSV, and Parquet) into Snowflake's staging tables for processing.
- Utilized Snowflake's Time Travel and Zero-Copy Cloning features to manage historical data and support back-testing and audit requirements in the trade vault.
- Enhanced data security by implementing role-based access control and utilizing Snowflake's data masking for sensitive trade information.
- Leveraged Google Analytics and BigQuery to track and analyze user behavior, including user time on site and engagement metrics across web pages and product categories.
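Below is a minimal sketch of the Streams-and-Tasks ETL pattern described above, written in Scala against the Snowflake JDBC driver. The account URL, database, warehouse, table, stream, and task names, the schedule, and the password-based authentication are all hypothetical placeholders; production setups typically use key-pair authentication and externalized configuration.

```scala
import java.sql.DriverManager
import java.util.Properties

object TradeVaultStreamTaskSetup {
  def main(args: Array[String]): Unit = {
    // Connection details are illustrative; credentials come from the environment.
    val props = new Properties()
    props.put("user", sys.env("SNOWFLAKE_USER"))
    props.put("password", sys.env("SNOWFLAKE_PASSWORD"))
    props.put("db", "TRADE_VAULT")
    props.put("schema", "STAGING")
    props.put("warehouse", "ETL_WH")

    val conn = DriverManager.getConnection(
      "jdbc:snowflake://myaccount.snowflakecomputing.com", props)
    val stmt = conn.createStatement()

    // Stream captures change records (CDC) on the raw trades staging table.
    stmt.execute(
      """CREATE OR REPLACE STREAM TRADES_RAW_STREAM
        |  ON TABLE TRADES_RAW APPEND_ONLY = TRUE""".stripMargin)

    // Task runs on a schedule and loads new stream rows into the curated table
    // only when the stream actually has data.
    stmt.execute(
      """CREATE OR REPLACE TASK LOAD_TRADES_CURATED
        |  WAREHOUSE = ETL_WH
        |  SCHEDULE = '15 MINUTE'
        |WHEN SYSTEM$STREAM_HAS_DATA('TRADES_RAW_STREAM')
        |AS
        |  INSERT INTO TRADES_CURATED (trade_id, trade_date, notional)
        |  SELECT trade_id, TO_DATE(trade_date), notional::NUMBER(18,2)
        |  FROM TRADES_RAW_STREAM""".stripMargin)

    // Tasks are created in a suspended state; resume to start the schedule.
    stmt.execute("ALTER TASK LOAD_TRADES_CURATED RESUME")

    stmt.close()
    conn.close()
  }
}
```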