- Led the team in designing and delivering large-scale data solutions.
- Addressed and resolved complex data processing challenges, focusing on performance optimization and scalability.
- Provided technical leadership and mentorship to junior team members, fostering skill development and professional growth.
- Performed impact analysis and budget estimation for ETL components across multiple upcoming projects.
Project 1: Cornerstone Data Migration
Developed a Python/PySpark solution that extracts data from APIs to AWS S3, then transforms it and integrates it with Snowflake.
Roles and Responsibilities:
- Leveraged PySpark to orchestrate data pipelines, facilitating efficient data exchange between API endpoints and distributed storage and warehousing platforms, including AWS S3 and Snowflake.
- Executed large-scale data analysis and transformation tasks using Spark SQL and Spark's data processing capabilities.
- Managed incremental data operations including inserts, updates, and deletions sourced from APIs.
- Implemented parallel processing with PySpark to efficiently handle large-scale data ingestion from APIs (see the sketch after this list).
- Streamlined daily data updates and workflow scheduling through Airflow's automation capabilities.
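A minimal sketch of how such a pipeline can be structured in PySpark; the API endpoint, page range, S3 bucket, and Snowflake connection values are hypothetical placeholders, and the Snowflake write assumes the spark-snowflake connector is available on the cluster.

```python
# Minimal sketch: parallel API extraction to S3 with a Snowflake load.
# Endpoint, bucket, and connection values below are hypothetical.
import requests
from pyspark.sql import SparkSession, Row, functions as F

spark = SparkSession.builder.appName("api_to_snowflake").getOrCreate()

def fetch_page(page):
    """Fetch one page of records from the (hypothetical) source API."""
    resp = requests.get("https://api.example.com/v1/orders",
                        params={"page": page})
    resp.raise_for_status()
    return resp.json()  # a list of flat dicts

# Fan page requests out across executors for parallel ingestion.
pages = spark.sparkContext.parallelize(range(1, 101), numSlices=20)
df = pages.flatMap(fetch_page).map(lambda rec: Row(**rec)).toDF()

# Land the raw increment in S3, partitioned by load date.
(df.withColumn("load_date", F.current_date())
   .write.mode("append")
   .partitionBy("load_date")
   .parquet("s3a://example-bucket/raw/orders/"))

# Push the increment to Snowflake via the spark-snowflake connector.
sf_options = {
    "sfURL": "example.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "***",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "STAGING",
    "sfWarehouse": "ETL_WH",
}
(df.write.format("net.snowflake.spark.snowflake")
   .options(**sf_options)
   .option("dbtable", "ORDERS_STAGE")
   .mode("append")
   .save())
```

Distributing page numbers with parallelize is one way to fetch API pages concurrently across executors, matching the parallel-ingestion bullet above.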
Project 2: Canadian Profitability
Streamlined processing of an offline MS Access database by developing PySpark automation scripts, saving 1,400 labour hours annually. The generated data was consumed by QlikView applications to enhance data visualization.
Roles and Responsibilities:
- Analyzed customer business requirements to prepare data mappings, evaluating data flow, transformation needs, and data fixes.
- Authored Spark and Hive jobs to efficiently extract records from multiple downstream sources.
- Managed export and import of batch and delta data into HDFS, HBase, and Hive utilizing PySpark.
- Monitored job executions, performed debugging, and resolved bugs to ensure smooth operations.
- Implemented job automation using shell scripting and Airflow for enhanced efficiency (see the DAG sketch after this list).
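A minimal sketch of how the daily automation might be expressed as an Airflow DAG; the DAG id, schedule, and spark-submit paths are hypothetical placeholders, with BashOperator standing in for whichever operator actually submitted the jobs.

```python
# Minimal Airflow DAG sketch for the daily Access-extract workflow.
# DAG id, schedule, and script paths are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="canadian_profitability_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 5 * * *",  # every day at 05:00
    catchup=False,
) as dag:
    # Extract the exported Access tables with a PySpark job.
    extract = BashOperator(
        task_id="extract_access_tables",
        bash_command="spark-submit /opt/jobs/extract_access.py",
    )
    # Load the batch/delta output into HDFS and Hive.
    load = BashOperator(
        task_id="load_hdfs_hive",
        bash_command="spark-submit /opt/jobs/load_hive.py",
    )
    extract >> load
```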
Project 3: HRDW Re-platform
Successfully executed a re-platforming initiative built around ETL processes in IBM DataStage. Transformed and stored data in a Data Warehouse, making it accessible to reporting applications such as Cognos, QlikView, QlikSense, and Power BI.
Roles and Responsibilities:
- Orchestrated analysis of upstream and downstream data flows and business requirements.
- Utilized IBM DataStage to implement robust ETL processes, facilitating efficient data import/export operations within the data warehouse environment.
- Performed data loads and conducted extensive SQL-based data validation and quality assurance to uphold data integrity (see the validation sketch after this list).
- Crafted and automated data extraction procedures using Linux shell scripting, optimizing system performance through strategic cron job scheduling.
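A minimal sketch of the kind of post-load SQL validation described above; the ODBC DSN, schema and table names, and the checks themselves are hypothetical placeholders, with pyodbc standing in for whichever database driver the warehouse exposed.

```python
# Minimal sketch of post-load SQL validation checks.
# DSN, schemas, tables, and checks are hypothetical placeholders.
import pyodbc

# Each check maps a name to (validation query, expected scalar result).
CHECKS = {
    "row_count_match": (
        "SELECT (SELECT COUNT(*) FROM staging.employees)"
        " - (SELECT COUNT(*) FROM dw.dim_employee)",
        0,
    ),
    "no_null_keys": (
        "SELECT COUNT(*) FROM dw.dim_employee WHERE employee_key IS NULL",
        0,
    ),
}

conn = pyodbc.connect("DSN=HRDW")  # hypothetical ODBC data source
cursor = conn.cursor()
failures = 0
for name, (query, expected) in CHECKS.items():
    actual = cursor.execute(query).fetchone()[0]
    ok = actual == expected
    failures += 0 if ok else 1
    print(f"{name}: {'PASS' if ok else 'FAIL'} "
          f"(got {actual}, expected {expected})")
conn.close()

# A non-zero exit code lets a cron wrapper or scheduler flag the failure.
raise SystemExit(1 if failures else 0)
```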