Accomplished Senior Manager at L & T Financial Services LTD, adept at leveraging Python and SQL for data transformation and analytics. Proven ability to optimize workflows and enhance scalability using GCP services. Strong problem-solving skills complemented by a collaborative approach, driving measurable results in data processing and integration.
Associate Cloud Engineer (GCP), lle9f0, 105883
Nostradamus: Mumbai, MH, 08/01/24 - Present
Designed and optimized partitioned and clustered BigQuery datasets for petabyte-scale analytical workloads, reducing query costs by 40%
Developed ELT pipelines to ingest structured and semi-structured data into BigQuery using Dataflow and Cloud Storage.
Automated data ingestion and transformation using Cloud Functions triggered by Cloud Storage and Pub/Sub events, enabling near real-time processing (see the sketch after this project)
Orchestrated custom ETL workflows on GCE VM instances with Python and Pandas, handling high-throughput batch data processing.
Used predefined Dataflow templates (e.g., GCS to BigQuery, Pub/Sub to BigQuery) to accelerate ETL pipeline deployment and integrated them with Cloud Composer for orchestration.
Containerized and deployed lightweight data services via Cloud Run as scalable, stateless microservices serving on-demand data transformation APIs
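A minimal sketch of the Pub/Sub-triggered ingestion pattern above, assuming a 1st-gen Python Cloud Function streaming rows into the partitioned BigQuery table; the project, dataset, table, and function names are hypothetical placeholders, not production values:

```python
import base64
import json

from google.cloud import bigquery

# Hypothetical destination: the date-partitioned, clustered table
# described above; real project/dataset names are not shown here.
TABLE_ID = "my-project.analytics.events"

client = bigquery.Client()

def ingest_event(event, context):
    """Cloud Functions (1st gen) entry point for a Pub/Sub trigger.

    Decodes the Pub/Sub message payload and streams it into BigQuery,
    making rows available in near real time.
    """
    row = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    errors = client.insert_rows_json(TABLE_ID, [row])
    if errors:
        raise RuntimeError(f"BigQuery streaming insert failed: {errors}")
```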
Finance Cloud reporting platform: Pune, MH, 04/01/24 - 06/30/24
Troubleshot and debugged Hive queries and performance issues
Applied Hive partitioning, bucketing, and indexing for efficient data retrieval
Performed data transformation and cleansing using Apache Spark operations (see the sketch after this project)
Designed and implemented a scalable data warehouse in BigQuery, optimizing for performance and cost-efficiency
Diagnosed and resolved issues by sampling data
Updated existing PySpark code to ensure compatibility with GCP Dataproc
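A minimal sketch of the Spark cleansing step above, assuming PySpark with Hive support on Dataproc; all table and column names are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical warehouse names; the actual schema is not shown here.
spark = (SparkSession.builder
         .appName("finance-reporting-cleansing")
         .enableHiveSupport()
         .getOrCreate())

raw = spark.table("finance_raw.ledger_entries")

# Typical cleansing steps: de-duplicate, normalize types, drop bad records
clean = (raw
         .dropDuplicates(["entry_id"])
         .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
         .filter(F.col("entry_date").isNotNull()))

# Partition on write so downstream reads can prune partitions
(clean.write
      .mode("overwrite")
      .partitionBy("entry_date")
      .saveAsTable("finance_curated.ledger_entries"))
```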
Financial crime surveillance operation: Kuala Lumpur, MY, 12/23/22 - 02/29/24
Refactored on-premises Hive queries to be compatible with GCP BigQuery
Developed Python code for various Google Cloud API clients such as BigQuery, Configuration, Dataproc, and others
Developed ETL pipelines to extract, transform, and load data from disparate sources
Created Python utilities to facilitate connections between source and target systems for data processing using PySpark
Migrated data from on-premises Hive to GCP BigQuery using the Dataiku tool
Converted shell scripts into Airflow DAGs (see the sketch after this project)
Updated existing PySpark code to ensure compatibility with GCP Dataproc
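A minimal sketch of one shell-script-to-Airflow conversion, assuming Airflow 2.x with BashOperator tasks; the DAG ID, script paths, bucket, and table names are illustrative only:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical DAG: each BashOperator replaces one step of the
# original shell script.
with DAG(
    dag_id="hive_to_bigquery_daily",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    export_partition = BashOperator(
        task_id="export_hive_partition",
        bash_command="hive -f /opt/scripts/export_partition.hql",
    )
    load_to_bq = BashOperator(
        task_id="load_into_bigquery",
        bash_command=(
            "bq load --source_format=PARQUET "
            "surveillance.alerts gs://example-bucket/alerts/*.parquet"
        ),
    )
    export_partition >> load_to_bq
```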
Customer sales analysis: Pune, MH, 09/01/21 - 02/28/22
Utilized SQL JOINs and VIEWs to extract data from eight related tables in customer sales databases
Transformed and filtered data using aggregation and filtering functions to improve the reporting process
Aggregated and visualized the data using pandas to compile a professional report (see the sketch after this project)
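A simplified sketch of the extraction-and-reporting flow above, joining four of the related tables rather than all eight; the connection string, tables, and columns are hypothetical:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string and schema; the real sales database
# is not shown here.
engine = create_engine("postgresql://user:password@host:5432/sales")

QUERY = """
SELECT o.order_id,
       o.order_date,
       c.region,
       p.category,
       oi.quantity * oi.unit_price AS line_total
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
JOIN customers   c  ON c.customer_id = o.customer_id
JOIN products    p  ON p.product_id = oi.product_id
"""

df = pd.read_sql(QUERY, engine)

# Aggregate into a per-region, per-category summary for the report
report = (df.groupby(["region", "category"], as_index=False)
            .agg(total_sales=("line_total", "sum"),
                 order_count=("order_id", "nunique")))
report.to_csv("sales_summary.csv", index=False)
```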
Data Quality Assurance: Mumbai, MH, 06/01/19 - 09/30/21,
Implemented quality checks and automated processes to ensure the accuracy and completeness of healthcare data (see the sketch after this project)
Designed and maintained data warehouses to store and organize structured and unstructured healthcare data for analytics and reporting purposes
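A minimal sketch of the automated quality checks described above, assuming a pandas-based validation pass over a daily extract; the file name, column names, and valid ranges are hypothetical:

```python
import pandas as pd

def run_quality_checks(df: pd.DataFrame) -> dict:
    """Completeness and accuracy checks on a (hypothetical) patient extract."""
    return {
        "patient_id_present": df["patient_id"].notna().all(),
        "patient_id_unique": df["patient_id"].is_unique,
        "admission_date_valid": pd.to_datetime(
            df["admission_date"], errors="coerce").notna().all(),
        "age_in_valid_range": df["age"].between(0, 120).all(),
    }

df = pd.read_csv("daily_patient_extract.csv")  # illustrative file name
results = run_quality_checks(df)
failed = [name for name, passed in results.items() if not passed]
if failed:
    raise ValueError(f"Data quality checks failed: {failed}")
```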