

Certified Data Engineering, Management, and Governance Specialist with 8+ years of experience designing and implementing large-scale data pipelines, cloud migrations, real-time data processing, and automation solutions. Proven expertise in Google Cloud Platform (GCP), BigQuery, Apache Airflow (Cloud Composer), Terraform, and Python. Adept with modern data integration tools such as Debezium CDC, Kafka, and Pub/Sub, and at orchestrating robust ETL/ELT workflows using SQL, Cloud Functions, and CI/CD pipelines (Azure DevOps and GitHub Actions). Successfully delivered projects across diverse domains including finance, retail, cybersecurity, and enterprise asset management.
Skilled in performance optimization, data modeling, compliance automation, and conversational AI integrations using AgentSpace AI. Strong foundation in the Hadoop ecosystem, with prior experience in Cloudera/Hortonworks administration, Hive, NiFi, Sqoop, and Pig.
Designed and managed scalable ETL/CDC pipelines from PostgreSQL to BigQuery on GCP, using Python and Debezium for real-time change data capture and streaming.
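
As an illustration of this kind of pipeline, below is a minimal consumer sketch assuming Debezium publishes PostgreSQL change events to a Kafka topic; the topic, project, and table names are hypothetical.

import json

from confluent_kafka import Consumer
from google.cloud import bigquery

TOPIC = "pgserver.public.purchase_orders"        # hypothetical Debezium topic
BQ_TABLE = "my-project.cdc_raw.purchase_orders"  # hypothetical BigQuery target

bq = bigquery.Client()
consumer = Consumer({
    "bootstrap.servers": "broker:9092",
    "group.id": "po-cdc-consumer",
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    payload = json.loads(msg.value()).get("payload") or {}
    if payload.get("op") in ("c", "u", "r"):     # create, update, snapshot read
        row = dict(payload["after"])
        row["_cdc_op"] = payload["op"]
        row["_cdc_ts_ms"] = payload["source"]["ts_ms"]
        errors = bq.insert_rows_json(BQ_TABLE, [row])  # streaming insert
        if errors:
            print("BigQuery insert errors:", errors)
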
Migrated batch data pipelines from AWS to GCP, optimizing performance using BigQuery partitioning, clustering, and STRUCT data types.
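
A minimal sketch of how such a table can be defined with the BigQuery Python client, combining date partitioning, clustering, and a nested STRUCT (RECORD) field; the project, dataset, and schema are illustrative.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical orders table: partitioned by order date, clustered by customer,
# with repeated line items modeled as a STRUCT.
schema = [
    bigquery.SchemaField("order_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("customer_id", "STRING"),
    bigquery.SchemaField("order_date", "DATE"),
    bigquery.SchemaField(
        "line_item", "RECORD", mode="REPEATED",
        fields=[
            bigquery.SchemaField("sku", "STRING"),
            bigquery.SchemaField("qty", "INTEGER"),
            bigquery.SchemaField("unit_price", "NUMERIC"),
        ],
    ),
]

table = bigquery.Table("my-project.retail.orders", schema=schema)
table.time_partitioning = bigquery.TimePartitioning(field="order_date")
table.clustering_fields = ["customer_id"]
client.create_table(table, exists_ok=True)
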
Developed DAGs using Cloud Composer (Apache Airflow) to orchestrate and monitor data workflows from raw to insights layers in BigQuery.
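
An illustrative Composer DAG of this shape, chaining two BigQuery query jobs from a raw layer to an insights layer; dataset and table names are hypothetical.

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# Hypothetical dataset and table names for the raw -> curated -> insights flow.
CURATE_SQL = """
CREATE OR REPLACE TABLE curated.orders AS
SELECT * FROM raw.orders WHERE order_date IS NOT NULL
"""
INSIGHTS_SQL = """
CREATE OR REPLACE TABLE insights.daily_order_totals AS
SELECT order_date, SUM(amount) AS total FROM curated.orders GROUP BY order_date
"""

with DAG(
    dag_id="orders_raw_to_insights",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    curate = BigQueryInsertJobOperator(
        task_id="raw_to_curated",
        configuration={"query": {"query": CURATE_SQL, "useLegacySql": False}},
    )
    publish = BigQueryInsertJobOperator(
        task_id="curated_to_insights",
        configuration={"query": {"query": INSIGHTS_SQL, "useLegacySql": False}},
    )
    curate >> publish
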
Implemented CI/CD pipelines using Azure DevOps, integrating with GCP Cloud Functions for automated deployment and data processing.
Built and deployed a real-time Purchase Order chatbot using AgentSpace AI, integrated with BigQuery and GCP Cloud Functions for dynamic query handling.
Trained NLP chatbot models to interpret natural language queries like “What’s the status of PO123?” and generate SQL against structured purchase order data.
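
A simplified sketch of the fulfillment side of such a chatbot: an HTTP-triggered Cloud Function that receives an extracted PO number and runs a parameterized BigQuery query. The request payload shape and table name are assumptions, not the AgentSpace AI contract.

import functions_framework
from google.cloud import bigquery

client = bigquery.Client()
PO_TABLE = "my-project.procurement.purchase_orders"  # hypothetical table

@functions_framework.http
def po_status(request):
    # The agent layer is assumed to pass the extracted PO number as JSON, e.g. {"po_id": "PO123"}.
    po_id = (request.get_json(silent=True) or {}).get("po_id")
    if not po_id:
        return {"answer": "Please provide a purchase order number."}, 400

    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("po_id", "STRING", po_id)]
    )
    rows = list(client.query(
        f"SELECT status, expected_delivery FROM `{PO_TABLE}` WHERE po_number = @po_id",
        job_config=job_config,
    ).result())

    if not rows:
        return {"answer": f"No purchase order found for {po_id}."}
    row = rows[0]
    return {"answer": f"{po_id} is {row.status}, expected delivery {row.expected_delivery}."}
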
Developed custom CDC logic with deduplication and snapshot comparison to ensure data freshness and integrity in BigQuery.
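
One common way to implement this kind of deduplication is a BigQuery MERGE that keeps only the latest change event per key; the staging and target tables below are hypothetical.

from google.cloud import bigquery

client = bigquery.Client()

# Keeps only the most recent change event per order_id, then applies it to the target table.
DEDUP_MERGE = """
MERGE `my-project.cdc.orders` AS target
USING (
  SELECT * EXCEPT(rn) FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY _cdc_ts_ms DESC) AS rn
    FROM `my-project.cdc_raw.orders_changes`
  ) WHERE rn = 1
) AS source
ON target.order_id = source.order_id
WHEN MATCHED AND source._cdc_op = 'd' THEN DELETE
WHEN MATCHED THEN UPDATE SET status = source.status, amount = source.amount, _cdc_ts_ms = source._cdc_ts_ms
WHEN NOT MATCHED AND source._cdc_op != 'd' THEN
  INSERT (order_id, status, amount, _cdc_ts_ms)
  VALUES (source.order_id, source.status, source.amount, source._cdc_ts_ms)
"""

client.query(DEDUP_MERGE).result()  # run the merge and wait for completion
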
Automated VPN compliance audits using Prisma Cloud, Microsoft Graph APIs, and Python scripts to validate Conditional Access Policies.
Created JSON-based compliance reports and pushed them securely to AWS S3 using authenticated AWS credentials.
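
A condensed sketch of this kind of audit flow, assuming app-only authentication via MSAL against Microsoft Graph and a boto3 upload to S3; tenant, app, and bucket identifiers are placeholders, and the compliance check itself is simplified.

import json
from datetime import datetime, timezone

import boto3
import msal
import requests

# Hypothetical tenant/app identifiers; secrets would normally come from a vault or env vars.
app = msal.ConfidentialClientApplication(
    client_id="<app-id>",
    authority="https://login.microsoftonline.com/<tenant-id>",
    client_credential="<client-secret>",
)
token = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])["access_token"]

resp = requests.get(
    "https://graph.microsoft.com/v1.0/identity/conditionalAccess/policies",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()
policies = resp.json().get("value", [])

# Flag policies that are not enabled as non-compliant (simplified check for illustration).
run_ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
report = {
    "generated_at": run_ts,
    "total_policies": len(policies),
    "non_compliant": [p["displayName"] for p in policies if p.get("state") != "enabled"],
}

boto3.client("s3").put_object(
    Bucket="compliance-reports",  # hypothetical bucket
    Key=f"conditional-access/{run_ts}.json",
    Body=json.dumps(report, indent=2).encode("utf-8"),
)
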
Converted legacy Control-M XML job definitions to Cloud Composer DAGs using custom-built Python scripts and Janus Converter logic.
Built reusable and modular DAG templates using Python, deployed them to GCS, and managed Airflow triggers for cross-environment orchestration.
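
A toy version of such a converter, assuming a simplified Control-M export where each JOB element carries JOBNAME and CMDLINE attributes; real exports differ by version, so the mapping is illustrative only.

import xml.etree.ElementTree as ET

# Renders a minimal Airflow DAG file from a simplified Control-M XML export.
DAG_TEMPLATE = '''from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="{dag_id}", start_date=datetime(2024, 1, 1), schedule_interval=None) as dag:
{tasks}
'''

TASK_TEMPLATE = '    {name} = BashOperator(task_id="{name}", bash_command="{cmd}")\n'

def convert(controlm_xml_path: str, dag_id: str) -> str:
    root = ET.parse(controlm_xml_path).getroot()
    tasks = ""
    for job in root.iter("JOB"):
        name = job.get("JOBNAME", "job").lower()
        cmd = job.get("CMDLINE", "echo noop")
        tasks += TASK_TEMPLATE.format(name=name, cmd=cmd)
    return DAG_TEMPLATE.format(dag_id=dag_id, tasks=tasks or "    pass\n")

if __name__ == "__main__":
    print(convert("folder_export.xml", "converted_controlm_folder"))
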
Migrated on-premises MySQL, Oracle, and Hive workloads to GCP using Bash scripting, Terraform, and Airflow, supporting hybrid data flows.
Integrated Hive, HQL, and Sqoop job logic into GCP with caching, using Airflow Dataproc operators and custom transformations.
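
An illustrative Airflow task submitting a Hive query to Dataproc with DataprocSubmitJobOperator; the project, cluster, and query are placeholders.

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

PROJECT_ID = "my-project"           # hypothetical project
CLUSTER_NAME = "analytics-cluster"  # hypothetical Dataproc cluster
REGION = "us-central1"

HIVE_JOB = {
    "reference": {"project_id": PROJECT_ID},
    "placement": {"cluster_name": CLUSTER_NAME},
    "hive_job": {"query_list": {"queries": [
        "INSERT OVERWRITE TABLE curated.sales SELECT * FROM raw.sales"
    ]}},
}

with DAG(
    dag_id="hive_on_dataproc",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_hive = DataprocSubmitJobOperator(
        task_id="run_hive_query",
        project_id=PROJECT_ID,
        region=REGION,
        job=HIVE_JOB,
    )
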
Maintained and monitored Hadoop clusters (CDH 5.x), managed HDFS partitions, and executed data transfer jobs using Sqoop and Pig.
Automated data ingestion into Hive using Apache NiFi with formats such as Avro, Parquet, and ORC for the World Bank Group's data lake project.
Administered Hadoop clusters for VISA and Mercedes-Benz, handling user onboarding, Kerberos configuration, and SLA-based monitoring.
Developed Shell scripts and XML configurations to automate Control-M job deployments and integrate scheduling logic.
Created Hive external tables with advanced partitioning and bucketing strategies for optimized querying and reporting.
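
A representative external-table DDL with partitioning and bucketing, executed here through PyHive as one possible client; the database, columns, and HDFS location are illustrative.

from pyhive import hive  # assumes a reachable HiveServer2 endpoint; PyHive is one of several client options

DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS reporting.transactions (
  txn_id     STRING,
  account_id STRING,
  amount     DECIMAL(18,2)
)
PARTITIONED BY (txn_date STRING)
CLUSTERED BY (account_id) INTO 32 BUCKETS
STORED AS ORC
LOCATION '/data/warehouse/reporting/transactions'
"""

conn = hive.Connection(host="hive-server.internal", port=10000, username="etl_user")  # hypothetical host
cursor = conn.cursor()
cursor.execute(DDL)  # create the partitioned, bucketed external table
cursor.close()
conn.close()
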
Leveraged Terraform, GitHub Actions, and Cloud Shell to deploy Compute Engine and KMS resources securely and at scale.
Collaborated with cross-functional teams including DevOps, compliance, and security to enforce data governance policies and ensure audit readiness.