Data Engineering
Data Engineer
- Served as a core engineer building NABARD's enterprise Data Governance Platform from the ground up, driving architecture, onboarding strategy, and platform adoption.
- Led a team of 3 Data Engineers and collaborated with client stakeholders to deliver enterprise-scale data engineering and governance initiatives.
- Designed and developed scalable data ingestion, metadata processing, and governance workflows on Databricks using PySpark, Spark SQL, Delta Lake, and Unity Catalog.
- Integrated 32+ enterprise applications across MySQL, PostgreSQL, SQL Server, Oracle, and AWS S3 into a centralized Databricks Lakehouse, enabling unified data visibility and governance.
- Executed large-scale metadata onboarding through automated Databricks pipelines, cataloging 100K+ datasets and 1M+ columns.
- Built and optimized ETL/ELT pipelines using PySpark and Spark SQL, improving metadata processing performance and scalability across enterprise systems.
- Designed and published 200+ governed data products, delivering trusted, analytics-ready datasets for business users and reporting platforms.
- Implemented PII discovery, classification, and masking for 200K+ sensitive data fields, strengthening enterprise data security, privacy, and compliance.
- Developed advanced Spark SQL queries for data validation, profiling, reconciliation, and metadata analysis to enforce data quality standards.
- Leveraged LLM-powered classification and governance automation to accelerate sensitive data identification and business glossary enrichment.
- Owned the end-to-end lifecycle of governance assets, including dataset onboarding, metadata enrichment, lineage validation, business glossary mapping, and governed data product delivery.
