NILAB NATH


Gauhati

Summary

Results-driven Data Scientist at EY GDS with hands-on experience in Python, PySpark, Pandas, and Azure Databricks. Specialized in developing scalable data validation pipelines and building GenAI, Agentic AI, and document intelligence solutions. Skilled in deploying FastAPI-based APIs, working with LLMs and vector databases, and implementing workflow automation and microservice architectures for enterprise AI applications.

Overview

3 years of professional experience
1 Certification

Work History

Data Scientist

Ernst and Young GDS
08.2022 - Current
  • Big Data Engineering & Workflow Automation (PySpark, Pandas, Databricks): Developed scalable data pipelines for validation, preprocessing, and extraction; automated workflows with Pandera schema checks; managed file conversions; implemented data quality modules; and built Plotly dashboards with scheduled pipelines for daily delivery.
  • Data Warehousing & SQL Analysis (Fact and Dimension Modeling): Performed SQL analysis to identify and map source tables to their corresponding FACT and DIMENSION tables.
  • NLP & Data Privacy Solutions (PII Masking, spaCy, Transformers, Streamlit): Developed an NLP-based PII masking solution using Transformers and spaCy, and optimized its performance with multiprocessing and multithreading, achieving a significant reduction in PII detection latency. Extended compatibility to a range of spaCy models and built a Streamlit application for deployment.
  • Agentic AI & Code Conversion (SAS to Python, Streamlit): Developed an Agentic AI solution for SAS-to-Python code conversion, implemented a compile-code node to detect syntax errors, and built a Streamlit-based user interface for the project.
  • Document Intelligence & AI Process Optimization: Developed and enhanced Document Intelligence solutions by generating API payloads aligned with business requirements, fixing bugs, and optimizing workflows. Developed and deployed Azure Functions for automated solution delivery, and reduced redundant LLM calls to cut processing costs and improve throughput.
  • SOW Ontology: Developed a SOW Ontology engine that combines spaCy POS tagging with LLM workflows to identify key technologies and signatories. Used PGVector for efficient chunk retrieval, automated PDF ingestion with Azure Document Intelligence, and created Delta tables in Unity Catalog. Improved XML creation for PDF forms, streamlining ingestion and improving data accuracy.

Education

M Tech - Power Electronics

Swami Vivekananda University
07.2025

B Tech - Electronics and Electrical Engineering

KIIT UNIVERSITY
Bhubaneshwar, India
06.2022

Skills

  • FastAPI
  • Basic SAS
  • Embeddings
  • Streamlit
  • LLM Evaluation
  • Agentic AI
  • PySpark
  • Azure Databricks
  • Pandas
  • Basic SQL
  • GenAI
  • GitHub Copilot
  • Git

Certification

  • Microsoft Certified: Azure Fundamentals
  • Python (Basic) Certification, HackerRank
  • Generative AI with LangChain and Hugging Face, Udemy
  • GitHub Copilot, GitHub

PERSONAL PROJECTS

Resume Chatbot (01/2025 - 02/2025): Developed a chatbot that uses open-source embedding models and LLMs to retrieve relevant resume sections by skill or experience. Implemented a FastAPI backend exposing REST endpoints for chat and document search, so the service can be integrated into web or HR systems and deployed in the cloud.

INTERESTS

Chess, Cricket, IoT, and Robotics
