
Data Engineer with 3 years of experience building scalable ETL and data processing pipelines across GCP and Hadoop environments. Strong in SQL, Python, BigQuery, Spark, and microservice-based workflow orchestration, with hands-on experience in privacy-compliant data workflows for PII scanning, masking, and deletion. Reduced Spark runtime by 20%+, lowered scanning and deletion costs by $5,000+ per month, and enabled self-service onboarding across 100+ datasets. Experienced in FastAPI-based microservices, event-driven architectures, CI/CD-supported releases, data quality, production support, and stakeholder collaboration.
LLM-based Data Operation Chatbot
Languages: Python, SQL, Java (Intermediate)
Cloud & Big Data: GCP (BigQuery, GCS, Dataproc Serverless), Hadoop, Hive, Apache Spark, AWS (EC2, S3, Lambda, SNS)
Databases: BigQuery, PostgreSQL, MongoDB, Pinecone, Weaviate, FAISS
Data Engineering: ETL Pipelines, Data Processing, Data Modeling, Dataframe-Based Transformations, Microservices
API & Frameworks: FastAPI, Apache Airflow
Messaging & Distributed Systems: Kafka
DevOps & Deployment: Docker, Kubernetes, Git, CI/CD, YAML-based deployment configuration, Postman, JIRA
Analytics & Visualization: Pandas, NumPy, Tableau, Power BI, Matplotlib, seaborn, scikit-learn, PyTorch
AI/ML: Machine learning, Deep learning, NLP, LLMs, RAG Pipelines, LangChain, LangGraph, Vector Databases
Ekeeda - School of Data Science
Data Analytics with Python
Customer Analysis Dashboard | Tableau
HR Analytics Dashboard | Power BI