
Aritra Gupta Baksi

Jaipur

Summary

Software Engineer at Celebal Technologies specializing in high-throughput data pipelines and real-time processing. Skilled in PySpark and Agile delivery, with a record of reducing data discrepancies by 70% through automated validation and of raising team performance through mentorship. Focused on applying practical, well-governed solutions to high-impact data engineering initiatives.

Overview

2 years of professional experience

Work History

Software Engineer, Big Data Engineer

Celebal Technologies Pvt. Ltd
02.2023 - Current
  • Developed and deployed high-throughput real-time and batch pipelines using Apache Kafka, Databricks, and PySpark, enabling scalable ingestion and SCD Type-2 transformation with downstream writes to Cosmos DB.
  • Led the migration from Hive Metastore to Unity Catalog for 1,400+ Databricks jobs and assets, enabling fine-grained access control through table ACLs, data sharing through Delta Sharing, and better performance through Parquet-to-Delta conversion, which strengthened overall data governance and auditability.
  • Implemented data quality frameworks, log analytics, and automated validation tests covering millions of records, reducing data discrepancies by 70% and issue resolution time by 40%.
  • Optimized workloads with Z-ordering, data skipping, file compaction, cluster autoscaling, and query plan tuning, improving processing efficiency by 60%.
  • Mentored junior developers through code reviews, knowledge-sharing sessions, and actionable feedback, building their skills and improving overall team performance.
  • Led Agile pods, ensuring on-time project delivery, efficient resource management, and successful execution of high-impact data engineering initiatives.

Education

BTech - Computer Science and Engineering

Dr. B. C. Roy Engineering College
Durgapur
05.2023

Higher Secondary - PCM

Hetampur Raj High School
06.2019

Skills

  • Languages: Python, PySpark, SQL, C
  • Data engineering: Apache Spark, Delta Lake, ETL, data pipelines, data modeling, adaptive query execution (AQE), Z-ordering
  • Databases and storage: Cosmos DB, ADLS, SQL Server, data warehousing, Unity Catalog
  • Cloud and platforms: Databricks, Azure Data Factory, DevOps, Azure Synapse, dbt
  • Streaming and messaging: Kafka, real-time processing
  • Soft skills: problem-solving, team leadership, cross-functional collaboration, Agile methodology
  • Certifications: Databricks Certified Data Engineer Associate, AZ-900, DP-900

Projects

Azure Synapse to Databricks Migration
Data Engineer | March 2023 – Dec 2023

  • Migrated enterprise data warehouse from Azure Synapse to Databricks
  • Converted views, stored procedures, and tables into modular Databricks workflows
  • Built automated validation scripts to ensure data integrity across platforms
  • Enabled secure access through Unity Catalog for Dev, QA, and Prod
  • Documented logic and workflows to support handover and production readiness

Real-time Data Ingestion with Kafka, Databricks & Cosmos DB
Data Engineer | Jan 2024 – Aug 2024

  • Built real-time pipelines in Databricks using Kafka (CDC and full-load) with SCD Type-2 logic
  • Persisted curated outputs to Cosmos DB for low-latency access
  • Integrated audit layers, centralized logging, and automated pipeline deployments via Databricks Workflows
  • Managed ingestion with Apache and Confluent Kafka across environments

Unity Catalog Migration & Databricks Environment Modernization
Senior Data Engineer | Aug 2024 – Feb 2025

  • Migrated 1,400+ assets from Hive Metastore to Unity Catalog with zero downtime
  • Automated Parquet-to-Delta conversion and deep table cloning for job consistency
  • Set up fine-grained access controls, Delta Sharing, and RBAC via Unity Catalog
  • Integrated AWS Glue Catalog to maintain metadata continuity
  • Enabled centralized governance with audit logging and lineage tracking

Unity Catalog Migration & Databricks Environment Modernization
Lead Data Engineer | March 2025 – June 2025

  • Migrated 800+ Azure Data Factory (ADF) pipelines and made them Unity Catalog-compatible
  • Migrated Hive Metastore tables to Unity Catalog and enabled liquid clustering on the migrated tables
  • Set up fine-grained access controls, Delta Sharing, and RBAC via Unity Catalog
  • Served as technical lead for a team of 8+ members
  • Enabled centralized governance with audit logging and lineage tracking
