Debasish Panda

Pune

Summary

Results-driven Technology Analyst with extensive experience at Infosys Pvt Ltd, specializing in Big Data solutions. Expertise in Hadoop and Spark has led to significant enhancements in system performance and optimized data processing efficiency. Proven track record of automating processes and fostering team knowledge through comprehensive documentation underscores strong problem-solving abilities. Committed to leveraging technical skills to drive innovation and deliver impactful results in dynamic environments.

Overview

4 years of professional experience
4 certifications

Work History

Technology Analyst | Big Data Engineer

Infosys Pvt Ltd
09.2021 - Current
  • Work in a Big Data production environment, managing data pipelines using HDFS, Azure, and Jenkins.
  • Responsible for monitoring jobs, troubleshooting issues, maintaining system stability, and ensuring timely data delivery.
  • Handle deployments via Jenkins and manage data storage on HDFS.
  • Collaborate with teams to resolve incidents and optimize workflows for better performance and reliability in a 24/7 production environment.
  • Worked on the migration from an IaaS to a PaaS model on Azure.
  • Built and ran pipelines to convert ORC files on HDFS to Parquet files on ADLS using Azure Synapse Analytics.
  • Performed data reconciliation and automated internal processes.
  • Developed shell scripts to eliminate repetitive manual tasks.
  • Previously worked on an Open Banking project involving large, complex datasets from various sources.
  • Standardized data formats and performed ETL operations using Spark, with outputs stored in Hive and Phoenix.
  • Managed the Big Data platform built on Microsoft Azure IaaS and the Cloudera stack.
  • Designed, developed, and deployed jobs to ingest data into storage systems like HDFS.
  • Created and managed Hive tables for ORC file storage and data analysis to meet business requirements.
  • Authored DDL and DML scripts, optimizing for performance and adherence to best practices.
  • Utilized Sterling for data imports and Control-M for workflow scheduling and job coordination.
  • Set up Hadoop environments for DEV and TEST as landing zones for input data.
  • Applied data transformations using Spark for batch data processing.
  • Ensured data quality, integrity, and performance optimization throughout the pipeline.
  • Key achievement: enhanced system performance through data optimization and integration best practices.
  • Identified opportunities to implement Big Data solutions that improved data processing efficiency.
  • Developed comprehensive documentation for design, testing, and deployment stages, improving team knowledge and project success rates.
  • Migrated enterprise data from on-premises HDFS to Azure Data Lake Storage (ADLS) as part of a cloud transformation program, moving from a Hadoop-centric infrastructure to a scalable cloud-based data platform.
  • Used Apache Spark and PySpark as the core of this implementation.
  • Designed and developed PySpark jobs to process large-scale datasets, validate data, and run transformations after ingestion.
  • Used Spark to handle high-volume data with parallel processing across the cluster.
  • Implemented PySpark pipelines to convert source data, mostly stored in Hive-optimized ORC format, into Parquet, which is better suited to ADLS and downstream analytics.
  • Enforced schemas, handled nulls and duplicates, applied partitioning strategies, and optimized file sizes to improve query performance.
  • Performed data reconciliation and quality checks with Spark, including record-count comparison between HDFS and ADLS, completeness checks, and transformation validation, to ensure consistency after migration.
  • Optimized Spark jobs by tuning partitions, leveraging lazy evaluation, and selecting appropriate write modes to reduce execution time and storage overhead.
  • Exposed the curated Parquet data to downstream consumers for analytics and reporting.

Education

Bachelor of Science - Computer Science & Information Technology

Institute of Technical Education And Research
07-2021

Skills

  • Python
  • Spark
  • Hive
  • HBase
  • Hadoop
  • SQL
  • Teradata
  • Phoenix
  • PySpark
  • Jenkins (CI/CD)
  • SSIS

Certification

  • AZ-900: Microsoft Azure Fundamentals
  • DP-900: Microsoft Azure Data Fundamentals
  • AZ-104: Microsoft Azure Administrator
  • AI-900: Microsoft Azure AI Fundamentals

Languages

English
Bilingual or Proficient (C2)
Hindi
Bilingual or Proficient (C2)
Odia
Bilingual or Proficient (C2)

Awards

Pursuit of Exceptional Performance award, for consistent contribution to production stability and timely issue resolution

Personal Information

  • Date of Birth: 05/05/99
  • Nationality: Indian
