Debasish Panda

Summary

Results-driven Technology Analyst with four years of experience at Infosys Pvt Ltd, specializing in Big Data solutions. Expertise in Hadoop and Spark has improved system performance and data processing efficiency. A track record of automating processes and sharing knowledge through comprehensive documentation reflects strong problem-solving abilities. Committed to applying technical skills to drive innovation and deliver results in dynamic environments.

Overview

4 years of professional experience

3 certifications

Work History

Technology Analyst | Big Data Engineer

Infosys Pvt Ltd

09.2021 - Current

Currently working in Big Data production, managing data pipelines using HDFS, Azure, and Jenkins.
Responsible for monitoring jobs, troubleshooting issues, maintaining system stability, and ensuring timely data delivery.
Handle deployments via Jenkins and manage data storage on HDFS.
Collaborate with teams to resolve incidents and optimize workflows for better performance and reliability in a 24/7 production environment.
Worked on the migration from an IaaS to a PaaS model on Azure.
Built and ran pipelines to convert ORC files on HDFS to Parquet files on ADLS using Azure Synapse Analytics.
Worked on data reconciliation and automated internal processes.
Developed various shell scripts to eliminate repetitive and manual tasks.
Previously worked on an Open Banking project involving large, complex datasets from various sources.
Standardized data formats and performed ETL operations using Spark, with outputs stored in Hive and Phoenix.
Managed the Big Data platform built on Microsoft Azure IaaS and the Cloudera stack.
Designed, developed, and deployed jobs to ingest data into storage systems like HDFS.
Created and managed Hive tables for ORC file storage and data analysis to meet business requirements.
Authored DDL and DML scripts, optimizing for performance and adherence to best practices.
Utilized Sterling for data imports and Control-M for workflow scheduling and job coordination.
Set up Hadoop environments for DEV and TEST as landing zones for input data.
Applied data transformations using Spark for batch data processing.
Ensured data quality, integrity, and performance optimization throughout the pipeline.
Key Achievements:
Enhanced system performance through data optimization and integration best practices.
Identified opportunities for implementing Big Data solutions, resulting in improved data processing efficiency.
Developed comprehensive documentation for design, testing, and deployment stages, improving team knowledge and project success rates.
Migrated enterprise data from on-premises HDFS to Azure Data Lake Storage (ADLS) as part of a cloud transformation program, moving from a Hadoop-centric infrastructure to a scalable cloud-based data platform.
Designed and developed PySpark jobs to process large-scale datasets, validate data, and execute transformations after ingestion, using Spark to handle high-volume data in parallel across the cluster.
Since most of the source data was in ORC format, optimized for Hive, implemented PySpark-based pipelines to convert ORC files into Parquet, which is better suited to ADLS and downstream analytics.
Enforced schemas, handled nulls and duplicates, applied partitioning strategies, and optimized file sizes to improve query performance.
Performed data reconciliation and quality checks with Spark, including record count comparisons between HDFS and ADLS, data completeness checks, and transformation validation, to ensure consistency after migration.
Optimized Spark jobs by tuning partitions, leveraging lazy evaluation, and selecting appropriate write modes to reduce execution time and storage overhead.
Exposed the curated Parquet data to downstream consumers for analytics and reporting.

Education

Bachelor of Science - Computer Science & Information Technology

Institute of Technical Education And Research

07.2021

Skills

Python
Spark
Hive
HBase
Hadoop
SQL

Teradata
Phoenix
PySpark
Jenkins (CI/CD)
SSIS

Certifications

AZ-900: Microsoft Azure Fundamentals
DP-900: Microsoft Azure Data Fundamentals
AZ-104: Microsoft Azure Administrator
AI-900: Microsoft Azure AI Fundamentals

Languages

English

Bilingual or Proficient (C2)

Hindi

Bilingual or Proficient (C2)

Odia

Bilingual or Proficient (C2)

Awards

Pursuit of Exceptional Performance for consistent contribution to production stability and timely issue resolution

Personal Information

Date of Birth: 05/05/99
Nationality: Indian
