Dipak Rout

Data Engineer
Bangalore

Summary

IT professional with 5+ years of experience in the Spark and big data ecosystem, including Hive, Sqoop, Hadoop, Impala, and PySpark. Skilled in Scala, Java, Microsoft SQL Server, and the AWS and Azure clouds. Seeking to broaden horizons in Big Data and to apply strong interpersonal and technical skills in a collaborative team environment. Committed to contributing to organizational growth and achieving job satisfaction.

Overview

5 years of professional experience
1 certification

Work History

Data Engineer

Accenture
02.2024 - Current
  • Prepared an XML and JSON ingestion framework using PySpark.
  • Optimized data processing by implementing efficient ETL pipelines and streamlining database design.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Worked on flattening complex-type structures.
  • Created complex-type Hive tables for loading complex ingestion data such as XML and JSON.
  • Created flattened Hive and Impala views on top of the complex-type Hive tables for client-side queries (see the flattening sketch after this list).
  • Created a compaction script in Scala Spark to merge small Hadoop files.
  • Created multiple Python scripts for mailing, deleting old files, and other automation.
  • Operated the AutoSys tool for daily ingestion jobs.
  • Implemented parallelization in the ingestion framework for better performance.
  • Developed a Glue ETL job for batch processing of data from S3 and loaded the transformed data into a Redshift Serverless cluster.
  • Created an automated pipeline to ingest static batch data, incremental batch data, and schema-drift batch data into Redshift.
  • Implemented SCD Type 2 for incremental loads (see the SCD 2 sketch after this list).
  • Implemented SNS notifications for schema-evolution data.
  • Integrated multiple files from AWS S3 based on business requirements and published them to multiple vendors.
  • Handled schema evolution using the Glue Data Catalog during Glue ETL jobs.
  • Created complex SQL queries for data extraction, transformation, and loading (ETL) from different data sources.
  • Migrated on-premises data pipelines to AWS, leveraging S3 and Redshift to reduce storage costs.
  • Optimized Spark jobs using salting techniques, Spark AQE, and speculative execution (see the salting sketch after this list).
  • Ingested Salesforce data into Hive.
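
For illustration, the flattening step above can be sketched in PySpark roughly as below. This is a minimal sketch, not the production framework: the input path, the nested `orders` array, and the field names are assumed placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = (
    SparkSession.builder
    .appName("flattening-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Read complex-type JSON; multiLine handles pretty-printed files.
raw = spark.read.option("multiLine", "true").json("/data/ingest/raw_events/")

# Explode the nested array and promote struct fields to top-level columns.
flat = (
    raw.select(col("customer_id"), explode(col("orders")).alias("order"))
       .select(
           "customer_id",
           col("order.order_id").alias("order_id"),
           col("order.amount").alias("amount"),
       )
)

# Expose the flattened shape as a view for client-side queries.
flat.createOrReplaceTempView("orders_flat")
spark.sql("SELECT * FROM orders_flat LIMIT 10").show()
```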
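
The SCD Type 2 incremental load can likewise be sketched with tiny in-memory DataFrames. The dimension layout (`valid_from`, `valid_to`, `is_current`) and the tracked `address` column are assumptions for the example, not the actual Redshift design.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

# Current dimension (one open row) and an incoming change; toy schemas.
dim = spark.createDataFrame(
    [(1, "old address", "2023-01-01", None, True)],
    "customer_id INT, address STRING, valid_from STRING, valid_to STRING, is_current BOOLEAN",
)
incoming = spark.createDataFrame(
    [(1, "new address", "2024-05-01")],
    "customer_id INT, address STRING, change_date STRING",
)

# Keys whose tracked attribute changed in this batch.
changed_keys = (
    dim.filter("is_current").alias("d")
       .join(incoming.alias("i"), F.col("d.customer_id") == F.col("i.customer_id"))
       .filter(F.col("d.address") != F.col("i.address"))
       .select(F.col("d.customer_id").alias("customer_id"))
)

# Close the open version of each changed key (batch date hardcoded for the sketch).
to_close = dim.filter("is_current").join(changed_keys, "customer_id", "left_semi")
closed = (to_close
          .withColumn("valid_to", F.lit("2024-05-01"))
          .withColumn("is_current", F.lit(False)))

# Append a fresh open version for each changed key.
new_rows = (
    incoming.join(changed_keys, "customer_id", "left_semi")
            .select("customer_id", "address",
                    F.col("change_date").alias("valid_from"),
                    F.lit(None).cast("string").alias("valid_to"),
                    F.lit(True).alias("is_current"))
)

# Untouched history plus closed and new versions form the updated dimension.
result = dim.exceptAll(to_close).unionByName(closed).unionByName(new_rows)
result.show()
```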
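
Finally, a minimal sketch of key salting on synthetic data; the salt factor of 8 and the DataFrame shapes are assumptions chosen to keep the example self-contained.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("salting-sketch").getOrCreate()

SALT_BUCKETS = 8  # assumed factor; tune to the observed skew

# A skewed fact side (only 3 distinct keys) and a small dimension side.
big = spark.range(0, 1_000_000).withColumn("key", (F.col("id") % 3).cast("int"))
small = spark.createDataFrame([(0, "a"), (1, "b"), (2, "c")], ["key", "label"])

# Salt the skewed side so each hot key spreads over SALT_BUCKETS partitions.
big_salted = big.withColumn("salt", (F.rand(seed=42) * SALT_BUCKETS).cast("int"))

# Replicate each small-side row once per salt value so joins still match.
salts = spark.range(SALT_BUCKETS).select(F.col("id").cast("int").alias("salt"))
small_salted = small.crossJoin(salts)

joined = big_salted.join(small_salted, ["key", "salt"]).drop("salt")
print(joined.count())  # same result as the unsalted join, without the hot partition
```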

Analyst

TCS
Bangalore
02.2020 - 02.2024
  • Experienced in preparing business and functional requirement documents.
  • Expertise in database modeling, data mapping, ETL, data quality management, data analysis, requirements gathering, SQL, and reporting processes.
  • Experienced in Apache Spark and Python programming.
  • Developed data processing tasks using PySpark, such as reading data from external sources, merging datasets, performing data enrichment, and loading into target destinations (see the sketch after this list).
  • Experienced with the AWS ecosystem: IAM, S3 storage, Glue, Athena, and Redshift.
  • Good knowledge of Hadoop, Sqoop, and Hive.
  • SQL Server and T-SQL experience: joins, data warehousing, data modeling, OLTP, and OLAP.
  • Managed seven databases using SQL Server Management Studio.
  • Excel: HLOOKUP, VLOOKUP, pivot tables, and other advanced functions.
  • Optimized Spark jobs.
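
As a hedged illustration of the read/merge/enrich/load pattern above: the CSV sources, column names, and output path are assumed for the sketch, not taken from the actual project.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Read from external sources (CSV here; JDBC or S3 sources follow the same pattern).
orders = spark.read.option("header", "true").csv("/data/in/orders.csv")
customers = spark.read.option("header", "true").csv("/data/in/customers.csv")

# Merge: join the two sources on their shared key.
merged = orders.join(customers, "customer_id", "left")

# Enrich: derive the columns downstream consumers expect.
enriched = merged.withColumn(
    "order_month", F.date_format(F.to_date("order_date"), "yyyy-MM")
)

# Load: write partitioned Parquet to the target destination.
enriched.write.mode("overwrite").partitionBy("order_month").parquet(
    "/data/out/orders_enriched/"
)
```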

Education

B-Tech - Mechanical

IGIT Sarang
Dhenkanal, Odisha
06.2025 - 05.2025

Skills

SQL

Python

PySpark

Scala

Core Java

Linux Bash scripting

Libraries: NumPy, pandas, Boto3, fastparquet

Oracle

MySQL

MSSQL Server

MongoDB

Hadoop

Hive

Sqoop

Impala

Spark

ETL development

Data pipeline design

Data modeling

Data warehousing

Big data processing

Performance tuning

Data governance

Certification

Microsoft Azure Fundamentals (AZ-900), completed

Timeline

B-Tech - Mechanical

IGIT Sarang
06.2025 - 05.2025

Data Engineer

Accenture
02.2024 - Current

Analyst

TCS
02.2020 - 02.2024