Dipak Rout

Bangalore

Summary

IT professional with 6+ years of experience in the Spark and big data ecosystem, including Hive, Sqoop, Hadoop, Impala, and PySpark. Skilled in Scala, Java, MSSQL Server, and the AWS and Azure clouds. Seeking to broaden expertise in Big Data and apply strong interpersonal and technical skills in a collaborative team environment. Committed to contributing to organizational growth and achieving job satisfaction.

Overview

6 years of professional experience
1 Certification

Work History

Senior Data Engineer

WIPRO
BANGALORE
06.2025 - Current
  • Designed and implemented scalable Hive data models for Consumer Vehicle Lending (CVL) domain to support analytics and downstream reporting.
  • Built and maintained end-to-end data ingestion pipelines using PySpark and Hive to load data from Oracle and Teradata into production Hive tables.
  • Developed high-level HQL from SaaS code to load data into Hive.
  • Performed ETL data validation and quality checks, ensuring data accuracy, completeness, and consistency before production loads.
  • Conducted regression testing and production data verification, reducing data discrepancies and improving deployment reliability.
  • Managed Hyper RDBMS (HyRi) ingestion processes and optimized Hive table performance using partitioning and data modeling best practices.
  • Worked in Linux environment with Python for automation and operational support.
  • Validated Hive/ETL jobs in lower environments (DEV/UAT) with structured test cases and sample data validation.
  • Automated build and deployment pipelines using Jenkins, enabling controlled and repeatable releases.
  • Used Ansible for configuration management and deployment of Hive scripts, HQL, and ETL components across environments.
  • Monitored Jenkins build logs, resolved deployment failures, and ensured successful promotion of code to UAT and Production.
  • Followed version control and release management best practices for smooth CI/CD operation.
  • Architected and implemented a real-time fraud detection platform using Apache Kafka and Spark Structured Streaming, processing 5–10M+ financial transactions per day with
  • Designed scalable event-driven data pipelines and Medallion Data Lake architecture (Raw → Cleansed → Curated) on S3 using partitioned Parquet, optimizing performance and enabling schema evolution.
  • Developed advanced fraud detection logic including velocity checks, geo-location anomaly detection, merchant risk scoring, and behavioral feature engineering integrated with real-time ML scoring services.

Data Engineer

Accenture
02.2024 - Current
  • Built an XML and JSON ingestion framework using PySpark.
  • Optimized data processing by implementing efficient ETL pipelines and streamlining database design.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Flattened complex-type (nested) data structures.
  • Created complex-type Hive tables to load complex ingestion data such as XML and JSON.
  • Created flattened Hive and Impala views on top of the complex-type Hive tables for client-side queries.
  • Created a compaction script in Scala Spark for small Hadoop files.
  • Created multiple Python scripts for mailing, deleting old files, and automation.
  • Managed daily ingestion jobs using AutoSys.
  • Implemented parallelization in the ingestion framework for better performance.
  • Developed a Glue ETL job to batch-process data from S3 and load the transformed data into a Redshift Serverless cluster.
  • Created an automated pipeline to ingest static batch data, incremental batch data, and schema-drift batch data into Redshift.
  • Implemented SCD Type 2 for incremental loads.
  • Implemented SNS notifications for schema-evolution data.
  • Integrated multiple files from AWS S3 based on business requirements and published them to multiple vendors.
  • Handled schema evolution using the Glue Data Catalog during Glue ETL jobs.
  • Wrote complex SQL queries for data extraction, transformation, and loading (ETL) from different data sources.
  • Migrated on-premises data pipelines to AWS, leveraging S3 and Redshift to reduce storage cost.
  • Optimized Spark jobs using salting techniques, Spark Adaptive Query Execution (AQE), and speculative execution.
  • Ingested Salesforce data into Hive.

Analyst

TCS
Bangalore
02.2020 - 02.2024
  • Experience in preparing Business & Functional requirement documents.
  • Expertise in database modeling, data mapping, ETL, data quality management, data analysis, requirements gathering, SQL, and reporting processes.
  • Experience in Apache Spark and Python programming.
  • Experience developing data processing tasks in PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into the target destination.
  • Experience with the AWS ecosystem: IAM, S3, Glue, Athena, and Redshift.
  • Good knowledge of Hadoop, Sqoop, and Hive.
  • SQL Server and T-SQL experience: joins, data warehousing, data modeling, OLTP, and OLAP.
  • Managed seven databases using SQL Server Management Studio.
  • Excel: HLOOKUP, VLOOKUP, pivot tables, and other advanced functions.
  • Worked on optimizing Spark Jobs.

Education

B-Tech - Mechanical Engineering

IGIT, SARANG
Dhenkanal, Odisha
05-2019

Skills

  • Spark
  • PySpark
  • Python
  • SQL
  • Impala
  • Sqoop
  • Hive
  • Hadoop
  • MSSQL Server
  • MySQL
  • Oracle
  • Libraries: NumPy, pandas, Boto3, fastparquet
  • Linux Bash scripting
  • Core Java
  • Scala
  • AWS S3
  • AWS Glue
  • AWS Redshift
  • AWS Step Functions
  • Athena
  • Data governance
  • Data Quality Checks
  • Spark framework
  • Performance tuning
  • Big data processing
  • Data warehousing
  • Data modeling
  • Data pipeline design
  • ETL development

Certification

Azure AZ-900, Completed

Skills

Autosys

Eclipse

Ansible

Jenkins

Bitbucket

WinSCP

JIRA
