Vishal Tak

Senior Data Engineer
Pune

Summary

Results-driven Senior Data Engineer with 4+ years of experience in Big Data technologies, cloud computing (Azure), and large-scale data migration projects. Skilled in Hadoop, Apache Ozone, Spark, Azure Databricks, and Azure Data Factory (ADF), with expertise in ETL pipeline development, data processing, and cloud solutions. Recently earned the Databricks Certified Data Engineer Associate certification. Seeking an opportunity to drive data-driven solutions and optimize large-scale data architectures.

Overview

6 years of professional experience
18 years of post-secondary education
1 Certification

Work History

Sr. Data Engineer

Atgeir Solutions Pvt Ltd
Pune
02.2023 - Current
  • Developed Data Migration Utility to facilitate data transfer from HDFS to Apache Ozone
  • Created Scanner scripts to analyze directory structures and gather statistics
  • Implemented Data Validation Utility using MD5 checksum for file-level validation
  • Conducted TPC-DS benchmarking for Hive, Impala, and Spark to evaluate performance
  • Documented Hadoop cluster statistics, the benefits of Apache Ozone, and storage structure designs on Confluence
  • Generated diagrams to visualize the storage architecture on Ozone, including volumes and buckets
  • Utilized the OMD dashboard for data migration from HDFS to Ozone
  • Developed a Scanner shell script to identify small files for optimized storage
  • Developed a FileMerger script in Python to merge the identified small files for optimized storage
  • Improved data management efficiency and reduced storage overhead through the development of specialized utilities
  • Ensured data integrity and consistency using robust mechanisms
  • Tested utilities with dummy data to ensure functionality

Data Engineer

Feedoozy Technologies Pvt Ltd
Remote
09.2019 - 02.2023

Worked on the data warehouse team in the retail domain

  • Built an automated data pipeline in AWS that saved the client time and achieved the desired processing speed
  • Created a data pipeline to migrate data from a MySQL server to S3 using AWS DMS
  • Applied data cleaning techniques to correct or remove corrupt data and improve data quality
  • Used AWS Glue jobs to process raw data
  • Applied various PySpark transformations to validate data in Glue jobs
  • Wrote the desired output to Redshift
  • Used the Parquet file format for efficient storage
  • Developed Lambda triggers on upstream source data so that the corresponding Glue job starts processing automatically
  • Developed Glue jobs per client requirements, wrote the data to S3, and helped the client create master tables using AWS Athena

Education

B.E - CSE

Dr. BABASAHEB AMBEDKAR MARATHWADA UNIVERSITY
Chhatrapati Sambhajinagar
08.2014 - 08.2018

DIPLOMA - CSE

SYP
Chhatrapati Sambhajinagar
08.2011 - 08.2014

MATRICULATION - SSC

MUKUL MANDIR HIGH SCHOOL
Chhatrapati Sambhajinagar
01.2001 - 08.2011

Skills

Azure Databricks

Azure Data Factory

PySpark

SQL

Python

Apache Ozone

Delta Lake

Glue

RDS

Professional Preface

  • 4.5+ years of extensive experience as a Data Engineer, specializing in Big Data, Cloud Computing (AWS & Azure), and large-scale ETL pipelines.
  • Proficient in developing automated scripts for data migration, including Shell scripting, Python-based automation, and Spark-based ETL workflows.
  • Hands-on experience with Azure Databricks for big data processing, implementing Delta Lake architectures for scalable, ACID-compliant data lakes.
  • Expertise in Azure Data Factory (ADF) to design data ingestion, transformation, and orchestration pipelines, integrating structured & unstructured data from diverse sources.
  • Experience in writing optimized data extraction logic in PySpark, enhancing query performance and data processing efficiency.
  • Built end-to-end ETL pipelines in both AWS (Glue, Redshift, RDS, S3, Lambda) and Azure (ADF, Databricks, Synapse) environments.
  • Strong SQL expertise, including writing complex queries, stored procedures, and optimizing execution plans for high-performance data retrieval.
  • Worked with large-scale data ingestion, transformation, and processing, integrating data from HDFS, Apache Ozone, Amazon S3, and Azure Data Lake Storage (ADLS).
  • Experienced in Spark architecture, including Spark Core, Spark SQL, and Spark Streaming, for real-time and batch data processing.
  • Proficient in file formats such as Parquet, Avro, CSV, and ORC, and in compression techniques that optimize storage and query performance.
  • Expertise in Data Cleaning, Data Quality Management, and Performance Tuning, ensuring high-quality data pipelines.
  • Implemented partitioning, bucketing, and indexing strategies in Hive, Delta Lake, and Redshift, improving data query speeds by 40%.
  • Strong understanding of cloud security best practices, IAM roles, and data governance strategies in AWS & Azure environments.
  • Experience working in Agile/Scrum environments, collaborating with cross-functional teams to develop scalable data solutions.

Disclaimer

I hereby declare that the particulars of information and facts stated herein above are true, correct and complete to the best of my knowledge and belief.

Technical Profile

  • Big Data & Cloud: Hadoop, Apache Ozone, Spark, Azure Databricks, AWS Glue, ADF (Azure Data Factory)
  • Programming Languages: Python, SQL, Shell Scripting
  • Cloud Technologies: Azure (ADF, Databricks, Synapse), AWS (Glue, RDS, DMS, S3)
  • Data Processing: PySpark, Hive, SQL, Delta Lake
  • File Formats & Compression: Parquet, Avro, CSV, ORC
  • Version Control & Tools: Git, Confluence, Linux, Jupyter Notebook, PyCharm, VS Code
  • Methodologies: Agile/Scrum

Certification

Databricks Certified Data Engineer Associate

Timeline

Databricks Certified Data Engineer Associate

02-2025

Sr. Data Engineer

Atgeir Solutions Pvt Ltd
02.2023 - Current

Data Engineer

Feedoozy Technologies Pvt Ltd
09.2019 - 02.2023

B.E - CSE

Dr. BABASAHEB AMBEDKAR MARATHWADA UNIVERSITY
08.2014 - 08.2018

DIPLOMA - CSE

SYP
08.2011 - 08.2014

MATRICULATION - SSC

MUKUL MANDIR HIGH SCHOOL
01.2001 - 08.2011