
Akshay Manoharlal Joshi

Senior Data Engineer
Pune

Work Preference

Work Type

Full Time

Location Preference

On-Site, Remote, Hybrid

Important To Me

Work-life balance, company culture, flexible work hours, personal development programs, healthcare benefits, work-from-home option, paid sick leave, paid time off, team building / company retreats, career advancement

Summary

Senior Data Engineer with 6+ years of experience in building and optimizing scalable data pipelines using PySpark and AWS (EMR, S3, Athena, Glue). Proven expertise in Big Data, ETL development, and data warehousing, with hands-on experience in Hadoop ecosystem technologies. Skilled in designing end-to-end data solutions, improving data processing efficiency, and handling large-scale datasets across multiple domains. Strong problem-solving abilities with a focus on delivering reliable, high-quality data solutions.

Overview

7 years of professional experience

Work History

Senior Data Engineer

Accenture Pune (Client – NatWest Bank)
03.2023 - Current
  • Led the migration of legacy Oracle-based systems to AWS, converting complex SQL logic into optimized PySpark ETL pipelines, improving processing scalability, and reducing execution time by 25–35%.
  • Developed and managed ETL workflows using AWS Glue and EMR, processing large datasets efficiently and reducing pipeline failures by 30%.
  • Designed and implemented a critical enterprise risk data table, enabling faster downstream consumption and improving reporting efficiency by 20%.
  • Played a key role in bank acquisition data migration, ensuring 99.9% data accuracy while integrating savings and credit card datasets into core systems.
  • Built a custom data validation tool, reducing manual validation effort by 60–70% and significantly improving data reliability.
  • Developed an automation framework to replicate production data into pre-production, reducing environment setup time by more than 50%.
  • Utilized AWS Athena for ad-hoc querying, reducing data analysis turnaround time by 40% for business users.
  • Implemented workflow orchestration using AWS Step Functions, improving pipeline reliability and reducing job failures by 25%.
  • Contributed to the AWS EMR upgrade, ensuring a seamless transition with near-zero downtime and improved cluster performance.
  • Collaborated with cross-functional teams to deliver scalable data solutions in a secure enterprise environment.

Data Engineer

Amazon Development Centre Pune
07.2019 - 03.2023
  • Developed and maintained PySpark-based ETL pipelines, processing 200+ GB of incremental data daily, ensuring high availability and reliability.
  • Designed and optimized Spark jobs, reducing processing time by 20–30% through performance tuning and efficient resource utilization.
  • Built a data deduplication framework, improving data quality and reducing duplicate records by over 95%.
  • Designed and managed Hive tables, improving query performance and enabling faster analytics for large datasets.
  • Created and managed HBase tables using Phoenix, integrating data from PostgreSQL RDS using Spark SQL.
  • Improved storage efficiency by implementing Parquet format with Snappy compression, reducing storage costs by 30–40%.
  • Enabled seamless data ingestion from multiple systems, improving pipeline reliability and reducing data latency.
  • Delivered robust data pipelines supporting critical business use cases in a high-scale e-commerce environment.

Education

Bachelor of Engineering - Mechanical Engineering

University of Pune
Nashik, India
06.2019

Skills

Programming & Query Languages: Python, SQL, PySpark, Shell Scripting

Big Data Technologies: Apache Spark (Core, SQL), Hadoop (HDFS, MapReduce), Hive, HBase

Cloud & AWS Services: AWS EMR, AWS Glue (Jobs, Crawlers), AWS Athena, Amazon S3, AWS Step Functions, EC2, RDS, DynamoDB

Data Engineering & Processing: ETL Pipeline Development, Data Pipeline Orchestration, Batch Processing, Data Transformation, Data Ingestion, Data Validation

Workflow Orchestration & Scheduling: Apache Airflow

Databases & Data Storage: Oracle, PostgreSQL, MySQL, DynamoDB, HBase

Performance Optimization: Spark Optimization (Partitioning, Caching, Broadcast Joins), Query Optimization

File Formats & Storage: Parquet, ORC, CSV, JSON (Snappy Compression)

Languages

English: Advanced (C1)
Hindi: Bilingual or Proficient (C2)
Marathi: Bilingual or Proficient (C2)

Work Availability

Days: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
Times: Morning, Afternoon, Evening
