Piyush Dwivedi

Gurugram

Summary

Results-driven Big Data Engineer with nearly four years of experience in Agile environments. Adept at building scalable CI/CD pipelines, cloud infrastructure, and big data solutions. Proficient in AWS services, Spark, Hive, and data pipeline development. Known for optimizing performance, enhancing system reliability, and improving cost efficiency.

Overview

4 years of professional experience

Work History

Associate Engineer

BT Global Business Services
Gurugram
06.2021 - Current
  • Contributed to Hive table creation, data ingestion, and data writing processes for downstream analytics.
  • Participated in the performance tuning of Sqoop, Hive, and Spark (PySpark) jobs, with a focus on reducing Hive query latency and overall project costs.
  • Managed Resilient Distributed Datasets (RDDs) and applied transformation logic for efficient distributed data processing.
  • Collaborated on Amazon EMR cluster development and testing environments, utilizing Sqoop for data migration between HDFS and Hive.
  • Designed and implemented batch-processing data pipelines on Amazon EMR using PySpark for large-scale ETL workflows.
  • Utilized PySpark DataFrame APIs for structured data analysis, cleansing, and feature engineering in production pipelines.
  • Administered EC2 instances and EMR clusters, ensuring job deployments aligned with defined scalability and performance metrics.
  • Processed and optimized serialized data formats, including Avro, Parquet, and ORC, for streamlined data handling using PySpark.
  • Employed Sqoop for data import/export operations between relational databases and Amazon S3-based cloud storage.
  • Optimized PySpark jobs to eliminate data duplication and enable seamless integration with other Big Data components.
  • Collaborated with data science teams on feature selection and built custom PySpark applications for ad targeting and customer lifetime value prediction.
  • Conducted job profiling and tuning of PySpark applications to reduce memory usage and execution time.

Education

Bachelor of Technology (B.Tech) - Electrical Engineering

United College of Engineering & Management
Prayagraj
05.2019

Intermediate Certificate - Science

Mahatma Hansraj Modern School
Jhansi
05.2015

Skills

  • Big data applications
  • Apache Spark
  • AWS EMR management
  • Spark SQL
  • Batch processing
  • Apache Airflow
  • PySpark
  • Azure Databricks
  • Hive

Interests

  • Exploring new, adventurous places
  • Cinema
  • Cooking
  • Reading
  • Music

Timeline

Associate Engineer

BT Global Business Services
06.2021 - Current

Bachelor of Technology (B.Tech) - Electrical Engineering

United College of Engineering & Management

Intermediate Certificate - Science

Mahatma Hansraj Modern School