Shilpa Jadaun

Jaipur, Rajasthan

Summary

5.2 years of IT experience across various roles and technologies, including the last 4 years as a Big Data Engineer.

Big Data professional with a strong focus on data processing, data modification, advanced analytics, workflow automation, and ETL processes. Skilled in Hadoop, Spark, and Python, with significant experience in designing and implementing scalable data solutions. Pursuing a full-time role that presents professional challenges and leverages interpersonal skills, effective time management, and problem-solving expertise.

Overview

5 years of professional experience

Work History

Big Data Engineer

Hol Infosolutions Pvt Ltd
12.2020 - Current
  • Executed large-scale data processing using PySpark and AWS Hive on AWS EMR
  • Built data pipelines with AWS EMR and PySpark, reading from and writing to AWS S3
  • Designed Spark jobs to process data stored in AWS S3 and AWS Hive
  • Used AWS Hive to perform SQL queries on datasets processed by Spark on AWS EMR
  • Debugged PySpark code to resolve errors and improve efficiency on AWS EMR
  • Created and managed RDDs (Resilient Distributed Datasets) for data transformations
  • Utilized DataFrames for structured data manipulation and analysis
  • Worked with Spark's data serialization formats (Avro, Parquet, JSON, etc.)
  • Automated Spark job submission on AWS EMR using AWS Step Functions for consistent execution
  • Integrated AWS S3 with PySpark for intermediate storage in large-scale AWS EMR workflows
  • Used AWS Step Functions to parallelize PySpark jobs for faster execution on AWS EMR clusters
  • Designed and implemented ETL processes using Spark
  • Collaborated with data architects to design data storage solutions
  • Worked with Spark DataFrames for feature engineering
  • Integrated Spark with data lakes such as AWS S3, HDFS, EMR
  • Implemented Spark partitioning and caching strategies
  • Implemented data partitioning and shuffling strategies for optimization
  • Experience in handling Hive schema evolution with the Avro file format
  • Skilled in handling semi-structured/serialized data processing using Hive (Avro, Parquet, ORC)
  • Experienced in efficiently using Hive managed and external tables according to business requirements
  • Strong understanding of Hive serialized data processing performance optimization techniques, such as using columnar storage, data partitioning, and indexing, and their trade-offs in terms of query performance and resource utilization
  • Experienced in using Sqoop to import and export data to and from cloud-based data storage services such as Amazon S3
  • Developed Sqoop scripts to perform data transformations and data cleansing during data import from external databases into Hadoop clusters
  • Deployed PySpark jobs on AWS EMR clusters provisioned with specific AWS EC2 instances for cost optimization
  • Tuned PySpark jobs on AWS EMR to handle large-scale data stored in AWS S3
  • Implemented data aggregation and transformation in PySpark jobs on AWS EMR
  • Used AWS Hive to query structured data within AWS EMR jobs
  • Managed Spark job orchestration on AWS EMR using Airflow
  • Proficient in configuring Sqoop to import and export data using custom SQL queries and stored procedures
  • Proficient in writing Sqoop commands to transfer data between Hadoop and various databases such as MySQL, and SQL Server
  • Implemented efficient joins in PySpark jobs on AWS EMR to process relational data stored in AWS S3
  • Monitored AWS EMR cluster performance and optimized resource usage for long-running PySpark jobs

Mainframe System Engineer

Infosys Limited
02.2016 - 03.2017
  • Led daily mainframe operations, ensuring high availability and smooth execution of batch jobs and COBOL application processing
  • Provided 24/7 support for critical mainframe systems, ensuring system uptime and quick recovery during failures
  • Employed JCL tuning and COBOL code review to eliminate inefficient loops and enhance database query performance

Education

Bachelor of Technology

Poornima College Of Engineering
Jaipur
05.2015

XII

Indian Public School
Jaipur
01.2011

Skills

  • Hadoop (HDFS)
  • Sqoop
  • Apache Spark
  • Cloudera
  • MySQL
  • Hive
  • Python
  • SQL
  • PySpark programming
  • Apache Kafka
  • Amazon Web Services (AWS)
  • ETL implementation and processing
