
Paras Arora

Data Engineer
Bangalore

Summary

Data Engineer with 5.5 years of experience designing and implementing end-to-end applications using AWS services and the Hadoop ecosystem.

  • Proficient in developing, deploying, and debugging cloud-based applications using AWS Lambda, AWS CLI, AWS SDK with Python-Boto3, EMR, Kinesis, Glue, DynamoDB, S3, RDS, Step Functions, and other core AWS services.
  • Built CI/CD pipelines as part of automation setup using AWS CloudFormation, AWS CodeBuild, and AWS CodePipeline.
  • Worked across the complete software development life cycle (analysis, design, development, testing, implementation, and support) using Agile methodologies.
  • Experienced in designing both time-driven and data-driven automated workflows/jobs using Talend Open Studio for ETL.
  • Experienced in using integrated development environments such as Eclipse, PyCharm, Visual Studio, NetBeans, and Talend Open Studio.
  • Practical knowledge of and good coding skills in Python, Shell, and SQL.
  • Extensive experience in all stages of data engineering, from loading data from different source systems to transforming data and scheduling data jobs.

Overview

5 years of professional experience
4 years of post-secondary education

Work History

Data Engineer

GE Aviation
Bangalore
07.2020 - Current

PROJECT: SHORELINE

  • Developed and implemented an automated, self-service data ecosystem using the AWS platform and services such as S3, SNS, DynamoDB, EMR with Hudi, EventBridge, Lambda, and Redshift.
  • Handled different data ingestion patterns and loaded the data into Redshift with a custom-built pipeline.
  • Worked on Hudi integrated with EMR to support inserting, updating, and deleting data through Spark.
  • Utilize CI/CD processes using Git, AWS CFT, CodeBuild, and CodePipeline to automate development, deployment, and testing.
  • Created microservices and operational Lambda functions using Python, AWS services, libraries, and tools for better observability and performance.
  • Collaborate with business users, architects, and developers to build solutions that meet business needs.
  • Operate in an Agile framework, creating user stories and tasks from customer requirements to track project progress.

Data Engineering Specialist

GE Aviation
Bangalore
05.2019 - 07.2020

PROJECT: STORM

  • Designed and implemented a near real-time data pipeline to process structured and semi-structured data and ingest it into the centralized data lake using multiple AWS and big data technologies
  • Utilized PySpark to process large datasets, improving ingestion and processing speed of that data by 80%
  • Migrated the on-prem Hortonworks code base completely to AWS EMR to take advantage of the nearly unlimited storage capacity of S3 and the scalability and data availability offered by EMR
  • Developed Sqoop and Spark jobs to load data from RDBMS and external systems into S3 and Hive partitioned tables
  • Responsible for building scalable distributed data solutions in an EMR cluster environment with Amazon EMR 5.3.1
  • Created an internal Python library to parse and reformat data from external applications, reducing the error rate of data files
  • Used HiveQL to analyze partitioned and bucketed data and executed Hive queries on Parquet tables to perform data analysis that meets the business specification logic
  • Implemented custom error handling in Talend jobs and worked on different methods of logging
  • Created ETL/Talend jobs, covering both design and code, to process data to target databases
  • Participated in all phases of the development life cycle, with extensive involvement in definition and design meetings and in functional and technical walkthroughs.

System Engineer

Tata Consultancy Services
Bangalore
10.2016 - 04.2019
  • Designed and developed an end-to-end ETL process from various source systems to the staging area, from staging to loading, from loading to certified zones, and from certified zones to SQL data warehouses
  • Responsible for building scalable distributed data solutions using Hadoop
  • Loaded data from different data sources (SQL, Vertica, SFTP servers) into HDFS using Sqoop (version 1.4.3) and then into partitioned Hive tables
  • ETL data cleansing, integration, and transformation: responsible for managing data from disparate sources
  • Designed a data warehouse using Hive, and created and managed Hive tables in Hadoop for de-normalizing the data
  • Built error logging and email notification to aid in debugging and maintenance of the generic data ingestion framework for HPI
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data
  • Imported and exported data (SQL Server, Oracle, CSV, and text files) between local/external file systems, RDBMS, and HDFS using Sqoop
  • Wrote SQL queries with joins to access data from Oracle and MySQL, and developed HQL queries by transforming SQL queries
  • Used partitioning and bucketing techniques in Hive and designed both managed and external tables to optimize performance
  • Prepared ETL mapping documents for every mapping and a data migration document for smooth transfer of the project from development to testing and then to production
  • Performed unit testing and system testing to validate data loads in the target
  • Achievements:
  • Received the Outstanding Performance Award from client HPI for the successful release of the project to production
  • Won the On the Spot award for multiple customer appreciations received for replicating the EDL flow in Talend
  • Received the Star Performer of the Month award and much appreciation from the customer for implementation.

Education

Bachelor of Technology - Information Technology

KIET, UPTU
01.2012 - 01.2016

Intermediate

Pinewood School

S.G

R.R. Public School

Skills

Java, Python

Big Data Ecosystems: Hadoop, MapReduce, HDFS, Hive, Sqoop, Spark

AWS Services: EMR, Glue, S3, Athena, Lambda, DynamoDB, Redshift, Step Functions

Knowledge of XML, HTML, JSON, JavaScript, etc.

Android Application Development

Tools like Talend Open Studio, Visual Studio, PyCharm Professional

Handling of multiple databases like Redshift, MySQL

Active Listening

Interpersonal Communication

Decision Making

Solution development

Interests

Badminton, Travelling, Adventure Sports, New tech learning
