Data Engineering professional with 10+ years of experience in data engineering and data pipeline frameworks. Expertise in Big Data and Data Warehousing projects using components such as Hadoop, Spark, Hive, Scala, and Sqoop. Proficient in cloud technologies including AWS (S3, EC2) and container technologies such as Docker and Kubernetes. Currently working as a Team Lead at Comcast, with a strong capability to learn and to drive both personal and organizational growth. Hardworking and passionate, with strong organizational skills, willing to take on added responsibilities to meet team goals and to manage multiple priorities with a positive attitude.
Overview
11 years of professional experience
Work History
Development Engineer 4
Comcast
Chennai
02.2020 - Current
Worked effectively in a team setting and served as team lead, providing support and guidance
Gained leadership skills by managing projects from start to finish
Gained extensive knowledge in data engineering and analysis
Proved successful working within tight deadlines and a fast-paced environment
Ingested source data from RDBMS into HDFS and ran Hive queries for data analysis
Optimized queries to reduce run time
Created data pipelines using Spark with Scala and stored the data in on-premises MinIO and AWS S3 (see the illustrative sketch after this list)
Worked extensively on writing and fine-tuning SQL queries
Implemented custom data quality checks to ensure delivered data met quality standards
Used Docker and Kubernetes to build images and run projects end to end
Applied analytical skills to develop and deploy data models, and evaluated and improved existing models to create solutions.
Worked alongside team members and leaders to identify analytical requirements and collect information to meet customer and project demands.
Handled user escalations to resolve critical data asset issues and deliver fixes on time
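The Spark Scala pipeline work above is the kind of job sketched below: reading a source table from an RDBMS over JDBC, applying a basic data quality gate, and writing Parquet to an S3-compatible store such as on-prem MinIO or AWS S3. This is a minimal illustrative sketch, not actual project code; the connection URL, table, endpoint, bucket path, and column names are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object IngestToObjectStore {
  def main(args: Array[String]): Unit = {
    // S3A endpoint config points Spark at an S3-compatible store (MinIO here); values are placeholders
    val spark = SparkSession.builder()
      .appName("rdbms-to-object-store-ingest")
      .config("spark.hadoop.fs.s3a.endpoint", "http://minio.internal:9000")
      .config("spark.hadoop.fs.s3a.path.style.access", "true")
      .getOrCreate()

    // Read a source table from the RDBMS over JDBC (URL and table are hypothetical)
    val source = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")
      .option("dbtable", "SALES.ORDERS")
      .option("user", sys.env.getOrElse("DB_USER", ""))
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Simple custom data quality check: stop the run if key columns contain nulls
    val badRows = source.filter("ORDER_ID IS NULL OR ORDER_DATE IS NULL").count()
    require(badRows == 0, s"Data quality check failed: $badRows rows with null keys")

    // Write curated data as Parquet to the object store (bucket and path are placeholders)
    source.write
      .mode("overwrite")
      .parquet("s3a://curated/orders/")

    spark.stop()
  }
}
```

The same job could target AWS S3 directly by dropping the custom endpoint configuration and relying on the default AWS credential chain.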
Big Data Hadoop Developer
TCS
12.2014 - 02.2020
Built and maintained data pipelines to ingest, process, and store large volumes of data.
Collaborated with data analysts to provide clean and reliable datasets for reporting and analysis.
Implemented data validation and monitoring processes to ensure data quality.
Developed and maintained documentation for data engineering processes and systems.
Migrated data from HDFS to cloud storage (MinIO/Data Lake); see the illustrative sketch after this list.
Ingested data from different source servers (Oracle/SQL Server) into targets (MinIO/Data Lake).
Enhanced code and built automation with shell scripts to improve productivity and reduce manual effort.
Performed performance tuning to reduce run times.
Worked with Docker and Kubernetes containerization.
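The HDFS-to-MinIO migration and validation bullets above could look roughly like the sketch below: copying a Parquet dataset from HDFS to an S3-compatible target and verifying row counts before sign-off. Paths, endpoint, and dataset names are hypothetical placeholders assumed only for illustration.

```scala
import org.apache.spark.sql.SparkSession

object HdfsToObjectStoreMigration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hdfs-to-minio-migration")
      .config("spark.hadoop.fs.s3a.endpoint", "http://minio.internal:9000") // placeholder MinIO endpoint
      .config("spark.hadoop.fs.s3a.path.style.access", "true")
      .getOrCreate()

    // Source and target locations are placeholders for illustration
    val sourcePath = "hdfs:///data/warehouse/customer_events"
    val targetPath = "s3a://datalake/customer_events"

    // Copy the dataset as-is from HDFS to the object store
    val events = spark.read.parquet(sourcePath)
    events.write.mode("overwrite").parquet(targetPath)

    // Validation step: row counts on source and target must match before sign-off
    val sourceCount = events.count()
    val targetCount = spark.read.parquet(targetPath).count()
    require(sourceCount == targetCount,
      s"Row count mismatch after migration: source=$sourceCount target=$targetCount")

    spark.stop()
  }
}
```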
Big Data Hadoop Admin
TCS
Kolkata
10.2013 - 12.2014
Participated in building a new Hadoop cluster
Created Hive queries to process data and worked with Sqoop (illustrative sketch at the end of this section)
Upgraded Hadoop versions
Set up SSH keys and implemented passwordless SSH/SFTP
Performed administrative activities with Ambari
Set up Linux users, groups, and Kerberos principals and keys
Performed cluster health checks and ensured cluster stability
Added and removed nodes from the cluster
Installed Hadoop and third-party components
Monitored Hadoop cluster job performance
Hands-on experience in analysing log files for Hadoop and ecosystem services and troubleshooting
HDFS, YARN, Hive
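The Hive query work in this role is illustrated by the sketch below, which runs a simple aggregate over a Hive-managed table through Spark SQL. Database, table, and column names are hypothetical placeholders, and the original queries may well have been plain HiveQL rather than Spark.

```scala
import org.apache.spark.sql.SparkSession

object HiveDailySummary {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark read tables registered in the Hive metastore
    val spark = SparkSession.builder()
      .appName("hive-daily-summary")
      .enableHiveSupport()
      .getOrCreate()

    // Aggregate events per day from a Hive table (names are placeholders)
    val summary = spark.sql(
      """
        |SELECT event_date, COUNT(*) AS event_count
        |FROM raw_db.web_events
        |GROUP BY event_date
      """.stripMargin)

    // Persist the aggregate back to a Hive-managed table for downstream analysis
    summary.write.mode("overwrite").saveAsTable("analytics_db.daily_event_counts")

    spark.stop()
  }
}
```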
Education
B.Tech -
West Bengal University of Technology (W.B.U.T.)
2013
HS (12th Std) -
WBCHSE
2008
10th Std -
WBBSE
2006
Skills
Technical Skills
Programming Languages: Scala, SQL
Big Data Technologies: Hadoop, Spark, Hive, Sqoop, HDFS, YARN, Ambari