Mohammed Shahinsha

Data Engineer

Chennai

Summary

Data Engineer with 3 years of experience designing and optimizing data pipelines using PySpark, Hive, Hadoop, MySQL, and cloud platforms such as GCP and Azure. Skilled in handling diverse data sources, including CSV and JSON files, relational databases, distributed file systems, and cloud storage. Experienced in building efficient ETL workflows, data validation processes, and scalable data architectures.

Work History

Data Engineer

CAMS

11.2023 - 09.2024

Developed batch processing pipelines using Hadoop and Spark, handling distributed datasets and improving data processing efficiency.
Implemented data extraction workflows connecting MySQL and Spark, enabling seamless data integration for transactional reporting.
Tuned Spark jobs for faster data processing across cloud environments.
Created and managed data pipelines with Azure Data Lake, ensuring scalable and secure storage solutions.
Worked on data aggregation and transformation using Hive, supporting analytics reporting and strategic decision-making.

Data Engineer

S2 Integrators Pvt Ltd

01.2022 - 10.2023

Designed and implemented ETL pipelines using PySpark to process structured and semi-structured datasets from CSV and JSON files.
Built data workflows integrating Hive and Spark SQL, transforming raw datasets into structured formats optimized for reporting.
Automated data ingestion processes, ensuring data validation and error handling for high-volume datasets.
Collaborated with analytics teams to ensure data availability and consistency for business insights and dashboard generation.
Worked with Google Cloud Storage (GCS) and BigQuery to store and query large datasets, improving data accessibility and performance.

Education

Bachelor of Technology (B.Tech) - Civil Engineering

MGR UNIVERSITY

Chennai, India

08.2019

Skills

Data Processing & Tools: PySpark, Hive, Hadoop, Spark SQL, Airflow, Kafka

Accomplishments

Improved data pipeline throughput by 40% by optimizing Spark jobs for CSV and JSON file processing.
Enhanced data accuracy by implementing schema validation and error-handling routines in ETL workflows.
Reduced storage costs and improved scalability by integrating cloud storage solutions on GCP and Azure.
