Parth Dodia

Data engineer

Jersey City, NJ

Summary

Four years of experience designing and deploying scalable data pipelines and handling complex data integration across diverse cloud platforms
Skilled in deploying and managing relational and NoSQL databases, using Elasticsearch for fast search and analytics across datasets, and integrating these technologies into cloud-based data pipelines for better data accessibility and analysis
Strong technical background in big data technologies: Hadoop for distributed data management, Spark for efficient processing of large datasets in cluster environments, and Kafka for high-throughput real-time data streaming
Specialized in optimizing ETL processes and data pipeline performance by tuning Hadoop clusters and Spark jobs for low latency and high throughput

Work History

Data Engineer

Berkshire Hathaway

09.2024 - Current

Developed and integrated real-time dashboards by seamlessly connecting Tableau with SQL databases and various cloud platforms, providing executives with actionable insights that improved financial performance and market trend analysis
Designed and maintained high-performance data pipelines utilizing AWS S3, Glue, Lambda, and Redshift, achieving a 30% boost in the efficiency of ingesting and processing high-volume financial datasets
Engineered a data synchronization framework using Pandas and SQL Server Integration Services, which enhanced the consolidation of disparate financial data streams into a centralized warehouse, improving data accuracy and timeliness for risk assessment and portfolio management by 35%
Refined CI/CD deployment strategy by incorporating GitLab for version control, Docker for containerization, and Kubernetes for orchestration, which streamlined the entire lifecycle of financial data pipeline deployments, increasing deployment efficiency and operational agility by 40%
Monitored and debugged ETL workflows daily to identify and resolve anomalies, reducing data processing errors by 15% and improving pipeline reliability
Collaborated with the team, project manager, and clients to gather business requirements and streamline data workflows, ensuring alignment with organizational goals and client needs and improving project delivery time by 15%

Data Engineer

Aplus Datalytics

01.2019 - 08.2022

Developed and seamlessly integrated a machine learning pipeline using AWS SageMaker and Azure ML into the existing data architecture, enhancing predictive analytics accuracy by 15%
Designed and enforced a data governance framework leveraging Collibra and Apache Atlas, improving metadata management and data quality by 20%
Enhanced the data lake architecture with Amazon Redshift Spectrum and Azure Synapse Analytics, achieving a 30% increase in data processing efficiency through optimized partitioning and indexing
Implemented a robust multi-cloud data management strategy utilizing AWS RDS, Azure SQL Database, and Google Cloud Spanner, which increased system resilience and operational uptime by 25%, ensuring high availability and disaster recovery
Translated business requirements into tailored reports and dashboards using Power BI, delivering actionable insights on KPIs such as click-through rate (increased by 10%), operational efficiency (boosted by 8%), and revenue growth (up by 12%)

Education

Master of Science - Information Systems

Pace University

May 2024

Bachelor of Technology - Electronics and Telecommunications

Mumbai University

May 2020

Skills

Methodologies: Agile, Waterfall
Languages: Python, Scala, SQL
ETL Tools: Informatica, Talend, Alteryx
Databases: MySQL, PostgreSQL, MongoDB, Snowflake, Oracle, Redshift
Big Data Technologies: Apache Spark, Hadoop, Kafka, Airflow
Tools: Power BI, Tableau, Git, Ansible, Chef, Eclipse, Jupyter Notebook, CloudEndure, Jira
Cloud Services: AWS (S3, Glue, Lambda), Azure (Data Factory, Synapse Analytics), GCP (BigQuery)

Accomplishments

Certifications: IBM Data Engineering Professional, Google Analytics

Certification

Data Pipeline Orchestration, Data Warehousing, Data Modeling, Data Visualization

Work Preference

Work Type

Full Time

Location Preference

Hybrid
