Harsh Yadav

Summary

A results-driven Data Engineer with 3.6 years of experience in designing, developing, and optimizing end-to-end data pipelines and workflows. Proficient in leveraging Google Cloud Platform (GCP), including Cloud Storage, BigQuery, Dataproc, and Google Cloud Composer (Airflow), to orchestrate and automate data processing tasks. Skilled in data transformation, migration, and integration, with expertise in PySpark, Python, and ETL processes. Proven track record in optimizing data validation frameworks, reducing manual effort by 80%, and developing data profiling tools for enhanced pipeline efficiency.

Overview

4 years of professional experience

Work History

Data Engineer

Cognizant Technology Solutions
07.2021 - Current

Project 1:

  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Refactored on-premises HQL scripts into BigQuery SQL format, applying necessary transformations for compatibility and performance.
  • Utilized Google Cloud Composer (Airflow) to orchestrate and schedule SQL and PySpark jobs, enabling efficient workflow automation.
  • Designed and implemented a Data Validation Framework using Google Cloud Composer (Airflow) to ensure data consistency in target tables, achieving an 80% reduction in manual validation efforts.
  • Developed a Data Profiler Framework using Google Cloud Composer (Airflow) to automate the generation of comprehensive dataset statistics. This framework enabled other teams to assess data quality, identify trends, and efficiently build new pipelines.
  • Contributed to Continuous Integration/Deployment (CI/CD) by creating pull requests and participating in code reviews, deploying pipelines using Jenkins.
  • Followed Agile methodologies for iterative development and tracked efforts using Rally for efficient project management.

Project 2:

  • Led the migration of multiple workflows by designing and implementing end-to-end data pipelines. Utilized Google Cloud Composer for orchestration and integrated Databricks to trigger Spark jobs via the Databricks operator in Airflow DAGs. Delivered final outputs to Google Cloud Storage (GCS) buckets.
  • Streamlined the conversion of HQL scripts to PySpark notebooks during the migration from Hive to Databricks. Ensured seamless integration, functionality, and accuracy through comprehensive unit testing. Enhanced Airflow DAGs with email notifications and file delivery tasks for efficient downstream delivery.
  • Contributed to Continuous Integration/Deployment (CI/CD) pipelines by creating pull requests, conducting code reviews, and deploying workflow notebooks, properties files, Airflow DAGs, and shell scripts into the QEA environment.
  • Demonstrated project ownership by maintaining direct client interactions to provide regular progress updates. Followed Agile methodologies for iterative development and ensured effective effort tracking using Rally.

Education

Bachelor of Technology

University of Petroleum and Energy Studies
Dehradun, India
05.2021

Skills

  • Cloud Platforms: Google Cloud Platform (GCP) – Storage, Composer, BigQuery, Dataproc
  • Data Engineering & Warehousing: Data Warehousing, Data Pipeline Design, Data Integration
  • Orchestration & Workflow Tools: Google Cloud Composer (Airflow)
  • Programming Languages: SQL, Python, PySpark
  • Big Data Technologies: BigQuery, Hive, Spark
  • ETL & Data Processing: PySpark, Dataproc, Airflow
  • Development Methodologies: Agile (Scrum), CI/CD (Jenkins, Git)
