Accomplished Senior Data Engineer at TEK-Systems, specializing in ETL pipeline development and PySpark programming. Enhanced data processing efficiency by 30% through innovative solutions in cloud environments. Proven ability to collaborate effectively with cross-functional teams, ensuring data integrity and driving impactful analytics for informed decision-making.
Overview
6 years of professional experience
Work History
Senior Data Engineer
TEK-Systems (Client: John Deere)
03.2025 - Current
Built robust and scalable ETL pipelines in Databricks using PySpark, following the Medallion Architecture (Bronze, Silver, and Gold layers) to ensure modularity, lineage, and data quality.
Automated data ingestion from multiple structured and semi-structured sources such as ServiceNow, SAP, and flat files: staged data in the Raw (Bronze) S3 layer, processed and enriched it in the Cleansed (Silver) layer, and served analytics-ready data from the Curated (Gold) layer (illustrative PySpark sketch at the end of this role's highlights).
Implemented Delta Lake features such as time travel, schema enforcement, and ACID transactions to ensure data reliability and version control.
Developed reusable PySpark modules for data transformation, cleaning, validation, and auditing.
Wrote optimized SQL queries and Spark SQL code for large-scale data processing and reporting use cases, reducing processing time by over 30%.
Worked extensively with Databricks Notebooks, jobs, and workflows to orchestrate and schedule daily batch pipelines.
Maintained and enhanced the organization's Enterprise Data Lake (EDL) by integrating data from ServiceNow and other systems, ensuring data freshness and reliability for downstream analytics.
Created detailed data models for raw and curated datasets, maintaining lineage, documentation, and compliance with governance policies.
Collaborated with stakeholders to gather reporting requirements and developed insightful Power BI dashboards for asset tracking (e.g., laptops, tractors, and hardware inventory).
Handled end-to-end Power BI development, including data modeling, DAX measures, slicers, drill-through, and performance tuning.
Used Python for scripting, API integration, and data validation tasks outside of Spark workflows.
Developed and integrated 50+ PySpark transformations into an existing project, improving data processing efficiency by 25% and reducing runtime by 15%.
Implemented a CI/CD (Continuous Integration and Continuous Deployment) pipeline for code deployment.
Created technical documents covering functional requirements, impact analysis, and technical design.
Spearheaded the adoption of innovative BigQuery solutions, improving query performance by 60%.
Utilized GitHub for repository management, branching strategy, and pull requests to streamline the development workflow and code review process.
Maintained and optimized GitHub repositories, including documentation and release management, for efficient project lifecycle management.
Designed and implemented data pipelines to ingest and process large datasets in support of analytics and reporting needs.
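A minimal PySpark sketch of the Medallion-style Delta pipeline described above; table names, S3 paths, and columns are hypothetical placeholders, not the production code:

```python
# Illustrative Medallion (Bronze/Silver/Gold) flow on Databricks.
# All table names, S3 paths, and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

# Bronze: land raw ServiceNow extracts as-is, stamping ingestion metadata.
raw = (spark.read.json("s3://example-bucket/raw/servicenow/assets/")
       .withColumn("_ingested_at", F.current_timestamp()))
raw.write.format("delta").mode("append").saveAsTable("bronze.servicenow_assets")

# Silver: deduplicate, standardize types, and enforce basic quality rules.
silver = (spark.table("bronze.servicenow_assets")
          .dropDuplicates(["sys_id"])
          .withColumn("updated_on", F.to_timestamp("sys_updated_on"))
          .filter(F.col("sys_id").isNotNull()))
silver.write.format("delta").mode("overwrite").saveAsTable("silver.assets")

# Gold: aggregate into an analytics-ready inventory for reporting.
gold = (spark.table("silver.assets")
        .groupBy("asset_type")
        .agg(F.count("*").alias("asset_count"),
             F.max("updated_on").alias("last_refreshed")))
gold.write.format("delta").mode("overwrite").saveAsTable("gold.asset_inventory")

# Delta Lake time travel: read the Gold table as of an earlier version.
previous = spark.sql("SELECT * FROM gold.asset_inventory VERSION AS OF 0")
```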
Big Data (BIE-1)
OLA ELECTRIC
10.2024 - 03.2025
Designed and implemented end-to-end scalable data pipelines using PySpark to ingest, transform, and process large volumes of structured and unstructured data from diverse sources.
Optimized PySpark jobs for batch and real-time data processing, significantly improving pipeline efficiency and reducing latency.
Built robust ETL pipelines in PySpark & SQL integrating data from multiple sources into a centralized data lake for downstream analytics and machine learning models.
Collaborated with cross-functional teams to design PySpark workflows that ensured data integrity, quality, and consistency throughout the pipeline lifecycle.
Automated data processing workflows using PySpark and Apache Airflow, enabling seamless data flow and reducing manual intervention by 80% (illustrative Airflow sketch at the end of this role's highlights).
Developed fault-tolerant PySpark pipelines with error-handling mechanisms to ensure continuous data flow in large-scale distributed environments.
Integrated PySpark pipelines with AWS cloud services for storage, processing, and advanced analytics.
Designed and scheduled workflows using Apache Oozie to orchestrate and manage complex Hadoop jobs for data ingestion, transformation, and processing.
Implemented Oozie workflows to automate ETL processes, ensuring seamless coordination between MapReduce, Hive, and PySpark jobs.
Developed error-handling mechanisms in Oozie workflows to ensure data pipeline reliability and reduce job failures by 30%.
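A minimal Apache Airflow sketch of the batch automation described above; the DAG id, schedule, and spark-submit paths are hypothetical:

```python
# Illustrative Airflow DAG chaining two PySpark batch steps.
# DAG id, schedule, and script paths are hypothetical placeholders.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 2,                          # retry a failed task before alerting
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_events_etl",
    start_date=datetime(2024, 10, 1),
    schedule_interval="0 2 * * *",         # run daily at 02:00
    catchup=False,
    default_args=default_args,
) as dag:
    ingest = BashOperator(
        task_id="ingest_raw_events",
        bash_command="spark-submit /opt/jobs/ingest_events.py",
    )
    transform = BashOperator(
        task_id="transform_events",
        bash_command="spark-submit /opt/jobs/transform_events.py",
    )
    ingest >> transform                    # transform runs only after ingest succeeds
```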
Consultant, Big Data
EXL Services (Client: USA-based bank)
09.2022 - 04.2024
Evaluated technology stack for cloud-based analytics solutions.
Conducted extensive research to identify and implement the best strategies and tools for building end-to-end analytics solutions on the cloud, leading to a 30% improvement in data processing efficiency.
Data analytics for the banking domain.
Led analytics projects that increased data-driven decision-making by 40%, directly contributing to a 15% growth in loan approvals and a 10% reduction in customer churn.
Optimized Banking SAS Code for the Spark Platform.
Migrated and optimized over 200 SAS scripts to run on the Spark platform using Microsoft cloud technologies, reducing processing time by 50% and cutting cloud costs by 20%.
Data collection and preparation.
Collected, cleaned, and transformed 1 TB of raw credit card data into actionable datasets, including Loan Data, Customer Data, and Payments Data.
Improved data accuracy by 30%, leading to better predictive modeling outcomes.
Worked with Hadoop file formats.
Efficiently managed and processed over 500 GB of data using Hadoop file formats such as Parquet, ORC, and Avro, resulting in a 35% improvement in data retrieval speed (illustrative sketch at the end of this role's highlights).
Used SQL and Snowflake native features (Streams, Tasks, and Stored Procedures) to transform raw data.
Developed a strong understanding of Spark architecture with Databricks, including Structured Streaming.
Set up Databricks on AWS and Microsoft Azure, configured Databricks workspaces for business analytics, managed Databricks clusters, and managed the machine learning lifecycle.
Oversaw stakeholder management for banking accounts, led onshore and offshore teams to launch insightful projects for credit cards, and generated revenue of over $2M per annum.
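A short PySpark sketch of working with the file formats mentioned above; paths and columns are hypothetical, and Avro support assumes the spark-avro package is on the classpath:

```python
# Illustrative comparison of Parquet, ORC, and Avro writes in PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("file-formats-sketch").getOrCreate()

df = spark.read.option("header", True).csv("/data/raw/loans.csv")

# Parquet and ORC are columnar: analytic scans read only the needed columns.
df.write.mode("overwrite").parquet("/data/curated/loans_parquet")
df.write.mode("overwrite").orc("/data/curated/loans_orc")

# Avro is row-oriented: better suited to record-at-a-time exchange.
# Requires the org.apache.spark:spark-avro package.
df.write.mode("overwrite").format("avro").save("/data/curated/loans_avro")

# Column pruning: only the referenced Parquet columns are read from disk.
spark.read.parquet("/data/curated/loans_parquet").select("loan_id", "balance").show()
```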
Analyst
Clarivate (Fusion Technosoft)
12.2019 - 09.2022
Design and Develop ETL Integration Patterns.
Developed and implemented ETL integration patterns using Python on Spark, resulting in a 30% improvement in data processing efficiency and reducing ETL job failures by 20%.
Monitoring and Error Resolution.
Monitored and resolved errors across multiple environments, improving job success rates by 15% and reducing average error resolution time by 40% (illustrative sketch below).
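A minimal sketch of the retry-and-log wrapper pattern implied by the two points above; job, path, and column names are hypothetical:

```python
# Illustrative fault-tolerant ETL step: log each failure, retry with backoff,
# and re-raise so the scheduler can alert. Names are hypothetical.
import logging
import time
from pyspark.sql import SparkSession

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")

def run_with_retries(step, attempts=3, backoff_seconds=60):
    """Run an ETL step, logging failures and retrying before giving up."""
    for attempt in range(1, attempts + 1):
        try:
            step()
            log.info("step succeeded on attempt %d", attempt)
            return
        except Exception:
            log.exception("attempt %d failed", attempt)
            if attempt == attempts:
                raise                      # surface the failure to the scheduler
            time.sleep(backoff_seconds * attempt)

def clean_citations():
    spark = SparkSession.builder.appName("citations-etl").getOrCreate()
    df = spark.read.parquet("/data/raw/citations")
    (df.dropDuplicates(["citation_id"])
       .write.mode("overwrite").parquet("/data/clean/citations"))

run_with_retries(clean_citations)
```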