Satyam Verma

Noida

Summary

At Gojoko Technologies India Private Limited, I spearhead data engineering initiatives, optimizing ETL workflows and enhancing data warehouse designs for scalability and performance. Combining Python expertise with close collaboration across cross-functional teams, I have significantly improved data quality and governance in line with GDPR, driving data-driven decision-making.

Overview

2 years of professional experience

Work History

Data Engineer

Gojoko Technologies India Private Limited
Noida
12.2022 - Current
  • ETL Processes and Data Integration: Design, implement, and optimize ETL workflows to extract, transform, and load data from various sources into a centralized data warehouse, ensuring data quality, consistency, and scalability.
  • Database and Data Warehouse Design: Develop and maintain data warehouse structures, including star and snowflake schemas, dimensional models, and fact tables, to support business requirements, and ensure efficient data storage and retrieval.
  • Performance Optimization and Tuning: Monitor and optimize database and ETL performance by implementing indexing, partitioning, and other strategies to ensure fast query performance and scalability.
  • Data Quality and Governance: Ensure data integrity, accuracy, and compliance with data governance policies, implementing data validation, cleansing routines, and security measures to protect sensitive data.
  • Collaboration and Documentation: Work closely with Data Analysts, Credit Risk, and BI Teams to meet data requirements, and maintain comprehensive documentation of designs, ETL processes, and data models for future reference and audits.

Education

B-Tech - Computer Science Engineering

Institute Of Engineering And Technology
Sitapur
06-2022

Skills

  • Python
  • PySpark
  • MySQL
  • Data Modeling
  • AWS
  • Git and GitHub
  • Databricks
  • CI/CD
  • Airflow
  • Docker and Portainer
  • System Design
  • GDPR

Projects

CASE STUDY: Data Quality and Processing Solution – Python, MySQL, Airflow and AWS (S3, CloudWatch, Secrets Manager, Parameter Store, SNS)

  • Developed an automated pipeline to process CSV files from AWS S3 using Apache Airflow.
  • Implemented data quality checks, including date-format validation, null checks on mandatory columns, and date-of-birth validation.
  • Loaded the results into two separate tables: one for records that passed the DQ checks and an error table for records that failed.
  • Utilized AWS CloudWatch for monitoring and AWS SNS for alerting to ensure reliability.
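
A minimal sketch of the kind of Airflow task this pipeline could use is shown below; the bucket, key, column names, and connection ID are illustrative placeholders rather than the production values.

    import io
    from datetime import datetime

    import pandas as pd
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.providers.amazon.aws.hooks.s3 import S3Hook

    def run_dq_checks(**context):
        # Pull the raw CSV from S3 (bucket and key are placeholders)
        s3 = S3Hook(aws_conn_id="aws_default")
        body = s3.read_key(key="incoming/applicants.csv", bucket_name="example-dq-bucket")
        df = pd.read_csv(io.StringIO(body))

        # DQ checks: nulls in mandatory columns and an invalid date of birth
        mandatory = ["applicant_id", "dob", "created_at"]
        has_nulls = df[mandatory].isnull().any(axis=1)
        bad_dob = pd.to_datetime(df["dob"], format="%Y-%m-%d", errors="coerce").isna()
        failed = has_nulls | bad_dob

        # Downstream tasks would insert df[~failed] into the clean table
        # and df[failed] into the error table.
        return int(failed.sum())

    with DAG(
        dag_id="csv_dq_pipeline",
        start_date=datetime(2023, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="run_dq_checks", python_callable=run_dq_checks)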

Project: New Blacklist Rules Implementation – Python and MySQL

  • Gained a thorough understanding of the existing blacklist_rules functionality and the criteria for blacklisting applicants.
  • Enhanced the blacklist system by adding new rules to blacklist applicants for a specific period.

Project: New Greylist Rules Implementation – Python and MySQL

  • Gained a thorough understanding of the existing greylist_rules functionality and the criteria for greylisting applicants.
  • Enhanced the greylist system by adding new rules to greylist applicants for a specific period.

GDPR Compliance Implementation – MySQL and MS Excel

  • Carried out the data classification effort as part of the General Data Protection Regulation (GDPR) compliance initiative.
  • Categorized database table columns into Personally Identifiable Information (PII) and non-PII.
  • Ensured the organization’s data management practices adhered to GDPR guidelines, safeguarding sensitive information and enhancing data privacy.
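
The classification pass can be approximated with a short helper like the one below; the keyword list, connection string, and output file are illustrative assumptions, not the actual GDPR rule set.

    import pandas as pd
    from sqlalchemy import create_engine

    PII_KEYWORDS = ("name", "email", "phone", "dob", "address", "postcode")

    engine = create_engine("mysql+pymysql://user:password@host/example_db")  # placeholder DSN
    columns = pd.read_sql(
        "SELECT table_name, column_name FROM information_schema.columns "
        "WHERE table_schema = DATABASE()",
        engine,
    )
    columns.columns = columns.columns.str.lower()

    # Flag a column as PII when its name matches one of the keywords
    columns["classification"] = columns["column_name"].str.lower().map(
        lambda c: "PII" if any(k in c for k in PII_KEYWORDS) else "non-PII"
    )
    columns.to_excel("gdpr_column_classification.xlsx", index=False)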

AWS Glue Pipelines – PySpark, MySQL, Airflow and AWS (S3, Glue, CloudWatch, Secrets Manager, Parameter Store, IAM, SNS)

  • Successfully developed and maintained AWS Glue pipelines to ingest over 10,000 data files from AWS S3 into database tables, ensuring seamless and efficient data integration.
  • Designed robust and scalable data ingestion and transformation processes, optimizing the flow and storage of large volumes of data to enhance system performance and reliability.
  • Implemented comprehensive monitoring solutions using AWS CloudWatch to track pipeline performance and identify potential issues, ensuring continuous system operation.
  • Utilized AWS Secrets Manager to securely manage and retrieve database credentials, enhancing data security and simplifying access management.
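
A stripped-down Glue job following this pattern might look as follows; the connection name, job arguments, and database are placeholders, and the real pipelines include additional transformation and error handling.

    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path", "target_table"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read raw CSV files landed in S3 (path supplied as a job argument)
    source = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [args["source_path"]]},
        format="csv",
        format_options={"withHeader": True},
    )

    # Write into MySQL through a Glue JDBC connection (connection name is a placeholder)
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=source,
        catalog_connection="example-mysql-connection",
        connection_options={"dbtable": args["target_table"], "database": "example_db"},
    )
    job.commit()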

Airflow DAGs Migration and Custom ETL Framework Development – Python, Docker, AWS (EC2, Parameter Store, Secrets Manager, SNS, IAM)

  • Migrated more than 20 Airflow DAGs to version 2.4.3, improving the efficiency and stability of data workflows.
  • Replaced AWS Data Pipeline service with a custom-developed framework for running ETL pipelines.
  • Developed and implemented success and failure notification functionalities within the framework, ensuring timely alerts and facilitating prompt issue resolution.
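
The notification piece boils down to Airflow callbacks publishing to SNS, roughly as sketched below; the topic ARN is a placeholder and the production framework wires these callbacks in through its own configuration.

    import boto3

    def notify(status, context):
        # Topic ARN is illustrative only
        topic_arn = "arn:aws:sns:eu-west-1:111122223333:example-etl-alerts"
        boto3.client("sns").publish(
            TopicArn=topic_arn,
            Subject=f"{context['dag'].dag_id} {status}",
            Message=f"Task {context['task_instance'].task_id} finished with status {status}",
        )

    # Attached to every task via default_args on the DAG
    default_args = {
        "on_success_callback": lambda context: notify("SUCCESS", context),
        "on_failure_callback": lambda context: notify("FAILED", context),
    }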

Query Runner Framework Development – Python, MySQL, Data Modelling, ORMs, Airflow, AWS (EC2, Secrets Manager, Parameter Store, IAM, SNS, CloudWatch)

  • Designed and developed a query runner framework to send reports to end-user systems via multiple channels: SMTP, SFTP, SharePoint, AWS S3, and database tables.
  • Enhanced the reporting capabilities and delivery efficiency by implementing robust and flexible data transmission methods.
  • Ensured timely and accurate distribution of reports to end-users, improving data accessibility and usability.
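
Conceptually, the framework maps each delivery channel to a small sender function, along the lines of the sketch below; the function names and credential handling here are illustrative, not the framework's actual API.

    import smtplib
    from email.message import EmailMessage

    import boto3
    import paramiko

    def send_via_s3(path, bucket, key):
        boto3.client("s3").upload_file(path, bucket, key)

    def send_via_sftp(path, host, username, password, remote_path):
        transport = paramiko.Transport((host, 22))
        transport.connect(username=username, password=password)
        sftp = paramiko.SFTPClient.from_transport(transport)
        sftp.put(path, remote_path)
        sftp.close()
        transport.close()

    def send_via_smtp(path, sender, recipient, smtp_host):
        msg = EmailMessage()
        msg["Subject"], msg["From"], msg["To"] = "Report", sender, recipient
        with open(path, "rb") as fh:
            msg.add_attachment(fh.read(), maintype="application",
                               subtype="octet-stream", filename=path)
        with smtplib.SMTP(smtp_host) as smtp:
            smtp.send_message(msg)

    # Channel chosen per report from configuration
    CHANNELS = {"s3": send_via_s3, "sftp": send_via_sftp, "smtp": send_via_smtp}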

Project: Data Warehouse (DWH) – PySpark, Databricks (SQL, Workflows, Dashboard), Data Modelling, CI/CD

  • Fulfilled ongoing requirements and enhancements in the Stage layer to preprocess and ingest raw data from diverse sources, ensuring efficient ETL processes.
  • Developed and managed the Vault layer for storing, integrating, and maintaining historical and transactional data using Data Vault 2.0 methodology, enabling scalable and auditable data storage.
  • Optimized the Mart layer to deliver business-specific data models, facilitating OLAP (Online Analytical Processing) for advanced reporting and analytics.
  • Collaborated with cross-functional teams including data architects, developers, and business analysts to ensure data consistency, accuracy, and conformance with data governance policies.
  • Employed continuous integration and continuous deployment (CI/CD) pipelines for automated testing, deployment, and monitoring, ensuring high-quality deliverables and rapid iterations.
  • Utilized PySpark and Databricks for developing and orchestrating data workflows, creating interactive dashboards for real-time data visualization and insights.
  • Documented ETL processes, data models, and architectural changes to maintain up-to-date records, supporting project continuity and knowledge transfer.
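
The Vault-layer pattern can be illustrated with a small PySpark example that derives a hub from a staged business key; the table and column names are placeholders, not the actual model.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    stage = spark.read.table("stage.customers")  # placeholder stage table

    # Hub = distinct business key + hash key + standard load metadata (Data Vault 2.0)
    hub_customer = (
        stage
        .select("customer_number")
        .dropDuplicates()
        .withColumn("hub_customer_hk", F.sha2(F.upper(F.trim(F.col("customer_number"))), 256))
        .withColumn("load_ts", F.current_timestamp())
        .withColumn("record_source", F.lit("stage.customers"))
    )
    hub_customer.write.mode("append").saveAsTable("vault.hub_customer")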

Project: DataLake House [DWH 2.0 Enhancement, Hourly Mart Refresh, Data Quality Improvement] – PySpark, Databricks (SQL, Workflows, Dashboard), Data Modelling, CI/CD

  • Upgraded the Data Warehouse (DWH) to a DataLake House architecture, enabling seamless integration of structured and unstructured data.
  • Implemented enhancements to ensure the Mart layer refreshes on an hourly basis, improving the timeliness and accuracy of business intelligence reports.
  • Identified and resolved data gaps within the platform, ensuring data integrity and consistency.
  • Collaborated with data architects and engineers to design and implement scalable data pipelines for efficient data ingestion and processing.
  • Utilized PySpark, Databricks, and other big data technologies to streamline data workflows and optimize performance.
  • Monitored and maintained data quality, applying data governance practices to uphold data accuracy and reliability.
  • Documented data processes, architectural changes, and troubleshooting guides to support ongoing project development and maintenance.
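
A simplified picture of an hourly Mart refresh of this kind: a scheduled Databricks job rebuilds a business-level aggregate from the Vault and overwrites the mart table. The table and column names below are illustrative only.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    loans = spark.read.table("vault.sat_loan_details")  # placeholder vault satellite
    summary = (
        loans
        .groupBy(F.to_date("disbursed_at").alias("disbursal_date"), "product_code")
        .agg(F.count("*").alias("loan_count"), F.sum("principal").alias("total_principal"))
    )
    summary.write.mode("overwrite").saveAsTable("mart.daily_loan_summary")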

Timeline

Data Engineer

Gojoko Technologies India Private Limited
12.2022 - Current

B-Tech - Computer Science Engineering

Institute Of Engineering And Technology
06-2022