ROUSHAN KUMAR

Pune

Summary

Results-driven, smart-working data engineer with 7 years of experience and expertise in big data processing frameworks, including Hadoop, Apache Spark, and Databricks, as well as data warehousing. Proficient in SQL for database management. Demonstrated strong problem-solving abilities and adaptability, delivering innovative data solutions in diverse environments and under stringent timelines. Committed to continuous learning and professional growth, leveraging self-directed learning to enhance technical skills.


You do the thinking and let the machine do the job; it is faster, infallible, and doesn't get tired like you.

Overview

3 years of professional experience
1 Certification

Work History

Data Engineer

Next Pathway
Pune
05.2022 - Current

  • Societe Generale: Served as team lead on a repointing project from Teradata to Snowflake, involving BTEQ shell scripts, multiple challenges, and a very stringent timeline. Developed multiple Python pocket automations, including translation of TPT files to SnowSQL files, plus scripts that handled project deliveries efficiently with minimal time and effort.
  • Aviva: Worked on the migration of DataStage to an Azure DBT project. Primarily handled the TWS to Azure Airflow DAG translation, converting TWS scripts to Airflow DAGs with multi-DAG dependencies and complex workflows. Created a pocket automation to translate TWS files into Python DAG files, enabling efficient and quick project delivery. Worked on testing and debugging of the DAGs across multiple applications.
  • Humana: Worked on a DataStage to PySpark migration project using Azure Databricks, with DBFS and ADLS for storage and a Delta Lake implementation. The project involved challenges such as efficient PySpark job execution, handling JSON data, and cluster optimization. Created a Python automation script to translate shell scripts into PySpark Databricks notebooks.
  • Menards: Worked on the migration of Informatica pipelines to PySpark, with the jobs executed on-prem. Challenges included reading data from an on-prem database over a JDBC connection, loading it into the target warehouse, and re-implementing the Informatica ETL jobs in PySpark. Created multiple pocket automations, including conversion of Oracle DDLs to PySpark schemas and Oracle SQL to target-warehouse SQL.
  • Seattle Children's: Worked on the migration of DataStage to GCP, translating DataStage pipelines to GCP Spark jobs and SQL to BigQuery. Performed UT, SIT, and UAT of the PySpark jobs, ensuring quality delivery in the least possible time.
  • R&D: Worked as a mutation developer on the R&D team, building a mutation language for automated translation of MySQL queries into target SQL dialects. This demanded rigorous logical and critical thinking about the problem statements and their best-fit solutions.

Technology Analyst

Infosys Ltd.
2018 - 2022

  • Executed big data projects focusing on data warehousing, data management, and ETL concepts.
  • Utilized technologies such as PySpark, Hive, shell scripting, HTML, and basic Java Spring Boot.
  • Developed multiple ODL (Organized Data Layer) projects on Hive and PySpark.
  • Developed and implemented a solution for flattening near real-time JSON data in SORs into a tabular format using PySpark; rolled the change out to multiple jobs across various applications.
  • Facilitated significant time and effort savings through improved technology implementation.
  • Provided opportunities for team members to gain new skills and knowledge during transitions.
  • Engineered generic reusable components for the team and clients, streamlining data validation and comparison processes.
  • Acquired new skills in response to requirements and evolving technologies to improve the quality of deliverables.

Education

Bachelor of Engineering - CSE (Computer Science and Engineering)

Birla Institute of Technology
Mesra
06-2018

Skills

  • Hive, Hadoop, Data warehousing, ETL processes
  • PySpark
  • Azure Databricks, DBFS, ADLS, Airflow
  • Python
  • Python automation
  • Shell Scripting
  • Linux, crontab
  • SQL
  • Git, Azure DevOps
  • HTML
  • Java
  • GCP
  • Team leadership

Personal Information

Date of Birth: 09/29/95

Certification

  • SnowPro Core - April 2024
  • End to End Pyspark Developer (Udemy) - October 2024

Accomplishments

  • Received Peer Award (Employee of the Month), April 2025.
  • Received client appreciation for automations that fast-tracked project deliveries.
  • Consistent record of Outstanding ratings throughout my career at Infosys and Next Pathway.

Affiliations

  • I love trekking and have a keen interest in singing and listening to music.
  • Let's keep travelling and humming; time is limited and there is much more to explore.
