Srijan Srivastava

Lead Data Engineer
Patna

Summary

Lead data engineer with over 9 years of experience designing, developing, and maintaining robust big data solutions and enterprise applications across technologies including Apache Spark, Databricks, AWS, and the Hadoop ecosystem. Demonstrated expertise in programming, processing complex data, and configuring databases to meet business data requirements. Proficient in using Snowflake for data warehousing and analytics, with a focus on efficient data management and reliable data pipelines.

Overview

9 years of professional experience
2 languages
4 years of post-secondary education
1 certificate

Work History

Luxoft

Senior Software Developer
04.2025 - Current

Job overview

  • Delivered high-quality code on time by managing project timelines and prioritizing tasks.
  • Developed and updated configuration files and internal metadata while performing data quality checks, validation, cleansing, and transformation of incoming datasets.
  • Designed and maintained data layers for high-volume, structured data using Hadoop-based frameworks and data warehouse platforms.
  • Implemented PySpark operations, including joins, Spark SQL, and data transformations, using Python and Pandas to optimize large-scale data processing workflows.
  • Developed shell scripts to automate workflows and support data processing operations.
  • Performed complex data extraction, analysis, and reporting by writing optimized SQL and PL/SQL queries across relational and cloud-based data stores, including Hive, Impala, PostgreSQL, and MS SQL.
  • Developed and maintained comprehensive technical specification documents defining architecture, data flows, and integration requirements for project implementation.
  • Engineered and optimized data models using advanced modeling techniques while enforcing data governance and ensuring referential integrity, consistency, and security across enterprise systems.
  • Executed end-to-end Software Development Life Cycle (SDLC) activities, including requirement analysis, design, coding, unit testing, deployment, and productionization of scalable applications.
  • Performed root cause analysis, debugging, and issue resolution to improve system performance, reliability, and operational stability.
  • Collaborated with cross-functional teams to integrate software components into existing systems.
  • Mentored junior developers, providing guidance on best practices and coding techniques.

EPAM Systems, Inc
Gurugram

Senior Software Engineer
03.2021 - 04.2025

Job overview

  • Developed and executed Apache Spark jobs within Databricks and AWS EMR to clean, normalize, and aggregate claims data, ensuring high data quality and consistency.
  • Integrated Snowflake as a data warehouse to facilitate real-time data analysis, utilizing Snowpipe for seamless data ingestion.
  • Collaborated with business analysts, development teams, and infrastructure specialists to design and implement solutions based on project requirements.
  • Performed data validation, cleansing, and transformation on input datasets, including the creation of configuration files.
  • Stored large volumes of structured and semi-structured data across various data layers using AWS S3, Delta Lake, Snowflake and the Big Data Hadoop Framework.
  • Engaged in all phases of the Software Development Life Cycle (SDLC), including analysis, design, development, testing, and
    deployment, delivering unit-tested systems within customer-prescribed timeframes.
  • Conducted internal code reviews in Bitbucket/GitHub, providing constructive feedback to enhance overall product quality
    and team collaboration.
  • Identified and analyzed issues, delivering effective solutions to improve system performance and reliability.
  • Mentored junior developers, fostering professional growth and enhancing team productivity.
  • Contributed to the design and development of technical specification documents to guide project direction and implementation.

Emids Technologies
Bangalore

Software Engineer
10.2019 - 03.2021

Job overview

  • Worked with a 14-node production Hadoop cluster and a 7-node development cluster to manage large-scale data processing.
  • Performed data validation, cleansing, and transformation on input datasets to ensure high data quality and consistency.
  • Overcame challenges in storing large volumes of structured and semi-structured data in a data lake using the Big Data Hadoop Framework, Hive and Snowflake data warehouse.
  • Utilized RDDs, DataFrames, Spark joins, and Spark SQL with Scala to perform complex data processing tasks.
  • Leveraged HBase as a metastore and used RDBMS solutions like MS SQL and PostgreSQL as data sources for effective data management.
  • Gained an understanding of healthcare domain concepts and terminology to drive data-related initiatives.
  • Acquired in-depth knowledge of the Cotiviti data lake framework to enhance data architecture.
  • Engaged in all phases of the Software Development Life Cycle (SDLC), including analysis, design, development, testing, and
    deployment of applications in the Hadoop cluster.
  • Contributed to the design and development of technical specification documents to support project implementation.

ITC Infotech India Limited
Bangalore

Big Data Developer
08.2016 - 10.2019

Job overview

  • Developed big data solutions using Hadoop, Hive, Spark, and Neo4j to meet complex data processing needs.
  • Standardized practices for data ingestion, cleansing, and analysis to deliver high-quality solutions.
  • Created data pipelines to convert Hive tables into Spark DataFrames, ensuring efficient data transformation and output.
  • Overcame challenges related to storing and processing large volumes of structured data using the Hadoop Framework.
  • Used SQL for data manipulation in both RDBMS and Apache Hive, enhancing data accessibility and reporting.
  • Contributed to the design and development of technical specification documents to guide project implementation.

Skills

Apache Spark/PySpark


Personal Information

  • Date of Birth: 02/03/1994
  • Gender: Male

Education

University of Pune
Pune, India

Bachelor of Engineering
08.2012 - 06.2016

Certification

Databricks Certified Data Engineer Associate

Timeline

Senior Software Developer

Luxoft
04.2025 - Current

Databricks Certified Data Engineer Associate

01.2025

Senior Software Engineer

EPAM Systems, Inc
03.2021 - 04.2025

Software Engineer

Emids Technologies
10.2019 - 03.2021

Big Data Developer

ITC Infotech India Limited
08.2016 - 10.2019

University of Pune

Bachelor of Engineering
08.2012 - 06.2016