Soumyakanta Rath

Data Engineer/Data Platform Engineer
Bengaluru

Summary

  • Detail-oriented Data Engineer who designs, develops, and maintains highly scalable, secure, and reliable data structures. Accustomed to working closely with system architects, software architects, data scientists, and design analysts to understand business and industry requirements and develop comprehensive data models. Proficient at designing and programming scalable data pipelines through the modeling, design, and implementation stages.
  • Data Engineer with around 6+ years of experience building data pipelines to ingest, process, and transform data from files, APIs, streams, and databases using tech stacks such as Apache Spark, Apache Kafka, Flink, Databricks, Python, Scala, Airflow, and NoSQL/MPP databases, as well as Snowflake and Redshift, in on-premise and cloud environments (AWS, Azure).
  • Responsive expert experienced in monitoring database performance, troubleshooting issues, and optimizing database environments. Possesses strong analytical skills, excellent problem-solving abilities, and a deep understanding of database technologies and systems. Equally confident working independently and collaboratively, with excellent communication skills.
  • In-depth understanding of Spark architecture; worked on Spark optimization techniques for petabyte-scale data volumes such as broadcasting, salting, bucketing, partitioning, caching, and checkpointing (see the sketch after this list).
  • Experience building modern serverless data lake architectures and data pipelines using the AWS tech stack (Glue, AWS Lambda, Step Functions, S3, Athena, and Redshift).
  • Worked in Azure environments (Azure Data Factory, ADLS Gen2, Databricks, Delta Lake, Unity Catalog).
  • Worked extensively on data quality issues and data leakage, building real-time dashboards and Slack-integrated notifications on data quality.
  • Worked with AI engineers and data scientists to make data available for effective model building, including Amazon Bedrock with NLP and text-to-SQL generation for data analysis.
  • Worked with file formats such as Parquet, ORC, and Avro and compression formats such as Snappy and LZO. Worked on MPP systems and query engines such as Presto, Impala, Hive, Snowflake, and Redshift.
  • Able to write complex SQL programs for complex, multilayered datasets; experienced with SQL databases such as Oracle and Postgres and with NoSQL databases (MongoDB, Cassandra, HBase, DynamoDB).
  • Strong programming ability in Python and Scala, with experience in CI/CD and in Agile, multi-disciplinary teams.
  • Experience working with the international data privacy standards of America and Europe (HIPAA, GDPR); worked in Agile and Kanban models.
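
A minimal PySpark sketch of the broadcast-join and salting techniques listed above (paths and column names are hypothetical):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("skew-handling-sketch").getOrCreate()

    orders = spark.read.parquet("s3://bucket/orders/")   # large, skewed fact table
    stores = spark.read.parquet("s3://bucket/stores/")   # small dimension table

    # Broadcast join: ship the small table to every executor, avoiding a shuffle.
    enriched = orders.join(F.broadcast(stores), "store_id")

    # Salting: split a hot join key into N sub-keys so no single task
    # receives a disproportionate share of the rows.
    N = 16
    salted_orders = orders.withColumn("salt", (F.rand() * N).cast("int"))
    salted_stores = stores.crossJoin(
        spark.range(N).withColumnRenamed("id", "salt"))
    joined = salted_orders.join(salted_stores, ["store_id", "salt"]).drop("salt")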

Overview

14 years of professional experience
4 years of post-secondary education

Work History

Senior Data Platform Engineer

Numentica (client: Fox Tech)
02.2024 - Current
  • Created a solid framework for detecting data quality issues and data leakage between event (streaming) data and summary data.
  • Built a framework comparing event-based data with batch data to ensure the quality of information stays intact (see the sketch after this role's tech stack).
  • Ensured data quality through rigorous testing, validation, and monitoring of all data assets, minimizing inaccuracies and inconsistencies.
  • Enhanced system performance by designing and implementing scalable data solutions for high-traffic applications.
  • Delivered exceptional results under tight deadlines, consistently prioritizing tasks effectively to meet project timelines without compromising quality or accuracy.
  • Work in a team of data engineers, data scientists, and analysts to understand their data needs and develop data solutions that meet those needs.
  • Worked with AI engineers on data enablement and API building to implement Amazon Bedrock LLM text-to-SQL models for data analysis.
  • Design, build, and maintain scalable data platforms and pipelines to support the needs of the business.

Tech stack: Databricks, Delta Lake, Unity Catalog, Flink, Kafka, DynamoDB, Python
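
A minimal sketch of the event-vs-summary comparison, assuming Delta tables in Unity Catalog with hypothetical names:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    events = spark.table("prod.raw.playback_events")      # streaming events
    summary = spark.table("prod.gold.playback_summary")   # batch summary

    # Roll the raw events up to the summary grain and compare per day.
    event_counts = events.groupBy("event_date").agg(
        F.count("*").alias("event_cnt"))
    summary_counts = summary.groupBy("event_date").agg(
        F.sum("view_count").alias("summary_cnt"))

    drift = (event_counts
             .join(summary_counts, "event_date", "full_outer")
             .withColumn("diff", F.coalesce("event_cnt", F.lit(0))
                                 - F.coalesce("summary_cnt", F.lit(0)))
             .filter(F.col("diff") != 0))

    # A non-empty result signals leakage/mismatch to alert on (e.g., to Slack).
    drift.show()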

Senior Data Engineer

Cloudwick Technology (a product-based company)
11.2021 - 01.2024
  • Designed, built, and optimized data architecture using suitable design patterns and effective data models to make pipelines accessible to business data analysts, data scientists, and business users, enabling data-driven decision making and creating a single source of truth. Developed database architectural strategies at the modeling, design, and implementation stages to address business and industry requirements.
  • Built serverless data lake architecture using AWS tech stacks (S3, Glue, Redshift, Athena, Step Functions, Lambda, CloudWatch) for daily, weekly, and incremental loads.
    Built data pipelines with robust data profiling and pluggable data quality checks at each pipeline stage, verifying that data quality stays within the desired range, schemas are valid, and data formats are error-free (see the sketch after this role).
  • Gathered, defined and refined requirements, led project design and oversaw implementation.
    Designed data models for complex analysis needs.
    Designed and created audit/control log tables and master data management for a transparent view of data loading status. Optimized data pipelines, algorithms, and data storage to improve performance and scalability.
  • Worked closely with analysts to productionize and scale value-creating capabilities, including data integrations and transformations, model features, and statistical and ML models (ARIMA, random forest, regression). Wrote complex Redshift SQL views for data analysts and data scientists to build models on top of.
    Brought best practices to the team, proactively and continuously building data-related capabilities.
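
A minimal sketch of the pluggable data quality checks described above (rules and column names are hypothetical):

    from pyspark.sql import DataFrame, functions as F

    # Each check returns (name, passed, details) and can be plugged
    # into any stage of a pipeline.
    def check_not_null(df: DataFrame, col: str):
        bad = df.filter(F.col(col).isNull()).count()
        return f"not_null({col})", bad == 0, f"{bad} null rows"

    def check_range(df: DataFrame, col: str, lo: float, hi: float):
        bad = df.filter((F.col(col) < lo) | (F.col(col) > hi)).count()
        return f"range({col})", bad == 0, f"{bad} out-of-range rows"

    def run_checks(df: DataFrame, checks) -> bool:
        results = [check(df) for check in checks]
        for name, passed, details in results:
            print(name, "PASS" if passed else f"FAIL: {details}")
        return all(passed for _, passed, _ in results)

    # Wiring at one stage of a pipeline:
    # ok = run_checks(orders_df, [
    #     lambda df: check_not_null(df, "order_id"),
    #     lambda df: check_range(df, "amount", 0, 1_000_000),
    # ])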

Data Platform Engineer

Swiggy
04.2021 - 11.2021
  • Created Spark jobs and tasks in Databricks over terabytes of data with optimized code, scheduled using Databricks Workflows.
  • Bulk-loaded data from external (AWS S3) and internal stages into Snowflake using the COPY command.
    Used Snowpipe for continuous data ingestion from S3.
  • Created and orchestrated Airflow DAGs for various data pipelines and ad-hoc jobs.
  • Worked with analytics partners to deploy scalable data pipelines for analytical needs in Databricks.
  • Created Spark/Scala UDFs for the ML team to decode geohashes and compute haversine distances (see the sketch after this role's tech stack).
  • Built and exposed a metadata catalog for the data lake for easy exploration, profiling, and lineage requirements.
  • Enabled data science teams to test and productionize ML models, including propensity, risk, and fraud models, to better understand, serve, and protect customers.
  • Proved successful working within tight deadlines in a fast-paced environment.
  • Tech stack: AWS S3, Databricks, Airflow, Snowflake, Spark (Scala), Python
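
A PySpark rendering of the haversine-distance UDF (the original was written in Scala; column names in the usage note are hypothetical):

    import math
    from pyspark.sql import functions as F, types as T

    def haversine_km(lat1, lon1, lat2, lon2):
        # Great-circle distance between two coordinates, in kilometers.
        r = 6371.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp = math.radians(lat2 - lat1)
        dl = math.radians(lon2 - lon1)
        a = (math.sin(dp / 2) ** 2
             + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
        return 2 * r * math.asin(math.sqrt(a))

    haversine_udf = F.udf(haversine_km, T.DoubleType())

    # Usage on a DataFrame with pickup/drop coordinates:
    # df = df.withColumn("distance_km", haversine_udf(
    #     "pickup_lat", "pickup_lon", "drop_lat", "drop_lon"))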

Data Engineer

UST Global
12.2018 - 02.2021
  • Worked for the largest US retail chain, building data pipelines for South America cost-of-goods and supply chain management to support the data science team's stochastic and classification models.
  • Developed, implemented and maintained data analytics protocols, standards, and documentation.
  • Created a decoupled, scalable serverless data lake architecture using the AWS cloud environment (S3, EMR, Redshift, Lambda, Step Functions) and NoSQL databases for a retail giant. Extensively used Spark as the processing engine, with optimizations to achieve optimal pipeline performance.
  • Built scalable ETL pipeline designs and implementations over large distributed data on top of Hive and Spark.
    Worked on a variety of data ingestion processes from source systems (RDBMS, SFTP, and API endpoints).
  • Expertise in using Spark SQL with various data sources such as JSON, CSV, Parquet, and Hive (see the sketch after this role).
  • Demonstrated experience in designing and developing reusable code Frameworks, libraries, and components.
  • Worked in a dynamic environment with scrum masters, product owners, data architects, data scientists, and DevOps.
  • Collaborated with business analysts and the architecture team to maintain data integrity and verify ETL pipeline stability.
  • Analyzed complex data and identified anomalies, trends, and risks to provide useful insights to improve internal controls.
  • Collaborated with system architects, design analysts and others to understand business and industry requirements.
  • Managed terabytes of data and billions of records in Spark jobs.
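
A minimal sketch of reading the source formats named above with Spark SQL (paths and table names are hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    json_df = spark.read.json("s3://bucket/raw/events/")
    csv_df = spark.read.option("header", "true").csv("s3://bucket/raw/costs/")
    parquet_df = spark.read.parquet("s3://bucket/curated/orders/")

    # Register a file-based source and join it with a Hive table in SQL.
    parquet_df.createOrReplaceTempView("orders")
    result = spark.sql("""
        SELECT o.order_id, s.supplier_id, s.lead_time_days
        FROM orders o
        JOIN warehouse.supply_chain s ON o.supplier_id = s.supplier_id
    """)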

Senior Programmer

Epsilon (a Publicis company)
04.2016 - 04.2018
  • Built data pipelines in the Apache Spark ecosystem with Scala, collecting terabytes of log information, then storing, cleansing, and extracting meaningful information to be loaded into NoSQL databases (HBase/Cassandra) for further analysis or for running ML models.
  • Involved in transitioning from a traditional data warehouse to Hive through its query language. Developed efficient Hive scripts with dataset joins using various techniques.
  • Expertise in creating Hive tables (external/managed), partitioning, bucketing, loading, and aggregating tables, and creating Hive UDFs (see the sketch after this role).
  • Imported and exported large sets of RDBMS data into HDFS and vice versa using Sqoop.
  • Experienced in processing large amounts of structured and unstructured data. Played a role in building data pipelines with relational and NoSQL databases.
  • Implemented design enhancements and changes requested by business and project requirements.
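
A minimal sketch of the partitioned, bucketed Hive table pattern mentioned above, issued through Spark SQL (database, table, and column names are hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # External table over raw logs, partitioned by day and bucketed by user
    # to speed up joins and sampling.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS logs.clickstream (
            user_id BIGINT,
            url     STRING,
            ts      TIMESTAMP
        )
        PARTITIONED BY (dt STRING)
        CLUSTERED BY (user_id) INTO 32 BUCKETS
        STORED AS ORC
        LOCATION '/data/logs/clickstream'
    """)

    # Load one day's partition from a staging table.
    spark.sql("""
        INSERT OVERWRITE TABLE logs.clickstream PARTITION (dt = '2017-06-01')
        SELECT user_id, url, ts FROM staging.clickstream_raw
    """)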

Software Engineer

Harman (a Samsung company)
12.2014 - 04.2016
  • IRI is an American market research company that provides clients with consumer, shopper, and retail market intelligence data and analytic solutions focused on the consumer packaged goods (CPG) industry. IRI's clients include 95 percent of Fortune Global 500 CPG, retail, and healthcare companies.
  • Developed, tested, and maintained the ETL processes necessary to load data into a data warehouse, ensuring data quality and reconciliation between RDBMSs.
  • Developed database applications using Oracle SQL/PL-SQL and Python in Unix/Linux environments, with extensive exposure to Python modules for interacting with data.
  • Managed ETL loads and jobs in Netezza and Oracle warehouses.
  • Assisted in designing and implementing data marts and data warehousing applications per business needs.
  • Designed data extraction processes to match business application requirements and data sources; provided technical expertise and guidance for data management, quality, and reporting functions.
  • Suggested process improvements for managing data integrity and quality across the enterprise.

Technical Specialist

IBM
02.2014 - 12.2014
  • Played a key role in identifying, building, and developing automation features and maintaining existing tools to eliminate costs and improve turnaround time and overall quality.
  • Performed troubleshooting and root cause analysis of data-related issues.
  • Developed reports, performed ad-hoc data analysis using SQL queries, and provided data extracts to users on request.

Production Analyst

Theorem (client: Epsilon)
01.2011 - 02.2014
  • Created shell scripts to load data into data warehouses built on concepts such as star and snowflake schemas.
  • Primary focus on automating manual processes, status notifications, and process simplification.
  • Performed troubleshooting and root cause analysis of data-related issues.
  • Developed reports, performed ad-hoc data analysis using SQL queries, and provided data extracts to users on request.
  • Identified root causes of recurring problems to implement effective solutions.

Education

B.Tech (Computer Science Engineering)

Biju Pattnaik University of Technology
08.2005 - 09.2009

Skills

  • Apache Kafka, Flink

  • AWS (Glue, Redshift, Athena, S3, Lambda, Step Functions)

  • Azure (Databricks, Delta Lake, ADLS, Azure Data Factory)

  • Data Quality Assurance, Data Governance, Data Modelling, Data Security

  • Python, Scala, Advanced SQL, Spark Development

  • API Development, Machine Learning, Metadata Management

  • Snowflake, NoSQL DBs (MongoDB, Cassandra)

  • Version Control (Git, Bitbucket)

  • Tableau, Grafana

  • ML Models (ARIMA, Regression, Classification)

  • Microsoft Fabric

Additional Information

  • Attended and participated in data-driven meetups and talk shows.
  • Attended data-driven events at organizations such as Atlassian and LinkedIn.
