4.2+ years of software development expertise, specializing in data analytics solutions for real-world business challenges. Proficient in handling complex legal and manufacturing documents. Skilled in the complete software development lifecycle, delivering user-centric solutions within tight timelines. Experienced with in-demand technologies such as PySpark, AWS services, and Python.
Overview
4 years of professional experience
Work History
Clinical Hub for Adverse Event Reporting Solution
Saama Technologies India Pvt Ltd
Pune
12.2023 - Current
The CHAERS solution automates adverse event reporting for USMA-managed outsourced studies that are not supported by the internal AERO safety reporting pipeline.
Designed the architecture of this module, selecting which AWS services to use based on the available budget.
Designed and implemented scalable data pipelines using AWS Glue jobs.
Built ETL workflows using AWS Step Functions and loaded data from various sources into target data warehouses.
Wrote Python and PySpark scripts to fetch data from APIs and load it into tables, using Athena for data validation.
Processed XML and JSON files in PySpark (see the ingestion sketch after this list).
Developed CI/CD to copy code from GitLab to an S3 bucket.
Automated deployments across environments using AWS CloudFormation.
Prepared documentation, including the data dictionary.
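A minimal PySpark sketch of the JSON ingestion and S3 load step described above; bucket paths, column names, and partitioning are illustrative assumptions only, not the actual Glue job code or table schemas.

# Minimal PySpark sketch of the JSON ingestion and S3 load step.
# Paths, column names, and partitioning are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("chaers_ingest_sketch").getOrCreate()

# Read raw adverse-event JSON exported from the source API (hypothetical path).
raw = spark.read.json("s3://example-bucket/chaers/raw/adverse_events/")

# Flatten and standardise a few illustrative fields.
events = (
    raw.select(
        F.col("case_id"),
        F.col("study_id"),
        F.to_date("report_date", "yyyy-MM-dd").alias("report_date"),
        F.col("event.term").alias("event_term"),
    )
    .dropDuplicates(["case_id"])
)

# Write Parquet to the target layer, partitioned by study, ready for Athena validation.
events.write.mode("overwrite").partitionBy("study_id").parquet(
    "s3://example-bucket/chaers/curated/adverse_events/"
)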
Project: Genentech PCT Module
Saama Technologies India Pvt Ltd
Pune
05.2023 - 11.2023
Designed and implemented scalable data pipelines using PySpark, processing large volumes of data in the Life Science domain for analytics and reporting purposes.
Built ETL workflows using AWS Glue and loaded data from various sources into target data warehouses.
Developed and maintained data ingestion processes from multiple data sources, ensuring data quality and consistency. Files provided by Product Owners are placed in an Amazon S3 location, where their format and columns are checked and the data is validated to ensure correct processing.
Orchestrated jobs with Airflow DAGs; business logic is implemented in PySpark and run on Amazon EMR clusters (see the DAG sketch after this list).
Used Athena for data validation.
Collaborated with cross-functional teams, including data scientists and business analysts, to understand data requirements, deliver effective solutions, and maintain data governance standards.
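An illustrative Airflow DAG sketch for the validate-transform-load pipelining described above; task names and Python callables are assumptions, and the production DAG submits PySpark steps to the EMR cluster rather than running placeholders locally.

# Illustrative Airflow DAG for the validate -> transform -> load flow.
# Task names and callables are assumptions; the production DAG submits
# PySpark steps to an Amazon EMR cluster instead of local placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator


def validate_input_files():
    """Check format and expected columns of files landed in S3 (placeholder)."""
    print("validating input files")


def run_business_logic():
    """Trigger the PySpark business-logic job (placeholder)."""
    print("running PySpark transformations")


def load_to_warehouse():
    """Load curated output into the target warehouse tables (placeholder)."""
    print("loading curated data")


with DAG(
    dag_id="pct_pipeline_sketch",
    start_date=datetime(2023, 5, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    validate = PythonOperator(task_id="validate", python_callable=validate_input_files)
    transform = PythonOperator(task_id="transform", python_callable=run_business_logic)
    load = PythonOperator(task_id="load", python_callable=load_to_warehouse)

    validate >> transform >> load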
Project: Smart Data Quality
Saama Technologies India Pvt Ltd
Pune
07.2022 - 04.2023
Automating and accelerating data management processes.
With SDQ, data discrepancies are automatically identified as they are captured, reducing time to issue a query from over 25 days to under 2 days.
Developed Flask APIs for CRUD operations on PostgreSQL tables (see the sketch after this list).
Wrote test cases for Flask API testing using Postman, integrated with Jenkins.
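A minimal Flask sketch of the CRUD style described above, assuming a hypothetical dq_rules table and connection string; the real API, models, and authentication differ.

# Minimal Flask CRUD sketch over a hypothetical PostgreSQL table.
# Table name, columns, and connection string are illustrative assumptions.
from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://user:pass@localhost:5432/sdq"
db = SQLAlchemy(app)


class DqRule(db.Model):
    __tablename__ = "dq_rules"
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(120), nullable=False)
    expression = db.Column(db.Text, nullable=False)


@app.route("/rules", methods=["GET"])
def list_rules():
    rules = DqRule.query.all()
    return jsonify([{"id": r.id, "name": r.name, "expression": r.expression} for r in rules])


@app.route("/rules", methods=["POST"])
def create_rule():
    payload = request.get_json()
    rule = DqRule(name=payload["name"], expression=payload["expression"])
    db.session.add(rule)
    db.session.commit()
    return jsonify({"id": rule.id}), 201


if __name__ == "__main__":
    with app.app_context():
        db.create_all()
    app.run()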
Project: Deep Learning Intelligent Assistant
Saama Technologies India Pvt Ltd
Pune
04.2021 - 06.2022
DALIA is an AI-based assistant that provides easy-to-use, content- and domain-aware conversational experiences with key data and insights from Saama's award-winning Life Science Analytics Cloud (LSAC).
Developed and deployed the chatbot on the server.
Implemented automatic complex query generation for different questions, entities, and intents.
Extended the original single-intent implementation to support multiple intents (a simplified sketch follows this list).
Optimized the code to reduce response time and implemented entity auto-suggestion functionality.
Enabled the service to process up to 100 users per second.
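A simplified sketch of how multiple detected intents can be mapped to query templates; intent names, templates, and entity keys are hypothetical, and the production implementation generates far more complex queries from entities and conversational context.

# Simplified sketch: map detected intents and entities to query templates.
# Intent names, templates, and entity keys are illustrative assumptions.
QUERY_TEMPLATES = {
    "enrollment_count": "SELECT COUNT(*) FROM subjects WHERE study_id = '{study_id}'",
    "adverse_events": "SELECT * FROM adverse_events WHERE study_id = '{study_id}' AND severity = '{severity}'",
}


def build_queries(detected_intents):
    """Build one query per detected intent from its extracted entities."""
    queries = []
    for intent, entities in detected_intents:
        template = QUERY_TEMPLATES.get(intent)
        if template is None:
            continue  # unknown intent: skip (or fall back to a clarification prompt)
        queries.append(template.format(**entities))
    return queries


# Example: a single utterance resolved to two intents.
print(build_queries([
    ("enrollment_count", {"study_id": "ST-101"}),
    ("adverse_events", {"study_id": "ST-101", "severity": "SERIOUS"}),
]))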
Project: CDH
Saama Technologies India Pvt Ltd
Pune
01.2020 - 03.2021
Clinical Data Hub is designed to handle every task related to data generated in clinical trials, from registering the study to ensuring the results conform to CDISC standards.
Developed an automated pipeline for converting raw data to the standard clinical format.
Created a Flask API for collecting source data information from S3 and SFTP sources.
Minimised data reading and table-loading time by using Dask and Modin dataframes.
Wrote data-cleaning Python scripts using pandas and stored the data in the target layer (a cleaning sketch follows this list).
Modularised the existing code as part of enhancements using Python OOP concepts.
Wrote test cases to eradicate bugs and inconsistencies in the code.
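A small pandas sketch of the cleaning step described above; column names and rules are hypothetical, and the actual scripts apply study-specific rules before writing to the target layer.

# Small pandas cleaning sketch; column names and rules are illustrative assumptions.
import pandas as pd


def clean_raw_extract(path_in, path_out):
    df = pd.read_csv(path_in)

    # Normalise column names and trim stray whitespace in string columns.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].str.strip()

    # Standardise dates and drop exact duplicate records.
    if "visit_date" in df.columns:
        df["visit_date"] = pd.to_datetime(df["visit_date"], errors="coerce")
    df = df.drop_duplicates()

    # Write the cleaned frame to the target layer (Parquet here for illustration).
    df.to_parquet(path_out, index=False)
    return df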
Education
Post Graduate Diploma - Big Data Engineering
CDAC
Pune
2019
Bachelor of Engineering - Electronics and Telecommunication
PVG College of Engineering
Nashik, MH
2018
Skills
Python
Pandas
Flask API
Postgres
SQL
PySpark
AWS Services
GitHub
Postman
Jenkins
Accomplishments
Received 'Shining Star of the Month' for June 2022 and Oct 2023
Timeline
Clinical Hub for Adverse Event Reporting Solution
Saama Technologies India Pvt Ltd
12.2023 - Current
Project: Genentech PCT Module
Saama Technologies India Pvt Ltd
05.2023 - 11.2023
Project: Smart Data Quality
Saama Technologies India Pvt Ltd
07.2022 - 04.2023
Project: Deep Learning Intelligent Assistant
Saama Technologies India Pvt Ltd
04.2021 - 06.2022
Project: CDH
Saama Technologies India Pvt Ltd
01.2020 - 03.2021
Post Graduate Diploma - Big Data Engineering
CDAC
Bachelor of Engineering - Electronics and Telecommunication
Associate Software Engineer (Data Analyst) at Saama Technologies India Pvt Ltd