- A Data Engineering professional with a mix of technical and business skills and over 8 years of experience in data warehousing using big data technologies, sound knowledge of ETL processes, building data engineering solutions, and providing data-driven insights to Fortune 500 global pharmaceutical clients.
- Experienced in AWS technologies/services (Hive, Redshift, etc.), Python/PySpark, Databricks, EKS, and Airflow.
- Experience in Agile/Scrum and Waterfall methodologies. Experienced with Bitbucket, Confluence, and Jira. Involved in the requirement gathering, design, development, testing, and implementation phases of projects.
- Currently working with Axtria as a Senior Manager in the role of Data Engineering Lead.
1. Python from GreatLearning
https://olympus1.greatlearning.in/course_certificate/HRHBBXWB
2. Redshift Development from Udemy
https://www.udemy.com/certificate/UC-604decc5-74cf-4df4-80e0-4b96a3f3474e/
3. Advanced Dataiku Designer
https://verify.skilljar.com/c/z3i6nrennyi6
- National-level chess player in high school. Secured third position in state tournaments.
- Secured second position in college dramatics.
- Awarded the Bravo award at Axtria in Q3 2019 for high-quality work.
Senior Manager | Axtria India Pvt Ltd.
July 2017 - Present
Major Projects
One Data Platform - Various datasets from different vendors and pipelines are streamlined into one platform for end users. This allows cross-utilization and reduces turnaround time, as it provides a centralized repository of data products.
Role: Responsible for the design and synthesis of data products (ideating data products from raw data) to cater to specific business needs. This involved collaborating with multiple stakeholders to gather requirements and then codifying them into analytics-ready datasets, using PySpark on top of EKS clusters to support large data volumes.
Optimization Track - Led the track as Operations Lead to reduce the overall execution time of heavy processes, thereby allowing room for more validations to be carried out without breaching agreed SLAs.
JET Wave 2 - JET is a data science model designed to enhance the partnership between sales reps and healthcare professionals (HCPs). The model generates personalized suggestions for sales reps to target HCPs based on attributes such as specialty, making HCPs aware of the latest products and ultimately driving revenue growth.
Role: Served as data engineering lead, communicating with different stakeholders to gather requirements and designing and developing data engineering pipelines using PySpark and Dataiku. For a parallel Medical Data Processing stream, built pipelines using Kedro-Viz and the Argo scheduler.
Data/Technology Migration Projects
Talend Move-Out - The project moved existing ETL jobs from the licensed tool Talend to PySpark on Databricks, using a metadata-driven approach to execute jobs with any custom setting.
Role: Responsible for understanding the existing complex Talend code for pharmaceutical sales and customer data and developing robust PySpark scripts, which also led to overall optimization. The migration reduced overall run time and made debugging and bug fixes easier.
Source Data Switch - Analyzed existing code and data to assess the impact of migrating source data from one vendor to another, formulated the corresponding system changes with minimal impact, and created client-ready presentations.
Data Ingestion, Reporting and Quality Assurance
Data Quality Reports - The scope was to migrate existing Excel-based reports to Tableau and create new dashboards, visualizing large reports and providing users with a better interface.
Role: Created Tableau dashboards spanning areas such as customer, product, and sales, and developed metadata-driven reports using Hive and Tableau to surface critical issues in sales and marketing data.
Data Ingestion and Quality Checks