Data Engineer with a proven track record of transforming raw data into valuable assets. Expertise includes developing efficient data pipelines, implementing data warehouses, managing databases, and migrating systems to the cloud. I also possess strong skills in data analytics, SQL, and Power BI, enabling me to extract meaningful insights and create compelling visualizations.
Minimizing CO2 Production from a Blast Furnace using Machine Learning | Metallurgical and Materials Engineering, IIT Roorkee | 01/2018 - 03/2018
Predicted CO2 production based on factors such as temperature, pressure, and input materials, and attempted to minimize it.

Conversion Rate Prediction for Ads | Data Science Intern, InMobi | 05/2018 - 07/2018
Developed new features for CVR (conversion rate) prediction using a matrix factorization method and a per-user app ownership profile.
Created a unique 9-bit profile for millions of users by analysing their app usage patterns, and used that profile to deliver a targeted ad experience.
Pre-processed and analysed many terabytes of data (including data from ad networks such as Google and Facebook) with Apache Spark, and applied machine learning models using MLlib.
Worked on US-region data, analysing user response to a given ad by age group, handset version, demand market ID, etc.
Tech stack: Spark, Scala, and Python.

Established the Need for a Recommendation System for Glance (an InMobi product) | 05/2018 - 07/2018
Analysed state-wise news-viewing patterns of Samsung and Gionee handset users.
Created a categorical news-viewing experience from the analysed past-usage data.
Tech stack: Spark, Scala, and Python.

XTRAC Datalake | Data CoE, Fidelity Investments | 01/2020 - Present
Developed the architecture for the Fact and Dimension model in Snowflake.
Wrote queries to populate the Raw and Prepared zones.
Implemented tokenization of data during data movement and storage.
Built a generic function for load and performance testing to calculate latency.
Optimized queries so that data movement between the Landing and Prepared zones completes within a minute.
Created a framework to check for data loss between the source and Snowflake.
Built an automated job that checks data validity between the source and Snowflake at regular intervals.
Onboarded multiple BUs onto the XTRAC datalake, including WI, FILI, and FI/PI.
Implemented mapping logic to map workitem data to workitem history data.

Olympus Datalake | FDA, Fidelity Investments | 02/2021 - Present
Developed queries for data movement to the Prepared zone.
Developed Python code for data movement, scheduled using Airflow DAGs.

Data Quality Framework | FDA, Fidelity Investments | 01/2021 - 10/2021
Implemented a data quality framework and wrote data quality checks that run daily.
Developed a data quality dashboard in Power BI that consumes data from Snowflake on a CDC basis.
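The source-to-Snowflake data-loss and validity checks described above can be sketched as a simple per-table row-count reconciliation. This is a minimal illustration, not the actual framework: the count functions are hypothetical stand-ins for real database queries.

```python
# Hypothetical sketch of a data-loss check between a source system and
# Snowflake: compare per-table row counts and report any shortfall.
# The table names and counts below are illustrative stubs.

def source_counts():
    """Row counts per table on the source system (stubbed for illustration)."""
    return {"workitem": 1000, "workitem_history": 5000}

def snowflake_counts():
    """Row counts per table in Snowflake (stubbed for illustration)."""
    return {"workitem": 1000, "workitem_history": 4990}

def find_data_loss(src, dst):
    """Return {table: missing_rows} for every table whose counts disagree."""
    return {t: src[t] - dst.get(t, 0)
            for t in src if src[t] != dst.get(t, 0)}

mismatches = find_data_loss(source_counts(), snowflake_counts())
# mismatches flags workitem_history, which is 10 rows short in Snowflake
```

In practice such a check would be scheduled (e.g. via an Airflow DAG) to run at a regular interval and alert on any non-empty mismatch report.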