Spearheaded a team of 5+ Data Engineers, Data Scientists, and Power BI Developers building machine learning models to predict inland prices using regression, clustering, and various ensemble techniques.
Defined and established new packages and algorithms that implement ML techniques from their mathematical formulae.
Successfully deployed a recommender system for pricing.
Played a key role in implementing models using various ML techniques, including linear regression and clustering of geo data (k-means, DBSCAN, and hierarchical clustering), and in enhancing the existing packages.
· Implemented a novel approach using a “weighted constrained linear regression model”. The resulting paper was selected for presentation at the MLDS 2023 conference and published in the journal.
· Explored and implemented models using AutoML packages such as MLJAR and FLAML.
· Implemented an online learning model to retrain the models.
· Deployed the model in production using pickle.
· Built data visualizations using Power BI.
Worked with business and product teams to discuss business problems and identify opportunities for using data science techniques to enable effective decision making.
Mentor and lead the team in building cost-effective and time-efficient ETL pipelines using Spark on Databricks. Responsible for the analysis, design, development, and implementation of big data analytics solutions using Spark, in collaboration with the product team.
Work with various data stores such as Hive tables, SQL tables, blobs, and Delta tables.
Work with the business and leadership teams to present the models for cost savings.
Ensure end-to-end implementation, including data persistence, data flow, and analytics.
Part of the data architecture team making decisions on tech stack and cluster size.
Mentored and trained data enthusiasts to build a community of practice for data engineering.
Site leader for the Arity Data Analytics and Data Science team in Bengaluru.
Designed and built a longitudinal view of the trips data that data scientists could leverage to test their models.
Designed and built a Spark data flow process to persist the trips data and enrich it with user details from sources such as Cassandra, Kafka, and S3.
Designed and built a data warehouse that reads data from Kafka topics and S3 and creates SCD tables on PostgreSQL using the Spark framework.
Used Spark (Scala) to build ETL pipelines that read data from various sources (Kafka, PostgreSQL, Cassandra), transform it using aggregations, and perform SCD implementations for PostgreSQL tables.
Used Apache Flink for real-time data streaming, performing real-time aggregation and loading the results into Kafka topics.
Used the ELK stack for real-time analytics: read data from Kafka topics, processed it with Logstash, and wrote it to Elasticsearch. Built Kibana dashboards on the ES index.
Worked with data formats such as JSON, Avro, Parquet, and ORC.
Built a POC to deploy Scala Spark jobs using Apache Airflow.
Dockerized Python code that reads data from external sources, transforms it, and creates aggregated data.
Worked on building PCF applications for continuous streaming of data.
Technical Lead involved in data analysis, design, and development of Ab Initio graphs, Hive tables, and Spark code.
Lead Developer involved in data analysis, design, and development of Ab Initio graphs.
Worked as onshore coordinator for 2 years - TD Bank, Mississauga, Canada for
Ab Initio Developer
Unix shell programming
Microsoft Certified: Azure Fundamentals (AZ-900)