Accomplished Senior Machine Learning Engineer with 11 years of overall experience, currently working at Tata Consultancy Services; skilled in architecting MLOps pipelines and optimizing cloud resources. Proven track record of deploying high-quality models with CI/CD practices while mentoring teams to enhance collaboration. Expertise in Python and a strong commitment to continuous improvement drive impactful results in machine learning projects.
Implemented an end-to-end MLOps pipeline using Azure ML CLI v2 and Azure DevOps for environmental sensor anomaly detection.
Designed CI/CD pipelines for continuous integration and deployment of ML models using Azure DevOps YAML pipelines. Implemented quality gates and testing frameworks for ML code and model validation.
Optimized Compute Resources: Implemented dynamic compute scaling in Azure ML clusters with min-max instance configuration, balancing performance needs with cost efficiency during model training.
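As an illustrative sketch of the min-max scaling rule described above (the function name and parameters are hypothetical, not the actual Azure ML configuration), the cluster sizes toward demand while staying inside the configured bounds:

```python
def target_instances(pending_jobs: int, jobs_per_instance: int,
                     min_instances: int, max_instances: int) -> int:
    """Clamp a demand-driven instance count to the cluster's min-max bounds.

    Hypothetical helper illustrating the scaling policy; Azure ML applies
    an equivalent rule internally when min/max node counts are configured.
    """
    if jobs_per_instance <= 0:
        raise ValueError("jobs_per_instance must be positive")
    # Ceiling division: enough instances to cover all queued jobs.
    demand = -(-pending_jobs // jobs_per_instance)
    return max(min_instances, min(max_instances, demand))
```

With `min_instances=0` the cluster scales to zero when idle, which is where the cost savings come from.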
Enabled Continuous Improvement: Built MLflow-integrated pipelines that automatically track experiments, metrics, and model artifacts, facilitating model governance and performance analysis.
Maximized Reproducibility: Implemented versioned environments using Conda specifications and Docker containers, ensuring consistent execution across development and production.
Streamlined Deployment Workflows: Created automated deployment pipelines with traffic allocation controls, enabling safe model rollouts with zero-downtime updates to production endpoints.
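The traffic-allocation control mentioned above can be sketched as a gradual blue/green shift. This is a minimal hypothetical illustration, assuming two named deployments; Azure ML online endpoints accept a similar percentage mapping:

```python
def rollout_traffic(step: int, total_steps: int) -> dict:
    """Traffic split between 'blue' (current) and 'green' (new) deployments
    at a given rollout step.

    Hypothetical sketch of gradual traffic shifting; a real rollout would
    advance `step` only after health checks pass on the green deployment.
    """
    if total_steps <= 0 or not 0 <= step <= total_steps:
        raise ValueError("step out of range")
    green = round(100 * step / total_steps)
    return {"blue": 100 - green, "green": green}
```

Because the old deployment keeps serving the remaining share until the shift completes, updates can reach production with zero downtime.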
Azure Resources Utilized: Azure ML Studio, Compute Clusters, Online Endpoints, ACR, Blob Storage, Key Vault, DevOps, Pipelines, etc.
Built High-Performing Teams: Cultivated a DevOps mindset within engineering teams, mentoring and guiding junior engineers to adopt best practices in MLOps.
Implemented an end-to-end MLOps solution using AWS CDK and GitHub Actions that automated the full machine learning lifecycle.
Leveraged AWS SageMaker for model training, evaluation, and registry, while implementing CI/CD pipelines through GitHub Actions workflows triggered by AWS Lambda functions.
Designed a template-based approach with standardized project structures for build and deploy repositories, enabling consistent ML workflows.
Incorporated infrastructure-as-code principles using AWS CDK (Python) to provision cloud resources including SageMaker endpoints, IAM roles, and S3 buckets.
Utilized XGBoost for model training with automated quality gates that conditionally register models to the SageMaker Model Registry.
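The conditional-registration step above can be sketched as a simple quality gate. This helper is hypothetical (names and thresholds are assumptions); in the actual pipeline the pass/fail decision would feed the SageMaker Model Registry call:

```python
def passes_quality_gate(metrics: dict, thresholds: dict) -> bool:
    """Return True only if every gated metric meets its minimum threshold.

    Hypothetical sketch: a missing metric counts as a failure, so a model
    is never registered on incomplete evaluation results.
    """
    return all(metrics.get(name, float("-inf")) >= minimum
               for name, minimum in thresholds.items())
```

For example, `passes_quality_gate({"auc": 0.91}, {"auc": 0.85})` allows registration, while a model scoring below the threshold is dropped from the workflow.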
Productionized an MLflow tracking and deployment server on AWS.
Deployed and hosted an MLflow dashboard with backend components (tracking server, database, artifact store), integrated it with SageMaker for model deployment, and enabled secure access for users and developers through AWS infrastructure.
Infra Provisioning: Provisioned infrastructure via CloudFormation, using the CDK Toolkit to deploy the tech stack (ECS, ECR, VPC, ELB, S3, RDS, etc.) as Infrastructure as Code. The MLflow dashboard provides model-metrics monitoring for multiple projects through a single browser access point, and the design delivers a serverless MLflow deployment on AWS Fargate with auto-scaling capabilities.
Secure Storage: Created an S3 bucket for MLflow artifact storage with appropriate access controls.
ML Workflow: Tracked experiment runs in SageMaker using MLflow.
Model Deployment: Created a deployment pipeline from MLflow to SageMaker endpoints; added a model registry workflow for versioning and promotion, and configured SageMaker endpoints with auto-scaling for production traffic.
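As a minimal sketch of the versioning-and-promotion rule (data shapes and names here are assumptions, not the MLflow Model Registry API), the pipeline picks the newest version in a given stage before deploying it to a SageMaker endpoint:

```python
def latest_promotable(versions, stage="Staging"):
    """Pick the newest model version in the given stage for promotion.

    Hypothetical helper: `versions` is a list of dicts with 'version' and
    'stage' keys, standing in for records read from the MLflow registry.
    Returns None when no version is eligible.
    """
    candidates = [v for v in versions if v.get("stage") == stage]
    if not candidates:
        return None
    # Highest version number in the stage wins.
    return max(candidates, key=lambda v: v["version"])
```

Gating deployment on registry stage keeps ad-hoc experiment runs from ever reaching a production endpoint.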
Security: Backed the infrastructure with a secure VPC architecture (public, private, and isolated subnets), a load balancer for high availability and fault tolerance, and security groups.
Enhanced Collaboration: Provided visibility into experiment runs across teams and standardized workflows, reducing dependency on individual team members.
Automated Risk Classification System Using NLP for HSSE Compliance.
Developed an ML-powered solution to automate the classification of HSSE-related incidents using textual data, improving consistency and response time across the QGC production site.
Trained multiple NLP models (TF-IDF + Logistic Regression, BERT, etc.) and selected the best model based on evaluation metrics.
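The model-selection step can be sketched as picking the candidate with the best score on a chosen metric. Model names and scores below are illustrative placeholders, not results from the project:

```python
def select_best_model(results, metric="f1"):
    """Return the name of the model scoring highest on the chosen metric.

    Hypothetical helper: `results` maps a model name to its dict of
    evaluation metrics, as produced by a candidate-comparison loop.
    """
    return max(results, key=lambda name: results[name][metric])
```

The same function works for any scalar metric (accuracy, macro-F1, AUC), so the selection criterion stays a single configurable parameter in the pipeline.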
Integrated the deployed model with Power BI to expose predictions and enable real-time risk dashboards for stakeholders.
Collaborated with the team to enhance the machine learning pipeline.
Model Improvement for Object Detection Using YOLOv4 and Evaluations.
Predictive Modelling on Clinical EHR Data
Early Incident Detection
EM Metrics Analysis
Remote Infrastructure Management System