PySpark, SQL, pandas, NumPy


Results-driven Lead Data Engineer and Data Science/Machine Learning Engineer with 10+ years of experience building scalable data pipelines and Lakehouse architectures. Developed and deployed production ML/AI solutions on Azure using Azure Databricks, Azure Data Factory, and MLOps practices. Delivered analytics and business insights through close collaboration with stakeholders and practical problem-solving.
Tools
Spark SQL, Databricks, PySpark, Confluence, Jira, Azure Data Factory (ADF) pipelines, Azure Synapse, Logic Apps, Docker, GitHub,
Vercel, Railway, Claude, Codex, OpenAI, n8n pipeline integration, SQL, Power BI, MS PowerPoint, Minitab, MS Word,
Trend Analysis, Data Warehousing, Advanced Excel, ETL Processes, Data Modeling, Variance Analysis, Data Mining
Natural language processing
Feature engineering
ML Model deployment
Clustering algorithms
Random forests
Decision trees
Statistical modeling
Data analytics
Data exploration
Dimensionality reduction
Support vector machines
K-nearest neighbors
Agile methodologies
Reinforcement learning
Unsupervised learning
Neural networks
Ensemble methods
Semi-supervised learning
Big data analytics
Probabilistic models
Supervised learning
Optimization techniques
Bayesian inference
Gradient boosting machines
Time series analysis
Data storage
Predictive modeling
Data cleaning
DP-203 (Data Engineering on Microsoft Azure)
Photography