Absolutely! Let’s walk through a complete, step-by-step tutorial to help you understand MLflow on Databricks from start to finish, using revised and working code.
We’ll cover all key components of MLflow: ✅ Tracking
✅ Models
✅ Model Registry
✅ Signature + Input Example
🚀 Objective:
Train a classification model on the Iris dataset, log everything with MLflow, and register the model.
✅ Step-by-Step MLflow Lab on Databricks
🎯 Step 1: Setup (Install Required Libraries)
Run this in a cell:
%pip install scikit-learn pandas mlflow
🎯 Step 2: Import Libraries and Load Data
```python
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

import mlflow
import mlflow.sklearn
from mlflow.models.signature import infer_signature
```
🎯 Step 3: Prepare the Data
```python
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
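With a 70/30 split on the 150-sample Iris dataset, you should end up with 105 training rows and 45 test rows, each with 4 features. A quick standalone sanity check:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Reload and split exactly as in Step 3
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# 150 samples * 0.7 = 105 train, 45 test; each row has 4 features
print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)
```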
🎯 Step 4: Start MLflow Run and Train Model
```python
# Start an MLflow experiment run
with mlflow.start_run() as run:
    # Train the model
    model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
    model.fit(X_train, y_train)

    # Make predictions and calculate accuracy
    predictions = model.predict(X_test)
    acc = accuracy_score(y_test, predictions)

    # Log parameters and the accuracy metric
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 5)
    mlflow.log_metric("accuracy", acc)

    # Create a sample input and infer the model signature
    input_example = X_test[:5]
    signature = infer_signature(X_train, model.predict(X_train))

    # Log the model with signature and input example
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="iris_rf_model",
        input_example=input_example,
        signature=signature
    )

    # Save the run_id for model registration
    run_id = run.info.run_id

print(f"Run ID: {run_id}")
print(f"Accuracy: {acc}")
```
🎯 Step 5: Register the Model
Paste this in a new cell:
```python
model_uri = f"runs:/{run_id}/iris_rf_model"

# Register the model under a name
model_details = mlflow.register_model(
    model_uri=model_uri,
    name="IrisClassifierModel"
)
```
✅ Now go to the “Models” tab in Databricks, and you’ll see `IrisClassifierModel` with versioning.
🎯 Step 6: Promote the Model (via UI)
Go to:
- Models > IrisClassifierModel
- Click on the version (e.g., Version 1)
- Click Stage → choose Staging or Production

Or do the same from code:
```python
# To promote Version 1 to Production
from mlflow.tracking import MlflowClient

client = MlflowClient()
client.transition_model_version_stage(
    name="IrisClassifierModel",
    version=1,            # or the actual version you created
    stage="Production"    # or "Staging", "Archived"
)
```
OR
```python
# Assign an alias (like @production) to the version
from mlflow.tracking import MlflowClient

client = MlflowClient()
client.set_registered_model_alias(
    name="IrisClassifierModel",
    alias="production",
    version=1
)
```
```python
# Run this to list your registered models
# (list_registered_models was removed in MLflow 2.x; use search_registered_models)
client = MlflowClient()
for m in client.search_registered_models():
    print(m.name)
```
🎯 Step 7: Load Model from Registry and Predict
Option A: If your model is in “Production” stage (via UI):
Now that the model is registered and staged, let’s load it and use it:
```python
from mlflow.pyfunc import load_model

# Load the model from the registry using the stage name
model = load_model("models:/IrisClassifierModel/Production")

# Predict
preds = model.predict(X_test)
print(preds)
```
Option B: If you used an alias (e.g., production) instead of a stage:
```python
from mlflow.pyfunc import load_model

# Load the model using its alias
model = load_model("models:/IrisClassifierModel@production")

# Predict
preds = model.predict(X_test)
print(preds)
```
🎉 You Did It!
| MLflow Component | Covered ✅ |
|---|---|
| Tracking | Yes (log params, metrics, artifacts) |
| Models | Yes (logged with signature + input example) |
| Model Registry | Yes (registered and promoted via UI) |
| Serving/Loading | Yes (loaded from registry and used for predictions) |
🧠 Bonus Tips:
- Want to serve the model as REST API? MLflow on Databricks supports model serving.
- Want to run this as a Project? We can convert this into an MLflow Project next.
Let me know if you want this exported as a .dbc Databricks Notebook or want to integrate with CI/CD!
Absolutely! Since you’ve already registered the model and assigned an alias like production, you’re ready to serve the model using Databricks Model Serving.
Let’s go through this step-by-step, assuming you’re using a Databricks Trial or Paid workspace (not Community Edition).
🚀 Objective: Serve Your MLflow Model via REST API on Databricks
We’ll go from registered model with alias ➜ to real-time REST endpoint ➜ to making predictions from code.
✅ Step 1: Confirm Model Registration & Alias
You’ve already done this, but here’s a quick reference for the future:
```python
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Register the model (already done). Note: register_model is a
# module-level function, not a method on MlflowClient.
model_uri = f"runs:/{run_id}/iris_rf_model"
mlflow.register_model(model_uri=model_uri, name="IrisClassifierModel")

# Set the alias
client.set_registered_model_alias(
    name="IrisClassifierModel",
    alias="production",
    version=1
)
```
✅ Step 2: Enable Model Serving from Databricks UI
- Go to your Databricks Workspace.
- In the left sidebar, click “Models”.
- Click on IrisClassifierModel.
- Click on Version 1 (or the version you aliased).
- You should see a “Serving” or “Enable Serving” button.
- Click it, then:
  - Choose Real-time serving
  - Click Start serving
✅ Once serving is enabled, you’ll see the endpoint URL (copy it!).
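If you prefer automation over the UI, Databricks also exposes a Serving Endpoints REST API (`POST /api/2.0/serving-endpoints`). The endpoint name, workspace URL, and token below are placeholders, and the exact config fields can vary by workspace version, so treat this as a sketch and check your workspace’s API docs:

```python
import json

# Placeholder values -- substitute your own workspace URL and token
workspace_url = "https://<your-databricks-instance>"
token = "dapiXXXXXXXXXXXXXXXXXXXX"

# Endpoint configuration: serve version 1 of the registered model
# on the smallest workload size, scaling to zero when idle.
payload = {
    "name": "iris-classifier-endpoint",
    "config": {
        "served_entities": [
            {
                "entity_name": "IrisClassifierModel",
                "entity_version": "1",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            }
        ]
    },
}

# The actual call would look like:
# requests.post(f"{workspace_url}/api/2.0/serving-endpoints",
#               headers={"Authorization": f"Bearer {token}"},
#               json=payload)
print(json.dumps(payload, indent=2))
```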
✅ Step 3: Use the REST Endpoint for Predictions
Here’s a full Python example to send test data and get predictions:
```python
import requests

# Replace with your actual endpoint URL from Databricks
url = "https://<your-databricks-instance>/serving-endpoints/IrisClassifierModel/invocations"

# If needed, generate a Personal Access Token from Databricks User Settings
token = "dapiXXXXXXXXXXXXXXXXXXXX"

# Headers
headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

# Input payload (must match your model's input structure)
data = {
    "dataframe_split": {
        "columns": ["sepal length (cm)", "sepal width (cm)", "petal length (cm)", "petal width (cm)"],
        "data": [[5.1, 3.5, 1.4, 0.2]]
    }
}

# Send the request
response = requests.post(url, headers=headers, json=data)

# Print the response
print("Prediction:", response.json())
```
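The endpoint returns JSON; with recent MLflow scoring servers the result is typically wrapped in a `predictions` key, though the exact shape can vary by MLflow version, so treat the wrapper as an assumption. A parsing sketch using a hard-coded sample response:

```python
import json

# Example response body as an MLflow scoring server might return it
sample_body = '{"predictions": [0]}'

parsed = json.loads(sample_body)
# Fall back to the raw body if the "predictions" wrapper is absent
preds = parsed.get("predictions", parsed)
print(preds)  # [0]
```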
✅ Step 4: Test It!
Run the above Python code in:
- Databricks notebook
- Jupyter notebook
- Any Python script
🔐 Generate a Personal Access Token (If Needed)
- Click your profile icon in the top-right corner of Databricks.
- Go to “User Settings” > “Access Tokens”.
- Click Generate New Token.
- Copy it and use it in the `token` variable above.
🧪 Example Output:
```
Prediction: [0]
```
This means it predicted class 0 (e.g., Setosa for Iris dataset).
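To map the numeric class back to a species name, you can index into the dataset’s `target_names` array (a small standalone sketch):

```python
from sklearn import datasets

iris = datasets.load_iris()

# target_names is ['setosa', 'versicolor', 'virginica']
predicted_class = 0  # e.g. the value returned by the endpoint
print(iris.target_names[predicted_class])  # setosa
```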
🧠 Summary of Steps:
| Step | Action |
|---|---|
| ✅ 1 | Register model and set alias (production) |
| ✅ 2 | Enable model serving in Databricks UI |
| ✅ 3 | Copy REST endpoint URL |
| ✅ 4 | Send test prediction via Python using requests |
Would you like me to generate a ready-to-run notebook (.dbc) with this entire process? Or help you test it directly with your live Databricks instance?