{"id":49781,"date":"2025-06-22T15:36:32","date_gmt":"2025-06-22T15:36:32","guid":{"rendered":"https:\/\/www.devopsschool.com\/blog\/?p=49781"},"modified":"2026-02-21T07:29:36","modified_gmt":"2026-02-21T07:29:36","slug":"mlflow-lab-use-case-predicting-airbnb-prices-with-xgboost-mlflow","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/blog\/mlflow-lab-use-case-predicting-airbnb-prices-with-xgboost-mlflow\/","title":{"rendered":"MLFlow: Basic Workflow Using HuggingFace + scikit-learn + Optuna"},"content":{"rendered":"\n<p><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Let\u2019s Reset: The Right Way to Learn MLflow in 2026<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udd25 Modern Use Case:<\/h3>\n\n\n\n<p><strong>End-to-End MLflow Workflow Using HuggingFace + scikit-learn + Optuna for Experiment Tracking and Deployment<\/strong><\/p>\n\n\n\n<p><strong>Use case:<\/strong> Sentiment classification on <code>IMDB<\/code> or <code>Amazon Reviews<\/code> using transformers or ML models.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">\ud83c\udfaf Why This Is Modern &amp; Popular in 2026<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\u2705 HuggingFace + Optuna are top ML stack components<\/li>\n\n\n\n<li>\u2705 MLflow autologging works with scikit-learn, transformers, LightGBM, XGBoost<\/li>\n\n\n\n<li>\u2705 Datasets are current (actively maintained)<\/li>\n\n\n\n<li>\u2705 Easily integrates with PyTorch\/TF2\/ONNX for modern ML deployment<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udcc1 Modern MLflow Workflow: Overview<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Step<\/th><th>Action<\/th><\/tr><\/thead><tbody><tr><td>1\ufe0f\u20e3<\/td><td>Use HuggingFace <code>datasets<\/code> to load 
real-world data (e.g., <code>imdb<\/code>, <code>amazon_reviews<\/code>)<\/td><\/tr><tr><td>2\ufe0f\u20e3<\/td><td>Train a model using <code>scikit-learn<\/code>, <code>XGBoost<\/code>, or <code>transformers<\/code><\/td><\/tr><tr><td>3\ufe0f\u20e3<\/td><td>Use <code>Optuna<\/code> or <code>GridSearchCV<\/code> to tune hyperparameters<\/td><\/tr><tr><td>4\ufe0f\u20e3<\/td><td>Use <code>mlflow.autolog()<\/code> or <code>log_param<\/code>, <code>log_metric<\/code>, <code>log_model<\/code><\/td><\/tr><tr><td>5\ufe0f\u20e3<\/td><td>Register model in MLflow Registry<\/td><\/tr><tr><td>6\ufe0f\u20e3<\/td><td>Serve model using <code>mlflow models serve<\/code> or deploy to FastAPI<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">\u2705 Fresh Example: Sentiment Classification on IMDB (2026)<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">\u2705 Step 1: Install Modern Stack<\/h3>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs\">pip install mlflow datasets scikit-learn xgboost optuna matplotlib\n<\/code><\/span><\/pre>\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h3 class=\"wp-block-heading\">\u2705 Step 2: Full Code <code>train.py<\/code> (Latest Practice)<\/h3>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"PHP\" data-shcb-language-slug=\"php\"><span><code class=\"hljs language-php\">import mlflow\nimport mlflow.sklearn\nimport optuna\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.ensemble import RandomForestClassifier\nfrom sklearn.metrics import accuracy_score\nfrom datasets import load_dataset\nimport pandas <span class=\"hljs-keyword\">as<\/span> pd\n\n<span class=\"hljs-comment\"># Load modern dataset (HuggingFace)<\/span>\ndataset = load_dataset(<span class=\"hljs-string\">\"imdb\"<\/span>)\ndf = pd.DataFrame(dataset&#91;<span 
class=\"hljs-string\">\"train\"<\/span>])\ndf = df.sample(<span class=\"hljs-number\">5000<\/span>, random_state=<span class=\"hljs-number\">42<\/span>)  <span class=\"hljs-comment\"># Keep small for demo<\/span>\nX = df&#91;<span class=\"hljs-string\">\"text\"<\/span>]\ny = df&#91;<span class=\"hljs-string\">\"label\"<\/span>]\n\n<span class=\"hljs-comment\"># Feature extraction<\/span>\nfrom sklearn.feature_extraction.text import TfidfVectorizer\nX = TfidfVectorizer(max_features=<span class=\"hljs-number\">1000<\/span>).fit_transform(X)\n\n<span class=\"hljs-comment\"># Train\/test split<\/span>\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=<span class=\"hljs-number\">0.2<\/span>)\n\n<span class=\"hljs-comment\"># Track experiment<\/span>\nmlflow.set_tracking_uri(<span class=\"hljs-string\">\"http:\/\/127.0.0.1:5000\"<\/span>)\nmlflow.set_experiment(<span class=\"hljs-string\">\"IMDB Sentiment Classification\"<\/span>)\n\ndef objective(trial):\n    with mlflow.start_run():\n        n_estimators = trial.suggest_int(<span class=\"hljs-string\">\"n_estimators\"<\/span>, <span class=\"hljs-number\">10<\/span>, <span class=\"hljs-number\">200<\/span>)\n        max_depth = trial.suggest_int(<span class=\"hljs-string\">\"max_depth\"<\/span>, <span class=\"hljs-number\">3<\/span>, <span class=\"hljs-number\">20<\/span>)\n\n        clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth)\n        clf.fit(X_train, y_train)\n        preds = clf.predict(X_test)\n        acc = accuracy_score(y_test, preds)\n\n        mlflow.log_param(<span class=\"hljs-string\">\"n_estimators\"<\/span>, n_estimators)\n        mlflow.log_param(<span class=\"hljs-string\">\"max_depth\"<\/span>, max_depth)\n        mlflow.log_metric(<span class=\"hljs-string\">\"accuracy\"<\/span>, acc)\n        mlflow.sklearn.log_model(clf, <span class=\"hljs-string\">\"model\"<\/span>)\n\n        <span class=\"hljs-keyword\">return<\/span> acc\n\nstudy = 
optuna.create_study(direction=<span class=\"hljs-string\">\"maximize\"<\/span>)\nstudy.optimize(objective, n_trials=<span class=\"hljs-number\">5<\/span>)\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">PHP<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">php<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\ude80 Result:<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Real-world IMDB review dataset loaded from HuggingFace<\/li>\n\n\n\n<li>Parameters, accuracy, and model logged for each trial and visible in the MLflow UI<\/li>\n\n\n\n<li>Hyperparameter tuning with Optuna integrated into the tracked runs<\/li>\n\n\n\n<li>Model logged with each run and ready for serving<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udce1 Want to Serve This Model?<\/h2>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"HTML, XML\" data-shcb-language-slug=\"xml\"><span><code class=\"hljs language-xml\">mlflow models serve -m runs:\/<span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">run-id<\/span>&gt;<\/span>\/model -p 5001\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">HTML, XML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">xml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">\u2705 Final Note<\/h2>\n\n\n\n<p><strong>MLflow learning in 2026 should reflect today\u2019s stack<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>HuggingFace Datasets<\/li>\n\n\n\n<li>Optuna 
or Ray Tune<\/li>\n\n\n\n<li>Autologging and REST serving<\/li>\n\n\n\n<li>Pipelines and fast experiment iteration<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Let\u2019s Reset: The Right Way to Learn MLflow in 2026 \ud83d\udd25 Modern Use Case: End-to-End MLflow Workflow Using HuggingFace + scikit-learn + Optuna for Experiment Tracking and Deployment Use case:&#8230; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"_joinchat":[],"footnotes":""},"categories":[2],"tags":[],"class_list":["post-49781","post","type-post","status-publish","format-standard","hentry","category-uncategorised"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/49781","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=49781"}],"version-history":[{"count":5,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/49781\/revisions"}],"predecessor-version":[{"id":59011,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/49781\/revisions\/59011"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=49781"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=49781"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=49781"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}
","templated":true}]}}