Category
AI + Machine Learning
1. Introduction
Azure AI Custom Vision is an Azure AI + Machine Learning service for training custom image classification and object detection models using your own labeled images—without needing to build a full ML pipeline from scratch.
In simple terms: you upload images, tag what’s in them (or draw boxes around objects), train a model, test it, and then deploy it behind an API endpoint (or export it to run on edge devices). This is useful when “generic” computer vision doesn’t recognize your domain-specific items (your products, your parts, your defects, your brand packaging, your lab samples).
Technically, Azure AI Custom Vision provides managed training and prediction capabilities. You create a Custom Vision project, upload and label images, train “iterations,” evaluate metrics (precision/recall), and publish a model to a prediction endpoint. You can then call the endpoint from apps and automation, or export certain model types to run offline (for example, on mobile/IoT/edge) depending on the chosen domain and model type.
It solves the problem of building reliable, repeatable image recognition for specialized scenarios—like detecting defects in a manufacturing line or classifying product variants—without needing a dedicated ML platform team, GPU cluster management, or bespoke training scripts for every iteration.
Naming note (important): Microsoft has rebranded many “Cognitive Services” offerings under Azure AI services. The service is commonly referred to as Azure AI Custom Vision in current documentation and the portal experience, while older materials may call it “Custom Vision” or “Custom Vision Service.” This tutorial uses Azure AI Custom Vision as the primary, exact name and calls out legacy terms only when helpful for navigation.
2. What is Azure AI Custom Vision?
Official purpose: Azure AI Custom Vision helps you build custom computer vision models for image classification and object detection using labeled training images, and then deploy them for inference via API or export formats (where supported).
Core capabilities
- Image classification
- Multiclass (one label per image)
- Multilabel (multiple labels per image)
- Object detection
- Detect and localize objects with bounding boxes
- Model training and iteration management
- Train multiple iterations, compare performance, choose the best
- Evaluation metrics
- Precision, recall, and probability thresholds
- Deployment options
- Hosted prediction endpoint
- Model export for certain scenarios (for example, “compact” domains), when supported in your project configuration (verify availability in official docs/portal)
Major components
- Custom Vision resources in Azure
- Typically separated into:
- Training resource (to train models)
- Prediction resource (to host prediction endpoints)
- The exact resource “kind”/creation flow can vary by portal UX updates; verify in official docs if your portal differs.
- Custom Vision portal
- Web UI to create projects, upload images, label/tag, train, evaluate, and publish
- REST APIs / SDKs
- Programmatic training, dataset upload, iteration management, and prediction calls
Service type
- Managed AI service (PaaS-like): you don’t manage VMs/containers/GPUs for training in the typical hosted workflow.
- Can support edge/offline scenarios via export, depending on project/domain settings and supported formats.
Scope: regional vs global, and what’s “scoped” to what
- Azure resource scope: created in a specific Azure region and attached to a subscription and resource group.
- Project scope: Custom Vision projects are associated with your training resource and exist logically within the Custom Vision service. Access is controlled through portal access and resource keys, plus any supported Azure identity integrations (verify current auth options in official docs).
- Endpoints: prediction endpoints are regional and tied to the prediction resource.
How it fits into the Azure ecosystem
Azure AI Custom Vision is commonly used alongside:
- Azure Blob Storage (store training images, inference images, outputs)
- Azure Functions / Container Apps / App Service (build APIs and automation around model inference)
- Azure IoT Edge (edge inference when exporting models/containers is supported)
- Azure Monitor (operational monitoring of dependent app services; direct service metrics/logging depend on the resource capabilities; verify diagnostic support in your region/SKU)
- Azure DevOps / GitHub Actions (CI/CD for apps that consume the model; model lifecycle can be integrated using APIs)
Official docs entry point: https://learn.microsoft.com/azure/ai-services/custom-vision-service/
3. Why use Azure AI Custom Vision?
Business reasons
- Faster time-to-value than building a bespoke CV training pipeline.
- Lower barrier for teams without deep ML expertise.
- Domain specialization: recognize your specific SKUs, parts, packaging, or defect types.
- Iterate quickly with new training images as your environment changes.
Technical reasons
- Managed training workflow with iterations and built-in evaluation metrics.
- Simple deployment via hosted prediction API.
- Supports both classification and detection, covering many practical computer vision needs.
- SDK/REST automation for repeatable training and deployment.
Operational reasons
- Clear separation between training and inference (common enterprise requirement).
- Model versioning via iterations helps operationalize rollbacks and A/B testing (implemented by deploying different published iterations or different project endpoints).
- Reduced infra management compared to self-managed GPU training.
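Iteration-based versioning can back a simple canary rollout: publish the candidate iteration under a second name and split traffic deterministically. A minimal sketch in Python (the iteration names and ID scheme are hypothetical; in practice the chosen name goes into the prediction URL path):

```python
import hashlib

def choose_published_iteration(request_id: str, canary_name: str,
                               stable_name: str, canary_percent: int) -> str:
    """Deterministically route a request to the canary or stable published
    iteration by hashing a stable ID (device, user, or camera)."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return canary_name if bucket < canary_percent else stable_name

# The same ID always lands in the same bucket, so each device sees a
# consistent model version for the duration of the rollout.
name = choose_published_iteration("device-0042", "mugclassifier-v2",
                                  "mugclassifier-v1", 10)
```

Because routing is hash-based rather than random, you can reproduce which model served any given request when investigating a disagreement.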
Security/compliance reasons
- Runs within Azure’s compliance boundary for Azure AI services (always validate your specific compliance requirements against Microsoft’s official compliance documentation for your tenant/region).
- Supports standard Azure security patterns: resource keys/secrets, RBAC around resource management, and private networking options for some AI services (verify whether Private Link is supported for your exact Custom Vision resources and region).
Scalability/performance reasons
- Hosted inference scales based on the service capabilities/SKU (exact scaling behavior is SKU- and region-dependent).
- Export/edge options (when available) can reduce latency and bandwidth cost by running inference locally.
When teams should choose it
Choose Azure AI Custom Vision when:
- You need custom classification/detection for a narrow domain.
- You have labeled images (or can label them).
- You want managed training and easy API deployment.
- You prefer a productized workflow over building everything in Azure Machine Learning.
When teams should not choose it
Avoid or reconsider Azure AI Custom Vision when:
- You need advanced architectures, full control over training code, or complex multi-stage pipelines (consider Azure Machine Learning).
- You require segmentation (pixel-level masks) rather than bounding boxes and labels (Custom Vision focuses on classification/detection; verify the current feature set if segmentation is required).
- Your environment needs strict offline-only operation with guaranteed export format availability (export support depends on project settings/domains; validate before committing).
- You have extremely large-scale datasets requiring bespoke data engineering and distributed training control.
4. Where is Azure AI Custom Vision used?
Industries
- Manufacturing (quality inspection, defect detection)
- Retail/e-commerce (product recognition, shelf compliance)
- Logistics (package type identification, label presence checks)
- Healthcare/life sciences (lab sample classification—subject to regulatory constraints)
- Agriculture (crop disease classification, pest detection)
- Construction (PPE detection, site safety checks)
- Automotive (part verification, damage detection)
- Security (object detection for restricted items—ensure lawful, ethical use)
Team types
- Application development teams integrating AI into products
- DevOps/platform teams operationalizing endpoints and deployments
- Data labeling teams supporting model improvements
- Innovation/PoC teams validating feasibility
- SRE/operations teams monitoring production services
Workloads and architectures
- Mobile apps calling hosted prediction endpoints
- Edge camera systems performing local inference (if export supported)
- Server-side batch inference for image archives (e.g., nightly classification)
- Real-time inspection pipelines connected to camera feeds (often via an intermediate service that extracts frames and calls the prediction API)
Real-world deployment contexts
- Production: stable iteration publishing, controlled access keys, monitoring/alerting, staged rollouts
- Dev/test: frequent retraining, experimental tagging strategies, limited images, free tier where possible
5. Top Use Cases and Scenarios
Below are realistic scenarios where Azure AI Custom Vision is a good fit.
1) Manufacturing defect detection (object detection)
- Problem: Identify scratches, dents, missing components on a production line.
- Why this service fits: Object detection supports locating defects/parts; iterative training improves accuracy over time.
- Example: A camera takes images of assembled units; a service calls the Custom Vision prediction endpoint to flag missing screws.
2) Product variant classification (multiclass classification)
- Problem: Distinguish between visually similar SKUs (size, label, color variants).
- Why this service fits: Custom classification learns subtle visual differences from your real product images.
- Example: Warehouse app classifies a box as “Model A / Model B / Model C” to reduce mis-picks.
3) Shelf compliance in retail (object detection)
- Problem: Detect if required products are present on a shelf and placed correctly.
- Why this service fits: Object detection with bounding boxes can identify items and count them.
- Example: Field staff use a mobile app to photograph shelves; backend checks planogram compliance.
4) PPE detection for safety checks (object detection)
- Problem: Detect helmets/vests/gloves in controlled work zones.
- Why this service fits: Object detection works well when PPE items are visually distinct and images represent real site conditions.
- Example: Site gate camera triggers an alert if a worker is missing a helmet.
5) Food sorting and grading (classification)
- Problem: Classify produce into grades (A/B/C) based on appearance.
- Why this service fits: Custom image classifiers can learn domain-specific visual cues.
- Example: A packing facility classifies apples into grade bins based on camera snapshots.
6) Document/photo triage (classification)
- Problem: Sort incoming photos into categories (receipt, invoice, ID, other).
- Why this service fits: Quick custom classification can route content to specialized downstream processors.
- Example: A customer support portal sorts attachments before sending them to OCR or human review.
7) Brand/logo detection (object detection)
- Problem: Detect whether a logo is present and where it appears.
- Why this service fits: Object detection finds logos even when small/rotated (within limits of training data).
- Example: Marketing team checks if partners used correct logo placement in store photos.
8) Equipment state recognition (classification)
- Problem: Identify if a machine is “on/off/error” based on indicator lights.
- Why this service fits: Custom classification learns indicator patterns from your exact device models.
- Example: A maintenance app classifies panel images to create incident tickets automatically.
9) Waste sorting assistance (classification/detection)
- Problem: Identify recyclable vs non-recyclable items (or detect contaminants).
- Why this service fits: A custom model trained on local waste streams improves relevance.
- Example: Kiosk app helps users sort items by photographing them.
10) Visual regression testing in software/hardware QA (classification)
- Problem: Detect changes in UI screenshots or physical assembly images.
- Why this service fits: Custom classification can flag “expected” vs “unexpected” states with curated training sets.
- Example: QA pipeline classifies screenshots into “pass/fail” categories for fast triage (note: specialized visual diff tools may be better; Custom Vision can complement).
11) Species/pest detection for agriculture (object detection)
- Problem: Detect specific pests on sticky traps or leaves.
- Why this service fits: Object detection for small targets works if images are high-quality and labeled carefully.
- Example: Field team uploads trap images; system counts pests per trap.
12) Packaging integrity checks (detection)
- Problem: Detect missing seals, misapplied labels, or damaged packaging.
- Why this service fits: Object detection can confirm presence/position of expected elements.
- Example: Automated line stops when “seal_missing” is detected above a threshold.
6. Core Features
Feature availability can vary by region/SKU and may evolve. Validate in the Azure portal and official docs.
6.1 Image classification (multiclass and multilabel)
- What it does: Predicts category labels for an entire image (one label or multiple).
- Why it matters: Many real problems are “what is this image?” rather than “where is the object?”
- Practical benefit: Simple labeling (tags) and fast iteration.
- Limitations/caveats: Performance depends heavily on representative training data (lighting, background, device camera differences).
6.2 Object detection with bounding boxes
- What it does: Detects objects and returns bounding boxes plus probabilities.
- Why it matters: Enables localization tasks (counting, presence checks, compliance).
- Practical benefit: Supports automation like “if missing part detected then fail.”
- Limitations/caveats: Requires more labeling effort; small objects and heavy occlusion can reduce accuracy.
6.3 Project-based workflow (portal + APIs)
- What it does: Organizes datasets, tags, iterations, and deployments in a single project.
- Why it matters: Improves repeatability and collaboration.
- Practical benefit: Easier lifecycle management than ad-hoc scripts alone.
- Limitations/caveats: Dataset governance and export/import processes should be planned for enterprise workflows (don’t treat the portal as your only “source of truth”).
6.4 Training iterations and publishing
- What it does: Train multiple model iterations; publish a chosen iteration to a prediction endpoint.
- Why it matters: Safe rollouts and easy rollback.
- Practical benefit: Keep a “known good” published iteration while experimenting.
- Limitations/caveats: Versioning semantics are tied to iterations/published names; you must document and automate your promotion process.
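Once you export each iteration's metrics via the training API, a promotion step like the one described above is easy to automate. A hedged sketch of a promotion gate (the metric dictionaries and thresholds are illustrative, not a service API):

```python
def should_promote(candidate: dict, baseline: dict,
                   min_gain: float = 0.0, max_recall_drop: float = 0.01) -> bool:
    """Gate an iteration promotion: require precision at least as good as the
    currently published baseline, and tolerate only a tiny recall regression."""
    precision_ok = candidate["precision"] >= baseline["precision"] + min_gain
    recall_ok = candidate["recall"] >= baseline["recall"] - max_recall_drop
    return precision_ok and recall_ok
```

Wiring this into CI lets you refuse to publish an iteration that silently trades recall for precision.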
6.5 Evaluation metrics and threshold tuning
- What it does: Provides precision/recall and probability thresholds for predictions.
- Why it matters: You can tune for fewer false positives vs fewer false negatives.
- Practical benefit: Aligns model behavior with business risk (e.g., safety checks prefer fewer false negatives).
- Limitations/caveats: Offline metrics may not match production performance if your real images drift from training distribution.
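Threshold tuning is easiest to reason about as a concrete calculation. Given per-image probabilities and ground-truth labels for one tag, precision and recall at a candidate threshold can be computed like this (a self-contained sketch, not a service API):

```python
def precision_recall_at(threshold: float, scored: list) -> tuple:
    """scored: list of (probability, is_actually_positive) pairs for one tag.
    Predictions at or above the threshold count as positive."""
    tp = sum(1 for p, pos in scored if p >= threshold and pos)
    fp = sum(1 for p, pos in scored if p >= threshold and not pos)
    fn = sum(1 for p, pos in scored if p < threshold and pos)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Sweeping the threshold over a held-out set shows the precision/recall trade-off directly; a safety check would pick the threshold that keeps recall high even at the cost of precision.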
6.6 REST APIs and SDK support
- What it does: Programmatically upload images, train models, manage iterations, and run inference.
- Why it matters: Enables automation and CI/CD-style retraining workflows.
- Practical benefit: Repeatable pipelines; fewer manual portal steps.
- Limitations/caveats: API versions and endpoints can change; pin SDK versions and follow official version guidance.
6.7 Export / edge deployment (supported projects only)
- What it does: Exports trained models to formats suitable for running outside Azure (availability depends on the chosen domain/model settings).
- Why it matters: Low latency, offline inference, reduced network cost.
- Practical benefit: Run inference on factory floor devices, mobile, or IoT.
- Limitations/caveats: Export is not always available for all domains/project types; validate early. Exported models may require device-specific optimization and careful update management.
6.8 Separation of training and prediction resources
- What it does: Training and hosted prediction typically use different Azure resources.
- Why it matters: Security boundary and cost management (training is spiky; prediction is steady).
- Practical benefit: Lock down training keys; scale prediction independently.
- Limitations/caveats: Requires planning for resource creation, key management, and region selection.
7. Architecture and How It Works
High-level architecture
At a high level, Azure AI Custom Vision has two primary workflows:
- Training workflow
  - You upload labeled images to a project
  - The service trains an iteration
  - You evaluate and optionally publish the iteration
- Prediction workflow
  - Your app sends an image to the prediction endpoint
  - The service returns predicted labels (classification) or bounding boxes (detection)
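The prediction endpoint returns JSON containing a predictions array with tag names and probabilities (the real payload carries additional fields such as tag IDs and, for detection, bounding boxes; check the API reference for the full schema). A minimal sketch of extracting the top prediction from such a response:

```python
import json

# A response in roughly the shape the prediction endpoint returns,
# trimmed to the fields this sketch uses.
sample = json.loads("""
{
  "predictions": [
    {"tagName": "Mug", "probability": 0.97},
    {"tagName": "NotMug", "probability": 0.03}
  ]
}
""")

def top_prediction(response: dict) -> tuple:
    """Return (tagName, probability) for the highest-probability tag."""
    best = max(response["predictions"], key=lambda p: p["probability"])
    return best["tagName"], best["probability"]
```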
Request/data/control flow (typical)
- Data plane (images)
- Training: image upload + tags/boxes
- Inference: image sent to prediction endpoint
- Control plane
- Resource management (create training/prediction resources, manage keys)
- Project/iteration management via portal/API
Integrations with related services
Common patterns:
- Blob Storage: source of images; apps fetch from storage and send to the endpoint
- Functions/Container Apps: lightweight API layer, preprocessing (resize/compress), auth, rate limiting
- Event Grid: trigger inference when new blobs arrive
- Key Vault: store training/prediction keys
- App Insights/Azure Monitor: request tracing/metrics for your app tier that calls Custom Vision
Dependency services
- Azure resource group, networking, identity, logging (for your app)
- Custom Vision training + prediction resources
Security/authentication model (practical view)
- Prediction calls commonly use:
- Prediction Key header + endpoint URL (key-based auth)
- Training calls commonly use:
- Training Key header + training endpoint URL
- Access to create/manage resources uses:
- Azure RBAC (portal/ARM)
- Microsoft Entra ID (formerly Azure AD) authentication support varies across Azure AI services and endpoints; verify in official docs for your current requirements.
Networking model
- Hosted endpoints are public by default.
- Some Azure AI services support private networking via Azure Private Link; availability for Custom Vision can vary by region/SKU and resource type. Verify in official docs before designing a private-only architecture.
Monitoring/logging/governance
- Your calling application should log:
- request IDs, timestamps, image source, model version (published iteration name), latency, top predictions, confidence
- Use governance practices:
- resource naming conventions
- tags for cost center/environment/owner
- key rotation
- periodic access review
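A sketch of the per-call log record suggested above (field names are illustrative; adapt them to your logging pipeline):

```python
import datetime
import uuid

def make_inference_log(image_uri: str, published_name: str,
                       predictions: list, latency_ms: float) -> dict:
    """Build one structured log record per prediction call, capturing the
    fields that keep results auditable and reproducible."""
    return {
        "request_id": str(uuid.uuid4()),
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "image_source": image_uri,
        "model_version": published_name,  # the published iteration name
        "latency_ms": latency_ms,
        "top_predictions": sorted(predictions,
                                  key=lambda p: p["probability"],
                                  reverse=True)[:3],
    }
```

Recording the published iteration name with every prediction is what makes later questions like "which model produced this result?" answerable.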
Simple architecture diagram (Mermaid)
flowchart LR
U[User / Camera / App] --> A[App Service / Function]
A -->|HTTPS + Prediction-Key| CVP["Azure AI Custom Vision<br/>Prediction Endpoint"]
A --> B["Blob Storage<br/>(Optional: store images)"]
CVP --> A
Production-style architecture diagram (Mermaid)
flowchart TB
subgraph EdgeOrClient["Edge/Client"]
C[Camera / Mobile App]
end
subgraph Azure["Azure Subscription"]
EG["Event Grid (optional)"]
ST[(Azure Blob Storage)]
F["Azure Functions / Container Apps<br/>Preprocess + Auth + Rate Limit"]
KV[Azure Key Vault]
MON[Application Insights / Azure Monitor]
CVT["Azure AI Custom Vision<br/>Training Resource"]
CVP["Azure AI Custom Vision<br/>Prediction Resource"]
DEV["DevOps Pipeline<br/>(GitHub Actions/Azure DevOps)"]
end
C -->|Upload| ST
ST -->|Blob Created Event| EG
EG --> F
F -->|Fetch image| ST
F -->|Get secrets| KV
F -->|"Predict (HTTPS)"| CVP
F --> MON
DEV -->|Automate training via API| CVT
DEV -->|Publish iteration| CVT
CVT -->|Publishes to| CVP
8. Prerequisites
Account/subscription requirements
- An active Azure subscription with billing enabled.
- Ability to create resources in a resource group.
Permissions / IAM roles
You typically need:
- Contributor on the target resource group (to create Custom Vision resources)
- Or more restrictive:
  - Cognitive Services Contributor (or equivalent) for resource creation
  - Reader for auditors/observers
- For secret storage:
  - Key Vault access roles (for example, Key Vault Secrets Officer/Secrets User depending on your access model)
Billing requirements
- Pay-as-you-go or enterprise agreement subscription.
- If using free tier, understand its limits (projects, training time, transactions)—confirm on the pricing page.
Tools needed
For the hands-on lab, you can complete everything in the portal. Optional tools:
- Azure CLI: https://learn.microsoft.com/cli/azure/install-azure-cli
- curl (for quick endpoint tests)
- Python 3.10+ (for sample code)
- A way to gather images (a phone camera is fine)
Region availability
- Custom Vision resources are regional.
- Choose a region close to your users/cameras for latency and data residency.
- Always verify supported regions in official docs and the Azure portal during resource creation.
Quotas/limits
Limits commonly exist around:
- Number of projects
- Images per project
- Training time/iterations
- Transactions per second / rate limits
- Image size constraints
These change over time; verify in official docs and in your subscription quota views where applicable.
Prerequisite services (recommended)
- Azure Key Vault for storing keys
- Azure Blob Storage for storing training/inference images (optional but recommended for traceability)
- A minimal compute tier (Functions/Container Apps/App Service) to call prediction endpoints from a controlled backend rather than directly from clients
9. Pricing / Cost
Azure AI Custom Vision pricing is usage-based and depends on SKU/region. Do not lock a solution design until you confirm prices for your region and billing agreement.
Official pricing page (start here):
https://azure.microsoft.com/pricing/details/cognitive-services/custom-vision-service/
(If the URL redirects under Azure AI services branding, follow the official redirect.)
Azure Pricing Calculator:
https://azure.microsoft.com/pricing/calculator/
Pricing dimensions (typical)
Common cost meters include:
- Training: charged based on training compute/units (often time-based or per training unit)
- Prediction (hosted inference): charged per number of prediction transactions (often per 1,000 transactions), with different rates for classification vs detection in some pricing models
- Resource SKUs: free tier vs Standard tiers (naming varies; confirm current SKUs in the portal)
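Because hosted prediction is typically metered per 1,000 transactions, a back-of-the-envelope estimate is simple arithmetic. A sketch with placeholder rates (take real prices for your region and SKU from the pricing page):

```python
def monthly_prediction_cost(transactions_per_month: int,
                            price_per_1000: float,
                            free_transactions: int = 0) -> float:
    """Rough monthly estimate for hosted prediction. The rate and free
    allowance are placeholders, not actual Azure prices."""
    billable = max(0, transactions_per_month - free_transactions)
    return (billable / 1000) * price_per_1000

# Example: 250,000 calls/month at a hypothetical $2 per 1,000 transactions
# is 250 * 2 = $500 before any free allowance.
cost = monthly_prediction_cost(250_000, 2.0)
```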
Free tier (if applicable)
Azure AI services often offer a free tier with:
- A limited number of transactions per month
- Limited training capacity/projects
This changes; verify the current free-tier limits on the official pricing page.
Primary cost drivers
- Number of prediction calls (and whether you’re doing detection vs classification)
- Size/complexity of images (affects bandwidth and sometimes latency; cost is usually per transaction, but upstream costs can rise)
- Frequency of retraining (especially if training is charged)
- Environment split: dev/test/prod resources
Hidden or indirect costs
- Storage (Blob Storage for images and results)
- Data transfer
- Upload bandwidth from edge to Azure
- Egress costs if you move results across regions or out of Azure
- Compute hosting for your app layer (Functions/Container Apps/App Service)
- Key Vault operations (usually small, but measurable at scale)
- Human labeling time (often the biggest real cost)
Network/data transfer implications
- If cameras upload large images frequently, bandwidth can dominate. Consider:
- resizing/compressing before upload
- sending only necessary frames (sampling)
- moving inference closer to the edge (export-supported scenarios)
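Frame sampling is the simplest of these levers. A sketch of selecting every Nth frame from a feed rather than running inference on all of them (the rate is illustrative):

```python
def sample_frames(total_frames: int, every_nth: int) -> list:
    """Select every Nth frame index; with every_nth=30 on a 30 fps feed,
    you run inference once per second instead of 30 times."""
    return list(range(0, total_frames, every_nth))

# 10 frames sampled at every 3rd frame -> indices 0, 3, 6, 9
indices = sample_frames(10, 3)
```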
How to optimize cost
- Start with classification if detection is not necessary.
- Use sampling (don’t run inference on every frame in a video feed).
- Implement confidence thresholds and only escalate uncertain cases to humans.
- Cache results for repeated images (where appropriate).
- Separate dev/test from prod and apply budgets/alerts.
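The "escalate uncertain cases to humans" rule above can be a small triage function. The thresholds here are illustrative and should be tuned per scenario against your own precision/recall requirements:

```python
def triage(prediction: dict, accept_threshold: float = 0.85,
           reject_threshold: float = 0.30) -> str:
    """Auto-accept confident predictions, auto-reject very low ones, and
    escalate only the uncertain middle band to human review."""
    p = prediction["probability"]
    if p >= accept_threshold:
        return "auto-accept"
    if p < reject_threshold:
        return "auto-reject"
    return "human-review"
```

Narrowing the human-review band as the model improves is a direct way to convert retraining effort into lower labeling cost.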
Example low-cost starter estimate (conceptual)
A typical low-cost learning setup:
- 1 training resource + 1 prediction resource (free tier if available)
- A few hundred prediction calls for testing
- Minimal storage (a few hundred images)
Because exact numbers vary, compute an estimate by entering expected monthly prediction transactions and any training units into the Pricing Calculator.
Example production cost considerations (conceptual)
For production, quantify:
- peak and average predictions per second
- expected monthly transactions
- retraining cadence (weekly/monthly)
- image sizes and upload patterns
Then:
- set Azure budgets
- add alerts
- run a short load test to confirm latency and any throttling behavior
10. Step-by-Step Hands-On Tutorial
This lab builds and deploys a basic image classification model using the Azure AI Custom Vision portal, then calls it via HTTP.
Objective
Create an Azure AI Custom Vision classification project that distinguishes between two categories (example: “Mug” vs “NotMug”), train an iteration, publish it, and call the prediction endpoint with curl.
Lab Overview
You will:
1. Create Azure resources (Training + Prediction).
2. Create a Custom Vision project.
3. Upload and tag images.
4. Train and evaluate a model iteration.
5. Publish the model to an endpoint.
6. Call the endpoint and interpret results.
7. Clean up resources to avoid ongoing charges.
Data note: Use your own photos (recommended). For best results, collect images under realistic conditions (lighting/background/angles) similar to production.
Step 1: Create Azure AI Custom Vision resources (Training and Prediction)
You can do this in the Azure portal.
- Go to the Azure portal: https://portal.azure.com
- Create a Resource group (if you don't have one):
  - Search Resource groups → Create
  - Name: rg-customvision-lab
  - Region: choose a region that supports Custom Vision (the portal will show availability)
- Create the Training resource:
  - Search Custom Vision (or "Azure AI services" → find Custom Vision)
  - Name: cvtrain<unique>
  - Select an appropriate SKU (free tier if available; otherwise a standard tier)
- Create the Prediction resource:
  - Name: cvpred<unique>
  - Choose the same region if possible (simplifies latency and compatibility)
Expected outcome: You have two Azure resources created: one for training, one for prediction.
Verification:
- In the portal, open each resource and confirm:
  - Status is healthy/ready
  - You can see Keys and Endpoint (exact blade naming can vary)
Step 2: Open the Custom Vision portal and create a project
- Open the Custom Vision portal: https://www.customvision.ai/
- Sign in with the same identity you use for Azure.
- Select New Project.
- Configure the project:
  - Name: cv-mug-classifier
  - Resource: select your training resource (cvtrain...)
  - Project Types: choose Classification
  - Classification Types: choose Multiclass (one label per image)
  - Domain: start with General (or a similar general-purpose domain shown in the portal)
Domain options can change. If you plan to export to edge, you may need a “compact” domain. Pick the domain based on your deployment requirements and verify export support early.
Expected outcome: A new, empty classification project is created.
Verification: You should see the project dashboard with tabs like Training Images, Tags, Train, and Performance (names may vary slightly).
Step 3: Collect and upload training images
Custom Vision needs only a small minimum number of images per tag to start training (historically about 5 per tag for classification; verify current minimums in the docs), but useful models require substantially more.
- Create two tags:
  - Tag 1: Mug
  - Tag 2: NotMug
- Gather images:
  - Mug: 15–30 photos of mugs (different mugs, angles, backgrounds)
  - NotMug: 15–30 photos of other objects (bottles, bowls, plates, phone, laptop, etc.)
- Upload images:
  - In Training Images → Add images
  - Upload your mug images and assign the Mug tag
  - Upload your non-mug images and assign the NotMug tag
Expected outcome: Images appear in the project and each image is tagged.
Verification:
- Use filtering by tag to confirm you have images under both tags.
- Make sure tags are not imbalanced (for the lab, keep counts roughly similar).
Step 4: Train your first iteration
- Click Train.
- Choose the training option offered (for example “Quick training” if present).
- Start training and wait for completion.
Expected outcome: Training finishes and you get an iteration with performance metrics.
Verification:
- Open the Performance view.
- Confirm you see metrics like precision/recall per tag and overall.
If your metrics are poor, don’t panic. For a lab model, you are validating the workflow. You can improve accuracy by adding more representative images and reducing label noise.
Step 5: Quick test the model in the portal
- Use the portal’s Quick Test feature.
- Upload a new image (not part of training):
  - A mug image → should predict Mug with higher probability
  - A non-mug image → should predict NotMug with higher probability
Expected outcome: The model returns a predicted tag and probability.
Verification: Confirm the top prediction matches the image content at least most of the time.
Step 6: Publish the iteration to the prediction resource
- In the iteration view, click Publish.
- Choose:
  - Prediction resource: select your prediction resource (cvpred...)
  - Published name: mugclassifier-v1
Expected outcome: The iteration is now published and available via a hosted prediction endpoint.
Verification:
- Look for a "Published" indicator on the iteration.
- In the portal, open Prediction URL (or similar) to see example request formats.
Step 7: Call the prediction endpoint with curl
This step uses the Prediction URL info provided by the portal so you don’t have to guess endpoints.
- In the Custom Vision portal, open Prediction URL for your published iteration.
- Copy:
  - The endpoint URL
  - The required header name and value (usually Prediction-Key: <your key>)
  - The correct path for classification (commonly includes /classify/iterations/<publishedName>/image)
- Run a curl call using a local test image.
Example (pattern only—use your exact URL and key from the portal):
curl -X POST "<PREDICTION_URL_FROM_PORTAL>" \
-H "Prediction-Key: <YOUR_PREDICTION_KEY>" \
-H "Content-Type: application/octet-stream" \
--data-binary "@./test-mug.jpg"
Expected outcome: A JSON response with predictions and probabilities.
Verification:
- Confirm the top prediction is Mug for a mug image.
- Save the response JSON for troubleshooting and auditing.
Step 8: (Optional) Improve the model with new images and retrain
- Add 10–20 more images for whichever class is underperforming.
- Retrain to create a new iteration.
- Compare metrics against the published iteration.
- Publish the new iteration as mugclassifier-v2 (or overwrite the same published name if your release process allows; be deliberate).
Expected outcome: A better-performing iteration is available.
Verification:
– Run the same curl tests and compare probabilities.
Validation
You have successfully completed this lab if:
- Two Azure resources exist (training + prediction).
- A Custom Vision project exists with at least two tags and tagged images.
- At least one iteration is trained and published.
- A curl request to the published endpoint returns predictions and the results are directionally correct.
Troubleshooting
Common issues and fixes:
- 401 Unauthorized / 403 Forbidden – Cause: wrong key, wrong header name, or using a training key on the prediction endpoint. – Fix: use the Prediction URL panel and copy the exact header and endpoint; ensure you are using the prediction resource key.
- 404 Not Found – Cause: incorrect project/iteration path or wrong region endpoint. – Fix: don’t hand-construct the URL; use the portal’s generated prediction URL.
- Model predicts the same label for everything – Cause: too few training images, unbalanced tags, or labels too similar/noisy. – Fix: add more diverse images, balance classes, remove mislabeled images, retrain.
- Poor performance in real photos – Cause: training images not representative (lighting, background, camera). – Fix: collect training images in real deployment conditions; include “hard negatives” (objects similar to mugs).
- Throttling / rate limits – Cause: high request rate. – Fix: add retry with backoff in your app; consider batching/sampling; check SKU limits.
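The retry-with-backoff fix can be sketched in Python. The `flaky_endpoint` function and the `RuntimeError` are stand-ins for your real prediction call and whatever exception your HTTP client raises on a 429:

```python
import random
import time

def call_with_backoff(fn, max_retries=5, base_delay=0.5):
    """Call fn(); on a throttling-style error, retry with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError:  # stand-in for a 429/5xx raised by your HTTP client
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Demo: an endpoint that throttles twice, then succeeds
calls = {"n": 0}
def flaky_endpoint():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return {"predictions": []}

result = call_with_backoff(flaky_endpoint, base_delay=0.01)
```

In production, catch the specific throttling exception your client raises rather than a broad `RuntimeError`, and honor any `Retry-After` header the service returns.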
Cleanup
To avoid ongoing costs:
1. In Azure portal, delete the resource group:
– rg-customvision-lab → Delete resource group
2. Confirm deletion removes:
– Training resource
– Prediction resource
– Any associated resources you created for the lab
If you want to keep the resource group, delete only the Custom Vision resources you created.
11. Best Practices
Architecture best practices
- Put an API/service layer between clients and Custom Vision:
  - centralizes auth, throttling, logging, and routing
- Store images in Blob Storage and pass references internally:
  - keep audit trails and enable reprocessing
- Use event-driven designs for batch/async inference:
  - Event Grid → Functions → Prediction
- Plan model lifecycle:
  - naming conventions for published iterations (e.g., `productdetector-prod`, `productdetector-canary`)
  - rollback strategy (keep prior iteration published)
IAM/security best practices
- Store keys in Azure Key Vault, not in code or client apps.
- Rotate keys regularly and on staff changes.
- Restrict who can train/publish vs who can only call prediction endpoints.
- Use least-privilege RBAC for resource management.
Cost best practices
- Use free tier for learning if available and within limits.
- Compress/resize images before inference when acceptable.
- Reduce inference volume with sampling and business rules.
- Separate dev/test and prod resources; use budgets and alerts.
Performance best practices
- Keep inference calls close to the endpoint region.
- Use timeouts and retries with exponential backoff.
- Benchmark with production-like payload sizes.
- Consider edge export (if supported) for low-latency/high-volume camera environments.
Reliability best practices
- Implement circuit breakers in your app tier.
- Cache results for idempotent operations when appropriate.
- Have a “degraded mode” if prediction endpoint is temporarily unavailable.
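The circuit-breaker and degraded-mode ideas can be combined in a minimal sketch. The class name and the `RuntimeError` trigger are illustrative, not from any Azure SDK:

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; route to a fallback until `reset_after` elapses."""
    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()          # degraded mode: skip the remote call entirely
            self.opened_at = None          # half-open: allow one trial call
        try:
            result = fn()
            self.failures = 0
            return result
        except RuntimeError:               # stand-in for your client's endpoint error
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            return fallback()

# Demo: a dead endpoint trips the breaker after two failures
breaker = CircuitBreaker(threshold=2, reset_after=60)
def failing():
    raise RuntimeError("endpoint down")
def fallback():
    return "degraded"

results = [breaker.call(failing, fallback) for _ in range(3)]
```

After the second failure the breaker opens, so the third call never touches the endpoint—that gap is where your degraded-mode logic (queue for later, return cached result, flag for human review) lives.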
Operations best practices
- Log model version (published iteration name) on every prediction.
- Capture a sample of inputs/outputs for monitoring drift (ensure privacy compliance).
- Use dashboards for:
- request volume
- latency
- error rate
- confidence distribution (signals drift)
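The confidence-distribution drift signal can be tracked with a simple sliding-window sketch; the baseline, window size, and tolerance values here are illustrative and should be tuned against your own gold test set:

```python
from collections import deque
from statistics import mean

class ConfidenceDriftMonitor:
    """Flag drift when mean top-1 confidence over a sliding window drops below baseline - tolerance."""
    def __init__(self, baseline, window=100, tolerance=0.10):
        self.baseline = baseline
        self.tolerance = tolerance
        self.window = deque(maxlen=window)

    def record(self, confidence):
        self.window.append(confidence)

    def drifted(self):
        if len(self.window) < self.window.maxlen:
            return False  # wait for a full window before judging
        return mean(self.window) < self.baseline - self.tolerance

# Demo: baseline 0.90; a run of low-confidence predictions signals drift
monitor = ConfidenceDriftMonitor(baseline=0.90, window=5)
for c in [0.62, 0.60, 0.58, 0.61, 0.59]:
    monitor.record(c)
```

Wire `drifted()` into your dashboard/alerting so a sustained confidence drop prompts a review of recent inputs and, if needed, a retraining cycle.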
Governance/tagging/naming best practices
- Resource naming: include env/region/app
  - Example: `cvpred-prod-weu-inventory`
- Tag resources:
  - `env=dev|test|prod`
  - `owner=team`
  - `costCenter=...`
  - `dataClass=public|internal|confidential`
12. Security Considerations
Identity and access model
- Resource management: controlled by Azure RBAC (who can create/delete resources, access keys).
- Service access: commonly via API keys (training and prediction keys).
- Some Azure AI services support Azure AD token auth in certain scenarios; for Custom Vision, verify current auth support in official docs for your API version.
Encryption
- Data in transit: HTTPS endpoints.
- Data at rest: managed by Azure AI services platform. For strict requirements (CMK, customer-managed keys), verify support for your exact resource type/SKU/region in official docs.
Network exposure
- By default, prediction endpoints are public.
- If you require private-only connectivity:
- Check whether Custom Vision supports Private Link for your resource types and region (verify in official docs).
- If not supported, mitigate by restricting access through your own backend and network controls.
Secrets handling
- Put keys in Key Vault.
- Avoid embedding prediction keys in mobile apps or browser code.
- Use managed identity from your app tier to retrieve keys from Key Vault.
Audit/logging
- Log:
- who published a model (change management)
- when keys were rotated
- prediction requests (at least metadata and outcomes)
- For Azure resource changes, use Azure Activity Log and consider exporting to Log Analytics/SIEM.
Compliance considerations
- Review data residency and privacy requirements for images.
- Implement retention policies and access controls for stored images.
- If images include people or sensitive content, perform privacy impact assessment and legal review.
Common security mistakes
- Sending prediction keys directly to clients.
- No key rotation plan.
- Storing training images without access controls.
- No audit trail for model iteration promotion to production.
Secure deployment recommendations
- Front prediction calls with a backend service.
- Use Key Vault + managed identity for secrets.
- Add request validation (size/type limits) to prevent abuse.
- Maintain separate resources for dev/test/prod.
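Request validation in your backend might look like the sketch below. `MAX_BYTES` is a hypothetical app-level cap, not the service's documented limit—check current payload constraints in the official docs:

```python
MAX_BYTES = 4 * 1024 * 1024  # hypothetical app-level cap; confirm service limits in official docs

# Magic-byte prefixes for the formats this backend chooses to accept
MAGIC_PREFIXES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
}

def validate_image(data: bytes):
    """Reject empty, oversized, or non-JPEG/PNG payloads before forwarding to the endpoint."""
    if not data:
        return False, "empty payload"
    if len(data) > MAX_BYTES:
        return False, "payload too large"
    for prefix, fmt in MAGIC_PREFIXES.items():
        if data.startswith(prefix):
            return True, fmt
    return False, "unsupported format"
```

Checking magic bytes rather than trusting the client's `Content-Type` header blocks a common abuse path (arbitrary payloads relayed to your paid endpoint).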
13. Limitations and Gotchas
Because capabilities evolve, treat these as planning prompts and confirm current constraints in official docs.
- Export support varies by project type/domain and may not be available for every model.
- Labeling quality dominates results:
- mislabeled images can ruin accuracy
- inconsistent bounding boxes reduce detection performance
- Data drift is common:
- new packaging designs, lighting changes, camera upgrades can degrade accuracy
- Rate limits/throttling can occur under load:
- plan retries and backoff
- Class imbalance leads to biased predictions:
- keep training sets balanced where feasible
- Regional availability:
- not every Azure region supports every Azure AI service resource
- Privacy/security requirements:
- images may be sensitive; plan storage, retention, and access carefully
- Testing on “easy” images misleads:
- always test with production-like images
- Operational coupling:
- if you overwrite a published name without change control, you can break downstream assumptions
- API version differences:
- the prediction URL format and endpoints depend on API versions; use portal-generated URLs and official SDKs
14. Comparison with Alternatives
Within Azure
- Azure AI Vision (prebuilt Image Analysis): best for generic tagging, captions, and OCR; less ideal for your proprietary categories.
- Azure Machine Learning: best for full control and advanced ML workflows (custom architectures, MLOps pipelines), but higher complexity.
Other clouds
- AWS Rekognition Custom Labels: similar concept—custom image models with managed training/inference.
- Google Cloud Vertex AI (AutoML Vision): similar managed custom vision training.
Open-source/self-managed
- YOLO (e.g., Ultralytics YOLOv8 or later) for detection; ResNet/EfficientNet/ViT for classification, trained in PyTorch/TensorFlow.
- Strength: full control and potentially lower per-inference cost at scale.
- Weakness: you manage training infra, deployment, monitoring, and security.
Comparison table
| Option | Best For | Strengths | Weaknesses | When to Choose |
|---|---|---|---|---|
| Azure AI Custom Vision | Quick custom classification/detection with managed workflow | Fast setup, portal labeling, iterations, hosted prediction API, some export options | Less control than full ML platforms; export/domain constraints; needs representative images | You want managed custom vision without building full ML infrastructure |
| Azure AI Vision (prebuilt) | General image understanding | No training needed, quick integration | Not tailored to your proprietary classes | Your problem matches generic concepts (objects/scenes/text) |
| Azure Machine Learning | Advanced/custom ML at scale | Full control, MLOps, custom training code, GPU options | Higher complexity and operational burden | You need custom architectures, pipelines, governance, or multi-modal ML |
| AWS Rekognition Custom Labels | Similar managed custom vision on AWS | Integrated with AWS ecosystem | Cloud lock-in; different pricing and workflows | Your workloads are primarily on AWS |
| Google Vertex AI AutoML Vision | Similar managed custom vision on GCP | AutoML + strong ML platform integration | Different tooling/quotas; cloud lock-in | You’re standardized on GCP |
| Self-managed (YOLO/PyTorch/TensorFlow) | Maximum control and offline-first | Full flexibility, can optimize for hardware, avoid per-call fees | Highest engineering/ops burden | You need strict offline, custom training, or very high volume cost optimization |
15. Real-World Example
Enterprise example: Manufacturing quality inspection
- Problem: A manufacturer needs to detect missing components and visible defects on an assembly line. They need auditability and controlled rollouts.
- Proposed architecture:
- Cameras upload images to Blob Storage.
- Event Grid triggers Functions to preprocess images and call Azure AI Custom Vision prediction.
- Results stored in a database; failures create tickets in an ITSM system.
- A gated process publishes new iterations after validation on a gold test set.
- Why Azure AI Custom Vision was chosen:
- Faster prototyping than building an Azure ML training pipeline.
- Clear iteration/publishing workflow.
- Hosted endpoint simplifies integration with the line-control software.
- Expected outcomes:
- Reduced manual inspection workload.
- Faster defect detection and improved consistency.
- A repeatable retraining cycle as new defect types appear.
Startup/small-team example: E-commerce product photo categorization
- Problem: A small e-commerce team wants to auto-route product photos into categories (e.g., “shoes,” “bags,” “accessories”) using their own catalog style and backgrounds.
- Proposed architecture:
- Admin tool uploads images and calls a small backend API.
- Backend calls Azure AI Custom Vision prediction endpoint.
- The tool accepts model suggestions and stores the final label for feedback.
- Why Azure AI Custom Vision was chosen:
- Small team can manage labeling and training in the portal.
- Minimal infrastructure; only a lightweight backend is required.
- Expected outcomes:
- Faster product onboarding.
- More consistent categorization.
- Ability to improve model quality with incremental labeled data.
16. FAQ
- What is Azure AI Custom Vision used for?
  Training custom image classification and object detection models using your own labeled images, then deploying them via hosted endpoints or supported export formats.
- Do I need ML expertise to use it?
  You need basic understanding of labeling, evaluation, and data quality, but you typically don’t need to write training code for the managed workflow.
- What’s the difference between classification and object detection?
  Classification labels the entire image; detection finds objects and returns bounding boxes and probabilities.
- Do I need separate resources for training and prediction?
  Commonly yes (Training resource and Prediction resource). The portal and docs reflect the supported setup; verify the latest creation flow in official docs.
- How many images do I need?
  It depends on variability and difficulty. Start with dozens per class for a prototype, but real production models often require hundreds or thousands per class and ongoing updates.
- Can I run it offline on edge devices?
  Sometimes, via model export for supported project types/domains. Confirm export support early for your chosen configuration.
- How do I version models?
  Use iterations and published iteration names. Maintain a release process (dev → staging → prod) and keep a rollback iteration available.
- How do I secure the prediction endpoint?
  Don’t expose keys to clients. Call prediction from a backend service, store keys in Key Vault, rotate keys, and consider private networking options if supported (verify).
- Can I use Azure AD instead of keys?
  Azure RBAC governs resource management, but prediction/training APIs frequently use keys. Check official Custom Vision authentication documentation for current Azure AD support.
- Why is my model accurate in the portal but fails in production?
  Data drift and non-representative training images are the usual cause. Collect training images from production-like conditions and retrain.
- How do I reduce false positives?
  Increase the probability threshold, add “hard negative” examples, and refine labels. Evaluate impact on recall.
- Is Custom Vision suitable for video analytics?
  It’s image-based; for video you typically extract frames and run inference on selected frames. Consider sampling to control cost and latency.
- What image formats are supported?
  Common web image formats are typically supported, but verify current constraints (size, format, max payload) in official docs.
- How do I automate retraining?
  Use the training API/SDK to upload images, create iterations, evaluate metrics, and publish if metrics meet thresholds—integrated into GitHub Actions/Azure DevOps.
- How do I estimate cost?
  Use the official pricing page and Azure Pricing Calculator. Main drivers are prediction transactions and training usage.
- Can I restrict who can publish models?
  Yes, through Azure RBAC for resource management and by controlling who has training keys and portal access. Implement change management around publishing.
- What’s the best way to manage datasets?
  Keep a “source of truth” dataset in storage with metadata and labeling records, and use Custom Vision as the training/deployment tool. Consider a labeling workflow and a gold test set.
17. Top Online Resources to Learn Azure AI Custom Vision
| Resource Type | Name | Why It Is Useful |
|---|---|---|
| Official documentation | Azure AI Custom Vision docs — https://learn.microsoft.com/azure/ai-services/custom-vision-service/ | Primary reference for concepts, how-tos, API versions, and current capabilities |
| Official pricing | Azure AI Custom Vision pricing — https://azure.microsoft.com/pricing/details/cognitive-services/custom-vision-service/ | Authoritative pricing model and SKU details (confirm region) |
| Pricing calculator | Azure Pricing Calculator — https://azure.microsoft.com/pricing/calculator/ | Build estimates for training/prediction usage and related services |
| Portal | Custom Vision portal — https://www.customvision.ai/ | Main UI for projects, labeling, training iterations, publishing, and prediction URLs |
| Quickstarts/tutorials | Custom Vision quickstarts (from docs hub) — https://learn.microsoft.com/azure/ai-services/custom-vision-service/ | Step-by-step onboarding aligned with current SDKs/APIs |
| SDK reference | Azure SDK documentation — https://learn.microsoft.com/azure/developer/python/sdk/ (and language selectors) | Guidance on using official SDKs safely and correctly |
| Samples (GitHub) | Azure Samples on GitHub — https://github.com/Azure-Samples | Search for “custom vision” samples and end-to-end code patterns |
| Architecture guidance | Azure Architecture Center — https://learn.microsoft.com/azure/architecture/ | Reference architectures for event-driven, secure, and scalable Azure solutions (apply patterns to Custom Vision workloads) |
| Security baseline (general) | Azure security documentation — https://learn.microsoft.com/security/ | Broader Azure security practices for identity, networking, and governance |
| Video learning | Microsoft Azure YouTube — https://www.youtube.com/@MicrosoftAzure | Walkthroughs and demos; validate content recency against current docs |
18. Training and Certification Providers
The following training providers may offer Azure and AI + Machine Learning learning paths. Verify current course outlines and delivery modes on their sites.
| Institute | Suitable Audience | Likely Learning Focus | Mode | Website URL |
|---|---|---|---|---|
| DevOpsSchool.com | DevOps engineers, cloud engineers, developers | Azure fundamentals, DevOps, CI/CD, cloud operations; may include AI service integrations | Check website | https://www.devopsschool.com/ |
| ScmGalaxy.com | Beginners to intermediate DevOps learners | DevOps, automation, tooling; potential Azure integration content | Check website | https://www.scmgalaxy.com/ |
| CLoudOpsNow.in | Cloud ops teams, SREs, platform engineers | Cloud operations practices, monitoring, reliability; may include Azure operational patterns | Check website | https://www.cloudopsnow.in/ |
| SreSchool.com | SREs, ops leads, platform teams | Reliability engineering, observability, incident response applied to cloud workloads | Check website | https://www.sreschool.com/ |
| AiOpsSchool.com | Ops + AI practitioners | AIOps concepts, monitoring automation; may complement AI service operations | Check website | https://www.aiopsschool.com/ |
19. Top Trainers
Presented as training resources/platforms (verify the specific trainer profiles, schedules, and course coverage on each website).
| Platform/Site | Likely Specialization | Suitable Audience | Website URL |
|---|---|---|---|
| RajeshKumar.xyz | DevOps/cloud training content (verify scope) | Beginners to intermediate engineers | https://rajeshkumar.xyz/ |
| devopstrainer.in | DevOps training and coaching (verify offerings) | DevOps engineers, developers | https://www.devopstrainer.in/ |
| devopsfreelancer.com | Freelance DevOps/consulting and training (verify services) | Teams seeking short-term help or mentoring | https://www.devopsfreelancer.com/ |
| devopssupport.in | DevOps support/training (verify scope) | Ops/DevOps teams needing practical support | https://www.devopssupport.in/ |
20. Top Consulting Companies
Descriptions are neutral and based on likely service positioning inferred from public presence—confirm exact service catalogs and references directly with the companies.
| Company | Likely Service Area | Where They May Help | Consulting Use Case Examples | Website URL |
|---|---|---|---|---|
| cotocus.com | Cloud/DevOps consulting (verify specialties) | Cloud architecture, delivery execution, operations | Designing an Azure integration layer for Custom Vision inference; setting up monitoring and secure secret management | https://cotocus.com/ |
| DevOpsSchool.com | DevOps and cloud consulting/training | DevOps transformation, CI/CD, platform engineering | Building CI/CD for apps that consume Custom Vision; governance, IaC, and operational readiness | https://www.devopsschool.com/ |
| DEVOPSCONSULTING.IN | DevOps consulting (verify offerings) | Automation, reliability, operations | Implementing secure deployment patterns for AI-backed services; alerting, incident response setup | https://www.devopsconsulting.in/ |
21. Career and Learning Roadmap
What to learn before Azure AI Custom Vision
- Azure fundamentals
- Resource groups, regions, IAM/RBAC, networking basics
- HTTP and REST
- headers, auth, status codes, payload formats
- Computer vision basics
- classification vs detection, overfitting, train/test split
- Data handling
- basic image preprocessing, dataset organization, labeling discipline
What to learn after Azure AI Custom Vision
- MLOps practices
- dataset versioning, automated evaluation gates, staged releases
- Azure Machine Learning
- when you need full control, pipelines, model registry, advanced monitoring
- Production architecture
- event-driven patterns, retries/circuit breakers, secure secret management
- Model monitoring and drift
- confidence tracking, sampling, human-in-the-loop feedback loops
Job roles that use it
- Cloud solutions engineer / cloud architect
- DevOps engineer / SRE supporting AI-enabled apps
- Application developer integrating AI services
- Data analyst / data steward coordinating labeling and evaluation
- ML engineer (often as a complementary tool or baseline approach)
Certification path (Azure)
There is no single certification dedicated only to Custom Vision, but relevant Azure certifications often include:
– Azure Fundamentals (AZ-900)
– Azure AI Fundamentals (AI-900)
– Azure AI Engineer Associate (AI-102), which covers Azure AI services including vision workloads
– Azure role-based certifications (developer/architect) depending on your responsibilities
Always verify current certification offerings and exam objectives on Microsoft Learn: https://learn.microsoft.com/credentials/
Project ideas for practice
- Build a “defect detector” prototype with 3–5 defect classes.
- Create a home-inventory classifier (electronics, tools, kitchen items).
- Implement an event-driven inference pipeline (Blob Storage → Event Grid → Function → results table).
- Create an internal model release checklist with iteration naming and rollback steps.
- Build a drift dashboard: store top-1 confidence over time and alert on distribution shifts.
22. Glossary
- Azure AI Custom Vision: Azure service for training custom image classification and object detection models and deploying them for inference.
- Classification: Predicting label(s) for an entire image.
- Multiclass classification: Exactly one label per image.
- Multilabel classification: Multiple labels can apply to one image.
- Object detection: Predicting labels plus bounding boxes locating objects in the image.
- Tag: A label/category in Custom Vision used for classification or object labeling.
- Bounding box: Rectangle drawn around an object instance for detection training.
- Iteration: A trained model version produced by a training run.
- Published iteration: An iteration deployed to the prediction endpoint with a published name.
- Precision: Of predicted positives, how many were correct (low false positives).
- Recall: Of actual positives, how many were found (low false negatives).
- Threshold: Confidence cutoff used to decide whether to accept a prediction.
- Training resource: Azure resource used to run training operations.
- Prediction resource: Azure resource that hosts the prediction endpoint for inference.
- Prediction key / Training key: API keys used to authenticate prediction/training calls.
- Data drift: Changes in real-world inputs over time that reduce model performance.
- Hard negatives: Non-target examples that are visually similar to the target class; useful for improving robustness.
23. Summary
Azure AI Custom Vision is Azure’s managed service for building custom image classification and object detection models using your own labeled images. It matters because many real AI + Machine Learning scenarios require domain-specific visual recognition that prebuilt vision APIs cannot reliably deliver.
In Azure architectures, it typically sits behind a controlled application layer that handles authentication, logging, throttling, and image handling, while Custom Vision provides the training workflow (iterations) and an easy deployment path (hosted prediction endpoint and, where supported, export for edge use). Cost is primarily driven by prediction volume and training usage, plus indirect costs like storage and bandwidth. Security hinges on correct key management (Key Vault, rotation) and restricting endpoint exposure.
Use Azure AI Custom Vision when you want a practical, managed path to custom vision without building a full ML platform—then expand into stronger MLOps and governance practices as your solution matures. Next step: read the official docs hub and run a second lab that automates image upload, training, and publishing through the REST API/SDK, aligned with your team’s CI/CD workflow.