Azure Virtual Machine Scale Sets Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Compute

Category

Compute

1. Introduction

Azure Virtual Machine Scale Sets is a Compute service for running and automatically scaling a group of identical (or near-identical) virtual machines (VMs). It helps you deploy and manage many VMs as a single resource so you can handle traffic spikes, improve availability, and reduce operational overhead compared to managing individual VMs.

In simple terms: you define one VM model (image, size, networking, extensions), set how many instances you want, and Azure keeps that fleet running and scaled—often behind a load balancer—so your application can handle changing demand.

Technically, Virtual Machine Scale Sets (often shortened to VMSS) provides an orchestration layer over Azure Virtual Machines. You choose an orchestration mode (notably Uniform or Flexible, depending on your needs), integrate with Azure Load Balancer or Application Gateway, configure health probes and upgrade policies, and optionally attach Azure Monitor autoscale rules. Azure then manages instance placement, updates, and (in many scenarios) automatic repair/replacement of unhealthy instances.

The problem it solves is the classic “how do I run N servers reliably and scale them up/down without babysitting every instance?” Virtual Machine Scale Sets gives you a structured way to run stateless (and in some cases semi-stateful) compute fleets for web tiers, APIs, batch workers, CI agents, and more.

Service naming status: “Virtual Machine Scale Sets” is the current official Azure service name and is active. In documentation you’ll see different capabilities depending on orchestration mode (Uniform vs Flexible). Always verify mode-specific behavior in the official docs before committing to a production design.


2. What is Virtual Machine Scale Sets?

Official purpose

Virtual Machine Scale Sets is designed to help you deploy, manage, and automatically scale a set of Azure VMs. The scale set acts as a single management boundary for a group of instances, which typically share configuration such as VM image, size, networking, and extensions.

Core capabilities

  • Scale out/in VM instances manually or automatically (via Azure Monitor autoscale)
  • High availability patterns with Availability Zones (region permitting) and load balancing
  • Centralized management of updates, configuration, extensions, and instance lifecycle operations
  • Health-based behaviors such as load balancer health probes and (in many designs) repair/replacement of unhealthy instances
  • Integration with common Azure building blocks (VNets, Load Balancer, Application Gateway, Managed Disks, Azure Monitor)

Major components

  • Scale set resource (VMSS): the top-level resource you manage
  • Instance model: the VM configuration (image, SKU, OS disk, data disks, NICs, extensions, boot diagnostics)
  • VM instances: the actual VMs created from the model
  • Autoscale settings: rules that change instance count based on metrics or schedules
  • Load balancing: Azure Load Balancer or Application Gateway to distribute traffic
  • Health signals: probes and application health reporting (mode/capability dependent—verify in docs)

Service type

  • IaaS Compute orchestration over Azure Virtual Machines (VMs)

Scope: regional and zonal

  • Virtual Machine Scale Sets is a regional resource.
  • You can design for zonal resiliency by placing instances across Availability Zones in regions that support zones.
  • All resources still live inside an Azure subscription, resource group, and region.

How it fits into the Azure ecosystem

Virtual Machine Scale Sets sits in the “fleet compute” layer:

  • It uses Azure Virtual Machines as the underlying compute.
  • It commonly pairs with:
    • Azure Load Balancer (Layer 4) for TCP/UDP traffic distribution
    • Azure Application Gateway (Layer 7) for HTTP(S) routing + WAF options
    • Azure Monitor for metrics, logs, autoscale, and alerts
    • Azure Virtual Network (VNet) for networking
    • Azure Key Vault for secrets/certificates (accessed via Managed Identity)
    • Azure Compute Gallery (formerly Shared Image Gallery) for custom images

Official docs entry point: https://learn.microsoft.com/azure/virtual-machine-scale-sets/


3. Why use Virtual Machine Scale Sets?

Business reasons

  • Handle growth without redesign: add capacity when demand increases.
  • Improve uptime: distribute instances across fault domains / availability zones and replace unhealthy nodes.
  • Lower toil: manage one scale set instead of dozens/hundreds of separate VM resources.
  • Predictable deployments: “golden image + scale set model” helps standardize environments.

Technical reasons

  • Horizontal scalability: add more VMs to increase throughput.
  • Integration with load balancing: place a stable frontend in front of changing backends.
  • Immutable-ish infra pattern: update by rolling out a new image/model rather than patching every VM manually (design-dependent).
  • Autoscale based on CPU, memory (via guest metrics), queue length (with custom metrics), or schedules.

Operational reasons

  • Central place to:
    • Scale
    • Upgrade (rolling upgrades in many patterns)
    • Apply extensions
    • Run commands across instances
    • Monitor health and capacity
  • Enables “cattle not pets” operations for stateless tiers.

Security/compliance reasons

  • Standardize baseline hardening (image + extensions).
  • Control access via Azure RBAC and Managed Identity.
  • Improve auditability through Azure Activity Log and Azure Monitor logging.
  • Use network isolation with VNets, NSGs, private subnets, and controlled inbound paths.

Scalability/performance reasons

  • Add instances to absorb spikes.
  • Distribute load across zones for resiliency and latency improvement (regional considerations apply).
  • Reduce single-node bottlenecks by scaling workers and web tiers horizontally.

When teams should choose it

Choose Virtual Machine Scale Sets when you need:

  • A VM-based compute fleet (OS access, custom binaries, special drivers, legacy software).
  • Predictable scaling and availability.
  • Integration with VNets, load balancers, and enterprise networking patterns.
  • A path that’s “more controllable than PaaS,” but less manual than managing individual VMs.

When teams should not choose it

Avoid or reconsider VMSS if:

  • You don’t need VM-level control and prefer simpler operations (consider Azure App Service, Azure Container Apps, or Azure Functions).
  • Your workload is primarily containerized and you want Kubernetes orchestration (consider Azure Kubernetes Service (AKS)).
  • Your application is heavily stateful on local disk and can’t tolerate instance replacement without careful state management.
  • You need a single powerful machine rather than a horizontally scaled fleet (consider a standalone Azure VM or specialized services).


4. Where is Virtual Machine Scale Sets used?

Industries

  • SaaS and software companies (web/API tiers, background workers)
  • Finance and insurance (risk compute, batch processing, regulated environments)
  • Retail/e-commerce (traffic bursts, seasonal spikes)
  • Media/streaming (transcoding workers, content pipelines)
  • Manufacturing/IoT backends (ingestion processors)
  • Education (labs, training environments)
  • Gaming (matchmaking services, scalable backends)

Team types

  • Platform engineering teams building reusable infrastructure patterns
  • DevOps/SRE teams operating scalable services
  • Application teams needing VM-level dependencies
  • Security teams enforcing standardized images and access controls
  • Data/ML engineering teams running batch/worker fleets on VMs

Workloads

  • Stateless web frontends
  • REST/gRPC API services
  • CI/CD build agents (self-hosted runners)
  • Queue-driven workers (processing jobs from Service Bus/Storage queues)
  • Batch compute nodes (sometimes alongside Azure Batch)
  • Compute-intensive processing (rendering, simulation) where horizontal scale helps

Architectures

  • Classic 3-tier (web tier in VMSS, app tier in VMSS, data tier managed service)
  • Microservices hosted on VMs (when containers aren’t an option)
  • Hybrid hub-and-spoke VNets with shared egress and centralized security controls
  • Blue/green or canary deployments using multiple scale sets behind traffic routing

Real-world deployment contexts

  • Production: zonal scale sets behind WAF + Application Gateway, integrated monitoring, controlled rollout policies.
  • Dev/test: small instance count, manual scale, scheduled scale-out for load tests, then scale-in to reduce cost.

5. Top Use Cases and Scenarios

Below are 10 realistic ways teams use Virtual Machine Scale Sets.

1) Autoscaling web tier behind a load balancer

  • Problem: Web traffic varies; static server counts either overload or waste money.
  • Why VMSS fits: Scale out/in automatically, integrate with Azure Load Balancer or Application Gateway.
  • Example: Two instances at baseline; scale to 10 during peak shopping hours.

2) API service tier with rolling image upgrades

  • Problem: Updating 30 VMs without downtime is hard.
  • Why VMSS fits: Model-based deployment supports rolling upgrades in many designs.
  • Example: Bake a new image version; roll across instances while keeping capacity.

3) Queue-driven background workers

  • Problem: Job backlog grows; workers need to scale with queue depth.
  • Why VMSS fits: Autoscale can be driven by metrics; instances are interchangeable.
  • Example: Process images from a queue; scale out when queue length increases (custom metric pattern).

4) Self-hosted CI/CD agents

  • Problem: Build bursts cause long queue times; idle agents waste budget.
  • Why VMSS fits: Quickly scale build agents, replace unhealthy agents automatically.
  • Example: Scale out to 50 build agents during work hours, scale in at night.

5) Stateless game backend services

  • Problem: Player count spikes after releases/events.
  • Why VMSS fits: Horizontal scaling + load balancing.
  • Example: Matchmaking API scales from 5 to 40 instances.

6) Batch processing fleet for nightly jobs

  • Problem: Need large compute only during a limited processing window.
  • Why VMSS fits: Schedule-based autoscale patterns reduce cost.
  • Example: Scale to 100 instances from 1 AM–3 AM for ETL processing, scale back to 0–2.

7) Edge-like regional deployment for latency

  • Problem: Users in multiple geographies need low latency.
  • Why VMSS fits: Deploy scale sets per region, front with Azure Front Door.
  • Example: VMSS in East US + West Europe; Front Door routes users to closest region.

8) GPU worker nodes for rendering/transcoding

  • Problem: Need bursts of GPU compute.
  • Why VMSS fits: VMSS can manage a fleet of GPU-enabled VMs (SKU availability varies by region—verify).
  • Example: Video transcode workers scale to meet upload demand.

9) Legacy application modernization (lift-and-improve)

  • Problem: Legacy app needs VMs and can’t be containerized easily.
  • Why VMSS fits: VM-based deployment, but with modern scaling and health.
  • Example: Migrate IIS-based app to VMSS; use Application Gateway for HTTP routing.

10) Zero-trust hardened “compute pool”

  • Problem: Need standardized hardened nodes with controlled outbound and no inbound SSH.
  • Why VMSS fits: One hardened image + policy + NSG; operational access via Azure Bastion/Run Command.
  • Example: Workers in private subnet access storage via private endpoints; no public IPs.

6. Core Features

Note: Some features differ by orchestration mode (Uniform vs Flexible) and by VM image/OS. Validate mode-specific details in official docs before adopting.

1) Orchestration modes (Uniform and Flexible)

  • What it does: Determines how Azure manages VM instances and what features are available.
  • Why it matters: It affects upgrade behavior, scaling semantics, and VM feature parity.
  • Practical benefit: Pick Uniform for classic identical-instance fleets; pick Flexible for scenarios needing more VM-like behaviors (verify capability matrix).
  • Caveats: Not all features exist in both modes; confirm requirements early.

2) Autoscaling (Azure Monitor autoscale)

  • What it does: Automatically changes instance count based on metrics (CPU, custom metrics) or schedules.
  • Why it matters: Keeps performance stable while controlling cost.
  • Practical benefit: Scale out under load, scale in when idle.
  • Caveats: Autoscale reacts to metrics with some delay; plan capacity buffers.
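The evaluation logic can be pictured as a simple threshold check with min/max clamping. The sketch below is purely illustrative (the `decide` function is not Azure's implementation; the 70%/30% thresholds mirror the rules configured later in this tutorial):

```shell
#!/usr/bin/env bash
# Illustrative sketch of threshold-based autoscale logic (NOT Azure's
# internal implementation): scale out above 70% average CPU, scale in
# below 30%, and clamp the result between min and max capacity.
decide() {
  local avg_cpu=$1 capacity=$2 min=$3 max=$4
  if [ "$avg_cpu" -gt 70 ] && [ "$capacity" -lt "$max" ]; then
    capacity=$((capacity + 1))          # scale out by 1
  elif [ "$avg_cpu" -lt 30 ] && [ "$capacity" -gt "$min" ]; then
    capacity=$((capacity - 1))          # scale in by 1
  fi
  echo "$capacity"
}

decide 85 2 2 5   # high CPU: 2 -> 3
decide 20 3 2 5   # idle:     3 -> 2
decide 20 2 2 5   # already at min: stays 2
```

Real autoscale layers time-grain aggregation and cooldown windows on top of a check like this, which is why capacity changes lag the metric.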

3) Manual scaling

  • What it does: You set the instance count directly.
  • Why it matters: Useful for predictable or controlled scaling events (deployments, load tests).
  • Practical benefit: Simple and deterministic.
  • Caveats: Manual scaling alone doesn’t respond to sudden spikes.

4) Integration with Azure Load Balancer

  • What it does: Distributes L4 traffic (TCP/UDP) across instances.
  • Why it matters: Provides a stable frontend IP/DNS while backend instances change.
  • Practical benefit: Common for web/API traffic and non-HTTP protocols.
  • Caveats: Health probes must be correct; Standard Load Balancer is common for production.

5) Integration with Azure Application Gateway

  • What it does: Distributes HTTP(S) traffic with L7 routing features; supports WAF SKUs.
  • Why it matters: Enables path-based routing, TLS termination, cookie affinity, WAF policies.
  • Practical benefit: Better for complex HTTP routing than a basic L4 load balancer.
  • Caveats: Costs and configuration complexity are higher than Load Balancer.

6) Availability Zones support (region permitting)

  • What it does: Spreads instances across zones to survive a zonal outage.
  • Why it matters: Improves resiliency and availability.
  • Practical benefit: Zone-redundant design for critical tiers.
  • Caveats: Zone support varies by region and SKU; verify.

7) VM image flexibility (Marketplace, custom images, Azure Compute Gallery)

  • What it does: Lets you use Microsoft/Azure Marketplace images or your own golden images.
  • Why it matters: Standardizes OS + packages + hardening.
  • Practical benefit: Faster provisioning, consistent security posture.
  • Caveats: Image versioning and rollout must be managed carefully.

8) Extensions and cloud-init/custom data

  • What it does: Bootstrap configuration at provision time (install packages, configure apps).
  • Why it matters: Reduces manual steps and drift.
  • Practical benefit: Deploy “ready-to-serve” instances automatically.
  • Caveats: Overusing extensions can slow provisioning; prefer baked images for large fleets.

9) Upgrade policies and rolling upgrades (mode-dependent)

  • What it does: Controls how updates are applied across instances.
  • Why it matters: Minimizes downtime and risk.
  • Practical benefit: Controlled rollout with capacity preserved.
  • Caveats: Behavior varies by orchestration mode and configuration; verify exact capabilities.

10) Automatic instance repair / health-based replacement (design-dependent)

  • What it does: Detects unhealthy instances and replaces or repairs them.
  • Why it matters: Improves availability without manual intervention.
  • Practical benefit: Faster recovery from node-level failures.
  • Caveats: “Unhealthy” depends on signals (LB probe, application health). Misconfigured probes can cause churn.

11) Spot instances (cost optimization pattern)

  • What it does: Allows using Azure Spot VMs in scale sets (when suitable).
  • Why it matters: Can reduce compute cost significantly for interruptible workloads.
  • Practical benefit: Great for batch, CI, and fault-tolerant workers.
  • Caveats: Spot VMs can be evicted; design for interruption.
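A common way to “design for interruption” is to pull work in small units and acknowledge each unit only after it completes, so an eviction loses at most one in-flight unit. A minimal sketch (the `process_unit` function and job names are placeholders, not a real queue client):

```shell
#!/usr/bin/env bash
# Sketch of an eviction-tolerant worker loop for Spot instances: each
# unit of work is small and counted as done only after it succeeds, so
# a Spot eviction loses at most the unit currently in flight.
process_unit() {
  # Placeholder for real work (e.g., download, transcode, upload).
  echo "processed $1"
}

completed=0
for unit in job-1 job-2 job-3; do
  if process_unit "$unit"; then
    completed=$((completed + 1))      # "ack" only after success
  fi
done
echo "completed=$completed"
```

With a real queue (Service Bus, Storage queues), the “ack” would be a message delete/complete call, and unacknowledged messages reappear for another worker after eviction.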

12) Managed disks and disk options

  • What it does: Uses Azure Managed Disks for OS and data.
  • Why it matters: Managed disks simplify storage management and reliability.
  • Practical benefit: Standardized, managed storage with performance tiers.
  • Caveats: Disk costs and performance tiers can dominate cost; plan carefully.

13) Diagnostics and monitoring integration

  • What it does: Emits metrics/logs for performance, health, and operations.
  • Why it matters: Scaling and reliability depend on observability.
  • Practical benefit: Alert on CPU, memory, failed health probes, instance count, etc.
  • Caveats: Log ingestion has cost; define retention and sampling.

7. Architecture and How It Works

High-level architecture

A typical VMSS-based application tier looks like:

  • Clients connect to a stable endpoint (public IP, DNS, Application Gateway, Front Door).
  • Traffic is distributed to VMSS instances through a load balancer.
  • Instances run identical app versions (or a controlled mix during upgrades).
  • Health probes determine which instances receive traffic.
  • Autoscale changes instance count based on demand metrics.

Request/data/control flow

  • Data plane (request path):
    1. Client requests https://app.example.com
    2. (Optional) Azure Front Door routes to a region
    3. Application Gateway or Load Balancer forwards to a VMSS instance
    4. The instance processes the request and interacts with data services (SQL, Storage, Redis)
  • Control plane (management path):
  • You change the VMSS model, instance count, or autoscale rules via Azure Portal, Azure CLI, ARM/Bicep, or Terraform.
  • Azure Resource Manager applies changes; VMSS orchestration creates/removes/updates instances.

Integrations with related services

Common integrations include:

  • Azure Load Balancer: L4 load distribution and health probes
  • Azure Application Gateway: L7 routing and WAF
  • Azure Front Door: global routing, caching, TLS at the edge (for multi-region)
  • Azure Monitor: metrics, autoscale, alerts
  • Log Analytics workspace: centralized logging (via Azure Monitor Agent)
  • Azure Key Vault: secrets/certs, often accessed using Managed Identity
  • Azure Compute Gallery: image management and versioning
  • Azure Policy: enforce tags, SKUs, security baselines
  • Microsoft Defender for Cloud: security posture management and recommendations (verify licensing requirements)

Dependency services

VMSS depends on:

  • Azure Compute capacity in the chosen region/SKU
  • Networking (VNet/subnet, NSG, route tables)
  • Storage for managed disks and (optionally) boot diagnostics
  • Identity (Microsoft Entra ID, formerly Azure AD) for RBAC and Managed Identity usage

Security/authentication model

  • Management access uses Azure RBAC (Entra ID identities, service principals, managed identities).
  • VM-to-Azure-service access commonly uses Managed Identity + role assignments.
  • Admin access to the OS uses SSH keys (Linux) or secure credential management (Windows), ideally without exposing public management ports.

Networking model

  • Instances attach NICs into a subnet.
  • Inbound traffic is typically via Load Balancer/App Gateway; instances usually do not need public IPs.
  • Outbound traffic depends on design: NAT Gateway, Load Balancer outbound rules, Azure Firewall, or direct SNAT behavior (choose explicitly for production).

Monitoring/logging/governance considerations

  • Metrics: VMSS instance count, CPU, network, disk, probe health.
  • Logs: OS/application logs via Azure Monitor Agent to Log Analytics.
  • Governance: consistent tags, naming, policy enforcement; track changes in Activity Log.

Simple architecture diagram (Mermaid)

flowchart LR
  U[Users] --> PIP[Public Endpoint]
  PIP --> LB[Azure Load Balancer]
  LB --> VMSS[Virtual Machine Scale Sets<br/>VM instances]
  VMSS --> DATA[(Data service<br/>e.g., Azure SQL/Storage)]
  VMSS --> MON[Azure Monitor]

Production-style architecture diagram (Mermaid)

flowchart TB
  Users[Internet Users] --> FD[Azure Front Door<br/>Global routing + TLS]
  FD --> WAF["Application Gateway (WAF)<br/>Regional L7 routing"]
  WAF --> LB[Internal/Regional Load Balancer<br/>or App Gateway backend pool]

  subgraph VNet[Azure Virtual Network]
    subgraph WebSubnet[Web Subnet]
      VMSS[VMSS across Availability Zones]
    end

    subgraph DataSubnet[Data/Private Subnet]
      KV["Azure Key Vault<br/>(Private Endpoint)"]
      ST["Azure Storage<br/>(Private Endpoint)"]
      SQL["Azure SQL<br/>(Private Endpoint optional)"]
    end

    VMSS --> KV
    VMSS --> ST
    VMSS --> SQL
  end

  VMSS --> AM[Azure Monitor Metrics]
  VMSS --> LA[Log Analytics Workspace]
  AM --> AS[Autoscale Rules]
  SOC[Security Team] --> Def[Defender for Cloud]
  Def --> VNet

8. Prerequisites

Account/subscription/tenant requirements

  • An active Azure subscription with billing enabled.
  • Access to an Azure resource group in a supported region.

Permissions / IAM roles

Minimum recommended permissions for the lab:

  • At subscription or resource-group scope:
    • Contributor (for creating resources), or
    • A combination of roles allowing Compute/Network/Monitor resource creation
  • If using policies/locked-down environments, you may need additional permissions for:
    • Creating public IPs / load balancers
    • Creating autoscale settings
    • Creating managed identities (optional)

Billing requirements

  • You will incur charges for:
    • VM compute (per-second/minute billing depending on the model—see VM pricing)
    • Managed disks
    • Public IP and Load Balancer (SKU dependent)
    • Monitoring/log ingestion if enabled

CLI/SDK/tools needed

Choose one:

  • Azure Cloud Shell (recommended for quick labs): https://shell.azure.com/
  • Or install locally:
    • Azure CLI: https://learn.microsoft.com/cli/azure/install-azure-cli

Optional tools:

  • ssh client (Linux/macOS/Windows)
  • curl for HTTP tests

Region availability

  • Virtual Machine Scale Sets is widely available. Availability Zones and certain VM SKUs vary by region.
  • Verify region features: https://azure.microsoft.com/explore/global-infrastructure/products-by-region/

Quotas/limits

  • VMSS is constrained by:
    • Regional vCPU quota for the chosen VM family
    • Subscription and regional limits for public IPs, NICs, and other resources
    • VMSS-specific limits (instance count, placement behavior) that can vary by orchestration mode

  Verify limits: https://learn.microsoft.com/azure/azure-resource-manager/management/azure-subscription-service-limits

Prerequisite services

For common architectures:

  • Azure Virtual Network
  • Azure Load Balancer or Application Gateway
  • Azure Monitor (for autoscale/alerts)


9. Pricing / Cost

Pricing model (accurate, no fabricated numbers)

Virtual Machine Scale Sets itself does not have a separate “VMSS fee” in typical billing. You pay for the underlying resources you deploy, including:

  • Compute: VM instances (Azure Virtual Machines pricing by size/SKU, OS, region)
  • Storage: managed disks (OS and data), snapshots, images (if stored), and storage transactions where applicable
  • Networking:
    • Load Balancer (SKU and rules can affect cost)
    • Public IP addresses (SKU dependent)
    • Bandwidth/data transfer (egress is typically billable; ingress is often free—verify current bandwidth pricing)
    • NAT Gateway/Azure Firewall (if used)
  • Monitoring:
    • Azure Monitor metrics are generally included at basic levels, but
    • Log Analytics ingestion and retention are billable
  • Optional security services:
    • Defender for Cloud plans (if enabled) can add cost—verify plan and scope

Official VMSS docs: https://learn.microsoft.com/azure/virtual-machine-scale-sets/
Azure Pricing Calculator: https://azure.microsoft.com/pricing/calculator/
Azure Virtual Machines pricing: https://azure.microsoft.com/pricing/details/virtual-machines/
Azure Load Balancer pricing: https://azure.microsoft.com/pricing/details/load-balancer/
Bandwidth pricing: https://azure.microsoft.com/pricing/details/bandwidth/
Managed Disks pricing: https://azure.microsoft.com/pricing/details/managed-disks/
Azure Monitor pricing: https://azure.microsoft.com/pricing/details/monitor/

Pricing dimensions

Key pricing dimensions to understand:

  • VM size (vCPU/RAM), region, and OS licensing (Linux vs Windows)
  • Disk type and size (Standard HDD/SSD, Premium SSD, Ultra Disk where applicable)
  • Instance count (baseline + peak)
  • Outbound data volume (internet egress)
  • Monitoring ingestion volume (GB/day) and retention period
  • Additional components (WAF, NAT Gateway, Firewall, Bastion)
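To reason about these dimensions, a back-of-the-envelope model helps. The sketch below uses a made-up placeholder rate (it is not a real Azure price); substitute actual per-hour rates from the Pricing Calculator for your region and SKU:

```shell
#!/usr/bin/env bash
# Back-of-the-envelope monthly compute estimate. The rate is a
# HYPOTHETICAL placeholder used only for arithmetic; real prices come
# from the Azure Pricing Calculator.
baseline_instances=2
peak_instances=5
peak_hours_per_day=4
hours_per_month=730

rate=104   # hypothetical $0.0104/hr, expressed in 1/10000 dollars

baseline_cost=$(( baseline_instances * rate * hours_per_month ))
burst_cost=$(( (peak_instances - baseline_instances) * rate * peak_hours_per_day * 30 ))
total=$(( baseline_cost + burst_cost ))   # still in 1/10000 dollars

printf 'Estimated monthly compute: $%d.%02d\n' \
  $(( total / 10000 )) $(( (total % 10000) / 100 ))
```

Integer arithmetic in fractions of a cent avoids floating-point surprises in shell; the same model extends naturally with disk, egress, and log-ingestion terms.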

Free tier

  • There is no universal “free tier” for VMSS fleets; some accounts may have Azure free credits. VM compute generally incurs charges.
  • Some services have limited free quotas (varies; verify in official pricing pages).

Cost drivers (what usually makes VMSS expensive)

  • Over-provisioning: running too many instances all the time
  • Wrong VM SKU: using larger VMs instead of scaling out smaller ones (or vice versa)
  • Premium disks everywhere: premium storage on non-critical tiers
  • Logging everything: verbose logs into Log Analytics without retention controls
  • Outbound traffic: significant egress to internet or cross-region transfers
  • Perimeter components: Application Gateway WAF, Azure Firewall, Front Door can exceed VM costs for small fleets

Hidden/indirect costs to watch

  • Public IP + Load Balancer SKU costs in production designs
  • Autoscale oscillation: frequent scale in/out can create churn and operational overhead
  • Image build pipeline: Compute Gallery storage and CI build minutes
  • Backup: Recovery Services vault or snapshots if used
  • Patch management tooling: Update management solutions and their data ingestion

Network/data transfer implications

  • Design for private endpoints and same-region dependencies to reduce egress/cross-region charges.
  • If clients are global, consider Front Door; but evaluate its cost against performance needs.
  • For outbound internet access, choose an explicit strategy (NAT Gateway/Firewall) to control SNAT and logging costs.

How to optimize cost (practical)

  • Set autoscale min/max and choose a realistic baseline.
  • Use scheduled scaling when patterns are predictable.
  • Consider Spot VMs for interruptible worker pools.
  • Right-size VM SKUs using observed metrics (CPU, memory, IO).
  • Choose disk tiers per workload and consider ephemeral OS disks where appropriate (verify suitability).
  • Apply log ingestion controls:
    • collect only needed logs
    • set retention policies
    • route high-volume logs to cheaper storage if appropriate

Example low-cost starter estimate (how to think about it)

A low-cost lab usually includes:

  • 1 small Standard Load Balancer + 1 public IP
  • 2 small Linux VMs (e.g., B-series or small D-series—availability varies)
  • Standard OS disks
  • Minimal logging

Exact prices vary by region and VM SKU. Use the Pricing Calculator with your region and expected hours.

Example production cost considerations

In production, your monthly cost is often dominated by:

  • Baseline instance count × VM hourly rate
  • Premium storage for performance-sensitive apps
  • WAF/App Gateway/Front Door + logging
  • NAT Gateway/Firewall for controlled egress
  • Log Analytics ingestion and retention
  • Multi-zone or multi-region deployments (duplicated capacity)


10. Step-by-Step Hands-On Tutorial

This lab deploys a small, low-cost web fleet using Virtual Machine Scale Sets, installs NGINX automatically, and configures autoscale.

Objective

Deploy an Ubuntu-based Virtual Machine Scale Set web tier behind an Azure Load Balancer, verify HTTP traffic distribution, and configure Azure Monitor autoscale rules.

Lab Overview

You will:

  1. Create a resource group.
  2. Create a cloud-init config to install and customize NGINX.
  3. Create a Virtual Machine Scale Set with a public load balancer endpoint.
  4. Test access via the load balancer public IP/DNS.
  5. Configure autoscale rules based on CPU.
  6. (Optional) Generate CPU load to observe scaling.
  7. Clean up all resources.

Estimated time: 30–60 minutes
Cost: Depends on VM SKU and runtime; keep instance counts small and delete resources afterward.

You can run this in Azure Cloud Shell (Bash) to avoid local setup.


Step 1: Set variables and create a resource group

In Cloud Shell (Bash) or your terminal:

# Change these to your preference
export LOCATION="eastus"
export RG="rg-vmss-lab"
export VMSS="vmss-web"
export ADMIN_USER="azureuser"

az account show >/dev/null 2>&1 || az login

az group create \
  --name "$RG" \
  --location "$LOCATION"

Expected outcome: A resource group exists in your chosen region.

Verify:

az group show --name "$RG" --query "{name:name, location:location}" -o table

Step 2: Create cloud-init to install NGINX and return instance identity

Create a file named cloud-init.txt:

cat > cloud-init.txt <<'EOF'
#cloud-config
package_update: true
packages:
  - nginx
write_files:
  - path: /var/www/html/index.html
    permissions: '0644'
    owner: root:root
    content: |
      <html>
      <head><title>VMSS NGINX</title></head>
      <body>
        <h1>Hello from Azure Virtual Machine Scale Sets</h1>
        <p>Hostname: <b>__HOSTNAME__</b></p>
      </body>
      </html>
runcmd:
  - sed -i "s/__HOSTNAME__/$(hostname)/g" /var/www/html/index.html
  - systemctl enable nginx
  - systemctl restart nginx
EOF

Expected outcome: A cloud-init file exists locally.

Verify:

head -n 20 cloud-init.txt
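You can also sanity-check the placeholder substitution that the runcmd performs by simulating the same sed command locally (the one-line HTML sample stands in for the full index page):

```shell
# Simulate the cloud-init runcmd substitution locally: the __HOSTNAME__
# placeholder should be replaced by the local machine's hostname.
html='<p>Hostname: <b>__HOSTNAME__</b></p>'
rendered=$(printf '%s' "$html" | sed "s/__HOSTNAME__/$(hostname)/g")
echo "$rendered"
```

On the scale set instances, the same substitution runs at first boot, so each instance serves a page stamped with its own hostname.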

Step 3: Create the Virtual Machine Scale Set

This command creates a VMSS and (for simplicity) also creates supporting resources such as:

  • VNet + subnet
  • NSG rules for inbound HTTP
  • Standard Load Balancer + public IP (depending on parameters)

Azure CLI behavior can change over time; if any parameter differs in your environment, verify with az vmss create -h and the official docs.

# DNS name must be globally unique in Azure for the chosen region
export DNS_NAME="vmsslab$RANDOM$RANDOM"

az vmss create \
  --resource-group "$RG" \
  --name "$VMSS" \
  --location "$LOCATION" \
  --image "Ubuntu2204" \
  --admin-username "$ADMIN_USER" \
  --generate-ssh-keys \
  --instance-count 2 \
  --vm-sku "Standard_B1s" \
  --upgrade-policy-mode "automatic" \
  --custom-data cloud-init.txt \
  --public-ip-address-allocation static \
  --public-ip-address-dns-name "$DNS_NAME" \
  --lb-sku "Standard"

Notes:

  • If Standard_B1s is not available in your region/subscription, pick another small SKU available to you (list SKUs with az vm list-skus).
  • The command intentionally keeps the fleet small.

Expected outcome: VMSS is created with 2 instances and an internet-facing endpoint.

Verify VMSS provisioning:

az vmss show -g "$RG" -n "$VMSS" --query "{name:name, location:location, sku:sku.name, capacity:sku.capacity}" -o table

List instances:

az vmss list-instances -g "$RG" -n "$VMSS" -o table

Step 4: Test the web endpoint

Get the public IP / FQDN:

az network public-ip list -g "$RG" --query "[].{name:name, ipAddress:ipAddress, fqdn:dnsSettings.fqdn}" -o table

Then test HTTP (use the FQDN from output):

export FQDN=$(az network public-ip list -g "$RG" --query "[0].dnsSettings.fqdn" -o tsv)
curl -s "http://$FQDN" | head

Expected outcome: You receive an HTML page saying “Hello from Azure Virtual Machine Scale Sets” and the hostname.

To see load distribution, run a few requests:

for i in {1..10}; do
  curl -s "http://$FQDN" | grep Hostname
done

You should see hostnames from different instances (not guaranteed every time, depending on connection reuse and LB behavior).
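To quantify the spread, you can count distinct hostnames in the captured output. The sample responses below stand in for real curl output so the snippet runs offline (the `vmssweb…` hostnames are illustrative):

```shell
# Count distinct backend hostnames in captured responses. The sample
# text stands in for the output of the curl loop above.
responses='<p>Hostname: <b>vmssweb000000</b></p>
<p>Hostname: <b>vmssweb000001</b></p>
<p>Hostname: <b>vmssweb000000</b></p>'

# Extract each <b>...</b> value, deduplicate, and count.
distinct=$(printf '%s\n' "$responses" \
  | grep -o '<b>[^<]*</b>' | sort -u | wc -l | tr -d ' ')
echo "Distinct instances seen: $distinct"
```

Against the live endpoint, replace the sample text with real captures, e.g. `responses=$(for i in {1..10}; do curl -s "http://$FQDN" | grep Hostname; done)`.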


Step 5: Configure autoscale rules (CPU-based)

Create an autoscale setting targeting the VMSS:

az monitor autoscale create \
  --resource-group "$RG" \
  --resource "/subscriptions/$(az account show --query id -o tsv)/resourceGroups/$RG/providers/Microsoft.Compute/virtualMachineScaleSets/$VMSS" \
  --name "as-$VMSS" \
  --min-count 2 \
  --max-count 5 \
  --count 2

Add a rule to scale out when average CPU is high:

az monitor autoscale rule create \
  --resource-group "$RG" \
  --autoscale-name "as-$VMSS" \
  --condition "Percentage CPU > 70 avg 5m" \
  --scale out 1

Add a rule to scale in when average CPU is low:

az monitor autoscale rule create \
  --resource-group "$RG" \
  --autoscale-name "as-$VMSS" \
  --condition "Percentage CPU < 30 avg 10m" \
  --scale in 1

Expected outcome: Autoscale configuration exists and will adjust capacity between 2 and 5.

Verify autoscale configuration:

az monitor autoscale show -g "$RG" -n "as-$VMSS" --query "{name:name, enabled:enabled, profiles:profiles[].capacity}" -o json

Step 6 (Optional): Generate CPU load to observe scaling

To trigger scaling, you can run a short CPU stress test on each instance using VMSS Run Command.

List instance IDs:

az vmss list-instances -g "$RG" -n "$VMSS" --query "[].instanceId" -o tsv

Run stress on each instance (this can take a few minutes):

for id in $(az vmss list-instances -g "$RG" -n "$VMSS" --query "[].instanceId" -o tsv); do
  az vmss run-command invoke \
    -g "$RG" -n "$VMSS" \
    --instance-id "$id" \
    --command-id RunShellScript \
    --scripts "sudo apt-get update && sudo apt-get -y install stress-ng && stress-ng --cpu 2 --timeout 300s" \
    --query "value[0].message" -o tsv
done

Then periodically check instance count:

watch -n 20 "az vmss show -g $RG -n $VMSS --query sku.capacity -o tsv"

Expected outcome: After several minutes (autoscale evaluation periods apply), the instance count may increase. Scaling is not instantaneous.
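Alongside the capacity check, you can watch the same metric autoscale evaluates. This sketch queries the scale set's recent average Percentage CPU (the JMESPath slice keeps only the last five one-minute samples):

```shell
# Query the "Percentage CPU" metric that the autoscale rules evaluate.
VMSS_ID=$(az vmss show -g "$RG" -n "$VMSS" --query id -o tsv)
az monitor metrics list \
  --resource "$VMSS_ID" \
  --metric "Percentage CPU" \
  --interval PT1M \
  --query "value[0].timeseries[0].data[-5:].{time:timeStamp, avgCpu:average}" \
  -o table
```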


Validation

  1. VMSS exists and has instances:
     az vmss list-instances -g "$RG" -n "$VMSS" -o table

  2. Web endpoint responds:
     curl -I "http://$FQDN"

  3. Autoscale is configured:
     az monitor autoscale show -g "$RG" -n "as-$VMSS" -o table

  4. (Optional) Observe scaling events:
     – In Azure Portal: VMSS → Scaling / Insights
     – In Azure Monitor: check metrics for CPU and capacity


Troubleshooting

Common issues and fixes:

  1. DNS name not unique
     – Error: conflict on the public IP DNS label.
     – Fix: change DNS_NAME and re-run the create command.

  2. VM SKU not available / quota exceeded
     – Error: “SKU not available” or “Operation could not be completed as it results in exceeding approved Total Regional Cores quota”.
     – Fix:
       • Choose a different SKU or region.
       • Request a quota increase in the Azure Portal (Subscription → Usage + quotas).

  3. Can’t reach the website
     – Causes:
       • NGINX install failed
       • NSG missing an inbound rule for port 80
       • Load balancer rule/probe misconfigured (if you built a custom LB)
     – Fix:
       • Check instance boot output/logs (cloud-init logs on Ubuntu: /var/log/cloud-init.log and /var/log/cloud-init-output.log)
       • Use Run Command to inspect:

         az vmss run-command invoke -g "$RG" -n "$VMSS" --instance-id 0 \
           --command-id RunShellScript \
           --scripts "systemctl status nginx --no-pager; curl -I http://localhost"

  4. Autoscale not triggering
     – Reasons:
       • CPU not high enough for long enough
       • Cooldown periods / evaluation windows not yet met
       • Metric collection delays
     – Fix: wait longer, increase the stress duration, or adjust thresholds conservatively.
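When autoscale seems stuck, the Activity Log usually shows whether any scale action was attempted at all. A hedged sketch (exact operation names vary, so the filter is deliberately loose and matches both "Autoscale" and "autoscale"):

```shell
# List recent control-plane events for the resource group and
# keep anything whose operation name mentions autoscale.
az monitor activity-log list \
  --resource-group "$RG" \
  --offset 2h \
  --query "[?contains(operationName.value, 'utoscale')].{time:eventTimestamp, op:operationName.value, status:status.value}" \
  -o table
```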

Cleanup

Delete the resource group to remove all lab resources:

az group delete --name "$RG" --yes --no-wait

Verify deletion (optional):

az group exists --name "$RG"

11. Best Practices

Architecture best practices

  • Put VMSS instances behind Load Balancer (L4) or Application Gateway (L7) to avoid direct exposure.
  • Prefer stateless designs:
    • Store session state in Redis/cache
    • Store files in Azure Storage
    • Store data in managed databases
  • Use Availability Zones for higher resiliency where supported.
  • Consider multi-region for critical workloads (Front Door + separate regional stacks).

IAM/security best practices

  • Use least-privilege RBAC:
    • Separate roles for deployers vs operators vs auditors
  • Use Managed Identity for VMSS to access Key Vault/Storage without secrets in code.
  • Avoid opening SSH/RDP to the internet; use:
    • Azure Bastion
    • Just-in-time (JIT) access (Defender for Cloud, where applicable)
    • Private connectivity + a jump host
  • Use Azure Policy to enforce:
    • Approved VM SKUs/images
    • Required tags
    • Disk encryption settings (where applicable)
    • No public IPs on NICs (a common requirement)
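As an illustration of the Managed Identity bullet above, the following sketch enables a system-assigned identity on the scale set and grants it secret read access on a vault via Azure RBAC. The vault name kv-demo-vmss is hypothetical, and the example assumes the vault uses the RBAC permission model:

```shell
# Enable a system-assigned managed identity on the scale set.
PRINCIPAL_ID=$(az vmss identity assign -g "$RG" -n "$VMSS" \
  --query systemAssignedIdentity -o tsv)

# Grant it read access to secrets in a Key Vault (RBAC permission model).
# "kv-demo-vmss" is a hypothetical vault name; substitute your own.
KV_ID=$(az keyvault show -n "kv-demo-vmss" --query id -o tsv)
az role assignment create \
  --assignee-object-id "$PRINCIPAL_ID" \
  --assignee-principal-type ServicePrincipal \
  --role "Key Vault Secrets User" \
  --scope "$KV_ID"
```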

Cost best practices

  • Set autoscale minimum to what you truly need for baseline availability.
  • Use scheduled autoscale for predictable patterns.
  • Consider Spot VMSS for interruptible compute pools.
  • Right-size disks and limit Log Analytics ingestion/retention.
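The scheduled-autoscale bullet can be sketched as a recurring profile that lowers capacity overnight. Assumptions: the autoscale setting from the lab ("as-$VMSS") exists, and the window and timezone below are placeholders to adjust:

```shell
# Recurring night profile: run 1 instance between 22:00 and 06:00 UTC.
# Note: recurring profiles need their own rules, or copy them from an
# existing profile with --copy-rules (verify your profile name first).
az monitor autoscale profile create \
  --resource-group "$RG" \
  --autoscale-name "as-$VMSS" \
  --name "night-scale-in" \
  --min-count 1 --max-count 2 --count 1 \
  --timezone "UTC" \
  --start 22:00 --end 06:00 \
  --recurrence week mon tue wed thu fri sat sun
```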

Performance best practices

  • Choose VM SKUs based on CPU/memory/network needs; validate with load testing.
  • Ensure your load balancer health probes match application readiness.
  • Use connection pooling wisely; avoid long-lived connections if they cause uneven load distribution.
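To make probes track application readiness rather than mere port liveness, point them at a health path. A sketch: "/healthz" is a hypothetical endpoint your app would need to serve, and the load balancer name assumes the az vmss create default of <vmss-name>LB:

```shell
# HTTP probe against an application readiness path instead of plain TCP.
# "/healthz" is hypothetical; the LB name assumes the CLI default.
az network lb probe create \
  --resource-group "$RG" \
  --lb-name "${VMSS}LB" \
  --name "http-readiness" \
  --protocol Http \
  --port 80 \
  --path "/healthz" \
  --interval 15 \
  --threshold 2
```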

Reliability best practices

  • Design for instance replacement:
    • Instances can be reimaged or replaced during updates or failures.
  • Use health probes and (where supported) application health reporting.
  • Use rolling deployments and keep enough capacity during upgrades.

Operations best practices

  • Standardize images through Azure Compute Gallery and a CI pipeline.
  • Implement a patch strategy:
    • Image-based patching or update management (verify current Azure offerings and your org requirements).
  • Capture logs/metrics centrally and define SLO-based alerts (latency, error rate, saturation).
  • Use tags for ownership, environment, cost center, and data classification.

Governance/tagging/naming best practices

  • Naming convention, e.g.:
    • vmss-<app>-<env>-<region>
  • Tags (minimum):
    • env, owner, costCenter, app, dataClass, lifecycle
  • Lock critical production resource groups where appropriate (with clear change processes).

12. Security Considerations

Identity and access model

  • Azure RBAC controls management operations on VMSS and related resources.
  • Use Managed Identity (system-assigned or user-assigned) for:
    • Key Vault secret retrieval
    • Storage access
    • Metric publishing (custom patterns)
  • Avoid embedding credentials in:
    • custom-data
    • scripts
    • VM images

Encryption

  • Data at rest:
    • Managed disks use server-side encryption by default (details depend on configuration; verify).
    • For stricter requirements, evaluate customer-managed keys (CMK) and disk encryption options.
  • Data in transit:
    • Use TLS end-to-end for HTTP apps.
    • Terminate TLS at Application Gateway/Front Door when appropriate and re-encrypt to backends if needed.

Network exposure

  • Prefer no public IPs on instances.
  • Restrict inbound traffic with NSGs:
    • Allow only from load balancer/application gateway subnets
    • Deny all other inbound
  • Control outbound:
    • NAT Gateway or Azure Firewall for consistent egress and logging
    • Private endpoints for PaaS dependencies
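The explicit-egress point above can be sketched as a NAT Gateway attached to the instances' subnet. All resource names below (pip-nat, natgw-demo, vnet-demo, subnet-demo) are hypothetical placeholders:

```shell
# Static public IP for predictable egress.
az network public-ip create -g "$RG" -n "pip-nat" \
  --sku Standard --allocation-method Static

# NAT Gateway using that IP.
az network nat gateway create -g "$RG" -n "natgw-demo" \
  --public-ip-addresses "pip-nat" --idle-timeout 10

# Attach it to the subnet holding the scale set instances.
az network vnet subnet update -g "$RG" \
  --vnet-name "vnet-demo" --name "subnet-demo" \
  --nat-gateway "natgw-demo"
```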

Secrets handling

  • Use Key Vault + Managed Identity.
  • Rotate secrets and certificates.
  • Use short-lived credentials where feasible.
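From inside an instance, a managed identity is exchanged for tokens via the Instance Metadata Service, so no secret ever lands on disk. A sketch (the vault and secret names are hypothetical; the IMDS endpoint and api-versions are the documented ones):

```shell
# Request an access token for Key Vault from the Instance Metadata Service.
TOKEN=$(curl -s -H "Metadata: true" \
  "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fvault.azure.net" \
  | python3 -c "import sys, json; print(json.load(sys.stdin)['access_token'])")

# Read a secret over REST ("kv-demo-vmss" and "mysecret" are hypothetical).
curl -s -H "Authorization: Bearer $TOKEN" \
  "https://kv-demo-vmss.vault.azure.net/secrets/mysecret?api-version=7.4"
```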

Audit/logging

  • Use Azure Activity Log for control-plane auditing.
  • Enable diagnostic settings for relevant resources (Load Balancer, Application Gateway, Key Vault).
  • Centralize logs in Log Analytics or a SIEM (Microsoft Sentinel) based on your org policy.

Compliance considerations

  • Data residency: keep compute and data in approved regions.
  • Use Azure Policy and Blueprints-like patterns (Azure Policy initiatives) to enforce controls.
  • Maintain patching, vulnerability scanning, and baseline hardening documentation.

Common security mistakes

  • Exposing SSH/RDP to 0.0.0.0/0
  • Storing secrets in cloud-init/custom scripts
  • No egress control (data exfil risk)
  • Over-permissive RBAC (e.g., everyone is Owner)
  • Lack of logging/retention strategy (either none, or excessive cost)

Secure deployment recommendations

  • Private subnet + load balancer/app gateway in a controlled DMZ.
  • Managed Identity + Key Vault references.
  • Explicit outbound path (NAT Gateway/Firewall) and private endpoints to dependencies.
  • Regular image rebuild pipeline (patched golden images).

13. Limitations and Gotchas

Limits and behaviors change over time and vary by orchestration mode and region. Verify in official docs for your chosen mode and SKU.

  • Orchestration mode differences: Uniform and Flexible do not offer identical feature sets.
  • Quota constraints: Regional vCPU quotas are the most common deployment blocker.
  • SKU availability: Some VM sizes (GPU, memory optimized) are not in all regions.
  • Scaling is not instantaneous: Provisioning new VMs can take minutes, especially with many extensions.
  • Health probe pitfalls: Misconfigured probes can route traffic to unhealthy instances or cause unnecessary replacements.
  • Statefulness issues: Local state disappears when instances are replaced; design for statelessness.
  • Outbound networking surprises: Default SNAT/outbound behavior can cause port exhaustion or unpredictable egress IPs; use explicit egress architecture for production.
  • Logging costs: High-volume logs to Log Analytics can become a major cost driver.
  • Image rollout discipline: Updating images without a rollout plan can cause fleet-wide regressions.

14. Comparison with Alternatives

Virtual Machine Scale Sets is one option in Azure Compute. Here’s how it compares to common alternatives in Azure and other clouds.

  • Azure Virtual Machine Scale Sets – Best for: VM fleets needing autoscale and centralized management. Strengths: VM-level control plus scaling and LB integration. Weaknesses: more operations than PaaS; patching/image management is on you. Choose when: you need VM access, custom OS dependencies, or a lift-and-improve path.
  • Azure Virtual Machines (single VMs) – Best for: single-server workloads. Strengths: simple and direct. Weaknesses: no fleet management; manual scaling. Choose when: you only need 1–2 servers or special snowflake workloads.
  • Azure App Service – Best for: web apps/APIs with minimal infrastructure management. Strengths: PaaS simplicity, fast deploys, built-in scaling options. Weaknesses: less OS control; platform constraints. Choose when: your app fits the supported runtimes and you want less ops.
  • Azure Kubernetes Service (AKS) – Best for: container orchestration at scale. Strengths: powerful scheduling, rolling deploys, service mesh ecosystem. Weaknesses: complexity, cluster operations. Choose when: you are container-native and need Kubernetes capabilities.
  • Azure Functions – Best for: event-driven serverless compute. Strengths: scale-to-zero, pay-per-execution patterns. Weaknesses: runtime constraints, cold starts. Choose when: workloads are event-driven and bursty.
  • Azure Container Apps – Best for: managed containers without full Kubernetes ops. Strengths: simpler than AKS, autoscaling including KEDA patterns. Weaknesses: platform constraints, still-evolving features. Choose when: you want containers plus scaling with minimal cluster management.
  • AWS Auto Scaling Groups (EC2) – Best for: AWS VM fleets. Strengths: mature autoscaling for EC2. Weaknesses: different ecosystem; migration effort. Choose when: the workload is on AWS.
  • Google Managed Instance Groups (MIG) – Best for: GCP VM fleets. Strengths: strong VM fleet patterns. Weaknesses: different ecosystem. Choose when: the workload is on GCP.
  • Self-managed (VMs + scripts) – Best for: small fleets with bespoke needs. Strengths: maximum control. Weaknesses: high toil and risk. Choose when: tooling constraints leave no alternative.

15. Real-World Example

Enterprise example: Zonal web/API tier with compliance controls

  • Problem: A regulated enterprise needs a scalable API tier with tight network controls, auditability, and predictable rollouts.
  • Proposed architecture:
    • Azure Front Door (global entry) → Application Gateway WAF (regional) → VMSS (zonal) in a private subnet
    • Private endpoints to Key Vault, Storage, and Azure SQL
    • Azure Monitor + Log Analytics + alerting; Activity Log retention to centralized logging
    • Golden images stored in Azure Compute Gallery; deployments via CI/CD
  • Why VMSS was chosen:
    • VM-level control required for custom agents and hardened baselines
    • Autoscale supports variable demand
    • Fits hub-and-spoke network governance
  • Expected outcomes:
    • Reduced operational overhead vs individual VMs
    • Improved resiliency across zones
    • Standardized patching via image pipeline and controlled rollouts

Startup/small-team example: Cost-controlled API service

  • Problem: A startup needs a scalable API with predictable costs and simple operations, but requires VM-level control for a specialized dependency.
  • Proposed architecture:
    • Standard Load Balancer + VMSS with 2–6 instances
    • Azure Database for PostgreSQL (managed) for persistence
    • Autoscale based on CPU; scheduled scale-in at night
    • Minimal logging with targeted metrics and alerts
  • Why VMSS was chosen:
    • App Service wasn’t suitable due to an OS-level dependency
    • VMSS offers autoscale without building custom orchestration
  • Expected outcomes:
    • Meets demand spikes without constant overprovisioning
    • Small baseline cost with a growth path
    • Simple “one resource” compute fleet management

16. FAQ

  1. Is Virtual Machine Scale Sets a PaaS service?
    No. VMSS is an IaaS fleet orchestration service for Azure Virtual Machines. You still manage OS/application configuration (often via images and automation).

  2. Do I pay extra for VMSS itself?
    Typically, no separate VMSS fee—you pay for VMs, disks, networking, and monitoring resources you deploy.

  3. What’s the difference between Uniform and Flexible orchestration?
    They represent different ways Azure manages instances and features. The best choice depends on upgrade behavior, VM feature parity needs, and operational model. Verify the latest capability matrix in official docs because details evolve.

  4. Can VMSS scale to zero?
    Some designs allow scaling very low, but whether “zero” is appropriate depends on your inbound traffic model and architecture. For always-on endpoints, you usually keep at least 1–2 instances.

  5. How does autoscale decide when to add instances?
    Autoscale uses Azure Monitor metrics and rules (thresholds and time windows). There can be a delay due to metric aggregation and VM provisioning time.

  6. Do VMSS instances need public IPs?
    Usually no. Best practice is to put instances in a private subnet and expose only a load balancer/application gateway.

  7. How do I deploy application code to VMSS instances?
    Common patterns:
      • Bake a golden VM image (Compute Gallery)
      • Use cloud-init or VM extensions to pull artifacts at boot
      • Use a configuration management tool (Ansible/Chef/Puppet)
    For large fleets, image-based deployment is often more consistent.
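The cloud-init pattern from the answer above can be sketched as a small #cloud-config file passed at creation time. The artifact URL is a placeholder:

```shell
# Write a cloud-init file that installs nginx and pulls an app artifact
# at boot (https://example.com/myapp.tar.gz is a placeholder URL).
cat > cloud-init.txt <<'EOF'
#cloud-config
package_update: true
packages:
  - nginx
runcmd:
  - curl -fsSL https://example.com/myapp.tar.gz -o /tmp/myapp.tar.gz
  - mkdir -p /var/www/html
  - tar -xzf /tmp/myapp.tar.gz -C /var/www/html
EOF
```

Pass it with az vmss create using --custom-data cloud-init.txt; cloud-init runs once at first boot of each new instance.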

  8. How do rolling upgrades work?
    Rolling upgrades update instances gradually based on policy (mode-dependent). You aim to keep enough healthy capacity while updating. Verify exact behavior in your orchestration mode.

  9. Can I use a custom image?
    Yes. VMSS supports marketplace images and custom images; Azure Compute Gallery is a common enterprise approach for versioned images.

  10. How do I monitor VMSS health?
    Use:
      • Load balancer/application gateway health probes
      • Azure Monitor metrics (CPU, network, disk)
      • Guest OS logs via Azure Monitor Agent to Log Analytics

  11. What happens if an instance is unhealthy?
    It may be removed from load balancer rotation (probe failure). Depending on configuration, the platform may also repair/replace instances (capability varies—verify).

  12. Is VMSS suitable for stateful workloads?
    It’s best for stateless workloads. Stateful workloads require careful design (externalized state, durable storage, controlled instance replacement).

  13. Can VMSS run Windows Server?
    Yes, VMSS supports Windows and Linux images (licensing and pricing differ).

  14. How do I control outbound IP addresses?
    Use explicit egress architecture such as NAT Gateway or Azure Firewall depending on requirements. Avoid relying on defaults for production.

  15. What’s the simplest way to get started?
    Use Azure CLI az vmss create to deploy a small scale set behind a standard load balancer, then add autoscale rules and basic monitoring.


17. Top Online Resources to Learn Virtual Machine Scale Sets

  • Official documentation: Virtual Machine Scale Sets documentation (Learn) – https://learn.microsoft.com/azure/virtual-machine-scale-sets/ – canonical docs: concepts, orchestration modes, scaling, upgrades
  • Official quickstarts: VMSS quickstarts on Learn – https://learn.microsoft.com/azure/virtual-machine-scale-sets/ – step-by-step deployment guides and examples
  • Official pricing: Azure Virtual Machines pricing – https://azure.microsoft.com/pricing/details/virtual-machines/ – VM compute pricing (the main cost component)
  • Official pricing: Azure Load Balancer pricing – https://azure.microsoft.com/pricing/details/load-balancer/ – understand LB SKU cost implications
  • Official pricing: Azure Managed Disks pricing – https://azure.microsoft.com/pricing/details/managed-disks/ – disk tier/size cost planning
  • Official pricing: Azure Monitor pricing – https://azure.microsoft.com/pricing/details/monitor/ – log ingestion/retention cost planning
  • Official tool: Azure Pricing Calculator – https://azure.microsoft.com/pricing/calculator/ – build region/SKU-specific estimates
  • Architecture guidance: Azure Architecture Center – https://learn.microsoft.com/azure/architecture/ – reference architectures and best practices for scalable designs
  • CLI reference: az vmss command group – https://learn.microsoft.com/cli/azure/vmss – up-to-date CLI syntax for scripting deployments
  • Samples: Azure Quickstart Templates (ARM) – https://github.com/Azure/azure-quickstart-templates – many infrastructure patterns include VMSS examples (verify template currency)
  • Videos: Microsoft Azure YouTube – https://www.youtube.com/@MicrosoftAzure – product walkthroughs; verify recency of specific VMSS content
  • Community learning: Microsoft Q&A (Azure) – https://learn.microsoft.com/answers/topics/azure-virtual-machines.html – real troubleshooting threads; validate answers against docs

18. Training and Certification Providers

  • DevOpsSchool.com – Audience: DevOps engineers, SREs, platform teams. Focus: Azure DevOps, infrastructure automation, operational practices. Mode: check website. https://www.devopsschool.com/
  • ScmGalaxy.com – Audience: beginners to intermediate engineers. Focus: DevOps fundamentals, tools, CI/CD concepts. Mode: check website. https://www.scmgalaxy.com/
  • CloudOpsNow.in – Audience: cloud engineers, operations teams. Focus: cloud operations, monitoring, reliability practices. Mode: check website. https://www.cloudopsnow.in/
  • SreSchool.com – Audience: SREs, platform engineers. Focus: SRE principles, incident response, observability. Mode: check website. https://www.sreschool.com/
  • AiOpsSchool.com – Audience: ops/SRE teams exploring AIOps. Focus: AIOps concepts, monitoring automation, operations analytics. Mode: check website. https://www.aiopsschool.com/

19. Top Trainers

  • RajeshKumar.xyz – Cloud/DevOps training content (verify offerings). Audience: beginners to intermediate. https://www.rajeshkumar.xyz/
  • devopstrainer.in – DevOps training (tools + cloud). Audience: DevOps engineers, students. https://www.devopstrainer.in/
  • devopsfreelancer.com – Freelance DevOps consulting/training platform (verify services). Audience: teams seeking short-term expertise. https://www.devopsfreelancer.com/
  • devopssupport.in – DevOps support and training resources (verify services). Audience: ops teams needing troubleshooting help. https://www.devopssupport.in/

20. Top Consulting Companies

  • cotocus.com – Cloud/DevOps consulting (verify offerings): architecture, automation, deployments. Examples: VMSS landing zones, autoscale setups, monitoring baselines. https://cotocus.com/
  • DevOpsSchool.com – DevOps and cloud consulting/training: delivery enablement, DevOps transformation. Examples: CI/CD + image pipelines, VMSS operational runbooks. https://www.devopsschool.com/
  • DEVOPSCONSULTING.IN – DevOps consulting (verify offerings): toolchain integration, infrastructure automation. Examples: IaC standardization, security hardening, cost optimization reviews. https://www.devopsconsulting.in/

21. Career and Learning Roadmap

What to learn before Virtual Machine Scale Sets

  • Azure fundamentals:
    • subscriptions, resource groups, regions
    • Azure RBAC and Entra ID basics
  • Azure networking:
    • VNets, subnets, NSGs, routing
    • Load Balancer basics and health probes
  • Virtual machine basics:
    • Linux/Windows administration fundamentals
    • SSH keys, package management
  • Infrastructure as Code basics:
    • ARM/Bicep or Terraform fundamentals
  • Monitoring basics:
    • metrics vs logs, alerts, dashboards

What to learn after Virtual Machine Scale Sets

  • Image pipelines:
    • Azure Compute Gallery
    • Packer-based image builds (a common pattern)
  • Advanced routing and security:
    • Application Gateway (WAF), Front Door
    • Private endpoints, NAT Gateway, Azure Firewall
  • Operational excellence:
    • SRE practices: SLOs, error budgets, incident management
    • Azure Monitor at scale, Log Analytics cost management
  • Modernization paths:
    • AKS (if moving to containers)
    • App Service / Container Apps (if moving to PaaS)

Job roles that use VMSS

  • Cloud engineer / cloud infrastructure engineer
  • DevOps engineer
  • Site Reliability Engineer (SRE)
  • Platform engineer
  • Solutions architect (designing scalable compute tiers)
  • Security engineer (hardening and governance patterns)

Certification path (Azure)

Depending on your goals, consider Microsoft certifications that cover Azure compute/networking/operations. Start with:
  • AZ-900 (Azure Fundamentals) for baseline concepts
  • Role-based certifications (administrator/architect/devops) depending on your track
Verify current certification names and objectives here: https://learn.microsoft.com/credentials/

Project ideas for practice

  • Build a VMSS-based web tier with:
    • autoscale rules
    • rolling upgrades via new image versions
    • blue/green deployment using two scale sets behind Application Gateway
  • Create a queue-worker VMSS:
    • poll Azure Service Bus
    • scale based on a custom metric (verify the approach)
  • Secure VMSS design:
    • private subnet
    • Key Vault secrets via Managed Identity
    • private endpoints for Storage/SQL
    • centralized egress through NAT Gateway/Firewall

22. Glossary

  • VMSS (Virtual Machine Scale Sets): Azure service for managing and scaling a group of VMs as one unit.
  • Instance: A single VM created as part of a scale set.
  • Autoscale: Automatic adjustment of instance count based on metrics or schedules.
  • Azure Monitor: Platform service for metrics, logs, alerting, and autoscale settings.
  • Health probe: A periodic check (e.g., HTTP/TCP) used by a load balancer/application gateway to decide if an instance should receive traffic.
  • Azure Load Balancer: Layer 4 load balancer (TCP/UDP) often used with VMSS.
  • Azure Application Gateway: Layer 7 HTTP(S) load balancer with routing features; can include WAF.
  • Availability Zone: Physically separate datacenter locations within an Azure region.
  • Azure Compute Gallery: Service for managing and sharing custom VM images (official name; formerly Shared Image Gallery).
  • Managed Disk: Azure-managed block storage for VM OS and data disks.
  • NSG (Network Security Group): Stateful firewall rules for subnets/NICs.
  • Managed Identity: Azure identity for a resource to authenticate to other Azure services without storing credentials.
  • Golden image: A prebuilt VM image containing OS patches and required software, used to create consistent instances.
  • Egress: Outbound network traffic from Azure to the internet or other destinations (often billable).

23. Summary

Azure Virtual Machine Scale Sets is a Compute service for running scalable VM fleets with centralized configuration, health-aware load balancing integration, and autoscaling through Azure Monitor. It fits best when you need VM-level control (custom OS dependencies, agents, legacy software) while still wanting modern cloud patterns like horizontal scaling and automated instance management.

Cost is primarily driven by VM compute, disks, networking (load balancer/public IP/egress), and monitoring logs—not by VMSS itself as a separate line item. Security success depends on avoiding public management exposure, using Managed Identity, controlling egress, and standardizing hardened images with strong governance.

Use VMSS for stateless web/API tiers, worker pools, CI agents, and batch compute where instance replacement is acceptable. If you want less infrastructure management, evaluate Azure PaaS options; if you’re container-native at scale, evaluate AKS or Container Apps.

Next step: extend the lab by introducing a custom image pipeline (Azure Compute Gallery) and implementing a blue/green deployment with two scale sets and controlled traffic shifting.