Category
Containers
1. Introduction
What this service is
Azure Kubernetes Fleet Manager is an Azure service for managing multiple Kubernetes clusters (typically multiple AKS clusters) as a single “fleet” so platform teams can apply consistent configuration and orchestrate operations across many clusters.
Simple explanation (one paragraph)
If you run more than one AKS cluster—across regions, environments (dev/test/prod), or tenants—Azure Kubernetes Fleet Manager helps you treat them as one logical group. Instead of repeating the same deployment or operational action per cluster, you manage at the fleet level and let the service coordinate across member clusters.
Technical explanation (one paragraph)
Technically, Azure Kubernetes Fleet Manager introduces an Azure Fleet resource that can register multiple Kubernetes clusters as fleet members and, depending on enabled capabilities, provide mechanisms for multi-cluster resource placement/propagation and orchestrated operations across those member clusters. Some features use a dedicated “hub” concept (often implemented as a Kubernetes API surface you interact with using kubectl) to define placement policies and observe propagation status.
What problem it solves
Organizations adopting Kubernetes at scale quickly run into a “multi-cluster tax”:
- Repeating deployments and policy changes across clusters
- Maintaining consistency between environments and regions
- Coordinating staged rollouts/upgrades safely
- Reducing operational overhead while still keeping clusters separated for isolation, compliance, and blast-radius control
Azure Kubernetes Fleet Manager targets these pain points by adding a fleet-level management plane for many clusters.
Service status note: Azure features and command surfaces evolve quickly. Verify the latest feature availability, API versions, and CLI extension requirements in the official documentation before production use: https://learn.microsoft.com/azure/kubernetes-fleet/
2. What is Azure Kubernetes Fleet Manager?
Official purpose
Azure Kubernetes Fleet Manager is designed to help you manage multiple Kubernetes clusters (commonly multiple Azure Kubernetes Service (AKS) clusters) through a single fleet abstraction, improving consistency and reducing repetitive operational work.
Core capabilities (high level)
Commonly documented capabilities include:
- Fleet resource to represent a logical group of clusters
- Membership: add/remove clusters as fleet members
- Multi-cluster resource placement/propagation (often via fleet “hub” APIs) so Kubernetes objects can be applied across selected member clusters
- Orchestrated operations across clusters (for example, coordinated workflows across a set of clusters—verify the current supported operations in official docs)
Major components
While exact implementation details and naming can evolve, you should expect the following conceptual components:
- Fleet (Azure resource): the management object in Azure
- Fleet members: the clusters registered into the fleet
- Fleet hub (capability-dependent): a Kubernetes API endpoint used to define placements and view status (when enabled/required)
- Placement and scheduling constructs: Kubernetes custom resources (CRDs) used to select clusters and propagate resources (API group/version can change—verify on your hub cluster)
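To make the placement construct concrete, here is a minimal sketch of a hub-side placement custom resource. The API group, version, kind, and field names are assumptions based on the commonly documented ClusterResourcePlacement shape; verify what your hub actually exposes with kubectl api-resources and kubectl explain before using it:

```yaml
# Illustrative sketch only; confirm the API group/version and schema on your hub.
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
  name: place-platform-namespace
spec:
  resourceSelectors:          # which hub-side objects to propagate
    - group: ""
      version: v1
      kind: Namespace
      name: platform-system
  policy:
    placementType: PickAll    # place onto every matching member cluster
```

The pattern to notice is the split between "what to propagate" (resourceSelectors) and "where to place it" (policy), which is what makes label-driven targeting possible.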
Service type
- Type: Azure managed control-plane service for multi-cluster management (integrates with Kubernetes APIs and Azure ARM)
- Client interfaces: Azure Portal, Azure CLI (often via an extension), ARM/Bicep/Terraform (provider support varies by maturity; verify), and kubectl for hub-side CRDs
Scope: regional/global and resource scope
- Typically created as an Azure resource in a resource group within a subscription.
- The fleet resource is created in a specific Azure region, but it can logically manage clusters across regions (capabilities and support matrix may vary—verify).
- Fleet membership is generally subscription-scoped by RBAC and permissions; cross-subscription or cross-tenant scenarios may be possible but require explicit permissions and are subject to product support boundaries—verify in docs.
How it fits into the Azure ecosystem
Azure Kubernetes Fleet Manager typically complements:
- AKS for cluster hosting
- Azure RBAC / Microsoft Entra ID for access control
- Azure Policy for Kubernetes / Gatekeeper for governance (depending on your approach)
- Azure Monitor (Container insights) for monitoring
- GitOps tooling (Flux/Argo CD) as an alternative or companion to fleet placement, depending on operational model
3. Why use Azure Kubernetes Fleet Manager?
Business reasons
- Lower operational overhead: fewer manual, repeated cluster-by-cluster actions
- Standardization: enforce baseline configurations and reduce configuration drift
- Faster delivery: push platform changes or common apps to many clusters with less friction
- Risk reduction: staged rollouts and consistent practices reduce outages caused by “one-off” changes
Technical reasons
- Multi-cluster application placement: define where Kubernetes objects should run using selectors/labels rather than hard-coding cluster lists
- Separation with central control: maintain multiple clusters for isolation (regional, environment, compliance) while managing them consistently
- Declarative workflows: use Kubernetes-style APIs (on the hub) to express intent and observe status
Operational reasons
- Fleet-wide visibility (capability-dependent): status of placement/propagation and membership from one place
- Repeatability: fewer “runbook forks” per cluster
- Scalability of operations: patterns that work for 2 clusters can extend to 20+ clusters
Security/compliance reasons
- Reduced human access: instead of logging into every cluster, you can centralize certain management actions
- Consistent policy distribution: distribute baseline policies/guardrails across clusters (when used with policy tooling)
- Auditability: leverage Azure activity logs + Kubernetes audit logs (if enabled) for change tracking
Scalability/performance reasons
- Helps you scale Kubernetes operations without building your own multi-cluster control plane.
- Supports architectures where workloads are intentionally spread across multiple clusters to meet latency, availability, or compliance goals.
When teams should choose it
Choose Azure Kubernetes Fleet Manager when you have:
- Multiple AKS clusters (now or soon)
- A platform team responsible for consistent baseline capabilities across clusters
- The need to deploy shared components (ingress controllers, policies, agents, namespaces, config) to multiple clusters
- A desire for staged, controlled changes across environments/regions
When teams should not choose it
Avoid or defer Azure Kubernetes Fleet Manager when:
- You only have a single AKS cluster and don’t expect to grow soon
- You require an advanced multi-cluster service mesh or global traffic routing (Fleet Manager is not a service mesh or global load balancer)
- Your org already standardized on another multi-cluster platform (Anthos, Rancher, Open Cluster Management, Karmada) and migration cost outweighs benefits
- You need to manage many non-AKS clusters and the service support matrix doesn’t match your estate (verify supported cluster types)
4. Where is Azure Kubernetes Fleet Manager used?
Industries
- SaaS and software companies running multi-region services
- Financial services and healthcare (environment isolation, compliance boundaries)
- Retail and media (spiky traffic, regional presence)
- Manufacturing/IoT (regional plants, edge-ish patterns—often hybrid with Arc; verify fit)
Team types
- Platform engineering teams
- SRE/operations teams
- DevOps and release engineering teams
- Security engineering and compliance teams (baseline controls)
Workloads
- Microservices deployed across multiple clusters
- Shared platform add-ons (ingress, cert-manager, external-dns, monitoring agents)
- Regionalized workloads (data residency, latency)
- Multi-environment deployments (dev/test/stage/prod)
Architectures
- Multi-region active/active: same app stack in multiple regions
- Multi-cluster per environment: strict separation for compliance
- Cell-based architecture: many similar clusters (“cells”) for scale-out and blast-radius control
- Tenant isolation: separate clusters per customer/tenant for higher isolation
Real-world deployment contexts
- Central platform team curates cluster baselines; app teams deploy workloads to namespaces
- Shared add-ons pushed to many clusters with controlled placement rules
- DR strategy where identical stacks exist in secondary regions
Production vs dev/test usage
- Dev/test: distribute “golden” namespaces, RBAC, and toolchains; validate placement behavior
- Production: carefully controlled rollouts with strong governance and change management, plus strict RBAC boundaries
5. Top Use Cases and Scenarios
Below are realistic scenarios where Azure Kubernetes Fleet Manager is commonly considered. Exact feature support can vary; verify the specific capability in official docs.
1) Deploy a shared baseline namespace to every cluster
- Problem: You need standard namespaces (platform-system, observability, security) across clusters.
- Why this service fits: Fleet-level placement can propagate namespace objects consistently.
- Example: Automatically ensure the observability namespace exists on all prod clusters in all regions.
2) Roll out a cluster-wide ConfigMap or Secret pattern (with external secret storage)
- Problem: Repeating “same config everywhere” changes causes drift.
- Why it fits: Fleet placement supports consistent distribution (secrets should be handled carefully—see Security).
- Example: Distribute a non-sensitive ConfigMap containing organization-wide proxy settings.
3) Publish a common ingress controller configuration across clusters
- Problem: Each cluster ends up with slightly different ingress annotations and defaults.
- Why it fits: Place ingress-related Kubernetes resources on selected clusters using labels (e.g., env=prod).
- Example: Ensure all internet-facing clusters share the same ingress class defaults.
4) Multi-region “cell” deployment of a stateless microservice
- Problem: You run the same service in many clusters and need consistent deployment manifests.
- Why it fits: Define the deployment once and place it to multiple member clusters.
- Example: A public API service runs in 6 regional clusters; fleet placement selects them by the label tier=api.
5) Progressive rollout across environments
- Problem: You want dev → staging → prod promotion without re-authoring per cluster.
- Why it fits: Use cluster labels and multiple placement objects (or selectors) per environment.
- Example: A new sidecar configuration rolls to env=dev first, then env=stage, then env=prod.
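A hedged sketch of this promotion pattern, assuming a ClusterResourcePlacement-style API with label-based cluster affinity (the API group, version, and field names are illustrative; confirm them with kubectl explain on your hub):

```yaml
# Illustrative only: promote by cloning this placement with env=stage, then env=prod.
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
  name: sidecar-config-dev
spec:
  resourceSelectors:
    - group: ""
      version: v1
      kind: Namespace
      name: sidecar-config    # hypothetical namespace holding the new configuration
  policy:
    placementType: PickAll
    affinity:
      clusterAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                  env: dev    # widen to stage, then prod, in later placements
```

Keeping one placement object per environment makes the promotion history auditable and lets you roll back a stage independently.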
6) Disaster recovery readiness validation
- Problem: DR clusters lag behind and are missing key objects until failover day.
- Why it fits: Keep DR clusters as fleet members and continuously place baseline resources.
- Example: Ensure DR clusters always have the same namespace, RBAC, and service accounts as primary.
7) Standardize cluster add-ons (observability, security agents)
- Problem: Monitoring/logging agents vary across clusters, breaking dashboards and alerts.
- Why it fits: Place add-on manifests to selected clusters (or all).
- Example: Standardize Azure Monitor agents or Prometheus exporters across all clusters.
8) Run a centralized policy bundle to enforce guardrails everywhere
- Problem: Policies are applied inconsistently, leading to non-compliant workloads.
- Why it fits: Fleet placement can distribute policy-related manifests (actual enforcement depends on the policy engine).
- Example: Distribute Gatekeeper constraints and templates (if you use Gatekeeper) across clusters.
9) Manage clusters by “capability groups”
- Problem: Not every cluster should receive every component (GPU, PCI workloads, internet-facing).
- Why it fits: Use labels/selectors to target only capable clusters.
- Example: Place the GPU device plugin only on clusters labeled hardware=gpu.
10) Reduce toil for many small AKS clusters
- Problem: You intentionally run many smaller clusters for blast radius, but operations don’t scale.
- Why it fits: Fleet abstractions help you scale repeated actions and baseline configuration.
- Example: A SaaS provider with 30 clusters uses fleet placement to keep them consistent.
11) Separation of duties (platform vs app teams)
- Problem: Platform team needs to manage shared components without full admin access everywhere.
- Why it fits: Centralize specific changes through fleet processes with tight RBAC.
- Example: Platform engineers can manage baseline manifests via hub access; app teams only touch their namespaces.
12) Standardize environment bootstrapping for new clusters
- Problem: New clusters take days to become “ready” due to manual setup.
- Why it fits: Add the cluster to the fleet and let baseline placements populate required objects.
- Example: New regional cluster joins fleet, automatically receives namespaces, network policies, and observability.
6. Core Features
Note: Feature names, API versions, and required components can differ by release wave. Confirm the latest capabilities at https://learn.microsoft.com/azure/kubernetes-fleet/
Fleet resource (logical grouping)
- What it does: Represents a collection of Kubernetes clusters as one fleet.
- Why it matters: It becomes the anchor point for membership and fleet-level operations.
- Practical benefit: A single inventory of clusters for platform workflows.
- Caveats: Cross-subscription/tenant use cases may require additional configuration and may not be supported in all scenarios—verify.
Fleet membership management
- What it does: Adds/removes member clusters to/from the fleet.
- Why it matters: Membership is the foundation for any multi-cluster operation.
- Practical benefit: Standard onboarding/offboarding flow for clusters.
- Caveats: Requires permissions both in Azure (ARM) and within the cluster (Kubernetes RBAC), depending on operation.
Fleet hub (capability-dependent)
- What it does: Provides a Kubernetes API surface to define placements and view status (implemented as a “hub” control plane you interact with via kubectl).
- Why it matters: Enables Kubernetes-native, declarative multi-cluster placement patterns.
- Practical benefit: You can define “place this object onto clusters matching these labels.”
- Caveats: The hub may be implemented as an AKS cluster or as a managed hub component, depending on the current design. This can introduce additional cost and operational considerations. Verify how the hub is realized in your chosen mode.
Multi-cluster resource placement / propagation
- What it does: Lets you define placement policies so Kubernetes objects are propagated to selected member clusters.
- Why it matters: This is often the core “fleet” capability for day-2 operations.
- Practical benefit: Define once, deploy to many—reduces drift.
- Caveats:
- Not all Kubernetes object types or edge cases are equally suitable (e.g., cluster-scoped resources need careful governance).
- Conflicts can occur if app teams also manage the same objects through other pipelines (GitOps, Helm) without clear ownership.
Cluster selection via labels/selectors
- What it does: Targets clusters based on metadata (e.g., env=prod, region=eastus, tier=frontend).
- Why it matters: Enables scalable targeting without enumerating cluster names.
- Practical benefit: Simple progressive rollout patterns.
- Caveats: Label taxonomy must be standardized or selection becomes fragile.
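For reference, a labeled member cluster object on the hub might look like the following. The kind and API group are assumptions; list the real resource type with kubectl api-resources on your hub:

```yaml
# Illustrative only: shows the label taxonomy idea, not an exact schema.
apiVersion: cluster.kubernetes-fleet.io/v1
kind: MemberCluster
metadata:
  name: member-a
  labels:
    env: prod
    region: eastus
    tier: frontend
```

Agreeing on a small, documented set of label keys (env, region, tier) before onboarding clusters is what keeps selectors from becoming fragile.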
Status, health, and rollout visibility (capability-dependent)
- What it does: Surfaces whether resources were successfully placed and applied across clusters.
- Why it matters: Multi-cluster operations without feedback quickly become unsafe.
- Practical benefit: Easier troubleshooting of “why didn’t cluster X receive this deployment?”
- Caveats: The depth of status/telemetry varies; you may still rely heavily on per-cluster logs and events.
Integration with Azure identity and governance
- What it does: Uses Azure RBAC and resource-level permissions for fleet and membership operations; integrates into Azure activity logs.
- Why it matters: Helps align multi-cluster control with enterprise access models.
- Practical benefit: Centralized access control and audit trail at Azure resource level.
- Caveats: You still need strong Kubernetes RBAC and namespace boundaries inside each cluster.
7. Architecture and How It Works
High-level architecture
At a high level, Azure Kubernetes Fleet Manager introduces a fleet management plane:
- You create a Fleet resource in Azure.
- You register AKS clusters as fleet members (membership is tracked as Azure resources).
- If using hub-based placement, you connect to the fleet hub Kubernetes API (via kubeconfig) and apply placement resources.
- Agents/controllers coordinate propagation to member clusters.
- You monitor placement status centrally and validate results on each member cluster.
Request/data/control flow (typical hub-based placement)
- Control plane (Azure): You manage the Fleet resource and membership via ARM (Portal/CLI/SDK).
- Control plane (Kubernetes): You apply placement and workload manifests to the hub API.
- Controllers/agents: Reconcile desired state and apply objects to selected member clusters.
- Data plane: Your applications run on the member clusters; Fleet Manager is not in your request path.
Integrations with related services
Common integrations in Azure environments include:
- AKS: the managed Kubernetes clusters being orchestrated
- Microsoft Entra ID / Azure RBAC: authentication and authorization (for Azure and sometimes for Kubernetes API access)
- Azure Monitor: logs/metrics (Container insights), activity logs
- Azure Policy: governance of Azure resources and potentially Kubernetes policies (depending on your chosen policy stack)
- Key Vault + CSI driver / External Secrets: secrets management (recommended over distributing raw Kubernetes Secrets)
Dependency services
- AKS clusters as members
- Potentially an additional hub cluster (depending on how hub is implemented in the current service mode)
- Azure networking, identity, and logging services typically used with AKS
Security/authentication model (overview)
- Azure-side: ARM calls secured by Azure AD (Entra) and Azure RBAC.
- Kubernetes-side: Hub API access uses kubeconfig credentials (often integrated with Entra/AKS auth). Member cluster access is performed by controllers/agents with their own identities/credentials.
- Best practice: Use least privilege, separate roles for fleet administration vs application deployment.
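One way to express that separation on the hub side is with standard Kubernetes RBAC. The API group and resource name below are assumptions and must match what your hub actually exposes (check kubectl api-resources):

```yaml
# Illustrative only: grants placement management without member-cluster admin.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fleet-placement-editor
rules:
  - apiGroups: ["placement.kubernetes-fleet.io"]   # verify the group on your hub
    resources: ["clusterresourceplacements"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```

Bind this role to your platform team's group so fleet administrators can manage placements while application operators remain namespace-scoped.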
Networking model (overview)
- Member clusters usually need outbound connectivity to Azure control plane endpoints and any fleet-specific endpoints used by the service.
- Hub-to-member traffic patterns depend on the architecture mode and may rely on agents initiating outbound connections, reducing the need for inbound connectivity between clusters.
- For private clusters, additional DNS/firewall/endpoint planning may be required—verify the official networking requirements.
Monitoring/logging/governance
- Use Azure Activity Log for fleet resource operations.
- Use per-cluster Kubernetes events/logs for applied resources.
- Consider centralized log aggregation (Azure Monitor / Log Analytics) across clusters with consistent workspace design.
Simple architecture diagram (Mermaid)
flowchart LR
A[Platform engineer] -->|Azure CLI/Portal| ARM[Azure Resource Manager]
ARM --> F[Azure Kubernetes Fleet Manager<br/>Fleet resource]
F --> M1["AKS Cluster A<br/>(member)"]
F --> M2["AKS Cluster B<br/>(member)"]
A -->|"kubectl (hub mode)"| HUB[Fleet hub Kubernetes API]
HUB -->|placement/propagation| M1
HUB -->|placement/propagation| M2
Production-style architecture diagram (Mermaid)
flowchart TB
subgraph Mgmt[Management Subscription]
ARM[Azure Resource Manager]
FLEET[Azure Kubernetes Fleet Manager<br/>Fleet]
LOG[Azure Monitor / Log Analytics]
KV[Azure Key Vault]
end
subgraph Region1[Region 1]
AKS1[AKS Prod Cluster - eastus]
ACR1[Azure Container Registry]
end
subgraph Region2[Region 2]
AKS2[AKS Prod Cluster - westeurope]
ACR2["Azure Container Registry (optional)"]
end
subgraph Shared[Shared Controls]
ENTRA[Microsoft Entra ID]
POLICY[Azure Policy / Policy-as-code]
CICD["CI/CD or GitOps (optional)"]
end
ADMIN[Platform/SRE Team] --> ENTRA
ADMIN -->|ARM| ARM
ARM --> FLEET
FLEET --> AKS1
FLEET --> AKS2
CICD -->|Push manifests to hub or repos| FLEET
AKS1 -->|Pull images| ACR1
AKS2 -->|Pull images| ACR1
AKS1 --> LOG
AKS2 --> LOG
AKS1 -->|CSI/Secrets| KV
AKS2 -->|CSI/Secrets| KV
POLICY --> AKS1
POLICY --> AKS2
8. Prerequisites
Account/subscription requirements
- An Azure subscription where you can create:
- Resource groups
- AKS clusters
- Fleet resources (Azure Kubernetes Fleet Manager)
Permissions / IAM roles
You typically need:
- At Azure scope (subscription/resource group):
  - Owner or Contributor to create resources
  - The ability to assign roles if needed (for managed identities/service principals)
- At AKS/Kubernetes scope:
- Cluster admin or appropriate RBAC to install/operate any fleet agents/controllers (often handled automatically, but permissions still matter)
For least privilege in production, split roles:
- Fleet administrators (create/manage fleet and membership)
- Cluster operators (AKS operations)
- Application operators (namespace-scoped)
Billing requirements
- A valid payment method is required because AKS and any associated resources (node pools, load balancers, public IPs, log analytics) incur cost.
CLI/SDK/tools needed
- Azure CLI: https://learn.microsoft.com/cli/azure/install-azure-cli
- kubectl: https://kubernetes.io/docs/tasks/tools/
- Optional: helm if you standardize packaging (not required for Fleet Manager itself)
Azure CLI extensions:
- Fleet-related commands may require an Azure CLI extension (the name and installation steps can change).
- Verify with:
  - az extension list-available --output table | findstr /i fleet (Windows)
  - az extension list-available --output table | grep -i fleet (macOS/Linux)
- Also check current docs for the correct extension and minimum Azure CLI version.
Region availability
- Fleet Manager availability can be region- and feature-dependent.
- Verify supported regions and feature status in official docs: https://learn.microsoft.com/azure/kubernetes-fleet/
Quotas/limits
Plan for:
- AKS core quotas (vCPU per region, node limits)
- IP address capacity if using Azure CNI
- Potential fleet-specific limits (number of member clusters, placements, objects); verify in official docs
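As a quick sanity check before creating clusters, you can estimate the regional vCPU quota a lab will consume. The VM size and vCPU count below are assumptions; match them to your actual node SKU:

```shell
# Rough vCPU estimate for the lab in this guide: 2 clusters x 1 node each.
CLUSTERS=2
NODES_PER_CLUSTER=1
VCPUS_PER_NODE=2   # assumes a 2-vCPU node size; adjust for your SKU
TOTAL_VCPUS=$((CLUSTERS * NODES_PER_CLUSTER * VCPUS_PER_NODE))
echo "Estimated regional vCPUs needed: $TOTAL_VCPUS"
# Compare against your current usage and limits:
#   az vm list-usage --location "$LOCATION" --output table
```

If the estimate is close to your limit, request a quota increase before starting the tutorial rather than mid-way through cluster creation.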
Prerequisite services
- AKS clusters to manage (for this tutorial, you’ll create two small clusters)
- Optional but common:
- Azure Container Registry (ACR)
- Log Analytics workspace (Azure Monitor)
- Key Vault (secrets)
9. Pricing / Cost
Current pricing model (what to expect)
Azure Kubernetes Fleet Manager pricing is typically described as:
- Fleet management plane: often no separate line-item cost (verify in official pricing/docs).
- Underlying resources: you pay for what Fleet Manager uses or touches:
- AKS clusters (node pools/VMs, load balancers, disks)
- If a hub cluster is created/required for your chosen features, that hub’s AKS cost applies
- Log Analytics ingestion and retention (if using Azure Monitor / Container insights)
- Networking egress between regions (if clusters are cross-region and data moves)
Because Azure pricing and feature packaging can change, validate with:
- AKS pricing page: https://azure.microsoft.com/pricing/details/kubernetes-service/
- Azure Pricing Calculator: https://azure.microsoft.com/pricing/calculator/
- Fleet docs (pricing notes): https://learn.microsoft.com/azure/kubernetes-fleet/ (verify current guidance)
Pricing dimensions (cost drivers)
Even if the fleet control plane is “free,” real costs come from:
- Compute: VM sizes and node counts in member clusters (and hub if applicable)
- Networking:
  - Inter-region data transfer
  - Load balancer and public IP usage
  - NAT Gateway (if used)
- Observability:
  - Log Analytics data ingestion
  - Metrics retention
- Storage:
  - Managed disks for workloads
  - Container registry storage (ACR)
- Operational tooling: security scanning, policy tooling, backup tooling
Free tier
- AKS has a pricing model where the Kubernetes control plane is typically managed by Azure; charges are primarily for worker nodes and attached resources. The exact free/paid breakdown depends on AKS tier/feature choices—verify current AKS pricing details.
- Fleet Manager itself may not have a distinct free tier; instead it may be priced implicitly (or not billed separately). Verify in official docs.
Hidden or indirect costs
- Hub cluster cost (if hub mode requires an AKS cluster)
- Log volume explosion when rolling out agents or changing logging configurations across many clusters at once
- Data egress if you centrally aggregate logs across regions or replicate images cross-region
- Management overhead: more clusters = more upgrade testing, more policy review, more incident surface area
Network/data transfer implications
- Propagating Kubernetes objects is usually control-plane traffic (small), but cross-region operations and observability pipelines can create sustained data transfer.
- If your architecture pulls images from a registry in a different region, egress charges can appear.
How to optimize cost
- Start with hubless or minimal capability mode if possible (verify feature requirements).
- Use the smallest practical node sizes for non-production clusters.
- Centralize ACR per geography; use geo-replication if needed.
- Right-size Log Analytics:
- Set retention to what you need
- Filter noisy logs
- Use metrics where possible
- Use labels/selectors to avoid deploying heavy add-ons to every cluster by default.
Example low-cost starter estimate (no fabricated numbers)
A low-cost lab typically includes:
- Two small AKS clusters (single node pool, 1–2 nodes each)
- Minimal load balancers (or none if not exposing services)
- No Log Analytics (or minimal ingestion)
Exact cost depends on:
- Region
- VM size (e.g., B-series vs D-series)
- Node count and uptime
Use the calculator with:
- Kubernetes Service (AKS worker nodes)
- Virtual Machines
- Load Balancer
- Public IP
- Log Analytics
Example production cost considerations
In production, expect cost growth from:
- Multiple node pools (system + user + spot)
- Higher availability (multiple zones)
- Multiple clusters per region
- A hub cluster (if used)
- Centralized logging at scale
- Security tooling and backups
10. Step-by-Step Hands-On Tutorial
This lab builds a small, realistic multi-cluster setup on Azure and demonstrates hub-based multi-cluster placement concepts. The exact CRD API versions and CLI command groups can change—this tutorial includes verification steps so you can adapt safely.
Objective
Create two AKS clusters, register them into Azure Kubernetes Fleet Manager, connect to the fleet hub Kubernetes API, and propagate a sample namespace + deployment to selected member clusters.
Lab Overview
You will:
- Create an Azure resource group
- Create two small AKS clusters (aks-a, aks-b)
- Create an Azure Kubernetes Fleet Manager fleet (demo-fleet) with hub capability (if required for placement)
- Add both AKS clusters as fleet members
- Connect to the fleet hub using kubectl
- Discover fleet CRDs and apply a placement policy to deploy a sample app to both clusters
- Validate results by checking each member cluster
- Clean up everything
Cost note: If hub mode creates an additional AKS cluster (or similar billable resource), your cost increases. If you want the lowest-cost lab, verify whether you can run the features you need without hub mode, and prefer a single cluster lab for learning.
Step 1: Set variables and sign in
Actions
- Open a terminal (Cloud Shell is fine).
- Sign in and select your subscription.
az login
az account show --output table
az account set --subscription "<YOUR_SUBSCRIPTION_ID>"
- Set environment variables:
export LOCATION="eastus"
export RG="rg-fleet-lab"
export AKS_A="aks-a-fleetlab"
export AKS_B="aks-b-fleetlab"
export FLEET="demo-fleet"
Expected outcome
- Your Azure CLI context points to the correct subscription.
- Variables are set for repeatable commands.
Step 2: Create a resource group
az group create --name "$RG" --location "$LOCATION"
Expected outcome
- Resource group $RG exists in $LOCATION.
Verify:
az group show -n "$RG" --query "{name:name,location:location,provisioningState:properties.provisioningState}"
Step 3: Create two small AKS clusters
AKS cluster creation can take 5–15+ minutes per cluster. The exact flags you need may vary based on your org policies (private cluster, Azure CNI, etc.).
Actions
Create AKS cluster A:
az aks create \
--resource-group "$RG" \
--name "$AKS_A" \
--location "$LOCATION" \
--enable-managed-identity \
--node-count 1 \
--generate-ssh-keys
Create AKS cluster B:
az aks create \
--resource-group "$RG" \
--name "$AKS_B" \
--location "$LOCATION" \
--enable-managed-identity \
--node-count 1 \
--generate-ssh-keys
Expected outcome
- Two running AKS clusters.
Verify:
az aks list -g "$RG" --query "[].{name:name, location:location, provisioningState:provisioningState}" -o table
Step 4: Install/enable Azure CLI fleet commands (if required)
Fleet commands may require an extension. Because extension naming can change, follow a discover-first approach.
Actions
List available extensions containing “fleet”:
az extension list-available --output table | grep -i fleet || true
If documentation instructs a specific extension (for example, fleet), install it:
# Example only — verify the correct extension name in official docs
az extension add --name fleet
If the extension is already installed, update it:
# Example only — verify the correct extension name in official docs
az extension update --name fleet
Expected outcome
- You can run az fleet -h successfully.
Verify:
az fleet -h
If az fleet is not found:
- Check the official getting started guide for the current CLI requirement: https://learn.microsoft.com/azure/kubernetes-fleet/
Step 5: Create an Azure Kubernetes Fleet Manager fleet
Fleet creation can support multiple modes. For multi-cluster placement, you may need to enable a hub capability.
Actions
- Review help for the exact required flags:
az fleet create -h
- Create the fleet (example pattern):
# Example: create a fleet in a resource group/region
# Add hub-related flags if required by your scenario (verify in help/docs).
az fleet create \
--resource-group "$RG" \
--name "$FLEET" \
--location "$LOCATION"
If docs require hub enablement for placement, you may need something like --enable-hub (flag name may differ). Use the CLI help output to confirm.
Expected outcome
- A fleet resource exists.
Verify:
az fleet show -g "$RG" -n "$FLEET" --query "{name:name, location:location, provisioningState:provisioningState}" -o table
Step 6: Add both AKS clusters as fleet members
Actions
Get the AKS resource IDs:
AKS_A_ID=$(az aks show -g "$RG" -n "$AKS_A" --query id -o tsv)
AKS_B_ID=$(az aks show -g "$RG" -n "$AKS_B" --query id -o tsv)
echo "$AKS_A_ID"
echo "$AKS_B_ID"
Add cluster A as a member:
az fleet member create \
--resource-group "$RG" \
--fleet-name "$FLEET" \
--name "member-a" \
--member-cluster-id "$AKS_A_ID"
Add cluster B as a member:
az fleet member create \
--resource-group "$RG" \
--fleet-name "$FLEET" \
--name "member-b" \
--member-cluster-id "$AKS_B_ID"
Expected outcome
- Two fleet members are registered.
Verify:
az fleet member list -g "$RG" --fleet-name "$FLEET" -o table
Step 7: Connect to the fleet hub Kubernetes API (hub mode)
If your fleet supports hub access, you typically fetch kubeconfig credentials similar to AKS.
Actions
Check whether a hub exists and how to get credentials:
az fleet get-credentials -h
Then run:
az fleet get-credentials \
--resource-group "$RG" \
--name "$FLEET"
List contexts:
kubectl config get-contexts
kubectl config current-context
Expected outcome
- Your kubeconfig now includes a context for the fleet hub.
- `kubectl` can talk to the hub API.
Verify:
kubectl get nodes 2>/dev/null || true
kubectl cluster-info
If the hub is not implemented as a typical node-based AKS cluster, `kubectl get nodes` may not be meaningful. The safer verification is `kubectl cluster-info` plus the CRD discovery below.
Step 8: Discover fleet CRDs and member cluster objects
Because API group names/versions can change between previews and GA, discover what the hub currently exposes.
Actions
List CRDs related to fleet/placement:
kubectl get crds | grep -i -E "fleet|placement|member|cluster" || true
List API resources containing “fleet”:
kubectl api-resources | grep -i fleet || true
kubectl api-resources | grep -i placement || true
kubectl api-resources | grep -i member || true
Try listing member clusters (resource name can vary; commonly something like memberclusters):
kubectl get memberclusters -A 2>/dev/null || true
If that fails, locate the exact resource name from kubectl api-resources output and use it.
Expected outcome
- You can identify:
- The member cluster resource type
- The placement resource type (for example, something like `clusterresourceplacements`)
Step 9: Create a sample workload on the hub and a placement policy
This step demonstrates the workflow pattern:
- Create Kubernetes objects (namespace + deployment) in the hub.
- Create a placement object that selects the target clusters.
- The system propagates objects to member clusters.
Actions
Create a namespace and deployment manifest:
cat > fleet-demo-workload.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: fleet-demo
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: fleet-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:stable
          ports:
            - containerPort: 80
EOF
Apply it to the hub:
kubectl apply -f fleet-demo-workload.yaml
Now create a placement policy.
Because the exact API version and kind can vary, do this safely:
- Identify the placement kind:
kubectl api-resources | grep -i -E "clusterresourceplacement|placement"
- Inspect a sample schema (if supported):
# Replace <resource> with what you found, e.g. clusterresourceplacements
kubectl explain clusterresourceplacement --recursive 2>/dev/null || true
If your hub supports a ClusterResourcePlacement kind, the manifest often follows this conceptual structure:
- Select clusters by name or by labels
- Select resources to propagate (namespace-scoped and/or cluster-scoped)
- Observe status fields
Create a placement manifest template and adjust apiVersion/kind/fields to match what kubectl explain shows in your environment:
cat > fleet-demo-placement.yaml <<'EOF'
# VERIFY apiVersion/kind/fields using:
#   kubectl explain <kind> --recursive
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: place-fleet-demo
spec:
  # The exact selector structure depends on the CRD version.
  # Common intent: target all member clusters.
  clusterSelector: {}
  # The exact "resourceSelectors" shape may vary.
  resourceSelectors:
    - group: ""
      version: "v1"
      kind: Namespace
      name: fleet-demo
    - group: "apps"
      version: "v1"
      kind: Deployment
      namespace: fleet-demo
      name: nginx
EOF
Apply:
kubectl apply -f fleet-demo-placement.yaml
Expected outcome
- The hub accepts the workload objects.
- The hub accepts the placement object.
- Placement status begins reflecting scheduling/propagation to member clusters.
Verify placement status (resource name may differ):
kubectl get clusterresourceplacements 2>/dev/null || true
kubectl describe clusterresourceplacement place-fleet-demo 2>/dev/null || true
If resource names differ, use what kubectl api-resources reported.
Step 10: Validate on each member cluster
Now verify that the namespace and deployment exist on aks-a and aks-b.
Actions
Get kubeconfig contexts for each AKS cluster:
az aks get-credentials -g "$RG" -n "$AKS_A" --overwrite-existing
az aks get-credentials -g "$RG" -n "$AKS_B" --overwrite-existing
kubectl config get-contexts
Switch to cluster A context (name varies; pick it from the output):
kubectl config use-context "$AKS_A"
kubectl get ns | grep fleet-demo
kubectl -n fleet-demo get deploy nginx
kubectl -n fleet-demo get pods -l app=nginx -o wide
Switch to cluster B:
kubectl config use-context "$AKS_B"
kubectl get ns | grep fleet-demo
kubectl -n fleet-demo get deploy nginx
kubectl -n fleet-demo get pods -l app=nginx -o wide
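The per-cluster checks above can be wrapped in a small helper so both members are validated identically. A hedged sketch, assuming the kubeconfig context names match the AKS cluster names in `$AKS_A`/`$AKS_B` (adjust to what `kubectl config get-contexts` actually shows):

```shell
#!/usr/bin/env bash
# Hedged sketch: verify the propagated fleet-demo workload on a member cluster.
# Context names are an assumption; adjust to your kubeconfig.
validate_member() {
  local ctx="$1"
  echo "== ${ctx} =="
  kubectl --context "$ctx" get ns fleet-demo
  kubectl --context "$ctx" -n fleet-demo get deploy nginx
  kubectl --context "$ctx" -n fleet-demo get pods -l app=nginx -o wide
}

# Run against both members once credentials are merged:
# for ctx in "$AKS_A" "$AKS_B"; do validate_member "$ctx"; done
```

Running the loop after `az aks get-credentials` for both clusters gives you one consistent pass/fail view per member.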
Expected outcome
- The `fleet-demo` namespace exists on both clusters.
- The `nginx` deployment exists and a pod is Running (or briefly Pending during image pull).
Validation
Use this checklist:
- Azure:
  - `az fleet show` returns provisioningState `Succeeded` (or equivalent)
  - `az fleet member list` shows both members registered
- Hub:
  - `kubectl get crds` shows fleet/placement CRDs
  - `kubectl get <placement>` shows the placement object with status progressing to placed/applied
- Member clusters:
  - Namespace and deployment appear and pods are running
Troubleshooting
Problem: az fleet command not found
- Install/update the correct Azure CLI extension.
- Verify the official docs for current CLI guidance: https://learn.microsoft.com/azure/kubernetes-fleet/
Problem: Membership creation fails with authorization errors
- Ensure you have Azure RBAC permissions on:
- Fleet resource group
- AKS cluster resources
- If your org uses Azure Policy, creation of identities/role assignments may be blocked.
Problem: az fleet get-credentials fails
- Hub might not be enabled or might require additional flags at creation time.
- Verify whether your fleet was created with hub capability (and how the hub is exposed).
Problem: Placement CRD apiVersion mismatch
- Don’t guess versions.
- Use:
  - `kubectl api-resources | grep -i placement`
  - `kubectl explain <kind> --recursive`
- Adjust `apiVersion` and `spec` fields accordingly.
Problem: Objects appear on one cluster but not the other
- Check placement status on the hub: `kubectl describe <placement> <name>`
- Confirm member cluster labels/selectors if your placement uses label selection.
- Ensure both member clusters are healthy and reachable by the fleet controllers/agents.
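When label selection is the suspect, it helps to see the intended shape of a label-based placement. A hedged sketch, assuming a ClusterResourcePlacement-style CRD; the apiVersion, kinds, and field names must be verified against `kubectl explain` output in your hub:

```yaml
# Hedged sketch: target only member clusters labeled env=prod.
# Verify fields with: kubectl explain clusterresourceplacement --recursive
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: place-fleet-demo-prod   # hypothetical name
spec:
  policy:
    placementType: PickAll
    affinity:
      clusterAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                  env: prod
  resourceSelectors:
    - group: ""
      version: v1
      kind: Namespace
      name: fleet-demo
```

If a cluster lacks the `env=prod` label, it will not be selected, which looks exactly like "objects appear on one cluster but not the other."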
Problem: Pods stuck in Pending
- Likely node capacity constraints (1-node clusters are tight).
- Check:
  - `kubectl describe pod <pod>`
  - `kubectl get events -n fleet-demo --sort-by=.lastTimestamp`
Cleanup
To avoid ongoing charges, delete resources.
- Delete placement and hub objects (optional but cleaner):
# Run against hub context
kubectl delete -f fleet-demo-placement.yaml 2>/dev/null || true
kubectl delete -f fleet-demo-workload.yaml 2>/dev/null || true
- Delete fleet members and fleet (optional; if you delete the RG this is not required):
# Optional explicit deletion (commands may vary)
az fleet member delete -g "$RG" --fleet-name "$FLEET" -n member-a --yes || true
az fleet member delete -g "$RG" --fleet-name "$FLEET" -n member-b --yes || true
az fleet delete -g "$RG" -n "$FLEET" --yes || true
- Delete the resource group (recommended for full cleanup):
az group delete --name "$RG" --yes --no-wait
Expected outcome:
- All resources in $RG are deleted, stopping compute charges.
11. Best Practices
Architecture best practices
- Use separate clusters when you need hard isolation boundaries (compliance, tenancy, blast radius); namespaces alone are not a substitute for those boundaries.
- Standardize cluster "profiles" (e.g., `prod-standard`, `prod-internet`, `gpu`) and label clusters accordingly for clean placement rules.
- Keep a clear separation between:
- Platform baseline resources (managed by platform team)
- Application resources (managed by app teams)
IAM/security best practices
- Use least privilege:
- Separate roles for fleet administration vs cluster administration vs application deployment.
- Use Entra ID-backed access where possible.
- Treat hub access as highly privileged and secure it like a production admin endpoint.
Cost best practices
- Don’t enable hub mode or additional features unless required.
- In early phases, use fewer clusters and smaller node pools; scale out once workflows are proven.
- Watch Log Analytics ingestion when rolling out agents/fleet-propagated telemetry changes.
Performance best practices
- Use selectors to avoid pushing heavy add-ons everywhere.
- Keep placement objects focused (smaller blast radius, easier debugging).
Reliability best practices
- Use progressive rollouts:
- Start with dev clusters, then staging, then production clusters.
- Maintain environment parity but allow for regional differences (instance types, zones) through controlled overlays (not ad hoc edits).
Operations best practices
- Maintain a runbook:
- How to add a cluster to the fleet
- How to label clusters
- How to troubleshoot failed placement
- Use consistent observability:
- Same metrics/logging agents across member clusters
- Tag resources: `env`, `owner`, `costCenter`, `service`, `dataClassification`
Governance/tagging/naming best practices
- Adopt naming standards:
  - Fleet: `fleet-<org>-<platform>-<scope>`
  - Members: `member-<region>-<env>-<cluster>`
- Standardize labels used for selection:
  - `env=dev|stage|prod`
  - `region=eastus|westeurope`
  - `tier=frontend|backend|platform`
  - `exposure=internal|internet`
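The naming convention above composes cleanly with shell interpolation, which keeps scripts and IaC consistent. A minimal sketch; all of the values below are illustrative assumptions, not service requirements:

```shell
#!/usr/bin/env bash
# Hypothetical naming helper following the fleet/member conventions above.
org="contoso"; platform="aks"; scope="prod"     # assumed org values
region="eastus"; env="prod"; cluster="web"      # assumed cluster values

fleet_name="fleet-${org}-${platform}-${scope}"
member_name="member-${region}-${env}-${cluster}"

echo "$fleet_name"    # fleet-contoso-aks-prod
echo "$member_name"   # member-eastus-prod-web
```

Generating names from one set of variables prevents the drift that appears when each script hand-writes its own strings.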
12. Security Considerations
Identity and access model
- Azure RBAC controls who can create/manage fleets and membership.
- Kubernetes RBAC controls what identities can do on hub and member clusters.
- Prefer Entra-integrated authentication to AKS APIs where feasible and align roles with job functions.
Encryption
- AKS provides encryption at rest for managed disks; etcd encryption and customer-managed keys depend on AKS features—verify AKS security options.
- For hub components, confirm how data is stored and whether you can use CMK—verify in official docs.
Network exposure
- Treat the hub endpoint as sensitive:
- Restrict access using private networking where supported
- Use conditional access and MFA for admin identities
- For private AKS clusters:
- Ensure required control-plane and outbound endpoints are reachable for fleet operations.
Secrets handling
- Avoid propagating raw Kubernetes `Secret` objects across clusters unless you have a very clear security model.
- Prefer:
- Azure Key Vault + CSI driver
- External Secrets Operator patterns (evaluate governance carefully)
- If you must distribute secrets, ensure:
- Encryption in transit
- Least-privilege access
- Strong audit logging
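To keep secrets out of propagated manifests, the Key Vault + CSI driver pattern mounts them at pod start instead. A hedged sketch, assuming the AKS Key Vault provider add-on is enabled; the vault name, tenant ID, client ID, and object name are placeholders you must supply:

```yaml
# Hedged example: pull one secret from Azure Key Vault via the
# Secrets Store CSI driver (azure provider). All names are placeholders.
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: app-kv-secrets          # hypothetical name
  namespace: fleet-demo
spec:
  provider: azure
  parameters:
    keyvaultName: "<your-key-vault-name>"
    tenantId: "<your-entra-tenant-id>"
    clientID: "<workload-identity-client-id>"   # if using workload identity
    objects: |
      array:
        - |
          objectName: app-api-key
          objectType: secret
```

Pods reference this class through a CSI volume, so only the non-sensitive class definition (not the secret material) ever needs to travel with your manifests.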
Audit/logging
- Enable and centralize:
- Azure Activity Logs for fleet operations
- AKS audit logs (if enabled in your configuration)
- Kubernetes events and controller logs for troubleshooting placement
Compliance considerations
- Multi-region fleets can implicate data residency requirements (logs, metadata).
- Ensure that placement doesn’t unintentionally push resources into restricted regions/environments.
- Use policy-as-code and approvals for changes that affect production clusters.
Common security mistakes
- Giving broad “cluster-admin everywhere” to too many humans
- Allowing app teams to modify baseline resources that the platform team expects to own
- Mixing multiple deployment controllers (GitOps + fleet placement + Helm runbooks) with no ownership boundaries
- Distributing secrets through manifests instead of external secret stores
Secure deployment recommendations
- Start with non-sensitive, non-critical objects (namespaces, configmaps, service accounts) and build confidence.
- Use layered controls:
- Azure RBAC + Kubernetes RBAC + policy guardrails
- Implement change control for fleet-level operations (pull requests, approvals, audit).
13. Limitations and Gotchas
Because this service evolves, validate current limits and behavior in official docs. Common gotchas in multi-cluster management include:
Known limitations (verify current list)
- Feature availability may be:
- Region-limited
- Preview-limited
- Limited to certain cluster configurations (private clusters, network plugin choices)
- Not all resource types and edge cases propagate cleanly (especially cluster-scoped objects).
Quotas
- Maximum number of member clusters per fleet (verify)
- Limits on placement objects or object sizes (verify)
- AKS quotas (vCPU, nodes, load balancers) still apply
Regional constraints
- Hub availability might be region-bound.
- Cross-region propagation is usually supported logically but can introduce latency and governance complexity.
Pricing surprises
- If hub mode deploys an extra AKS cluster, that cluster’s compute and networking costs can dominate.
- Centralized logging across many clusters can be expensive quickly.
Compatibility issues
- If you also use GitOps (Flux/Argo), avoid managing the same Kubernetes object from two controllers.
- Differences in Kubernetes versions and enabled admission policies can cause objects to apply in one cluster but fail in another.
Operational gotchas
- Label drift: if cluster labels drive selection, changing labels can cause unexpected rollout or removal.
- Partial failure modes: one cluster might be down or blocked by policy, leading to inconsistent state unless you monitor placement status and reconcile.
Migration challenges
- Moving from existing GitOps “app-of-apps” to fleet placement requires:
- Ownership model redesign
- CI/CD changes
- Reworking overlays (kustomize/helm) into a model that aligns with placement and per-cluster differences
Vendor-specific nuances
- Azure RBAC and identity integration can be powerful but also complex—especially in locked-down enterprise subscriptions.
14. Comparison with Alternatives
Azure Kubernetes Fleet Manager is one approach to multi-cluster management. Alternatives include Azure-native and third-party/open-source tools.
| Option | Best For | Strengths | Weaknesses | When to Choose |
|---|---|---|---|---|
| Azure Kubernetes Fleet Manager | Multi-AKS fleet management with Azure integration | Azure-native, fleet abstraction, hub-based placement patterns, integrates with Azure RBAC | Feature maturity/availability can vary; may require hub resources; learning curve for CRDs | You’re standardizing on AKS and want Azure-supported fleet workflows |
| Azure Arc-enabled Kubernetes + GitOps | Managing heterogeneous clusters (on-prem, other clouds) | Strong hybrid story; GitOps at scale; Azure governance | Different model than fleet placement; Arc introduces agents and its own ops overhead | You must manage non-AKS clusters and want Azure governance + GitOps |
| AKS + GitOps (Flux/Argo CD) without fleet | Multi-cluster deployments via repo structure | Mature OSS workflows; portable; clear desired state | You build/maintain multi-cluster patterns yourself; consistency depends on repo discipline | You prefer pure GitOps and already have patterns for multi-cluster |
| Rancher | Central management for many clusters across environments | Mature UI; multi-cluster governance | Additional platform to run/manage | You already use Rancher or need its ecosystem features |
| Google GKE Fleet / Anthos | Google Cloud-centric multi-cluster management | Strong multi-cluster story in GCP | Different cloud; migration complexity | You’re on GCP and want first-class fleet features there |
| Open Cluster Management / Karmada | Open-source multi-cluster control planes | Powerful; portable; extensible | You operate/control plane yourself; higher complexity | You need cloud-agnostic multi-cluster scheduling and accept ops burden |
| Cluster API (CAPI) | Cluster lifecycle management at scale | Declarative cluster management | Not a full app placement solution by itself | You need consistent cluster provisioning across many environments |
15. Real-World Example
Enterprise example (regulated industry)
Problem: A financial services company runs 20+ AKS clusters across regions and environments. They need consistent baseline namespaces, network policies, and observability agents, and they must roll out changes progressively with auditability.
Proposed architecture
- Multiple AKS clusters per region (prod/stage/dev)
- Azure Kubernetes Fleet Manager fleet:
  - All clusters enrolled as members
  - Placement rules based on labels: env, region, exposure
- Azure Monitor and Log Analytics per environment (separate workspaces)
- Key Vault for secrets with the CSI driver
Why Azure Kubernetes Fleet Manager was chosen
- Azure-native integration with RBAC and subscription governance
- Fleet abstraction for membership and consistent baseline propagation
- Reduced "script sprawl" and manual cluster-by-cluster operations
Expected outcomes
- Faster baseline rollouts with fewer inconsistencies
- Clearer audit trail for platform changes
- Reduced risk via environment-based progressive deployment
Startup/small-team example (SaaS)
Problem: A startup runs three AKS clusters: one for staging and two for production (in two regions). They want to keep the clusters intentionally separate but don't want to manually maintain identical platform components.
Proposed architecture
- Fleet with 3 member clusters
- Labels:
  - `env=stage|prod`
  - `region=eastus|westus`
- Fleet placement used for:
  - Namespaces
  - Shared configmaps
  - Standard service accounts
- GitOps for application workloads (separate pipeline), with strict ownership boundaries
Why Azure Kubernetes Fleet Manager was chosen
- Keeps multi-cluster baseline management simple and Azure-aligned
- Minimizes the need for custom tooling as cluster count grows
Expected outcomes
- Fewer production drift issues
- Faster onboarding of new clusters as the startup grows
- Cleaner separation of platform vs application changes
16. FAQ
- Is Azure Kubernetes Fleet Manager the same as AKS? No. AKS runs Kubernetes clusters; Azure Kubernetes Fleet Manager manages multiple clusters as a fleet.
- Do I need more than one cluster to use it? Practically, yes. It provides the most value when you have multiple AKS clusters.
- Does it replace GitOps tools like Flux or Argo CD? Not necessarily. Many teams use Fleet Manager for baseline/platform distribution and GitOps for application delivery, but you must define ownership to avoid conflicts.
- Does it require a "hub" cluster? Some multi-cluster placement capabilities commonly use a hub API surface. Whether this is mandatory depends on the feature set you enable; verify in official docs.
- Does hub mode add cost? If hub mode uses an AKS cluster or other billable infrastructure, yes. Confirm the hub implementation and costs for your configuration.
- Can I manage clusters across different Azure regions? Usually yes in concept, but availability and constraints may apply. Verify region support and cross-region behavior.
- Can I manage clusters across subscriptions? Possibly, but it depends on supported scenarios and RBAC. Verify official docs for cross-subscription membership.
- Can I manage on-prem or other-cloud clusters? Fleet Manager is primarily positioned around AKS fleet scenarios. For heterogeneous environments, Azure Arc-enabled Kubernetes is often evaluated. Verify the supported cluster types.
- How does it select which clusters receive a resource? Typically via labels/selectors on member clusters and placement policies defined on the hub. Exact fields depend on CRD versions.
- What kinds of Kubernetes objects should I propagate? Start with low-risk objects: namespaces, configmaps, service accounts, baseline policies. Be cautious with cluster-scoped resources and secrets.
- What happens if one cluster is unhealthy during propagation? You'll typically see a partial rollout, with status indicating failure for that cluster. Monitor placement status and build operational procedures for remediation.
- How do I prevent developers from modifying fleet-managed resources? Use Kubernetes RBAC and admission controls to prevent modifications in platform-owned namespaces, and clearly document ownership.
- Is Fleet Manager in the data path of my applications? No. It's a management/control-plane mechanism. Application traffic goes through your normal service endpoints (ingress/LB/service mesh).
- How do I monitor fleet operations? Use the Azure Activity Log for ARM operations, plus Kubernetes logs/events/status in hub and member clusters. Also use Azure Monitor/Container insights for cluster telemetry.
- What's the safest way to start in production? Start with a small subset of clusters (dev), propagate only baseline non-sensitive resources, implement rollback strategies, and require pull-request approvals for fleet-level changes.
17. Top Online Resources to Learn Azure Kubernetes Fleet Manager
| Resource Type | Name | Why It Is Useful |
|---|---|---|
| Official documentation | Azure Kubernetes Fleet Manager docs — https://learn.microsoft.com/azure/kubernetes-fleet/ | Primary source for capabilities, supported regions, and current APIs |
| Official overview | Overview page (within docs) — https://learn.microsoft.com/azure/kubernetes-fleet/ | Explains concepts like fleet, members, and hub/placement models |
| Official pricing | AKS pricing (cost baseline for clusters) — https://azure.microsoft.com/pricing/details/kubernetes-service/ | Fleet cost often maps to underlying AKS and dependent services |
| Pricing calculator | Azure Pricing Calculator — https://azure.microsoft.com/pricing/calculator/ | Build region-specific estimates for AKS nodes, networking, logging |
| Azure CLI | Install Azure CLI — https://learn.microsoft.com/cli/azure/install-azure-cli | Required for scripting fleet and AKS setup |
| Kubernetes tooling | kubectl install — https://kubernetes.io/docs/tasks/tools/ | Needed to interact with hub APIs and validate member clusters |
| Architecture guidance | Azure Architecture Center — https://learn.microsoft.com/azure/architecture/ | Reference patterns for AKS, multi-region design, governance |
| AKS docs | AKS documentation — https://learn.microsoft.com/azure/aks/ | Core operational guidance for cluster security, networking, upgrades |
| GitHub (highly relevant) | Azure Fleet OSS (if referenced by docs) — https://github.com/Azure/fleet | Implementation details, samples, CRDs (use with version caution) |
| Videos | Microsoft Azure YouTube — https://www.youtube.com/@MicrosoftAzure | Conference talks and feature walkthroughs (search for Fleet Manager) |
18. Training and Certification Providers
| Institute | Suitable Audience | Likely Learning Focus | Mode | Website |
|---|---|---|---|---|
| DevOpsSchool.com | Beginners to advanced DevOps/SRE/platform engineers | AKS, Kubernetes operations, DevOps pipelines, platform engineering basics | Check website | https://www.devopsschool.com/ |
| ScmGalaxy.com | DevOps learners and managers | DevOps process, CI/CD, SCM, release management | Check website | https://www.scmgalaxy.com/ |
| CLoudOpsNow.in | Cloud/ops engineers | Cloud operations, monitoring, reliability practices | Check website | https://www.cloudopsnow.in/ |
| SreSchool.com | SREs, operations teams, platform engineers | SRE practices, incident response, observability, reliability engineering | Check website | https://www.sreschool.com/ |
| AiOpsSchool.com | Ops teams exploring AIOps | AIOps concepts, automation, monitoring analytics | Check website | https://www.aiopsschool.com/ |
19. Top Trainers
| Platform/Site | Likely Specialization | Suitable Audience | Website |
|---|---|---|---|
| RajeshKumar.xyz | DevOps/Kubernetes/cloud coaching (verify offerings) | Individuals seeking guided training | https://www.rajeshkumar.xyz/ |
| devopstrainer.in | DevOps training (verify course catalog) | Beginners to intermediate DevOps learners | https://www.devopstrainer.in/ |
| devopsfreelancer.com | Freelance DevOps help/training (verify services) | Teams needing short-term expertise | https://www.devopsfreelancer.com/ |
| devopssupport.in | DevOps support and training resources (verify scope) | Ops teams needing troubleshooting support | https://www.devopssupport.in/ |
20. Top Consulting Companies
| Company | Likely Service Area | Where They May Help | Consulting Use Case Examples | Website |
|---|---|---|---|---|
| cotocus.com | Cloud/DevOps consulting (verify offerings) | Architecture, CI/CD, Kubernetes adoption | AKS platform setup, multi-cluster operating model design | https://cotocus.com/ |
| DevOpsSchool.com | DevOps/Kubernetes consulting and training | Delivery acceleration, platform engineering enablement | Fleet onboarding patterns, governance/RBAC models, rollout practices | https://www.devopsschool.com/ |
| DEVOPSCONSULTING.IN | DevOps consulting (verify offerings) | DevOps transformation and automation | AKS landing zone, observability rollout, operational runbooks | https://www.devopsconsulting.in/ |
21. Career and Learning Roadmap
What to learn before this service
- Kubernetes fundamentals:
- Pods, Deployments, Services, Ingress
- Namespaces, RBAC, ConfigMaps/Secrets
- AKS fundamentals:
- Node pools, networking basics, identity integration
- Cluster upgrade strategy and workload disruption handling
- Infrastructure-as-code basics:
- Bicep/Terraform concepts (optional but very helpful)
- Basic security:
- Least privilege, network segmentation, secret management patterns
What to learn after this service
- Multi-cluster delivery patterns:
- GitOps at scale, environment promotion strategies
- Governance and policy:
- Azure Policy, Gatekeeper/Kyverno (depending on your standards)
- Observability at scale:
- Azure Monitor, OpenTelemetry, Prometheus/Grafana patterns
- Reliability engineering:
- SLOs, error budgets, incident response, chaos testing
- Multi-region architecture:
- Traffic management, failover, data replication patterns
Job roles that use it
- Platform Engineer
- Kubernetes Administrator / AKS Engineer
- DevOps Engineer
- Site Reliability Engineer (SRE)
- Cloud Solutions Architect
- Security Engineer (Kubernetes governance)
Certification path (Azure)
A practical Azure-aligned path often includes:
- Azure Fundamentals (AZ-900)
- Azure Administrator (AZ-104)
- Azure DevOps Engineer Expert (AZ-400)
- Kubernetes-specific certifications (CKA/CKAD) for Kubernetes depth
(Choose based on your role; Fleet Manager work benefits from both Azure and Kubernetes credibility.)
Project ideas for practice
- Build a 3-cluster fleet (dev/stage/prod) and propagate baseline namespaces + RBAC
- Implement progressive rollout using cluster labels and staged placements
- Create a "cluster onboarding" automation that:
  - Creates an AKS cluster
  - Registers it into the fleet
  - Applies baseline placement
- Implement policy guardrails and prove that non-compliant deployments are blocked
- Centralize logging and create a fleet-wide dashboard per cluster label (env/region)
22. Glossary
- AKS (Azure Kubernetes Service): Azure managed Kubernetes service for running clusters.
- Fleet: An Azure resource representing a group of Kubernetes clusters managed together.
- Fleet member: A Kubernetes cluster registered into a fleet.
- Hub (fleet hub): A Kubernetes API endpoint/control plane used to define placements and manage multi-cluster propagation (capability-dependent).
- Placement / Propagation: The mechanism for applying Kubernetes objects to selected member clusters.
- Selector: A rule (often label-based) used to choose which clusters receive a resource.
- RBAC: Role-Based Access Control, used both in Azure and Kubernetes.
- Microsoft Entra ID: Azure’s identity provider (formerly Azure Active Directory).
- Control plane: Management components (APIs/controllers) that coordinate Kubernetes resources.
- Data plane: Where application workloads run (pods and nodes).
- Drift: Configuration differences between clusters that should be consistent.
- Blast radius: The scope of impact when something fails (smaller is safer).
- GitOps: Managing infrastructure and apps using Git as the source of truth with automated reconciliation.
23. Summary
Azure Kubernetes Fleet Manager (Azure, Containers) is a service for managing multiple Kubernetes clusters—most commonly multiple AKS clusters—as a single fleet. It helps platform and operations teams reduce the multi-cluster tax by providing fleet membership management and, in hub-based scenarios, Kubernetes-native placement/propagation workflows to distribute resources across selected clusters.
Cost-wise, you should plan primarily for AKS worker node costs, plus any additional infrastructure required by hub-based capabilities and observability (Log Analytics). Security-wise, treat fleet and hub access as highly privileged: use least-privilege Azure RBAC and Kubernetes RBAC, keep secrets in Key Vault rather than distributing raw Secrets, and implement clear ownership boundaries to avoid controller conflicts.
Use Azure Kubernetes Fleet Manager when you have (or will have) multiple AKS clusters and need consistent, scalable operations across them. Next, deepen your skills by pairing fleet concepts with a strong AKS security baseline and a multi-cluster delivery model (often GitOps plus clear platform/app separation), using the official docs as your source of truth: https://learn.microsoft.com/azure/kubernetes-fleet/