❌ What’s NOT Available in EKS
- Kubernetes Scheduler Configuration: EKS doesn’t allow modification of KubeSchedulerConfiguration to set custom default topology spread constraints
- Built-in Cluster Defaults: No native EKS setting to enforce multi-AZ distribution automatically
✅ Available Solutions (Ranked by Effectiveness)
Solution 1: Mutating Admission Webhook (Recommended)
This automatically injects topology spread constraints into all deployments at creation time.
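Before diving into the manifests, it helps to see the shape of what such a webhook returns: an AdmissionReview response carrying a base64-encoded JSON patch. The Python sketch below is purely illustrative (the function names and the `{"app": "web"}` selector are hypothetical, not part of any real webhook):

```python
import base64
import json

def build_patch(match_labels: dict) -> list:
    """Build the JSON patch that injects a zone spread constraint
    into a Deployment that has none."""
    constraints = [{
        "maxSkew": 1,
        "topologyKey": "topology.kubernetes.io/zone",
        "whenUnsatisfiable": "DoNotSchedule",
        "labelSelector": {"matchLabels": match_labels},
    }]
    return [{
        "op": "add",
        "path": "/spec/template/spec/topologySpreadConstraints",
        "value": constraints,
    }]

def admission_response(uid: str, match_labels: dict) -> dict:
    """AdmissionReview response body; the API server expects the
    JSONPatch to be base64-encoded in the `patch` field."""
    patch = json.dumps(build_patch(match_labels)).encode()
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": uid,
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(patch).decode(),
        },
    }

print(admission_response("1234", {"app": "web"})["response"]["patchType"])
```

The Go server in section C below produces exactly this kind of response; Go's `json.Marshal` base64-encodes the `[]byte` patch field automatically.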
A. Deploy the Mutating Webhook
```yaml
# mutating-webhook-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: multiaz-webhook
  namespace: kube-system
  labels:
    app: multiaz-webhook
spec:
  replicas: 2
  selector:
    matchLabels:
      app: multiaz-webhook
  template:
    metadata:
      labels:
        app: multiaz-webhook
    spec:
      serviceAccountName: multiaz-webhook
      containers:
        - name: webhook
          image: your-registry/multiaz-webhook:latest
          ports:
            - containerPort: 8443
          env:
            - name: TLS_CERT_FILE
              value: /etc/certs/tls.crt
            - name: TLS_PRIVATE_KEY_FILE
              value: /etc/certs/tls.key
          volumeMounts:
            - name: certs
              mountPath: /etc/certs
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 256Mi
      volumes:
        - name: certs
          secret:
            secretName: multiaz-webhook-certs
---
apiVersion: v1
kind: Service
metadata:
  name: multiaz-webhook-service
  namespace: kube-system
spec:
  selector:
    app: multiaz-webhook
  ports:
    - port: 443
      targetPort: 8443
      protocol: TCP
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: multiaz-webhook
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: multiaz-webhook
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: multiaz-webhook
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: multiaz-webhook
subjects:
  - kind: ServiceAccount
    name: multiaz-webhook
    namespace: kube-system
```
B. Webhook Configuration
```yaml
# mutating-webhook-config.yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: multiaz-enforcer
webhooks:
  - name: multiaz.enforcer.eks.aws
    clientConfig:
      service:
        name: multiaz-webhook-service
        namespace: kube-system
        path: "/mutate"
      # caBundle: <base64-encoded CA certificate> (required in practice)
    rules:
      - operations: ["CREATE", "UPDATE"]
        apiGroups: ["apps"]
        apiVersions: ["v1"]
        resources: ["deployments"]
      - operations: ["CREATE", "UPDATE"]
        apiGroups: ["apps"]
        apiVersions: ["v1"]
        resources: ["replicasets"]
    namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: NotIn
          values: ["kube-system", "kube-public", "kube-node-lease"]
    objectSelector:
      matchExpressions:
        - key: multiaz.enforcer.eks.aws/skip
          operator: DoesNotExist
    admissionReviewVersions: ["v1", "v1beta1"]
    sideEffects: None
    failurePolicy: Fail
```

Note that the kind is MutatingWebhookConfiguration (not MutatingAdmissionWebhook), and the namespaceSelector keys on the well-known `kubernetes.io/metadata.name` label, which the control plane sets on every namespace automatically.
C. Sample Webhook Code (Go)
```go
// webhook-server.go
package main

import (
	"encoding/json"
	"io"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type WebhookServer struct {
	server *http.Server
}

func (ws *WebhookServer) mutate(w http.ResponseWriter, r *http.Request) {
	var body []byte
	if r.Body != nil {
		if data, err := io.ReadAll(r.Body); err == nil {
			body = data
		}
	}

	var admissionResponse *admissionv1.AdmissionResponse
	ar := admissionv1.AdmissionReview{}
	if err := json.Unmarshal(body, &ar); err != nil {
		admissionResponse = &admissionv1.AdmissionResponse{
			Result: &metav1.Status{Message: err.Error()},
		}
	} else {
		admissionResponse = ws.mutateDeployment(&ar)
	}

	// v1 AdmissionReview responses must echo apiVersion/kind and the request UID
	admissionReview := admissionv1.AdmissionReview{
		TypeMeta: metav1.TypeMeta{
			APIVersion: "admission.k8s.io/v1",
			Kind:       "AdmissionReview",
		},
	}
	if admissionResponse != nil {
		admissionReview.Response = admissionResponse
		if ar.Request != nil {
			admissionReview.Response.UID = ar.Request.UID
		}
	}

	respBytes, _ := json.Marshal(admissionReview)
	w.Header().Set("Content-Type", "application/json")
	w.Write(respBytes)
}

func (ws *WebhookServer) mutateDeployment(ar *admissionv1.AdmissionReview) *admissionv1.AdmissionResponse {
	req := ar.Request
	var deployment appsv1.Deployment
	if err := json.Unmarshal(req.Object.Raw, &deployment); err != nil {
		return &admissionv1.AdmissionResponse{
			Result: &metav1.Status{Message: err.Error()},
		}
	}

	// Leave deployments alone if they already declare their own constraints
	if hasTopologySpreadConstraints(&deployment) {
		return &admissionv1.AdmissionResponse{Allowed: true}
	}

	// Inject multi-AZ topology spread constraints via JSON patch
	patches := createMultiAZPatches(&deployment)
	patchBytes, _ := json.Marshal(patches)

	patchType := admissionv1.PatchTypeJSONPatch
	return &admissionv1.AdmissionResponse{
		Allowed:   true,
		Patch:     patchBytes,
		PatchType: &patchType,
	}
}

func hasTopologySpreadConstraints(deployment *appsv1.Deployment) bool {
	return len(deployment.Spec.Template.Spec.TopologySpreadConstraints) > 0
}

func createMultiAZPatches(deployment *appsv1.Deployment) []map[string]interface{} {
	minDomains := int32(3)
	topologyConstraints := []corev1.TopologySpreadConstraint{
		{
			MaxSkew:           1,
			TopologyKey:       "topology.kubernetes.io/zone",
			WhenUnsatisfiable: corev1.DoNotSchedule,
			MinDomains:        &minDomains,
			LabelSelector: &metav1.LabelSelector{
				MatchLabels: deployment.Spec.Selector.MatchLabels,
			},
		},
		{
			MaxSkew:           2,
			TopologyKey:       "kubernetes.io/hostname",
			WhenUnsatisfiable: corev1.ScheduleAnyway,
			LabelSelector: &metav1.LabelSelector{
				MatchLabels: deployment.Spec.Selector.MatchLabels,
			},
		},
	}

	return []map[string]interface{}{
		{
			"op":    "add",
			"path":  "/spec/template/spec/topologySpreadConstraints",
			"value": topologyConstraints,
		},
	}
}

func main() {
	ws := &WebhookServer{}
	mux := http.NewServeMux()
	mux.HandleFunc("/mutate", ws.mutate)
	ws.server = &http.Server{Addr: ":8443", Handler: mux}
	// Paths match the certificate secret mounted in the Deployment above
	if err := ws.server.ListenAndServeTLS("/etc/certs/tls.crt", "/etc/certs/tls.key"); err != nil {
		panic(err)
	}
}
```
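To build intuition for what the injected `maxSkew: 1` constraint actually does, here is an illustrative Python sketch (not the real scheduler) of the placement rule: a new pod may only land in a zone if, afterwards, the difference between the most- and least-populated zones stays within maxSkew. The zone names are hypothetical:

```python
def allowed_zones(counts: dict, max_skew: int = 1) -> list:
    """Return zones where one more replica could be placed without
    exceeding max_skew across the given per-zone pod counts."""
    zones = []
    for zone in counts:
        after = dict(counts)
        after[zone] += 1  # simulate placing the new pod here
        if max(after.values()) - min(after.values()) <= max_skew:
            zones.append(zone)
    return sorted(zones)

# Replicas currently placed 2/1/0 across three AZs: only the empty
# zone keeps the skew within 1.
print(allowed_zones({"us-east-1a": 2, "us-east-1b": 1, "us-east-1c": 0}))
```

With `whenUnsatisfiable: DoNotSchedule`, a pod that cannot land in any allowed zone stays Pending; `ScheduleAnyway` (used for the hostname constraint) treats the rule as a soft preference instead.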
Solution 2: OPA Gatekeeper (Policy-Based Approach)
This approach validates deployments against policy and rejects non-compliant ones at admission time; the template below is validation-only (Gatekeeper can also mutate resources, but that uses its separate mutation CRDs).
A. Install Gatekeeper
```bash
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/release-3.14/deploy/gatekeeper.yaml
```
B. Create Constraint Template
```yaml
# gatekeeper-multiaz-template.yaml
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8smultiazrequired
spec:
  crd:
    spec:
      names:
        kind: K8sMultiAZRequired
      validation:
        openAPIV3Schema:
          type: object
          properties:
            message:
              type: string
            exemptNamespaces:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8smultiazrequired

        violation[{"msg": msg}] {
          input.review.kind.kind == "Deployment"
          input.review.object.spec.replicas > 1
          not input.review.object.spec.template.spec.topologySpreadConstraints
          not is_exempt_namespace
          msg := sprintf("Deployment %s must have topology spread constraints for multi-AZ distribution", [input.review.object.metadata.name])
        }

        violation[{"msg": msg}] {
          input.review.kind.kind == "Deployment"
          input.review.object.spec.replicas > 1
          input.review.object.spec.template.spec.topologySpreadConstraints
          not has_zone_topology_constraint
          not is_exempt_namespace
          msg := sprintf("Deployment %s must have zone topology spread constraint", [input.review.object.metadata.name])
        }

        has_zone_topology_constraint {
          constraint := input.review.object.spec.template.spec.topologySpreadConstraints[_]
          constraint.topologyKey == "topology.kubernetes.io/zone"
        }

        is_exempt_namespace {
          exempt_namespaces := input.parameters.exemptNamespaces
          input.review.object.metadata.namespace == exempt_namespaces[_]
        }
```
C. Create Constraint
```yaml
# gatekeeper-multiaz-constraint.yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sMultiAZRequired
metadata:
  name: must-have-multiaz-topology
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
    excludedNamespaces: ["kube-system", "kube-public", "gatekeeper-system"]
  parameters:
    message: "All deployments with more than 1 replica must have multi-AZ topology spread constraints"
    exemptNamespaces: ["kube-system", "kube-public", "gatekeeper-system"]
```
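The Rego rules above are easy to misread, so a rough Python equivalent can help when reasoning about (or unit-testing) the policy logic outside the cluster. This is a hypothetical helper mirroring the two `violation` rules, not anything Gatekeeper itself runs:

```python
def violations(deployment: dict, exempt_namespaces=()) -> list:
    """Mirror the Gatekeeper logic: flag multi-replica Deployments with
    no spread constraints at all, or none keyed on the zone label."""
    meta = deployment.get("metadata", {})
    spec = deployment.get("spec", {})
    if meta.get("namespace") in exempt_namespaces:
        return []
    if spec.get("replicas", 1) <= 1:
        return []
    constraints = (
        spec.get("template", {}).get("spec", {}).get("topologySpreadConstraints")
    )
    name = meta.get("name", "")
    if not constraints:
        return [f"Deployment {name} must have topology spread constraints "
                "for multi-AZ distribution"]
    if not any(c.get("topologyKey") == "topology.kubernetes.io/zone"
               for c in constraints):
        return [f"Deployment {name} must have zone topology spread constraint"]
    return []
```

Single-replica deployments and exempt namespaces pass through untouched, exactly as in the Rego version.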
Solution 3: Kyverno (Alternative Policy Engine)
A. Install Kyverno
```bash
kubectl create -f https://github.com/kyverno/kyverno/releases/latest/download/install.yaml
```
B. Create Multi-AZ Policy
```yaml
# kyverno-multiaz-policy.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-multiaz-topology
spec:
  validationFailureAction: enforce
  background: true
  rules:
    - name: check-multiaz-topology
      match:
        any:
          - resources:
              kinds:
                - Deployment
              namespaces:
                - "!kube-system"
                - "!kube-public"
                - "!kyverno"
      validate:
        message: "Deployments with replicas > 1 must have topology spread constraints for multi-AZ distribution"
        anyPattern:
          - spec:
              replicas: 1
          - spec:
              template:
                spec:
                  topologySpreadConstraints:
                    - topologyKey: "?*"
    - name: require-zone-topology
      match:
        any:
          - resources:
              kinds:
                - Deployment
              namespaces:
                - "!kube-system"
                - "!kube-public"
                - "!kyverno"
      validate:
        message: "Deployments with replicas > 1 must have zone topology spread constraints"
        anyPattern:
          - spec:
              replicas: 1
          - spec:
              template:
                spec:
                  topologySpreadConstraints:
                    - topologyKey: "topology.kubernetes.io/zone"
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-multiaz-topology
spec:
  background: false
  rules:
    - name: add-topology-constraints
      match:
        any:
          - resources:
              kinds:
                - Deployment
              namespaces:
                - "!kube-system"
                - "!kube-public"
                - "!kyverno"
      preconditions:
        all:
          - key: "{{ request.object.spec.replicas }}"
            operator: GreaterThan
            value: 1
          - key: "{{ request.object.spec.template.spec.topologySpreadConstraints || `[]` | length(@) }}"
            operator: Equals
            value: 0
      mutate:
        patchStrategicMerge:
          spec:
            template:
              spec:
                topologySpreadConstraints:
                  - maxSkew: 1
                    topologyKey: "topology.kubernetes.io/zone"
                    whenUnsatisfiable: DoNotSchedule
                    minDomains: 3
                    labelSelector:
                      matchLabels: "{{ request.object.spec.selector.matchLabels }}"
                  - maxSkew: 2
                    topologyKey: "kubernetes.io/hostname"
                    whenUnsatisfiable: ScheduleAnyway
                    labelSelector:
                      matchLabels: "{{ request.object.spec.selector.matchLabels }}"
```

The first rule allows a Deployment either to run a single replica or to declare at least one topology spread constraint (`topologyKey: "?*"` matches any non-empty key); the second rule additionally requires a zone-keyed constraint. The mutation policy only fires when replicas exceed 1 and no constraints are present.
Solution 4: Namespace-Level Defaults with LimitRanges
LimitRanges and ResourceQuotas don’t enforce topology constraints directly, but they give a namespace sane resource defaults and guardrails that complement the enforcement approaches above:
```yaml
# namespace-defaults.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: multiaz-defaults
  namespace: production
spec:
  limits:
    - default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      type: Container
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: multiaz-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
```
Implementation Guide
Step 1: Choose Your Approach
- For maximum control: use the mutating admission webhook
- For policy management: use Kyverno (easier) or OPA Gatekeeper (more powerful)
- For simple validation: use Gatekeeper validation only
Step 2: Deploy the Solution
```bash
# For Kyverno (recommended for simplicity)
kubectl create -f https://github.com/kyverno/kyverno/releases/latest/download/install.yaml
kubectl apply -f kyverno-multiaz-policy.yaml

# Verify installation
kubectl get clusterpolicy
```
Step 3: Test the Enforcement
```yaml
# test-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: test-app
  template:
    metadata:
      labels:
        app: test-app
    spec:
      containers:
        - name: nginx
          image: nginx:1.21
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
```
```bash
# This should either be automatically modified (mutating webhook/Kyverno)
# or rejected (validation-only policies)
kubectl apply -f test-deployment.yaml

# Check if topology constraints were added
kubectl get deployment test-app -o yaml | grep -A 20 topologySpreadConstraints
```
Step 4: Create Exemption Mechanism
```yaml
# For deployments that should skip multi-AZ enforcement
apiVersion: apps/v1
kind: Deployment
metadata:
  name: single-az-app
  namespace: default
  labels:
    # The webhook's objectSelector matches labels, not annotations
    multiaz.enforcer.eks.aws/skip: "true"
spec:
  replicas: 1
  # ... rest of deployment
```

Note that the exemption for the webhook must be a label, since `objectSelector` only evaluates labels. For Kyverno, exempt workloads by adding an `exclude` clause to the policy’s rules or by creating a PolicyException resource, rather than relying on an annotation.
Monitoring and Validation
Check Policy Compliance
```bash
# For Kyverno
kubectl get cpol
kubectl describe cpol require-multiaz-topology

# For Gatekeeper
kubectl get constraints
kubectl describe k8smultiazrequired must-have-multiaz-topology

# Check violations
kubectl get events --field-selector reason=PolicyViolation
```
Validate Existing Deployments
```bash
#!/bin/bash
# check-multiaz-compliance.sh
echo "Checking Multi-AZ compliance for all deployments..."

kubectl get deployments --all-namespaces -o json | jq -r '
  .items[]
  | select(.spec.replicas > 1)
  | select(
      .spec.template.spec.topologySpreadConstraints == null
      or (.spec.template.spec.topologySpreadConstraints
          | map(select(.topologyKey == "topology.kubernetes.io/zone"))
          | length == 0)
    )
  | "\(.metadata.namespace)/\(.metadata.name) - Missing multi-AZ topology constraints"
'
```
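For environments without jq, the same filter can be expressed in Python. This is a hypothetical helper (the `non_compliant` name and sample data are illustrative); in practice you would feed it the `items` array from `kubectl get deployments --all-namespaces -o json`:

```python
def non_compliant(items: list) -> list:
    """Return 'namespace/name' report lines for multi-replica Deployments
    that lack a zone-keyed topology spread constraint."""
    out = []
    for d in items:
        spec = d.get("spec", {})
        if (spec.get("replicas") or 1) <= 1:
            continue  # single-replica deployments are exempt
        tsc = (spec.get("template", {}).get("spec", {})
                   .get("topologySpreadConstraints") or [])
        if not any(c.get("topologyKey") == "topology.kubernetes.io/zone"
                   for c in tsc):
            out.append(f"{d['metadata']['namespace']}/{d['metadata']['name']}"
                       " - Missing multi-AZ topology constraints")
    return out

# Minimal inline sample mimicking kubectl's JSON output
sample = [
    {"metadata": {"namespace": "prod", "name": "api"},
     "spec": {"replicas": 3, "template": {"spec": {}}}},
    {"metadata": {"namespace": "prod", "name": "single"},
     "spec": {"replicas": 1}},
]
for line in non_compliant(sample):
    print(line)
```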
Recommended Implementation
For your EKS Auto Mode cluster, I recommend:
- Start with Kyverno – easier to implement and maintain
- Use the mutating policy to automatically add topology constraints
- Set up monitoring to track compliance
- Create exemption mechanisms for special cases
- Test thoroughly in a non-production environment first
This approach ensures that every deployment with more than 1 replica automatically gets multi-AZ distribution without requiring developers to remember to add topology spread constraints manually.