What’s NOT Available in EKS
- Kubernetes Scheduler Configuration: EKS doesn’t allow modification of KubeSchedulerConfiguration to set custom default topology spread constraints
- Built-in Cluster Defaults: No native EKS setting to enforce multi-AZ distribution automatically
Available Solutions (Ranked by Effectiveness)
Solution 1: Mutating Admission Webhook (Recommended)
This automatically injects topology spread constraints into all deployments at creation time.
A. Deploy the Mutating Webhook
```yaml
# mutating-webhook-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: multiaz-webhook
  namespace: kube-system
  labels:
    app: multiaz-webhook
spec:
  replicas: 2
  selector:
    matchLabels:
      app: multiaz-webhook
  template:
    metadata:
      labels:
        app: multiaz-webhook
    spec:
      serviceAccountName: multiaz-webhook
      containers:
        - name: webhook
          image: your-registry/multiaz-webhook:latest
          ports:
            - containerPort: 8443
          env:
            - name: TLS_CERT_FILE
              value: /etc/certs/tls.crt
            - name: TLS_PRIVATE_KEY_FILE
              value: /etc/certs/tls.key
          volumeMounts:
            - name: certs
              mountPath: /etc/certs
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 256Mi
      volumes:
        - name: certs
          secret:
            secretName: multiaz-webhook-certs
---
apiVersion: v1
kind: Service
metadata:
  name: multiaz-webhook-service
  namespace: kube-system
spec:
  selector:
    app: multiaz-webhook
  ports:
    - port: 443
      targetPort: 8443
      protocol: TCP
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: multiaz-webhook
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: multiaz-webhook
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: multiaz-webhook
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: multiaz-webhook
subjects:
  - kind: ServiceAccount
    name: multiaz-webhook
    namespace: kube-system
```
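The Deployment above mounts a TLS secret named multiaz-webhook-certs that is not created anywhere in these manifests. One way to generate a suitable serving certificate locally is a self-signed cert whose SAN matches the webhook Service’s in-cluster DNS name (a sketch; cert-manager is the more robust option for production):

```shell
# Self-signed serving cert; the SAN must match the Service DNS name
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout /tmp/tls.key -out /tmp/tls.crt \
  -subj "/CN=multiaz-webhook-service.kube-system.svc" \
  -addext "subjectAltName=DNS:multiaz-webhook-service.kube-system.svc"

# Then create the secret the Deployment mounts:
#   kubectl create secret tls multiaz-webhook-certs \
#     --cert=/tmp/tls.crt --key=/tmp/tls.key -n kube-system
```

The `-addext` flag requires OpenSSL 1.1.1 or newer; without a matching SAN the API server will reject the TLS handshake to the webhook.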
B. Webhook Configuration
```yaml
# mutating-webhook-config.yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: multiaz-enforcer
webhooks:
  - name: multiaz.enforcer.eks.aws
    clientConfig:
      service:
        name: multiaz-webhook-service
        namespace: kube-system
        path: "/mutate"
      # caBundle: <base64 CA cert> must also be set so the API server trusts the webhook
    rules:
      - operations: ["CREATE", "UPDATE"]
        apiGroups: ["apps"]
        apiVersions: ["v1"]
        resources: ["deployments"]
      - operations: ["CREATE", "UPDATE"]
        apiGroups: ["apps"]
        apiVersions: ["v1"]
        resources: ["replicasets"]
    namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: NotIn
          values: ["kube-system", "kube-public", "kube-node-lease"]
    objectSelector:
      matchExpressions:
        - key: multiaz.enforcer.eks.aws/skip
          operator: DoesNotExist
    admissionReviewVersions: ["v1", "v1beta1"]
    sideEffects: None
    failurePolicy: Fail
```
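The configuration above leaves clientConfig.caBundle empty, and the API server needs it to trust the webhook’s certificate. If cert-manager issues the certificate, its CA injector can populate the field automatically; the fragment below assumes a cert-manager Certificate named multiaz-webhook-certs in kube-system:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: multiaz-enforcer
  annotations:
    # cert-manager's CA injector watches this annotation and fills in
    # clientConfig.caBundle from the referenced Certificate's CA
    cert-manager.io/inject-ca-from: kube-system/multiaz-webhook-certs
```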
C. Sample Webhook Code (Go)
```go
// webhook-server.go
package main

import (
	"encoding/json"
	"io"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type WebhookServer struct {
	server *http.Server
}

func (ws *WebhookServer) mutate(w http.ResponseWriter, r *http.Request) {
	var body []byte
	if r.Body != nil {
		if data, err := io.ReadAll(r.Body); err == nil {
			body = data
		}
	}

	var admissionResponse *admissionv1.AdmissionResponse
	ar := admissionv1.AdmissionReview{}
	if err := json.Unmarshal(body, &ar); err != nil {
		admissionResponse = &admissionv1.AdmissionResponse{
			Result: &metav1.Status{Message: err.Error()},
		}
	} else {
		admissionResponse = ws.mutateDeployment(&ar)
	}

	admissionReview := admissionv1.AdmissionReview{}
	if admissionResponse != nil {
		admissionReview.Response = admissionResponse
		if ar.Request != nil {
			admissionReview.Response.UID = ar.Request.UID
		}
	}

	respBytes, _ := json.Marshal(admissionReview)
	w.Header().Set("Content-Type", "application/json")
	w.Write(respBytes)
}

func (ws *WebhookServer) mutateDeployment(ar *admissionv1.AdmissionReview) *admissionv1.AdmissionResponse {
	req := ar.Request
	var deployment appsv1.Deployment
	if err := json.Unmarshal(req.Object.Raw, &deployment); err != nil {
		return &admissionv1.AdmissionResponse{
			Result: &metav1.Status{Message: err.Error()},
		}
	}

	// Leave deployments that already declare topology spread constraints alone
	if hasTopologySpreadConstraints(&deployment) {
		return &admissionv1.AdmissionResponse{Allowed: true}
	}

	// Inject multi-AZ topology spread constraints via a JSON patch
	patches := createMultiAZPatches(&deployment)
	patchBytes, _ := json.Marshal(patches)

	return &admissionv1.AdmissionResponse{
		Allowed: true,
		Patch:   patchBytes,
		PatchType: func() *admissionv1.PatchType {
			pt := admissionv1.PatchTypeJSONPatch
			return &pt
		}(),
	}
}

func hasTopologySpreadConstraints(deployment *appsv1.Deployment) bool {
	return len(deployment.Spec.Template.Spec.TopologySpreadConstraints) > 0
}

func createMultiAZPatches(deployment *appsv1.Deployment) []map[string]interface{} {
	patches := []map[string]interface{}{}

	// Hard spread across zones, soft spread across hosts
	topologyConstraints := []corev1.TopologySpreadConstraint{
		{
			MaxSkew:           1,
			TopologyKey:       "topology.kubernetes.io/zone",
			WhenUnsatisfiable: corev1.DoNotSchedule,
			MinDomains:        func() *int32 { i := int32(3); return &i }(),
			LabelSelector: &metav1.LabelSelector{
				MatchLabels: deployment.Spec.Selector.MatchLabels,
			},
		},
		{
			MaxSkew:           2,
			TopologyKey:       "kubernetes.io/hostname",
			WhenUnsatisfiable: corev1.ScheduleAnyway,
			LabelSelector: &metav1.LabelSelector{
				MatchLabels: deployment.Spec.Selector.MatchLabels,
			},
		},
	}

	patches = append(patches, map[string]interface{}{
		"op":    "add",
		"path":  "/spec/template/spec/topologySpreadConstraints",
		"value": topologyConstraints,
	})
	return patches
}
```
Solution 2: OPA Gatekeeper (Policy-Based Approach)
This validates and can mutate deployments to enforce multi-AZ distribution.
A. Install Gatekeeper
```shell
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/release-3.14/deploy/gatekeeper.yaml
```
B. Create Constraint Template
```yaml
# gatekeeper-multiaz-template.yaml
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8smultiazrequired
spec:
  crd:
    spec:
      names:
        kind: K8sMultiAZRequired
      validation:
        openAPIV3Schema:
          type: object
          properties:
            message:
              type: string
            exemptNamespaces:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8smultiazrequired

        violation[{"msg": msg}] {
          input.review.kind.kind == "Deployment"
          input.review.object.spec.replicas > 1
          not input.review.object.spec.template.spec.topologySpreadConstraints
          not is_exempt_namespace
          msg := sprintf("Deployment %s must have topology spread constraints for multi-AZ distribution", [input.review.object.metadata.name])
        }

        violation[{"msg": msg}] {
          input.review.kind.kind == "Deployment"
          input.review.object.spec.replicas > 1
          input.review.object.spec.template.spec.topologySpreadConstraints
          not has_zone_topology_constraint
          not is_exempt_namespace
          msg := sprintf("Deployment %s must have zone topology spread constraint", [input.review.object.metadata.name])
        }

        has_zone_topology_constraint {
          constraint := input.review.object.spec.template.spec.topologySpreadConstraints[_]
          constraint.topologyKey == "topology.kubernetes.io/zone"
        }

        is_exempt_namespace {
          exempt_namespaces := input.parameters.exemptNamespaces
          input.review.object.metadata.namespace == exempt_namespaces[_]
        }
```
C. Create Constraint
```yaml
# gatekeeper-multiaz-constraint.yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sMultiAZRequired
metadata:
  name: must-have-multiaz-topology
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
    excludedNamespaces: ["kube-system", "kube-public", "gatekeeper-system"]
  parameters:
    message: "All deployments with more than 1 replica must have multi-AZ topology spread constraints"
    exemptNamespaces: ["kube-system", "kube-public", "gatekeeper-system"]
```
Solution 3: Kyverno (Alternative Policy Engine)
A. Install Kyverno
```shell
kubectl create -f https://github.com/kyverno/kyverno/releases/latest/download/install.yaml
```
B. Create Multi-AZ Policy
```yaml
# kyverno-multiaz-policy.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-multiaz-topology
spec:
  validationFailureAction: enforce
  background: true
  rules:
    - name: check-multiaz-topology
      match:
        any:
          - resources:
              kinds:
                - Deployment
              namespaces:
                - "!kube-system"
                - "!kube-public"
                - "!kyverno"
      validate:
        message: "Deployments with replicas > 1 must have topology spread constraints for multi-AZ distribution"
        anyPattern:
          - spec:
              replicas: 1
          - spec:
              template:
                spec:
                  topologySpreadConstraints:
                    - topologyKey: "*"
    - name: require-zone-topology
      match:
        any:
          - resources:
              kinds:
                - Deployment
              namespaces:
                - "!kube-system"
                - "!kube-public"
                - "!kyverno"
      validate:
        message: "Deployments with replicas > 1 must have zone topology spread constraints"
        anyPattern:
          - spec:
              replicas: 1
          - spec:
              template:
                spec:
                  topologySpreadConstraints:
                    - topologyKey: "topology.kubernetes.io/zone"
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-multiaz-topology
spec:
  validationFailureAction: enforce
  background: false
  rules:
    - name: add-topology-constraints
      match:
        any:
          - resources:
              kinds:
                - Deployment
              namespaces:
                - "!kube-system"
                - "!kube-public"
                - "!kyverno"
      preconditions:
        all:
          - key: "{{ request.object.spec.replicas }}"
            operator: GreaterThan
            value: 1
          - key: "{{ request.object.spec.template.spec.topologySpreadConstraints || `[]` | length(@) }}"
            operator: Equals
            value: 0
      mutate:
        patchStrategicMerge:
          spec:
            template:
              spec:
                topologySpreadConstraints:
                  - maxSkew: 1
                    topologyKey: "topology.kubernetes.io/zone"
                    whenUnsatisfiable: DoNotSchedule
                    minDomains: 3
                    labelSelector:
                      matchLabels:
                        "{{ request.object.spec.selector.matchLabels }}"
                  - maxSkew: 2
                    topologyKey: "kubernetes.io/hostname"
                    whenUnsatisfiable: ScheduleAnyway
                    labelSelector:
                      matchLabels:
                        "{{ request.object.spec.selector.matchLabels }}"
```
Solution 4: Namespace-Level Defaults with LimitRanges
While not directly enforcing topology constraints, you can use this approach combined with other solutions:
```yaml
# namespace-defaults.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: multiaz-defaults
  namespace: production
spec:
  limits:
    - default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      type: Container
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: multiaz-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
```
Implementation Guide
Step 1: Choose Your Approach
- For Maximum Control: Use a Mutating Admission Webhook
- For Policy Management: Use Kyverno (easier) or OPA Gatekeeper (more powerful)
- For Simple Validation: Use Gatekeeper validation only
Step 2: Deploy the Solution
```shell
# For Kyverno (recommended for simplicity)
kubectl create -f https://github.com/kyverno/kyverno/releases/latest/download/install.yaml
kubectl apply -f kyverno-multiaz-policy.yaml

# Verify installation
kubectl get clusterpolicy
```
Step 3: Test the Enforcement
```yaml
# test-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: test-app
  template:
    metadata:
      labels:
        app: test-app
    spec:
      containers:
        - name: nginx
          image: nginx:1.21
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
```
```shell
# This should either be automatically modified (mutating webhook/Kyverno)
# or rejected (validation-only policies)
kubectl apply -f test-deployment.yaml

# Check whether topology constraints were added
kubectl get deployment test-app -o yaml | grep -A 20 topologySpreadConstraints
```
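If the mutation worked, the pod template should now carry constraints roughly like the following (values mirror the webhook/Kyverno policies above):

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    minDomains: 3
    labelSelector:
      matchLabels:
        app: test-app
  - maxSkew: 2
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: test-app
```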
Step 4: Create Exemption Mechanism
```yaml
# For deployments that should skip multi-AZ enforcement
apiVersion: apps/v1
kind: Deployment
metadata:
  name: single-az-app
  namespace: default
  labels:
    multiaz.enforcer.eks.aws/skip: "true"   # the webhook's objectSelector matches labels, not annotations
  annotations:
    policies.kyverno.io/skip: "true"        # for Kyverno
spec:
  replicas: 1
  # ... rest of deployment
```
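Note that the Kyverno policies as written do not automatically honor the policies.kyverno.io/skip annotation; the exemption has to be declared in the policy itself. A sketch of an exclude block that could be added to each rule (the annotation key matches the one above, but any agreed-upon key works):

```yaml
exclude:
  any:
    - resources:
        kinds:
          - Deployment
        annotations:
          policies.kyverno.io/skip: "true"
```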
Monitoring and Validation
Check Policy Compliance
```shell
# For Kyverno
kubectl get cpol
kubectl describe cpol require-multiaz-topology

# For Gatekeeper
kubectl get constraints
kubectl describe k8smultiazrequired must-have-multiaz-topology

# Check violations
kubectl get events --field-selector reason=PolicyViolation
```
Validate Existing Deployments
```shell
#!/bin/bash
# check-multiaz-compliance.sh
echo "Checking Multi-AZ compliance for all deployments..."

kubectl get deployments --all-namespaces -o json | jq -r '.items[] |
  select(.spec.replicas > 1) |
  select(.spec.template.spec.topologySpreadConstraints == null or
    (.spec.template.spec.topologySpreadConstraints |
     map(select(.topologyKey == "topology.kubernetes.io/zone")) |
     length == 0)) |
  "\(.metadata.namespace)/\(.metadata.name) - Missing multi-AZ topology constraints"'
```
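The jq filter can be sanity-checked without a cluster by feeding it a fabricated deployment list (the two deployments below are made up for illustration: one missing constraints, one compliant):

```shell
# Fabricated input: "no-tsc" lacks constraints, "has-zone" is compliant
cat <<'EOF' > /tmp/deps.json
{"items":[
 {"metadata":{"namespace":"prod","name":"no-tsc"},
  "spec":{"replicas":3,"template":{"spec":{}}}},
 {"metadata":{"namespace":"prod","name":"has-zone"},
  "spec":{"replicas":3,"template":{"spec":{"topologySpreadConstraints":[{"topologyKey":"topology.kubernetes.io/zone"}]}}}}
]}
EOF

jq -r '.items[] |
  select(.spec.replicas > 1) |
  select(.spec.template.spec.topologySpreadConstraints == null or
    (.spec.template.spec.topologySpreadConstraints |
     map(select(.topologyKey == "topology.kubernetes.io/zone")) |
     length == 0)) |
  "\(.metadata.namespace)/\(.metadata.name) - Missing multi-AZ topology constraints"' /tmp/deps.json
# → prod/no-tsc - Missing multi-AZ topology constraints
```

Only the non-compliant deployment is reported, confirming the filter’s logic before pointing it at real kubectl output.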
Recommended Implementation
For your EKS Auto Mode cluster, I recommend:
- Start with Kyverno – easier to implement and maintain
- Use the mutating policy to automatically add topology constraints
- Set up monitoring to track compliance
- Create exemption mechanisms for special cases
- Test thoroughly in a non-production environment first
This approach ensures that every deployment with more than 1 replica automatically gets multi-AZ distribution without requiring developers to remember to add topology spread constraints manually.
Sources
Spread workloads across nodes and Availability Zones – AWS Prescriptive Guidance
Running highly-available applications – Amazon EKS
EKS Control Plane – Amazon EKS
Managing webhook failures on Amazon EKS | AWS re:Post
TopologySpreadConstraints in EKS auto mode does not seem to work | AWS re:Post
Troubleshoot cluster scaling in Amazon EKS with Karpenter autoscaler | AWS re:Post
Scheduling – Containers on AWS
Kubernetes Data Plane – Amazon EKS
Customizing scheduling on Amazon EKS | Containers
VPC and Subnet Considerations – Amazon EKS
Manage CoreDNS for DNS in Amazon EKS clusters – Amazon EKS