Comparing KEDA, HPA, VPA & Custom Adapters for Real-World Scaling with Cost, Complexity & Best Practices
| Category | KEDA (Event-driven) | Prometheus-based (Adapter/HPA) | Datadog-based (Cluster Agent) | CloudWatch-based (Adapter/HPA) |
|---|---|---|---|---|
| Primary Function | Event-driven autoscaler (creates HPAs dynamically from external metrics like ALB, SQS, Kafka, etc.) | Uses HPA with Prometheus metrics via adapter (e.g., kube-metrics-adapter or prometheus-adapter) | Uses Datadog metrics (via Cluster Agent) as external metrics for HPA | Uses CloudWatch metrics (via AWS CloudWatch Metrics Adapter) for HPA |
| Metric Source | Multiple external sources: CloudWatch, Prometheus, SQS, Kafka, HTTP, etc. (50+ scalers) | Prometheus time-series metrics (scraped from exporters or apps) | Datadog platform metrics (ingested from AWS, custom apps, APM) | AWS CloudWatch (e.g., ALB metrics, RDS, SQS, Lambda, etc.) |
| Data Flow Model | Pull metrics or events → internal HPA → scale | Prometheus scrapes → adapter → Kubernetes Metrics API → HPA | Datadog agent → Cluster Agent → External Metrics API → HPA | CloudWatch Adapter → External Metrics API → HPA |
| Setup Complexity | 🟢 Medium (Helm + few YAMLs; no exporter needed) | 🔵 Medium-High (need Prometheus + adapter configuration) | 🟣 Medium (if Datadog is already deployed) | 🟠 Medium (adapter installation + IAM + mappings) |
| Integration with ALB Traffic | ✅ Native (via CloudWatch scaler – uses RequestCountPerTarget, TargetResponseTime) | ⚠️ Requires Prometheus CloudWatch exporter (YACE or similar) | ✅ Native (Datadog already pulls ALB metrics) | ✅ Native (direct access to ALB metrics) |
| Supports Scale-to-Zero | ✅ Yes | ❌ No (HPA cannot scale to zero) | ❌ No | ❌ No |
| Responsiveness / Latency | ~30–60 seconds (depends on CloudWatch polling) | ~15–30 seconds (depends on scrape interval) | ~30–60 seconds (depends on Datadog ingestion) | ~60 seconds (CloudWatch metric delay) |
| Operational Cost | 💲 Low (CloudWatch API calls only) | 💲💲 Medium (Prometheus infra + storage + exporter costs) | 💲💲💲 High (Datadog licensing per host/container) | 💲 Low (CloudWatch API calls) |
| Infrastructure Overhead | Lightweight (1 KEDA controller) | Heavy (Prometheus, exporters, adapter) | Moderate (Datadog Cluster Agent) | Moderate (Adapter deployment) |
| Ease of Maintenance | 🟢 Easy – one Helm upgrade for all namespaces | 🔵 Moderate – maintain adapter & Prometheus | 🟣 Easy if Datadog already managed | 🟠 Moderate – periodic IAM & adapter updates |
| EKS Auto Mode Compatibility | ✅ Fully compatible – scales pods; NodePools handle nodes | ✅ Compatible | ✅ Compatible | ✅ Compatible |
| Multi-Namespace Scaling | ✅ Native support (Scoped per namespace) | ✅ Supported | ✅ Supported | ✅ Supported |
| Security / IAM | Uses IRSA or static keys for AWS APIs | No AWS permissions required (depends on Prometheus) | Uses Datadog API key & IAM integration | Uses IRSA for AWS CloudWatch read access |
| Supported Triggers / Metrics | 50+ sources (CloudWatch, Kafka, RabbitMQ, HTTP, Redis, MySQL, etc.) | Limited to Prometheus metrics | Limited to Datadog metrics | Limited to AWS metrics |
| Scales on Events (not metrics) | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Can Combine Multiple Triggers | ✅ Yes (multi-trigger scaling rules) | ⚠️ Only via complex PromQL expressions | ⚠️ Limited (Datadog composite metrics) | ⚠️ Limited (one metric per HPA) |
| Recommended For | Event-driven / traffic-based workloads (ALB, queues, web APIs) | Resource or app-metric-based workloads | Organizations using Datadog for monitoring & APM | AWS-centric workloads without Prometheus |
| Learning Curve | 🟢 Low | 🔵 Medium | 🟣 Low | 🟠 Medium |
| Vendor Lock-in | Low (Open Source) | Low (OSS ecosystem) | High (Datadog SaaS) | Medium (AWS-only) |
| Community & Ecosystem | Very active (CNCF Graduated project) | Large (K8s ecosystem standard) | Proprietary (Datadog documentation) | AWS-maintained (moderate community) |
| Use with WAF + ALB | ✅ Seamless (uses ALB TG metrics directly) | ⚠️ Need exporter for ALB metrics | ✅ Seamless (Datadog ALB integration) | ✅ Seamless (ALB metrics native in CloudWatch) |
| Example Metric | CloudWatch → RequestCountPerTarget, TargetResponseTime | Prometheus → nginx_ingress_controller_requests_total | Datadog → aws.applicationelb.request_count | CloudWatch → RequestCountPerTarget |
| Scale Behavior Visualization | KEDA Metrics API + Grafana dashboards | Prometheus / Grafana | Datadog Dashboards | CloudWatch Dashboards |
| Maturity (as of 2025) | ⭐⭐⭐⭐⭐ (CNCF Graduated) | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Overall Recommendation (for EKS + ALB) | ✅✅✅ Best Option | ✅ Good (if Prometheus is already in place) | ⚙️ Suitable for Datadog-native orgs | ✅ Good fallback if KEDA not allowed |
🌐 1 | Architecture Overview
Flow: Client → DNS → WAF → ALB → Target Group → EKS Service/Pods
| Layer | Purpose | Key AWS / K8s Component |
|---|---|---|
| Edge Security | Filter malicious traffic | AWS WAF (Web ACL) |
| Load Balancing | Distribute inbound requests | ALB (AWS Load Balancer Controller) |
| Routing | Path/host-based dispatch to namespaces | Kubernetes Ingress |
| Compute | Run workloads | EKS Pods/Deployments |
| Node Capacity | Provision nodes automatically | EKS Auto Mode NodePools (Karpenter) |
| Autoscaling Brain | Adjust replicas dynamically | KEDA / HPA / VPA / Custom Adapter |
With EKS Auto Mode, AWS manages node scaling.
Your responsibility is pod-level scaling — deciding how many replicas each service needs based on traffic or resource metrics.
🧩 2 | Namespace-Scoped Design Pattern
- Each microservice (e.g., `booking`, `auth`, `medical`, `telematics`) lives in its own namespace.
- Each namespace has its own Ingress, Service, Deployment, ConfigMap, and autoscaler objects.
- Optionally, multiple namespaces can share one ALB via `alb.ingress.kubernetes.io/group.name` to save cost while keeping per-namespace isolation.
⚙️ 3 | Ingress & WAF Setup (Shared ALB Example)
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: booking-ing
  namespace: booking
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/group.name: shared-edge
    alb.ingress.kubernetes.io/group.order: "20"
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/wafv2-acl-arn: arn:aws:wafv2:ap-northeast-1:111111111111:regional/webacl/mywebacl/abcd1234
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /booking
            pathType: Prefix
            backend:
              service:
                name: booking-svc
                port:
                  number: 80
```
Each namespace can repeat this pattern with a different path (`/auth`, `/legal`, etc.) while sharing the same `group.name`, so all services sit behind one ALB under one WAF.
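For example, a sketch of how the `auth` namespace could join the same ALB group (the Service name and `group.order` value are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: auth-ing
  namespace: auth
  annotations:
    kubernetes.io/ingress.class: alb
    # Same group.name as booking-ing, so both Ingresses merge into one shared ALB;
    # the WAF ACL annotation on booking-ing applies to that ALB, so it need not be repeated here
    alb.ingress.kubernetes.io/group.name: shared-edge
    alb.ingress.kubernetes.io/group.order: "30"
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/scheme: internet-facing
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /auth
            pathType: Prefix
            backend:
              service:
                name: auth-svc   # hypothetical Service in the auth namespace
                port:
                  number: 80
```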
🚀 4 | Autoscaling Options for Pod Level Control
Below are five viable mechanisms for pod autoscaling inside EKS.
| # | Method | Scaling Source | Scales To Zero | Works with ALB Metrics | Typical Latency | Setup Time | Maint. Effort | Approx. Cost* | Skill Level |
|---|---|---|---|---|---|---|---|---|---|
| 1 | KEDA | External events (CloudWatch ALB, SQS, Prometheus, etc.) | ✅ | ✅ (native scaler) | 30-60 s | ⚙️ Medium | 🧩 Low (once installed) | 💲💲 CloudWatch API calls | Intermediate |
| 2 | HPA | CPU / memory / custom metrics | ❌ | ⚠️ via adapter | 15-30 s | ⚙️ Low | 🧩 Low | 💲 free | Beginner |
| 3 | VPA | Internal resource usage | ❌ | ❌ | N/A | ⚙️ Medium | 🧩 Low | 💲 free | Intermediate |
| 4 | Custom Metric Adapter | Prometheus / CloudWatch | ❌ | ✅ with manual mapping | 45-60 s | ⚙️ High | 🧩 High | 💲💲 metrics infra | Advanced |
| 5 | Manual Scaling | Human input | ❌ | ❌ | N/A | ⚙️ Instant | 🧩 High Opex | 💲 none | Basic |
* Cost = relative AWS service charges + operational overhead
🧮 5 | Detailed Analysis of Each Approach
🔹 A | KEDA (Event-Driven Autoscaler)
How it works:
KEDA reads external metrics (CloudWatch ALB RequestCountPerTarget, TargetResponseTime, SQS depth, PromQL queries, etc.) and creates an internal HPA.
Pros
- Supports 50+ scalers (AWS, Azure, Kafka, Prometheus, etc.).
- Scales to zero during idle.
- Simple YAML (`ScaledObject`) per Deployment.
- Works seamlessly with EKS Auto Mode and NodePools.
- Natively integrates with CloudWatch ALB metrics.
Cons
- Extra component to operate.
- CloudWatch polling → small metric costs and ≈ 1 min delay.
- Needs IRSA permissions for CloudWatch API.
Setup time: ≈ 1 hr (Helm install + ScaledObject YAMLs)
Maintenance: Low (central Helm upgrade + namespace YAMLs)
Recommended for: Multi-namespace EKS clusters with real-traffic scaling.
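As an illustration of scale-to-zero, here is a minimal `ScaledObject` sketch for an idle-capable queue worker. The Deployment name and queue URL are placeholders, the trigger reuses the IRSA-based `TriggerAuthentication` from section 6, and the IAM policy there would additionally need `sqs:GetQueueAttributes` for this scaler:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: booking-worker-queue
  namespace: booking
spec:
  scaleTargetRef:
    name: booking-worker          # hypothetical worker Deployment
  minReplicaCount: 0              # allow scale-to-zero while the queue is empty
  maxReplicaCount: 20
  cooldownPeriod: 300             # wait 5 minutes after the last activity before dropping to zero
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: alb-cw-auth         # IRSA-based auth from section 6
      metadata:
        queueURL: https://sqs.ap-northeast-1.amazonaws.com/111111111111/booking-jobs
        queueLength: "10"         # target messages per replica
        awsRegion: ap-northeast-1
```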
🔹 B | HPA (Native Horizontal Pod Autoscaler)
How it works:
Built into Kubernetes; scales based on CPU and memory by default.
Can also use custom metrics with an adapter.
Pros
- Native, stable, zero extra components.
- Predictable behavior and fine-grained control.
Cons
- Default metrics = CPU / memory only.
- Cannot scale to zero.
- Needs a metric adapter to use ALB metrics.
- Not event-driven; it reacts only after load has already pushed up CPU.
Setup time: ≈ 30 min
Maintenance: Minimal
Recommended for: Steady workloads or CPU-bound apps.
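A minimal CPU-based HPA sketch for the booking Deployment (the 70% utilization target is an illustrative value, not a recommendation):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: booking-hpa
  namespace: booking
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: booking-deployment
  minReplicas: 2
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70% of requests
```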
🔹 C | VPA (Vertical Pod Autoscaler)
How it works:
Adjusts CPU and memory requests/limits per pod automatically.
Pros
- Prevents over/under-provisioning.
- Complements KEDA/HPA.
Cons
- No replica count scaling.
- Not suited for traffic bursts.
Setup time: ≈ 45 min
Maintenance: Low
Recommended for: Batch or steady apps to optimize resources.
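A minimal VPA sketch, assuming the VPA components are installed in the cluster; `updateMode: "Off"` only records recommendations, which is a safe starting point:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: booking-vpa
  namespace: booking
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: booking-deployment
  updatePolicy:
    updateMode: "Off"            # recommend-only; switch to "Auto" to let VPA evict and resize pods
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```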
🔹 D | Custom Metric Adapters (Prometheus / CloudWatch)
How it works:
Deploy an external-metrics adapter exposing selected metrics to HPA.
HPA then scales on those metrics.
Pros
- Fine control; use any metric you own.
- Integrates into existing monitoring plane.
Cons
- Complex to deploy and maintain.
- Harder to debug.
- No scale-to-zero.
- Usually delayed by scrape interval + adapter polling.
Setup time: 1 – 2 hrs
Maintenance: High
Recommended for: Large orgs with centralized Prometheus or Datadog.
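For example, with prometheus-adapter exposing a per-pod request-rate metric, the HPA could scale on it directly. The metric name `http_requests_per_second` is hypothetical and depends entirely on your adapter rules:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: booking-custom-hpa
  namespace: booking
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: booking-deployment
  minReplicas: 2
  maxReplicas: 30
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # must match a metric exposed by the adapter
        target:
          type: AverageValue
          averageValue: "100"              # target 100 requests/second per pod
```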
🔹 E | Manual Scaling
```bash
kubectl scale deployment <name> --replicas=N
```
Pros: 100 % control, simple to understand.
Cons: No automation; wastes capacity; high operational risk.
Use only for: testing or stable low-traffic sites.
💡 6 | KEDA Setup Walkthrough (for EKS + ALB)
- Install KEDA

```bash
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda -n keda --create-namespace
```

- Enable IRSA (for CloudWatch)

```bash
eksctl utils associate-iam-oidc-provider --cluster my-eks --approve
```

- IAM Policy

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricData",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics",
        "cloudwatch:DescribeAlarms"
      ],
      "Resource": "*"
    }
  ]
}
```

Attach this policy to an IAM role (e.g. `eks-traffic-autoscale`) that trusts the cluster's OIDC provider; the ServiceAccount below references that role.

- ServiceAccount + TriggerAuthentication

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: svc-traffic-autoscale
  namespace: booking
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/eks-traffic-autoscale
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: alb-cw-auth
  namespace: booking
spec:
  podIdentity:
    provider: aws
```

- ScaledObject

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: booking-traffic
  namespace: booking
spec:
  scaleTargetRef:
    name: booking-deployment
  minReplicaCount: 2
  maxReplicaCount: 30
  triggers:
    - type: aws-cloudwatch
      authenticationRef:
        name: alb-cw-auth
      metadata:
        namespace: AWS/ApplicationELB
        metricName: RequestCountPerTarget
        dimensionName: TargetGroup
        dimensionValue: targetgroup/k8s-xyz/abc123456
        metricStat: Sum
        metricStatPeriod: "60"
        metricUnit: Count
        targetMetricValue: "100"
        awsRegion: ap-northeast-1
```

- Observe Scaling

```bash
kubectl get hpa -n booking
kubectl get pods -n booking -w
```
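To confirm the ScaledObject is healthy, you can also inspect the HPA that KEDA manages for it (by convention KEDA names it `keda-hpa-<scaledobject-name>`):

```bash
# ScaledObject status should show READY=True, and ACTIVE=True under load
kubectl get scaledobject booking-traffic -n booking

# Inspect the HPA generated from the ScaledObject
kubectl describe hpa keda-hpa-booking-traffic -n booking
```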
📈 7 | Performance & Cost Considerations
| Factor | KEDA | HPA | VPA | Custom Adapter |
|---|---|---|---|---|
| Responsiveness | 30-60 s | 15-30 s | N/A | 45-60 s |
| Infra Cost | Low (CloudWatch polling) | None | None | Medium (Prometheus infra) |
| Setup Overhead | Medium | Low | Medium | High |
| Maintenance | Low | Low | Low | High |
| Complexity | Medium | Low | Low | High |
| Best for | Traffic / Event driven | CPU/Mem | Resource tuning | Centralized metrics |
| Scale-to-Zero | ✅ | ❌ | ❌ | ❌ |
🧠 8 | Decision Matrix
| Requirement | Best Choice | Reason |
|---|---|---|
| Real ALB traffic scaling | KEDA | Direct CloudWatch integration |
| CPU/memory bound apps | HPA | Native simple autoscaler |
| Optimize pod resources over time | VPA | Adjusts requests/limits |
| Central metrics team wants Prometheus-based control | Custom Adapter + HPA | Full metric plane |
| Low-traffic or manual control | Manual | No automation needed |
🧰 9 | Combining Approaches
A production-grade EKS stack often mixes them:
| Layer | Tool | Role |
|---|---|---|
| Replica Scaling | KEDA + HPA | Respond to traffic & CPU |
| Resource Tuning | VPA | Adjust limits automatically |
| Node Scaling | EKS Auto Mode (NodePools) | Provide capacity |
| Monitoring | CloudWatch + AMP + Grafana | Visibility into metrics |
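As a sketch of the replica-scaling layer, KEDA can combine the ALB traffic trigger from section 6 with a CPU trigger in one ScaledObject, so either signal can drive scale-out (the 70% utilization target is illustrative):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: booking-traffic-and-cpu
  namespace: booking
spec:
  scaleTargetRef:
    name: booking-deployment
  minReplicaCount: 2
  maxReplicaCount: 30
  triggers:
    # Traffic signal: ALB RequestCountPerTarget via CloudWatch (auth setup as in section 6)
    - type: aws-cloudwatch
      authenticationRef:
        name: alb-cw-auth
      metadata:
        namespace: AWS/ApplicationELB
        metricName: RequestCountPerTarget
        dimensionName: TargetGroup
        dimensionValue: targetgroup/k8s-xyz/abc123456
        metricStat: Sum
        metricStatPeriod: "60"
        targetMetricValue: "100"
        awsRegion: ap-northeast-1
    # Resource signal: CPU utilization, equivalent to a native HPA rule
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"
```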
🔒 10 | Security and Auth Notes
- Keep Firebase OIDC authentication at the pod/application level (not on the ALB listener) to avoid ALB auth redirect limits.
- Enable IRSA for KEDA & pods requiring AWS API access.
- WAF rules protect ALB from volumetric attacks before KEDA reacts.
- Monitor 5xx errors + TargetResponseTime to guard against scaling loops.
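For the last point, a hedged example of a CloudWatch alarm on ALB target 5xx errors via the AWS CLI; the load balancer dimension value and SNS topic ARN are placeholders for your own resources:

```bash
# Alarm when the shared ALB returns more than 50 target 5xx responses
# in each of two consecutive 1-minute periods
aws cloudwatch put-metric-alarm \
  --alarm-name shared-edge-alb-5xx \
  --namespace AWS/ApplicationELB \
  --metric-name HTTPCode_Target_5XX_Count \
  --dimensions Name=LoadBalancer,Value=app/k8s-sharededge/abc123456 \
  --statistic Sum \
  --period 60 \
  --evaluation-periods 2 \
  --threshold 50 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:ap-northeast-1:111111111111:platform-alerts
```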
🧭 11 | Final Recommendation
For your multi-namespace, WAF-protected, ALB-routed EKS cluster running in EKS Auto Mode,
KEDA is the best fit for traffic-driven autoscaling:
- Event-driven and responsive to real user load.
- Scales independently per namespace/service.
- Integrates cleanly with EKS NodePools for capacity.
- Minimizes cost via scale-to-zero and fine-grained rules.
Use HPA as a fallback for CPU-based logic, VPA for optimization, and custom adapters only when you already maintain Prometheus or Datadog metric infrastructure.
🏁 Summary Matrix
| Dimension | Best Fit |
|---|---|
| Speed to implement | HPA |
| Responsiveness to traffic | KEDA |
| Ease of maintenance | KEDA / HPA |
| Cost efficiency | KEDA (scale-to-zero) |
| Complex metric logic | Custom Adapter |
| Resource tuning | VPA |
📚 References & Further Reading
AWS Blog – Autoscaling EKS with KEDA & CloudWatch
AWS Docs – EKS Auto Mode & NodePools
KEDA Docs – CloudWatch Scaler
SpectroCloud – Kubernetes Autoscaling Patterns: HPA, VPA & KEDA
✅ Final Takeaway:
If you need hands-off, event-driven, traffic-aware, namespace-isolated scaling for an ALB-fronted EKS cluster,
KEDA + EKS Auto Mode (NodePools) is the modern production-grade combination—
balancing performance, cost, and operational simplicity for any multi-service cloud platform.