{"id":57762,"date":"2026-01-13T01:36:34","date_gmt":"2026-01-13T01:36:34","guid":{"rendered":"https:\/\/www.devopsschool.com\/blog\/?p=57762"},"modified":"2026-02-21T08:45:51","modified_gmt":"2026-02-21T08:45:51","slug":"aws-architect-design-decision-matrix-nat-per-az-vs-single-nat-vs-regional-nat-for-eks-environments","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/blog\/aws-architect-design-decision-matrix-nat-per-az-vs-single-nat-vs-regional-nat-for-eks-environments\/","title":{"rendered":"AWS Architect &amp; Design &#8211; Decision Matrix: NAT per AZ vs Single NAT vs Regional NAT for EKS Environments"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"683\" height=\"1024\" src=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2026\/01\/AWS-EKS-VPC-Natgw_compressed-683x1024.jpg\" alt=\"\" class=\"wp-image-57765\" srcset=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2026\/01\/AWS-EKS-VPC-Natgw_compressed-683x1024.jpg 683w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2026\/01\/AWS-EKS-VPC-Natgw_compressed-200x300.jpg 200w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2026\/01\/AWS-EKS-VPC-Natgw_compressed-768x1152.jpg 768w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2026\/01\/AWS-EKS-VPC-Natgw_compressed.jpg 800w\" sizes=\"auto, (max-width: 683px) 100vw, 683px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Final comparison table (all approaches)<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Approach<\/th><th>Architecture (AZ \/ NAT)<\/th><th>Design<\/th><th>Cost profile<\/th><th>Parity to PROD<\/th><th>Terraform complexity<\/th><th>Risk<\/th><th>Best-practice fit<\/th><th>Pros<\/th><th>Cons<\/th><th>Typical fit (envs)<\/th><\/tr><\/thead><tbody><tr><td><strong>A. 
Zonal NAT per AZ (classic HA)<\/strong><\/td><td><strong>3 AZ \/ 3 NAT<\/strong> (one per AZ)<\/td><td>Each private subnet routes <code>0.0.0.0\/0<\/code> to the <strong>same-AZ<\/strong> NAT<\/td><td><strong>Highest baseline<\/strong> NAT hourly; <strong>lowest cross-AZ egress cost<\/strong><\/td><td><strong>Excellent<\/strong><\/td><td>Medium (per-AZ loops for NAT + routes)<\/td><td><strong>Low<\/strong><\/td><td><strong>Strong<\/strong><\/td><td>HA for outbound; avoids cross-AZ NAT traffic; \u201cboring\u201d operations<\/td><td>Pays for 3 NATs even when non-prod is quiet<\/td><td><strong>PROD, UAT<\/strong>, Stage if pre-prod<\/td><\/tr><tr><td><strong>B. Single zonal NAT shared (cost compromise)<\/strong><\/td><td><strong>3 AZ \/ 1 NAT<\/strong> (in 1 AZ)<\/td><td>All private subnets (all AZs) route to a <strong>single<\/strong> NAT<\/td><td><strong>Lower hourly<\/strong> NAT; may add <strong>cross-AZ data transfer<\/strong> for 2 AZs<\/td><td>Good topology parity, <strong>weaker egress parity<\/strong><\/td><td>Low\u2013Medium (simple NAT, but routing must be consistent)<\/td><td><strong>Medium<\/strong><\/td><td><strong>Compromise<\/strong><\/td><td>Keeps 3-AZ layout; saves NAT hourly cost<\/td><td>AZ containing NAT becomes <strong>egress SPOF<\/strong>; cross-AZ charges\/latency possible<\/td><td><strong>Stage\/Dev<\/strong> only if outages acceptable &amp; traffic low<\/td><\/tr><tr><td><strong>C. 
Single-AZ lower envs<\/strong><\/td><td><strong>1 AZ \/ 1 NAT<\/strong><\/td><td>Reduce VPC + cluster footprint to one AZ<\/td><td><strong>Lowest overall<\/strong> (fewer subnets\/nodes\/LBs + 1 NAT)<\/td><td><strong>Poor<\/strong> (no multi-AZ testing)<\/td><td>Low\u2013Medium (parameterized modules)<\/td><td><strong>Medium\u2013High<\/strong> (late discovery of AZ issues)<\/td><td>OK if explicitly accepted<\/td><td>Cheapest + simplest; reduces many moving parts<\/td><td>Multi-AZ behavior not tested until UAT\/PROD<\/td><td><strong>DEV<\/strong> (often), sometimes Stage (sandbox)<\/td><\/tr><tr><td><strong>D. Regional NAT Gateway (newer, simplified HA)<\/strong><\/td><td><strong>3 AZ \/ \u201c1 NAT object\u201d<\/strong> (regional mode auto-expands across AZs)<\/td><td>Set NAT <strong>availability mode to regional<\/strong>; AWS manages multi-AZ expansion\/contraction<\/td><td>Can approach <strong>per-AZ cost<\/strong> depending on AZ expansion; simpler ops<\/td><td><strong>Excellent<\/strong><\/td><td>Low (fewer NAT resources\/route concerns)<\/td><td>Low\u2013Medium (newer feature adoption)<\/td><td><strong>Strong<\/strong> (modern best practice option)<\/td><td>\u201cHA by default\u201d + simpler architecture; reduces need to manage per-AZ NAT placement<\/td><td>Newer operational patterns; switching auto\u2194manual modes in Terraform can force recreation<\/td><td><strong>PROD\/UAT\/Stage<\/strong>, possibly Dev if you keep 3 AZ<\/td><\/tr><tr><td><strong>E. 
NAT Instance (DIY)<\/strong><\/td><td>1\u20133 AZ \/ EC2-based NAT<\/td><td>Replace NAT GW with EC2 NAT instances + routes<\/td><td>Lowest <strong>if tiny<\/strong>, but adds ops overhead<\/td><td>Depends<\/td><td>High (patching\/scaling\/HA)<\/td><td>Medium\u2013High<\/td><td>Not preferred for platform teams<\/td><td>Cheap for DEV; can stop when unused<\/td><td>You own uptime, scaling, patching, throughput limits<\/td><td>DEV only (under extreme cost pressure)<\/td><\/tr><tr><td><strong>Add-on: VPC Endpoints (EKS cost lever)<\/strong><\/td><td>Works with any of the above<\/td><td>Use S3 gateway endpoint + interface endpoints (ECR, CloudWatch Logs, STS\/KMS\/SSM, etc.)<\/td><td>Often materially <strong>reduces NAT per-GB charges<\/strong><\/td><td>Improves consistency &amp; reliability<\/td><td>Medium (endpoint set + SG\/policies)<\/td><td>Low<\/td><td><strong>Strong<\/strong><\/td><td>Reduces NAT traffic, improves security (no internet path)<\/td><td>Interface endpoints add hourly charges; needs policy\/governance<\/td><td><strong>All envs<\/strong> (especially PROD\/UAT)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Source-backed notes (why these rows look like they do)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>AWS explicitly advises <strong>same-AZ placement<\/strong> or <strong>NAT per AZ<\/strong> to reduce cross-AZ data transfer charges. (<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateway-pricing.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n\n\n\n<li>AWS EKS best practices recommend <strong>VPC endpoints<\/strong> to reduce costs and avoid internet traversal for AWS services. 
(<a href=\"https:\/\/docs.aws.amazon.com\/eks\/latest\/best-practices\/cost-opt-networking.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n\n\n\n<li><strong>Regional NAT Gateway<\/strong>: AWS documentation and the \u201cWhat\u2019s New\u201d announcement describe regional mode and note that a regional NAT gateway does not need a public subnet to host it. (<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateways-regional.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n\n\n\n<li>Terraform support and lifecycle caveats for regional NAT (<code>availability_mode<\/code>, <code>availability_zone_address<\/code>, recreation behavior) are documented in the provider registry. (<a href=\"https:\/\/registry.terraform.io\/providers\/hashicorp\/aws\/6.26.0\/docs\/resources\/nat_gateway?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">Terraform Registry<\/a>)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Environment-by-environment recommendation table (EKS)<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Environment<\/th><th>Recommended AZ footprint<\/th><th>Recommended NAT strategy<\/th><th>Why (practical outcome)<\/th><\/tr><\/thead><tbody><tr><td><strong>PROD<\/strong><\/td><td><strong>3 AZ<\/strong><\/td><td><strong>A (Zonal NAT per AZ)<\/strong> <em>or<\/em> <strong>D (Regional NAT)<\/strong><\/td><td>Multi-AZ egress should not have an AZ SPOF; minimize cross-AZ NAT traffic; keep ops predictable. 
(<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateway-pricing.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/td><\/tr><tr><td><strong>UAT<\/strong><\/td><td><strong>3 AZ<\/strong><\/td><td><strong>Match PROD<\/strong> (A or D)<\/td><td>UAT\u2019s job is to catch prod-like scaling\/AZ behaviors early.<\/td><\/tr><tr><td><strong>STAGE<\/strong><\/td><td>Depends on purpose<\/td><td>If <strong>pre-prod<\/strong>: <strong>match PROD<\/strong> (A or D). If <strong>sandbox<\/strong>: <strong>C (1 AZ)<\/strong> or <strong>B (3 AZ + 1 NAT)<\/strong><\/td><td>Pre-prod should validate multi-AZ. Sandbox can trade HA for cost. Cross-AZ NAT costs matter if you keep 3 AZ with 1 zonal NAT. (<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateway-pricing.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/td><\/tr><tr><td><strong>DEV<\/strong><\/td><td><strong>1 AZ<\/strong> (default)<\/td><td><strong>C (1 AZ + 1 NAT)<\/strong>; optionally endpoints<\/td><td>Cheapest while still functional. Keep 3 AZ only if you truly need early multi-AZ validation. 
(<a href=\"https:\/\/docs.aws.amazon.com\/eks\/latest\/best-practices\/cost-opt-networking.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Zonal NAT vs Regional NAT (quick decision points)<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Topic<\/th><th><strong>Zonal NAT (classic)<\/strong><\/th><th><strong>Regional NAT (new)<\/strong><\/th><\/tr><\/thead><tbody><tr><td>HA model<\/td><td>You build HA by deploying <strong>one NAT per AZ<\/strong><\/td><td>\u201cHA by default\u201d via <strong>regional availability mode<\/strong> and automatic AZ expansion (<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateways-regional.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/td><\/tr><tr><td>Ops model<\/td><td>More resources, more route-table patterns<\/td><td>Simpler NAT footprint; fewer NAT objects<\/td><\/tr><tr><td>Terraform<\/td><td>Straightforward per-AZ loops<\/td><td>Supported, but switching auto\/manual modes can force resource recreation (<a href=\"https:\/\/registry.terraform.io\/providers\/hashicorp\/aws\/6.26.0\/docs\/resources\/nat_gateway?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">Terraform Registry<\/a>)<\/td><\/tr><tr><td>Best fit<\/td><td>Teams wanting mature, widely used patterns<\/td><td>Teams wanting simpler HA with less NAT sprawl<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Concerns addressed in this blog<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>NAT Gateway Strategy for Multi-Account EKS: HA, Cost, and Parity Trade-offs<\/strong><\/li>\n\n\n\n<li><strong>EKS Network Egress Design Guide: Zonal vs Regional NAT and Environment 
Sizing<\/strong><\/li>\n\n\n\n<li><strong>Standardizing VPC Egress Across PROD\/UAT\/STAGE\/DEV for EKS<\/strong><\/li>\n\n\n\n<li><strong>Balancing Cost and Reliability: NAT Gateways, AZ Footprint, and VPC Endpoints in EKS<\/strong><\/li>\n\n\n\n<li><strong>Reference Architecture: Multi-AZ EKS with Optimized NAT and Private Connectivity<\/strong><\/li>\n\n\n\n<li><strong>Decision Matrix: NAT per AZ vs Single NAT vs Regional NAT for EKS Environments<\/strong><\/li>\n\n\n\n<li><strong>Reducing NAT Spend in EKS: Environment Tiers + VPC Endpoints Blueprint<\/strong><\/li>\n\n\n\n<li><strong>EKS Networking Parity Playbook: When to Match PROD and When to Cut Cost<\/strong><\/li>\n<\/ol>\n\n\n\n<p>Below is a consolidated <strong>recommendation guide<\/strong> for NAT strategy across <strong>PROD \/ UAT \/ STAGE \/ DEV<\/strong> for an <strong>EKS-based<\/strong> platform in <strong>separate AWS accounts<\/strong>, with <strong>3 AZs, 3 public + 3 private subnets per VPC<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Executive conclusion<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If an environment runs workloads across <strong>multiple AZs<\/strong>, the <strong>clean \u201cAWS classic\u201d design<\/strong> is <strong>NAT Gateway per AZ<\/strong> (zonal NAT) with each private subnet routing to the NAT in the <strong>same AZ<\/strong>. This avoids <strong>AZ-level egress SPOF<\/strong> and avoids <strong>cross-AZ data transfer<\/strong> for NAT traffic. 
(<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateway-basics.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n\n\n\n<li><strong>\u201c3 AZ + 1 zonal NAT GW\u201d<\/strong> (single NAT in one AZ, shared by all AZs) is a <strong>cost compromise<\/strong>, not best practice: it introduces an <strong>AZ-level SPOF for egress<\/strong> and typically adds <strong>cross-AZ charges<\/strong> for the two \u201cremote\u201d AZs. (<a href=\"https:\/\/repost.aws\/knowledge-center\/vpc-reduce-nat-gateway-transfer-costs?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">Repost<\/a>)<\/li>\n\n\n\n<li><strong>New option (late 2025): Regional NAT Gateway<\/strong> (\u201cavailability mode = regional\u201d) gives you <strong>one NAT object<\/strong> that <strong>automatically expands across AZs based on workload presence<\/strong>, and simplifies route-table and placement concerns. It\u2019s designed for \u201cHA by default\u201d with simpler ops. (<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateways-regional.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n\n\n\n<li>For EKS specifically, the biggest NAT cost driver is often AWS service traffic (ECR image pulls, logs\/metrics, STS\/KMS, etc.). Use <strong>VPC endpoints<\/strong> aggressively to reduce NAT processing\/data transfer where possible. (<a href=\"https:\/\/docs.aws.amazon.com\/eks\/latest\/best-practices\/cost-opt-networking.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Key AWS facts to anchor decisions<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Zonal NAT Gateway<\/strong> is created in a <strong>specific AZ<\/strong> and has redundancy <strong>within that AZ<\/strong> (not across AZs). 
(<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateway-basics.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n\n\n\n<li>AWS explicitly recommends reducing NAT data transfer charges by keeping traffic <strong>in the same AZ<\/strong> (i.e., use NAT per AZ for multi-AZ workloads), because cross-AZ paths can add cost. (<a href=\"https:\/\/repost.aws\/knowledge-center\/vpc-reduce-nat-gateway-transfer-costs?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">Repost<\/a>)<\/li>\n\n\n\n<li><strong>Regional NAT Gateway<\/strong> automatically expands\/contracts across AZs based on workload presence. (<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateways-regional.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n\n\n\n<li>Terraform supports regional NAT configuration via the <code>availability_mode<\/code> argument and (optionally) <code>availability_zone_address<\/code> blocks. 
(<a href=\"https:\/\/registry.terraform.io\/providers\/hashicorp\/aws\/6.26.0\/docs\/resources\/nat_gateway?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">Terraform Registry<\/a>)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Approaches you should consider (with Pros\/Cons)<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Approach A \u2014 Multi-AZ + <strong>Zonal NAT GW per AZ<\/strong> (3 NATs for 3 AZs)<\/h3>\n\n\n\n<p><strong>Design<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>NAT GW in each public subnet (one per AZ).<\/li>\n\n\n\n<li>Each private subnet\u2019s route table points <code>0.0.0.0\/0<\/code> to the NAT in the same AZ.<\/li>\n<\/ul>\n\n\n\n<p><strong>Pros<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Best HA for egress (no AZ-level NAT SPOF).<\/li>\n\n\n\n<li>Avoids cross-AZ NAT traffic\/cost.<\/li>\n\n\n\n<li>Most aligned with long-standing AWS reference patterns.<\/li>\n<\/ul>\n\n\n\n<p><strong>Cons<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Highest baseline NAT hourly cost (3x NAT resources).<\/li>\n\n\n\n<li>More resources (though IaC makes it mostly variable-driven).<\/li>\n<\/ul>\n\n\n\n<p><strong>Terraform complexity<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Moderate: <code>for_each<\/code>\/<code>count<\/code> per AZ for NAT + per-AZ route tables (or per-AZ associations).<\/li>\n<\/ul>\n\n\n\n<p><strong>Risk<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Low.<\/li>\n<\/ul>\n\n\n\n<p><strong>Best fit<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>PROD and UAT (and Stage if it is \u201cpre-prod\u201d).<\/li>\n<\/ul>\n\n\n\n<p><strong>Best-practice alignment<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Strong. 
(<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateway-basics.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h3 class=\"wp-block-heading\">Approach B \u2014 Multi-AZ + <strong>Single zonal NAT GW shared<\/strong> (1 NAT for 3 AZs)<\/h3>\n\n\n\n<p><strong>Design<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>One NAT GW in a single public subnet (e.g., AZ-a).<\/li>\n\n\n\n<li>All private subnet route tables (AZ-a\/b\/c) point <code>0.0.0.0\/0<\/code> to that one NAT.<\/li>\n<\/ul>\n\n\n\n<p><strong>Pros<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Lower NAT hourly cost vs 3 NATs.<\/li>\n\n\n\n<li>Keeps 3-AZ subnet topology (better parity than single-AZ environments).<\/li>\n<\/ul>\n\n\n\n<p><strong>Cons<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>AZ-level <strong>SPOF for outbound<\/strong>: if the NAT\u2019s AZ is impaired, <strong>all<\/strong> private subnets lose internet egress (even if other AZs are fine). (<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateway-basics.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n\n\n\n<li>Likely <strong>cross-AZ NAT traffic costs<\/strong> for the other AZs; AWS guidance is to keep high-traffic resources in the same AZ as the NAT to reduce transfer charges. 
(<a href=\"https:\/\/repost.aws\/knowledge-center\/vpc-reduce-nat-gateway-transfer-costs?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">Repost<\/a>)<\/li>\n<\/ul>\n\n\n\n<p><strong>Terraform complexity<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Low\u2013moderate: simpler NAT resources, but route tables still need careful handling.<\/li>\n<\/ul>\n\n\n\n<p><strong>Risk<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Medium (acceptable only if the environment can tolerate losing outbound internet during AZ events).<\/li>\n<\/ul>\n\n\n\n<p><strong>Best fit<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>STAGE\/DEV only when traffic volume is low and you explicitly accept reduced egress HA.<\/li>\n<\/ul>\n\n\n\n<p><strong>Best-practice alignment<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Compromise (not \u201cbest practice\u201d for HA). (<a href=\"https:\/\/repost.aws\/knowledge-center\/vpc-reduce-nat-gateway-transfer-costs?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">Repost<\/a>)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h3 class=\"wp-block-heading\">Approach C \u2014 <strong>Single-AZ<\/strong> lower environments (1 AZ + 1 NAT)<\/h3>\n\n\n\n<p><strong>Design<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Only one AZ in the VPC for STAGE\/DEV (or DEV only).<\/li>\n<\/ul>\n\n\n\n<p><strong>Pros<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Lowest overall cost: fewer subnets, fewer nodes\/LBs, smaller footprint.<\/li>\n\n\n\n<li>Simplest networking.<\/li>\n<\/ul>\n\n\n\n<p><strong>Cons<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Parity drift<\/strong>: you won\u2019t test multi-AZ behavior (scheduling spread, AZ capacity edge cases, load balancer zonal behavior, failover patterns, etc.) 
until UAT\/PROD.<\/li>\n<\/ul>\n\n\n\n<p><strong>Terraform complexity<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Low if your module is fully parameterized (<code>az_count=1|3<\/code>), though the conditional branching adds some complexity; higher in practice if you maintain separate topology modules.<\/li>\n<\/ul>\n\n\n\n<p><strong>Risk<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Medium\u2013high for \u201clate discovery\u201d of multi-AZ issues.<\/li>\n<\/ul>\n\n\n\n<p><strong>Best fit<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>DEV (usually OK), sometimes STAGE if stage is just a sandbox and not \u201cpre-prod\u201d.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h3 class=\"wp-block-heading\">Approach D \u2014 <strong>Regional NAT Gateway<\/strong> (one NAT object, multi-AZ behavior managed by AWS)<\/h3>\n\n\n\n<p><strong>Design<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Create NAT GW with <code>availability_mode = \"regional\"<\/code>.<\/li>\n\n\n\n<li>AWS expands\/contracts across AZs based on workload presence, removing the need to run \u201cone NAT per AZ\u201d yourself. (<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateways-regional.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n<\/ul>\n\n\n\n<p><strong>Pros<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Simplifies NAT architecture and ongoing ops (fewer NAT resources to reason about).<\/li>\n\n\n\n<li>\u201cHA by default\u201d posture (AWS-managed multi-AZ expansion model). (<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateways-regional.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n\n\n\n<li>Terraform support exists (<code>availability_mode<\/code>, optional <code>availability_zone_address<\/code>). 
(<a href=\"https:\/\/registry.terraform.io\/providers\/hashicorp\/aws\/6.26.0\/docs\/resources\/nat_gateway?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">Terraform Registry<\/a>)<\/li>\n<\/ul>\n\n\n\n<p><strong>Cons \/ watchouts<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>This is newer, so you\u2019ll want strong rollout discipline (testing + observability + runbooks).<\/li>\n\n\n\n<li>Depending on how you configure addresses, resource changes can trigger recreation (Terraform notes). (<a href=\"https:\/\/registry.terraform.io\/providers\/hashicorp\/aws\/6.26.0\/docs\/resources\/nat_gateway?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">Terraform Registry<\/a>)<\/li>\n<\/ul>\n\n\n\n<p><strong>Terraform complexity<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Lowest for NAT (often fewer resources, less per-AZ looping).<\/li>\n<\/ul>\n\n\n\n<p><strong>Risk<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Low\u2013medium (mostly \u201cnewness\u201d + change management).<\/li>\n<\/ul>\n\n\n\n<p><strong>Best fit<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Strong candidate for PROD\/UAT\/STAGE where you want simplicity without losing multi-AZ posture.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h3 class=\"wp-block-heading\">Approach E \u2014 NAT Instances (only for very low-value envs)<\/h3>\n\n\n\n<p>Not a best practice for long-run managed-platform operations:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You trade NAT GW cost for <strong>instance management<\/strong> (patching, scaling, HA, monitoring).<\/li>\n\n\n\n<li>Only consider for DEV if cost pressure is extreme.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">EKS-specific cost and design guidance (applies to all approaches)<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Reduce NAT dependency with VPC endpoints<\/h3>\n\n\n\n<p>EKS 
environments commonly send large volumes of traffic to AWS services; use VPC endpoints to bypass NAT for those flows and reduce NAT charges. (<a href=\"https:\/\/docs.aws.amazon.com\/eks\/latest\/best-practices\/cost-opt-networking.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/p>\n\n\n\n<p>Common high-value endpoints in EKS VPCs:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Gateway endpoints:<\/strong> S3 (and DynamoDB if used)<\/li>\n\n\n\n<li><strong>Interface endpoints (often worth it):<\/strong> ECR (api + dkr), CloudWatch Logs, STS, KMS, SSM, EC2, ELB (as relevant)<\/li>\n<\/ul>\n\n\n\n<p>This is often a bigger win than \u201c1 NAT vs 3 NATs\u201d because it shrinks the volume billed at NAT <strong>per-GB<\/strong> processing rates and, with a shared NAT, the cross-AZ transfer that traffic would otherwise incur.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Recommendation by environment<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">PROD (must be boring and resilient)<\/h3>\n\n\n\n<p><strong>Recommended<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Zonal NAT per AZ<\/strong> (Approach A) <strong>OR<\/strong><\/li>\n\n\n\n<li><strong>Regional NAT<\/strong> (Approach D) if you want simpler ops and are comfortable adopting the newer model after validation.<\/li>\n<\/ol>\n\n\n\n<p><strong>Avoid<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Single zonal NAT shared across AZs (Approach B).<\/li>\n<\/ul>\n\n\n\n<p><strong>Why<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Production multi-AZ workloads should not have an AZ-level egress SPOF. 
(<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateway-basics.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h3 class=\"wp-block-heading\">UAT (closest to PROD)<\/h3>\n\n\n\n<p><strong>Recommended<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Match PROD architecture (same AZ count and NAT strategy).<\/li>\n\n\n\n<li>If PROD uses Approach A, UAT uses A. If PROD uses D, UAT uses D.<\/li>\n<\/ul>\n\n\n\n<p><strong>Why<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>UAT is where you want to catch \u201cprod-like\u201d issues early (networking, scaling, AZ placement behaviors).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h3 class=\"wp-block-heading\">STAGE (depends on what \u201cstage\u201d means in your org)<\/h3>\n\n\n\n<p><strong>If STAGE is truly pre-prod \/ release rehearsal<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Treat it like PROD\/UAT: <strong>Approach A or D<\/strong>.<\/li>\n<\/ul>\n\n\n\n<p><strong>If STAGE is mainly an integration sandbox \/ shared testing<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Options:\n<ul class=\"wp-block-list\">\n<li><strong>Approach C (single-AZ)<\/strong> for cost, accepting parity drift.<\/li>\n\n\n\n<li><strong>Approach B (3 AZ + 1 zonal NAT)<\/strong> only if traffic is low and you explicitly accept an egress SPOF and some cross-AZ costs. (<a href=\"https:\/\/repost.aws\/knowledge-center\/vpc-reduce-nat-gateway-transfer-costs?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">Repost<\/a>)<\/li>\n\n\n\n<li><strong>Approach D (regional NAT)<\/strong> is also attractive here because it preserves multi-AZ posture with simpler ops. 
(<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateways-regional.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h3 class=\"wp-block-heading\">DEV (optimize cost, accept downtime)<\/h3>\n\n\n\n<p><strong>Recommended default<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Single-AZ (Approach C)<\/strong> is usually the best cost\/maintenance outcome.<\/li>\n<\/ul>\n\n\n\n<p><strong>Alternative<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If you want to keep 3-AZ subnet topology for early detection of AZ-related issues:\n<ul class=\"wp-block-list\">\n<li>Prefer <strong>Regional NAT (Approach D)<\/strong> over \u201c3 AZ + 1 zonal NAT\u201d, because it avoids the \u201cdeliberate AZ SPOF\u201d pattern while keeping ops simpler. (<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateways-regional.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Side-by-side summary (what to pick when)<\/h2>\n\n\n\n<p><strong>If your priority is \u201cAWS classic best practice + lowest risk\u201d:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>PROD\/UAT\/STAGE(pre-prod): <strong>A<\/strong><\/li>\n\n\n\n<li>DEV: <strong>C<\/strong><\/li>\n\n\n\n<li>Endpoints everywhere<\/li>\n<\/ul>\n\n\n\n<p><strong>If your priority is \u201ckeep parity but simplify NAT management\u201d:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>PROD\/UAT\/STAGE: <strong>D<\/strong><\/li>\n\n\n\n<li>DEV: <strong>C<\/strong> (or D if you insist on multi-AZ dev)<\/li>\n\n\n\n<li>Endpoints everywhere<\/li>\n<\/ul>\n\n\n\n<p><strong>If your priority is \u201cmaximum cost cutting in 
non-prod\u201d:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>PROD\/UAT: <strong>A<\/strong> (or D after adoption)<\/li>\n\n\n\n<li>STAGE: <strong>C<\/strong> (single-AZ)<\/li>\n\n\n\n<li>DEV: <strong>C<\/strong><\/li>\n\n\n\n<li>Endpoints still recommended<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Terraform notes (practical maintainability)<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Keeping modules maintainable long-run<\/h3>\n\n\n\n<p>Model these as variables, not separate templates:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>az_count<\/code> (1 or 3)<\/li>\n\n\n\n<li><code>nat_strategy<\/code> (<code>per_az_zonal<\/code> | <code>single_zonal<\/code> | <code>regional<\/code>)<\/li>\n\n\n\n<li><code>enable_vpc_endpoints<\/code> (true\/false + list)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Regional NAT in Terraform<\/h3>\n\n\n\n<p>Terraform resource docs include <code>availability_mode<\/code> and <code>availability_zone_address<\/code> for regional NAT scenarios. (<a href=\"https:\/\/registry.terraform.io\/providers\/hashicorp\/aws\/6.26.0\/docs\/resources\/nat_gateway?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">Terraform Registry<\/a>)<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Final recommendation<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>PROD &amp; UAT<\/strong>: Multi-AZ must not have an egress AZ SPOF \u2192 <strong>NAT per AZ (zonal)<\/strong> or adopt <strong>Regional NAT<\/strong> after validation. 
(<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateway-basics.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n\n\n\n<li><strong>STAGE<\/strong>: decide what Stage is:\n<ul class=\"wp-block-list\">\n<li>If \u201cpre-prod\u201d: match PROD\/UAT.<\/li>\n\n\n\n<li>If \u201csandbox\u201d: single-AZ is acceptable; otherwise consider Regional NAT to preserve multi-AZ without per-AZ NAT sprawl. (<a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/nat-gateways-regional.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>DEV<\/strong>: single-AZ is usually best.<\/li>\n\n\n\n<li>Across all envs: implement <strong>VPC endpoints<\/strong> to reduce NAT traffic and cost for EKS. (<a href=\"https:\/\/docs.aws.amazon.com\/eks\/latest\/best-practices\/cost-opt-networking.html?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AWS Documentation<\/a>)<\/li>\n<\/ol>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Final comparison table (all approaches) Approach Architecture (AZ \/ NAT) Design Cost profile Parity to PROD Terraform complexity Risk Best-practice fit Pros Cons Typical fit (envs) A. 
Zonal NAT per AZ (classic HA) 3 AZ \/ 3 NAT (one per AZ) Each private subnet routes 0.0.0.0\/0 to the same-AZ NAT Highest baseline NAT hourly; lowest&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","_joinchat":[],"footnotes":""},"categories":[11138],"tags":[],"class_list":["post-57762","post","type-post","status-publish","format-standard","hentry","category-best-tools"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/57762","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=57762"}],"version-history":[{"count":4,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/57762\/revisions"}],"predecessor-version":[{"id":60290,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/57762\/revisions\/60290"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=57762"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=57762"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=57762"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}