Alibaba Cloud Network Load Balancer (NLB) Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Networking and CDN

Category

Networking and CDN

1. Introduction

Alibaba Cloud Network Load Balancer (NLB) is a managed Layer 4 (transport-layer) load balancing service in the Server Load Balancer family. It distributes incoming traffic (typically TCP and UDP) across multiple backend targets (for example, ECS instances) to improve availability, scalability, and fault tolerance for network applications.

In simple terms: you place an NLB in front of your servers, and clients connect to the NLB instead of directly to the servers. The NLB continuously checks backend health and routes new connections only to healthy backends. If a backend fails, traffic is routed elsewhere with minimal interruption.

Technically, Network Load Balancer (NLB) is designed for high-throughput, low-latency Layer 4 load balancing. You configure listeners (protocol/port), attach server groups (backend targets), and define health checks. NLB integrates with Alibaba Cloud networking primitives such as VPC, vSwitch, EIP (for Internet-facing access), and security controls such as Security Groups and RAM.

The main problem it solves is reliable traffic distribution for TCP/UDP services—especially when you need predictable performance, scale-out, and high availability across multiple backends and potentially multiple zones—without operating your own load balancer fleet.

Naming note (verify in official docs): Alibaba Cloud’s load balancing portfolio is commonly presented under Server Load Balancer, with different products for different layers and capabilities (for example, Classic Load Balancer (CLB) as legacy, Application Load Balancer (ALB) for Layer 7, and Network Load Balancer (NLB) for Layer 4). Always confirm the latest positioning and features in the official Alibaba Cloud documentation for your region.


2. What is Network Load Balancer (NLB)?

Official purpose

Network Load Balancer (NLB) provides managed, scalable, highly available Layer 4 load balancing for TCP/UDP traffic. Its purpose is to decouple client access from backend servers so you can scale, upgrade, and heal backends without changing client endpoints.

Core capabilities (high-level)

  • Layer 4 traffic distribution for network services (TCP/UDP).
  • Health checks to route traffic only to healthy backend targets.
  • Elastic scaling to handle changing traffic volume (how scaling is expressed and billed varies by region—verify in official docs).
  • Multi-zone deployment by attaching subnets (vSwitches) in more than one zone (exact behavior depends on region—verify).
  • Integration with VPC and Elastic IP (EIP) for intranet (private) or Internet-facing endpoints.

Major components

While exact console labels can evolve by region, NLB deployments typically involve:

  • NLB instance: The managed load balancer resource that owns the frontend endpoint(s).
  • Listener: A protocol/port configuration (for example, TCP:80 or UDP:53) that accepts client connections and forwards them.
  • Server group: A set of backend targets (for example, ECS instances) plus port and weight settings (capabilities vary—verify).
  • Health check configuration: Probe type and thresholds to decide if a backend is healthy.
  • Networking attachments:
    • VPC and vSwitch selection (often one or more zones)
    • Optional EIP for Internet-facing access
  • Security controls:
    • Security Group rules on backend servers (NLB itself typically does not replace server firewalling)
    • RAM policies controlling who can create/change NLB resources
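These components are usually created in dependency order: instance first, then server group, then backend registration, then listener. A minimal sketch of that order using the aliyun CLI, printed as a dry run; the operation names follow NLB's OpenAPI naming, but every parameter and placeholder ID below is an illustrative assumption, so verify against the official NLB API reference before running anything for real:

```shell
#!/usr/bin/env bash
# Dry run of the typical NLB creation order. Operation names follow the
# NLB OpenAPI (CreateLoadBalancer, CreateServerGroup,
# AddServersToServerGroup, CreateListener); all parameters, values, and
# <placeholder> IDs are illustrative assumptions.
set -u

run() {
  # Print instead of executing, so the order is visible without an account.
  echo "aliyun nlb $*"
}

run CreateLoadBalancer --LoadBalancerName demo-nlb --AddressType Internet
run CreateServerGroup --ServerGroupName demo-sg --Protocol TCP
run AddServersToServerGroup --ServerGroupId '<sg-id>' --Servers '<ecs-ids>'
run CreateListener --ListenerProtocol TCP --ListenerPort 80 --ServerGroupId '<sg-id>'
```

Swap `echo` for real execution only after confirming parameter names in the API reference for your region.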

Service type and scope

  • Service type: Fully managed load balancing (Networking and CDN category), part of Alibaba Cloud’s load balancing ecosystem.
  • Scope: NLB is typically a regional resource that you create inside a specific region and attach to one or more zones by selecting vSwitches/subnets in those zones. The exact scoping model and cross-zone behavior can differ—verify in official docs for your region.
  • Account/project scope: Managed under your Alibaba Cloud account. Access is controlled through RAM.

How it fits into the Alibaba Cloud ecosystem

Network Load Balancer (NLB) commonly sits in front of:

  • Elastic Compute Service (ECS) instances running TCP services (APIs over TCP, gRPC over TCP pass-through, game servers, VPN endpoints, custom protocols).
  • Containerized workloads (for example, ACK exposing services at Layer 4—specific integration details depend on cluster and annotations—verify).
  • Stateful network services (databases with strict network behavior, message brokers, or custom binary protocols—provided you understand session/connection behavior).

NLB also interacts with:

  • VPC routing and segmentation
  • EIP for public entry points
  • CloudMonitor for metrics and alarms
  • Log Service (SLS) for logs if/where supported (verify support and setup steps)


3. Why use Network Load Balancer (NLB)?

Business reasons

  • Higher uptime for revenue-critical services: NLB enables multi-backend failover so a single server issue does not become an outage.
  • Faster scaling without endpoint changes: Marketing campaigns or seasonal spikes can be handled by adding/removing backends behind the same stable endpoint.
  • Reduced operational burden: You don’t maintain HAProxy/Nginx/Keepalived clusters for L4 distribution.

Technical reasons

  • Layer 4 fit: When you need to balance TCP/UDP without HTTP routing rules (which is an ALB use case), NLB is simpler and closer to the network.
  • Protocol flexibility: Suitable for non-HTTP protocols and UDP workloads.
  • Preserve application design: Many applications already handle HTTP at the app layer or require pass-through behavior.

Operational reasons

  • Health-driven routing: Automated removal of unhealthy backends reduces incident response time.
  • Centralized control point: You manage listeners, backend membership, and health checks in one place.
  • Rolling upgrades: Drain traffic from a backend (or remove it), patch, and add it back.
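The rolling-upgrade pattern above can be sketched as a dry run. `RemoveServersFromServerGroup`/`AddServersToServerGroup` follow NLB OpenAPI naming, while the server group ID, backend names, and the patch command are placeholders for illustration only:

```shell
#!/usr/bin/env bash
# Dry run of the drain -> patch -> re-add rolling upgrade pattern.
# Remove/AddServersToServerGroup follow NLB OpenAPI naming; the <sg-id>,
# backend names, and the ssh patch command are placeholders.
set -u

rolling_upgrade() {  # rolling_upgrade "<space-separated backend names>"
  for b in $1; do
    echo "aliyun nlb RemoveServersFromServerGroup --ServerGroupId <sg-id> --Servers $b"
    echo "ssh $b 'sudo apt-get upgrade -y'  # patch while drained"
    echo "aliyun nlb AddServersToServerGroup --ServerGroupId <sg-id> --Servers $b"
  done
}

rolling_upgrade "ecs-1 ecs-2"
```

In a real run you would also wait for health checks to pass before moving to the next backend.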

Security and compliance reasons

  • Minimize direct exposure of servers: Clients connect to NLB instead of your ECS public IPs.
  • Controlled inbound paths: You can restrict backend Security Group inbound to known ranges (for example, VPC CIDRs) rather than broad public access.
  • Auditable changes: RAM + ActionTrail (if enabled in your environment) provide change visibility (verify service name and availability in your account).

Scalability and performance reasons

  • Scales with demand: Managed load balancers are designed to handle scale better than single-host balancers.
  • Low-latency forwarding: L4 forwarding avoids L7 parsing overhead (though you lose L7 routing features).

When teams should choose it

Choose Network Load Balancer (NLB) when:

  • Your workload is TCP/UDP oriented.
  • You need a stable endpoint with high availability.
  • You want managed scaling and health checks without operating your own L4 balancer.
  • You don’t need L7 routing features like host/path rules (ALB territory).

When they should not choose it

Avoid (or reconsider) NLB when:

  • You need HTTP/HTTPS L7 routing, header-based routing, rewrites, or advanced traffic management—use Application Load Balancer (ALB) instead.
  • You require deep WAF-style protections inline at L7 (often fronted by WAF + ALB/CDN patterns; NLB is not a WAF).
  • You need features that may not be available on NLB in your region (for example, specific logging options or IPv6)—verify in official docs.
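As a mnemonic, the choose/avoid guidance boils down to a tiny decision function (a rule of thumb only, not official product guidance):

```shell
#!/usr/bin/env bash
# Rule-of-thumb encoder: L7 routing needs point to ALB, plain TCP/UDP
# points to NLB. A mnemonic for the guidance above, nothing more.
set -u

choose_lb() {  # choose_lb <protocol> <needs_l7_routing: yes|no>
  if [ "$2" = yes ]; then
    echo ALB            # host/path/header routing lives at Layer 7
  elif [ "$1" = tcp ] || [ "$1" = udp ]; then
    echo NLB            # raw L4 pass-through
  else
    echo "check the docs"
  fi
}

choose_lb tcp no    # NLB
choose_lb http yes  # ALB
```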


4. Where is Network Load Balancer (NLB) used?

Industries

  • SaaS and Internet platforms (TCP APIs, gateways, custom protocols)
  • Gaming (real-time TCP/UDP session traffic)
  • FinTech and payments (network services with strict latency requirements; always validate compliance needs)
  • IoT platforms (device gateways, MQTT over TCP; confirm product fit and patterns)
  • Media and live streaming control planes (not CDN delivery itself, but control/ingest components)

Team types

  • Platform engineering teams standardizing ingress for TCP/UDP services
  • SRE/operations teams needing HA patterns with minimal self-managed infrastructure
  • DevOps teams building repeatable environments across regions
  • Security teams enforcing centralized ingress and auditing

Workloads and architectures

  • Multi-tier VPC architectures (public edge -> NLB -> private app tier)
  • Microservices that expose some L4 services (service mesh gateways, custom protocols)
  • Kubernetes clusters exposing L4 services (depending on ACK integration—verify)
  • Hybrid scenarios where NLB fronts services that bridge to on-prem through VPN/Express Connect (architecture-dependent)

Production vs dev/test usage

  • Production: Common for mission-critical TCP/UDP entry points with multi-zone resilience and controlled network exposure.
  • Dev/test: Useful to simulate production ingress patterns; be careful with cost drivers like Internet EIP and data transfer.

5. Top Use Cases and Scenarios

Below are practical, realistic scenarios where Network Load Balancer (NLB) is often a good fit.

1) Internet-facing TCP service entry point

  • Problem: Users need a single stable IP/DNS to reach a TCP service, but you want multiple backend servers for HA.
  • Why NLB fits: L4 distribution with health checks and managed endpoint.
  • Example: A public TCP API on port 443 (TLS pass-through handled by the app) backed by two ECS instances.

2) UDP-based service (DNS, game matchmaking, telemetry)

  • Problem: UDP traffic must be distributed across multiple backends.
  • Why NLB fits: UDP support at Layer 4.
  • Example: A UDP telemetry collector on port 9000 scaled across several ECS backends.

3) Internal (intranet) load balancing inside a VPC

  • Problem: Multiple internal services need a stable private endpoint without exposing backends publicly.
  • Why NLB fits: Private NLB endpoint in the VPC, routing only within your network boundary.
  • Example: A private database proxy tier or internal TCP gateway accessed by app servers in the same VPC.

4) Blue/green backend rotation for zero-downtime deployments

  • Problem: You need to deploy new versions without downtime.
  • Why NLB fits: You can shift backend membership by adding/removing targets from the server group.
  • Example: Deploy v2 instances, add them to the server group, then remove v1 after validation.

5) Multi-zone backend resilience for a stateful TCP service

  • Problem: A zonal failure should not fully take down client connectivity.
  • Why NLB fits: You can attach subnets across zones and distribute to backends in multiple zones (exact behavior varies—verify).
  • Example: A TCP message broker cluster across two zones with NLB in front.

6) Exposing Kubernetes Services (Layer 4) to the Internet

  • Problem: You need a managed L4 endpoint for a Kubernetes Service of type LoadBalancer.
  • Why NLB fits: L4 LB is a common service exposure mechanism (integration details depend on ACK and annotations—verify).
  • Example: An ACK cluster exposing a TCP gRPC gateway (pass-through) via NLB.

7) Replace self-managed HAProxy/Keepalived L4 cluster

  • Problem: Maintaining HA for self-managed load balancers is complex (VRRP, failover, patching).
  • Why NLB fits: Managed HA and scaling reduces operational overhead.
  • Example: Migrating from 2-node HAProxy to NLB while keeping backend ports stable.

8) Centralized ingress for multiple TCP ports (per service)

  • Problem: Each backend service requires a stable endpoint but on different ports.
  • Why NLB fits: Multiple listeners can front different ports (limits apply—verify quotas).
  • Example: Port 22 (restricted), 443 (API), 1883 (MQTT) each via separate listeners to separate server groups.

9) Protect backends by removing public IPs

  • Problem: Backend ECS instances shouldn’t have public IPs but still need to serve Internet users.
  • Why NLB fits: Internet-facing NLB with private backends in VPC subnets.
  • Example: Public users connect to NLB EIP; ECS instances are only reachable via private IP.

10) Gradual migration between backend fleets

  • Problem: You’re migrating from an old backend pool to a new one and want controlled cutover.
  • Why NLB fits: You can adjust backend weights (if supported) or split traffic via multiple listeners/DNS patterns.
  • Example: Run old and new stacks in parallel; progressively shift traffic by backend membership changes.

6. Core Features

Feature availability can be region-dependent and may vary by account type. For any feature you plan to rely on in production, verify in official Alibaba Cloud documentation for Network Load Balancer (NLB).

6.1 Layer 4 load balancing (TCP/UDP)

  • What it does: Forwards TCP/UDP connections from clients to healthy backend targets.
  • Why it matters: Supports protocols where L7 routing is not possible or not desired.
  • Practical benefit: Low overhead and broad protocol compatibility.
  • Caveats: Limited application-layer insight compared to ALB (no host/path routing).

6.2 Listeners (front-end protocol/port configuration)

  • What it does: Defines how clients connect (protocol + port) and where traffic is forwarded.
  • Why it matters: Allows you to expose one or multiple network entry points.
  • Practical benefit: Clear separation between frontend ports and backend target groups.
  • Caveats: Listener count and port ranges may be limited by quotas—verify limits.

6.3 Server groups (backend target management)

  • What it does: Groups backend targets and defines how NLB forwards to them.
  • Why it matters: Makes scaling and failover operationally simple.
  • Practical benefit: Add/remove backends during scaling or maintenance without changing client endpoints.
  • Caveats: Target types (ECS, ENI, IP) and features like weights can vary—verify supported target types.

6.4 Health checks

  • What it does: Periodically probes backends to determine if they can receive new connections.
  • Why it matters: Prevents routing to failed or misconfigured backends.
  • Practical benefit: Automated failover and faster recovery.
  • Caveats: Misconfigured health checks are a common cause of “no healthy backend” incidents.
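The threshold logic behind health checks can be illustrated with a small state machine: a backend goes unhealthy after N consecutive failed probes and healthy again after M consecutive successes. This is a generic model for intuition, not NLB's actual implementation, and the threshold values are arbitrary examples:

```shell
#!/usr/bin/env bash
# Generic consecutive-threshold health model. Thresholds are arbitrary
# examples; real values are configured on the NLB health check.
set -u

UNHEALTHY_THRESHOLD=3
HEALTHY_THRESHOLD=2

health_state() {  # health_state "<probe results as 1s and 0s>"
  state=healthy; fails=0; oks=0
  for probe in $1; do
    if [ "$probe" -eq 1 ]; then
      oks=$((oks + 1)); fails=0
      if [ "$state" = unhealthy ] && [ "$oks" -ge "$HEALTHY_THRESHOLD" ]; then
        state=healthy
      fi
    else
      fails=$((fails + 1)); oks=0
      if [ "$fails" -ge "$UNHEALTHY_THRESHOLD" ]; then
        state=unhealthy
      fi
    fi
  done
  echo "$state"
}

health_state "1 1 0 0 0"   # prints: unhealthy
health_state "0 0 0 1 1"   # prints: healthy (recovered)
```

Note how a single intermittent failure never trips the state: only consecutive failures do, which is why overly tight thresholds cause flapping.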

6.5 Internet-facing and internal load balancing

  • What it does: Exposes services either publicly (typically via EIP) or privately within a VPC.
  • Why it matters: Supports both external customer traffic and internal service-to-service traffic patterns.
  • Practical benefit: Use the same operational model for edge and internal traffic.
  • Caveats: Public endpoints introduce additional cost and security considerations (EIP + outbound data transfer).

6.6 Multi-zone high availability (where supported)

  • What it does: Allows deploying NLB across multiple zones by associating vSwitches in different zones.
  • Why it matters: Reduces the blast radius of a zonal failure.
  • Practical benefit: Higher availability without building multi-zone routing yourself.
  • Caveats: Cross-zone traffic behavior and charge implications (if any) can vary—verify in docs and pricing.

6.7 Integration with Alibaba Cloud networking (VPC, vSwitch, EIP)

  • What it does: NLB is created in a VPC and placed into vSwitch subnets; public access typically uses EIP.
  • Why it matters: Aligns with standard network segmentation and routing controls.
  • Practical benefit: Enables private backends, controlled east-west traffic, and structured subnetting.
  • Caveats: Wrong subnet/route selection can prevent health checks and data plane connectivity.

6.8 Observability (metrics/logs) via Alibaba Cloud tooling

  • What it does: Exposes operational metrics (and in some cases logs) for monitoring and troubleshooting.
  • Why it matters: Load balancers are critical infrastructure—visibility reduces MTTR.
  • Practical benefit: Build alarms on connection counts, health status changes, or traffic patterns.
  • Caveats: Access logging and log destinations can be region/edition dependent—verify.

6.9 Access control via RAM

  • What it does: Uses Alibaba Cloud RAM to control who can create/modify/delete NLB resources.
  • Why it matters: Prevents accidental changes and supports least privilege.
  • Practical benefit: Separation of duties between platform, security, and application teams.
  • Caveats: Ensure policies include dependent services (VPC/ECS/EIP) if your workflow needs them.

7. Architecture and How It Works

7.1 High-level architecture

At a high level, Network Load Balancer (NLB) sits at the edge of a VPC (or internally within it) and forwards traffic to backend targets:

  1. Client connects to the NLB frontend endpoint (public or private).
  2. NLB evaluates the listener configuration (protocol/port).
  3. NLB selects a healthy backend from the associated server group.
  4. NLB forwards traffic to the backend on the configured backend port.
  5. Health checks run continuously to keep the healthy set updated.
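Steps 3 and 4 can be modeled as "filter to the healthy set, then pick one." A toy simulation assuming simple round-robin (NLB's real scheduling algorithm may differ, and all IPs and health states here are invented):

```shell
#!/usr/bin/env bash
# Toy model of backend selection: filter to healthy targets, then pick
# one round-robin. For intuition only; not NLB's actual scheduler.
set -u

BACKENDS="10.0.1.10:healthy 10.0.1.11:healthy 10.0.1.12:unhealthy"

pick_backend() {  # pick_backend <connection number>
  conn=$1
  healthy=""
  for b in $BACKENDS; do
    case $b in *:healthy) healthy="$healthy ${b%%:*}" ;; esac
  done
  set -- $healthy                 # healthy IPs become positional params
  i=$(( (conn % $#) + 1 ))        # round-robin index over the healthy set
  eval echo "\${$i}"
}

pick_backend 0   # 10.0.1.10
pick_backend 1   # 10.0.1.11
pick_backend 2   # 10.0.1.10  (the unhealthy backend is never chosen)
```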

7.2 Data flow vs control flow

  • Data plane: Actual TCP/UDP packets between clients, NLB, and backends.
  • Control plane: API/console operations that create/modify listeners, server groups, health checks, and backend registrations.

This separation matters operationally:

  • A control plane error (like wrong health check settings) can remove all backends from rotation even if servers are fine.
  • Data plane issues often relate to routes, security groups, or application-level binding/listening.

7.3 Integrations and dependencies

Common dependencies in Alibaba Cloud:

  • VPC / vSwitch: NLB is attached to your network; backend connectivity depends on correct subnetting and routing.
  • ECS: Common backend compute target.
  • EIP: Common requirement for Internet-facing access.
  • CloudMonitor: Metrics, dashboards, and alerting.
  • Log Service (SLS): If access logging is supported and enabled, logs may be delivered to SLS (verify setup and region support).
  • RAM: Permission management for NLB resources and related networking resources.

7.4 Security/authentication model

  • Management access: Controlled by RAM (users, roles, policies).
  • Network access:
    • Public NLB endpoints must be controlled by upstream controls (IP allowlists, Anti-DDoS/WAF patterns, or application auth as appropriate).
    • Backends must be locked down with Security Group rules and subnet design.
  • Auditability: Use Alibaba Cloud audit tooling (for example, ActionTrail—verify service availability and naming) to track API operations.

7.5 Networking model (practical mental model)

  • NLB is placed into one or more vSwitch subnets in your VPC.
  • For Internet-facing services, a public endpoint typically uses EIP.
  • Backend ECS instances are usually in private subnets (no public IP), reachable from NLB within the VPC.
  • Backend Security Groups must allow inbound traffic on the service port(s) from appropriate sources (ideally VPC ranges or known LB source ranges—verify recommended patterns in docs).

7.6 Monitoring/logging/governance considerations

  • Monitor:
    • Backend health status changes (flapping)
    • Connection rates
    • Traffic volume
    • Error counters (where available)
  • Log:
    • Configuration changes (audit)
    • Traffic/access logs (if supported)
  • Governance:
    • Enforce tagging for cost allocation
    • Use naming conventions (env-app-region-purpose)
    • Define change control for listeners/server groups
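A trivial helper for the env-app-region-purpose convention suggested above (the convention itself is a team choice, not an Alibaba Cloud requirement):

```shell
#!/usr/bin/env bash
# Build an env-app-region-purpose resource name, normalized to lowercase.
# The convention is a team choice, not an Alibaba Cloud rule.
set -u

lb_name() {  # lb_name <env> <app> <region> <purpose>
  echo "${1}-${2}-${3}-${4}" | tr 'A-Z' 'a-z'
}

lb_name prod payments cn-hangzhou edge   # prod-payments-cn-hangzhou-edge
```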

Simple architecture diagram (Mermaid)

flowchart LR
  U[Users / Clients] -->|TCP/UDP| NLB["Alibaba Cloud Network Load Balancer (NLB)"]
  NLB --> ECS1[ECS Backend 1]
  NLB --> ECS2[ECS Backend 2]

Production-style architecture diagram (Mermaid)

flowchart TB
  DNS[Public DNS] --> NLBpub["Internet-facing NLB (Region)"]
  subgraph VPC[Alibaba Cloud VPC]
    subgraph Z1[Zone A]
      VS1[vSwitch A] --> ASG1[ECS backend pool A]
    end
    subgraph Z2[Zone B]
      VS2[vSwitch B] --> ASG2[ECS backend pool B]
    end
    ASG1 --> RDS[(ApsaraDB / RDS)]
    ASG2 --> RDS
  end
  NLBpub -->|Forward TCP/UDP| ASG1
  NLBpub -->|Forward TCP/UDP| ASG2
  MON[CloudMonitor Alarms] -. metrics .-> NLBpub
  LOG["Log Service (SLS), if enabled/supported"] -. logs .-> NLBpub

8. Prerequisites

Before you start the hands-on lab:

Account and billing

  • An active Alibaba Cloud account with billing enabled.
  • Ability to create ECS, VPC, EIP, and Server Load Balancer / NLB resources in your chosen region.

Permissions (RAM)

You need permissions to:

  • Create and manage NLB instances, listeners, and server groups
  • Create/attach EIP (if Internet-facing)
  • Create ECS instances and manage Security Groups
  • Create VPC/vSwitch resources (if not already available)

If you use RAM users/roles:

  • Assign appropriate RAM policies for Server Load Balancer, VPC, ECS, and EIP.
  • Policy names can differ; in many Alibaba Cloud services there are system policies like “FullAccess” variants. Verify in the RAM console for your account.

Tools

  • Alibaba Cloud console access (web).
  • SSH client to access ECS instances:
    • macOS/Linux: built-in ssh
    • Windows: Windows Terminal / OpenSSH or PuTTY

Region availability

  • NLB availability and specific features vary by region. Confirm in the Alibaba Cloud console for your region.

Quotas/limits

  • Load balancer instance quotas, listener limits, and EIP quotas apply. Check:
    • SLB/NLB quota pages in the console
    • Any region-specific restrictions

Prerequisite services

  • VPC and vSwitch (subnet)
  • ECS instances (backend)
  • (Optional) EIP for Internet-facing NLB

9. Pricing / Cost

Pricing for Alibaba Cloud Network Load Balancer (NLB) is usage-based and region-dependent. Do not assume a single global rate. Always confirm on the official Alibaba Cloud pricing pages for your region.

Pricing dimensions (common model components)

In Alibaba Cloud load balancing services, costs typically come from combinations of:

  1. Load balancer instance or resource fee – Charged per time unit (hourly in pay-as-you-go; monthly/yearly in subscription), depending on billing mode.
  2. Capacity/usage-based charges – Often based on one or more of:
    • New connections / concurrent connections
    • Processed bytes
    • Rules/listeners (less common at L4)
    • “Capacity units” (sometimes expressed as LCUs or similar units in cloud load balancers; verify if NLB uses this naming in your region)
  3. Internet-facing costs:
    • EIP fees (if you allocate an EIP)
    • Data transfer (Internet egress is commonly billed)
  4. Dependent resource costs:
    • ECS instances (compute + disk)
    • CloudMonitor beyond free metrics tiers (if applicable)
    • Log Service (SLS) ingestion/storage if you enable logging (if supported)

Free tier

  • There is generally no universal free tier for load balancers and EIPs. Occasionally promotions exist. Verify current promotions in your Alibaba Cloud account and region.

Primary cost drivers (what makes bills go up)

  • High traffic volume (especially Internet egress)
  • High connection rates (for connection-based pricing dimensions)
  • More public endpoints (EIPs)
  • Always-on resources in dev/test environments (NLB + EIP running 24/7)

Hidden or indirect costs to plan for

  • Cross-zone traffic considerations: Multi-zone architectures can change traffic patterns; verify whether your design increases inter-zone data transfer and whether it is billed in your scenario.
  • Logging costs: Access logs can generate significant ingestion volume.
  • Overprovisioned test environments: Leaving EIPs and LBs allocated is a common cost leak.

How to optimize cost (practical checklist)

  • Prefer internal NLB for internal services; reserve Internet-facing NLB for true edge entry points.
  • Reduce Internet egress using:
    • Caching/CDN for HTTP workloads (note: NLB is L4; CDN integration is typically L7/HTTP)
    • Regional placement closer to users
  • Right-size backend scaling to avoid excess capacity.
  • Use tagging for cost allocation and cleanup automation.

Example low-cost starter estimate (no fabricated numbers)

A minimal lab typically includes:

  • 1 NLB instance (pay-as-you-go)
  • 1 EIP (pay-as-you-go) if Internet-facing
  • 2 small ECS instances (pay-as-you-go) for backends
  • Minimal traffic for testing

To estimate cost:

  • Use the Alibaba Cloud Pricing Calculator: https://www.alibabacloud.com/pricing/calculator
  • Check the Server Load Balancer / NLB pricing pages for your region (see resource section at the end). If you can’t find a dedicated NLB pricing page, look under “Server Load Balancer” pricing and select NLB.
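Once you have per-hour rates from the calculator, a back-of-envelope monthly total is just arithmetic. A sketch with deliberately made-up placeholder rates (substitute your region's real prices):

```shell
#!/usr/bin/env bash
# Back-of-envelope monthly estimate for the lab bill of materials.
# The rates in the example call are deliberately made up -- pull real
# per-hour prices for your region from the pricing calculator.
set -u

HOURS=720   # roughly one month

monthly_cost() {  # monthly_cost <nlb_rate> <eip_rate> <ecs_rate> <ecs_count>
  # Rates are per hour in your billing currency; prints the monthly total.
  awk -v h="$HOURS" -v nlb="$1" -v eip="$2" -v ecs="$3" -v n="$4" \
    'BEGIN { printf "%.2f\n", h * (nlb + eip + ecs * n) }'
}

# Hypothetical rates: NLB 0.02/h, EIP 0.01/h, two ECS at 0.015/h each
monthly_cost 0.02 0.01 0.015 2   # 43.20
```

This ignores usage-based charges (connections, processed bytes, egress), which for busy public services can dominate the fixed hourly fees.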

Example production cost considerations

For production, plan for:

  • Multiple backends across zones (more ECS cost)
  • Higher traffic and higher connection rates (higher usage charges)
  • EIP and Internet egress (often the largest part for public services)
  • Observability (logs + metrics + retention)


10. Step-by-Step Hands-On Tutorial

Objective

Deploy a low-cost, beginner-friendly setup where Alibaba Cloud Network Load Balancer (NLB) distributes TCP port 80 traffic across two ECS instances running Nginx, and validate round-robin-style behavior by returning different backend IDs.

Lab Overview

You will:

  1. Create (or reuse) a VPC and vSwitch.
  2. Launch two ECS instances in the same VPC.
  3. Install Nginx and serve unique content from each ECS.
  4. Create an Internet-facing Network Load Balancer (NLB) with a listener on TCP:80.
  5. Create a server group and register the two ECS instances.
  6. Validate access via the NLB public endpoint.
  7. Clean up resources to stop charges.

Notes:

  • Console terminology can vary slightly by region. Use the closest matching option and verify in official docs if labels differ.
  • If you want a private-only lab (no EIP), create an internal NLB and test from a third ECS inside the VPC.


Step 1: Choose region and create networking (VPC + vSwitch)

  1. Sign in to the Alibaba Cloud console.
  2. Select a region where ECS and Network Load Balancer (NLB) are available.
  3. Go to VPC:
    • Create a VPC (example CIDR: 10.0.0.0/16)
    • Create a vSwitch (subnet) in one zone (example CIDR: 10.0.1.0/24)

Expected outcome – You have a VPC and vSwitch ready for ECS and NLB placement.

Verification – In the VPC console, confirm the VPC and vSwitch are in “Available” state.


Step 2: Create a Security Group for backend ECS

Create a Security Group that allows:

  • Inbound TCP 80 from a safe source:
    • For a lab, you can allow from the VPC CIDR (10.0.0.0/16) so the NLB can reach the backends.
    • Avoid 0.0.0.0/0 on backend servers in production.
  • Inbound SSH 22 from your admin IP (your home/office IP), so you can configure the servers.
  • Outbound: allow default outbound for package installation.
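The two inbound rules can also be expressed with the aliyun CLI, shown here as a dry run; `AuthorizeSecurityGroup` and its `IpProtocol`/`PortRange`/`SourceCidrIp` parameters follow the ECS OpenAPI, but verify the names, and replace the placeholder security group ID and admin IP, before executing anything:

```shell
#!/usr/bin/env bash
# Dry-run sketch of the two inbound Security Group rules for this step.
# AuthorizeSecurityGroup and IpProtocol/PortRange/SourceCidrIp follow the
# ECS OpenAPI; <your-sg-id> and <your-admin-ip> are placeholders.
set -u

SG_ID="<your-sg-id>"
ADMIN_IP="<your-admin-ip>/32"

allow() {  # allow <protocol> <port> <source CIDR>
  echo "aliyun ecs AuthorizeSecurityGroup --SecurityGroupId $SG_ID" \
       "--IpProtocol $1 --PortRange $2/$2 --SourceCidrIp $3"
}

allow tcp 80 10.0.0.0/16   # lab: the VPC range (incl. NLB) reaches backends
allow tcp 22 "$ADMIN_IP"   # SSH only from your admin IP
```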

Expected outcome – A Security Group attached to the VPC.

Verification – In ECS console, confirm rules include TCP:80 and SSH:22 as intended.


Step 3: Launch two ECS instances (backends)

  1. Go to ECS > Instances > Create Instance.
  2. Choose:
    • Same region and VPC/vSwitch created earlier
    • An economical instance type suitable for labs
    • A Linux image (Alibaba Cloud Linux, CentOS, Ubuntu, etc.)
  3. Attach the backend Security Group.
  4. Ensure both instances have private IPs in the vSwitch subnet.
  5. For lower cost and better security, do not assign public IPs to the backend ECS for this lab. You’ll use the NLB EIP for public access.

Repeat for the second ECS.

Expected outcome – Two running ECS instances in the same VPC/vSwitch.

Verification – In ECS instance list, confirm both instances are “Running” and have private IP addresses (for example, 10.0.1.10 and 10.0.1.11).


Step 4: Install Nginx and set unique responses on each backend

SSH into ECS-1 and ECS-2 and install Nginx.

Option A: Alibaba Cloud Linux/CentOS/RHEL-like

sudo yum makecache
sudo yum install -y nginx
sudo systemctl enable nginx
sudo systemctl start nginx

Option B: Ubuntu/Debian-like

sudo apt-get update
sudo apt-get install -y nginx
sudo systemctl enable nginx
sudo systemctl start nginx

Now set a unique index page.

On ECS-1:

echo "backend-1 $(hostname)" | sudo tee /usr/share/nginx/html/index.html

On ECS-2:

echo "backend-2 $(hostname)" | sudo tee /usr/share/nginx/html/index.html

Confirm Nginx responds locally on each server:

curl -s http://127.0.0.1/

Expected outcome – ECS-1 returns “backend-1 …” and ECS-2 returns “backend-2 …”.

Verification – You see the expected response on each ECS using curl.


Step 5: Create the Network Load Balancer (NLB)

  1. Navigate to Server Load Balancer (or Network Load Balancer) in the console.
  2. Create a new Network Load Balancer (NLB) instance.
  3. Choose:
    • Network type: Intranet or Internet-facing (for this lab, choose Internet-facing so you can test from your local machine)
    • VPC: select your lab VPC
    • vSwitch: select your vSwitch (and optionally additional vSwitches in other zones if you created them)
  4. If prompted, allocate or associate an EIP for the NLB.

Expected outcome – NLB instance is created and shows a frontend address (IP or DNS name).

Verification – In the NLB details page, confirm the instance state is “Active/Running” and an Internet-facing endpoint is present.


Step 6: Create a server group and register ECS backends

  1. In the NLB console, create a server group.
  2. Add backend servers:
    • Register ECS-1 and ECS-2
    • Backend port: 80
    • Weight: keep defaults (if weight is supported and visible)
  3. Configure health check:
    • Protocol: TCP (or HTTP health check if available for TCP listener—options vary)
    • Port: 80
    • Thresholds: keep defaults for the lab

Expected outcome – Server group contains both ECS instances and health checks begin.

Verification – Backend health transitions to “Healthy” after the initial checks; if backends remain unhealthy, see Troubleshooting below.


Step 7: Create a TCP listener on port 80

  1. Create a listener:
    • Protocol: TCP
    • Listener port: 80
  2. Associate the listener with the server group created earlier.
  3. Save/apply configuration.

Expected outcome – Listener is created and in “Running” state.

Verification – Listener status is active and attached to the correct server group.


Validation

From your local machine, access the NLB endpoint.

If you have an IP:

curl -s http://<NLB_PUBLIC_IP>/

If you have a DNS name:

curl -s http://<NLB_DNS_NAME>/

Run multiple times to observe responses from different backends:

for i in $(seq 1 10); do curl -s http://<NLB_ENDPOINT>/; echo; done

What you should see – Alternating (or at least varying) responses such as “backend-1 ...” and “backend-2 ...”.

Note: Load balancing behavior depends on algorithm, connection reuse, and client behavior. If your client reuses a TCP connection, you may see the same backend repeatedly. Use repeated new connections (as above) and/or different clients to observe distribution.
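To quantify how traffic was distributed, pipe the loop's output through `sort | uniq -c`. Shown here against canned sample responses so it runs without a live endpoint; in the real lab, pipe the curl loop's output into `count_distribution` instead of the here-doc:

```shell
#!/usr/bin/env bash
# Count how responses were distributed across backends. Uses canned
# sample lines so it runs without a live NLB endpoint; feed it the real
# curl loop's output in the lab.
set -u

count_distribution() {
  sort | uniq -c | sed 's/^ *//'
}

count_distribution <<'EOF'
backend-1 host-a
backend-2 host-b
backend-1 host-a
backend-2 host-b
backend-1 host-a
EOF
```

A heavily skewed count (for example, all ten responses from one backend) usually points to connection reuse on the client side rather than a broken load balancer.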


Troubleshooting

Issue: NLB shows “No healthy backends”

Common causes:

  • Backend Security Group does not allow inbound TCP/80 from the NLB/VPC.
  • Nginx not running or not listening on port 80.
  • Backend port mismatch (server group uses 80 but service listens elsewhere).
  • Wrong vSwitch/VPC selection (NLB can’t reach backends).

Fix checklist:

  • On each backend:

sudo systemctl status nginx
ss -lntp | grep ':80' || sudo netstat -plnt | grep ':80'
curl -s http://127.0.0.1/

  • Ensure Security Group inbound allows TCP/80 from VPC CIDR (lab) or the recommended source ranges per official docs (production).

Issue: You can’t reach the NLB from the Internet

  • Ensure the NLB is Internet-facing and has an associated EIP.
  • Confirm local ISP/firewall allows outbound to TCP/80.
  • If your organization blocks port 80, use TCP/443 and change Nginx to listen on 443 only if you understand TLS termination (NLB is L4; TLS termination may not be provided—verify).

Issue: Responses don’t alternate between backends

  • Some clients reuse connections (HTTP keep-alive). Try:

curl -s --http1.1 -H "Connection: close" http://<NLB_ENDPOINT>/
  • Check whether backend weights or algorithm settings exist in your region; adjust only if you understand the behavior (verify in docs).

Cleanup

To avoid ongoing charges, delete resources in this order:

  1. Delete the listener(s) from the NLB.
  2. Delete the server group (if it’s not used elsewhere).
  3. Delete the NLB instance.
  4. Release the EIP (if it’s not automatically released with the NLB—verify your region behavior).
  5. Stop/delete the ECS instances.
  6. Optionally delete VPC/vSwitch (only if they’re dedicated to this lab).

Verification

  • Confirm the NLB instance and EIP are no longer listed in the console.
  • Confirm the ECS instances are terminated (or stopped if you intentionally keep them).


11. Best Practices

Architecture best practices

  • Choose the correct load balancer type:
  • NLB for L4 TCP/UDP
  • ALB for L7 HTTP/HTTPS routing
  • CLB only when you must support legacy patterns (verify current recommendations)
  • Design for multi-zone where supported:
  • Place backends in at least two zones
  • Attach vSwitches in multiple zones to the NLB if required by the product design in your region
  • Keep backends private (no public IPs) for Internet-facing services; use NLB + EIP as the controlled entry.

IAM/RAM best practices

  • Use least privilege:
  • Separate roles for LB admins vs app operators
  • Require MFA for privileged users.
  • Use change control:
  • Restrict who can modify listeners and server groups in production.

Cost best practices

  • Avoid leaving Internet-facing NLB + EIP running for dev/test.
  • Use tagging:
  • env=dev|prod, app=..., owner=..., costcenter=...
  • Monitor data transfer and log ingestion volumes.

Performance best practices

  • Configure health checks to balance sensitivity vs stability:
  • Too aggressive = flapping and unnecessary failovers
  • Too lax = slow detection of failures
  • Ensure backends are sized for peak connections and throughput.
  • Avoid unnecessary NAT hops and keep traffic paths simple within the VPC.
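A quick back-of-the-envelope check helps when tuning health-check sensitivity: worst-case failure detection time is roughly the probe interval multiplied by the unhealthy threshold. The values below are illustrative, not NLB defaults (verify your region's defaults and limits):

```shell
interval=5     # seconds between probes (illustrative value)
unhealthy=3    # consecutive failed probes before marking a backend unhealthy

# Worst case: the backend fails just after a successful probe, then must
# fail the next `unhealthy` probes in a row before traffic shifts away.
echo "worst-case detection: $(( interval * unhealthy ))s"
```

Use this arithmetic to sanity-check that your settings sit between "flapping on every blip" and "minutes of traffic sent to a dead backend".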

Reliability best practices

  • Spread backends across zones.
  • Use autoscaling or operational runbooks to replace unhealthy instances quickly.
  • Test failure scenarios:
  • Stop Nginx on one backend and confirm traffic shifts.
  • Terminate one ECS and validate recovery behavior.
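For these failure drills, a small probe loop makes the traffic shift visible: run it against the NLB endpoint while you stop Nginx on one backend. <NLB_ENDPOINT> is a placeholder; the function only assumes a plain HTTP service that returns a body, as in this lab.

```shell
# Send n independent requests (a new TCP connection each time) and print
# one ok/FAIL line per request, so failover gaps are easy to spot.
probe() {
  local n=$1 url=$2
  for i in $(seq 1 "$n"); do
    if out=$(curl -s --max-time 2 "$url") && [ -n "$out" ]; then
      echo "ok $out"
    else
      echo "FAIL"
    fi
  done
}

# During a drill (placeholder endpoint):
#   probe 60 "http://<NLB_ENDPOINT>/"
```

During a healthy failover you should see at most a brief run of FAIL lines before all responses come from the surviving backend.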

Operations best practices

  • Standardize naming:
  • nlb-<env>-<app>-<region>-<purpose>
  • Record configuration intent:
  • Listener ports, server group membership, health check parameters.
  • Set CloudMonitor alarms (examples):
  • Unhealthy backend count > 0 for more than N minutes
  • Sudden traffic drop/spike (possible outage or attack)
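If you script resource creation, a trivial helper can keep names consistent with the convention above (the convention itself is a suggestion from this guide, not an Alibaba Cloud requirement, and the example values are hypothetical):

```shell
# Build an NLB name following nlb-<env>-<app>-<region>-<purpose>.
nlb_name() {
  local env=$1 app=$2 region=$3 purpose=$4
  echo "nlb-${env}-${app}-${region}-${purpose}"
}

nlb_name prod payments ap-southeast-1 ingress
# prints: nlb-prod-payments-ap-southeast-1-ingress
```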

Governance best practices

  • Enforce resource ownership and lifecycle policies.
  • Use infrastructure-as-code where possible (Terraform for Alibaba Cloud exists; verify provider support for NLB resources and attributes).

12. Security Considerations

Identity and access model

  • Use RAM to restrict:
  • Who can create/delete NLB instances
  • Who can modify listeners and server groups
  • Who can allocate/release EIPs
  • Prefer RAM roles and temporary credentials for automation pipelines.

Encryption

  • NLB is Layer 4; TLS termination may not be provided (or may be provided in specific modes/regions—verify).
  • Common patterns:
  • TLS pass-through: backend terminates TLS (app or proxy like Nginx/Envoy)
  • If you need managed TLS termination and HTTP routing, consider ALB (L7)

Network exposure

  • For Internet-facing services:
  • Restrict service ports to only what you need
  • Use upstream DDoS protections appropriate to your risk profile (Alibaba Cloud has Anti-DDoS offerings—choose based on requirements and verify integration patterns)
  • For backend ECS:
  • Avoid public IPs
  • Security Group inbound should be minimal (service port from VPC ranges or documented LB source ranges)

Secrets handling

  • Do not store secrets in user-data scripts or public repos.
  • Use Alibaba Cloud secret management services if available in your environment (verify service options), or store secrets in encrypted mechanisms with strict RAM policies.

Audit and logging

  • Enable audit trails for API actions (for example, ActionTrail—verify in your environment).
  • If access logs are supported for NLB, route them to a controlled log project with retention policies.

Compliance considerations

  • If you handle regulated data:
  • Confirm region residency requirements
  • Confirm logging, encryption, and access control requirements
  • Document traffic flows (edge -> NLB -> backend) for audits

Common security mistakes

  • Allowing backend ports from 0.0.0.0/0 when NLB is the intended entry point.
  • Over-privileged RAM users that can delete production load balancers.
  • No monitoring/alerting for backend health changes.
  • Exposing admin ports (SSH/RDP) through public listeners.

Secure deployment recommendations

  • Use private backends + Internet-facing NLB.
  • Lock down backend Security Groups.
  • Separate admin access (VPN/bastion) from service ingress.
  • Implement least-privilege RAM and audit all LB changes.

13. Limitations and Gotchas

Always confirm current limits in the official docs for your region and account.

  • Feature variance by region: Logging, IPv6, target types (ECS/ENI/IP), and advanced settings may differ.
  • Health check pitfalls: Incorrect ports, wrong protocol, or overly strict thresholds can mark all backends unhealthy.
  • Connection reuse: TCP-level behavior can make distribution look “sticky” depending on client connection reuse.
  • Backend security rules: If backend Security Groups are too strict (or too permissive), you’ll either break traffic or create security risk.
  • EIP lifecycle: Deleting the NLB may or may not automatically release its EIPs, depending on how they were allocated; verify the behavior in your region and always check afterwards to avoid cost leaks.
  • Quota ceilings: Listener count, server group size, and instance quotas can limit rapid scaling in emergencies unless pre-approved.
  • Not an L7 feature set: If you need HTTP redirects, header routing, or WAF-like controls, NLB is the wrong tool; use ALB/WAF/CDN patterns.

14. Comparison with Alternatives

In Alibaba Cloud

  • Application Load Balancer (ALB): L7 HTTP/HTTPS, routing rules, potentially richer observability for web apps.
  • Classic Load Balancer (CLB): Legacy SLB offering; may be required for older environments but generally less preferred for new designs (verify current guidance).
  • PrivateLink / Gateway services: For private service exposure rather than public ingress (different problem space).

In other clouds

  • AWS Network Load Balancer: Similar L4 concept; differences in features, pricing dimensions, and integrations.
  • Google Cloud TCP/UDP Load Balancing: Similar outcome; different constructs (forwarding rules, backend services).
  • Azure Standard Load Balancer: L4 load balancing; different operational and diagnostic model.

Self-managed alternatives

  • HAProxy / Nginx (stream) / Envoy: More customization, but you operate scaling, HA, patching, and failover.
  • Keepalived (VRRP): Provides VIP failover but still requires careful ops and typically does not scale like managed LB.

Comparison table

| Option | Best For | Strengths | Weaknesses | When to Choose |
| --- | --- | --- | --- | --- |
| Alibaba Cloud Network Load Balancer (NLB) | TCP/UDP services needing managed HA and scale | Managed L4, health checks, VPC/EIP integration | Limited L7 features; region feature variance | You need L4 load balancing for network apps |
| Alibaba Cloud Application Load Balancer (ALB) | HTTP/HTTPS applications | L7 routing, better for web ingress patterns | Not designed for generic UDP; more app-layer oriented | You need host/path routing, managed TLS (verify), web features |
| Alibaba Cloud Classic Load Balancer (CLB) | Legacy compatibility | Mature, widely used historically | Legacy positioning; feature set depends on region | You have existing CLB workloads or must match legacy behavior |
| AWS NLB / Azure Std LB / GCP TCP/UDP LB | Multi-cloud L4 patterns | Comparable managed L4 capabilities | Different pricing/limits/integration | You’re implementing similar L4 ingress on another cloud |
| Self-managed HAProxy/Envoy | Custom behaviors and deep control | Maximum customization | Highest ops burden; HA/patching complexity | You must implement custom logic not available in managed LB |

15. Real-World Example

Enterprise example: Multi-zone TCP API ingress for regulated platform

  • Problem: A financial services platform exposes a TCP-based API gateway that must remain available during instance failures and routine patching. Backends must stay private, and changes must be auditable.
  • Proposed architecture:
  • Internet-facing NLB with EIP
  • Backends: ECS instances in two zones in private subnets
  • Strict backend Security Group rules (only from VPC CIDR or documented LB ranges)
  • CloudMonitor alarms on health and traffic
  • Audit trail enabled for all NLB changes
  • Why NLB was chosen:
  • L4 pass-through fits the TCP gateway design
  • Managed health checks and failover reduce operational risk
  • Expected outcomes:
  • Improved uptime and faster maintenance windows
  • Reduced exposure of backend network surfaces
  • Clear audit history for compliance reviews

Startup/small-team example: Game service with UDP traffic spikes

  • Problem: A small team runs a multiplayer matchmaking service over UDP. Traffic spikes unpredictably after marketing pushes.
  • Proposed architecture:
  • NLB with UDP listener to distribute traffic to a pool of ECS instances
  • Autoscaling on backend pool (implementation varies—verify best method for your stack)
  • Basic CloudMonitor alarms for sudden traffic spikes and unhealthy backends
  • Why NLB was chosen:
  • UDP support and simplified operations
  • Ability to scale backends without changing the public endpoint
  • Expected outcomes:
  • Fewer outages during spikes
  • Faster scale-out with minimal platform maintenance

16. FAQ

  1. Is Network Load Balancer (NLB) an L4 or L7 load balancer?
    NLB is primarily a Layer 4 (TCP/UDP) load balancer. For HTTP/HTTPS Layer 7 routing, Alibaba Cloud typically positions Application Load Balancer (ALB).

  2. Can NLB terminate TLS/SSL?
    Often NLB is used for TLS pass-through (backend terminates TLS). Whether NLB supports TLS termination in your region depends on current product capabilities—verify in official docs.

  3. What are the key building blocks I configure?
    You typically configure an NLB instance, one or more listeners, server groups, and health checks.

  4. Does NLB support UDP?
    NLB is commonly used for TCP/UDP load balancing, but availability can be region-dependent—verify in official docs for your region.

  5. How do health checks work for TCP?
    TCP health checks usually attempt to establish a TCP connection to a backend port. If the connection succeeds consistently, the backend is considered healthy.

  6. Why do I keep seeing responses from the same backend?
    Your client may be reusing TCP connections (keep-alive). Try forcing new connections (for example, Connection: close) or running multiple parallel clients.

  7. Should backend ECS instances have public IPs?
    For most secure architectures, no. Prefer private backends and expose only the NLB (plus EIP if Internet-facing).

  8. Can I use NLB inside a VPC only?
    Yes—create an internal NLB endpoint for private service-to-service traffic.

  9. How do I restrict who can modify NLB configuration?
    Use RAM to grant least-privilege permissions and limit production changes to approved roles.

  10. What is the biggest cost risk with NLB?
    For public services, Internet egress (data transfer) and EIP charges are common major cost drivers, along with usage-based LB charges.

  11. Does NLB support multi-zone?
    Many managed load balancers support multi-zone deployments by selecting vSwitches in multiple zones. Exact support and behavior can vary—verify.

  12. Can I log NLB traffic?
    Some load balancer products support access logs to Log Service (SLS). Confirm NLB-specific logging support and setup in your region—verify.

  13. Can I put a CDN in front of NLB?
    CDNs are typically HTTP/HTTPS-focused. If your workload is TCP/UDP, CDN may not apply. For HTTP web delivery, consider ALB/CDN patterns instead.

  14. How do I migrate from CLB to NLB?
    Commonly: build NLB in parallel, register backends, validate health and traffic, then cut over via DNS or IP migration strategies. Exact steps depend on the legacy setup—verify.

  15. What’s the simplest way to test NLB?
    Put two ECS instances behind it with a basic TCP service (like Nginx on port 80), then curl the NLB endpoint repeatedly while checking backend health.


17. Top Online Resources to Learn Network Load Balancer (NLB)

Because Alibaba Cloud documentation URLs can change by version/locale, use the links below as starting points and search within the Help Center for “Network Load Balancer (NLB)” and “pricing” if needed.

| Resource Type | Name | Why It Is Useful |
| --- | --- | --- |
| Official documentation | Alibaba Cloud Help Center – Server Load Balancer | Entry point for SLB family docs, including NLB/ALB/CLB: https://www.alibabacloud.com/help/en/server-load-balancer |
| Official product page | Alibaba Cloud – Server Load Balancer (portfolio) | Product overview and positioning: https://www.alibabacloud.com/product/server-load-balancer |
| Official pricing | Alibaba Cloud Pricing Calculator | Region-aware estimates across services: https://www.alibabacloud.com/pricing/calculator |
| Official docs (search) | Help Center search for “Network Load Balancer billing” | Fastest way to find the current NLB billing/pricing model in your region: https://www.alibabacloud.com/help |
| Architecture guidance | Alibaba Cloud Architecture Center | Reference architectures and best practices (search for load balancing patterns): https://www.alibabacloud.com/architecture |
| Monitoring docs | CloudMonitor documentation | Metrics/alarms patterns for infrastructure: https://www.alibabacloud.com/help/en/cloudmonitor |
| Logging docs | Log Service (SLS) documentation | If NLB logging is supported, logs often integrate with SLS: https://www.alibabacloud.com/help/en/log-service |
| IaC tooling | Terraform Alibaba Cloud Provider docs | Automating NLB and related resources (verify NLB resource coverage): https://registry.terraform.io/providers/aliyun/alicloud/latest/docs |

18. Training and Certification Providers

| Institute | Suitable Audience | Likely Learning Focus | Mode | Website URL |
| --- | --- | --- | --- | --- |
| DevOpsSchool.com | DevOps engineers, SREs, cloud engineers | DevOps/cloud fundamentals, operations practices, cloud services overview | Check website | https://www.devopsschool.com/ |
| ScmGalaxy.com | Beginners to intermediate engineers | SCM/DevOps foundations, tooling, basic cloud/ops workflows | Check website | https://www.scmgalaxy.com/ |
| CloudOpsNow.in | Cloud operations practitioners | CloudOps practices, operations, monitoring, governance | Check website | https://www.cloudopsnow.in/ |
| SreSchool.com | SREs and reliability-focused teams | Reliability engineering, monitoring, incident management | Check website | https://www.sreschool.com/ |
| AiOpsSchool.com | Ops/SRE teams exploring AIOps | Monitoring automation, AIOps concepts, operational analytics | Check website | https://www.aiopsschool.com/ |

19. Top Trainers

| Platform/Site | Likely Specialization | Suitable Audience | Website URL |
| --- | --- | --- | --- |
| RajeshKumar.xyz | DevOps/cloud training content (verify offerings) | Learners seeking trainer-led guidance | https://www.rajeshkumar.xyz/ |
| devopstrainer.in | DevOps training platform (verify offerings) | Beginners to intermediate DevOps learners | https://www.devopstrainer.in/ |
| devopsfreelancer.com | DevOps consulting/training marketplace style (verify offerings) | Teams/individuals seeking flexible help | https://www.devopsfreelancer.com/ |
| devopssupport.in | Operational support and training resources (verify offerings) | Ops teams needing practical support | https://www.devopssupport.in/ |

20. Top Consulting Companies

| Company | Likely Service Area | Where They May Help | Consulting Use Case Examples | Website URL |
| --- | --- | --- | --- | --- |
| cotocus.com | Cloud/DevOps consulting (verify service catalog) | Architecture reviews, automation, operations improvement | Designing NLB/ALB ingress, VPC segmentation, monitoring setup | https://www.cotocus.com/ |
| DevOpsSchool.com | DevOps consulting and training (verify service catalog) | Platform enablement, DevOps practices, implementation support | Building standardized ingress with NLB, IaC pipelines, operational runbooks | https://www.devopsschool.com/ |
| DEVOPSCONSULTING.IN | DevOps consulting (verify service catalog) | DevOps transformation and implementation support | Migrating from self-managed L4 load balancers to NLB, HA designs | https://www.devopsconsulting.in/ |

21. Career and Learning Roadmap

What to learn before NLB

  • Networking fundamentals:
  • TCP vs UDP, ports, NAT, DNS
  • CIDR, routing tables, subnets
  • Alibaba Cloud basics:
  • VPC, vSwitch, Security Groups
  • ECS provisioning and SSH access
  • EIP concepts and public vs private networking
  • Basic Linux ops:
  • Systemd, service logs, firewall basics

What to learn after NLB

  • Layer 7 traffic management:
  • Alibaba Cloud Application Load Balancer (ALB)
  • TLS certificate management patterns
  • Observability:
  • CloudMonitor dashboards and alarms
  • Log Service pipelines and retention
  • Infrastructure as Code:
  • Terraform for Alibaba Cloud
  • CI/CD integration and change control
  • Resilience engineering:
  • Multi-zone patterns
  • Chaos testing basics for backend failure

Job roles that use it

  • Cloud Engineer / Cloud Platform Engineer
  • DevOps Engineer
  • SRE
  • Network Engineer (cloud-focused)
  • Security Engineer (ingress and exposure control)

Certification path (if available)

Alibaba Cloud offers role-based certifications (availability and names change over time). Look for current Alibaba Cloud certification tracks that include networking and architecture modules. Verify current certification paths on Alibaba Cloud’s official training/certification pages.

Project ideas for practice

  1. Build a two-zone NLB + ECS setup and simulate zonal failure by stopping all backends in one zone.
  2. Create internal NLB for east-west service traffic and validate Security Group restrictions.
  3. Implement Terraform for VPC + ECS + NLB (verify provider support) and enforce naming/tagging standards.
  4. Add CloudMonitor alarms for unhealthy backends and test alert delivery.

22. Glossary

  • NLB (Network Load Balancer): Managed Layer 4 load balancer for TCP/UDP traffic in Alibaba Cloud.
  • SLB (Server Load Balancer): Alibaba Cloud’s broader load balancing portfolio/category in the console and docs (often includes CLB/ALB/NLB).
  • Layer 4 (L4): Transport layer in the OSI model; typically TCP/UDP load balancing.
  • Layer 7 (L7): Application layer; HTTP/HTTPS routing, header inspection, etc.
  • Listener: Frontend configuration defining protocol and port for incoming connections.
  • Server group: Collection of backend targets that receive forwarded traffic.
  • Health check: Periodic probe that decides whether a backend is eligible to receive traffic.
  • Backend target: The server receiving traffic (commonly an ECS instance; sometimes ENI/IP depending on product support).
  • VPC: Virtual Private Cloud; isolated network environment in Alibaba Cloud.
  • vSwitch: A subnet inside a VPC, tied to a specific zone.
  • EIP: Elastic IP address used to provide a public endpoint for Internet-facing access.
  • Security Group: Stateful virtual firewall rules for ECS and related resources.
  • CloudMonitor: Alibaba Cloud monitoring service for metrics and alerts.
  • Log Service (SLS): Centralized logging service for ingestion, storage, and querying (integration depends on feature support).

23. Summary

Alibaba Cloud Network Load Balancer (NLB) is a managed Layer 4 load balancer in the Networking and CDN category, used to distribute TCP/UDP traffic across healthy backend targets in your VPC. It matters because it improves availability, enables horizontal scaling, and reduces the operational burden of running self-managed L4 load balancers.

From a cost perspective, plan for NLB resource/usage charges, plus EIP and Internet egress for public services, and consider observability costs (metrics/logs). From a security perspective, use RAM least privilege, keep backends private, and restrict Security Group rules carefully.

Use NLB when you need reliable L4 ingress for TCP/UDP services; choose ALB for L7 web routing. Next, deepen your skills by adding multi-zone design, CloudMonitor alarms, and infrastructure-as-code automation for repeatable deployments.