1. Introduction
Alibaba Cloud Service Mesh (ASM) is a managed service mesh offering for Kubernetes-based microservices, designed to standardize traffic management, security (mTLS), and observability across services—without requiring application code changes in most cases.
In simple terms: you deploy your applications on Kubernetes (typically Alibaba Cloud Container Service for Kubernetes (ACK)), then ASM injects a sidecar proxy next to each workload. Those proxies handle service-to-service communication features like retries, timeouts, canary releases, and encryption—centrally and consistently.
Technically, ASM provides a managed Istio-compatible control plane and integrates it with your Kubernetes clusters. Your applications keep using normal Kubernetes Services, while the mesh uses Envoy sidecars (data plane) and Istio APIs/CRDs (control plane configuration) to enforce policies and routing.
ASM solves common microservices problems:
- “How do we do safe canary releases across dozens of services?”
- “How do we get consistent mTLS and authorization between services?”
- “How do we observe service-to-service traffic with standard metrics, logs, and traces?”
- “How do we apply governance without rewriting every application?”
Naming/Status note: The official product name is Alibaba Cloud Service Mesh (ASM). If your account shows different labels or editions, verify in official docs because naming and editions can vary by region and console updates.
2. What is Service Mesh (ASM)?
Official purpose
Service Mesh (ASM) is Alibaba Cloud’s managed service mesh service, intended to help teams run microservices on Kubernetes with consistent:
- Traffic management (routing, load balancing rules, circuit breaking patterns)
- Security (mTLS, identity-based access control)
- Observability (metrics, logs, traces)
ASM is generally positioned as “managed Istio” (Istio-compatible APIs and architecture). Exact supported Istio versions and feature availability can vary—verify in official docs.
Core capabilities
Common, practical capabilities you should expect from ASM (and validate in your region/edition):
- Sidecar-based data plane for L7/L4 traffic control (Envoy proxies)
- Istio APIs/CRDs for routing and policy configuration
- mTLS for service-to-service encryption and identity
- Ingress/Egress gateways for north-south traffic patterns (where enabled)
- Observability integration (metrics/logging/tracing) via Alibaba Cloud services and/or common OSS tooling (availability depends on edition and integration choices)
Major components (conceptual)
- Control plane (managed by ASM): configuration distribution, certificate management, policy/routing evaluation.
- Data plane (in your clusters): sidecar proxies (Envoy) deployed alongside your pods; optionally gateways.
- Mesh configuration: Istio resources such as VirtualService, DestinationRule, Gateway, PeerAuthentication, and AuthorizationPolicy (actual CRDs available depend on the ASM-managed Istio profile/version).
Service type
- Managed control plane (SaaS-like) + in-cluster agents/components (Kubernetes add-ons and sidecars).
- You keep ownership of your Kubernetes clusters and workloads; ASM manages the mesh control plane lifecycle depending on your chosen model.
Scope: regional / account / cluster
In practice, ASM is typically:
- Region-scoped (you create an ASM instance in a region and attach one or more Kubernetes clusters in that region and network context).
- Account-scoped within your Alibaba Cloud account and governed by RAM policies.
- Cluster-attached: you explicitly add ACK clusters to the mesh.
Exact scoping (cross-region, cross-VPC, multi-cluster topology) depends on ASM features and your network design—verify in official docs.
How it fits into the Alibaba Cloud ecosystem
ASM usually sits in the “Container + Microservices governance” layer and integrates most often with:
- ACK (Container Service for Kubernetes): primary runtime for mesh workloads
- VPC: network foundation for clusters and service-to-service traffic
- RAM: permissions and access control for operators
- SLB/NLB/ALB (product availability varies): ingress exposure and load balancing options
- Log Service (SLS): centralized logging (including proxy access logs if enabled)
- Application Real-Time Monitoring Service (ARMS): metrics and APM-style visibility (integration depends on setup)
Use ASM when you want standardized governance without running and upgrading Istio yourself.
Official docs starting point (verify current URLs if your region differs): https://www.alibabacloud.com/help/en/asm/
3. Why use Service Mesh (ASM)?
Business reasons
- Faster, safer releases: canary, blue/green, traffic splitting, and progressive delivery patterns reduce outage risk.
- Standardized governance across teams: central policies apply to many services without per-app changes.
- Reduced platform toil: managed control plane reduces the operational burden compared to self-managed Istio.
Technical reasons
- Consistent L7 traffic behavior: retries, timeouts, header-based routing, fault injection (if supported).
- Service-to-service security: mTLS identity and encryption across microservices.
- Better multi-service debugging: consistent telemetry from proxies improves root-cause analysis.
Operational reasons
- Central configuration: governance via Istio CRDs rather than app config sprawl.
- Unified observability: mesh-level metrics, logs, and traces help SREs correlate incidents.
- Change control: routing policies can be rolled out and reviewed like code (GitOps).
Security/compliance reasons
- Encryption in transit for east-west traffic (mTLS).
- Identity-based policies: allow/deny communication based on workload identities and namespaces.
- Auditability: change history via Kubernetes manifests; plus cloud audit logs where integrated.
Scalability/performance reasons
- Traffic shaping and outlier detection patterns can protect services under load.
- Resilience patterns at the network layer reduce cascading failures (within limits).
When teams should choose it
Choose ASM when:
- You run (or are moving to) Kubernetes microservices on ACK
- You need consistent traffic control and security across many services
- You want managed mesh operations instead of owning Istio upgrades, compatibility, and CVE patching yourself
When teams should not choose it
Avoid (or delay) ASM when:
- You have a small number of services and don’t need mesh features (sidecars add complexity and overhead)
- You rely heavily on non-HTTP protocols or edge cases where Envoy/Istio behavior may be surprising (validate first)
- Your organization is not ready for the operational model (CRDs, debugging proxies, certificate lifecycle)
- You cannot tolerate additional latency/CPU/memory overhead from sidecars
4. Where is Service Mesh (ASM) used?
Industries
- E-commerce and retail (high-change, high-traffic services)
- FinTech and banking (mTLS, policy enforcement, audit requirements)
- SaaS platforms (multi-tenant traffic controls and observability)
- Gaming and media (high throughput, resilience patterns)
- Telecom and IoT platforms (large service graphs and operational governance)
Team types
- Platform engineering teams building internal developer platforms on ACK
- SRE/operations teams standardizing telemetry and policies
- DevOps teams implementing progressive delivery
- Security engineering teams enforcing east-west encryption and segmentation
Workloads
- Kubernetes microservices (HTTP/gRPC commonly)
- API backends and service graphs with many dependencies
- Hybrid environments (where supported): multi-cluster designs, shared services, staged migrations
Architectures
- Microservices (service-per-team)
- Event-driven microservices (mesh for synchronous dependencies; event bus separately)
- Multi-cluster active/active or active/passive (validate ASM multi-cluster features)
- Zero-trust internal networks (mesh + network policies)
Real-world deployment contexts
- Production meshes for core business services
- Dev/test meshes for validating routing rules and mTLS
- Migration phases: gradually onboarding namespaces/services to the mesh
5. Top Use Cases and Scenarios
Below are practical scenarios where Alibaba Cloud Service Mesh (ASM) is commonly used.
1) Canary releases with traffic splitting
- Problem: You need to ship a new version without risking full blast radius.
- Why ASM fits: Route a percentage of traffic to v2 while keeping v1 stable, controlled by mesh config.
- Example: 90% to reviews-v1, 10% to reviews-v2, then ramp up.
2) Blue/green cutover for critical services
- Problem: You want near-instant rollback for risky changes.
- Why ASM fits: Route by header/cookie or switch all traffic between two versions quickly.
- Example: Switch payments traffic from green to blue with one config change.
3) Enforcing mTLS for east-west encryption
- Problem: Internal service calls traverse shared networks and must be encrypted.
- Why ASM fits: Mesh-issued identities and mTLS encrypt service-to-service calls.
- Example: Set namespace policy to require mTLS and block plaintext.
4) Service-to-service authorization (zero trust within cluster)
- Problem: Any service can call any other service by default.
- Why ASM fits: Authorization policies can restrict calls by namespace/service identity.
- Example: Only frontend can call catalog; others denied.
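As a rough illustration of this scenario, the sketch below writes an Istio AuthorizationPolicy to a file for review. The shop namespace, the frontend service account, the app: catalog label, and the cluster.local trust domain are hypothetical, and the security.istio.io/v1beta1 API version is an assumption to check against your ASM instance.

```shell
# Sketch only: namespace, labels, and service account are hypothetical.
# Write the policy to a file so it can be reviewed before it is applied.
cat > catalog-allow-frontend.yaml <<'EOF'
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: catalog-allow-frontend
  namespace: shop
spec:
  selector:
    matchLabels:
      app: catalog            # the policy applies to catalog workloads only
  action: ALLOW
  rules:
  - from:
    - source:
        # Caller identity, derived from the caller's service account.
        principals: ["cluster.local/ns/shop/sa/frontend"]
EOF
```

After review, a command along the lines of kubectl apply -f catalog-allow-frontend.yaml would apply it. In Istio, once an ALLOW policy selects a workload, requests that match none of its rules are denied, and matching on principals depends on mTLS being active between the two workloads.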
5) Standardized retries/timeouts (resilience)
- Problem: Teams implement retries inconsistently, causing thundering herds or timeouts.
- Why ASM fits: Central traffic policies reduce inconsistency and can be applied per service.
- Example: Add a 2s timeout and limited retries to inventory calls.
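One possible shape for this example, sketched as an Istio VirtualService written to a file. The 2s timeout comes from the scenario above; the retry count, per-try timeout, and retry conditions are illustrative assumptions, as is the networking.istio.io/v1beta1 API version.

```shell
# Sketch only: retry values are assumptions, not recommendations.
cat > inventory-resilience.yaml <<'EOF'
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: inventory
spec:
  hosts:
  - inventory
  http:
  - route:
    - destination:
        host: inventory
    timeout: 2s                 # overall deadline for the request, retries included
    retries:
      attempts: 2               # at most two retries after the first try
      perTryTimeout: 500ms      # each try gets its own shorter deadline
      retryOn: 5xx,connect-failure
EOF
```

Keeping attempts multiplied by perTryTimeout below the overall timeout leaves room for every retry to run. Retries are only safe for idempotent operations, so this kind of rule is usually scoped to read paths first.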
6) Circuit breaking / outlier handling patterns
- Problem: A failing instance causes repeated errors and cascading failures.
- Why ASM fits: Mesh-level outlier detection can reduce impact (capabilities vary; validate).
- Example: Eject unhealthy pods from load balancing pool after consecutive 5xx errors.
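A minimal sketch of such an ejection rule, written as an Istio DestinationRule. The orders service name and all thresholds are hypothetical, and whether ASM exposes these outlierDetection fields in your edition should be verified.

```shell
# Sketch only: service name and thresholds are hypothetical.
cat > orders-outlier.yaml <<'EOF'
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: orders
spec:
  host: orders
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5   # eject an endpoint after 5 consecutive 5xx responses
      interval: 10s             # how often endpoints are evaluated
      baseEjectionTime: 30s     # minimum time an ejected endpoint stays out
      maxEjectionPercent: 50    # never eject more than half of the pool
EOF
```

The maxEjectionPercent cap is a design choice: it keeps some capacity in rotation even when most endpoints look unhealthy, which matters when the errors are caused by a shared dependency rather than by the pods themselves.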
7) Unified observability for microservices
- Problem: Hard to trace cross-service latency and error hotspots.
- Why ASM fits: Sidecars emit consistent metrics and can propagate trace headers (setup required).
- Example: Identify that 80% of checkout latency comes from recommendations.
8) Safer migrations between services or clusters
- Problem: You’re splitting a monolith into microservices or moving traffic between clusters.
- Why ASM fits: Routing control helps shift traffic gradually while observing behavior.
- Example: Route only a subset of users to the new user-profile service.
9) Multi-tenant platform isolation (logical)
- Problem: Different tenants/teams share a cluster but require isolation.
- Why ASM fits: Mesh policies can restrict cross-namespace traffic; combined with Kubernetes RBAC.
- Example: Tenant A namespace cannot call Tenant B namespace services.
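A sketch of one way to express this isolation with an Istio AuthorizationPolicy. The tenant-b namespace name is hypothetical, and the policy assumes mTLS is active, because Istio derives the source namespace from the caller's certificate.

```shell
# Sketch only: the tenant-b namespace is hypothetical.
cat > tenant-b-isolation.yaml <<'EOF'
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-same-namespace
  namespace: tenant-b           # no selector, so it covers every workload here
spec:
  action: ALLOW
  rules:
  - from:
    - source:
        namespaces: ["tenant-b"]  # only callers from the same namespace
EOF
```

With this in place, requests from any other namespace, Tenant A included, match no ALLOW rule and are rejected. Shared services such as an ingress gateway would need their namespace added to the list.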
10) Controlled egress to external dependencies
- Problem: Services call external APIs; you need visibility and control.
- Why ASM fits: Egress policies/gateways can centralize external access patterns (validate in ASM).
- Example: Force all outbound traffic to go via an egress gateway with logs.
11) Header-based routing for experiments (A/B tests)
- Problem: You want to test features for a subset of users.
- Why ASM fits: Route requests with specific headers to experimental service versions.
- Example: Requests with x-exp: on go to search-v2.
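The header match in this example could look roughly like the VirtualService below. It assumes a DestinationRule for search already defines subsets named v1 and v2; those subset names and the API version are assumptions.

```shell
# Sketch only: assumes subsets v1 and v2 exist in a DestinationRule for "search".
cat > search-experiment.yaml <<'EOF'
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: search
spec:
  hosts:
  - search
  http:
  - match:
    - headers:
        x-exp:
          exact: "on"           # quoted: a bare on may be parsed as a YAML boolean
    route:
    - destination:
        host: search
        subset: v2
  - route:                      # everything else stays on v1
    - destination:
        host: search
        subset: v1
EOF
```

Rule order matters here: Istio evaluates http routes top to bottom, so the catch-all route must come after the header match.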
12) Gradual enabling of security policies
- Problem: Turning on mTLS everywhere breaks legacy clients.
- Why ASM fits: Use permissive mode first, then strict, service-by-service.
- Example: Enable permissive mTLS in a namespace, fix non-mesh clients, then enforce strict.
6. Core Features
Feature availability can vary by ASM edition, region, and supported Istio version. Treat specifics as “check your console + official docs”.
Managed service mesh control plane
- What it does: Runs and manages the service mesh control plane for you.
- Why it matters: Reduces operational overhead (upgrades, HA setup, control plane scaling).
- Practical benefit: Platform teams focus on policies and architecture rather than Istio maintenance.
- Caveats: You still own data plane resource usage and troubleshooting in your clusters.
Sidecar injection and data plane governance
- What it does: Injects Envoy sidecars into pods to intercept and manage traffic.
- Why it matters: Sidecars are how L7 policies get enforced consistently.
- Practical benefit: Uniform traffic telemetry and policy enforcement without code changes.
- Caveats: Adds CPU/memory overhead and potential latency; requires careful resource sizing.
Traffic management via Istio APIs
- What it does: Supports routing rules using Istio CRDs such as:
  - VirtualService (routing)
  - DestinationRule (subsets/traffic policies)
  - Gateway (ingress configuration)
- Why it matters: Enables safe releases and complex routing without modifying apps.
- Practical benefit: Canary, blue/green, header-based routing, fault injection (if supported).
- Caveats: Misconfiguration can cause outages; treat policies as production code.
Service-to-service security (mTLS)
- What it does: Encrypts service-to-service traffic and authenticates workloads.
- Why it matters: Protects data in transit and establishes workload identity.
- Practical benefit: Helps meet internal security and compliance requirements.
- Caveats: Legacy clients, non-mesh workloads, and some protocols may need special handling.
Authorization policies (service-to-service access control)
- What it does: Allows/denies requests based on identity, namespace, workload labels, and request attributes.
- Why it matters: Implements zero-trust patterns inside Kubernetes.
- Practical benefit: Prevent lateral movement and accidental overreach between teams.
- Caveats: Policies must be tested carefully; overly strict rules can break dependencies.
Observability: metrics, logs, traces (mesh telemetry)
- What it does: Sidecars can emit traffic metrics (latency, RPS, error rates), logs, and tracing context.
- Why it matters: Microservices are hard to observe without consistent telemetry.
- Practical benefit: Faster incident triage and dependency mapping.
- Caveats: Telemetry collection can add cost; ensure retention and sampling strategies.
Ingress and egress control (mesh gateways)
- What it does: Central gateways can manage north-south and outbound traffic (depending on setup).
- Why it matters: Standardizes edge behavior and auditing.
- Practical benefit: Unified TLS termination patterns, routing rules, and access logging.
- Caveats: Gateway capacity planning becomes critical; also introduces another hop.
Multi-cluster / mesh expansion (where supported)
- What it does: Attach multiple clusters to one mesh for consistent governance.
- Why it matters: Enables staged migrations and regional resilience designs.
- Practical benefit: Central policy management across clusters.
- Caveats: Multi-cluster networking and DNS/service discovery are complex; verify supported topologies.
7. Architecture and How It Works
High-level architecture
ASM follows the common service mesh model:
- A managed control plane distributes configuration and certificates.
- Each pod in the mesh runs an Envoy sidecar proxy.
- Traffic between services flows through sidecars, which enforce routing and security policies.
Request/data/control flow
- Control flow: You apply configuration (Istio CRDs) to the Kubernetes API server. ASM control plane watches those resources and pushes configuration to sidecars.
- Data flow: Requests go from client pod → client sidecar → network → server sidecar → server pod.
- Security flow: Mesh-issued certs enable mutual authentication; policies determine whether requests are allowed.
Integrations with related services (common patterns)
- ACK: primary Kubernetes runtime and API server for mesh resources.
- VPC: pod-to-pod and service-to-service networking.
- SLB/NLB/ALB: exposes ingress gateway or Kubernetes Ingress (depending on design).
- ARMS / Prometheus-compatible monitoring: metrics and dashboards (implementation options vary).
- SLS: proxy access logs and application logs collection.
Exact integration steps depend on your environment; always validate with the ASM docs for your region.
Dependency services
Typical dependencies you should account for:
- ACK cluster(s)
- VPC/subnets, security groups
- Load balancer resources for ingress gateways (if exposed)
- Log/metrics backends for observability
Security/authentication model
- Operator access: Alibaba Cloud RAM users/roles control who can manage ASM and ACK.
- Workload identity: Mesh identities are derived from Kubernetes service accounts and mesh CA certificates (Istio model).
- Encryption in transit: mTLS for east-west traffic (configurable per namespace/workload).
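To make the workload identity concrete: in the upstream Istio model, the identity in a workload certificate is a SPIFFE URI built from the trust domain, the namespace, and the service account. The helper below is a small sketch of that format; the function name is hypothetical, and cluster.local is Istio's default trust domain, which an ASM instance may override.

```shell
# Sketch only: builds the SPIFFE identity string used in the Istio model.
spiffe_id() {
  # $1 = trust domain, $2 = namespace, $3 = service account
  printf 'spiffe://%s/ns/%s/sa/%s\n' "$1" "$2" "$3"
}

# A pod running under the "default" service account in namespace "asm-demo":
spiffe_id cluster.local asm-demo default
```

Istio authorization policies refer to the same identity without the spiffe:// prefix, for example cluster.local/ns/asm-demo/sa/default, which is why giving each service its own service account makes policies easier to write.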
Networking model
- Sidecars intercept traffic via iptables rules in pods (typical sidecar model).
- Service discovery typically relies on Kubernetes Services and DNS.
- For multi-cluster, you must plan:
  - network reachability between clusters
  - service discovery across clusters
  - gateway placement and TLS
(Verify ASM’s supported multi-cluster modes.)
Monitoring/logging/governance considerations
- Decide early on:
  - Where metrics live (Prometheus, ARMS, managed Prometheus, etc.)
  - Where traces go (Jaeger/Zipkin/ARMS APM depending on availability)
  - Where access logs go (SLS often)
- Treat mesh configs as code:
  - GitOps with review gates
  - policy testing in staging meshes
  - controlled rollouts
Simple architecture diagram (Mermaid)
flowchart LR
  U[User / Client] --> LB[Load Balancer / Ingress]
  LB --> IGW["Ingress Gateway (Envoy)"]
  IGW --> SVC1[Service A Pod]
  SVC1 -->|mTLS| SVC2[Service B Pod]
  subgraph PodA[Pod: Service A]
    AAPP[App Container]
    APX[Envoy Sidecar]
    AAPP <--> APX
  end
  subgraph PodB[Pod: Service B]
    BAPP[App Container]
    BPX[Envoy Sidecar]
    BAPP <--> BPX
  end
  SVC1 --- PodA
  SVC2 --- PodB
  CP[ASM Managed Control Plane] --> APX
  CP --> BPX
Production-style architecture diagram (Mermaid)
flowchart TB
  subgraph AlibabaCloud[Alibaba Cloud Account]
    subgraph VPC[VPC]
      subgraph ACK1[ACK Cluster A]
        NS1[Namespace: prod]
        GW1["Ingress Gateway Service (LB)"]
        MS1[Microservices + Sidecars]
        OBS1["Telemetry Collectors (optional)"]
      end
      subgraph ACK2[ACK Cluster B]
        NS2[Namespace: prod]
        GW2["East-West / Ingress Gateway (optional)"]
        MS2[Microservices + Sidecars]
      end
      SLS[(Log Service - SLS)]
      ARMS[(ARMS / Monitoring Backend)]
      LBEXT[Public/Private Load Balancer]
    end
    ASMCP["Service Mesh (ASM)<br/>Managed Control Plane"]
    RAM["RAM (IAM)"]
    ActionTrail["ActionTrail (Audit)"]
  end
  Internet((Internet)) --> LBEXT --> GW1
  GW1 --> MS1
  MS1 <--> MS2
  ASMCP --> MS1
  ASMCP --> MS2
  MS1 --> SLS
  MS1 --> ARMS
  MS2 --> SLS
  MS2 --> ARMS
  RAM --> ASMCP
  RAM --> ACK1
  RAM --> ACK2
  ActionTrail --> AlibabaCloud
8. Prerequisites
Account and billing
- An active Alibaba Cloud account with billing enabled (Pay-as-you-go or subscription depending on your purchasing model).
- Ensure your account can create:
  - ACK clusters
  - ASM instances
  - VPC resources
  - Load balancers (if you expose ingress)
Permissions / IAM (RAM)
You’ll typically need RAM permissions for:
- Creating and managing ACK clusters and Kubernetes resources
- Creating and managing ASM
- Managing dependent networking and observability services
Alibaba Cloud provides managed policies for many services (names can change). Common examples include “full access” policies for ACK and ASM, but verify exact policy names in official docs:
- ASM admin permissions (verify)
- ACK/Container Service admin permissions (verify)
- VPC and SLB admin permissions (verify)
- SLS/ARMS permissions if you integrate observability
Best practice: use least privilege and separate operator roles (platform, security, app teams).
Tools
- kubectl (required)
- Access to your ACK cluster kubeconfig (via ACK console)
- Optional (depending on ASM workflow):
  - istioctl for diagnostics (verify if ASM supports/endorses a specific version)
  - Helm (if you deploy add-ons)
- A terminal with network reachability to the ACK API endpoint (public or via VPN/bastion depending on cluster exposure)
Region availability
ASM is not necessarily available in every Alibaba Cloud region, and features can vary. Pick a region where both ACK and ASM are supported. Verify in official docs/console.
Quotas / limits
Potential limits you should check before production:
- Number of clusters attachable to one ASM instance
- Number of namespaces or workloads in the mesh
- Control plane capacity limits by edition
- Load balancer quotas (often a real constraint)
- VPC and EIP quotas
Check:
- Alibaba Cloud Quotas Center (if applicable)
- ASM product documentation for mesh limits
Prerequisite services
- ACK cluster (managed Kubernetes)
- VPC, vSwitches (subnets), security groups
- Container registry (optional, if you build your own images)
9. Pricing / Cost
ASM pricing can vary by region, edition, and commercial model. Do not rely on fixed numbers from third parties; use official pricing pages and your console quote.
Current pricing model (how to think about it)
Typically, expect cost in these buckets:
1) ASM service fee
- Often billed per ASM instance and/or by spec/edition (control plane capacity, HA, features).
- Some features (multi-cluster, advanced governance, etc.) may affect price depending on edition.
- Verify pricing dimensions on the official ASM pricing page.
2) ACK cluster costs
- ACK clusters have their own pricing model (cluster management fee depending on type/edition) plus worker node compute.
3) Compute overhead for sidecars
- Every meshed pod includes an Envoy sidecar, increasing:
  - CPU usage
  - memory usage
  - network overhead
- This increases ECS node sizing requirements or serverless billing.
4) Load balancer costs
- Ingress gateways typically require an SLB/NLB/ALB. These can be significant in production.
5) Observability costs
- Metrics ingestion/retention (ARMS or managed Prometheus)
- Log storage and indexing (SLS)
- Tracing (if stored and queried)
6) Network and data transfer
- Cross-zone or cross-region traffic (if used) may add cost.
- Egress to the Internet is often billed.
Free tier
ASM may or may not have a free tier, trial, or promotional credits depending on region/time. Verify in official pricing.
Hidden/indirect costs to plan for
- Higher Kubernetes node count due to sidecar overhead
- Additional load balancers for gateways
- Increased log volume from proxy access logs
- Operational time for policy design, testing, and incident response training
Cost optimization tips
- Mesh only what you need: start with a few namespaces/services.
- Right-size sidecar resources (requests/limits) to avoid node waste.
- Turn on access logging selectively (or sample).
- Use staging meshes for policy testing to reduce production mistakes.
- Minimize cross-zone traffic where possible; keep chatty services co-located.
- Prefer fewer gateways with adequate autoscaling rather than many underutilized gateways.
Example low-cost starter estimate (conceptual)
A lab environment commonly includes:
- 1 small ACK cluster (minimal worker nodes)
- 1 ASM instance (smallest supported spec)
- 1 load balancer (only if you need inbound access)
- Minimal telemetry retention
Because pricing is region/edition-specific, estimate using official tools:
- ASM pricing page: https://www.alibabacloud.com/product/asm (then follow pricing links in-console if needed)
- Alibaba Cloud Pricing Calculator: https://www.alibabacloud.com/pricing/calculator (verify availability for ASM in your region)
Example production cost considerations
For production you should model:
- Peak pod count × sidecar overhead
- HA gateways (multiple replicas) and load balancers
- Telemetry ingestion at peak RPS
- Multi-cluster interconnect costs (if applicable)
- Long retention for audit/compliance logs
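The "peak pod count × sidecar overhead" item can be estimated with simple arithmetic. The sketch below uses made-up numbers throughout: the pod count, the per-sidecar requests, and the node size are all hypothetical, so substitute values measured in your own clusters.

```shell
# Sketch only: every input below is a hypothetical value.
PODS=200               # peak number of meshed pods
SIDECAR_CPU_M=100      # CPU request per sidecar, in millicores
SIDECAR_MEM_MI=128     # memory request per sidecar, in MiB
NODE_CPU_M=4000        # allocatable CPU per worker node, in millicores
NODE_MEM_MI=16384      # allocatable memory per worker node, in MiB

total_cpu_m=$((PODS * SIDECAR_CPU_M))
total_mem_mi=$((PODS * SIDECAR_MEM_MI))

# Ceiling division: nodes needed to hold the sidecar requests alone.
nodes_for_cpu=$(( (total_cpu_m + NODE_CPU_M - 1) / NODE_CPU_M ))
nodes_for_mem=$(( (total_mem_mi + NODE_MEM_MI - 1) / NODE_MEM_MI ))

# The scarcer resource decides how many extra nodes are needed.
extra_nodes=$(( nodes_for_cpu > nodes_for_mem ? nodes_for_cpu : nodes_for_mem ))

echo "sidecar requests: ${total_cpu_m}m CPU, ${total_mem_mi}Mi memory"
echo "extra worker nodes to budget for: ${extra_nodes}"
```

With these inputs the sidecars request 20000m CPU and 25600Mi memory, so CPU is the binding constraint and the estimate is 5 extra nodes. The calculation uses requests rather than observed usage because requests are what the Kubernetes scheduler reserves.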
10. Step-by-Step Hands-On Tutorial
This lab focuses on a safe, real workflow: create an ACK cluster, create an ASM instance, onboard a namespace, deploy two service versions, and do a canary traffic split using Istio resources.
Important: Console steps can change. Use this tutorial as a practical guide, and follow the ASM “Getting Started” doc for your region when a console field differs.
Objective
Deploy a simple microservices demo on ACK, onboard it into Alibaba Cloud Service Mesh (ASM), and perform a canary release using mesh traffic routing—then validate and clean up.
Lab Overview
You will:
1. Create or use an ACK cluster
2. Create an ASM instance and attach the ACK cluster
3. Enable sidecar injection for a namespace
4. Deploy two versions of a service (v1 and v2)
5. Route traffic by weight (90/10, then 50/50, then 0/100)
6. Optionally enforce mTLS (permissive → strict)
7. Validate, troubleshoot, and clean up
Step 1: Create (or reuse) an ACK cluster
Goal: Have a running Kubernetes cluster where you can deploy workloads.
1) In the Alibaba Cloud console, go to Container Service for Kubernetes (ACK).
2) Create a cluster:
- Choose a region where ASM is available
- Choose a small, cost-effective node type for a lab
- Ensure kubectl access is possible (public API endpoint or private access via VPN/bastion)
3) Download kubeconfig:
- In ACK cluster details → Connection Information → download kubeconfig
- Configure local access:
export KUBECONFIG=~/Downloads/kubeconfig
kubectl version
kubectl get nodes
Expected outcome: kubectl get nodes shows Ready nodes.
Step 2: Create an ASM instance and attach the ACK cluster
Goal: Create a managed mesh and connect your cluster to it.
1) In the Alibaba Cloud console, go to Service Mesh (ASM).
2) Create an ASM instance:
- Choose the same region as the ACK cluster (for simplest setup)
- Select an edition/spec appropriate for a lab
- Confirm networking requirements (VPC) and any prerequisites shown by the console
3) Attach / add the ACK cluster to the mesh:
- In the ASM instance → Cluster Management (or similar)
- Add the ACK cluster
4) Wait until the cluster status becomes “attached/managed/ready” (wording varies).
Expected outcome: ASM shows your ACK cluster as connected and healthy.
Verification: In your cluster, list Istio-related namespaces and CRDs (names can vary):
kubectl get ns
kubectl get crd | grep -E 'istio|networking.istio|security.istio' || true
If CRDs aren’t visible yet, wait a few minutes—ASM may install components asynchronously.
Step 3: Enable sidecar injection for a namespace
Goal: Configure a namespace so workloads get Envoy sidecars automatically.
1) Create a namespace:
kubectl create namespace asm-demo
2) Enable automatic injection. In Istio, this is often done by labeling the namespace. Common label:
kubectl label namespace asm-demo istio-injection=enabled --overwrite
Note: Some managed meshes use a different label (for example asm-injection=enabled) or a revision-based label. Verify the correct label in ASM docs/console.
3) Confirm the label:
kubectl get namespace asm-demo --show-labels
Expected outcome: Namespace shows an injection label set to enabled.
Step 4: Deploy a simple “two versions” service
We’ll deploy:
- hello-v1, which returns “hello from v1”
- hello-v2, which returns “hello from v2”
- A curl client pod to generate traffic
We’ll use hashicorp/http-echo for simplicity.
4.1 Deploy services and deployments
Apply the following manifest:
cat <<'EOF' | kubectl apply -n asm-demo -f -
apiVersion: v1
kind: Service
metadata:
  name: hello
  labels:
    app: hello
spec:
  ports:
  - name: http
    port: 80
    targetPort: 5678
  selector:
    app: hello
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello
      version: v1
  template:
    metadata:
      labels:
        app: hello
        version: v1
    spec:
      containers:
      - name: app
        image: hashicorp/http-echo:1.0
        args: ["-text=hello from v1"]
        ports:
        - containerPort: 5678
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello
      version: v2
  template:
    metadata:
      labels:
        app: hello
        version: v2
    spec:
      containers:
      - name: app
        image: hashicorp/http-echo:1.0
        args: ["-text=hello from v2"]
        ports:
        - containerPort: 5678
EOF
4.2 Deploy a client pod
cat <<'EOF' | kubectl apply -n asm-demo -f -
apiVersion: v1
kind: Pod
metadata:
  name: curl
  labels:
    app: curl
spec:
  containers:
  - name: curl
    image: curlimages/curl:8.5.0
    command: ["sleep", "3650d"]
EOF
4.3 Verify sidecar injection worked
Check pods:
kubectl get pods -n asm-demo -o wide
Then inspect one pod to see if it has 2 containers (app + sidecar):
kubectl get pod -n asm-demo -l app=hello,version=v1 -o jsonpath='{.items[0].spec.containers[*].name}'; echo
kubectl get pod -n asm-demo -l app=hello,version=v2 -o jsonpath='{.items[0].spec.containers[*].name}'; echo
Expected outcome: You should see something like app istio-proxy (name may differ but usually istio-proxy).
If you only see app, injection is not enabled—see Troubleshooting.
Step 5: Generate traffic and observe baseline behavior
Exec into the curl pod and call the service multiple times.
kubectl exec -n asm-demo -it curl -- sh
Inside the pod:
for i in $(seq 1 10); do
curl -s http://hello/ ; echo
done
Expected outcome: You’ll likely see a mix of “hello from v1” and “hello from v2”. With no routing rules in place, requests are spread across all endpoints of the hello Service, and pods of both versions match its selector.
Exit the pod:
exit
Step 6: Apply ASM/Istio routing rules (canary 90/10)
Now we’ll control traffic using Istio resources. This assumes ASM supports standard Istio CRDs.
6.1 Create subsets via DestinationRule
cat <<'EOF' | kubectl apply -n asm-demo -f -
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: hello
spec:
  host: hello
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
EOF
6.2 Create VirtualService for weighted routing
cat <<'EOF' | kubectl apply -n asm-demo -f -
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: hello
spec:
  hosts:
  - hello
  http:
  - route:
    - destination:
        host: hello
        subset: v1
      weight: 90
    - destination:
        host: hello
        subset: v2
      weight: 10
EOF
Expected outcome: 90% of calls go to v1, 10% to v2.
6.3 Verify traffic split
Run 50 requests:
kubectl exec -n asm-demo curl -- sh -c 'for i in $(seq 1 50); do curl -s http://hello/ ; echo; done' | sort | uniq -c
Expected outcome: Counts should roughly reflect 90/10 split (not exact due to randomness and small sample size).
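Raw counts are easier to compare against the configured weights once they are turned into percentages. The helper below is a small sketch of that; summarize_split is a hypothetical name, and it assumes each response arrives on its own line, as in the loop above.

```shell
# Sketch only: summarize_split is a hypothetical helper, not part of kubectl or ASM.
# Reads responses on stdin, skips blank lines, and prints count and share per response.
summarize_split() {
  awk 'NF { count[$0]++; total++ }
       END {
         for (k in count)
           printf "%s: %d (%.1f%%)\n", k, count[k], 100 * count[k] / total
       }' | sort
}

# Example with canned input (4 responses, 3 from v1 and 1 from v2):
printf 'hello from v1\n\nhello from v1\n\nhello from v2\n\nhello from v1\n' | summarize_split
```

Against the cluster, the same function can be put at the end of the pipeline in place of sort | uniq -c. With a 90/10 rule and only 50 requests, a v2 share anywhere from a few percent to the high teens is plausible, so use a larger sample before concluding the rule is not applied.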
Step 7: Shift traffic to 50/50, then 0/100
Update the VirtualService to 50/50:
cat <<'EOF' | kubectl apply -n asm-demo -f -
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: hello
spec:
  hosts:
  - hello
  http:
  - route:
    - destination:
        host: hello
        subset: v1
      weight: 50
    - destination:
        host: hello
        subset: v2
      weight: 50
EOF
Re-check:
kubectl exec -n asm-demo curl -- sh -c 'for i in $(seq 1 50); do curl -s http://hello/ ; echo; done' | sort | uniq -c
Then go all-in on v2 (0/100):
cat <<'EOF' | kubectl apply -n asm-demo -f -
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: hello
spec:
  hosts:
  - hello
  http:
  - route:
    - destination:
        host: hello
        subset: v1
      weight: 0
    - destination:
        host: hello
        subset: v2
      weight: 100
EOF
Expected outcome: All responses should be “hello from v2”.
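Because each ramp step changes only the two weights, the manifest can be generated instead of copied. The function below is one way to sketch that; render_hello_vs is a hypothetical helper name, and it assumes the hello DestinationRule with subsets v1 and v2 from Step 6.1 is already applied.

```shell
# Sketch only: render_hello_vs is a hypothetical helper for this lab.
# $1 = percentage of traffic for subset v1 (0-100); subset v2 gets the remainder.
render_hello_vs() {
  case "$1" in
    ''|*[!0-9]*) echo "usage: render_hello_vs <v1-weight 0-100>" >&2; return 1 ;;
  esac
  if [ "$1" -gt 100 ]; then
    echo "weight must be between 0 and 100" >&2
    return 1
  fi
  v1_weight=$1
  v2_weight=$((100 - v1_weight))
  cat <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: hello
spec:
  hosts:
  - hello
  http:
  - route:
    - destination:
        host: hello
        subset: v1
      weight: ${v1_weight}
    - destination:
        host: hello
        subset: v2
      weight: ${v2_weight}
EOF
}

# Print the 50/50 manifest for review.
render_hello_vs 50
```

Piping the output into kubectl apply -n asm-demo -f - applies a step, and calling the function with the previous value is a quick rollback. Deriving the second weight from the first keeps the two values consistent by construction.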
Step 8 (Optional): Enable mTLS (permissive → strict)
mTLS resources and API versions depend on the Istio version ASM is running. The following is a common approach in Istio:
8.1 Start with PERMISSIVE
This allows both plaintext and mTLS while you validate compatibility.
cat <<'EOF' | kubectl apply -n asm-demo -f -
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
spec:
  mtls:
    mode: PERMISSIVE
EOF
Expected outcome: Traffic should continue working.
8.2 Switch to STRICT
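This requires mTLS for all traffic to workloads in the namespace; plaintext requests, such as those from clients without a sidecar, are rejected.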
cat <<'EOF' | kubectl apply -n asm-demo -f -
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
spec:
  mtls:
    mode: STRICT
EOF
Expected outcome: Calls from meshed workloads to meshed workloads continue to work. Calls from non-meshed clients may fail.
Validate again:
kubectl exec -n asm-demo curl -- curl -sS http://hello/
If it fails, revert to permissive while you troubleshoot.
If the PeerAuthentication CRD is not found, your ASM version/profile may differ. Verify supported security APIs in ASM docs.
Validation
Run these checks:
1) Sidecars present:
kubectl get pod -n asm-demo -l app=hello,version=v1 -o jsonpath='{.items[0].spec.containers[*].name}'; echo
kubectl get pod -n asm-demo -l app=hello,version=v2 -o jsonpath='{.items[0].spec.containers[*].name}'; echo
2) Istio resources applied:
kubectl get virtualservice -n asm-demo
kubectl get destinationrule -n asm-demo
kubectl get peerauthentication -n asm-demo 2>/dev/null || true
3) Traffic behavior matches weights:
kubectl exec -n asm-demo curl -- sh -c 'for i in $(seq 1 30); do curl -s http://hello/ ; echo; done' | sort | uniq -c
Troubleshooting
Problem: Sidecar not injected
Symptoms
– Pods only have one container (app)
– Mesh routing rules don’t affect traffic
Fixes
- Confirm namespace label:
kubectl get ns asm-demo --show-labels
- If you added the label after pods were created, restart deployments:
kubectl rollout restart deployment -n asm-demo hello-v1 hello-v2
- Check whether your ASM instance uses a different injection label or revision model; see the ASM docs/console.
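The "only one container" symptom can be checked mechanically. The sketch below assumes the sidecar container is named istio-proxy, which is the usual Istio name; confirm what your ASM version injects. has_sidecar is a hypothetical helper that takes the space-separated container list printed by the jsonpath query used in the Validation section.

```shell
# Hypothetical helper: succeeds if the space-separated container list in $1
# contains a container named exactly istio-proxy (assumed sidecar name).
has_sidecar() {
  case " $1 " in
    *" istio-proxy "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (assumes the hello-v1 deployment from this lab):
#   names="$(kubectl get pod -n asm-demo -l app=hello,version=v1 \
#     -o jsonpath='{.items[0].spec.containers[*].name}')"
#   if has_sidecar "$names"; then echo "sidecar present"; else echo "sidecar missing"; fi
```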
Problem: Istio CRDs not found
Symptoms
- Applying VirtualService/DestinationRule fails with “no matches for kind …”
Fixes
- Confirm the cluster is successfully attached to ASM and components are installed.
- Wait and retry (initial installation can take time).
- In managed environments, CRD versions may differ. Check which API versions exist:
kubectl get crd | grep istio
- Use the CRD’s supported API version (for example, v1alpha3 vs v1beta1) if required; verify in your cluster.
Problem: Traffic split doesn’t seem to work
Symptoms
- Responses always come from v1 or always from v2
Fixes
- Confirm both deployments have Ready pods and endpoints:
kubectl get deploy -n asm-demo
kubectl get endpoints -n asm-demo hello -o yaml | sed -n '1,120p'
- Confirm labels match subsets exactly (version: v1, version: v2)
- Ensure you are calling the correct host (hello inside the namespace)
Problem: mTLS strict breaks traffic
Symptoms
- Curl fails after enabling STRICT
Fixes
- Ensure client and server are both in-mesh (both have sidecars)
- Revert to permissive while investigating:
kubectl apply -n asm-demo -f - <<'EOF'
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
spec:
  mtls:
    mode: PERMISSIVE
EOF
- Check for non-mesh workloads calling into the namespace; they will fail in STRICT unless you provide exceptions/gateways.
Cleanup
To avoid ongoing charges, clean up in this order.
1) Delete demo namespace resources:
kubectl delete namespace asm-demo
2) In the ASM console:
- Detach/remove the ACK cluster from the ASM instance (workflow name varies)
3) Delete the ASM instance:
- Ensure it is not attached to any clusters
- Delete the instance from the ASM console
4) Delete the ACK cluster if you created it for the lab:
- In the ACK console, delete the cluster
- Ensure worker nodes, load balancers, and EIPs are released
5) Verify you have no leftover billable resources:
- Load balancers
- EIPs/NAT gateways
- SLS projects with high retention
- Monitoring instances
11. Best Practices
Architecture best practices
- Start with a single cluster + single mesh before multi-cluster.
- Onboard services incrementally by namespace.
- Standardize labels (app, version, team, env) to make routing/policy manageable.
- Prefer service boundaries aligned with team ownership to reduce policy complexity.
IAM/security best practices
- Use RAM roles for automation (CI/CD) and separate them from human admin roles.
- Limit who can apply mesh-wide policies (ClusterRole for Istio CRDs).
- Use GitOps + review for mesh policy changes.
Cost best practices
- Measure sidecar overhead early (CPU/memory) and adjust node sizing.
- Avoid enabling verbose access logs cluster-wide by default.
- Consolidate gateways where possible and autoscale based on metrics.
- Keep observability retention aligned to needs (7–14 days for high-volume logs unless compliance requires more).
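One way to put a number on sidecar overhead is to total the proxy containers' usage from kubectl top. This is a rough sketch with several assumptions: metrics are available in the cluster, the sidecar container is named istio-proxy, the output is for a single namespace (columns POD, NAME, CPU(cores), MEMORY(bytes)), and values are reported in millicores (m) and mebibytes (Mi). sidecar_usage is a hypothetical helper name.

```shell
# Hypothetical helper: reads `kubectl top pod --containers` output on stdin
# and prints the total CPU (millicores) and memory (Mi) of istio-proxy containers.
# Only handles values with "m" and "Mi" suffixes.
sidecar_usage() {
  awk '$2 == "istio-proxy" {
         cpu = $3; mem = $4
         sub(/m$/, "", cpu); sub(/Mi$/, "", mem)
         cpu_total += cpu; mem_total += mem; n++
       }
       END { printf "%d sidecars, %dm CPU, %dMi memory\n", n, cpu_total, mem_total }'
}

# Usage:
#   kubectl top pod -n asm-demo --containers | sidecar_usage
```

A single reading is only a snapshot, so sample under representative load before resizing nodes.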
Performance best practices
- Define sane timeouts and retry budgets to avoid retry storms.
- Use connection pooling and keepalives carefully (validate with your protocol behavior).
- Benchmark latency impact of sidecars on critical paths.
- Isolate gateway workloads on dedicated nodes if needed.
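As an illustration of the timeout and retry guidance, the sketch below prints a VirtualService with an overall request timeout and a small, bounded retry policy using the standard Istio timeout and retries fields. render_timeout_retry is a hypothetical helper, and the values in the usage line (2s, 2 attempts, 1s per try) are placeholders rather than recommendations. Check that your ASM version supports these fields.

```shell
# Hypothetical helper: prints a VirtualService for service $1 with an overall
# timeout ($2), a retry attempt count ($3), and a per-try timeout ($4).
render_timeout_retry() {
  if [ "$#" -ne 4 ]; then
    echo "usage: render_timeout_retry <service> <timeout> <attempts> <per-try-timeout>" >&2
    return 1
  fi
  cat <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: $1
spec:
  hosts:
  - $1
  http:
  - timeout: $2
    retries:
      attempts: $3
      perTryTimeout: $4
      retryOn: 5xx,connect-failure
    route:
    - destination:
        host: $1
EOF
}

# Usage (prints YAML only; review before applying):
#   render_timeout_retry hello 2s 2 1s
```

In the lab namespace a VirtualService named hello already carries the weighted routes, so applying this output there would replace them; in that case, merge the timeout and retries fields into the existing route instead.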
Reliability best practices
- Test policies in staging with realistic traffic before production.
- Implement progressive delivery with automated rollback criteria.
- Avoid “big bang” mTLS strict across all namespaces; migrate gradually.
Operations best practices
- Define runbooks for:
- sidecar injection issues
- mesh policy rollback
- gateway overload
- certificate and mTLS debugging
- Track mesh version compatibility with ACK upgrades.
- Keep a “break glass” process: ability to disable injection or remove routing rules in an incident.
Governance/tagging/naming best practices
- Name Istio resources predictably: vs-<service>-<purpose>, dr-<service>
- Use Kubernetes annotations/labels for ownership: owner, cost-center, data-classification
- Document approved routing patterns (canary templates) to reduce errors.
12. Security Considerations
Identity and access model
- Operator identity: RAM users/roles; use MFA and least privilege.
- Kubernetes RBAC: controls who can deploy and configure mesh resources.
- Workload identity: typically based on Kubernetes service accounts; used for mTLS and policy decisions.
Encryption
- In transit (east-west): mTLS between sidecars when enabled.
- North-south TLS: typically terminated at ingress gateway or load balancer; decide where TLS should terminate based on compliance and observability needs.
- At rest: depends on your storage backends (SLS, disk encryption for nodes, etc.). ASM itself focuses on traffic governance, not data-at-rest encryption.
Network exposure
- Minimize public exposure:
- Use private load balancers for internal meshes when possible
- Restrict security group rules to known sources
- Use network segmentation:
- separate VPCs/environments (dev/stage/prod)
- Kubernetes NetworkPolicies (mesh is not a replacement)
Secrets handling
- Avoid embedding credentials in mesh configs.
- Use Kubernetes Secrets with encryption at rest where available, or integrate a secrets manager.
- Restrict who can read Secrets and who can exec into pods.
Audit/logging
- Use ActionTrail (Alibaba Cloud audit logging) for control-plane API actions (verify coverage for ASM actions).
- Log changes to mesh resources via Git and Kubernetes audit logs (if enabled).
- Store gateway/proxy access logs in SLS with appropriate retention.
Compliance considerations
- mTLS helps with encryption in transit requirements.
- Policy enforcement supports segmentation and least privilege.
- Ensure logs do not capture sensitive headers/body content; sanitize as needed.
Common security mistakes
- Enabling STRICT mTLS without inventorying non-mesh callers
- Overly permissive authorization policies (“allow all”)
- Exposing gateways publicly without WAF/rate-limiting controls
- Forgetting to restrict write access to Istio CRDs (otherwise anyone with that access can reroute production traffic)
Secure deployment recommendations
- Start with PERMISSIVE, then move to STRICT namespace by namespace.
- Use AuthorizationPolicy to implement least privilege between services.
- Protect gateways with:
- WAF/API gateway (if needed)
- rate limiting (where supported)
- strict TLS policies
- Ensure you have a rollback path for mesh configs.
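A least-privilege AuthorizationPolicy for a single caller might look like the sketch below. The assumptions are: the caller runs in the same namespace as the target, mTLS is in effect (principals are only populated for mTLS traffic), and the trust domain is the Istio default cluster.local, which may differ in ASM. render_allow_policy and its arguments are hypothetical.

```shell
# Hypothetical helper: prints an ALLOW policy that lets only the service
# account $3 in namespace $1 call workloads labeled app=$2 in that namespace.
render_allow_policy() {
  if [ "$#" -ne 3 ]; then
    echo "usage: render_allow_policy <namespace> <app> <client-service-account>" >&2
    return 1
  fi
  cat <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: $2-allow-$3
  namespace: $1
spec:
  selector:
    matchLabels:
      app: $2
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/$1/sa/$3"]
EOF
}

# Usage (prints YAML only; review before applying):
#   render_allow_policy asm-demo hello default
```

Once an ALLOW policy selects a workload, requests that match no rule are denied, so inventory all legitimate callers before applying it.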
13. Limitations and Gotchas
Because ASM is a managed service mesh, expect both mesh-level and managed-service constraints.
Known limitations (common in managed meshes)
- Sidecar overhead: increases pod resource usage and node cost.
- Complex debugging: failures may involve app + proxy + policy layers.
- Protocol edge cases: some protocols require specific configuration; validate for gRPC, WebSockets, and non-HTTP traffic.
- CRD/API version mismatch: your manifests must match the Istio API versions installed by ASM.
Quotas and scaling constraints
- Limits on number of clusters per mesh instance (verify)
- Limits on number of proxies/workloads or config size (verify)
- Load balancer quotas and bandwidth caps can become bottlenecks
Regional constraints
- ASM availability and features vary by region.
- Some observability integrations may be region-dependent.
Pricing surprises
- Log volume from access logs and high-cardinality metrics
- Multiple load balancers for gateways across environments
- Additional compute from sidecars and gateways
Compatibility issues
- ACK cluster versions vs ASM-supported Istio versions
- CNI plugins and network policies interaction (validate your CNI setup)
- Ingress controller interplay: using Kubernetes Ingress vs Istio Gateway patterns can cause confusion
Operational gotchas
- Injection labels applied after deployment require restarts.
- Traffic routing changes can have immediate blast radius—use change control.
- mTLS strict can break:
- non-mesh pods calling services
- external clients without gateways
- Misconfigured retries can amplify outages.
Migration challenges
- Migrating from self-managed Istio to ASM requires careful planning:
- CRD compatibility
- certificate/identity changes
- gateway replacement
- Plan a staged cutover: mirror traffic, then gradually shift.
Vendor-specific nuances
- Console workflows and managed components may differ from upstream Istio guides.
- Some features may be offered via Alibaba Cloud integrations rather than upstream components—verify in official docs.
14. Comparison with Alternatives
Options to consider
- Self-managed Istio on ACK
- Alibaba Cloud MSE (Microservices Engine) governance features (not a service mesh; more app/runtime governance—validate fit)
- Kubernetes Ingress + service-level libraries (no mesh)
- Other clouds’ meshes: AWS App Mesh, Google Anthos Service Mesh, etc.
- Open-source alternatives: Linkerd (self-managed), Consul service mesh (self-managed)
| Option | Best For | Strengths | Weaknesses | When to Choose |
|---|---|---|---|---|
| Alibaba Cloud Service Mesh (ASM) | Teams on ACK wanting managed mesh | Managed control plane; Istio-compatible governance; integrates with Alibaba Cloud | Sidecar overhead; managed constraints; feature variance by region/edition | You want Istio-style mesh without running it yourself |
| Self-managed Istio on ACK | Deep customization and full control | Full control over versions and addons; can customize heavily | High ops burden (upgrades, CVEs, HA); requires expertise | You need features not exposed in ASM or strict control over versions |
| ACK + Ingress (no mesh) | Smaller systems, fewer services | Simpler; less overhead; familiar Kubernetes model | No consistent east-west governance; limited L7 control between services | Early-stage microservices or low complexity environments |
| Alibaba Cloud MSE (governance, registry, gateway) | App-level microservice governance (depending on modules) | Strong governance patterns for microservices ecosystems (validate) | Not a service mesh; may not provide uniform proxy-based control | When you need registry/config/governance rather than mesh sidecars |
| AWS App Mesh / GCP Anthos Service Mesh | Workloads in those clouds | Native integration with their ecosystems | Not applicable if you’re standardizing on Alibaba Cloud | Multi-cloud teams standardizing per-cloud services |
| Linkerd / Consul (self-managed) | Lightweight mesh or alternative feature sets | Potentially simpler (Linkerd) or integrated service registry (Consul) | Still self-managed; migration complexity; different APIs | If you prefer non-Istio mesh design and accept self-ops |
15. Real-World Example
Enterprise example: Regional e-commerce platform
- Problem: Hundreds of microservices on ACK, frequent releases, inconsistent retries/timeouts, security team mandates encryption-in-transit and service segmentation.
- Proposed architecture:
- One ASM instance per environment (prod/stage), attached to multiple ACK clusters per region
- Ingress gateways behind private/public load balancers
- mTLS enabled gradually; AuthorizationPolicies restrict service access
- Centralized telemetry to SLS + ARMS; dashboards for golden signals per service
- Why ASM was chosen:
- Managed control plane reduces operational burden vs self-managed Istio
- Istio CRDs enable progressive delivery across service graph
- Security policies enforce internal zero-trust without rewriting apps
- Expected outcomes:
- Reduced incident frequency during releases via canary rollouts
- Improved MTTR due to consistent service telemetry
- Better compliance posture via mTLS and auditable policies
Startup/small-team example: SaaS API platform
- Problem: A small team runs 15 microservices on ACK; they need canary releases and better tracing without building custom libraries.
- Proposed architecture:
- Single ASM instance attached to one ACK cluster
- Mesh only critical namespaces initially
- Use VirtualService/DestinationRule for canary
- Keep observability lightweight: minimal logs, sampled tracing
- Why ASM was chosen:
- Quick governance capabilities with minimal platform engineering headcount
- Standard patterns for traffic splitting and timeouts
- Expected outcomes:
- Faster deployments with controlled risk
- Actionable visibility into service latency bottlenecks
- Clear growth path to stricter security later
16. FAQ
1) Is Alibaba Cloud Service Mesh (ASM) the same as Istio?
ASM is a managed service mesh offering that is typically Istio-compatible. You use Istio-style CRDs for policies, but the control plane lifecycle is managed by Alibaba Cloud. Exact versions/features can differ—verify in ASM docs.
2) Do I need to change application code to use ASM?
Often no. Many features (routing, mTLS, telemetry) work via sidecars. Some advanced tracing or auth flows may require app configuration.
3) What is the main tradeoff of using ASM?
Operational simplicity and consistent governance vs added runtime overhead (sidecars) and policy complexity.
4) Does ASM work only with ACK?
ASM is primarily used with ACK in Alibaba Cloud. If additional cluster types are supported, verify in official docs.
5) How do I onboard services gradually?
Common approach: enable injection per namespace, then roll out workloads in that namespace. Start with non-critical services.
6) What happens if I misconfigure a VirtualService?
You can reroute or break production traffic. Use staging validation, code review, and small incremental changes.
7) How much latency does a sidecar add?
It depends on workload, traffic rate, and configuration. Measure in your environment; expect some overhead.
8) Can I do canary releases without ASM?
Yes (e.g., with Ingress controllers or application-level routing), but ASM provides consistent service-to-service routing and observability.
9) Does ASM provide a built-in dashboard like Kiali?
Some meshes provide dashboards or integrations; availability varies. Verify ASM’s current observability tooling and integrations.
10) How do I control which pods get sidecars?
Typically by labeling namespaces for injection, or using pod annotations. ASM may support revision-based injection—verify.
11) Can I enforce mTLS for only one namespace?
Yes in typical Istio models via PeerAuthentication in that namespace, but confirm with your ASM-supported API versions.
12) How do I prevent one team’s services from calling another team’s services?
Use AuthorizationPolicies (mesh) plus Kubernetes RBAC/NetworkPolicies for defense in depth.
13) Is ASM suitable for stateful workloads?
It can be, but sidecar overhead and traffic patterns matter. Validate performance for databases and stateful services carefully.
14) What’s the difference between ingress gateway and Kubernetes Ingress?
Kubernetes Ingress is a generic Kubernetes resource typically implemented by an ingress controller. Istio Gateway is mesh-specific and integrates with mesh routing policies. You can use either depending on design.
15) How do I estimate cost impact before adopting ASM?
Model: ASM service fee + ACK costs + sidecar overhead + load balancers + observability ingestion/storage + network egress. Use the official pricing calculator and run a small pilot.
16) Can I run multiple meshes in one cluster?
Some Istio deployments support revision-based or multi-control-plane setups; managed ASM support varies. Verify in official docs.
17) What is the recommended way to roll back a failed canary?
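Set the VirtualService weights back to 100% for the old version. Removing the VirtualService also reverts to Kubernetes default load balancing, but that spreads traffic across all ready pods behind the Service, including the new version, so scale the new version down first if you take that route.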
17. Top Online Resources to Learn Service Mesh (ASM)
| Resource Type | Name | Why It Is Useful |
|---|---|---|
| Official documentation | Alibaba Cloud ASM Documentation — https://www.alibabacloud.com/help/en/asm/ | Canonical feature set, setup steps, API compatibility notes |
| Official product page | Alibaba Cloud Service Mesh (ASM) — https://www.alibabacloud.com/product/asm | High-level overview and entry points to pricing/docs |
| Official pricing | ASM pricing (via product/pricing page; region-dependent) — start at https://www.alibabacloud.com/product/asm | Explains billing dimensions; avoids outdated third-party numbers |
| Pricing calculator | Alibaba Cloud Pricing Calculator — https://www.alibabacloud.com/pricing/calculator | Build region-specific estimates including dependent services |
| Kubernetes service | ACK Documentation — https://www.alibabacloud.com/help/en/ack/ | Cluster creation, kubeconfig, networking fundamentals |
| Networking | VPC Documentation — https://www.alibabacloud.com/help/en/vpc/ | Required for multi-cluster design and secure connectivity |
| Observability | Log Service (SLS) Docs — https://www.alibabacloud.com/help/en/sls/ | Centralized log collection and retention planning |
| Observability | ARMS Docs — https://www.alibabacloud.com/help/en/arms/ | Monitoring/APM options for mesh telemetry (verify integrations) |
| Upstream reference | Istio Documentation — https://istio.io/latest/docs/ | Understanding CRDs and mesh concepts (adapt to ASM-supported versions) |
| Hands-on examples | Istio Samples — https://github.com/istio/istio/tree/master/samples | Sample apps and routing examples; use as conceptual guidance |
18. Training and Certification Providers
| Institute | Suitable Audience | Likely Learning Focus | Mode | Website URL |
|---|---|---|---|---|
| DevOpsSchool.com | DevOps engineers, SREs, platform teams | Kubernetes, service mesh concepts, cloud DevOps practices | Check website | https://www.devopsschool.com/ |
| ScmGalaxy.com | Beginners to intermediate engineers | DevOps fundamentals, tooling, CI/CD, cloud basics | Check website | https://www.scmgalaxy.com/ |
| CLoudOpsNow.in | Cloud operations engineers | Cloud operations, monitoring, incident response, platform ops | Check website | https://www.cloudopsnow.in/ |
| SreSchool.com | SREs and operations teams | SRE principles, reliability engineering, observability | Check website | https://www.sreschool.com/ |
| AiOpsSchool.com | Ops and monitoring engineers | AIOps concepts, monitoring automation, analytics | Check website | https://www.aiopsschool.com/ |
19. Top Trainers
| Platform/Site | Likely Specialization | Suitable Audience | Website URL |
|---|---|---|---|
| RajeshKumar.xyz | DevOps/Kubernetes training content (verify offerings) | Engineers seeking guided learning | https://rajeshkumar.xyz/ |
| devopstrainer.in | DevOps and container training (verify offerings) | Beginners to advanced DevOps learners | https://www.devopstrainer.in/ |
| devopsfreelancer.com | DevOps freelance/training services (verify offerings) | Teams needing short-term expertise | https://www.devopsfreelancer.com/ |
| devopssupport.in | DevOps support and enablement (verify offerings) | Ops teams needing troubleshooting help | https://www.devopssupport.in/ |
20. Top Consulting Companies
| Company Name | Likely Service Area | Where They May Help | Consulting Use Case Examples | Website URL |
|---|---|---|---|---|
| cotocus.com | DevOps, cloud, platform engineering (verify portfolio) | Platform design, Kubernetes operations, delivery pipelines | Mesh adoption planning, GitOps for Istio CRDs, production readiness reviews | https://cotocus.com/ |
| DevOpsSchool.com | DevOps consulting and enablement (verify offerings) | Training + consulting for DevOps and containers | ASM pilot, mesh governance patterns, SRE runbooks and rollout strategy | https://www.devopsschool.com/ |
| DEVOPSCONSULTING.IN | DevOps consulting (verify portfolio) | DevOps transformation and operations | Observability integration, cost optimization, incident response processes for meshed apps | https://www.devopsconsulting.in/ |
21. Career and Learning Roadmap
What to learn before ASM
- Kubernetes fundamentals:
- Pods, Deployments, Services, Ingress
- namespaces, labels/selectors
- RBAC, ConfigMaps/Secrets
- Networking basics:
- L4 vs L7, DNS, TLS/mTLS
- load balancing concepts
- Observability basics:
- metrics (RED/USE), logs, tracing
- Alibaba Cloud essentials:
- VPC concepts, security groups
- ACK cluster lifecycle
- RAM permissions
What to learn after ASM
- Advanced mesh security:
- Authorization policies, JWT validation patterns (if supported)
- defense in depth with NetworkPolicies
- Progressive delivery at scale:
- automated canary analysis
- service-level objectives (SLOs)
- Multi-cluster architecture:
- service discovery patterns
- traffic failover strategies
- Platform engineering practices:
- golden path templates for teams
- policy-as-code and GitOps workflows
Job roles that use it
- Cloud/Platform Engineer
- DevOps Engineer
- Site Reliability Engineer (SRE)
- Kubernetes Administrator
- Security Engineer (cloud-native)
- Solutions Architect (microservices modernization)
Certification path
Alibaba Cloud certifications and tracks can change. If you want a certification-oriented path:
– Start with Alibaba Cloud foundational certs (if available)
– Add Kubernetes certifications (e.g., CKA/CKAD) for transferable skills
– Build a portfolio of mesh projects and incident learnings
For official Alibaba Cloud certification availability, verify on Alibaba Cloud certification pages:
– https://edu.alibabacloud.com/ (verify certification listings)
Project ideas for practice
- Create a “policy library” repo: standard VirtualService/DestinationRule templates
- Implement namespace-by-namespace mTLS rollout plan with rollback steps
- Build a mesh observability dashboard: latency p95, error rate, top dependencies
- Run a chaos test: introduce fault injection (if supported) and observe blast radius
- Cost study: measure sidecar overhead across 10 services and propose right-sizing
22. Glossary
- ACK: Alibaba Cloud Container Service for Kubernetes.
- ASM (Service Mesh): Alibaba Cloud managed service mesh offering (Istio-compatible model).
- Service Mesh: Infrastructure layer that manages service-to-service communication.
- Control plane: The component that distributes configuration and manages certificates/policies.
- Data plane: The proxies (sidecars/gateways) that handle traffic and enforce policies.
- Sidecar: A proxy container running alongside an app container in the same pod.
- Envoy: A popular proxy used as the data plane in many Istio-based meshes.
- mTLS: Mutual TLS; both client and server authenticate each other and encrypt traffic.
- VirtualService: Istio resource that defines routing rules (weights, matches, rewrites).
- DestinationRule: Istio resource that defines subsets and traffic policies for a service.
- Gateway: Istio resource that configures edge proxy listeners for inbound traffic.
- Namespace: Kubernetes partitioning construct often used for teams/environments.
- Canary release: Gradually shifting traffic to a new version to reduce risk.
- Zero trust: Security model that requires explicit authorization and strong identity even inside the network.
23. Summary
Alibaba Cloud Service Mesh (ASM) is a Container-category service that brings a managed service mesh (commonly Istio-style) to ACK workloads, enabling consistent traffic management, mTLS security, and observability across microservices.
It matters because microservices sprawl makes releases, security, and troubleshooting difficult. ASM centralizes these concerns at the network/proxy layer so teams can ship safer and operate more reliably.
Cost-wise, plan beyond the ASM service fee: the biggest drivers are often sidecar compute overhead, gateway load balancers, and telemetry storage/ingestion. Security-wise, treat mesh policies as production code, roll out mTLS gradually, and lock down who can change routing.
Use ASM when you need standardized governance across many Kubernetes services and want a managed control plane. Skip it for very small systems or when sidecar complexity outweighs benefits.
Next step: follow the official ASM “Getting Started” documentation for your region, then extend this lab by adding AuthorizationPolicy and integrating mesh telemetry with SLS/ARMS based on your organization’s observability standards.