{"id":106,"date":"2026-04-12T20:29:09","date_gmt":"2026-04-12T20:29:09","guid":{"rendered":"https:\/\/www.devopsschool.com\/tutorials\/alibaba-cloud-cloudmonitor-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-developer-tools\/"},"modified":"2026-04-12T20:29:09","modified_gmt":"2026-04-12T20:29:09","slug":"alibaba-cloud-cloudmonitor-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-developer-tools","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/tutorials\/alibaba-cloud-cloudmonitor-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-developer-tools\/","title":{"rendered":"Alibaba Cloud CloudMonitor Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Developer Tools"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Category<\/h2>\n\n\n\n<p>Developer Tools<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Introduction<\/h2>\n\n\n\n<p>CloudMonitor is Alibaba Cloud\u2019s native monitoring and alerting service for cloud resources and workloads. It helps you collect metrics, visualize health and performance, and trigger notifications or automated responses when something abnormal happens\u2014before users notice.<\/p>\n\n\n\n<p>In simple terms: CloudMonitor watches your Alibaba Cloud services (like ECS, RDS, SLB, and many others), tracks key performance indicators (CPU, memory, latency, errors, throughput, and more depending on the service), and sends alerts when thresholds are breached.<\/p>\n\n\n\n<p>Technically, CloudMonitor is a metrics and events observability layer integrated into the Alibaba Cloud control plane. It provides built-in metric collection for many Alibaba Cloud services, plus the ability to ingest custom metrics (for your own applications), define alarm rules, route notifications, and build dashboards for operations teams.<\/p>\n\n\n\n<p>CloudMonitor solves a core production problem: you can\u2019t operate what you can\u2019t observe. 
Without consistent monitoring and alerting, teams discover failures late, struggle to troubleshoot, and can\u2019t reliably prove SLO\/SLA compliance or capacity needs.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">2. What is CloudMonitor?<\/h2>\n\n\n\n<p>CloudMonitor is an Alibaba Cloud service designed to monitor cloud resources and applications by collecting metrics\/events, presenting them in dashboards, and enabling alerting and notification workflows.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Official purpose (what it\u2019s for)<\/h3>\n\n\n\n<p>CloudMonitor\u2019s purpose is to provide:\n&#8211; <strong>Monitoring<\/strong> of Alibaba Cloud services (built-in metrics)\n&#8211; <strong>Alerting<\/strong> via alarm rules and notification channels\n&#8211; <strong>Visualization<\/strong> via dashboards\/metric charts\n&#8211; <strong>Custom monitoring<\/strong> for user-defined metrics (where supported)<\/p>\n\n\n\n<p>(For the authoritative scope and feature list, verify in official docs: https:\/\/www.alibabacloud.com\/help\/en\/cloudmonitor\/)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Core capabilities<\/h3>\n\n\n\n<p>Common CloudMonitor capabilities include:\n&#8211; <strong>Cloud service monitoring<\/strong>: collect and chart metrics from supported Alibaba Cloud services\n&#8211; <strong>Host monitoring<\/strong>: OS-level metrics for ECS (often requires an agent; verify per OS\/region)\n&#8211; <strong>Custom metrics<\/strong>: push application\/business metrics into CloudMonitor (API-based)\n&#8211; <strong>Alert rules<\/strong>: threshold\/condition-based alarms\n&#8211; <strong>Notification management<\/strong>: contacts, contact groups, and notification channels (availability varies; verify)\n&#8211; <strong>Dashboards<\/strong>: view metrics across multiple resources in one place\n&#8211; <strong>Events \/ event-driven monitoring<\/strong>: view resource\/system events and alert on them (verify exact event 
sources in docs)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Major components (conceptual model)<\/h3>\n\n\n\n<p>CloudMonitor typically includes:\n&#8211; <strong>Metric collection<\/strong><br\/>\n  Built-in service metrics + optional agent-based host metrics + custom metric ingestion.\n&#8211; <strong>Metric storage &amp; query<\/strong><br\/>\n  Time-series storage with query APIs (retention and granularity depend on metric type and product rules; verify).\n&#8211; <strong>Dashboards \/ visualization<\/strong><br\/>\n  Console-based dashboards and charts; some environments integrate with Grafana (often via Prometheus or other services\u2014verify for your setup).\n&#8211; <strong>Alarming &amp; notification<\/strong><br\/>\n  Alarm rules evaluate metrics and route alerts to contacts\/channels.\n&#8211; <strong>Access control &amp; audit<\/strong><br\/>\n  Controlled via Alibaba Cloud RAM policies and audited via ActionTrail (verify logging coverage).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Service type<\/h3>\n\n\n\n<p>CloudMonitor is a <strong>managed monitoring\/alerting platform service<\/strong> integrated across Alibaba Cloud.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Scope (regional\/global\/account\/project)<\/h3>\n\n\n\n<p>CloudMonitor is <strong>account-scoped<\/strong> (per Alibaba Cloud account \/ Resource Account under a Resource Directory), while:\n&#8211; <strong>Metrics are typically tied to the region of the monitored resource<\/strong> (for example, ECS in cn-hangzhou vs ap-southeast-1).\n&#8211; The <strong>CloudMonitor console experience is centralized<\/strong>, but you select regions\/resources for queries and alarms.<\/p>\n\n\n\n<p>Exact regional behavior (especially for custom metrics and event sources) can vary\u2014<strong>verify in official docs<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How it fits into the Alibaba Cloud ecosystem<\/h3>\n\n\n\n<p>CloudMonitor is part of the operational foundation for Alibaba Cloud 
workloads:\n&#8211; Works with compute (ECS), networking (SLB\/ALB), storage (OSS), databases (RDS and others), and many SaaS services.\n&#8211; Complements (does not replace) log-focused products like <strong>Simple Log Service (SLS)<\/strong> and application tracing\/APM products like <strong>ARMS<\/strong>. A common pattern is:\n  &#8211; <strong>CloudMonitor for infrastructure\/service metrics + alarms<\/strong>\n  &#8211; <strong>SLS for logs + log analytics + alerting on log patterns<\/strong>\n  &#8211; <strong>ARMS for application performance monitoring and distributed tracing<\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">3. Why use CloudMonitor?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Business reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Reduced downtime and faster incident response<\/strong>: Detect and alert on issues early.<\/li>\n<li><strong>Operational visibility for stakeholders<\/strong>: Dashboards provide a shared source of truth.<\/li>\n<li><strong>SLA\/SLO support<\/strong>: Monitoring is necessary for reliability commitments.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Technical reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Native integration<\/strong> with Alibaba Cloud services: built-in metrics reduce instrumentation effort.<\/li>\n<li><strong>Unified monitoring plane<\/strong>: standardize alerting patterns across teams and services.<\/li>\n<li><strong>Custom metrics<\/strong> (where supported) let you monitor business KPIs alongside infrastructure signals.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Alarm automation<\/strong>: notify on-call engineers, trigger runbooks, or integrate with incident workflows.<\/li>\n<li><strong>Capacity planning<\/strong>: trend analysis helps forecast scale needs.<\/li>\n<li><strong>Change impact visibility<\/strong>: detect 
regression after deployments or infrastructure changes.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/compliance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Auditability<\/strong>: monitoring\/alerting supports compliance controls (detect anomalies, track operational status).<\/li>\n<li><strong>Separation of duties<\/strong>: RAM policies can limit who can modify alarm rules and notification channels.<\/li>\n<li><strong>Continuous control validation<\/strong>: confirm that key resources are healthy and within expected bounds.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scalability\/performance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Handle growth<\/strong>: consistent metrics across regions\/services support large-scale operations.<\/li>\n<li><strong>Performance baselines<\/strong>: define \u201cnormal\u201d and alert when deviations occur.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should choose CloudMonitor<\/h3>\n\n\n\n<p>Choose CloudMonitor when you:\n&#8211; Primarily run workloads on Alibaba Cloud and want a <strong>native<\/strong> monitoring platform.\n&#8211; Need <strong>standard service metrics<\/strong> and operational alarms quickly.\n&#8211; Want to <strong>centralize<\/strong> dashboards and alarms across Alibaba Cloud services.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should not choose it (or should augment it)<\/h3>\n\n\n\n<p>CloudMonitor alone may not be enough when you:\n&#8211; Need <strong>deep application tracing\/APM<\/strong> \u2192 consider <strong>ARMS<\/strong> (verify product fit).\n&#8211; Need <strong>full log analytics, indexing, and search<\/strong> \u2192 use <strong>Simple Log Service (SLS)<\/strong>.\n&#8211; Require a single observability tool across multiple clouds\/on-prem with a unified backend \u2192 consider <strong>Prometheus + Grafana<\/strong> or a third-party observability platform (plus integration).<\/p>\n\n\n\n<hr 
class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">4. Where is CloudMonitor used?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Industries<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>SaaS and internet services<\/li>\n<li>E-commerce and mobile apps<\/li>\n<li>FinTech and payments (with strict availability monitoring)<\/li>\n<li>Gaming (latency and regional performance monitoring)<\/li>\n<li>Manufacturing\/IoT backends (device ingestion systems on Alibaba Cloud)<\/li>\n<li>Education and media streaming platforms<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Team types<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>DevOps and SRE teams<\/li>\n<li>Platform engineering teams<\/li>\n<li>Cloud infrastructure operations<\/li>\n<li>Security operations (for certain operational anomaly detection)<\/li>\n<li>Application teams (for service-level dashboards and KPIs)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Workloads<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Web applications (ECS + SLB\/ALB + RDS)<\/li>\n<li>Containerized platforms (often augmented with Prometheus\/Kubernetes metrics\u2014verify your product stack)<\/li>\n<li>Batch processing and scheduled workloads<\/li>\n<li>API gateways and microservices (often paired with ARMS)<\/li>\n<li>Storage-heavy workloads using OSS<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Architectures<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Single-region production with HA inside a region<\/li>\n<li>Multi-region active-active or active-passive<\/li>\n<li>Multi-account (Resource Directory) with centralized ops dashboards<\/li>\n<li>Hybrid observability (CloudMonitor + SLS + ARMS)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Production vs dev\/test usage<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Production<\/strong>: comprehensive alarms (availability, error rate, saturation), escalation paths, on-call routing, tighter IAM controls, dashboards for 
NOC\/SRE.<\/li>\n<li><strong>Dev\/test<\/strong>: fewer alarms, focus on debugging and performance tests; careful cost control (custom metrics and probes can add cost).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">5. Top Use Cases and Scenarios<\/h2>\n\n\n\n<p>Below are realistic scenarios where CloudMonitor is commonly used.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) ECS CPU saturation alerting<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Instances become slow or unresponsive due to CPU exhaustion.<\/li>\n<li><strong>Why CloudMonitor fits<\/strong>: ECS exposes built-in CPU utilization metrics; CloudMonitor alarms can notify on thresholds.<\/li>\n<li><strong>Scenario<\/strong>: Trigger an alarm when CPU &gt; 85% for 5 minutes on production ECS instances.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) ECS disk and memory monitoring (host monitoring)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Out-of-memory kills or disk-full errors cause outages.<\/li>\n<li><strong>Why CloudMonitor fits<\/strong>: With host monitoring\/agent (where supported), you can capture OS-level memory\/disk usage.<\/li>\n<li><strong>Scenario<\/strong>: Alert when <code>\/<\/code> filesystem usage &gt; 90% or memory available &lt; 10%.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) RDS connection and storage threshold alarms<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Applications fail due to max connections reached or storage exhaustion.<\/li>\n<li><strong>Why CloudMonitor fits<\/strong>: RDS provides operational metrics; alarms help prevent incident escalation.<\/li>\n<li><strong>Scenario<\/strong>: Notify DBAs when connections exceed 80% of limit or storage approaches capacity.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Load balancer health and traffic anomalies<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Sudden drops in traffic or back-end health issues.<\/li>\n<li><strong>Why CloudMonitor fits<\/strong>: SLB\/ALB metrics and health indicators can be monitored.<\/li>\n<li><strong>Scenario<\/strong>: Alert when healthy backend server count drops below threshold.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) OSS request error monitoring<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Increased 4xx\/5xx errors from OSS disrupt downloads\/uploads.<\/li>\n<li><strong>Why CloudMonitor fits<\/strong>: Many OSS metrics are observable; alarms catch regression quickly.<\/li>\n<li><strong>Scenario<\/strong>: Alert when OSS 5xx rate spikes above baseline.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) Website availability monitoring (synthetic\/site monitoring)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Users report \u201csite down\u201d but infra metrics look fine.<\/li>\n<li><strong>Why CloudMonitor fits<\/strong>: CloudMonitor commonly offers site\/synthetic monitoring (verify the exact \u201csite monitoring\u201d feature availability in your region).<\/li>\n<li><strong>Scenario<\/strong>: Probe <code>https:\/\/api.example.com\/health<\/code> every minute from multiple locations.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Business KPI monitoring via custom metrics<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Infrastructure is healthy but orders drop or payments fail.<\/li>\n<li><strong>Why CloudMonitor fits<\/strong>: Custom metrics allow pushing business signals.<\/li>\n<li><strong>Scenario<\/strong>: Push \u201csuccessful_checkout_count\u201d metric and alert if it drops to near zero.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Release regression detection<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: A deployment increases latency and error 
rate.<\/li>\n<li><strong>Why CloudMonitor fits<\/strong>: Dashboards compare pre\/post release patterns; alarms catch threshold breaches.<\/li>\n<li><strong>Scenario<\/strong>: After a release, monitor 95th percentile latency and error counts.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Cost anomaly early-warning (indirect)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: A runaway job increases load and resource consumption.<\/li>\n<li><strong>Why CloudMonitor fits<\/strong>: Resource utilization spikes are often the earliest indicator of unexpected cost growth.<\/li>\n<li><strong>Scenario<\/strong>: Alert when outbound traffic or CPU usage grows unusually fast.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) Multi-account centralized NOC dashboard<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Large organizations struggle to view service health across accounts\/teams.<\/li>\n<li><strong>Why CloudMonitor fits<\/strong>: Account-scoped monitoring with RAM permissions and cross-account approaches (verify best practice patterns in docs).<\/li>\n<li><strong>Scenario<\/strong>: Platform team builds standard dashboards and enforces baseline alarms across accounts.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">11) Event-driven operational alerts (maintenance\/instance lifecycle)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Unexpected maintenance, restarts, or lifecycle actions cause disruption.<\/li>\n<li><strong>Why CloudMonitor fits<\/strong>: Event monitoring can surface system\/resource events (verify the event types available).<\/li>\n<li><strong>Scenario<\/strong>: Alert when an ECS instance is stopped\/started unexpectedly.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">12) SLO-driven alerting for critical APIs (combined approach)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: You need service-level alerts (latency\/error 
budgets), not just CPU.<\/li>\n<li><strong>Why CloudMonitor fits<\/strong>: CloudMonitor metrics + custom metrics can approximate SLO signals; for deeper tracing use ARMS.<\/li>\n<li><strong>Scenario<\/strong>: Push request success rate as a custom metric and alarm on error budget burn.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">6. Core Features<\/h2>\n\n\n\n<blockquote>\n<p>Feature availability can differ by region, account type, or product edition. <strong>Verify in official docs<\/strong> for your environment: https:\/\/www.alibabacloud.com\/help\/en\/cloudmonitor\/<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">1) Cloud service monitoring (built-in metrics)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Collects metrics from supported Alibaba Cloud services automatically.<\/li>\n<li><strong>Why it matters<\/strong>: You get immediate visibility without installing agents or building collectors.<\/li>\n<li><strong>Practical benefit<\/strong>: Faster onboarding; consistent metric naming and dashboards.<\/li>\n<li><strong>Caveats<\/strong>: Not all services expose the same granularity; some metrics may have collection delays. Verify metric resolution\/retention per service.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Host monitoring for ECS (agent-based OS metrics)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Collects OS-level metrics such as memory, disk usage, processes (depending on supported agent\/OS).<\/li>\n<li><strong>Why it matters<\/strong>: CPU alone doesn\u2019t explain many incidents (OOM, disk full, inode exhaustion).<\/li>\n<li><strong>Practical benefit<\/strong>: Alerts on memory and disk capacity prevent avoidable outages.<\/li>\n<li><strong>Caveats<\/strong>: Requires installing and maintaining an agent; ensure outbound connectivity and proper permissions. 
Verify supported OS versions and agent install steps.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Custom monitoring (custom metrics ingestion)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Lets you push your own metrics into CloudMonitor via API.<\/li>\n<li><strong>Why it matters<\/strong>: Infrastructure health does not always correlate with business health.<\/li>\n<li><strong>Practical benefit<\/strong>: Monitor KPIs like order counts, queue depth, feature flags, and cron job success.<\/li>\n<li><strong>Caveats<\/strong>: Custom metrics may be billable and subject to quotas; verify ingestion rate, retention, and pricing.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Alarm rules (threshold-based alerting)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Evaluates metric conditions and triggers alarms when rules match (for example, CPU &gt; 85% for 5 minutes).<\/li>\n<li><strong>Why it matters<\/strong>: Automates detection and reduces mean time to detect (MTTD).<\/li>\n<li><strong>Practical benefit<\/strong>: Standard \u201cgolden signal\u201d alerting across services.<\/li>\n<li><strong>Caveats<\/strong>: Poorly tuned thresholds cause alert fatigue; use baselines and severity tiers.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) Notification management (contacts, groups, channels)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Routes alarms to contacts and contact groups through configured notification methods (email\/SMS\/webhook options vary\u2014verify).<\/li>\n<li><strong>Why it matters<\/strong>: The best alert is useless if it doesn\u2019t reach the right responders.<\/li>\n<li><strong>Practical benefit<\/strong>: On-call routing; team-based ownership.<\/li>\n<li><strong>Caveats<\/strong>: SMS\/voice notifications often have additional costs; confirm notification pricing and regional availability.<\/li>\n<\/ul>\n\n\n\n<h3 
class=\"wp-block-heading\">6) Dashboards and visualization<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Provides charts, dashboards, and multi-metric views.<\/li>\n<li><strong>Why it matters<\/strong>: Operations work is faster with curated dashboards.<\/li>\n<li><strong>Practical benefit<\/strong>: Single page view for service health and incident triage.<\/li>\n<li><strong>Caveats<\/strong>: Dashboard features can vary; some advanced visualization needs Grafana\/Prometheus.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Metric query APIs \/ OpenAPI integration<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Provides APIs to query metrics and manage alarms programmatically.<\/li>\n<li><strong>Why it matters<\/strong>: Enables \u201cmonitoring as code\u201d patterns.<\/li>\n<li><strong>Practical benefit<\/strong>: Automate baseline alarms for every new resource; integrate with CI\/CD.<\/li>\n<li><strong>Caveats<\/strong>: API rate limits and authentication via AccessKey\/RAM roles; secure key management is critical.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Event monitoring (resource\/system events)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Surfaces events about resource state changes and platform operations (coverage varies; verify event sources).<\/li>\n<li><strong>Why it matters<\/strong>: Some incidents start as events (maintenance, instance reboot, failed scaling).<\/li>\n<li><strong>Practical benefit<\/strong>: Faster correlation between \u201cwhat changed\u201d and \u201cwhat broke.\u201d<\/li>\n<li><strong>Caveats<\/strong>: Event completeness differs by service; do not rely on events alone for availability monitoring.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Tag-based monitoring and grouping (where supported)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Use tags to filter\/group resources 
in dashboards and alarm targeting.<\/li>\n<li><strong>Why it matters<\/strong>: Tagging is essential at scale.<\/li>\n<li><strong>Practical benefit<\/strong>: Team ownership, environment separation (prod\/stage), cost allocation.<\/li>\n<li><strong>Caveats<\/strong>: Requires consistent tagging discipline and governance.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">7. Architecture and How It Works<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">High-level service architecture<\/h3>\n\n\n\n<p>CloudMonitor sits between your resources and your operators\/automation:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p><strong>Metric sources<\/strong>\n   &#8211; Alibaba Cloud services emit metrics (ECS, RDS, SLB\/ALB, OSS, etc.).\n   &#8211; Optional host agent sends OS metrics from ECS.\n   &#8211; Your apps can push custom metrics using APIs\/SDKs.<\/p>\n<\/li>\n<li>\n<p><strong>CloudMonitor ingestion and storage<\/strong>\n   &#8211; Receives metrics and events.\n   &#8211; Stores time-series data with defined retention\/granularity rules (verify specifics per metric).<\/p>\n<\/li>\n<li>\n<p><strong>Evaluation and alerting<\/strong>\n   &#8211; Alarm rules periodically evaluate conditions.\n   &#8211; Alarm state changes trigger notifications.<\/p>\n<\/li>\n<li>\n<p><strong>Visualization and access<\/strong>\n   &#8211; Console dashboards and charts for humans.\n   &#8211; APIs for automation and integration.<\/p>\n<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Request\/data\/control flow (typical)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Data plane<\/strong>: metrics\/events flow from services\/agents\/apps \u2192 CloudMonitor.<\/li>\n<li><strong>Control plane<\/strong>: operators define alarm rules\/dashboards \u2192 CloudMonitor configuration is stored and applied.<\/li>\n<li><strong>Notification flow<\/strong>: alarm triggers \u2192 notification system \u2192 email\/SMS\/webhook (depending on 
configuration; verify).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Integrations with related services (common)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>RAM (Resource Access Management)<\/strong>: access control for CloudMonitor operations.<\/li>\n<li><strong>ActionTrail<\/strong>: audit of API calls\/changes (verify event coverage).<\/li>\n<li><strong>Simple Log Service (SLS)<\/strong>: log collection and analysis; often paired with CloudMonitor.<\/li>\n<li><strong>ARMS<\/strong>: application performance monitoring\/tracing; complements CloudMonitor metrics.<\/li>\n<li><strong>Resource Directory<\/strong>: multi-account governance (patterns vary; verify best practices).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Dependency services<\/h3>\n\n\n\n<p>CloudMonitor is managed; you generally do not deploy dependencies yourself. Your main dependencies are:\n&#8211; Properly configured RAM permissions\n&#8211; Network access for any required agents\n&#8211; Notification endpoints (email\/SMS\/webhooks)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/authentication model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Console and API calls authenticate through Alibaba Cloud identity mechanisms.<\/li>\n<li>Programmatic access commonly uses:\n<ul class=\"wp-block-list\">\n<li><strong>RAM users<\/strong> with least-privilege policies<\/li>\n<li><strong>RAM roles<\/strong> for services\/automation where applicable (preferred over long-lived AccessKeys when possible)<\/li>\n<\/ul>\n<\/li>\n<li>Always follow least privilege; restrict \u201cwrite\u201d actions (create\/modify alarms, contacts) to ops automation or a small group.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Networking model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cloud service metrics are collected internally by Alibaba Cloud.<\/li>\n<li>Host monitoring agents (if used) may require outbound connectivity to Alibaba Cloud endpoints; exact endpoints\/ports vary. 
<strong>Verify in official docs<\/strong>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Monitoring\/logging\/governance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Treat alarms and dashboards as production configuration:\n<ul class=\"wp-block-list\">\n<li>version-control via IaC or scripts where possible<\/li>\n<li>consistent naming conventions<\/li>\n<li>tagging for ownership and environment<\/li>\n<\/ul>\n<\/li>\n<li>Audit alarm changes using ActionTrail.<\/li>\n<li>Review quotas and rate limits to avoid blind spots.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Simple architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart LR\n  subgraph AlibabaCloud[\"Alibaba Cloud Account\"]\n    ECS[\"ECS Instances\"]\n    RDS[\"RDS Database\"]\n    SLB[\"SLB\/ALB\"]\n    APP[\"Custom App Metrics (API)\"]\n  end\n\n  ECS --&gt;|Service Metrics| CMS[\"CloudMonitor\"]\n  RDS --&gt;|Service Metrics| CMS\n  SLB --&gt;|Service Metrics| CMS\n  APP --&gt;|PutCustomMetric API| CMS\n\n  CMS --&gt; DASH[\"Dashboards\"]\n  CMS --&gt; ALARM[\"Alarm Rules\"]\n  ALARM --&gt; NOTIF[\"Notifications (Email\/SMS\/Webhook*)\"]\n\n  note1[\"* Notification types vary by region\/account. 
Verify in official docs.\"]\n  NOTIF --- note1\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Production-style architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart TB\n  subgraph RD[\"Resource Directory \/ Multi-Account (Optional)\"]\n    A1[\"Prod Account\"]\n    A2[\"Shared Services Account\"]\n  end\n\n  subgraph Prod[\"Production VPC\"]\n    LB[\"SLB\/ALB\"]\n    ECSASG[\"ECS\/ASG App Tier\"]\n    RDS1[\"RDS Primary\"]\n    REDIS[\"ApsaraDB for Redis (optional)\"]\n  end\n\n  subgraph Obs[\"Observability Stack (Managed)\"]\n    CMS[\"CloudMonitor\\n(Metrics, Alarms, Dashboards)\"]\n    SLS[\"Simple Log Service\\n(Logs, Search, Alerts)\"]\n    ARMS[\"ARMS\\n(APM\/Tracing, optional)\"]\n    AT[\"ActionTrail\\n(Audit Logs)\"]\n  end\n\n  Users[\"Users\"] --&gt; LB --&gt; ECSASG --&gt; RDS1\n  ECSASG --&gt; REDIS\n\n  LB --&gt;|Metrics| CMS\n  ECSASG --&gt;|Metrics| CMS\n  RDS1 --&gt;|Metrics| CMS\n  ECSASG --&gt;|Logs| SLS\n  ECSASG --&gt;|Traces\/Metrics*| ARMS\n\n  CMS --&gt;|Alarm Notifications| OnCall[\"On-call (Email\/SMS\/Webhook)\"]\n  CMS --&gt; NOC[\"NOC Dashboard\"]\n\n  CMS --&gt;|API Calls| AT\n  SLS --&gt;|API Calls| AT\n  ARMS --&gt;|API Calls| AT\n\n  note2[\"* ARMS integration depends on your instrumentation and licensing. Verify in official docs.\"]\n  ARMS --- note2\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">8. 
Prerequisites<\/h2>\n\n\n\n<p>Before starting, ensure you have the following.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Account requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An active <strong>Alibaba Cloud account<\/strong>.<\/li>\n<li>If using multiple accounts (Resource Directory), ensure you understand where metrics and alarms are managed (verify in official docs).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Permissions \/ IAM (RAM)<\/h3>\n\n\n\n<p>You need permissions to:\n&#8211; View monitored resources and metrics\n&#8211; Create\/manage alarm rules\n&#8211; Create\/manage contacts\/contact groups\n&#8211; (Optional) install host monitoring agent on ECS<\/p>\n\n\n\n<p>Practical approach:\n&#8211; Use a dedicated <strong>RAM user or RAM role<\/strong> for monitoring administration.\n&#8211; Apply least privilege: read-only for viewers, write permissions for ops automation.<\/p>\n\n\n\n<p><strong>Verify exact policy actions<\/strong> in the CloudMonitor API reference and RAM policy docs:\n&#8211; CloudMonitor docs: https:\/\/www.alibabacloud.com\/help\/en\/cloudmonitor\/\n&#8211; RAM docs: https:\/\/www.alibabacloud.com\/help\/en\/ram\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Billing requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A billing method configured (Pay-as-you-go is common for labs).<\/li>\n<li>Some CloudMonitor features may incur charges (custom metrics, synthetic monitoring, notifications). 
<strong>Confirm pricing before enabling<\/strong>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Tools (optional but recommended)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Alibaba Cloud console access<\/li>\n<li>Alibaba Cloud CLI (optional): https:\/\/www.alibabacloud.com\/help\/en\/alibaba-cloud-cli\/<\/li>\n<li>SSH client to access an ECS instance for generating load<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Region availability<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>CloudMonitor is available broadly, but <strong>feature availability may vary by region<\/strong>. Verify in official docs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas \/ limits<\/h3>\n\n\n\n<p>CloudMonitor typically enforces quotas such as:\n&#8211; Maximum alarm rules\n&#8211; Custom metric ingestion limits\n&#8211; API rate limits<\/p>\n\n\n\n<p>Do not assume defaults\u2014<strong>check quotas in the CloudMonitor console or official docs<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Prerequisite services (for this lab)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An <strong>ECS instance<\/strong> in any region you can access via SSH.<\/li>\n<li>If you don\u2019t have one, create a small pay-as-you-go ECS instance (cost depends on region, instance type, disk, bandwidth).<\/li>\n<li>An email address for receiving alarm notifications.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">9. 
Pricing \/ Cost<\/h2>\n\n\n\n<p>CloudMonitor pricing can be a combination of:\n&#8211; <strong>Included\/basic monitoring<\/strong> for many Alibaba Cloud services\n&#8211; <strong>Usage-based charges<\/strong> for value-added capabilities (often custom metrics, synthetic monitoring, advanced alerting\/notification channels, longer retention, etc.)<\/p>\n\n\n\n<p>Because Alibaba Cloud pricing varies by:\n&#8211; region\n&#8211; account type\/contract\n&#8211; metric types and retention\n&#8211; notification method (SMS can be billable)\nyou should rely on official pricing pages.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Official pricing sources (verify)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Product page (often links to pricing): https:\/\/www.alibabacloud.com\/product\/cloudmonitor<\/li>\n<li>CloudMonitor documentation entry point: https:\/\/www.alibabacloud.com\/help\/en\/cloudmonitor\/<\/li>\n<li>Alibaba Cloud pricing center: https:\/\/www.alibabacloud.com\/pricing (navigate to CloudMonitor if listed)<\/li>\n<li>If an official calculator is available for CloudMonitor, use it (availability varies). 
Verify in official sources.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing dimensions (typical)<\/h3>\n\n\n\n<p>When evaluating CloudMonitor cost, expect these dimensions (confirm for your account):\n&#8211; <strong>Number\/type of monitored metrics<\/strong><br\/>\n  Built-in service metrics may be included; custom metrics may be billed by count and\/or ingestion frequency.\n&#8211; <strong>Data point ingestion rate<\/strong><br\/>\n  Higher-frequency metrics can increase cost and quota usage.\n&#8211; <strong>Alarm rule count and evaluation frequency<\/strong><br\/>\n  Many alarms across many dimensions can increase evaluation load (pricing varies).\n&#8211; <strong>Notification volume and channel<\/strong><br\/>\n  SMS\/voice notifications can be billed directly.\n&#8211; <strong>Synthetic\/site monitoring probes<\/strong><br\/>\n  Usually billed by number of probes, frequency, and locations.\n&#8211; <strong>Retention<\/strong><br\/>\n  Longer retention or high granularity may be part of paid tiers (verify).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Cost drivers (most common)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enabling <strong>custom metrics<\/strong> widely (high-cardinality labels\/dimensions can multiply the metric count).<\/li>\n<li>High-frequency monitoring (for example, 10-second intervals) if supported\/paid.<\/li>\n<li><strong>SMS notifications<\/strong> sent on every state change of a flapping alarm.<\/li>\n<li>Synthetic checks from many locations at short intervals.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Hidden\/indirect costs<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Data transfer<\/strong>: not usually charged for internal metric collection, but:<\/li>\n<li>host agents may generate outbound traffic (typically small, but verify)<\/li>\n<li>pushing custom metrics from outside Alibaba Cloud may incur internet egress charges on the sender side<\/li>\n<li><strong>Operational cost<\/strong>: time spent tuning 
thresholds, deduplicating alerts, and maintaining dashboards.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How to optimize cost<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prefer <strong>built-in service metrics<\/strong> where possible.<\/li>\n<li>Avoid high-cardinality custom metrics (don\u2019t use user IDs as metric dimensions).<\/li>\n<li>Use <strong>email\/webhook<\/strong> alerts where acceptable; reserve SMS for high-severity paging.<\/li>\n<li>Reduce alarm noise: add appropriate durations, suppression, and dependency-based alerting patterns.<\/li>\n<li>Standardize dashboards and alarms using templates and reuse.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Example low-cost starter estimate (conceptual)<\/h3>\n\n\n\n<p>A low-cost lab typically includes:\n&#8211; 1 ECS instance basic monitoring\n&#8211; 1\u20133 alarm rules\n&#8211; Email notifications only<\/p>\n\n\n\n<p>This is often near-zero incremental CloudMonitor cost if you stay within included metrics and avoid paid add-ons\u2014but <strong>verify in official pricing<\/strong> for your region\/account.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example production cost considerations<\/h3>\n\n\n\n<p>In production, cost planning should consider:\n&#8211; hundreds\/thousands of resources\n&#8211; per-service dashboards and alert rules\n&#8211; custom metrics for business and application signals\n&#8211; synthetic checks for critical endpoints\n&#8211; paging channels (SMS) and alert volumes<\/p>\n\n\n\n<p>Best practice: run a one-week pilot, measure metric and alert volumes, and then validate charges in Billing Center.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">10. 
Step-by-Step Hands-On Tutorial<\/h2>\n\n\n\n<p>This lab sets up a practical CloudMonitor alarm for an ECS instance CPU metric, generates load to trigger it, verifies notifications, and cleans up safely.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Objective<\/h3>\n\n\n\n<p>Create a CloudMonitor alarm that notifies you by email when an ECS instance CPU utilization stays high for several minutes, then validate it by generating CPU load on the instance.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Lab Overview<\/h3>\n\n\n\n<p>You will:\n1. Prepare an ECS instance and confirm metrics are visible.\n2. Create CloudMonitor contact and contact group.\n3. Create an alarm rule for ECS CPU utilization.\n4. Generate CPU load to trigger the alarm.\n5. Validate the alarm state and notification delivery.\n6. Clean up (delete alarm rule and optional test tools).<\/p>\n\n\n\n<blockquote>\n<p>Notes:\n&#8211; Exact console labels may vary slightly by region or UI version.\n&#8211; Some accounts require enabling monitoring features or accepting service terms. Follow the console prompts.\n&#8211; To keep costs low, use email notifications rather than SMS.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Prepare an ECS instance and confirm metrics are visible<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Sign in to the Alibaba Cloud console.<\/li>\n<li>Navigate to <strong>ECS<\/strong> and select a region.<\/li>\n<li>Ensure you have one running Linux ECS instance you can SSH into.<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; ECS instance is running and reachable via SSH.<\/p>\n\n\n\n<p><strong>Verify metrics in CloudMonitor<\/strong>\n1. Open <strong>CloudMonitor<\/strong> in the console.\n2. Find <strong>Cloud Service Monitoring<\/strong> (or similar) and select <strong>ECS<\/strong>.\n3. Locate your instance and open its metric charts.\n4. 
Confirm you can see <strong>CPUUtilization<\/strong> (or similarly named CPU usage metric).<\/p>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; You can view CPU usage charts for your ECS instance.<\/p>\n\n\n\n<p>If you cannot see metrics:\n&#8211; Confirm you selected the correct region.\n&#8211; Confirm the ECS instance is running.\n&#8211; Wait a few minutes for metrics to appear after instance creation.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Create an alarm contact and contact group<\/h3>\n\n\n\n<p>CloudMonitor typically routes alarms to <strong>contacts<\/strong> and <strong>contact groups<\/strong>.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In <strong>CloudMonitor<\/strong>, go to <strong>Alerts \/ Alarm Service<\/strong> (naming may vary).<\/li>\n<li>Go to <strong>Contacts<\/strong> and create a new contact:\n   &#8211; Name: <code>lab-contact<\/code>\n   &#8211; Email: your email address<\/li>\n<li>Confirm\/verify the email if the console prompts for verification.<\/li>\n<li>Create a <strong>Contact Group<\/strong>:\n   &#8211; Name: <code>lab-oncall<\/code>\n   &#8211; Add <code>lab-contact<\/code> to the group<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; You have a contact group ready for alarm notifications.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Create a CPU utilization alarm rule for ECS<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In CloudMonitor, go to <strong>Alarm Rules<\/strong> and choose <strong>Create Alarm Rule<\/strong>.<\/li>\n<li>Select the product\/namespace for ECS metrics (often \u201cECS\u201d or \u201cCompute\/ECS\u201d).<\/li>\n<li>Target your ECS instance (InstanceId).<\/li>\n<li>Configure the rule (example values):\n   &#8211; Metric: CPU utilization (for example, <code>CPUUtilization<\/code>)\n   &#8211; Condition: <code>&gt; 80<\/code> (percent)\n   &#8211; Duration: <code>5 
minutes<\/code> (or \u201c5 consecutive periods\u201d depending on UI)\n   &#8211; Alarm level\/severity: <code>Warning<\/code> (or equivalent)<\/li>\n<li>Notification:\n   &#8211; Contact group: <code>lab-oncall<\/code>\n   &#8211; Notification method: <strong>Email<\/strong> (avoid SMS for low cost)<\/li>\n<li>Name the rule: <code>lab-ecs-cpu-high<\/code><\/li>\n<li>Create\/Save.<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; Alarm rule is created and shows status \u201cEnabled\u201d.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; The alarm appears in the alarm rules list.\n&#8211; Alarm history is empty (no trigger yet), which is expected.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Generate CPU load on the ECS instance<\/h3>\n\n\n\n<p>SSH into your ECS instance and run a CPU stress tool.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Option A: Using <code>stress-ng<\/code> (recommended if available)<\/h4>\n\n\n\n<p>Ubuntu\/Debian:<\/p>\n\n\n\n<pre><code class=\"language-bash\">sudo apt-get update\nsudo apt-get install -y stress-ng\n<\/code><\/pre>\n\n\n\n<p>CentOS\/RHEL\/Alibaba Cloud Linux (package availability varies):<\/p>\n\n\n\n<pre><code class=\"language-bash\">sudo yum install -y stress-ng || true\n<\/code><\/pre>\n\n\n\n<p>Run CPU load (example: use 2 workers for 10 minutes):<\/p>\n\n\n\n<pre><code class=\"language-bash\">sudo stress-ng --cpu 2 --timeout 10m\n<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Option B: Using <code>stress<\/code> (often available)<\/h4>\n\n\n\n<p>Ubuntu\/Debian:<\/p>\n\n\n\n<pre><code class=\"language-bash\">sudo apt-get update\nsudo apt-get install -y stress\nsudo stress --cpu 2 --timeout 600\n<\/code><\/pre>\n\n\n\n<p>CentOS\/RHEL:<\/p>\n\n\n\n<pre><code class=\"language-bash\">sudo yum install -y epel-release || true\nsudo yum install -y stress || true\nsudo stress --cpu 2 --timeout 600\n<\/code><\/pre>\n\n\n\n<h4 
class=\"wp-block-heading\">Option C: Simple shell loop (no packages)<\/h4>\n\n\n\n<p>This is less controlled, but works when you cannot install packages:<\/p>\n\n\n\n<pre><code class=\"language-bash\"># Start each loop with bash -c so the loop text appears in the process\n# command line; a plain ( ... ) &amp; subshell would not be matched by pkill -f.\nfor i in 1 2; do\n  bash -c 'while :; do :; done' &amp;\ndone\necho \"CPU loops started. Remember to stop them later.\"\n<\/code><\/pre>\n\n\n\n<p>To stop the loops later:<\/p>\n\n\n\n<pre><code class=\"language-bash\">pkill -f \"while :; do :; done\" || true\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; CPU utilization rises significantly and stays high for several minutes.<\/p>\n\n\n\n<p><strong>Verify on the instance<\/strong><\/p>\n\n\n\n<pre><code class=\"language-bash\">top\n<\/code><\/pre>\n\n\n\n<p>You should see CPU usage near 100% (depending on vCPU count and workers).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5: Observe the alarm triggering<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Return to the CloudMonitor console.<\/li>\n<li>Open your ECS CPU chart and confirm CPU is above the threshold.<\/li>\n<li>Go to <strong>Alarm History<\/strong> (or \u201cAlarm Events\u201d).<\/li>\n<li>Wait for the evaluation window to pass (for a 5-minute rule, it can take 5\u201310 minutes depending on metric delay and evaluation interval).<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; The alarm transitions to a triggered state (often \u201cALARM\u201d).\n&#8211; You receive an email notification.<\/p>\n\n\n\n<p>If you do not receive the email:\n&#8211; Check email spam\/junk folders.\n&#8211; Confirm the contact email was verified.\n&#8211; Confirm the rule\u2019s notification settings include your contact group.\n&#8211; Confirm the alarm condition truly stayed above threshold for the configured duration.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 6: Recover and confirm alarm clears (optional but recommended)<\/h3>\n\n\n\n<p>Stop the 
stress workload (if not timed):<\/p>\n\n\n\n<p>If using <code>stress-ng<\/code> or <code>stress<\/code>, it stops automatically when the timeout expires.<br\/>\nIf using shell loops:<\/p>\n\n\n\n<pre><code class=\"language-bash\">pkill -f \"while :; do :; done\" || true\n<\/code><\/pre>\n\n\n\n<p>Wait several minutes and confirm that CPU drops below the threshold.<\/p>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; Alarm eventually returns to normal\/OK (depending on rule behavior and whether \u201crecovery notifications\u201d are enabled).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Validation<\/h3>\n\n\n\n<p>Use this checklist:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>[ ] ECS CPU metrics are visible in CloudMonitor charts<\/li>\n<li>[ ] Contact and contact group exist, email is verified<\/li>\n<li>[ ] Alarm rule <code>lab-ecs-cpu-high<\/code> is enabled<\/li>\n<li>[ ] CPU load sustained above threshold long enough to trigger the alarm<\/li>\n<li>[ ] Alarm history shows trigger event<\/li>\n<li>[ ] Email notification received<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Troubleshooting<\/h3>\n\n\n\n<p><strong>Issue: No metrics appear for ECS<\/strong>\n&#8211; Confirm correct region in CloudMonitor.\n&#8211; Wait 5\u201315 minutes after instance creation.\n&#8211; Verify the instance is running.\n&#8211; Some metrics require specific instance types or agents; check the official ECS metric documentation.<\/p>\n\n\n\n<p><strong>Issue: Alarm does not trigger<\/strong>\n&#8211; Ensure condition matches the metric scale (percent vs fraction).\n&#8211; Increase stress load (more workers) or lower threshold.\n&#8211; Allow for lag: metric publishing and alarm evaluation can trail the actual load by several minutes, so keep the load running longer than the configured duration.<\/p>\n\n\n\n<p><strong>Issue: No email notification<\/strong>\n&#8211; Confirm email verification status.\n&#8211; Confirm contact group is attached to the alarm rule.\n&#8211; Check 
notification preferences and alarm severity routing.\n&#8211; Verify whether your account\/region restricts certain notification methods.<\/p>\n\n\n\n<p><strong>Issue: CPU doesn\u2019t go high<\/strong>\n&#8211; Your instance may have multiple vCPUs; <code>--cpu 2<\/code> might not saturate it. Increase workers:\n  <code>sudo stress-ng --cpu 4 --timeout 10m<\/code>\n&#8211; Use <code>top<\/code> to confirm actual CPU usage.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Cleanup<\/h3>\n\n\n\n<p>To avoid noise and potential cost:\n1. Delete (or disable) the alarm rule <code>lab-ecs-cpu-high<\/code>.\n2. Delete the <code>lab-oncall<\/code> contact group and <code>lab-contact<\/code> contact (optional).\n3. Remove stress tools (optional):\n   &#8211; Ubuntu\/Debian:\n     <code>sudo apt-get remove -y stress-ng stress || true<\/code>\n4. If you created an ECS instance just for this lab, stop or release it to avoid compute charges.<\/p>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; No active alarm rules remain from the lab; no ongoing notifications.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">11. 
Best Practices<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Monitor the \u201cgolden signals\u201d:<\/li>\n<li><strong>Latency<\/strong>, <strong>Traffic<\/strong>, <strong>Errors<\/strong>, <strong>Saturation<\/strong><\/li>\n<li>Add <strong>dependency-aware dashboards<\/strong>:<\/li>\n<li>Load balancer \u2192 app tier \u2192 database \u2192 storage<\/li>\n<li>Use <strong>multi-region views<\/strong> for active-active architectures.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">IAM\/security best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>RAM roles<\/strong> where possible instead of long-lived AccessKeys.<\/li>\n<li>Separate duties:<\/li>\n<li>Read-only dashboards for most users<\/li>\n<li>Limited write permissions for ops\/platform team<\/li>\n<li>Restrict who can change:<\/li>\n<li>alarm rules<\/li>\n<li>contact groups<\/li>\n<li>notification channels<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cost best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Keep custom metrics low-cardinality:<\/li>\n<li>Good: <code>service=checkout<\/code>, <code>env=prod<\/code><\/li>\n<li>Bad: <code>user_id=123456<\/code><\/li>\n<li>Reduce SMS usage; reserve for critical pages.<\/li>\n<li>Use fewer, higher-quality alarms instead of hundreds of noisy ones.<\/li>\n<li>Periodically prune unused dashboards\/alarm rules.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Performance best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prefer built-in metrics when they exist.<\/li>\n<li>Avoid pushing custom metrics at unnecessarily high frequency.<\/li>\n<li>Use aggregation (sum\/avg\/max) at the source when possible.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Reliability best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Make alarms actionable:<\/li>\n<li>include runbook links in descriptions (if supported)<\/li>\n<li>include 
owner\/team tag in the alarm name or metadata<\/li>\n<li>Use multiple severity levels:<\/li>\n<li>Warning (email)<\/li>\n<li>Critical (page)<\/li>\n<li>Prevent flapping:<\/li>\n<li>use durations and proper thresholds<\/li>\n<li>use silence\/maintenance windows during planned work (verify feature availability)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operations best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Standardize naming:<\/li>\n<li><code>prod-&lt;service&gt;-&lt;resource&gt;-&lt;signal&gt;<\/code><\/li>\n<li>Create baseline dashboards:<\/li>\n<li>per service<\/li>\n<li>per environment<\/li>\n<li>Review alarms monthly:<\/li>\n<li>remove stale ones<\/li>\n<li>tune thresholds based on real incidents<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Governance\/tagging\/naming best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enforce tags:<\/li>\n<li><code>Environment=prod|stage|dev<\/code><\/li>\n<li><code>OwnerTeam=...<\/code><\/li>\n<li><code>Application=...<\/code><\/li>\n<li><code>CostCenter=...<\/code><\/li>\n<li>Use tags as filters for dashboards and alarm targeting (where supported).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">12. 
Security Considerations<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Identity and access model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>CloudMonitor access is governed by <strong>RAM<\/strong>.<\/li>\n<li>Apply least privilege:<\/li>\n<li>Viewers: read-only metrics\/dashboards<\/li>\n<li>Operators: manage alarms<\/li>\n<li>Admins: manage notification configurations and integrations<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Encryption<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data is managed by Alibaba Cloud; for in-transit and at-rest controls, <strong>verify CloudMonitor security documentation<\/strong> and your compliance needs.<\/li>\n<li>For custom metrics, ensure your client uses official endpoints and TLS (standard for Alibaba Cloud APIs).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network exposure<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Built-in metrics require no inbound access to your VPC.<\/li>\n<li>Host monitoring agents may require outbound connectivity; restrict via:<\/li>\n<li>security groups<\/li>\n<li>egress policies<\/li>\n<li>private endpoints\/VPC endpoints if supported (verify)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secrets handling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Avoid embedding AccessKeys in scripts on ECS.<\/li>\n<li>Prefer:<\/li>\n<li>RAM roles (where applicable)<\/li>\n<li>secure secret stores (for example, KMS\/Secrets Manager patterns\u2014verify Alibaba Cloud offerings and best fit)<\/li>\n<li>Rotate AccessKeys if you must use them.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Audit\/logging<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>ActionTrail<\/strong> to audit CloudMonitor configuration changes (alarm creation, contact changes).<\/li>\n<li>Export audit logs to SLS for retention and search if needed (verify integration).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compliance considerations<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>Determine where monitoring data is stored\/processed (region, retention).<\/li>\n<li>Ensure contact\/notification data (email\/phone numbers) is handled per privacy policies.<\/li>\n<li>For regulated industries, confirm:<\/li>\n<li>data residency<\/li>\n<li>retention controls<\/li>\n<li>access logging<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Common security mistakes<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Over-permissive RAM policies that allow anyone to disable alarms.<\/li>\n<li>Storing AccessKeys in plaintext on instances or in code repos.<\/li>\n<li>Sending critical alerts to shared inboxes without access controls.<\/li>\n<li>Not auditing alarm rule changes (leading to silent monitoring gaps).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secure deployment recommendations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Maintain a minimal set of operators who can modify alarm configurations.<\/li>\n<li>Use change management for alarm changes (ticket, PR, approval).<\/li>\n<li>Regularly test that alerts reach the on-call rotation.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">13. 
Limitations and Gotchas<\/h2>\n\n\n\n<p>Because CloudMonitor is a managed service and deeply integrated with Alibaba Cloud resources, watch for these common issues:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Region mismatches<\/strong>: Metrics are often region-bound; selecting the wrong region makes resources \u201cdisappear.\u201d<\/li>\n<li><strong>Metric delays<\/strong>: Some metrics are not real-time; alarm evaluation can lag behind actual behavior.<\/li>\n<li><strong>Different granularity per service<\/strong>: ECS CPU may be frequent; other services might have coarser resolution.<\/li>\n<li><strong>Agent requirements<\/strong>: OS-level metrics often require an agent; missing agent = missing memory\/disk signals.<\/li>\n<li><strong>Quota limits<\/strong>: Alarm rules, custom metrics, and API calls can hit quotas.<\/li>\n<li><strong>Alert fatigue<\/strong>: Default thresholds can be noisy; tune based on baselines and service behavior.<\/li>\n<li><strong>Notification costs<\/strong>: SMS\/voice can create surprise bills during incident storms.<\/li>\n<li><strong>High-cardinality custom metrics<\/strong>: Can explode costs and degrade manageability.<\/li>\n<li><strong>Cross-account visibility<\/strong>: Multi-account setups require careful RAM and governance patterns (verify best practice in official docs).<\/li>\n<li><strong>Service overlaps<\/strong>: Logs and APM are separate products (SLS, ARMS). Don\u2019t expect CloudMonitor to replace them.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">14. Comparison with Alternatives<\/h2>\n\n\n\n<p>CloudMonitor sits in the \u201cmetrics + alerting for Alibaba Cloud resources\u201d space. 
Here\u2019s how it compares.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Comparison table<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Option<\/th>\n<th>Best For<\/th>\n<th>Strengths<\/th>\n<th>Weaknesses<\/th>\n<th>When to Choose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Alibaba Cloud CloudMonitor<\/strong><\/td>\n<td>Native monitoring\/alerting for Alibaba Cloud services<\/td>\n<td>Tight integration with Alibaba Cloud services, fast setup, managed dashboards\/alarms<\/td>\n<td>Not a full log analytics platform; deep APM requires other services<\/td>\n<td>You run primarily on Alibaba Cloud and want a first-party monitoring baseline<\/td>\n<\/tr>\n<tr>\n<td><strong>Alibaba Cloud ARMS<\/strong><\/td>\n<td>Application performance monitoring &amp; tracing<\/td>\n<td>Deep app insights, traces, service topology (verify exact features)<\/td>\n<td>Requires instrumentation; cost and complexity may be higher<\/td>\n<td>You need application-level latency breakdowns and distributed tracing<\/td>\n<\/tr>\n<tr>\n<td><strong>Alibaba Cloud Simple Log Service (SLS)<\/strong><\/td>\n<td>Log collection, search, analytics, log-based alerts<\/td>\n<td>Powerful log analytics, indexing, long-term retention options<\/td>\n<td>Not a metrics-first tool; requires log pipeline setup<\/td>\n<td>You need to investigate errors via logs or alert on log patterns<\/td>\n<\/tr>\n<tr>\n<td><strong>Managed Service for Prometheus + Grafana (Alibaba Cloud ecosystem)<\/strong><\/td>\n<td>Cloud-native\/Kubernetes metrics and standard Prometheus ecosystem<\/td>\n<td>PromQL, broad ecosystem, Grafana dashboards<\/td>\n<td>Requires Prometheus model knowledge; integration work<\/td>\n<td>You run Kubernetes\/microservices and want Prometheus-native observability<\/td>\n<\/tr>\n<tr>\n<td><strong>AWS CloudWatch<\/strong><\/td>\n<td>Monitoring on AWS<\/td>\n<td>Mature cross-service integration in AWS<\/td>\n<td>Not native to Alibaba Cloud<\/td>\n<td>Only if your workloads 
are primarily on AWS<\/td>\n<\/tr>\n<tr>\n<td><strong>Azure Monitor<\/strong><\/td>\n<td>Monitoring on Azure<\/td>\n<td>Strong Azure integrations<\/td>\n<td>Not native to Alibaba Cloud<\/td>\n<td>Only if your workloads are primarily on Azure<\/td>\n<\/tr>\n<tr>\n<td><strong>Google Cloud Monitoring<\/strong><\/td>\n<td>Monitoring on GCP<\/td>\n<td>Deep GCP integration<\/td>\n<td>Not native to Alibaba Cloud<\/td>\n<td>Only if your workloads are primarily on GCP<\/td>\n<\/tr>\n<tr>\n<td><strong>Prometheus + Grafana (self-managed)<\/strong><\/td>\n<td>Full control, hybrid\/multi-cloud<\/td>\n<td>Portable, flexible, large ecosystem<\/td>\n<td>Operational overhead; scaling\/HA complexity<\/td>\n<td>You need maximum portability and can operate the stack<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">15. Real-World Example<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Enterprise example: Multi-region e-commerce platform on Alibaba Cloud<\/h3>\n\n\n\n<p><strong>Problem<\/strong>\nA large e-commerce company runs production in multiple regions. 
During flash sales, they face:\n&#8211; CPU saturation on ECS\n&#8211; database connection spikes on RDS\n&#8211; intermittent 5xx from load balancers\nThey need standardized alerting, dashboards for NOC, and audit trails for changes.<\/p>\n\n\n\n<p><strong>Proposed architecture<\/strong>\n&#8211; CloudMonitor monitors ECS, SLB\/ALB, RDS metrics in each region.\n&#8211; Alarm rules:\n  &#8211; critical: LB 5xx spikes, RDS connections near limit\n  &#8211; warning: CPU\/memory\/disk thresholds\n&#8211; Dashboards:\n  &#8211; per region and global view\n  &#8211; per business service (checkout\/search\/catalog)\n&#8211; ActionTrail audits all alarm configuration changes.\n&#8211; SLS collects application logs for root-cause analysis; ARMS used for deep tracing in critical services (optional).<\/p>\n\n\n\n<p><strong>Why CloudMonitor was chosen<\/strong>\n&#8211; Native Alibaba Cloud integration reduces deployment effort.\n&#8211; Centralized alarming and dashboards are faster to standardize.\n&#8211; Works well as the baseline metric layer for all core services.<\/p>\n\n\n\n<p><strong>Expected outcomes<\/strong>\n&#8211; Faster detection of saturation and error spikes\n&#8211; Reduced time to triage via standardized dashboards\n&#8211; Better governance and auditability for monitoring configuration<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Startup\/small-team example: Single-region SaaS API<\/h3>\n\n\n\n<p><strong>Problem<\/strong>\nA small startup runs:\n&#8211; 2 ECS instances behind a load balancer\n&#8211; RDS database\nThey need basic monitoring and reliable alerts without operating a full observability stack.<\/p>\n\n\n\n<p><strong>Proposed architecture<\/strong>\n&#8211; CloudMonitor for:\n  &#8211; ECS CPU alarms\n  &#8211; RDS storage and connections alarms\n  &#8211; LB health metrics dashboard\n&#8211; Email notifications to the shared on-call inbox; SMS reserved for critical alarms only.\n&#8211; Minimal 
dashboard for daily checks.<\/p>\n\n\n\n<p><strong>Why CloudMonitor was chosen<\/strong>\n&#8211; Low operational overhead (managed service).\n&#8211; Quick setup and good default metrics for Alibaba Cloud services.<\/p>\n\n\n\n<p><strong>Expected outcomes<\/strong>\n&#8211; Incidents detected early without building custom tooling\n&#8211; Lean ops process suitable for a small team<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">16. FAQ<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">1) Is CloudMonitor the same as AWS CloudWatch?<\/h3>\n\n\n\n<p>No. CloudMonitor is Alibaba Cloud\u2019s monitoring service. AWS CloudWatch is specific to AWS. They solve similar problems but are separate products with different APIs and integrations.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2) Do I need to install anything to monitor ECS CPU usage?<\/h3>\n\n\n\n<p>Usually no\u2014ECS basic metrics such as CPU usage are typically available as built-in service metrics. OS-level metrics (memory\/disk) often require an agent. Verify ECS metric coverage in official docs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3) Can CloudMonitor monitor on-premises servers?<\/h3>\n\n\n\n<p>CloudMonitor is primarily for Alibaba Cloud resources. Some monitoring models might allow external\/custom metrics ingestion, but on-prem host monitoring is not guaranteed. Verify supported hybrid options in official docs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4) Can I create custom metrics for my application?<\/h3>\n\n\n\n<p>CloudMonitor commonly supports custom metrics via API (custom monitoring). 
Availability, quotas, and pricing vary\u2014verify in official docs and pricing pages.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">5) How do I avoid alert fatigue?<\/h3>\n\n\n\n<p>Use:\n&#8211; meaningful thresholds tied to user impact\n&#8211; evaluation durations (e.g., 5 minutes)\n&#8211; severity levels\n&#8211; fewer, higher-quality alerts\nAlso review and tune regularly based on incidents.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">6) What notification methods are supported?<\/h3>\n\n\n\n<p>Typically email and sometimes SMS or webhooks\/integrations. Exact methods vary by region\/account and may change\u2014verify in your CloudMonitor console and docs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">7) Does SMS alerting cost extra?<\/h3>\n\n\n\n<p>Often yes\u2014telecom-based notifications typically incur charges. Confirm in Alibaba Cloud pricing and your Billing Center.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">8) How long does CloudMonitor retain metrics?<\/h3>\n\n\n\n<p>Retention depends on metric type and service rules. Built-in metrics and custom metrics may differ. Verify retention in official docs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">9) Can I manage alarms as code?<\/h3>\n\n\n\n<p>CloudMonitor offers APIs (OpenAPI) for many operations. You can script alarm rule creation and updates. Verify API coverage in the CloudMonitor API reference.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">10) Can I monitor Kubernetes metrics with CloudMonitor?<\/h3>\n\n\n\n<p>Kubernetes monitoring is usually handled through Prometheus-based solutions and\/or ARMS\/other Alibaba Cloud services. CloudMonitor may integrate at the infrastructure level. Verify the recommended Alibaba Cloud approach for ACK\/Kubernetes monitoring.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">11) What\u2019s the difference between CloudMonitor and SLS?<\/h3>\n\n\n\n<p>CloudMonitor focuses on <strong>metrics<\/strong> (time-series). SLS focuses on <strong>logs<\/strong> (search, indexing, analytics). 
They complement each other.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">12) What\u2019s the difference between CloudMonitor and ARMS?<\/h3>\n\n\n\n<p>CloudMonitor is mainly infrastructure\/service metrics and alerting. ARMS is application performance monitoring and tracing (APM). Use ARMS when you need code-level insights and distributed tracing.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">13) Why do I see different metrics for different services?<\/h3>\n\n\n\n<p>Each Alibaba Cloud service exposes a different metric set and granularity based on what\u2019s meaningful for that service. Always consult that service\u2019s metric reference.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">14) How do I secure access to monitoring data?<\/h3>\n\n\n\n<p>Use RAM least-privilege policies, restrict alarm modifications, rotate keys, and audit changes with ActionTrail.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">15) Why do alarms trigger late?<\/h3>\n\n\n\n<p>Common reasons:\n&#8211; metric publishing delay\n&#8211; evaluation window duration\n&#8211; rule configuration (period, consecutive breaches)\nTune the rule and consider metric resolution constraints.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">17. 
Top Online Resources to Learn CloudMonitor<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Resource Type<\/th>\n<th>Name<\/th>\n<th>Why It Is Useful<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Official documentation<\/td>\n<td>CloudMonitor Documentation<\/td>\n<td>Canonical feature descriptions, configuration guides, and references: https:\/\/www.alibabacloud.com\/help\/en\/cloudmonitor\/<\/td>\n<\/tr>\n<tr>\n<td>Official product page<\/td>\n<td>CloudMonitor Product Page<\/td>\n<td>Overview and entry point for pricing and positioning: https:\/\/www.alibabacloud.com\/product\/cloudmonitor<\/td>\n<\/tr>\n<tr>\n<td>Official API reference<\/td>\n<td>CloudMonitor API Reference (OpenAPI)<\/td>\n<td>Automate alarms\/metrics queries; confirm actions and parameters (navigate from docs): https:\/\/www.alibabacloud.com\/help\/en\/cloudmonitor\/<\/td>\n<\/tr>\n<tr>\n<td>Official CLI docs<\/td>\n<td>Alibaba Cloud CLI<\/td>\n<td>Learn how to authenticate and call CloudMonitor APIs via CLI: https:\/\/www.alibabacloud.com\/help\/en\/alibaba-cloud-cli\/<\/td>\n<\/tr>\n<tr>\n<td>Official RAM docs<\/td>\n<td>Resource Access Management (RAM)<\/td>\n<td>Secure CloudMonitor access with least privilege: https:\/\/www.alibabacloud.com\/help\/en\/ram\/<\/td>\n<\/tr>\n<tr>\n<td>Official audit docs<\/td>\n<td>ActionTrail<\/td>\n<td>Audit monitoring configuration changes: https:\/\/www.alibabacloud.com\/help\/en\/actiontrail\/<\/td>\n<\/tr>\n<tr>\n<td>Official logging docs<\/td>\n<td>Simple Log Service (SLS)<\/td>\n<td>Complement metrics with logs and log-based alerting: https:\/\/www.alibabacloud.com\/help\/en\/sls\/<\/td>\n<\/tr>\n<tr>\n<td>Official APM docs<\/td>\n<td>ARMS<\/td>\n<td>Add application tracing and APM where needed: https:\/\/www.alibabacloud.com\/help\/en\/arms\/<\/td>\n<\/tr>\n<tr>\n<td>Architecture guidance<\/td>\n<td>Alibaba Cloud Architecture Center<\/td>\n<td>Reference architectures and operational patterns (search within): 
https:\/\/www.alibabacloud.com\/architecture<\/td>\n<\/tr>\n<tr>\n<td>Community learning<\/td>\n<td>Alibaba Cloud Blog<\/td>\n<td>Practical posts and announcements; verify against docs: https:\/\/www.alibabacloud.com\/blog<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">18. Training and Certification Providers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Institute<\/th>\n<th>Suitable Audience<\/th>\n<th>Likely Learning Focus<\/th>\n<th>Mode<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps engineers, SREs, cloud engineers<\/td>\n<td>DevOps practices, monitoring\/observability fundamentals, cloud operations<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>ScmGalaxy.com<\/td>\n<td>Beginners to intermediate engineers<\/td>\n<td>SCM\/DevOps concepts, operational tooling<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.scmgalaxy.com\/<\/td>\n<\/tr>\n<tr>\n<td>CloudOpsNow.in<\/td>\n<td>Cloud operations teams<\/td>\n<td>Cloud operations practices, monitoring and reliability<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.cloudopsnow.in\/<\/td>\n<\/tr>\n<tr>\n<td>SreSchool.com<\/td>\n<td>SREs and platform teams<\/td>\n<td>SRE principles, incident response, monitoring and alerting design<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.sreschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>AiOpsSchool.com<\/td>\n<td>Ops and engineering leaders, SREs<\/td>\n<td>AIOps concepts, event correlation, automation (verify course outlines)<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.aiopsschool.com\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">19. 
Top Trainers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Platform\/Site<\/th>\n<th>Likely Specialization<\/th>\n<th>Suitable Audience<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>RajeshKumar.xyz<\/td>\n<td>DevOps\/cloud coaching (verify offerings)<\/td>\n<td>Students, working engineers<\/td>\n<td>https:\/\/rajeshkumar.xyz\/<\/td>\n<\/tr>\n<tr>\n<td>devopstrainer.in<\/td>\n<td>DevOps training and mentoring (verify offerings)<\/td>\n<td>Beginners to intermediate<\/td>\n<td>https:\/\/www.devopstrainer.in\/<\/td>\n<\/tr>\n<tr>\n<td>devopsfreelancer.com<\/td>\n<td>Freelance DevOps support\/training (verify offerings)<\/td>\n<td>Teams needing practical guidance<\/td>\n<td>https:\/\/www.devopsfreelancer.com\/<\/td>\n<\/tr>\n<tr>\n<td>devopssupport.in<\/td>\n<td>DevOps support and enablement (verify offerings)<\/td>\n<td>Ops teams and small organizations<\/td>\n<td>https:\/\/www.devopssupport.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">20. 
Top Consulting Companies<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Company Name<\/th>\n<th>Likely Service Area<\/th>\n<th>Where They May Help<\/th>\n<th>Consulting Use Case Examples<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>cotocus.com<\/td>\n<td>Cloud\/DevOps consulting (verify service catalog)<\/td>\n<td>Observability adoption, cloud operations processes<\/td>\n<td>Baseline monitoring rollout, alert tuning workshops, dashboard standardization<\/td>\n<td>https:\/\/cotocus.com\/<\/td>\n<\/tr>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps consulting and enablement (verify service catalog)<\/td>\n<td>DevOps transformation, monitoring practices, training<\/td>\n<td>Monitoring strategy, incident response process, \u201cmonitoring as code\u201d implementation<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>DEVOPSCONSULTING.IN<\/td>\n<td>DevOps consulting (verify service catalog)<\/td>\n<td>CI\/CD, cloud ops, reliability<\/td>\n<td>Alarm rationalization, operational readiness reviews, SRE playbooks<\/td>\n<td>https:\/\/www.devopsconsulting.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">21. 
Career and Learning Roadmap<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn before CloudMonitor<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Alibaba Cloud fundamentals:\n<ul class=\"wp-block-list\">\n<li>ECS, VPC, security groups, SLB\/ALB, RDS basics<\/li>\n<\/ul>\n<\/li>\n<li>Monitoring fundamentals:\n<ul class=\"wp-block-list\">\n<li>metrics vs logs vs traces<\/li>\n<li>SLI\/SLO\/SLA concepts<\/li>\n<li>alert fatigue and on-call basics<\/li>\n<\/ul>\n<\/li>\n<li>IAM basics:\n<ul class=\"wp-block-list\">\n<li>RAM users\/roles\/policies<\/li>\n<li>least privilege and audit logging<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn after CloudMonitor<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Simple Log Service (SLS)<\/strong> for log pipelines and log analytics<\/li>\n<li><strong>ARMS<\/strong> for tracing and APM (if you build microservices)<\/li>\n<li>Incident management:\n<ul class=\"wp-block-list\">\n<li>runbooks, postmortems, error budgets<\/li>\n<\/ul>\n<\/li>\n<li>Infrastructure as Code:\n<ul class=\"wp-block-list\">\n<li>automate alarms and dashboards (via OpenAPI\/CLI\/Terraform patterns; verify official support and providers)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Job roles that use CloudMonitor<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cloud engineer \/ cloud operations<\/li>\n<li>DevOps engineer<\/li>\n<li>SRE<\/li>\n<li>Platform engineer<\/li>\n<li>Security engineer (operational monitoring and audit support)<\/li>\n<li>Solutions architect (designing production-ready operations)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certification path (if available)<\/h3>\n\n\n\n<p>Alibaba Cloud certifications evolve over time and vary by region. 
Verify current certification tracks on the official certification portal:\n&#8211; https:\/\/edu.alibabacloud.com\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Project ideas for practice<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Build a \u201cproduction baseline\u201d dashboard for ECS + RDS + SLB\/ALB.<\/li>\n<li>Implement a standard set of alarms (CPU, disk, LB 5xx, DB connections).<\/li>\n<li>Add custom metrics for a sample API (requests\/sec, error rate).<\/li>\n<li>Create an \u201calert review\u201d process: track top noisy alarms and tune them.<\/li>\n<li>Run a multi-region failover drill: monitor primary and standby health and alert on failover signals.<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">22. Glossary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Alarm rule<\/strong>: A condition evaluated against a metric that triggers notifications when breached.<\/li>\n<li><strong>Metric<\/strong>: A numerical time-series measurement (CPU %, latency in ms, request count).<\/li>\n<li><strong>Namespace<\/strong>: Logical grouping for metrics (often per product\/service).<\/li>\n<li><strong>Dimension<\/strong>: Metadata that identifies a metric series (e.g., InstanceId, device).<\/li>\n<li><strong>Retention<\/strong>: How long monitoring data is stored.<\/li>\n<li><strong>Granularity \/ resolution<\/strong>: The time interval between metric data points (e.g., 1 minute).<\/li>\n<li><strong>SLI (Service Level Indicator)<\/strong>: A measurable indicator of service performance (latency, availability).<\/li>\n<li><strong>SLO (Service Level Objective)<\/strong>: Target value\/range for an SLI.<\/li>\n<li><strong>SLA (Service Level Agreement)<\/strong>: Contractual commitment, often derived from SLOs.<\/li>\n<li><strong>Alert fatigue<\/strong>: A state in which too many low-quality alerts cause responders to start ignoring them.<\/li>\n<li><strong>Host monitoring<\/strong>: OS-level 
monitoring, often via an agent installed on the server.<\/li>\n<li><strong>Custom metric<\/strong>: A metric defined and pushed by the user\/application rather than provided by the cloud service.<\/li>\n<li><strong>RAM<\/strong>: Resource Access Management, Alibaba Cloud\u2019s IAM service.<\/li>\n<li><strong>ActionTrail<\/strong>: Alibaba Cloud audit logging service for API actions.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">23. Summary<\/h2>\n\n\n\n<p>CloudMonitor is Alibaba Cloud\u2019s native monitoring, alerting, and dashboard service, and a core Developer Tools service for operating workloads reliably. It provides built-in metrics for many Alibaba Cloud services, supports alarms and notifications, and can be extended with custom monitoring for application or business KPIs (where supported).<\/p>\n\n\n\n<p>CloudMonitor matters because consistent observability reduces downtime, speeds up incident response, and informs capacity planning as operations scale. Cost is typically driven by value-added features (custom metrics, synthetic checks, and certain notification channels like SMS), so confirm pricing in official sources and design alerts to minimize noise and high-volume paging.<\/p>\n\n\n\n<p>Use CloudMonitor when you need a managed, Alibaba Cloud-integrated monitoring baseline. 
Pair it with SLS for logs and ARMS for deep application tracing when your workloads require broader observability.<\/p>\n\n\n\n<p>Next step: build a production-ready dashboard and a minimal set of actionable alarms for ECS + RDS + SLB\/ALB, then expand into logs (SLS) and tracing (ARMS) as your system grows.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Developer Tools<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,18],"tags":[],"class_list":["post-106","post","type-post","status-publish","format-standard","hentry","category-alibaba-cloud","category-developer-tools"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/106","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/comments?post=106"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/106\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/media?parent=106"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/categories?post=106"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/tags?post=106"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}