{"id":962,"date":"2026-04-17T07:17:00","date_gmt":"2026-04-17T07:17:00","guid":{"rendered":"https:\/\/www.devopsschool.com\/tutorials\/oracle-cloud-monitoring-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-observability-and-management\/"},"modified":"2026-04-17T07:17:00","modified_gmt":"2026-04-17T07:17:00","slug":"oracle-cloud-monitoring-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-observability-and-management","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/tutorials\/oracle-cloud-monitoring-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-observability-and-management\/","title":{"rendered":"Oracle Cloud Monitoring Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Observability and Management"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Category<\/h2>\n\n\n\n<p>Observability and Management<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Introduction<\/h2>\n\n\n\n<p>Oracle Cloud Infrastructure (OCI) <strong>Monitoring<\/strong> is the metrics and alerting service in <strong>Oracle Cloud<\/strong> under the <strong>Observability and Management<\/strong> category. It collects time-series metrics from OCI services (and optionally from your applications via custom metrics), lets you explore and query those metrics, and triggers alarms when conditions occur.<\/p>\n\n\n\n<p>In simple terms: <strong>Monitoring tells you what is happening right now (and what happened recently) in your OCI resources<\/strong>\u2014CPU is high, a load balancer is failing health checks, a database is running out of storage, or your application\u2019s error rate has increased\u2014and it can notify your team automatically.<\/p>\n\n\n\n<p>Technically, Monitoring is a <strong>regional<\/strong> metrics platform that stores and serves metrics (service metrics and custom metrics), supports metric query and aggregation, and evaluates <strong>alarms<\/strong> based on metric query rules. 
Alarms typically deliver notifications through <strong>OCI Notifications<\/strong> (email, SMS in supported regions, HTTPS endpoints, Functions, etc., depending on Notifications capabilities in your tenancy\/region).<\/p>\n\n\n\n<p>Monitoring solves common operational problems:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Detecting outages and performance regressions early<\/li>\n<li>Turning \u201csomeone noticed something is slow\u201d into measurable SLO-driven operations<\/li>\n<li>Reducing mean time to detect (MTTD) and mean time to resolve (MTTR)<\/li>\n<li>Providing evidence for incident timelines and capacity planning inputs<\/li>\n<\/ul>\n\n\n\n<blockquote>\n<p>Service status note: As of this writing, <strong>Monitoring<\/strong> is an active OCI service. OCI also uses the term <strong>Telemetry<\/strong> in some API\/endpoint naming for metrics. Always verify the latest naming and feature scope in the official docs.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">2. 
What is Monitoring?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Official purpose<\/h3>\n\n\n\n<p>OCI <strong>Monitoring<\/strong> provides a way to <strong>observe the health, performance, and behavior of resources<\/strong> by collecting and querying <strong>metrics<\/strong>, and to <strong>act<\/strong> on those metrics by configuring <strong>alarms<\/strong> that trigger notifications when conditions are met.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Core capabilities (what you can do)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>View <strong>service metrics<\/strong> emitted automatically by OCI services (for example, compute, networking, load balancers, databases\u2014availability depends on the service).<\/li>\n<li>Publish <strong>custom metrics<\/strong> from your own applications and systems.<\/li>\n<li>Explore metrics using the console (Metric Explorer) and query metrics using APIs\/CLI\/SDKs.<\/li>\n<li>Create <strong>alarms<\/strong> driven by metric queries to detect thresholds, errors, saturation, or absence of signals.<\/li>\n<li>Route alarm notifications via <strong>OCI Notifications<\/strong> topics and subscriptions.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Major components<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Metrics<\/strong><\/li>\n<li><strong>Service metrics<\/strong>: Provided by OCI services.<\/li>\n<li><strong>Custom metrics<\/strong>: You publish metric datapoints to a custom namespace.<\/li>\n<li><strong>Namespaces<\/strong>: Logical grouping of metrics (service namespaces and your custom namespaces).<\/li>\n<li><strong>Dimensions<\/strong>: Key\/value attributes that describe and filter metrics (for example, <code>resourceId<\/code>, <code>availabilityDomain<\/code>, <code>app<\/code>, <code>environment<\/code>).<\/li>\n<li><strong>Metric queries<\/strong>: Queries that aggregate and filter time-series data.<\/li>\n<li><strong>Alarms<\/strong>: Rules that evaluate metric queries and trigger 
notifications.<\/li>\n<li><strong>Notifications integration<\/strong>: Alarm destinations are typically <strong>Notifications topics<\/strong> (OCI Notifications service).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Service type<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Managed cloud service<\/strong> for metrics storage, querying, and alarm evaluation.<\/li>\n<li>Integrates tightly with other OCI services for metric emission and alerting workflows.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scope (regional\/global, tenancy\/compartment)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Monitoring is <strong>regional<\/strong>: metrics and alarms are evaluated and stored in the region where they are created and where the emitting resources exist.<\/li>\n<li>Access and organization are <strong>tenancy- and compartment-aware<\/strong> through OCI IAM policies.<\/li>\n<li>Metrics belong to a <strong>compartment<\/strong> (for service metrics, typically the resource\u2019s compartment; for custom metrics, you specify the compartment when posting datapoints).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Fit in the Oracle Cloud ecosystem<\/h3>\n\n\n\n<p>In OCI Observability and Management, Monitoring typically works alongside:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Notifications<\/strong>: Deliver alarm events to people or systems.<\/li>\n<li><strong>Logging<\/strong> \/ <strong>Logging Analytics<\/strong>: Investigate logs related to alarm triggers.<\/li>\n<li><strong>Events<\/strong>: Event-driven automation (separate service; often used with Notifications\/Functions).<\/li>\n<li><strong>APM (Application Performance Monitoring)<\/strong>: Tracing and application-level observability (separate product area).<\/li>\n<li><strong>Dashboards<\/strong>: Build operational dashboards that can visualize metrics (separate OCI dashboard capability; verify your console\u2019s current dashboard 
offering).<\/li>\n<\/ul>\n\n\n\n<p>Monitoring is the \u201cmetrics + alerts\u201d foundation; other observability services add logs, traces, and deeper analytics.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">3. Why use Monitoring?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Business reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Reduce downtime cost<\/strong> by detecting failures faster and alerting the right team automatically.<\/li>\n<li><strong>Improve customer experience<\/strong> by catching latency or saturation before it becomes an incident.<\/li>\n<li><strong>Operational accountability<\/strong>: metrics and alarm history provide auditability of incident conditions.<\/li>\n<li><strong>Enable SLO\/SLA reporting inputs<\/strong> (Monitoring provides signals; reporting often requires additional tooling\/process).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Technical reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Built-in service metrics<\/strong>: You often get useful metrics without deploying agents.<\/li>\n<li><strong>Custom metrics support<\/strong>: publish application KPIs (orders\/minute, queue depth, error rate) to the same platform.<\/li>\n<li><strong>Programmatic access<\/strong>: CLI\/SDK\/API lets you automate alarm creation and metric retrieval in CI\/CD and IaC.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Centralized alerting<\/strong> using alarms and Notifications topics.<\/li>\n<li><strong>Standardized metric model<\/strong>: namespaces, dimensions, aggregations, and queries.<\/li>\n<li><strong>Faster troubleshooting<\/strong>: correlate metric changes with deployments or infrastructure changes.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/compliance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>IAM-controlled access<\/strong> to metrics and 
alarms.<\/li>\n<li>Supports governance patterns (compartment isolation, tagging strategies, least privilege).<\/li>\n<li>Integrates with OCI\u2019s auditing model (actions on alarms\/policies are auditable via OCI Audit; verify details in official docs).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scalability\/performance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Managed service scales with your footprint; no need to operate a metrics backend for common cases.<\/li>\n<li>Enables consistent alarms across hundreds\/thousands of resources using dimensions and consistent naming.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should choose Monitoring<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You are running <strong>workloads on OCI<\/strong> and want a first-party way to monitor OCI resource health.<\/li>\n<li>You want <strong>basic-to-advanced metric alerting<\/strong> integrated with OCI IAM and Notifications.<\/li>\n<li>You want to publish <strong>custom business metrics<\/strong> without running your own time-series database (or as a complement to one).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should not choose Monitoring (or should complement it)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You need <strong>full observability stacks<\/strong> with long retention, complex dashboards, cross-cloud correlation, or deep tracing: consider <strong>APM<\/strong>, <strong>Logging Analytics<\/strong>, or third-party tools, and use Monitoring as a signal source.<\/li>\n<li>You have a mature <strong>Prometheus\/Grafana<\/strong> ecosystem and want to keep a single metrics backend for all environments; you might still use OCI Monitoring for OCI-native alarms or integrate via exporters\/bridges (verify supported integrations in current docs).<\/li>\n<li>You require <strong>very long metric retention<\/strong> or specialized analytics that Monitoring does not provide\u2014verify retention and capabilities in 
official docs.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">4. Where is Monitoring used?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Industries<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>SaaS and technology (platform reliability, SLO monitoring)<\/li>\n<li>Finance (availability and latency monitoring with strict change control)<\/li>\n<li>Retail\/e-commerce (traffic and order pipeline metrics)<\/li>\n<li>Healthcare (system health, audit-driven operations)<\/li>\n<li>Manufacturing\/IoT backends (telemetry aggregation signals\u2014often combined with Streaming and custom metrics)<\/li>\n<li>Education and public sector (cost-controlled baseline monitoring)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Team types<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>DevOps and SRE teams (incident response, on-call, automation)<\/li>\n<li>Platform engineering (golden alarms, baseline dashboards, tenancy governance)<\/li>\n<li>Cloud operations\/NOC teams (central alarm routing and triage)<\/li>\n<li>Security and compliance teams (monitoring critical controls and availability signals)<\/li>\n<li>Application teams (custom metrics + alerting for app KPIs)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Workloads<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI Compute instances and autoscaling groups<\/li>\n<li>Containerized workloads (OKE\/Kubernetes\u2014often combined with Prometheus, verify current OCI observability options)<\/li>\n<li>API backends behind OCI Load Balancers<\/li>\n<li>Databases (Autonomous Database or DB systems\u2014service metrics)<\/li>\n<li>Event-driven\/serverless (Functions, Streaming-based pipelines)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Architectures<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Single-region production with local alarms and notifications<\/li>\n<li>Multi-compartment multi-environment setups (dev\/test\/prod)<\/li>\n<li>Multi-region DR: regional alarms 
per region, centralized incident routing (often via shared Notifications integrations)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Real-world deployment contexts<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Production<\/strong>: alarms must be tuned (avoid noise), integrated with incident management, and use strong IAM boundaries.<\/li>\n<li><strong>Dev\/test<\/strong>: fewer alarms, shorter retention needs, focus on validating metrics and alarm logic; keep costs low by minimizing custom metric cardinality and ingestion volume.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">5. Top Use Cases and Scenarios<\/h2>\n\n\n\n<p>Below are realistic scenarios where Oracle Cloud Monitoring fits well.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Compute CPU saturation alarm<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> An application VM becomes CPU-bound and starts timing out.<\/li>\n<li><strong>Why Monitoring fits:<\/strong> OCI emits compute-related metrics; alarms can detect sustained high utilization.<\/li>\n<li><strong>Scenario:<\/strong> Trigger an alarm when CPU utilization exceeds a threshold for 10 minutes and notify on-call.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Memory pressure detection (when available via agent\/service metrics)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Instances fail due to OOM or swapping, but CPU looks fine.<\/li>\n<li><strong>Why Monitoring fits:<\/strong> Some memory metrics may be available via OCI agents\/plugins depending on OS and configuration (verify in official docs).<\/li>\n<li><strong>Scenario:<\/strong> Alarm on memory utilization or swap usage to act before incidents.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Load balancer backend health degradation<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Backends become unhealthy and traffic errors 
increase.<\/li>\n<li><strong>Why Monitoring fits:<\/strong> Load balancer services typically emit health\/HTTP metrics; alarms can detect unhealthy backend count or error rate.<\/li>\n<li><strong>Scenario:<\/strong> Alarm when unhealthy backends &gt; 0 for 5 minutes; notify and trigger an automated remediation runbook.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Autonomous Database storage or CPU threshold alerting<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Database resources approach limits; performance degrades.<\/li>\n<li><strong>Why Monitoring fits:<\/strong> OCI database services emit service metrics; alarms can notify proactively.<\/li>\n<li><strong>Scenario:<\/strong> Alarm when storage used exceeds a percentage or when CPU is consistently high.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) Custom business KPI: orders per minute<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Infrastructure is \u201cgreen\u201d but business throughput drops.<\/li>\n<li><strong>Why Monitoring fits:<\/strong> Custom metrics allow app-level KPIs, enabling operational alerting on business impact.<\/li>\n<li><strong>Scenario:<\/strong> Publish <code>orders_processed<\/code> metric; alarm if it drops below baseline during peak hours.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) Custom metric: queue depth \/ lag for data pipelines<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Consumers fall behind; processing latency increases.<\/li>\n<li><strong>Why Monitoring fits:<\/strong> Custom metrics can represent queue depth, lag, or backlog.<\/li>\n<li><strong>Scenario:<\/strong> Alarm if backlog exceeds threshold; notify data engineering.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Detect \u201csilence\u201d (absence of expected metrics)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> A scheduled job stops running; no failures are logged 
centrally.<\/li>\n<li><strong>Why Monitoring fits:<\/strong> Alarms can be built around missing signals (depending on supported query patterns; verify in official docs).<\/li>\n<li><strong>Scenario:<\/strong> Publish a heartbeat metric; alarm if no datapoints are received in a window.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Multi-compartment operational guardrails<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Different teams deploy resources inconsistently, leading to monitoring gaps.<\/li>\n<li><strong>Why Monitoring fits:<\/strong> Standard alarm patterns can be applied per compartment, with IAM controls and standardized notification routing.<\/li>\n<li><strong>Scenario:<\/strong> Platform team provides Terraform modules that create baseline alarms for new workloads.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Capacity trending inputs<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> You need capacity data to plan scale-ups.<\/li>\n<li><strong>Why Monitoring fits:<\/strong> Metric history supports trend views; export via API to external analytics if needed.<\/li>\n<li><strong>Scenario:<\/strong> Pull CPU\/memory\/network metrics regularly to a data lake for forecasting.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) Incident correlation with logs and deployments<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Alert fires; you need fast root cause.<\/li>\n<li><strong>Why Monitoring fits:<\/strong> Monitoring provides the \u201csignal\u201d; you correlate with OCI Logging\/Logging Analytics and your CI\/CD deployment timeline.<\/li>\n<li><strong>Scenario:<\/strong> Alarm triggers; on-call checks logs for the same time window and compares to last deployment.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">11) SLA monitoring at the edge (combined design)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Need external 
availability checks.<\/li>\n<li><strong>Why Monitoring fits:<\/strong> Monitoring can ingest custom results (for example, synthetic check results posted as custom metrics) or be combined with OCI Health Checks (separate service).<\/li>\n<li><strong>Scenario:<\/strong> Synthetic probe posts <code>api_availability<\/code> and <code>latency_ms<\/code> metrics; alarms notify if availability drops.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">12) Security operations signals (availability and misconfig indicators)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> You want to detect unusual spikes (traffic, errors) that may indicate abuse.<\/li>\n<li><strong>Why Monitoring fits:<\/strong> Alarms on network\/edge metrics can be an early indicator; integrate with security workflows.<\/li>\n<li><strong>Scenario:<\/strong> Alarm on sudden surge of 4xx\/5xx responses; notify security\/on-call for investigation.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">6. Core Features<\/h2>\n\n\n\n<blockquote>\n<p>Feature availability can vary by region and by OCI service integration. 
Verify specifics in the official docs for your region and tenancy.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">6.1 Service metrics (OCI-provided metrics)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Collects and stores metrics emitted automatically by OCI services.<\/li>\n<li><strong>Why it matters:<\/strong> You can monitor core infrastructure health without deploying a metrics pipeline.<\/li>\n<li><strong>Practical benefit:<\/strong> Fast setup\u2014go from \u201cno metrics\u201d to dashboards\/alarms quickly.<\/li>\n<li><strong>Caveats:<\/strong> Metric names, dimensions, and availability differ by service; some signals require enabling agents\/plugins or service-specific settings.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.2 Custom metrics (publish your own datapoints)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Lets you publish time-series datapoints to Monitoring under a custom namespace.<\/li>\n<li><strong>Why it matters:<\/strong> Enables application-level and business-level observability.<\/li>\n<li><strong>Practical benefit:<\/strong> You can alert on KPIs (orders\/minute, queue depth) not visible via infrastructure metrics.<\/li>\n<li><strong>Caveats:<\/strong> Custom metrics can introduce cost and complexity\u2014especially high-cardinality dimensions (many unique dimension values).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.3 Namespaces, metrics, dimensions model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Organizes metrics by namespace and describes series by dimensions.<\/li>\n<li><strong>Why it matters:<\/strong> Dimension filters are how you isolate metrics for specific resources, apps, environments, or tenants.<\/li>\n<li><strong>Practical benefit:<\/strong> Scales monitoring patterns; one metric name can cover many resources.<\/li>\n<li><strong>Caveats:<\/strong> Too many unique dimension combinations can 
explode the number of time series and increase cost\/limits consumption.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.4 Metric Explorer (console visualization)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Interactive browsing, filtering, and charting of metrics in the OCI Console.<\/li>\n<li><strong>Why it matters:<\/strong> Great for quick investigation and validating that metrics are flowing.<\/li>\n<li><strong>Practical benefit:<\/strong> Reduce time to diagnose and validate alarm conditions.<\/li>\n<li><strong>Caveats:<\/strong> For advanced dashboards and long-term views, you may need dedicated dashboards tooling (OCI dashboards or external tools\u2014verify current options).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.5 Metric Query Language (MQL) for alarms and queries<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Lets you define how to aggregate and evaluate metric data over time windows (for example, mean CPU over 5 minutes).<\/li>\n<li><strong>Why it matters:<\/strong> Alarm correctness depends on correct query design (window, aggregation, filters).<\/li>\n<li><strong>Practical benefit:<\/strong> Detect sustained issues rather than spikes; reduce alert noise.<\/li>\n<li><strong>Caveats:<\/strong> Query syntax and functions are specific to OCI; confirm supported syntax and patterns in official MQL documentation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.6 Alarms (metric-based alert rules)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Evaluates metric queries and transitions alarm states when conditions are met.<\/li>\n<li><strong>Why it matters:<\/strong> Alarms are your automation boundary\u2014turn metrics into action.<\/li>\n<li><strong>Practical benefit:<\/strong> Detect incidents proactively and consistently.<\/li>\n<li><strong>Caveats:<\/strong> Poorly tuned alarms cause fatigue; missing dimension filters can alert on the 
wrong resource(s).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.7 Notifications integration (alarm destinations)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Sends alarm notifications to OCI Notifications topics; subscribers receive messages (email, HTTPS, etc., depending on configuration).<\/li>\n<li><strong>Why it matters:<\/strong> Separates \u201calarm evaluation\u201d from \u201cmessage delivery,\u201d enabling fan-out and routing.<\/li>\n<li><strong>Practical benefit:<\/strong> One alarm can notify multiple teams\/systems through topic subscriptions.<\/li>\n<li><strong>Caveats:<\/strong> Email subscriptions require confirmation; delivery formats and endpoints must be validated.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.8 Alarm history and state transitions<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Tracks when alarms fire, clear, and change state.<\/li>\n<li><strong>Why it matters:<\/strong> Helps reconstruct incidents and validate alarm tuning.<\/li>\n<li><strong>Practical benefit:<\/strong> Post-incident reviews can use alarm timestamps to correlate events.<\/li>\n<li><strong>Caveats:<\/strong> Historical depth and retention for alarm history should be verified in official docs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.9 API\/CLI\/SDK access (automation)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Programmatically publish metrics, query metrics, and manage alarms.<\/li>\n<li><strong>Why it matters:<\/strong> Enables IaC and GitOps patterns for monitoring configuration.<\/li>\n<li><strong>Practical benefit:<\/strong> Repeatable, reviewable monitoring changes across environments.<\/li>\n<li><strong>Caveats:<\/strong> Requires careful IAM design and secrets handling for automation credentials.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.10 Compartment-aware governance<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Uses OCI compartments as a security and management boundary for metrics and alarms.<\/li>\n<li><strong>Why it matters:<\/strong> Large organizations need isolation between teams\/environments.<\/li>\n<li><strong>Practical benefit:<\/strong> Separate prod vs non-prod alarms, limit who can modify alarms.<\/li>\n<li><strong>Caveats:<\/strong> Cross-compartment visibility requires explicit IAM policies.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">7. Architecture and How It Works<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">High-level service architecture<\/h3>\n\n\n\n<p>At a high level:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Metrics are emitted<\/strong> either:\n<ul class=\"wp-block-list\">\n<li>Automatically by OCI services (service metrics), or<\/li>\n<li>Explicitly by your code\/automation (custom metrics API).<\/li>\n<\/ul>\n<\/li>\n<li><strong>Monitoring stores metrics<\/strong> as time-series keyed by namespace + metric name + dimension set.<\/li>\n<li>Users and tools <strong>query and visualize metrics<\/strong> in the console or through API\/CLI\/SDK.<\/li>\n<li><strong>Alarms evaluate metric queries<\/strong> on a schedule.<\/li>\n<li>When an alarm condition is met, Monitoring <strong>publishes a message<\/strong> to an <strong>OCI Notifications<\/strong> topic.<\/li>\n<li>Notifications <strong>delivers<\/strong> to configured subscriptions (email, HTTPS, Functions, etc., depending on Notifications support).<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Request\/data\/control flow<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Data plane<\/strong><\/li>\n<li>Datapoints flow into Monitoring (service-emitted or posted).<\/li>\n<li>Alarms evaluate stored datapoints.<\/li>\n<li><strong>Control plane<\/strong><\/li>\n<li>IAM policies determine who can read metrics, post custom metrics, and manage alarms.<\/li>\n<li>Console\/CLI\/SDK calls create\/update\/delete 
alarms, query metrics.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Integrations with related services<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>OCI Notifications<\/strong>: alarm destinations for alert delivery.<\/li>\n<li><strong>OCI Logging<\/strong>: investigate logs during alarm events.<\/li>\n<li><strong>OCI Events<\/strong>: often used for automation patterns (not required for Monitoring itself).<\/li>\n<li><strong>OCI Functions \/ Streaming<\/strong>: frequently used as downstream targets through Notifications or event-driven pipelines.<\/li>\n<li><strong>Terraform \/ Resource Manager<\/strong>: manage alarms and topics as code (verify provider resource names in current Terraform docs).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Dependency services<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>OCI IAM<\/strong>: authentication\/authorization.<\/li>\n<li><strong>OCI Notifications<\/strong>: if you want delivered alerts.<\/li>\n<li><strong>The monitored OCI services<\/strong>: compute, networking, database, etc., for service metrics.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/authentication model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Requests are authenticated using OCI IAM:<\/li>\n<li>Console sessions (federated or local users)<\/li>\n<li>API signing keys (for CLI\/SDK)<\/li>\n<li>Instance Principals \/ Resource Principals (for workloads posting custom metrics\u2014verify best practice for your architecture)<\/li>\n<li>Authorization is controlled by IAM policies at tenancy\/compartment scope.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Networking model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The Monitoring API endpoints are OCI regional service endpoints.<\/li>\n<li>From within OCI, you may use public endpoints or private access patterns depending on your network design (for example, using NAT\/Service Gateway patterns\u2014verify current OCI guidance for accessing public OCI 
services privately).<\/li>\n<li>For notifications delivered to HTTPS endpoints, ensure your endpoint is reachable and secured (TLS, auth).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Monitoring\/logging\/governance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Monitoring is itself an operational control; treat alarm configuration as production code:<\/li>\n<li>version control alarm definitions (Terraform)<\/li>\n<li>least privilege IAM<\/li>\n<li>standardized naming\/tags<\/li>\n<li>Use <strong>OCI Audit<\/strong> to track changes to alarm configuration and IAM policies (verify audit event coverage in official docs).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Simple architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart LR\n  A[OCI Resource&lt;br\/&gt;Compute \/ LB \/ DB] --&gt;|Service metrics| M[OCI Monitoring]\n  C[App \/ Script] --&gt;|Post custom metrics| M\n  U[Engineer \/ SRE] --&gt;|Query &amp; charts| M\n  M --&gt;|Alarm triggers| N[OCI Notifications Topic]\n  N --&gt; E[\"Email \/ SMS \/ HTTPS \/ Function&lt;br\/&gt;(per Notifications subscriptions)\"]\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Production-style architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart TB\n  subgraph Tenancy[\"Oracle Cloud Tenancy\"]\n    subgraph Compartments[\"Compartments: prod \/ nonprod \/ shared\"]\n      subgraph Prod[\"Prod Compartment\"]\n        W1[OKE \/ Compute Apps]\n        LB[Load Balancer]\n        DB[(Database Service)]\n      end\n\n      subgraph Shared[\"Shared Ops Compartment\"]\n        MON[Monitoring&lt;br\/&gt;Metrics + Alarms]\n        TOPIC[\"Notifications Topic(s)\"]\n      end\n    end\n  end\n\n  W1 --&gt;|\"Custom metrics (KPI, errors)\"| MON\n  LB --&gt;|Service metrics| MON\n  DB --&gt;|Service metrics| MON\n\n  MON --&gt;|Alarm messages| TOPIC\n  TOPIC --&gt; ONCALL[On-call Email\/ChatOps Gateway]\n  TOPIC --&gt; WEBHOOK[HTTPS 
Webhook&lt;br\/&gt;Incident Mgmt \/ SOAR]\n  TOPIC --&gt; FN[OCI Function&lt;br\/&gt;Auto-remediation]\n\n  ONCALL --&gt; RUNBOOK[Runbooks + Dashboards + Logs]\n  WEBHOOK --&gt; RUNBOOK\n  FN --&gt; RUNBOOK\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">8. Prerequisites<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Tenancy and account requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An active <strong>Oracle Cloud<\/strong> tenancy with permission to use Observability and Management services.<\/li>\n<li>A user account (or federated identity) with rights to create\/read Monitoring resources.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Permissions \/ IAM policies<\/h3>\n\n\n\n<p>You need IAM permissions for at least:\n&#8211; Reading metrics (to explore metrics)\n&#8211; Managing alarms (to create alarm rules)\n&#8211; Posting custom metrics (for the hands-on lab)\n&#8211; Managing Notifications topics\/subscriptions (to receive alarm messages)<\/p>\n\n\n\n<p>OCI IAM policies are expressed in human-readable statements. 
Exact policy verbs and resource families can vary; <strong>verify the latest Monitoring and Notifications IAM policy examples in the official docs<\/strong>.<\/p>\n\n\n\n<p>Typical patterns include:\n&#8211; Allow a group to manage alarms in a compartment\n&#8211; Allow a group to read\/use metrics in a compartment\n&#8211; Allow a group to manage topics\/subscriptions in a compartment\n&#8211; If posting custom metrics from automation: allow that principal to post metrics<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Billing requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Service metrics and basic alarm usage may be included or have minimal direct cost, but <strong>custom metrics ingestion\/storage<\/strong> and downstream services may incur charges depending on your usage and region.<\/li>\n<li>You need a payment method configured if you plan to exceed Always Free or free allocations (verify current free tier details).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Tools<\/h3>\n\n\n\n<p>For the hands-on tutorial, you can use:\n&#8211; <strong>OCI Console<\/strong> (web UI)\n&#8211; <strong>OCI CLI<\/strong> for posting custom metrics and optional validation<br\/>\n  CLI docs: https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/API\/Concepts\/cliconcepts.htm<\/p>\n\n\n\n<p>Optional:\n&#8211; Terraform (for IaC)\n&#8211; SDKs (Python\/Java\/Go\/etc.) 
for programmatic posting and querying<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Region availability<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Monitoring is available in OCI commercial regions and many other OCI regions, but availability can vary.<\/li>\n<li>Confirm on the OCI region\/service availability pages (verify in official docs).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas\/limits<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Limits exist for alarms, metric ingestion, namespaces, and API rates.<\/li>\n<li>Check <strong>OCI Service Limits<\/strong> and <strong>Quotas<\/strong> for Monitoring and Notifications in your tenancy (Console: Governance &amp; Administration \u2192 Limits\/Quotas; exact navigation may vary).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Prerequisite services<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>OCI Notifications<\/strong> is required if you want alarms to send messages to email\/webhooks.<\/li>\n<li>No compute resources are strictly required to try custom metrics (you can post from your local machine using OCI CLI).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">9. Pricing \/ Cost<\/h2>\n\n\n\n<blockquote>\n<p>Do not treat this section as a quote. OCI pricing is region-dependent and can change. 
Always confirm in official pricing pages and your tenancy\u2019s rate card.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Official pricing sources<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI price list (Observability and Management): https:\/\/www.oracle.com\/cloud\/price-list\/#observability-and-management  <\/li>\n<li>OCI Cost Estimator: https:\/\/www.oracle.com\/cloud\/costestimator.html  <\/li>\n<li>OCI Free Tier overview: https:\/\/www.oracle.com\/cloud\/free\/<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing dimensions (how costs are commonly determined)<\/h3>\n\n\n\n<p>OCI Monitoring cost typically depends on factors such as:\n&#8211; <strong>Custom metrics ingestion<\/strong>: how many datapoints you publish (frequency \u00d7 number of time series).\n&#8211; <strong>Custom metrics storage\/retention<\/strong>: how long datapoints are retained (if priced separately; verify current model).\n&#8211; <strong>API requests<\/strong>: heavy querying\/exporting can have request costs or rate limits (verify).\n&#8211; <strong>Alarms<\/strong>: some providers price per alarm or per evaluation; OCI\u2019s model must be confirmed in the official price list for your region.\n&#8211; <strong>Notifications delivery<\/strong>: topic usage and delivery endpoints may have their own pricing dimensions (verify Notifications pricing).<\/p>\n\n\n\n<p>Service metrics emitted by OCI services are often available without a separate \u201cingestion charge,\u201d but <strong>your overall bill still includes the monitored services<\/strong> (compute, database, networking, etc.). Treat Monitoring as a cost multiplier only when you add custom metrics, heavy queries, long retention requirements, or downstream integrations.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Free tier (if applicable)<\/h3>\n\n\n\n<p>OCI has a Free Tier; Monitoring and Notifications may have Always Free components or free allocations. 
<strong>Verify current Always Free limits for Monitoring and Notifications<\/strong> on the official Free Tier pages and the price list.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Main cost drivers<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>High-cardinality custom metrics<\/strong>\n   &#8211; Example: dimensions include <code>userId<\/code>, <code>requestId<\/code>, or <code>podUid<\/code> for thousands of unique values.\n   &#8211; Result: explosive growth in time series count and datapoints.<\/li>\n<li><strong>High-frequency datapoints<\/strong>\n   &#8211; Publishing every 1 second instead of every 60 seconds increases ingestion by 60\u00d7.<\/li>\n<li><strong>Many environments<\/strong>\n   &#8211; Dev\/test\/prod each with its own custom metrics and alarms.<\/li>\n<li><strong>Exporting\/reading at scale<\/strong>\n   &#8211; Frequent dashboards, external exporters, and API-based polling.<\/li>\n<li><strong>Downstream alert delivery<\/strong>\n   &#8211; Notifications to many endpoints and high alert volume.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Hidden or indirect costs<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Compute\/network costs<\/strong> for any collectors\/agents you run to generate custom metrics.<\/li>\n<li><strong>Data egress<\/strong> if you send notifications to external systems or export metrics to external locations (depends on architecture; verify OCI data transfer pricing).<\/li>\n<li><strong>Operational overhead<\/strong>: alert fatigue costs real engineering time.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network\/data transfer implications<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Posting custom metrics from outside OCI uses public endpoints; your local network egress is on your side; OCI ingress is typically not charged but verify.<\/li>\n<li>Sending alerts to external HTTPS endpoints could involve <strong>OCI egress<\/strong> from the Notifications service path 
(verify).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How to optimize cost<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prefer <strong>service metrics<\/strong> when available.<\/li>\n<li>For custom metrics:<\/li>\n<li>Use <strong>low cardinality dimensions<\/strong> (environment, app, region) rather than per-user identifiers.<\/li>\n<li>Publish at the <strong>lowest frequency<\/strong> that meets your alerting needs (often 1 minute).<\/li>\n<li>Aggregate upstream when possible (publish counts\/sums rather than raw event-per-request).<\/li>\n<li>Reduce alert noise: fewer triggered alarms reduces downstream delivery volume and operational load.<\/li>\n<li>Use compartments and tagging to track cost by team\/app.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Example low-cost starter estimate (no fabricated numbers)<\/h3>\n\n\n\n<p>A low-cost starter approach typically looks like:\n&#8211; Use <strong>service metrics<\/strong> for compute\/load balancer\/database.\n&#8211; Add a <strong>small number of custom metrics<\/strong> (single namespace, a few metric names, 1-minute resolution) for key KPIs.\n&#8211; Create <strong>a handful of alarms<\/strong> (CPU high, LB backend unhealthy, KPI drop) routed to one Notifications topic.<\/p>\n\n\n\n<p>To estimate:\n1. Determine custom metric datapoints per month:\n   &#8211; datapoints = (time series count) \u00d7 (datapoints per minute) \u00d7 (minutes per month)\n2. Plug datapoints into the Monitoring price dimension for your region in the price list.\n3. 
Add Notifications delivery estimates if applicable.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example production cost considerations<\/h3>\n\n\n\n<p>In production, costs can rise due to:\n&#8211; Many microservices each emitting multiple custom metrics\n&#8211; Multiple clusters\/regions and per-team compartments\n&#8211; Extensive dashboards, exports, and third-party integrations\n&#8211; Alert storms causing high message volume downstream<\/p>\n\n\n\n<p>For production, formalize:\n&#8211; A metric taxonomy and dimension policy\n&#8211; A custom metrics budget per team\n&#8211; Automated checks in CI to prevent high-cardinality dimensions<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">10. Step-by-Step Hands-On Tutorial<\/h2>\n\n\n\n<p>This lab creates a complete \u201cmetrics \u2192 alarm \u2192 notification\u201d flow using <strong>Oracle Cloud Monitoring<\/strong> and <strong>OCI Notifications<\/strong>. It uses <strong>custom metrics<\/strong> so you can complete the tutorial without provisioning compute resources.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Objective<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Post a custom metric datapoint into <strong>Oracle Cloud Monitoring<\/strong><\/li>\n<li>Visualize it in the OCI Console<\/li>\n<li>Create an alarm that triggers based on the metric<\/li>\n<li>Receive an email notification through <strong>OCI Notifications<\/strong><\/li>\n<li>Clean up all created resources<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Lab Overview<\/h3>\n\n\n\n<p>You will:\n1. Create (or choose) a compartment for the lab.\n2. Create a Notifications topic and email subscription.\n3. Configure OCI CLI (if not already configured).\n4. Post custom metric datapoints to Monitoring.\n5. Create an alarm on that custom metric and route it to the topic.\n6. Trigger the alarm and validate the email notification.\n7. 
Clean up.<\/p>\n\n\n\n<blockquote>\n<p>Expected time: ~30\u201360 minutes (depends mostly on email subscription confirmation).<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Choose or create a compartment for the lab<\/h3>\n\n\n\n<p><strong>Console steps<\/strong>\n1. Open the OCI Console.\n2. Navigate to <strong>Identity &amp; Security \u2192 Compartments<\/strong>.\n3. Either:\n   &#8211; Select an existing non-production compartment you can use, or\n   &#8211; Create a new compartment, for example: <code>lab-monitoring<\/code><\/p>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; You have a compartment OCID available for later steps.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; In the compartment details page, copy:\n  &#8211; <strong>Compartment OCID<\/strong>\n  &#8211; Confirm it shows as <strong>Active<\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Ensure you have the required IAM permissions<\/h3>\n\n\n\n<p>To complete the lab, your user\/group needs permissions for:\n&#8211; Monitoring metrics (post\/read)\n&#8211; Monitoring alarms (create\/manage)\n&#8211; Notifications topics\/subscriptions (create\/manage)<\/p>\n\n\n\n<p>If you do not have admin access, ask your tenancy administrator to grant you the minimum required permissions. 
OCI policies vary by org; use official examples and least privilege.<\/p>\n\n\n\n<p><strong>Where to verify<\/strong>\n&#8211; Official Monitoring docs (IAM\/policies section): https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Monitoring\/home.htm<br\/>\n&#8211; Official Notifications docs (IAM\/policies section): https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Notification\/home.htm<\/p>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; You can create a topic, create an alarm, and post custom metrics without authorization errors.<\/p>\n\n\n\n<p><strong>Common error<\/strong>\n&#8211; <code>NotAuthorizedOrNotFound<\/code> when creating alarms or posting metrics: usually missing IAM policy or using the wrong compartment.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Create a Notifications topic and email subscription<\/h3>\n\n\n\n<p>Alarms typically publish to a Notifications topic. Subscriptions deliver messages to endpoints such as email.<\/p>\n\n\n\n<p><strong>Console steps<\/strong>\n1. Navigate to <strong>Observability &amp; Management \u2192 Notifications<\/strong> (in some consoles: <strong>Developer Services \u2192 Notifications<\/strong>).\n2. Select your lab compartment.\n3. Click <strong>Create Topic<\/strong>\n   &#8211; Name: <code>lab-monitoring-topic<\/code>\n   &#8211; (Optional) Description: <code>Alarm notifications for Monitoring lab<\/code>\n4. After the topic is created, open it and click <strong>Create Subscription<\/strong>\n   &#8211; Protocol: <code>EMAIL<\/code>\n   &#8211; Email: your email address\n5. 
Check your inbox for the confirmation email and confirm the subscription.<\/p>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; A topic exists and the subscription is in <strong>Confirmed<\/strong> state.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; In the topic\u2019s subscription list, confirm:\n  &#8211; Protocol: EMAIL\n  &#8211; Lifecycle: Confirmed (or similar state wording)<\/p>\n\n\n\n<p><strong>Common error<\/strong>\n&#8211; Subscription stays pending: check spam\/junk folder; resubmit the subscription if needed.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Set up OCI CLI (local machine)<\/h3>\n\n\n\n<p>If you already use OCI CLI, you can skip to Step 5.<\/p>\n\n\n\n<p><strong>Install and configure<\/strong>\n&#8211; OCI CLI install docs: https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/API\/Concepts\/cliconcepts.htm<\/p>\n\n\n\n<p>After installing, configure:<\/p>\n\n\n\n<pre><code class=\"language-bash\">oci setup config\n<\/code><\/pre>\n\n\n\n<p>You will be prompted for:\n&#8211; Tenancy OCID\n&#8211; User OCID\n&#8211; Region (for example <code>us-ashburn-1<\/code>)\n&#8211; Path for config and keys<\/p>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; You have <code>~\/.oci\/config<\/code> created (or equivalent on Windows).\n&#8211; <code>oci<\/code> commands run successfully.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\nRun:<\/p>\n\n\n\n<pre><code class=\"language-bash\">oci iam region list --output table\n<\/code><\/pre>\n\n\n\n<p>If authentication is correct, you will see a table of regions.<\/p>\n\n\n\n<p><strong>Common errors and fixes<\/strong>\n&#8211; <code>Failed to verify the SSL certificate<\/code>: update CA certificates or corporate proxy settings.\n&#8211; <code>NotAuthorizedOrNotFound<\/code>: wrong region\/OCID or missing IAM policy.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5: Post a custom metric datapoint to 
Monitoring<\/h3>\n\n\n\n<p>Now you will publish a custom metric called <code>orders_processed<\/code> in namespace <code>lab_metrics<\/code>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">5.1 Gather required IDs<\/h4>\n\n\n\n<p>You need:\n&#8211; The <strong>compartment OCID<\/strong> for your lab compartment (from Step 1)<\/p>\n\n\n\n<p>Set it in your shell:<\/p>\n\n\n\n<pre><code class=\"language-bash\">export COMPARTMENT_OCID=\"ocid1.compartment.oc1..exampleuniqueID\"\n<\/code><\/pre>\n\n\n\n<p>Also set a timestamp (UTC, RFC 3339 format):<\/p>\n\n\n\n<pre><code class=\"language-bash\">export TS=\"$(date -u +\"%Y-%m-%dT%H:%M:%SZ\")\"\necho \"$TS\"\n<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">5.2 Create the metric payload file<\/h4>\n\n\n\n<p>Create a file named <code>metric_data.json<\/code>. The unquoted heredoc delimiter lets the shell substitute <code>COMPARTMENT_OCID<\/code> and <code>TS<\/code> into the payload:<\/p>\n\n\n\n<pre><code class=\"language-bash\">cat &gt; metric_data.json &lt;&lt;EOF\n[\n  {\n    \"namespace\": \"lab_metrics\",\n    \"compartmentId\": \"${COMPARTMENT_OCID}\",\n    \"name\": \"orders_processed\",\n    \"dimensions\": {\n      \"app\": \"demo-store\",\n      \"environment\": \"lab\"\n    },\n    \"datapoints\": [\n      {\n        \"timestamp\": \"${TS}\",\n        \"value\": 1\n      }\n    ]\n  }\n]\nEOF\n<\/code><\/pre>\n\n\n\n<blockquote>\n<p>JSON structure note: OCI CLI expects a list of metric objects for <code>--metric-data<\/code>. If the CLI interface changes, <strong>verify the exact payload format in current CLI docs<\/strong> for <code>oci monitoring metric-data post<\/code>.<\/p>\n<\/blockquote>\n\n\n\n<h4 class=\"wp-block-heading\">5.3 Post the metric<\/h4>\n\n\n\n<p>Custom metrics are posted to the regional <code>telemetry-ingestion<\/code> endpoint rather than the default Monitoring (<code>telemetry<\/code>) endpoint, so pass <code>--endpoint<\/code> explicitly and replace <code>us-ashburn-1<\/code> with your region (verify the endpoint for your region in the official docs):<\/p>\n\n\n\n<pre><code class=\"language-bash\">oci monitoring metric-data post --metric-data file:\/\/metric_data.json --endpoint \"https:\/\/telemetry-ingestion.us-ashburn-1.oraclecloud.com\"\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; The command returns a response indicating the datapoints were accepted; the response typically reports a <code>failed-metrics-count<\/code> of <code>0<\/code>.<\/p>\n\n\n\n<p><strong>Verification (console)<\/strong>\n1. 
Go to <strong>Observability &amp; Management \u2192 Monitoring \u2192 Metrics Explorer<\/strong> (naming may vary slightly).\n2. Select the region and compartment.\n3. Choose namespace: <code>lab_metrics<\/code>\n4. Find metric name: <code>orders_processed<\/code>\n5. Filter dimensions:\n   &#8211; <code>app=demo-store<\/code>\n   &#8211; <code>environment=lab<\/code>\n6. Set the time window to \u201cLast 5\u201315 minutes\u201d and confirm you see the datapoint.<\/p>\n\n\n\n<p><strong>Common errors<\/strong>\n&#8211; <code>InvalidParameter<\/code> or payload errors: JSON format mismatch; re-check commas\/quotes and consult CLI reference.\n&#8211; No datapoint appears: confirm you\u2019re viewing the same <strong>region<\/strong> and <strong>compartment<\/strong> you posted to, and widen time range.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 6: Create an alarm on the custom metric<\/h3>\n\n\n\n<p>Now you\u2019ll create an alarm that fires when <code>orders_processed<\/code> is greater than or equal to 10 (we will post a value of 10+ to trigger it).<\/p>\n\n\n\n<p><strong>Console steps<\/strong>\n1. Navigate to <strong>Observability &amp; Management \u2192 Monitoring \u2192 Alarms<\/strong>.\n2. Select the lab compartment.\n3. Click <strong>Create Alarm<\/strong>.\n4. 
Set:\n   &#8211; <strong>Alarm name:<\/strong> <code>lab-orders-processed-alarm<\/code>\n   &#8211; <strong>Metric namespace:<\/strong> <code>lab_metrics<\/code>\n   &#8211; <strong>Metric name:<\/strong> <code>orders_processed<\/code>\n   &#8211; <strong>Dimensions:<\/strong> <code>app=demo-store<\/code>, <code>environment=lab<\/code> (so the alarm is scoped to this one series)\n   &#8211; <strong>Statistic\/Aggregation:<\/strong> for this lab, <code>max<\/code> is the simplest choice because a single datapoint at or above the threshold satisfies the rule; <code>sum<\/code> also works but adds together every datapoint in the interval.\n   &#8211; <strong>Trigger rule:<\/strong> threshold <code>&gt;= 10<\/code>\n   &#8211; <strong>Evaluation window \/ interval:<\/strong> use a 1-minute interval and the shortest trigger delay offered, so that one datapoint is enough to fire the alarm (exact UI wording varies).\n5. <strong>Destination<\/strong>\n   &#8211; Choose <strong>Notifications topic<\/strong>: <code>lab-monitoring-topic<\/code>\n6. Create the alarm.<\/p>\n\n\n\n<blockquote>\n<p>Alarm query note: OCI uses a metric query language for alarms. If the console shows the underlying query, review it carefully and confirm dimension filters are applied. 
For Monitoring Query Language (MQL) syntax and best practices, consult the official docs.<\/p>\n<\/blockquote>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; Alarm is created and appears in the Alarms list.\n&#8211; Initial state is typically OK\/No data (depends on your datapoints and evaluation settings).<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; Open the alarm details and confirm:\n  &#8211; Destination topic is correct\n  &#8211; Metric namespace\/name and dimensions match your posted datapoints<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 7: Trigger the alarm by posting a higher datapoint<\/h3>\n\n\n\n<p>Refresh the timestamp, raise the value to 15 (above the alarm threshold of 10), and post again.<\/p>\n\n\n\n<pre><code class=\"language-bash\">export TS=\"$(date -u +\"%Y-%m-%dT%H:%M:%SZ\")\"\n\ncat &gt; metric_data.json &lt;&lt;EOF\n[\n  {\n    \"namespace\": \"lab_metrics\",\n    \"compartmentId\": \"${COMPARTMENT_OCID}\",\n    \"name\": \"orders_processed\",\n    \"dimensions\": {\n      \"app\": \"demo-store\",\n      \"environment\": \"lab\"\n    },\n    \"datapoints\": [\n      {\n        \"timestamp\": \"${TS}\",\n        \"value\": 15\n      }\n    ]\n  }\n]\nEOF\n\n# Custom metrics go to the telemetry-ingestion endpoint; replace us-ashburn-1 with your region.\noci monitoring metric-data post --metric-data file:\/\/metric_data.json --endpoint \"https:\/\/telemetry-ingestion.us-ashburn-1.oraclecloud.com\"\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; Within the alarm evaluation period, the alarm changes to <strong>Firing<\/strong> (or equivalent).\n&#8211; You receive an email notification via Notifications.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; In <strong>Monitoring \u2192 Alarms<\/strong>, open the alarm and check:\n  &#8211; Current state shows <strong>Firing<\/strong>\n  &#8211; Alarm history shows a transition event\n&#8211; Check your email for the alarm message.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Validation<\/h3>\n\n\n\n<p>Use this checklist:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Metric exists<\/strong>\n   &#8211; Metrics Explorer 
shows <code>lab_metrics \/ orders_processed<\/code> with your dimensions.<\/li>\n<li><strong>Alarm exists<\/strong>\n   &#8211; Alarm is created and scoped to the correct compartment and dimensions.<\/li>\n<li><strong>Alarm triggers<\/strong>\n   &#8211; Alarm state transitions to Firing after posting <code>value: 15<\/code>.<\/li>\n<li><strong>Notification delivered<\/strong>\n   &#8211; Email subscription is confirmed.\n   &#8211; You received the alarm email.<\/li>\n<\/ol>\n\n\n\n<p>If any item fails, see the Troubleshooting section below.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Troubleshooting<\/h3>\n\n\n\n<p><strong>Problem: No datapoints visible in Metrics Explorer<\/strong>\n&#8211; Confirm you are in the correct <strong>region<\/strong>.\n&#8211; Confirm the <strong>compartment<\/strong> is correct.\n&#8211; Expand the time range to the last hour.\n&#8211; Re-check dimensions: if your Explorer filter doesn\u2019t match the posted dimensions exactly, you won\u2019t see the series.<\/p>\n\n\n\n<p><strong>Problem: <code>NotAuthorizedOrNotFound<\/code> from CLI<\/strong>\n&#8211; Check that your OCI CLI profile is pointing to the correct tenancy\/user\/region in <code>~\/.oci\/config<\/code>.\n&#8211; Confirm your user\/group has the required IAM policies for metrics posting and alarm management.\n&#8211; If the error appears only when posting metrics, confirm the request targets the <code>telemetry-ingestion<\/code> endpoint for your region rather than the default Monitoring endpoint.<\/p>\n\n\n\n<p><strong>Problem: Alarm never fires<\/strong>\n&#8211; Ensure the alarm\u2019s metric query filters match your metric\u2019s dimensions.\n&#8211; Confirm the alarm uses an aggregation and interval that will catch your datapoint (for example, if you used a longer window, wait longer).\n&#8211; Post multiple datapoints to ensure the evaluation window contains values.<\/p>\n\n\n\n<p><strong>Problem: Email never arrives<\/strong>\n&#8211; Confirm the subscription is <strong>Confirmed<\/strong>.\n&#8211; Check spam\/junk.\n&#8211; Verify the alarm destination topic is correct.\n&#8211; If your organization blocks automated emails, consider an 
HTTPS subscription endpoint instead (verify Notifications options).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Cleanup<\/h3>\n\n\n\n<p>To avoid ongoing cost and to keep your tenancy tidy, delete lab resources.<\/p>\n\n\n\n<p><strong>Console cleanup<\/strong>\n1. Delete the alarm:\n   &#8211; Monitoring \u2192 Alarms \u2192 select <code>lab-orders-processed-alarm<\/code> \u2192 Delete\n2. Delete the Notifications subscription and topic:\n   &#8211; Notifications \u2192 open <code>lab-monitoring-topic<\/code>\n   &#8211; Delete the email subscription\n   &#8211; Delete the topic\n3. If you created a compartment for the lab:\n   &#8211; Ensure it has no remaining resources\n   &#8211; Delete the compartment (it will move to a \u201cDeleted\u201d state after resources are removed)<\/p>\n\n\n\n<p><strong>Local cleanup<\/strong>\n&#8211; Remove <code>metric_data.json<\/code> if desired.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">11. 
Best Practices<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Start with <strong>service metrics<\/strong> before adding custom metrics.<\/li>\n<li>Design a <strong>metric taxonomy<\/strong>:<\/li>\n<li>Namespaces by domain (<code>payments<\/code>, <code>orders<\/code>, <code>platform<\/code>)<\/li>\n<li>Metric names that are stable and consistent (<code>request_count<\/code>, <code>error_count<\/code>, <code>latency_ms<\/code>)<\/li>\n<li>Dimensions that support filtering without high cardinality (<code>environment<\/code>, <code>service<\/code>, <code>region<\/code>)<\/li>\n<li>Use <strong>compartments<\/strong> to separate environments and teams:<\/li>\n<li><code>prod<\/code>, <code>stage<\/code>, <code>dev<\/code>, <code>shared-ops<\/code><\/li>\n<li>Standardize <strong>alarm patterns<\/strong>:<\/li>\n<li>Saturation (CPU, memory), errors (5xx), latency, availability, and \u201csilence\u201d heartbeats<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">IAM\/security best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Apply <strong>least privilege<\/strong>:<\/li>\n<li>Separate roles: viewers (read metrics), operators (manage alarms), publishers (post custom metrics)<\/li>\n<li>Avoid using long-lived user API keys in apps:<\/li>\n<li>Prefer <strong>Instance Principals<\/strong> or <strong>Resource Principals<\/strong> for OCI-native workloads (verify supported auth for your architecture).<\/li>\n<li>Restrict who can change alarm destinations to avoid misrouting alerts.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cost best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Control custom metric volume:<\/li>\n<li>Publish at 1-minute intervals unless you truly need higher frequency.<\/li>\n<li>Aggregate before publishing (send sums\/averages, not per-request metrics).<\/li>\n<li>Keep dimensions low cardinality:<\/li>\n<li>Do not use <code>requestId<\/code>, 
<code>sessionId<\/code>, <code>userId<\/code> as dimensions.<\/li>\n<li>Use tags on alarms and topics to enable cost allocation and ownership tracking.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Performance best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use dimension filters in queries to avoid pulling broad datasets.<\/li>\n<li>Avoid creating alarms that scan huge sets of time series unless necessary.<\/li>\n<li>Prefer a small number of well-designed alarms over many noisy alarms.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Reliability best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>multi-channel alerting<\/strong> for critical alarms:<\/li>\n<li>Email + webhook to incident management (depending on Notifications options)<\/li>\n<li>Build <strong>runbooks<\/strong> linked from alarm descriptions:<\/li>\n<li>\u201cWhat does this alarm mean?\u201d<\/li>\n<li>\u201cWhat\u2019s the first check?\u201d<\/li>\n<li>\u201cWhat are safe mitigations?\u201d<\/li>\n<li>Regularly test alarms (game days).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operations best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Maintain an \u201calarm hygiene\u201d routine:<\/li>\n<li>Monthly review of top noisy alarms<\/li>\n<li>Remove obsolete alarms after architecture changes<\/li>\n<li>Use naming conventions:<\/li>\n<li><code>env.service.signal.severity<\/code> (example: <code>prod.orders.5xx_rate.critical<\/code>)<\/li>\n<li>Use consistent severity definitions and on-call routing topics per team.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Governance\/tagging\/naming best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Tag alarms and topics with:<\/li>\n<li><code>owner<\/code>, <code>costCenter<\/code>, <code>environment<\/code>, <code>application<\/code>, <code>team<\/code><\/li>\n<li>Enforce standards through IaC (Terraform) and code review.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" 
\/>\n\n\n\n<h2 class=\"wp-block-heading\">12. Security Considerations<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Identity and access model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Monitoring access is controlled by <strong>OCI IAM policies<\/strong> at tenancy and compartment scope.<\/li>\n<li>Separate privileges for:<\/li>\n<li>Reading metrics (engineers, dashboards)<\/li>\n<li>Managing alarms (ops\/SRE)<\/li>\n<li>Posting custom metrics (applications\/automation)<\/li>\n<\/ul>\n\n\n\n<p><strong>Recommendation<\/strong>\n&#8211; Use dedicated dynamic groups and principals for workloads posting metrics.\n&#8211; Do not grant broad <code>manage all-resources<\/code> unless absolutely necessary.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Encryption<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI services generally encrypt data at rest and in transit. For Monitoring-specific encryption guarantees, <strong>verify the official Monitoring security documentation<\/strong> and Oracle Cloud security documentation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network exposure<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Posting custom metrics uses OCI service endpoints; ensure:<\/li>\n<li>TLS is used (default)<\/li>\n<li>Your environment\u2019s egress policy allows access to OCI endpoints<\/li>\n<li>For HTTPS subscriptions (webhooks), ensure your endpoint:<\/li>\n<li>Uses TLS<\/li>\n<li>Requires authentication\/verification (to prevent spoofed alerts)<\/li>\n<li>Has rate limiting (to withstand alert storms)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secrets handling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Avoid embedding OCI config files and API keys in containers or repos.<\/li>\n<li>Use OCI-native identity (instance\/resource principals) when possible.<\/li>\n<li>If you must use API keys, store them in <strong>OCI Vault<\/strong> (separate service) and rotate regularly (verify your org\u2019s standard).<\/li>\n<\/ul>\n\n\n\n<h3 
class=\"wp-block-heading\">Audit\/logging<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>OCI Audit<\/strong> to track changes to:<\/li>\n<li>IAM policies that grant Monitoring permissions<\/li>\n<li>Alarm creation\/modification\/deletion<\/li>\n<li>Notifications topics and subscriptions<br\/>\n  Verify exact audit event coverage in official docs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compliance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Alarms and metrics can contain sensitive context if you encode it in dimensions (for example, customer identifiers).<\/li>\n<li>Treat custom metric payload design as a data classification issue:<\/li>\n<li>Do not put PII into metric dimensions or names.<\/li>\n<li>Use anonymized or aggregated identifiers.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Common security mistakes<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Posting high-cardinality identifiers (PII) as dimensions.<\/li>\n<li>Granting developers <code>manage alarms<\/code> in production without controls.<\/li>\n<li>Using a shared email topic for all severities (leaks incident details broadly).<\/li>\n<li>Not authenticating webhook subscribers.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secure deployment recommendations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Compartmentalize prod monitoring resources (alarms\/topics) and restrict modifications.<\/li>\n<li>Use separate topics per:<\/li>\n<li>Severity (critical vs warning)<\/li>\n<li>Team ownership (payments vs platform)<\/li>\n<li>Validate webhook endpoints and log delivery outcomes.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">13. Limitations and Gotchas<\/h2>\n\n\n\n<blockquote>\n<p>Exact numeric limits can change. 
Always check OCI Service Limits for Monitoring and Notifications in your region\/tenancy.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Known limitation categories<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Regional scope<\/strong>: Metrics and alarms are regional; multi-region monitoring requires per-region configuration and aggregation outside Monitoring if needed.<\/li>\n<li><strong>Service metric availability<\/strong>: Not all services emit all desired metrics; some require enabling agent plugins or service-specific options.<\/li>\n<li><strong>Custom metric cardinality<\/strong>: Too many dimension combinations can:<ul>\n<li>hit service limits<\/li>\n<li>increase ingestion cost<\/li>\n<li>make queries slow or confusing<\/li>\n<\/ul>\n<\/li>\n<li><strong>Alarm noise<\/strong>: A poorly designed alarm (too sensitive, no delay, no aggregation) will flap and create alert fatigue.<\/li>\n<li><strong>Email confirmation requirement<\/strong>: Notifications email subscriptions require confirmation; missing confirmation causes \u201csilent\u201d non-delivery.<\/li>\n<li><strong>IAM complexity<\/strong>: Cross-compartment visibility is not automatic; missing policies are a frequent cause of \u201cno metrics found\u201d confusion.<\/li>\n<li><strong>Time window mismatches<\/strong>: Alarm evaluation windows and metric publishing intervals must align; a single datapoint might not trigger an alarm if the evaluation expects a sustained condition.<\/li>\n<li><strong>Dimension mismatches<\/strong>: Alarm filters that don\u2019t match the posted dimensions exactly (case\/typos) result in \u201cno data.\u201d<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing surprises<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Custom metrics can be inexpensive at small scale, but costs can grow rapidly with:<\/li>\n<li>per-pod\/per-container dimensions<\/li>\n<li>second-level 
publishing<\/li>\n<li>many environments and microservices<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compatibility issues<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If you rely on agents\/plugins for OS-level metrics, compatibility depends on:<\/li>\n<li>OS version<\/li>\n<li>agent version<\/li>\n<li>network egress and permissions<br\/>\n  Verify in official docs for the specific agent\/plugin involved.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Migration challenges (from other tools)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Metric naming and query semantics differ from Prometheus\/AWS CloudWatch\/Azure Monitor.<\/li>\n<li>Alarm threshold semantics and aggregation windows may need redesign, not just a \u201clift and shift.\u201d<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">14. Comparison with Alternatives<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">In Oracle Cloud (nearest services)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Logging<\/strong>: event\/log record collection; not a metrics system. 
Great for root cause after alarms.<\/li>\n<li><strong>Logging Analytics<\/strong>: advanced log analytics and correlation; complements Monitoring.<\/li>\n<li><strong>APM<\/strong>: application tracing, spans, transactions; complements Monitoring for deep app performance.<\/li>\n<li><strong>Health Checks<\/strong>: external availability probing (separate service); complements Monitoring.<\/li>\n<li><strong>Dashboards<\/strong>: visualization layer; complements Monitoring.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Other clouds (nearest equivalents)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>AWS: CloudWatch (metrics\/alarms\/logs), with different pricing and query patterns.<\/li>\n<li>Azure: Azure Monitor (metrics\/logs\/alerts).<\/li>\n<li>GCP: Cloud Monitoring (metrics\/alerting), integrated with Cloud Logging.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Open-source\/self-managed alternatives<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prometheus + Alertmanager + Grafana<\/li>\n<li>VictoriaMetrics \/ Thanos \/ Cortex for scalable metrics backends<\/li>\n<li>OpenTelemetry metrics pipelines (then choose backend)<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Comparison table<\/h4>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Option<\/th>\n<th>Best For<\/th>\n<th>Strengths<\/th>\n<th>Weaknesses<\/th>\n<th>When to Choose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>OCI Monitoring<\/strong><\/td>\n<td>OCI-native metrics + alarms<\/td>\n<td>Integrated with OCI services\/IAM; managed; service metrics out of the box; custom metrics supported<\/td>\n<td>Regional scope; feature set focused on OCI; custom metric costs\/limits must be managed<\/td>\n<td>You run primarily on OCI and want first-party monitoring and alerting<\/td>\n<\/tr>\n<tr>\n<td><strong>OCI Logging<\/strong><\/td>\n<td>Log collection and troubleshooting<\/td>\n<td>Great for forensic analysis; structured\/unstructured logs<\/td>\n<td>Not a metrics 
platform; alerting differs<\/td>\n<td>Use with Monitoring to investigate alarm triggers<\/td>\n<\/tr>\n<tr>\n<td><strong>OCI Logging Analytics<\/strong><\/td>\n<td>Advanced log analytics<\/td>\n<td>Powerful search\/correlation on logs<\/td>\n<td>Additional setup\/cost; not a pure metrics replacement<\/td>\n<td>You need deep log insights alongside Monitoring<\/td>\n<\/tr>\n<tr>\n<td><strong>OCI APM<\/strong><\/td>\n<td>App tracing and deep performance<\/td>\n<td>End-to-end tracing, app-level visibility<\/td>\n<td>Requires instrumentation\/agents; separate pricing<\/td>\n<td>You need tracing and app performance diagnostics beyond metrics<\/td>\n<\/tr>\n<tr>\n<td><strong>AWS CloudWatch<\/strong><\/td>\n<td>AWS workloads<\/td>\n<td>Mature ecosystem; integrated metrics\/logs<\/td>\n<td>Different semantics\/pricing; not OCI-native<\/td>\n<td>You are primarily on AWS<\/td>\n<\/tr>\n<tr>\n<td><strong>Azure Monitor<\/strong><\/td>\n<td>Azure workloads<\/td>\n<td>Broad monitoring suite<\/td>\n<td>Not OCI-native<\/td>\n<td>You are primarily on Azure<\/td>\n<\/tr>\n<tr>\n<td><strong>GCP Cloud Monitoring<\/strong><\/td>\n<td>GCP workloads<\/td>\n<td>Strong managed monitoring<\/td>\n<td>Not OCI-native<\/td>\n<td>You are primarily on GCP<\/td>\n<\/tr>\n<tr>\n<td><strong>Prometheus + Grafana<\/strong><\/td>\n<td>Cloud-neutral Kubernetes and custom monitoring<\/td>\n<td>Industry-standard; flexible queries (PromQL); portable<\/td>\n<td>You operate and scale it; storage\/HA burden; integration effort<\/td>\n<td>You need portability, Kubernetes-native metrics, or custom control of retention and dashboards<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">15. 
Real-World Example<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Enterprise example: regulated financial services platform<\/h3>\n\n\n\n<p><strong>Problem<\/strong><br\/>\nA financial services company runs customer-facing APIs on OCI across multiple compartments (prod, staging, shared). They need strong operational control, least-privilege access, and reliable incident routing with auditability.<\/p>\n\n\n\n<p><strong>Proposed architecture<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI Monitoring for service metrics:<ul>\n<li>Load balancer health and error rates<\/li>\n<li>Compute resource saturation<\/li>\n<li>Database service metrics<\/li>\n<\/ul>\n<\/li>\n<li>Custom metrics:<ul>\n<li><code>transaction_success_rate<\/code><\/li>\n<li><code>authorization_latency_ms<\/code><\/li>\n<li><code>queue_backlog<\/code><\/li>\n<\/ul>\n<\/li>\n<li>Alarms per tier:<ul>\n<li>Critical alarms route to a <code>prod-critical<\/code> Notifications topic<\/li>\n<li>Warning alarms route to a <code>prod-warning<\/code> Notifications topic<\/li>\n<\/ul>\n<\/li>\n<li>Notifications:<ul>\n<li>Email distribution lists for on-call<\/li>\n<li>HTTPS webhook to the incident management system<\/li>\n<\/ul>\n<\/li>\n<li>Governance:<ul>\n<li>IAM policies restrict alarm modification to the SRE group<\/li>\n<li>Alarms managed via Terraform with code review<\/li>\n<li>Tags identify ownership and support cost attribution<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p><strong>Why Monitoring was chosen<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI-native metrics and IAM integration fit regulated environments.<\/li>\n<li>Service metrics reduce operational overhead.<\/li>\n<li>Custom metrics enable business-level alerting without a separate metrics stack.<\/li>\n<\/ul>\n\n\n\n<p><strong>Expected outcomes<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Reduced MTTD with consistent alerting across compartments<\/li>\n<li>Better auditability and change control over monitoring rules<\/li>\n<li>Improved incident response via routing and runbooks linked in alarm metadata<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Startup\/small-team example: SaaS MVP on 
OCI<\/h3>\n\n\n\n<p><strong>Problem<\/strong><br\/>\nA small team runs a single-region SaaS app on OCI with a small VM pool and a managed database. They need basic alerting without operating Prometheus.<\/p>\n\n\n\n<p><strong>Proposed architecture<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI Monitoring service metrics:<ul>\n<li>VM CPU utilization<\/li>\n<li>Load balancer backend health<\/li>\n<li>Database CPU\/storage metrics<\/li>\n<\/ul>\n<\/li>\n<li>Minimal custom metrics:<ul>\n<li><code>signup_count<\/code><\/li>\n<li><code>job_failures<\/code><\/li>\n<\/ul>\n<\/li>\n<li>A few alarms routed to one Notifications topic with email subscriptions to founders\/on-call.<\/li>\n<\/ul>\n\n\n\n<p><strong>Why Monitoring was chosen<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Low operational burden, quick setup.<\/li>\n<li>Sufficient for MVP operational needs.<\/li>\n<li>Scales as they add more services and compartments.<\/li>\n<\/ul>\n\n\n\n<p><strong>Expected outcomes<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Basic reliability guardrails without additional infrastructure<\/li>\n<li>Faster debugging when performance issues happen<\/li>\n<li>Controlled costs by limiting custom metrics<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">16. FAQ<\/h2>\n\n\n\n<p>1) <strong>Is Oracle Cloud Monitoring the same as Logging?<\/strong><br\/>\nNo. Monitoring focuses on <strong>metrics<\/strong> (time-series numeric values) and <strong>alarms<\/strong>. Logging captures <strong>log events<\/strong> (text\/structured records). They complement each other.<\/p>\n\n\n\n<p>2) <strong>Is Monitoring a regional service in OCI?<\/strong><br\/>\nYes. Monitoring is a <strong>regional<\/strong> service: metrics and alarms are tied to the region where they are created. For multi-region architectures, plan alarms per region and centralize routing downstream if needed.<\/p>\n\n\n\n<p>3) <strong>What are service metrics vs custom metrics?<\/strong><br\/>\nService metrics are emitted by OCI services automatically. 
Custom metrics are datapoints you publish to Monitoring for your apps\/systems.<\/p>\n\n\n\n<p>4) <strong>Do I need an agent to use Monitoring?<\/strong><br\/>\nFor many OCI services, no\u2014service metrics are automatic. For OS-level metrics (like memory) you may need an OCI agent\/plugin depending on the service and OS. Verify in official docs for your target metric.<\/p>\n\n\n\n<p>5) <strong>How do alarms send notifications?<\/strong><br\/>\nAlarms usually publish to an <strong>OCI Notifications topic<\/strong>. Subscriptions on the topic deliver messages to email\/HTTPS\/etc.<\/p>\n\n\n\n<p>6) <strong>Can I create alarms with Terraform?<\/strong><br\/>\nYes, typically you can manage alarms and topics as code using OCI Terraform provider resources. Verify current provider documentation for the exact resource names and arguments.<\/p>\n\n\n\n<p>7) <strong>What\u2019s the biggest cost risk in Monitoring?<\/strong><br\/>\nHigh-volume\/high-cardinality <strong>custom metrics<\/strong>. Avoid dimensions that create many unique time series.<\/p>\n\n\n\n<p>8) <strong>Can I monitor Kubernetes (OKE) with OCI Monitoring?<\/strong><br\/>\nYou can monitor OCI service metrics around OKE and related infrastructure. For detailed pod\/container metrics, many teams use Prometheus-based tooling. Verify current OCI OKE observability guidance.<\/p>\n\n\n\n<p>9) <strong>How do I avoid alert fatigue?<\/strong><br\/>\nUse aggregation windows, delays, and clear thresholds; scope alarms with dimensions; classify severity; review noisy alarms regularly.<\/p>\n\n\n\n<p>10) <strong>Can I trigger automation from an alarm?<\/strong><br\/>\nIndirectly, yes\u2014alarms publish to Notifications. Notifications can deliver to endpoints like HTTPS or Functions (depending on Notifications features). 
Use this to trigger auto-remediation carefully.<\/p>\n\n\n\n<p>11) <strong>How do I design custom metrics for business KPIs?<\/strong><br\/>\nPublish aggregated counts\/rates\/latency percentiles (if you compute them upstream), use stable metric names, and include low-cardinality dimensions like <code>service<\/code>, <code>environment<\/code>.<\/p>\n\n\n\n<p>12) <strong>Why do I see \u201cNo data\u201d for an alarm?<\/strong><br\/>\nCommon causes: wrong region\/compartment, wrong dimension filters, publishing interval too sparse for evaluation window, or metric not emitted as expected.<\/p>\n\n\n\n<p>13) <strong>Can multiple teams share the same Monitoring setup?<\/strong><br\/>\nYes\u2014use compartments and IAM policies to isolate. Share only common topics\/routing if desired.<\/p>\n\n\n\n<p>14) <strong>How quickly do alarms detect issues?<\/strong><br\/>\nDepends on metric emission frequency and alarm evaluation settings (window, interval). Choose settings that balance speed and noise. Verify exact evaluation behavior in official docs.<\/p>\n\n\n\n<p>15) <strong>Can I export metrics to external systems?<\/strong><br\/>\nYes, you can query via API\/CLI\/SDK and forward to external systems. Consider API rate limits and data transfer costs.<\/p>\n\n\n\n<p>16) <strong>Is there a built-in dashboard for all metrics?<\/strong><br\/>\nYou can explore metrics in Metric Explorer; for curated dashboards, use OCI\u2019s dashboard capabilities or external tools. Verify your tenancy\u2019s current dashboard options.<\/p>\n\n\n\n<p>17) <strong>What\u2019s the difference between Monitoring and APM?<\/strong><br\/>\nMonitoring is metrics and alarms for infrastructure and custom numeric signals. APM adds tracing and application performance diagnostics (transactions, spans), typically requiring instrumentation.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">17. 
Top Online Resources to Learn Monitoring<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Resource Type<\/th>\n<th>Name<\/th>\n<th>Why It Is Useful<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Official documentation<\/td>\n<td>OCI Monitoring documentation<\/td>\n<td>Primary reference for metrics, namespaces, alarms, and APIs: https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Monitoring\/home.htm<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>OCI Notifications documentation<\/td>\n<td>Required for alarm delivery via topics\/subscriptions: https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Notification\/home.htm<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>OCI CLI concepts and setup<\/td>\n<td>Install\/configure CLI used in labs and automation: https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/API\/Concepts\/cliconcepts.htm<\/td>\n<\/tr>\n<tr>\n<td>Official pricing<\/td>\n<td>OCI price list (Observability and Management)<\/td>\n<td>Official pricing dimensions for Monitoring\/related services: https:\/\/www.oracle.com\/cloud\/price-list\/#observability-and-management<\/td>\n<\/tr>\n<tr>\n<td>Official tool<\/td>\n<td>OCI Cost Estimator<\/td>\n<td>Model regional cost impacts: https:\/\/www.oracle.com\/cloud\/costestimator.html<\/td>\n<\/tr>\n<tr>\n<td>Official free tier<\/td>\n<td>Oracle Cloud Free Tier<\/td>\n<td>Understand Always Free and trial allowances: https:\/\/www.oracle.com\/cloud\/free\/<\/td>\n<\/tr>\n<tr>\n<td>Architecture guidance<\/td>\n<td>Oracle Architecture Center<\/td>\n<td>Reference architectures and operational patterns (search for observability): https:\/\/www.oracle.com\/cloud\/architecture-center\/<\/td>\n<\/tr>\n<tr>\n<td>Tutorials\/labs<\/td>\n<td>Oracle LiveLabs<\/td>\n<td>Hands-on labs for OCI services including observability topics: https:\/\/livelabs.oracle.com\/<\/td>\n<\/tr>\n<tr>\n<td>Official GitHub<\/td>\n<td>OCI CLI repository<\/td>\n<td>Source, releases, and examples for CLI: 
https:\/\/github.com\/oracle\/oci-cli<\/td>\n<\/tr>\n<tr>\n<td>Official GitHub<\/td>\n<td>OCI SDKs<\/td>\n<td>Programmatic access samples and SDKs: https:\/\/github.com\/oracle\/oci-python-sdk (and related org repos)<\/td>\n<\/tr>\n<tr>\n<td>Community (reputable)<\/td>\n<td>Oracle Cloud blogs and solution playbooks<\/td>\n<td>Practical patterns and updates (validate against docs): https:\/\/blogs.oracle.com\/cloud-infrastructure\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">18. Training and Certification Providers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Institute<\/th>\n<th>Suitable Audience<\/th>\n<th>Likely Learning Focus<\/th>\n<th>Mode<\/th>\n<th>Website<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps engineers, SREs, platform teams<\/td>\n<td>DevOps practices, cloud operations, monitoring\/observability fundamentals<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>ScmGalaxy.com<\/td>\n<td>Beginners to intermediate engineers<\/td>\n<td>DevOps, CI\/CD, operations tooling foundations<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.scmgalaxy.com\/<\/td>\n<\/tr>\n<tr>\n<td>CLoudOpsNow.in<\/td>\n<td>Cloud engineers, operations teams<\/td>\n<td>Cloud operations and monitoring practices<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.cloudopsnow.in\/<\/td>\n<\/tr>\n<tr>\n<td>SreSchool.com<\/td>\n<td>SREs, reliability engineers<\/td>\n<td>SRE principles, alerting, SLOs, incident management<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.sreschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>AiOpsSchool.com<\/td>\n<td>Ops\/SRE, automation-focused teams<\/td>\n<td>AIOps concepts, event correlation, automation<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.aiopsschool.com\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 
class=\"wp-block-heading\">19. Top Trainers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Platform\/Site<\/th>\n<th>Likely Specialization<\/th>\n<th>Suitable Audience<\/th>\n<th>Website<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>RajeshKumar.xyz<\/td>\n<td>DevOps\/cloud coaching and consulting-style training resources (verify offerings)<\/td>\n<td>Engineers seeking guided learning<\/td>\n<td>https:\/\/rajeshkumar.xyz\/<\/td>\n<\/tr>\n<tr>\n<td>devopstrainer.in<\/td>\n<td>DevOps training programs (verify current courses)<\/td>\n<td>Beginners to intermediate DevOps learners<\/td>\n<td>https:\/\/www.devopstrainer.in\/<\/td>\n<\/tr>\n<tr>\n<td>devopsfreelancer.com<\/td>\n<td>Freelance DevOps services and potentially training\/support resources (verify scope)<\/td>\n<td>Teams needing hands-on help<\/td>\n<td>https:\/\/www.devopsfreelancer.com\/<\/td>\n<\/tr>\n<tr>\n<td>devopssupport.in<\/td>\n<td>DevOps support and training resources (verify offerings)<\/td>\n<td>Ops teams needing troubleshooting help<\/td>\n<td>https:\/\/www.devopssupport.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">20. 
Top Consulting Companies<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Company<\/th>\n<th>Likely Service Area<\/th>\n<th>Where They May Help<\/th>\n<th>Consulting Use Case Examples<\/th>\n<th>Website<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>cotocus.com<\/td>\n<td>Cloud\/DevOps consulting (verify current portfolio)<\/td>\n<td>Observability setup, automation, cloud operations<\/td>\n<td>Alarm strategy design, custom metrics pipeline design, IaC for alarms\/topics<\/td>\n<td>https:\/\/cotocus.com\/<\/td>\n<\/tr>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps consulting and enablement<\/td>\n<td>Training + implementation support for DevOps\/observability<\/td>\n<td>Monitoring baseline implementation, on-call readiness, CI\/CD integration for alarm-as-code<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>DEVOPSCONSULTING.IN<\/td>\n<td>DevOps consulting services (verify current offerings)<\/td>\n<td>Implementation and support<\/td>\n<td>Setting up alert routing, building runbooks, governance and IAM reviews<\/td>\n<td>https:\/\/www.devopsconsulting.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">21. 
Career and Learning Roadmap<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn before Monitoring<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI fundamentals:<\/li>\n<li>Regions, availability domains, compartments<\/li>\n<li>OCI IAM basics (groups, policies, dynamic groups)<\/li>\n<li>Core services: Compute, Networking (VCN), Load Balancer<\/li>\n<li>Observability fundamentals:<\/li>\n<li>Metrics vs logs vs traces<\/li>\n<li>Basic SRE concepts: SLIs\/SLOs, alert fatigue, incident lifecycle<\/li>\n<li>CLI basics:<\/li>\n<li>Authentication, profiles, regions, compartments<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn after Monitoring<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI Logging and Logging Analytics for root cause analysis.<\/li>\n<li>OCI APM for tracing and application diagnostics (if you own app performance).<\/li>\n<li>Automation:<\/li>\n<li>Terraform modules for alarms and notification topics<\/li>\n<li>Functions-based remediation workflows<\/li>\n<li>Reliability engineering:<\/li>\n<li>SLO-based alerting and error budgets<\/li>\n<li>Capacity planning and performance testing<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Job roles that use Monitoring<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cloud Engineer (OCI)<\/li>\n<li>DevOps Engineer<\/li>\n<li>Site Reliability Engineer (SRE)<\/li>\n<li>Platform Engineer<\/li>\n<li>Cloud Solutions Architect<\/li>\n<li>Operations\/NOC Engineer<\/li>\n<li>Security Engineer (for availability and abuse signals)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certification path (if available)<\/h3>\n\n\n\n<p>Oracle certification offerings change. 
Look for OCI certifications that cover:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI Foundations (baseline)<\/li>\n<li>Architect or DevOps-focused OCI certifications<\/li>\n<\/ul>\n\n\n\n<p>Verify current certification tracks on Oracle University \/ Oracle Certification pages (official): https:\/\/education.oracle.com\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Project ideas for practice<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Golden alarms module<\/strong>: Build a Terraform module that creates standard alarms (CPU, LB health, DB storage).<\/li>\n<li><strong>Custom KPI monitoring<\/strong>: Publish 5 KPIs from a demo app (requests, errors, latency, queue depth, throughput).<\/li>\n<li><strong>Alarm routing by severity<\/strong>: Create two topics (critical and warning) with subscriptions for different teams.<\/li>\n<li><strong>Game day<\/strong>: Intentionally trigger CPU saturation or a simulated error-rate spike and validate notifications\/runbooks.<\/li>\n<li><strong>Cost guardrails<\/strong>: Implement checks to prevent high-cardinality dimensions in custom metrics.<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">22. 
Glossary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Alarm<\/strong>: A rule that evaluates a metric query and triggers notifications when conditions are met.<\/li>\n<li><strong>Aggregation<\/strong>: A method to combine datapoints over time (for example, mean\/max\/sum) for evaluation and charting.<\/li>\n<li><strong>Compartment<\/strong>: An OCI organizational boundary for resources and IAM policies.<\/li>\n<li><strong>Custom metric<\/strong>: A metric you publish to OCI Monitoring via API\/CLI\/SDK.<\/li>\n<li><strong>Datapoint<\/strong>: A single metric value at a timestamp.<\/li>\n<li><strong>Dimension<\/strong>: A key\/value attribute that describes a time series and enables filtering (for example, <code>resourceId<\/code>, <code>app<\/code>, <code>environment<\/code>).<\/li>\n<li><strong>Metric<\/strong>: A named time-series signal, typically numeric, representing a system or application measurement.<\/li>\n<li><strong>Metric Explorer<\/strong>: OCI Console UI for browsing and charting metrics.<\/li>\n<li><strong>Namespace<\/strong>: A container for related metrics (service namespace or custom namespace).<\/li>\n<li><strong>Notifications topic<\/strong>: A message channel in OCI Notifications; publishers send messages to a topic, and subscribers receive them.<\/li>\n<li><strong>Subscription<\/strong>: A delivery endpoint (email\/HTTPS\/etc.) attached to a Notifications topic.<\/li>\n<li><strong>SLO (Service Level Objective)<\/strong>: A reliability target (for example, 99.9% availability).<\/li>\n<li><strong>SLI (Service Level Indicator)<\/strong>: A measurement that feeds an SLO (for example, success rate).<\/li>\n<li><strong>Telemetry<\/strong>: A general term for metrics data; used in some OCI API naming around Monitoring.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">23. 
Summary<\/h2>\n\n\n\n<p><strong>Oracle Cloud Monitoring<\/strong> is OCI\u2019s managed <strong>metrics and alarms<\/strong> service in the <strong>Observability and Management<\/strong> category. It collects <strong>service metrics<\/strong> from OCI resources, accepts <strong>custom metrics<\/strong> from your applications, and evaluates <strong>alarms<\/strong> that can notify teams through <strong>OCI Notifications<\/strong>.<\/p>\n\n\n\n<p>It matters because it forms the operational backbone for detecting incidents early, reducing downtime, and turning system behavior into actionable alerts. Cost and scale considerations center on <strong>custom metrics volume and cardinality<\/strong>, while security hinges on <strong>least-privilege IAM<\/strong>, compartment design, and safe notification endpoints.<\/p>\n\n\n\n<p>Use Monitoring when you want OCI-native, IAM-integrated metrics and alerting with minimal operational overhead. For deeper troubleshooting and correlation, pair it with <strong>Logging<\/strong>, <strong>Logging Analytics<\/strong>, and <strong>APM<\/strong> as appropriate.<\/p>\n\n\n\n<p>Next step: implement a \u201cgolden signals\u201d alarm baseline (latency, traffic, errors, saturation) in Terraform and roll it out across your compartments with consistent tagging and routing.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Observability and 
Management<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[75,62],"tags":[],"class_list":["post-962","post","type-post","status-publish","format-standard","hentry","category-observability-and-management","category-oracle-cloud"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/962","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/comments?post=962"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/962\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/media?parent=962"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/categories?post=962"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/tags?post=962"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}