Category
Observability and Management
1. Introduction
Oracle Cloud Stack Monitoring is an Oracle Cloud Infrastructure (OCI) Observability and Management service designed to help you monitor and troubleshoot the health and performance of your technology stack, from hosts to middleware and databases, using centralized discovery, metrics, topology views, and alerting integrations.
In simple terms: Stack Monitoring helps you see what you’re running, understand how components depend on each other, and detect problems early by collecting and organizing operational telemetry (primarily metrics) across supported resources.
Technically, Stack Monitoring uses the OCI Management Agent for agent-based data collection and resource discovery, builds a monitored resource inventory and topology (dependency) model, and surfaces metrics and status so operations teams can analyze behavior over time and respond to incidents. It integrates with other OCI services such as Monitoring (metrics/alarms), Notifications, and Logging (for agent/service logs), depending on your setup and workflows.
The problem it solves is common in real environments: when you have multiple layers (compute, OS, middleware, databases), troubleshooting becomes slow because telemetry is scattered and dependencies are unclear. Stack Monitoring focuses on stack-level visibility and operational clarity rather than requiring you to assemble everything yourself from raw metrics and logs.
Service name status: As of the latest publicly available OCI documentation, “Stack Monitoring” is an active OCI service name under Observability and Management. If you find a renamed UI label in your tenancy, verify in official docs and your region’s console because OCI UI groupings can evolve.
2. What is Stack Monitoring?
Official purpose (what it is for)
Stack Monitoring’s purpose is to provide monitoring for your application infrastructure stack by:
- Discovering supported resources (for example, hosts and certain Oracle middleware/database technologies)
- Collecting key performance and availability metrics
- Presenting topology and resource relationships
- Enabling alerting/incident workflows through OCI integrations
For the official product documentation, start here:
https://docs.oracle.com/en-us/iaas/stack-monitoring/
Core capabilities (high-level)
- Resource discovery (agent-based) and creation of monitored resources
- Inventory of monitored resources by compartment
- Topology visualization to understand dependencies
- Metrics collection and visualization for supported resource types
- Alerting integration (commonly via OCI Monitoring alarms + Notifications, depending on how your tenancy is configured)
- OCI-native governance using compartments, IAM policies, tags, and audit logging
Major components
While naming can vary slightly across console versions, Stack Monitoring solutions typically involve:
- Stack Monitoring service (control plane + UI)
  - Defines monitored resource types
  - Maintains inventory and topology model
  - Provides views, dashboards, and administration workflows
- OCI Management Agent
  - Installed on a host (OCI compute, on-prem VM/bare metal, or other cloud VM)
  - Performs local discovery/collection and securely sends telemetry to OCI endpoints
  - Docs: https://docs.oracle.com/en-us/iaas/management-agents/
- Monitored resources
  - Logical representations of discovered components (for example, a host, a middleware domain, or a database target, depending on the types supported in your region/tenancy)
- Metrics storage and alerting integrations
  - Metrics are typically surfaced through OCI’s observability primitives.
  - Alerting is commonly implemented using OCI Monitoring alarms and Notifications (implementation details can vary; verify exact integration behavior in official docs for your target types and region).
Service type
- Managed OCI service (SaaS-like within your OCI tenancy)
- Uses agent-based collection for many monitored resource types
Scope: regional vs global, and OCI resource boundaries
- Regional service behavior: Stack Monitoring is operated within an OCI region. Agents register to a region endpoint. Telemetry and monitored resource inventory are associated with that region.
- Tenancy and compartment scoped: You organize and control access using compartments and IAM policies.
- Not zonal: OCI uses Availability Domains (ADs) and Fault Domains for compute, but Stack Monitoring itself is not “zonal” in the way some other clouds describe services.
How it fits into the Oracle Cloud ecosystem
Stack Monitoring sits inside OCI’s Observability and Management portfolio and complements:
- OCI Monitoring for metrics/alarms at the OCI resource level
- OCI Logging and (optionally) Logging Analytics for logs
- OCI APM for application tracing and end-user performance (different focus)
- Database Management and Operations Insights for database-centric and capacity analytics (different scope)
- OCI Events + Notifications + Functions for incident automation patterns (integration pattern; verify specifics)
3. Why use Stack Monitoring?
Business reasons
- Reduced downtime and faster incident resolution: Topology + curated metrics reduce “time to identify” and “time to resolve”.
- Standardized operations across environments: A consistent approach for OCI + on-prem + hybrid (where agent connectivity is possible).
- Lower operational overhead: Instead of assembling custom scripts/collectors for every layer, use a managed approach.
Technical reasons
- Dependency awareness: Understanding which host supports which middleware component helps avoid treating symptoms instead of root cause.
- Curated telemetry: For supported technologies, Stack Monitoring can surface meaningful metrics without you hand-crafting every collection pipeline.
- Agent-based extensibility: You can often onboard new hosts quickly by installing an agent and running discovery.
Operational reasons (SRE/DevOps)
- Inventory and ownership boundaries: Compartment-based organization aligns with teams and environments (prod/dev).
- Alerting workflows: Integrate alarms with on-call notification channels (email, PagerDuty via webhook bridges, etc.—implementation varies).
- Repeatable onboarding: Standardize how hosts and supported applications are discovered and monitored.
Security/compliance reasons
- IAM-controlled access: Fine-grained control via OCI IAM policies for who can view or manage monitored resources.
- Auditability: OCI Audit can capture administrative actions taken in OCI services (verify which Stack Monitoring actions are audited in your region).
- Data residency: Choose region to align with residency needs; telemetry stored in-region (verify retention and residency details in official docs).
Scalability/performance reasons
- Designed for fleets: Many organizations monitor large numbers of hosts and middleware instances.
- Centralized operational view: Scales better than SSH-ing into servers and manually checking health.
When teams should choose Stack Monitoring
Choose Stack Monitoring when:
- You need stack-level monitoring (host + middleware + database layers) with dependency/topology views.
- You run supported Oracle technologies (and want Oracle-aligned monitoring semantics).
- You want OCI-native governance (compartments/tags/IAM) for observability operations.
When teams should not choose it
Avoid or deprioritize Stack Monitoring when:
- Your primary need is distributed tracing and code-level profiling (look at OCI APM).
- You only need basic infrastructure metrics for OCI resources (OCI Monitoring may be enough).
- Your stack is mostly made of unsupported components and you need deep, vendor-neutral integrations (you may prefer Prometheus/Grafana or a third-party platform).
- You cannot install or operate the required agent(s) due to policy or environment constraints.
4. Where is Stack Monitoring used?
Industries
- Financial services (regulated ops, hybrid environments)
- Telecom (large fleets, strict SLAs)
- Retail/e-commerce (capacity and incident response)
- Healthcare (compliance + high availability needs)
- Public sector (governance + regional residency)
- SaaS providers (standardizing operations)
Team types
- SRE and platform engineering teams
- DevOps/operations teams
- Middleware administrators (e.g., WebLogic administrators)
- Database operations teams (when combined with complementary services)
- Cloud Center of Excellence (CCoE) and governance teams
Workloads and architectures
- Three-tier enterprise apps: web tier + middleware + database
- Oracle middleware-centric stacks (where supported)
- Hybrid apps with on-prem databases and OCI compute front-ends
- Lift-and-shift migrations needing operational parity
- Shared services platforms hosting multiple applications
Real-world deployment contexts
- OCI-only: All components in OCI VCNs, agents installed on compute instances
- Hybrid: Agents installed on on-prem VMs/bare metal; private connectivity via VPN/FastConnect
- Multi-cloud (partial): Agents on VMs in other clouds that can securely reach OCI endpoints (architecture and security review required)
Production vs dev/test usage
- Production: Strong value due to alerting, topology, and operational dashboards
- Dev/test: Useful for validating performance baselines and catching regressions, but cost/effort may be reduced; some teams only monitor key shared environments
5. Top Use Cases and Scenarios
Below are realistic scenarios that align with how Stack Monitoring is typically used in OCI.
1) Host fleet monitoring with consistent baselines
- Problem: Teams manage many Linux hosts; CPU/memory/disk issues are detected late.
- Why Stack Monitoring fits: Agent-based onboarding + standardized host metrics views.
- Scenario: A platform team installs OCI Management Agent on 200 OCI compute instances and monitors capacity trends and availability.
2) Middleware dependency troubleshooting (host ↔ middleware)
- Problem: Middleware performance issues get misdiagnosed because host saturation isn’t visible.
- Why it fits: Topology helps correlate middleware symptoms with host metrics.
- Scenario: WebLogic response time spikes are correlated to host memory pressure and swapping.
3) Migration validation (on-prem to OCI)
- Problem: During migration, you need to ensure the OCI deployment behaves like the old environment.
- Why it fits: Standard monitoring views across environments if agents are deployed similarly.
- Scenario: During a phased cutover, both on-prem and OCI stacks are monitored to compare performance.
4) Standardized alerting for common failure modes
- Problem: Alert rules differ per team and environment; on-call noise is high.
- Why it fits: Central visibility + alarm integration patterns.
- Scenario: A shared “CPU high for 15 minutes” rule is standardized and routed via Notifications topics.
5) Environment segmentation by compartment (prod vs non-prod)
- Problem: Ops teams need separation of duties and visibility boundaries.
- Why it fits: OCI compartments + IAM policies map to environments.
- Scenario: Only SREs can manage discovery jobs in Prod; developers can view non-prod metrics.
6) Capacity planning signals for middleware hosts
- Problem: Capacity planning is done from spreadsheets, not telemetry.
- Why it fits: Trends and inventory help quantify growth.
- Scenario: Quarterly planning uses host memory utilization trends and growth rates from monitoring.
7) Incident triage with topology-first navigation
- Problem: During incidents, teams don’t know where to start.
- Why it fits: Topology helps prioritize likely root causes.
- Scenario: A database-dependent middleware cluster shows upstream degradation; topology points to the database host resource.
8) Compliance operations: auditable access to monitoring data
- Problem: Monitoring access must be controlled and audited.
- Why it fits: IAM policies and OCI Audit integrate with governance.
- Scenario: Audit logs show who changed monitored resource associations and who altered alerting settings (verify audited events).
9) Monitoring of shared platform services
- Problem: Shared services (identity, integration, messaging) affect many apps.
- Why it fits: Central view and ownership tagging.
- Scenario: Platform team monitors key middleware/hosts and shares read-only dashboards with app teams.
10) Post-patch validation
- Problem: After OS or middleware patching, regressions occur.
- Why it fits: Compare pre/post metrics and availability behavior.
- Scenario: After kernel patching, file system IO wait increases; monitoring highlights the regression.
6. Core Features
Note: Exact supported target types and feature names can vary by region and OCI release. Use the official Stack Monitoring docs to confirm support for your specific technologies and versions: https://docs.oracle.com/en-us/iaas/stack-monitoring/
1) OCI Management Agent-based collection
- What it does: Uses a locally installed agent to collect telemetry and perform discovery.
- Why it matters: Enables monitoring for resources not natively emitting OCI metrics, including on-prem.
- Practical benefit: Standard onboarding pattern; avoids building custom collectors.
- Limitations/caveats: Requires OS-level installation and outbound connectivity to OCI endpoints; may require sudo/root privileges.
2) Resource discovery and onboarding
- What it does: Discovers supported resources and creates them as monitored resources in Stack Monitoring.
- Why it matters: Inventory creation is foundational—without it, you don’t have a reliable model of what exists.
- Practical benefit: Faster onboarding than manual “register everything”.
- Limitations/caveats: Discovery typically works for supported technologies only; discovery accuracy depends on permissions and local configuration.
3) Monitored resource inventory (by compartment)
- What it does: Provides a list of monitored resources organized via OCI compartments.
- Why it matters: Operations require an authoritative inventory.
- Practical benefit: Filter by environment/team; align with IAM boundaries.
- Limitations/caveats: Inventory reflects what’s discovered/onboarded, not necessarily every resource in your estate.
4) Topology and relationship modeling
- What it does: Shows dependencies between monitored resources (for supported types).
- Why it matters: Incidents often propagate across tiers; topology reduces guesswork.
- Practical benefit: Faster root cause analysis and blast-radius assessment.
- Limitations/caveats: Relationship depth depends on discovered resource types and supported integrations.
5) Metrics visualization for supported resources
- What it does: Surfaces performance/availability metrics for monitored resources.
- Why it matters: Metrics are the core signal for performance troubleshooting.
- Practical benefit: Quickly see CPU/memory/disk (hosts) and key middleware/database metrics (where supported).
- Limitations/caveats: Metric names, granularity, and retention depend on OCI configurations and service defaults; verify retention and limits in official docs.
6) Alerting integration (alarms + notifications patterns)
- What it does: Enables alerting on collected metrics, typically via OCI Monitoring alarms and Notifications.
- Why it matters: Monitoring without alerting is mostly reactive.
- Practical benefit: Route incidents to email/SMS/webhooks via Notifications topics; integrate into incident response.
- Limitations/caveats: Alerting setup may span multiple services (Monitoring, Notifications). Confirm that the alarm creation workflow is supported in your region.
7) OCI-native governance (IAM, compartments, tags)
- What it does: Uses OCI IAM policies for access control and compartments for scoping.
- Why it matters: In enterprises, observability access must be controlled.
- Practical benefit: Separate duties between platform and app teams; enforce least privilege.
- Limitations/caveats: Mis-scoped policies can block discovery/collection; you must plan policy boundaries.
8) APIs/automation support (where available)
- What it does: OCI services typically provide APIs and SDKs for automation.
- Why it matters: Fleet onboarding and policy-driven operations are hard to do manually.
- Practical benefit: Automate agent rollout and discovery job creation using IaC/CI pipelines.
- Limitations/caveats: API surface and SDK support vary; verify the Stack Monitoring API reference for current operations in your region.
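Fleet rollout is where scripting pays off. The sketch below assumes you have turned the install command from the console wizard into a local script (called install_agent.sh here, a hypothetical name) and listed the target hosts in a plain file; it copies and runs the script over SSH and reports failures. It is not an OCI-provided tool, and setting DRY_RUN=1 prints the commands instead of executing them.

```bash
#!/bin/sh
# Sketch: push a locally prepared agent install script to many hosts over SSH.
# Hypothetical inputs: a hosts file (one user@host per line, '#' comments allowed)
# and install_agent.sh, built from the command the OCI Console wizard generates.
# Set DRY_RUN=1 to print commands instead of executing them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

rollout_agent() {
  hosts_file=$1   # path to the hosts list
  script=$2       # local install script
  key=$3          # SSH private key
  failures=0
  while IFS= read -r target; do
    # Skip blank lines and comments
    case "$target" in ''|'#'*) continue ;; esac
    # Copy, then execute with sudo; -n keeps ssh from consuming the host list on stdin
    if run scp -i "$key" "$script" "$target:/tmp/install_agent.sh" </dev/null &&
       run ssh -n -i "$key" "$target" "sudo sh /tmp/install_agent.sh"; then
      echo "OK   $target"
    else
      echo "FAIL $target"
      failures=$((failures + 1))
    fi
  done < "$hosts_file"
  # Non-zero exit status if any host failed
  [ "$failures" -eq 0 ]
}
```

For example, `DRY_RUN=1 rollout_agent hosts.txt install_agent.sh ~/.ssh/lab_key` previews the per-host commands before a real run. Generating the install script at deploy time, rather than committing it, keeps the install key out of version control.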
7. Architecture and How It Works
High-level architecture
At a high level, Stack Monitoring looks like this:
- You install an OCI Management Agent on a host.
- The agent authenticates/identifies itself to OCI and sends telemetry to OCI endpoints in your region.
- Stack Monitoring uses discovery to create monitored resources (inventory).
- Metrics and health status are displayed in Stack Monitoring views.
- Alerts are configured to notify teams via OCI’s notification and alarm mechanisms.
Data flow and control flow (typical)
- Admin control plane actions
  - IAM policies grant rights to manage agents and Stack Monitoring resources.
  - You configure discovery and monitoring scope (compartments, agent groups, etc., depending on your setup).
- Agent runtime
  - The agent runs on the monitored host and collects data periodically.
  - The agent transmits telemetry to OCI endpoints over TLS.
- Service-side processing
  - Stack Monitoring associates telemetry with monitored resources.
  - Metrics become available for visualization and alerting.
- Alerting
  - Alarm rules evaluate metrics.
  - On trigger, Notifications routes messages to endpoints (email, HTTPS, etc.).
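To see why a "for N minutes" condition reduces alert noise, here is a deliberately simplified model of alarm evaluation. It reads one metric sample per line (assumed to be one minute apart) and reports FIRING only after the value has exceeded the threshold for a given number of consecutive samples. OCI Monitoring's real evaluation is driven by the alarm's query and pending duration, so treat this as an illustration rather than the service's algorithm.

```bash
#!/bin/sh
# Simplified model (not OCI Monitoring's actual engine): read one metric sample per
# line, assumed one minute apart, and fire once the value has been strictly greater
# than the threshold for `pending` consecutive samples.
evaluate_alarm() {
  threshold=$1
  pending=$2
  awk -v t="$threshold" -v p="$pending" '
    # Count consecutive breaches; any sample at or below the threshold resets the count
    { streak = ($1 + 0 > t + 0) ? streak + 1 : 0 }
    streak >= p + 0 { print "FIRING at sample " NR; fired = 1; exit }
    END { if (!fired) print "OK" }
  '
}

# Example: a single dip below 80 resets the count, so this fires only at sample 6
# printf '%s\n' 90 95 70 85 90 99 | evaluate_alarm 80 3
```

With a pending count of 15, the same model corresponds to a "CPU high for 15 minutes" rule: short spikes never reach the count, which is the noise reduction the pattern is meant to provide.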
Integrations with related OCI services
Common integrations/patterns include:
- OCI IAM: policies for managing agents and monitoring configurations
- OCI Monitoring: alarms and metric handling patterns
- OCI Notifications: delivering alerts to responders
- OCI Logging: agent logs and diagnostic logs
- OCI Events / Functions: automation pattern for remediation (for example: alarms → notifications/webhook → function runbook). This is an architectural pattern; verify supported direct triggers in your region.
Dependency services
- OCI Management Agent service (agent lifecycle management)
- Networking (VCN + routing/NAT/Service Gateway) to allow agents to reach OCI endpoints
Security/authentication model (practical view)
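- Human access: controlled by OCI IAM policies granted to groups of users and scoped to compartments (dynamic groups apply to resource principals such as compute instances, not to human users).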
- Agent access: agent registration keys and agent identity managed through OCI agent management workflows. Exact mechanism can vary by agent install method; follow the Management Agent documentation for current registration methods: https://docs.oracle.com/en-us/iaas/management-agents/
Networking model
- Agents typically require outbound connectivity to OCI service endpoints in the region.
- In private networks, you may use:
- NAT Gateway for outbound internet access, or
- Service Gateway / private endpoints where supported (verify current requirements), or
- Controlled egress via proxies/firewalls (ensure TLS inspection policies don’t break agent connectivity)
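Before installing the agent, it helps to confirm DNS resolution and outbound TLS from the host. The sketch takes the endpoint hostname as an argument because the exact hostnames the agent needs vary by region and are listed in the Management Agent documentation and install wizard; none are assumed here. It uses getent for DNS and curl for the TLS handshake, and because curl honors https_proxy/no_proxy, the same check exercises a controlled-egress proxy.

```bash
#!/bin/sh
# Sketch: pre-install connectivity probe. Pass the endpoint hostname(s) listed in the
# Management Agent docs or install wizard for your region; none are hard-coded here.
check_endpoint() {
  host=$1
  if [ -z "$host" ]; then
    echo "usage: check_endpoint <hostname>" >&2
    return 2
  fi
  # 1) DNS: getent resolves through the system resolver configuration (NSS)
  if ! getent hosts "$host" >/dev/null; then
    echo "DNS FAIL: $host"
    return 1
  fi
  # 2) TLS on port 443: any HTTP status counts as reachable; a connect, proxy,
  #    or certificate error makes curl exit non-zero. curl honors https_proxy/no_proxy.
  if ! curl -sS -o /dev/null --connect-timeout 5 --max-time 15 "https://$host/"; then
    echo "HTTPS FAIL: $host"
    return 1
  fi
  echo "OK: $host"
}
```

Run `check_endpoint <endpoint-hostname>` once per required hostname. A certificate error from curl at this stage often points to TLS inspection on the egress path.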
Monitoring/logging/governance considerations
- Treat observability configuration as production infrastructure:
- Tag monitored resources by environment and owner (where supported).
- Centralize alert routing through Notifications topics.
- Review IAM regularly and audit changes.
- Use compartments to separate prod from non-prod.
Simple architecture diagram (Mermaid)
flowchart LR
U[Operator / SRE] -->|Console / API| SM[OCI Stack Monitoring]
H1[Host / VM] -->|Install| MA[OCI Management Agent]
MA -->|TLS telemetry| SM
SM --> M["OCI Monitoring (alarms/metrics patterns)"]
M --> N[OCI Notifications]
N --> O[Email / HTTPS / On-call tooling]
Production-style architecture diagram (Mermaid)
flowchart TB
subgraph Tenancy[OCI Tenancy]
subgraph CompProd[Compartment: Prod]
SM["Stack Monitoring (Region)"]
MON[OCI Monitoring]
NOTIF[OCI Notifications Topic]
LOG[OCI Logging]
end
IAM[OCI IAM Policies & Groups]
AUD[OCI Audit]
end
subgraph Network[OCI VCNs / Hybrid Network]
subgraph AppVCN[VCN: App]
WLS1["Compute: Middleware Host(s)"]
DBH["Compute/On-prem: DB Host (if supported)"]
NAT[NAT Gateway / Controlled Egress]
end
OnPrem[On-Prem Network]
VPN[VPN / FastConnect]
end
IAM --> SM
SM --> AUD
WLS1 -->|Agent installed| MA1[Management Agent]
DBH -->|Agent installed| MA2[Management Agent]
MA1 -->|TLS via NAT/egress| SM
MA2 -->|TLS via VPN/FastConnect + egress| SM
SM --> MON
SM --> LOG
MON --> NOTIF
NOTIF --> OnCall[On-call Email/HTTPS Integration]
8. Prerequisites
Tenancy / account requirements
- An active Oracle Cloud tenancy with permissions to use Observability and Management services.
- A target OCI region where Stack Monitoring is available. Availability can vary; verify regional availability in your console or official docs.
Permissions / IAM roles (typical)
You need IAM permissions for:
- Managing Stack Monitoring resources in the target compartment
- Managing Management Agents and agent install keys
- Reading metrics (and creating alarms if you plan to)
OCI policies vary by org design. A common starting point (adapt to least privilege) is:
Allow group <YourGroup> to manage stack-monitoring-family in compartment <YourCompartment>
Allow group <YourGroup> to manage management-agents in compartment <YourCompartment>
Allow group <YourGroup> to manage management-agent-install-keys in compartment <YourCompartment>
Allow group <YourGroup> to read metrics in compartment <YourCompartment>
Allow group <YourGroup> to manage alarms in compartment <YourCompartment>
Allow group <YourGroup> to manage ons-topics in compartment <YourCompartment>
Notes:
- Policy verbs and resource families must match OCI’s current IAM model. If a policy statement fails validation, use the policy builder and consult the official IAM docs.
- Some tenancies centralize Notifications topics in a shared compartment; scope accordingly.
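If you prefer to create the starter policy from the command line, the statements can be assembled into the JSON array that `oci iam policy create --statements` expects. This is a minimal sketch: the group and compartment names are placeholders, names containing double quotes are not escaped, and the list mirrors the statements above plus a management-agent-install-keys statement for the install-key permission listed earlier. Trim it to least privilege before use.

```bash
#!/bin/sh
# Sketch: emit the starter statements as a JSON array for `oci iam policy create`.
# Group/compartment names are placeholders; double quotes in names are NOT escaped.
starter_policy_json() {
  group=$1
  compartment=$2
  sep=''
  printf '['
  for rule in "manage stack-monitoring-family" \
              "manage management-agents" \
              "manage management-agent-install-keys" \
              "read metrics" \
              "manage alarms" \
              "manage ons-topics"; do
    printf '%s"Allow group %s to %s in compartment %s"' "$sep" "$group" "$rule" "$compartment"
    sep=','
  done
  printf ']\n'
}

# Example (requires a configured OCI CLI; the OCID is a placeholder):
# oci iam policy create --compartment-id <parent_compartment_ocid> \
#   --name obs-lab-policy --description "Stack Monitoring lab access" \
#   --statements "$(starter_policy_json ObservabilityAdmins obs-lab)"
```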
Billing requirements
- A billing-enabled tenancy is usually required for paid services.
- Do not assume Stack Monitoring is included in Always Free. Verify free tier eligibility (if any) on official pricing pages.
Tools
- OCI Console access (enough for this tutorial)
- Optional:
- OCI CLI: https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm
- SSH client for connecting to the compute instance where you install the agent
Region availability
- Choose a region where:
- Stack Monitoring is enabled/available
- Management Agent endpoints are reachable from your network
Quotas / limits
- Expect limits around:
- Number of management agents
- Number of monitored resources
- Discovery operations
- Alarm counts
- Review Limits, Quotas, and Usage in OCI Console and the service limits documentation. Verify current limits for Stack Monitoring in your tenancy.
Prerequisite services
For the hands-on lab you will need:
- An OCI Compute instance (Oracle Linux recommended for a beginner lab)
- Network egress to reach OCI endpoints
- IAM permissions for agent management and Stack Monitoring
9. Pricing / Cost
Pricing changes over time and varies by region and contract. Do not rely on blog posts or old SKUs. Always confirm using official Oracle pricing sources.
Official pricing sources
- Oracle Cloud pricing landing page: https://www.oracle.com/cloud/pricing/
- Oracle Cloud Price List (search for “Stack Monitoring”): https://www.oracle.com/cloud/price-list/
- Oracle Cloud Cost Estimator: https://www.oracle.com/cloud/costestimator.html
Pricing dimensions (how you are typically billed)
Stack Monitoring pricing is generally usage-based (metered). Common metering dimensions for monitoring services include:
- Number/type of monitored resources (for example, per monitored host or per monitored instance) and duration (per hour/month)
- Associated service usage, such as:
  - Alarms (OCI Monitoring)
  - Notifications deliveries
  - Logging ingestion and storage (if enabled)
- Compute/network required to run monitored workloads and agents
Because exact SKUs and meters can differ, verify the specific Stack Monitoring meters and units in the Oracle Cloud Price List for your region.
Free tier (if applicable)
- Some OCI services have Always Free quotas, but Stack Monitoring free eligibility is not guaranteed.
- Treat Stack Monitoring as potentially billable even if you run it on Always Free compute. Verify in the price list and your tenancy’s billing dashboard.
Main cost drivers
- Monitored resource-hours: the more hosts/middleware instances you monitor and the longer you keep them onboarded, the higher the cost.
- Environment sprawl: monitoring dev/test/stage/prod equally can multiply resource counts.
- Alarm and notification volume: large fleets with noisy thresholds can drive indirect operational cost and possibly service usage charges (depending on SKUs).
- Logging ingestion: if you forward agent logs or enable additional diagnostic logs, ingestion and retention can add cost.
- Network egress architecture: agents need outbound access; NAT Gateways and data transfer patterns can add cost (particularly in hybrid designs).
Hidden or indirect costs
- Compute cost for monitored workloads is usually far larger than monitoring cost.
- Operations cost due to alert noise and insufficient routing.
- Change management overhead: agent deployment, patching, and inventory drift.
Network/data transfer implications
- Agent telemetry is outbound. In private subnets, you may pay for NAT Gateway usage and data processing (pricing varies).
- For on-prem → OCI telemetry, consider VPN/FastConnect costs and egress from your on-prem environment (not an OCI charge but a real cost).
How to optimize cost
- Monitor what matters:
- Start with production and critical shared services.
- Add dev/test selectively or use shorter retention and fewer alarms.
- Use compartments and tags to track ownership and chargeback.
- Standardize alert thresholds and use suppression/maintenance windows patterns (where supported) to reduce noise.
- Periodically remove stale monitored resources (decommissioned hosts).
Example low-cost starter estimate
A reasonable starter approach:
- 1 monitored host for a short test period (1–3 days)
- Minimal alarms (1–2)
- Basic agent logs only
To estimate:
1. Find the Stack Monitoring meter and its unit in the Oracle Cloud Price List (your region).
2. If the meter is priced per monitored-resource hour, multiply:
  rate_per_monitored_resource_hour × 24 × number_of_days × number_of_resources
3. Add:
  - Notifications deliveries (if any)
  - Logging ingestion/retention (if enabled)
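The multiplication and additions in this estimate can be scripted so you can rerun them as resource counts change. The rate is deliberately an input: read the actual Stack Monitoring meter and unit for your region from the price list, and adjust the formula if the meter is not per resource-hour. No Oracle price is assumed here.

```bash
#!/bin/sh
# Sketch of the starter estimate: rate × 24 × days × resources, plus optional extras.
# The rate comes from the Oracle Cloud Price List; this assumes the meter is per
# monitored-resource hour (adjust if your region's unit differs).
estimate_monitoring_cost() {
  rate=$1          # price per monitored-resource hour
  days=$2          # length of the test period in days
  resources=$3     # number of monitored resources
  extras=${4:-0}   # Notifications + Logging amounts for the same period, if any
  awk -v r="$rate" -v d="$days" -v n="$resources" -v x="$extras" \
    'BEGIN { printf "%.4f\n", r * 24 * d * n + x }'
}
```

For the lab above, `estimate_monitoring_cost <rate> 3 1` covers one host for three days; pass any Notifications or Logging amount as the fourth argument.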
Example production cost considerations
For production you should model:
- N hosts + M middleware instances (as monitored resources)
- Multiple environments (prod + DR)
- Alarm count and notification throughput
- Hybrid connectivity and NAT/data costs
- Potential need for separate compartments per app domain (organizational overhead more than service cost)
10. Step-by-Step Hands-On Tutorial
Objective
Onboard a single OCI Compute Linux host into Oracle Cloud Stack Monitoring using the OCI Management Agent, verify that the host appears as a monitored resource, and configure a basic alert notification workflow.
Lab Overview
You will:
1. Create a compartment and IAM policies (or validate existing access).
2. Launch a small OCI compute instance.
3. Install and register the OCI Management Agent.
4. Run Stack Monitoring discovery (host).
5. Validate resource inventory and basic metrics visibility.
6. Configure a simple notification channel and an alarm (pattern).
7. Clean up all resources to avoid ongoing charges.
This lab is designed to be low-risk and low-cost. Actual charges depend on your tenancy and region—verify pricing before enabling monitoring in production.
Step 1: Prepare a compartment and IAM access
- In the OCI Console, open Identity & Security → Compartments.
- Create a compartment, for example:
  - Name: obs-lab
  - Description: Stack Monitoring lab compartment
Expected outcome: You have a dedicated compartment to keep lab resources isolated.
- Ensure your user (or group) has permissions. In Identity & Security → Policies, create or update a policy in the parent compartment (often the root compartment) that allows your group to manage Stack Monitoring and Management Agents in obs-lab.
Example policy statements (adjust names):
Allow group ObservabilityAdmins to manage stack-monitoring-family in compartment obs-lab
Allow group ObservabilityAdmins to manage management-agents in compartment obs-lab
Allow group ObservabilityAdmins to manage management-agent-install-keys in compartment obs-lab
Allow group ObservabilityAdmins to read metrics in compartment obs-lab
Allow group ObservabilityAdmins to manage alarms in compartment obs-lab
Allow group ObservabilityAdmins to manage ons-topics in compartment obs-lab
Allow group ObservabilityAdmins to manage ons-subscriptions in compartment obs-lab
Verification:
- You can open the Stack Monitoring pages without authorization errors.
- You can access Observability & Management → Management Agents.
If policy validation fails: OCI policy families can differ by tenancy features. Use the console’s policy syntax helper and verify in official IAM docs.
Step 2: Create a compute instance for the agent
- Go to Compute → Instances and ensure you are in the correct region.
- Click Create instance.
- Choose:
  - Name: sm-lab-host-01
  - Compartment: obs-lab
  - Image: Oracle Linux (a current supported version)
  - Shape: a small VM shape suitable for labs
  - Networking: create a new VCN or use an existing lab VCN
  - Ensure outbound connectivity:
    - Easiest: public subnet with public IP (lab only)
    - More realistic: private subnet + NAT Gateway
- Create or use an existing SSH keypair and save the private key securely.
Expected outcome: Instance is in RUNNING state and you can SSH to it.
Verification (SSH):
ssh -i /path/to/private_key opc@<public_ip>
# then, in the SSH session on the instance:
uname -a
If you used a private subnet, connect through a bastion or VPN as required.
Step 3: Install and register the OCI Management Agent
You must install the agent using the official install steps for your OS. The safest, most current method is to copy the install command directly from the OCI Console.
- In the OCI Console, go to Observability & Management → Management Agents.
- Select the compartment obs-lab.
- Click Download and install agent (or a similarly named action).
- Select your platform (Oracle Linux) and follow the wizard to:
  - Create or select an Agent Install Key (naming varies by console version)
  - Copy the installation command
- SSH into your instance and run the copied command (as instructed by OCI). This often requires sudo.
Expected outcome: The agent installs and registers, and the instance appears as an agent in the console.
Verification:
- In Management Agents, your agent status becomes Active (this may take a few minutes).
- On the host, verify the agent service is running. The exact service name can differ by version; check the install output and official docs. Common checks include:
sudo systemctl status <agent-service-name>
If you’re not sure, list services:
sudo systemctl list-units --type=service | grep -i agent
Note: If the agent stays “Inactive”, the most common causes are missing outbound connectivity, DNS issues, time drift, or missing IAM permissions.
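Time drift is one of the causes listed in the note above. One quick check is to compare the host clock with the HTTP Date header returned by an HTTPS endpoint you trust. The sketch assumes GNU date (standard on Oracle Linux), and its 60-second tolerance is an illustrative assumption, not a documented agent limit; if it reports skew, inspect chronyd/NTP with timedatectl.

```bash
#!/bin/sh
# Sketch: detect clock skew between this host and an HTTPS server's Date header.
# Assumes GNU date (for `date -d`) and an illustrative 60-second tolerance.

# Succeeds when |local - reference| <= max_skew (all values in epoch seconds)
skew_within() {
  delta=$(( $1 - $2 ))
  [ "$delta" -lt 0 ] && delta=$(( 0 - delta ))
  [ "$delta" -le "$3" ]
}

check_clock() {
  url=$1
  # HEAD request; keep only the first Date header and strip the trailing CR
  remote=$(curl -sSI --connect-timeout 5 "$url" |
    awk -F': ' 'tolower($1) == "date" { print $2; exit }' | tr -d '\r')
  if [ -z "$remote" ]; then
    echo "no Date header from $url" >&2
    return 2
  fi
  ref=$(date -u -d "$remote" +%s) || return 2
  if skew_within "$(date -u +%s)" "$ref" 60; then
    echo "clock OK"
  else
    echo "clock skew above 60s; check chronyd/NTP (timedatectl status)"
    return 1
  fi
}
```

Usage: `check_clock https://<endpoint-you-trust>/`. The HTTP Date header has one-second resolution, which is ample for spotting drift measured in minutes.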
Step 4: Enable Stack Monitoring and run discovery for the host
- Go to Observability & Management → Stack Monitoring.
- Ensure the compartment is set to obs-lab.
- Find the Discovery area (often under “Monitored Resources” or “Administration”, depending on console layout).
- Create a discovery job (name examples vary). Choose:
  - Discovery type: Host-based (or similar)
  - Agent: select your registered management agent
  - Scope/compartment: obs-lab
  - Schedule: one-time for this lab
- Run the discovery job and wait for completion.
Expected outcome: The host becomes visible in Stack Monitoring as a monitored resource.
Verification:
– Go to Stack Monitoring → Monitored Resources.
– Filter by type (Host) and search for sm-lab-host-01.
– Open the resource and confirm you see:
– Resource details (name, compartment)
– Status/availability indicators (if provided)
– Metrics charts/tabs (exact charts depend on supported metrics)
If you do not see metrics immediately, wait several collection intervals.
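Rather than refreshing the console by hand while you wait for collection intervals, you can poll with a small retry helper. It runs any check command you supply until the command succeeds or the attempt budget runs out; what the check does (for example, a script of your own that queries for the monitored resource) is left to you, and nothing OCI-specific is assumed.

```bash
#!/bin/sh
# Sketch: retry a check command until it succeeds or attempts run out.
# Usage: wait_until <max_attempts> <seconds_between> <command> [args...]
wait_until() {
  tries=$1
  interval=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    if "$@"; then
      echo "ready after $attempt attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    # Sleep only if another attempt will follow
    if [ "$attempt" -le "$tries" ]; then
      sleep "$interval"
    fi
  done
  echo "gave up after $tries attempt(s)" >&2
  return 1
}
```

For example, `wait_until 10 60 ./my_check.sh` (my_check.sh being a hypothetical script of your own) keeps retrying for roughly ten minutes before giving up.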
Step 5: Create a notification topic and subscription (for alerts)
This step sets up where alarms will send notifications.
- Go to Developer Services → Notifications.
- In compartment obs-lab, create a Topic:
– Name: sm-lab-alerts
- Create a Subscription:
– Protocol: Email
– Email: your address
- Confirm the subscription from the email you receive.
Expected outcome: Topic exists and your subscription is confirmed.
Verification: – Notifications topic shows subscription status as Confirmed.
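The same topic and subscription can be scripted with the OCI CLI. The sketch below defaults to a dry run that only prints the commands (set DRY_RUN=0 to execute them). The compartment OCID and email address are placeholders, and the flags shown should be checked against the CLI reference for your CLI version.

```bash
#!/usr/bin/env bash
# Dry-run by default: print each OCI CLI command; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Create the topic and print its OCID (when executed for real).
create_topic() {      # usage: create_topic <compartment-ocid> <topic-name>
  run oci ons notification-topic create --compartment-id "$1" --name "$2" \
    --query 'data."topic-id"' --raw-output
}

# Subscribe an email address; OCI still sends a confirmation email.
subscribe_email() {   # usage: subscribe_email <compartment-ocid> <topic-ocid> <email>
  run oci ons subscription create --compartment-id "$1" --topic-id "$2" \
    --protocol EMAIL --endpoint "$3"
}

# Example (placeholders):
# TOPIC_ID=$(DRY_RUN=0 create_topic "$OBS_LAB_OCID" sm-lab-alerts)
# subscribe_email "$OBS_LAB_OCID" "$TOPIC_ID" you@example.com
```

Creating the subscription from the CLI does not skip confirmation: the subscription stays pending until the recipient clicks the link in the email, exactly as in the console flow.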
Step 6: Create a basic alarm for the monitored host (pattern)
Alarm creation can be done either: – Directly in OCI Monitoring (common), or – From within Stack Monitoring metric views (if your console provides a shortcut)
Because metric namespaces and names can vary by resource type and release, the most reliable approach is to create an alarm from an existing metric chart in the console (where available), or to locate the relevant metric in OCI Monitoring.
Option A (recommended for beginners): create from a chart
1. In Stack Monitoring, open your monitored host resource.
2. Open a metric chart (for example CPU utilization if available).
3. Use the action menu (often “Create alarm”) to create an alarm based on that metric.
4. Configure:
– Alarm name: sm-lab-host-cpu-high
– Severity: Warning
– Trigger: CPU > threshold for N minutes (choose a conservative lab value)
– Notification: select topic sm-lab-alerts
Option B: create in OCI Monitoring
1. Go to Observability & Management → Monitoring → Alarms.
2. Click Create Alarm and select the metric emitted for your monitored resource.
3. Attach the Notifications topic.
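Option B can also be scripted. OCI Monitoring alarms are defined by a Monitoring Query Language (MQL) expression of the form metric[interval]{dimension = "value"}.statistic() > threshold. The sketch below builds such a query and prints an oci monitoring alarm create command in dry-run mode. The metric name CpuUtilization, the dimension resourceId, and the five-minute pending duration are assumptions for illustration; Stack Monitoring metrics can use a different namespace and different dimension names, so copy the exact names from the metric you located in the console.

```bash
#!/usr/bin/env bash
# Dry-run by default: print each OCI CLI command; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build an MQL expression: <metric>[1m]{resourceId = "<ocid>"}.mean() > <threshold>
# The resourceId dimension is an assumption; use the dimensions your metric exposes.
build_alarm_query() {   # usage: build_alarm_query <metric> <threshold> [resource-ocid]
  local metric=$1 threshold=$2 resource=${3:-}
  local query="${metric}[1m]"
  if [ -n "$resource" ]; then
    query="${query}{resourceId = \"${resource}\"}"
  fi
  printf '%s.mean() > %s\n' "$query" "$threshold"
}

# Create a WARNING alarm that must stay in breach for 5 minutes (PT5M) before firing.
create_alarm() {  # usage: create_alarm <compartment-ocid> <namespace> <query> <topic-ocid> [name]
  run oci monitoring alarm create --compartment-id "$1" --metric-compartment-id "$1" \
    --display-name "${5:-sm-lab-host-cpu-high}" --namespace "$2" --query-text "$3" \
    --severity WARNING --destinations "[\"$4\"]" --is-enabled true --pending-duration PT5M
}

# Example (placeholders; take the namespace and metric name from the metric chart):
# create_alarm "$OBS_LAB_OCID" "<namespace>" "$(build_alarm_query CpuUtilization 80 "$HOST_OCID")" "$TOPIC_ID"
```

Keeping the query builder separate from the CLI call makes it easy to test the expression in the Monitoring metric explorer before committing it to an alarm.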
Expected outcome: Alarm exists and is in “OK” state initially.
Verification: – View the alarm’s status in OCI Monitoring. – If you want to force a test, create a temporary CPU load on the host:
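# Install a stress tool if your repositories provide one (stress-ng may need EPEL on Oracle Linux)
sudo dnf -y install stress-ng || sudo yum -y install stress || true
# Load one CPU for 180 seconds with whichever tool is available; fall back to a shell busy loop
if command -v stress-ng >/dev/null; then stress-ng --cpu 1 --timeout 180s
elif command -v stress >/dev/null; then stress --cpu 1 --timeout 180
else timeout 180 sh -c 'while :; do :; done'
fi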
Then observe whether the alarm triggers (it may take several minutes, depending on the evaluation window).
Validation
Use this checklist:
- Agent validation – Management Agent appears in Management Agents with status Active.
- Discovery validation – Discovery job completed successfully. – Host appears under Stack Monitoring → Monitored Resources.
- Metrics validation – You can view at least some host metrics charts (exact set varies).
- Alerting validation – Notifications topic and confirmed subscription exist. – Alarm exists and shows evaluation/OK state. – Optional: alarm triggers under induced load (may take time).
Troubleshooting
Common issues and fixes:
- Agent is not Active – Check outbound connectivity (security list/NSG, route tables, NAT/internet gateway). – Check DNS resolution from the host. – Ensure system time is correct (NTP/chrony). – Confirm IAM policies allow management agent operations in the compartment.
- Discovery job fails – Verify the correct compartment is selected. – Verify you selected the correct agent. – Check agent logs (often available via local files and/or OCI Logging if enabled). – Confirm the resource type you’re trying to discover is supported (host discovery is the simplest baseline).
- Host appears but no metrics – Wait for collection intervals. – Confirm the agent is running and not blocked by host firewall rules. – Verify whether additional plugins/config are required for metrics collection (depends on resource type; verify in official docs).
- No alert emails received – Confirm subscription is “Confirmed”. – Check spam filters. – Ensure the alarm is actually firing and routed to the correct topic.
Cleanup
To avoid ongoing charges and clutter:
- Delete alarm – Monitoring → Alarms → delete sm-lab-host-cpu-high
- Delete Notifications subscription and topic – Notifications → Topic sm-lab-alerts → delete subscription – Delete the topic
- Deregister/delete management agent record (optional) – Observability & Management → Management Agents → select agent → delete/deregister (if your console supports it)
- Terminate compute instance – Compute → Instances → sm-lab-host-01 → Terminate – Choose to delete boot volume if you do not need it
- Delete compartment (optional) – Delete resources first; compartments can only be deleted when empty
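The same cleanup can be scripted. The sketch below follows the order of the checklist: alarm, subscription, topic, then the instance. It is dry-run by default, the OCIDs are placeholders, and agent deregistration and compartment deletion are left to the console, as described above. Check the flags against the CLI reference before running with DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Dry-run by default: print each OCI CLI command; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Delete lab resources in checklist order.
# --force skips the CLI's interactive confirmation prompt.
cleanup_lab() {  # usage: cleanup_lab <alarm-ocid> <subscription-ocid> <topic-ocid> <instance-ocid>
  run oci monitoring alarm delete --alarm-id "$1" --force
  run oci ons subscription delete --subscription-id "$2" --force
  run oci ons notification-topic delete --topic-id "$3" --force
  # --preserve-boot-volume false also deletes the boot volume.
  run oci compute instance terminate --instance-id "$4" --preserve-boot-volume false --force
}
```

Printing the commands first gives you a chance to confirm every OCID before anything is deleted.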
11. Best Practices
Architecture best practices
- Start with a reference compartment model
- prod, nonprod, shared-services, and security are common patterns.
- Standardize onboarding
- Use consistent naming: env-app-tier-## (e.g., prod-payments-wls-01)
- Use tags: CostCenter, Owner, Environment, AppName
- Design for hybrid
- If monitoring on-prem, plan network connectivity and egress carefully.
- Use VPN/FastConnect and tightly controlled outbound rules.
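Naming conventions only help if they are enforced. A small check like the one below can run in CI or in an onboarding script before an instance is created. The regular expression is one possible reading of env-app-tier-## (lowercase segments without internal hyphens and a two-digit suffix), so treat it as an assumption and adjust it to your own convention.

```bash
#!/usr/bin/env bash
# Convention: env-app-tier-## (for example prod-payments-wls-01).
# Lowercase letters/digits per segment and a two-digit suffix are assumptions.
NAME_RE='^[a-z]+-[a-z0-9]+-[a-z0-9]+-[0-9]{2}$'

is_valid_name() { [[ $1 =~ $NAME_RE ]]; }

# Print OK/BAD for each name; return 1 if any name breaks the convention.
check_names() {   # usage: check_names <name> [name...]
  local name rc=0
  for name in "$@"; do
    if is_valid_name "$name"; then
      echo "OK  $name"
    else
      echo "BAD $name"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, check_names prod-payments-wls-01 web01 accepts the first name and flags web01, and the non-zero return code lets a pipeline stop before the badly named resource is created.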
IAM/security best practices
- Least privilege
- Separate roles:
- Agent installers (can manage management-agents)
- Monitoring admins (can manage stack-monitoring)
- Readers (can read monitored resources/metrics)
- Compartment scoping
- Keep production monitoring configuration changes limited to a smaller admin group.
- Audit changes
- Review OCI Audit logs for changes to monitoring resources and policies.
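The role split above translates into compartment-scoped policy statements. In the sketch below, the group names AgentInstallers, MonitoringAdmins, and MonitoringReaders are hypothetical, and the resource-type names management-agent-install-keys, management-agents, stack-monitoring-family, and metrics reflect our understanding of the OCI policy reference. Verify each one in the Stack Monitoring and Management Agent IAM documentation before applying.

```bash
#!/usr/bin/env bash
# Compartment-scoped statements for the three roles (hypothetical group names).
# Resource-type names are assumptions to verify in the OCI IAM docs.
policy_statements() {   # usage: policy_statements <compartment-name>
  local c=$1
  printf '%s\n' \
    "Allow group AgentInstallers to manage management-agent-install-keys in compartment $c" \
    "Allow group AgentInstallers to manage management-agents in compartment $c" \
    "Allow group MonitoringAdmins to manage stack-monitoring-family in compartment $c" \
    "Allow group MonitoringReaders to read stack-monitoring-family in compartment $c" \
    "Allow group MonitoringReaders to read metrics in compartment $c"
}

# Turn one statement per line into the JSON array that --statements expects.
# Assumes statements contain no double quotes or backslashes.
to_json_array() {
  awk 'BEGIN { printf "[" } { printf "%s\"%s\"", (NR > 1 ? "," : ""), $0 } END { print "]" }'
}

# Example (attach in obs-lab's parent compartment or the tenancy):
# oci iam policy create --compartment-id "$PARENT_OCID" --name sm-lab-roles \
#   --description "Stack Monitoring lab roles" \
#   --statements "$(policy_statements obs-lab | to_json_array)"
```

Generating statements from one function keeps the three roles identical across compartments, so a prod and a nonprod policy differ only in the compartment name.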
Cost best practices
- Monitor production first
- Expand to non-prod based on value and budget.
- Prune stale monitored resources
- Decommissioned instances should be removed from monitoring.
- Reduce alert noise
- Use sane thresholds, longer evaluation windows, and maintenance patterns (where supported).
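Longer evaluation windows reduce noise because a single spike no longer satisfies the alarm condition. The sketch below models that idea with a simple consecutive-breach rule over integer samples. It is a simplified illustration of why a pending duration suppresses transient spikes, not a reimplementation of how OCI Monitoring evaluates alarms.

```bash
#!/usr/bin/env bash
# Print the 1-based index of the sample at which an alert would fire
# (after <consecutive> samples in a row above <threshold>), or "none".
first_alert_index() {   # usage: first_alert_index <threshold> <consecutive> <sample...>
  local threshold=$1 need=$2 streak=0 i=0 s
  shift 2
  for s in "$@"; do
    i=$((i + 1))
    if [ "$s" -gt "$threshold" ]; then
      streak=$((streak + 1))
    else
      streak=0              # any sample at or below the threshold resets the streak
    fi
    if [ "$streak" -ge "$need" ]; then
      echo "$i"
      return 0
    fi
  done
  echo none
  return 1
}
```

With the series 50 95 40 92 93 94 and a threshold of 80, requiring one breach fires on the isolated spike at sample 2, while requiring three consecutive breaches fires only at sample 6, once the load is sustained.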
Performance best practices
- Agent resource footprint
- Validate CPU/memory overhead on small shapes.
- Keep OS patched and avoid resource contention.
- Collection interval tuning
- Use defaults first; tighten only when you have a clear need (and understand cost/noise implications).
Reliability best practices
- Egress resilience
- Ensure agents can reach OCI endpoints consistently (redundant NAT if required, stable DNS).
- Document runbooks
- For common alerts, include “first checks” and “escalation path”.
Operations best practices
- Golden signals
- Latency, traffic, errors, saturation—map host/middleware metrics to these signals.
- Unified incident routing
- Route alarms through centralized Notifications topics with consistent naming.
- Change management
- Treat monitoring changes like code where possible (review, test in non-prod).
Governance/tagging/naming best practices
- Use tags consistently:
- Environment=Prod/Dev
- OwnerEmail=team@company.com
- DataClassification=Internal/Restricted
- Use naming conventions for alarms and topics:
- alarm-prod-payments-host-cpu-high
- topic-prod-oncall
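Tags and names stay consistent more easily when scripts generate them. The sketch below builds a freeform-tags JSON object and alarm/topic names that follow the conventions above. It assumes keys and values contain no double quotes or backslashes (it does not escape them), and the CLI usage in the comment is an illustration to check against the reference.

```bash
#!/usr/bin/env bash
# Build a freeform-tags JSON object from key=value arguments.
# Assumes keys and values contain no double quotes or backslashes.
tags_json() {   # usage: tags_json Key=Value [Key=Value...]
  local out="" pair key value
  for pair in "$@"; do
    key=${pair%%=*}        # text before the first "="
    value=${pair#*=}       # text after the first "="
    out="${out:+$out,}\"$key\":\"$value\""
  done
  printf '{%s}\n' "$out"
}

# Join name parts with hyphens: alarm-<env>-<app>-<scope>-<condition>, topic-<env>-<team>.
join_dash() { local IFS=-; printf '%s\n' "$*"; }
alarm_name() { join_dash alarm "$@"; }
topic_name() { join_dash topic "$@"; }

# Example (placeholder OCID; the CLI may ask before replacing existing tags):
# oci compute instance update --instance-id "$INSTANCE_OCID" \
#   --freeform-tags "$(tags_json Environment=Prod OwnerEmail=team@company.com)"
```

For example, alarm_name prod payments host cpu-high produces alarm-prod-payments-host-cpu-high, and topic_name prod oncall produces topic-prod-oncall.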
12. Security Considerations
Identity and access model
- Stack Monitoring is governed by OCI IAM.
- Use:
- Groups for human users
- Policies scoped to compartments
- Where automation is used, consider OCI-native identities (for example instance principals/dynamic groups) as appropriate—verify the supported automation identity patterns for Stack Monitoring APIs you plan to use.
Encryption
- OCI services use encryption at rest and TLS in transit as standard practice, but details can vary per service and region.
- For sensitive environments, confirm:
- In-transit TLS requirements
- At-rest encryption behavior
- Whether customer-managed keys (KMS) are supported for any stored data (verify in official docs)
Network exposure
- Agents require outbound access to OCI endpoints.
- Avoid placing monitored hosts on open public networks in production.
- Prefer private subnets + NAT, or private connectivity designs (VPN/FastConnect).
Secrets handling
- Do not store agent install keys in source code repositories.
- Treat registration keys like credentials:
- Limit who can create them
- Rotate if exposure is suspected
- Store in a secure secrets manager (OCI Vault) if you must distribute them
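If you must distribute an install key, one pattern is to keep it in OCI Vault and materialize it on the host only for the duration of the install, in a file that only the installing user can read. In this sketch, fetch_install_key is a hypothetical helper that assumes the oci secrets secret-bundle get command and its base64-encoded secret-bundle-content field, and the managementAgentInstallKey response-file entry reflects our reading of the Management Agent install docs; confirm both before use.

```bash
#!/usr/bin/env bash
# Write the agent response file so that only the current user can read it.
# The managementAgentInstallKey entry is our reading of the install docs; verify it.
write_response_file() {   # usage: write_response_file <path> <install-key>
  rm -f -- "$1"                       # recreate so the restrictive mode applies
  ( umask 077 && printf 'managementAgentInstallKey = %s\n' "$2" > "$1" )
}

# Hypothetical helper: read the key from an OCI Vault secret (requires the OCI CLI).
# Secret bundle content is returned base64-encoded, so decode it locally.
fetch_install_key() {     # usage: fetch_install_key <secret-ocid>
  oci secrets secret-bundle get --secret-id "$1" \
    --query 'data."secret-bundle-content".content' --raw-output | base64 -d
}

# Example (placeholders); remove the file as soon as setup completes:
# write_response_file /tmp/input.rsp "$(fetch_install_key "$SECRET_OCID")"
# ...run the agent setup step from the install docs...
# rm -f /tmp/input.rsp
```

Because write_response_file uses the shell's built-in printf, the key is never passed as an argument to an external process, so it does not show up in process listings.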
Audit/logging
- Enable and review:
- OCI Audit logs for administrative actions
- Agent logs for connectivity/collection issues
- If you centralize logging, ensure logs don’t leak secrets or internal hostnames beyond approved boundaries.
Compliance considerations
- Validate:
- Data residency (region)
- Retention policies for metrics/logs
- Access review processes and separation of duties
Common security mistakes
- Overbroad policies like manage all-resources in tenancy
- Agents installed with excessive OS privileges without change control
- Sending alerts to public webhooks without authentication
- Monitoring production from a shared non-prod compartment
Secure deployment recommendations
- Use separate compartments for prod and non-prod.
- Lock down who can:
- Register agents
- Run discovery
- Create/modify alarms
- Require MFA for privileged accounts.
- Use private networking patterns and controlled egress.
13. Limitations and Gotchas
The most important limitation to understand: Stack Monitoring only provides deep monitoring for supported resource types and versions. Always check compatibility before you standardize on it.
Known limitations (verify current list in docs)
- Supported technologies are specific
- Not every middleware/database/platform is supported.
- Agent requirement
- Many onboarding flows require an agent and local access.
- Cross-region visibility
- Stack Monitoring is regional; if you operate multi-region, plan per-region monitoring and dashboards.
- Topology completeness
- Relationships depend on discovery and supported types; topology may not show every dependency in complex microservices.
Quotas
- You may encounter limits on:
- Agent counts
- Monitored resources
- Discovery jobs
- Alarm resources
- Review in OCI Console: Governance & Administration → Limits, Quotas and Usage (menu names may vary).
Regional constraints
- Service availability and supported target types can vary by region.
- Some features appear earlier in commercial regions than in restricted/government regions.
Pricing surprises
- Monitoring non-prod at the same scale as prod can multiply costs.
- NAT gateways and logging ingestion can add unexpected cost.
- Long retention and high-frequency collection can increase telemetry volume (depending on service behavior).
Compatibility issues
- OS version and package compatibility for management agent installation.
- Corporate proxy/TLS inspection breaking agent connectivity.
- Host hardening policies blocking agent services or required system calls.
Operational gotchas
- Alarm fatigue if thresholds are too aggressive.
- Discovery drift if environments are frequently rebuilt (immutable infrastructure)—consider automating agent install and cleanup.
Migration challenges
- Migrating from Enterprise Manager or Prometheus requires:
- Metric mapping
- Alarm re-creation
- On-call process updates
- Don’t attempt a “big bang” cutover; run in parallel until confidence is high.
Vendor-specific nuances
- Stack Monitoring aligns well with Oracle technology stacks, but for heterogeneous stacks you may need additional tools.
14. Comparison with Alternatives
Nearest services in Oracle Cloud
- OCI Monitoring: great for OCI resource metrics and alarms; less focused on topology for application stacks.
- OCI Logging / Logging Analytics: logs-centric (search, parsing, analytics) rather than stack topology.
- OCI APM: code-level tracing and performance for applications; different focus than infrastructure stack monitoring.
- Database Management / Operations Insights: database operations and capacity analytics; complementary.
Nearest services in other clouds (not OCI)
- AWS CloudWatch (metrics/logs/alarms) + CloudWatch Application Insights (application-centric patterns)
- Azure Monitor + Application Insights
- Google Cloud Operations Suite (Cloud Monitoring/Logging/Trace)
Open-source or self-managed alternatives
- Prometheus + Alertmanager + Grafana
- OpenTelemetry collectors + backends (Prometheus, Tempo, Loki, etc.)
- Zabbix, Nagios (legacy but common)
Comparison table
| Option | Best For | Strengths | Weaknesses | When to Choose |
|---|---|---|---|---|
| OCI Stack Monitoring | Stack-level monitoring with supported resource discovery/topology in Oracle Cloud | Topology + inventory model; OCI IAM/compartments; agent-based hybrid potential | Supported targets are specific; requires agent rollout; pricing depends on meters | You run supported Oracle stacks and want OCI-native operations |
| OCI Monitoring | OCI resource metrics + alarms | Simple, native, scalable; integrates with Notifications | Less “application stack topology” context by itself | You mainly monitor OCI services and need alarms quickly |
| OCI Logging / Logging Analytics | Central log collection and analysis | Great for troubleshooting text logs and searches | Not a replacement for metrics/topology; can become expensive at scale | You need log search, parsing, and forensic investigation |
| OCI APM | Distributed tracing and app performance | Deep application insight; traces, spans, service maps (APM-specific) | Requires instrumentation; different focus than host/middleware ops | You need code-level insight and distributed tracing |
| Oracle Enterprise Manager (self-managed) | Deep Oracle ecosystem monitoring on-prem | Mature enterprise monitoring for Oracle tech | You manage infrastructure and upgrades; licensing/ops overhead | You need on-prem centralized monitoring and already run OEM |
| Prometheus + Grafana (self-managed) | Vendor-neutral metrics at scale | Flexible, open ecosystem; strong community | You operate and scale it; topology modeling is custom | You need portability and have platform engineering bandwidth |
| Datadog / New Relic (SaaS) | Full-stack observability across vendors | Quick onboarding, broad integrations | Subscription costs; data residency concerns | You want fast time-to-value across heterogeneous stacks |
15. Real-World Example
Enterprise example (hybrid Oracle middleware estate)
Problem
A financial services organization runs:
– On-prem WebLogic clusters
– Oracle databases
– A growing OCI footprint for new services
Incidents take too long because ownership is split and dependencies aren’t visible.
Proposed architecture
– Install OCI Management Agent on:
  – OCI compute hosts running middleware
  – On-prem VMs (where allowed)
– Use Stack Monitoring in an OCI region aligned to compliance needs
– Organize monitored resources by compartments:
  – prod-payments, prod-risk, shared-services
– Centralize alarms through:
  – OCI Monitoring alarms
  – OCI Notifications topics per domain (payments-oncall, risk-oncall)
– Enable OCI Logging for agent/service diagnostics; send the logs to a centralized logging compartment if required
Why Stack Monitoring was chosen – Need stack-centric visibility and topology for supported Oracle technologies – Strong alignment with OCI governance (IAM + compartments) – Supports hybrid patterns via agent connectivity
Expected outcomes – Faster triage via topology and consistent dashboards – Better separation of duties and auditability – More predictable onboarding for new environments
Startup/small-team example (lean operations on OCI compute)
Problem A small SaaS team runs several OCI compute instances hosting: – A small middleware tier – Background jobs They lack a dedicated ops team and want early warnings for host saturation.
Proposed architecture – Install Management Agent on each instance – Use Stack Monitoring to maintain inventory and basic host health – Create a few alarms: – CPU sustained high – Disk usage above threshold – Send alerts to a single Notifications topic subscribed by on-call email
Why Stack Monitoring was chosen – Low friction to onboard hosts (agent + discovery) – Central view in OCI without building a full Prometheus stack
Expected outcomes – Reduced “surprise outages” from disk exhaustion – Clearer visibility into which instance is unhealthy – A baseline monitoring foundation that can expand later
16. FAQ
1) Is Stack Monitoring the same as OCI Monitoring?
No. OCI Monitoring is a general metrics and alarms service for OCI resources. Stack Monitoring focuses on stack-level monitored resources, discovery, and topology across supported components, typically using the Management Agent.
2) Do I need to install an agent?
Often yes. Stack Monitoring commonly relies on the OCI Management Agent for discovery and telemetry collection.
3) Can I monitor on-premises servers with Stack Monitoring?
Potentially, if you can install the Management Agent and provide secure outbound connectivity to OCI endpoints. Verify supported OS and network requirements in the Management Agent docs.
4) Is Stack Monitoring regional?
Yes, it operates within an OCI region. Plan multi-region monitoring accordingly.
5) Does Stack Monitoring support Kubernetes monitoring?
Do not assume so. OCI has Kubernetes observability patterns using other services (and open standards). Check the Stack Monitoring supported resources list in official docs.
6) What Oracle technologies are supported?
Support is specific (and changes over time). Use the official Stack Monitoring documentation for the current supported resource types and versions.
7) How do alarms work with Stack Monitoring?
In many OCI setups, metrics can be used with OCI Monitoring alarms and routed via OCI Notifications. Exact workflows can vary; verify in official docs for your resource type.
8) Can I automate onboarding?
Yes, typically through agent installation automation (cloud-init/Ansible) and OCI APIs/SDKs where available. Verify the Stack Monitoring API reference for supported operations.
9) What’s the difference between Stack Monitoring and OCI APM?
APM is for application-level observability (traces, spans, service performance). Stack Monitoring focuses more on infrastructure/middleware stack monitoring with topology and curated metrics.
10) Does Stack Monitoring store logs?
Stack Monitoring is primarily metrics/topology-oriented. Logs are usually handled by OCI Logging / Logging Analytics. Agents generate logs that you can collect and analyze separately.
11) How secure is agent communication?
Agents communicate with OCI endpoints over TLS. For strict environments, validate certificates, proxy behavior, and network paths per your security policies.
12) Can I use private networking only (no public internet) for agents?
Often you can use controlled egress (NAT, proxies, private connectivity). Whether fully private endpoints are supported depends on region/service capabilities—verify in official docs.
13) How quickly do metrics appear after onboarding?
Usually within minutes, but depends on collection intervals and discovery completion. If metrics don’t appear, check agent status and logs.
14) What should I monitor first?
Start with:
– Availability
– CPU/memory saturation
– Disk utilization
Then expand into middleware/database signals as supported.
15) How do I avoid alert fatigue?
Use longer evaluation windows, tune thresholds based on baselines, route alerts by ownership, and silence alerts during planned maintenance (using your organization’s standard process and supported OCI features).
17. Top Online Resources to Learn Stack Monitoring
| Resource Type | Name | Why It Is Useful |
|---|---|---|
| Official documentation | Stack Monitoring docs — https://docs.oracle.com/en-us/iaas/stack-monitoring/ | Primary source for supported resources, onboarding, discovery, and concepts |
| Official documentation | Management Agent docs — https://docs.oracle.com/en-us/iaas/management-agents/ | Installation, registration, troubleshooting, and OS support for the agent |
| Official pricing | Oracle Cloud Pricing — https://www.oracle.com/cloud/pricing/ | High-level pricing entry point and service pricing navigation |
| Official price list | Oracle Cloud Price List — https://www.oracle.com/cloud/price-list/ | Authoritative SKU/meter definitions (search for “Stack Monitoring”) |
| Official cost tool | Oracle Cloud Cost Estimator — https://www.oracle.com/cloud/costestimator.html | Practical way to model costs without guessing |
| Official docs | OCI CLI installation — https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm | Useful if you want to script onboarding and operations |
| Official architecture | Oracle Architecture Center — https://docs.oracle.com/en/solutions/ | Reference architectures and best practices (use search for observability/monitoring patterns) |
| Official product overview | Observability and Management overview — https://www.oracle.com/cloud/observability/ | Portfolio context; helps choose the right OCI observability services |
| Community (use with care) | Oracle Cloud customer/community blogs (search) | Practical war stories and setups; always validate against official docs |
| Community (tooling) | OpenTelemetry documentation — https://opentelemetry.io/docs/ | Helpful for broader observability strategy; complementary to OCI-native services |
18. Training and Certification Providers
| Institute | Suitable Audience | Likely Learning Focus | Mode | Website URL |
|---|---|---|---|---|
| DevOpsSchool.com | DevOps engineers, SREs, platform teams | DevOps practices, cloud operations, observability fundamentals | Check website | https://www.devopsschool.com/ |
| ScmGalaxy.com | Beginners to intermediate engineers | DevOps/SCM concepts, CI/CD, tooling foundations | Check website | https://www.scmgalaxy.com/ |
| CloudOpsNow.in | Cloud engineers, operations teams | Cloud operations and production support practices | Check website | https://www.cloudopsnow.in/ |
| SreSchool.com | SREs, reliability engineers, ops leads | SRE principles, incident response, monitoring/alerting strategy | Check website | https://www.sreschool.com/ |
| AiOpsSchool.com | Ops teams exploring automation | AIOps concepts, automation patterns, event correlation | Check website | https://www.aiopsschool.com/ |
19. Top Trainers
| Platform/Site | Likely Specialization | Suitable Audience | Website URL |
|---|---|---|---|
| RajeshKumar.xyz | DevOps/cloud training content (verify offerings) | Beginners to practitioners | https://rajeshkumar.xyz/ |
| devopstrainer.in | DevOps coaching and workshops (verify offerings) | DevOps engineers and teams | https://www.devopstrainer.in/ |
| devopsfreelancer.com | Freelance DevOps enablement (verify offerings) | Small teams needing practical guidance | https://www.devopsfreelancer.com/ |
| devopssupport.in | DevOps support and training resources (verify offerings) | Ops/DevOps teams | https://www.devopssupport.in/ |
20. Top Consulting Companies
| Company | Likely Service Area | Where They May Help | Consulting Use Case Examples | Website URL |
|---|---|---|---|---|
| cotocus.com | Cloud/DevOps consulting (verify specialties) | Architecture, implementation, operations | Designing observability baselines; onboarding environments; building runbooks | https://cotocus.com/ |
| DevOpsSchool.com | DevOps consulting and enablement | DevOps transformations, tooling, training | Implementing monitoring strategy; alerting governance; operational playbooks | https://www.devopsschool.com/ |
| DEVOPSCONSULTING.IN | DevOps consulting (verify specialties) | CI/CD, cloud ops, observability foundations | Standardizing metrics/alarms; integrating notifications; operational maturity uplift | https://www.devopsconsulting.in/ |
21. Career and Learning Roadmap
What to learn before Stack Monitoring
- OCI fundamentals:
- Compartments, VCNs, subnets, routing, NAT/IGW
- IAM policies, groups, and least privilege
- Linux fundamentals:
- systemd services, packages, logs, firewall basics
- Observability basics:
- Metrics vs logs vs traces
- SLI/SLO concepts and alert hygiene
What to learn after Stack Monitoring
- OCI Monitoring deep dive:
- Metric queries, alarms, notification routing patterns
- OCI Logging / Logging Analytics:
- Centralized log ingestion, parsing, and search
- OCI APM (if you need app-level telemetry)
- Infrastructure as Code:
- Terraform for OCI (to manage alarms, topics, compartments, and compute)
- Incident management:
- Runbooks, escalation policies, postmortems, error budgets
Job roles that use it
- Cloud Engineer (operations)
- DevOps Engineer
- Site Reliability Engineer (SRE)
- Platform Engineer
- Middleware Administrator / Operations
- Cloud Solution Architect (operational design)
Certification path (if available)
Oracle certifications evolve. For current OCI certification tracks, verify on Oracle University: https://education.oracle.com/
Stack Monitoring is often learned as part of broader OCI operations/observability skillsets rather than a single dedicated certification.
Project ideas for practice
- Build a “prod vs non-prod” compartment model and onboard 5 hosts with consistent tags.
- Create standard alarms for CPU, memory, disk and route to different on-call topics.
- Implement a maintenance workflow (document + test) for patch windows to reduce noise.
- Run a failure injection exercise (disk fill, CPU load) and validate alerting + triage steps.
- Hybrid practice: install an agent on a VM outside OCI (lab) and validate secure connectivity.
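For the failure-injection exercise, a disk-fill drill needs a bounded way to consume space and an easy way to undo it. The sketch below writes a fixed number of megabytes to a single file and reports filesystem usage with df -P. The file name sm-drill.fill is hypothetical. Run it only on a lab host, and size it so you cross your alarm threshold without filling the disk.

```bash
#!/usr/bin/env bash
# Percent of space used on the filesystem holding <path>, from POSIX df output.
disk_used_pct() {   # usage: disk_used_pct <path>
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Write <megabytes> of zeros under <dir> to raise disk usage for the drill.
fill_disk() {       # usage: fill_disk <dir> <megabytes>
  dd if=/dev/zero of="$1/sm-drill.fill" bs=1048576 count="$2" 2>/dev/null
}

# Remove the drill file so the alarm can return to OK.
clear_fill() {      # usage: clear_fill <dir>
  rm -f -- "$1/sm-drill.fill"
}
```

A typical drill records disk_used_pct before and after fill_disk, waits for the disk alarm to fire and for the triage steps to run, and then calls clear_fill and confirms the alarm returns to OK.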
22. Glossary
- Agent: A software component installed on a host to collect telemetry and interact with a management service.
- OCI Management Agent: Oracle-provided agent used for data collection and discovery for management/observability services.
- Monitored Resource: A logical object in Stack Monitoring representing a discovered component (host, middleware, etc.).
- Discovery Job: A process that scans/identifies supported resources and onboards them as monitored resources.
- Topology: A representation of relationships and dependencies between monitored resources.
- Compartment: OCI’s logical container for organizing and isolating resources and access control.
- IAM Policy: A statement defining what actions a group or dynamic group can perform on which resource types in a scope.
- Metric: A time-series numerical measurement (CPU %, memory usage, response time).
- Alarm: A rule that evaluates metric data and triggers notifications when conditions are met.
- Notifications Topic: A message distribution channel in OCI Notifications, used to deliver alarm messages to subscribers.
- SLI/SLO: Service Level Indicator / Objective, used to define and measure reliability targets.
- Alert fatigue: Operational risk when too many alerts reduce the ability to respond effectively.
- NAT Gateway: Provides outbound internet connectivity for resources in private subnets without inbound exposure.
- Service Gateway: Allows private access from a VCN to certain OCI services without using the public internet (availability depends on service).
23. Summary
Oracle Cloud Stack Monitoring is an Observability and Management service that helps you discover, monitor, and troubleshoot supported stack components using agent-based telemetry, monitored resource inventory, and topology views. It matters because real incidents are rarely isolated to one layer—Stack Monitoring is designed to make dependencies and operational signals easier to interpret.
Cost is typically driven by how many resources you monitor and for how long, plus any related usage in alarms, notifications, logging, and network egress. Security and governance rely heavily on OCI IAM, compartment design, controlled agent rollout, and audited operational processes.
Use Stack Monitoring when you want OCI-native, stack-aware monitoring (especially in Oracle technology environments). If you need deep application tracing, consider pairing it with OCI APM, and if you need log forensics, pair it with OCI Logging / Logging Analytics.
Next step: review the official Stack Monitoring docs and expand the lab to include your real production compartment model and a standardized alarm/notification routing strategy:
https://docs.oracle.com/en-us/iaas/stack-monitoring/