{"id":275,"date":"2026-04-13T11:17:05","date_gmt":"2026-04-13T11:17:05","guid":{"rendered":"https:\/\/www.devopsschool.com\/tutorials\/aws-amazon-managed-service-for-prometheus-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-management-and-governance\/"},"modified":"2026-04-13T11:17:05","modified_gmt":"2026-04-13T11:17:05","slug":"aws-amazon-managed-service-for-prometheus-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-management-and-governance","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/tutorials\/aws-amazon-managed-service-for-prometheus-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-management-and-governance\/","title":{"rendered":"AWS Amazon Managed Service for Prometheus Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Management and governance"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Category<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Management and governance<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Introduction<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Amazon Managed Service for Prometheus is AWS\u2019s managed, Prometheus-compatible metrics monitoring service. It is designed for teams that want Prometheus-style scraping, time-series storage, and PromQL querying without operating and scaling a long-lived Prometheus server fleet.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In simple terms: you run Prometheus (or a Prometheus-compatible collector) close to your workloads to scrape metrics, and you \u201cremote write\u201d those metrics into Amazon Managed Service for Prometheus. 
AWS stores them durably and lets you query them using PromQL, typically visualized in Grafana (often Amazon Managed Grafana).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Technically, Amazon Managed Service for Prometheus provides <strong>managed Prometheus workspaces<\/strong> with <strong>ingestion endpoints<\/strong> (Prometheus <code>remote_write<\/code>) and <strong>query endpoints<\/strong> (PromQL via the Prometheus HTTP API). Authentication and authorization are handled through <strong>AWS IAM<\/strong> (SigV4 request signing), and access can be controlled through IAM policies, VPC networking options (where available), and auditing via AWS CloudTrail.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The problem it solves: self-managed Prometheus can become operationally heavy at scale\u2014high-availability, long retention, multi-AZ durability, storage growth, upgrades, query performance, and tenancy boundaries. Amazon Managed Service for Prometheus offloads much of that undifferentiated operational work while keeping Prometheus compatibility.<\/p>\n\n\n\n<blockquote>\n<p>Naming note (important for AWS docs and APIs): In AWS documentation and APIs you may see references to <strong>AMP<\/strong> (common shorthand) and the API namespace historically called <strong>APS<\/strong>. The current service name is <strong>Amazon Managed Service for Prometheus<\/strong>. Verify the latest naming in the official docs if you see older terms in tooling or examples.<\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">2. What is Amazon Managed Service for Prometheus?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Amazon Managed Service for Prometheus is a <strong>fully managed, Prometheus-compatible metrics service<\/strong> in AWS. 
Its official purpose is to let you <strong>collect, store, and query Prometheus metrics at scale<\/strong> while reducing the operational burden of running Prometheus infrastructure yourself.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Core capabilities<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Managed workspaces<\/strong> for metrics isolation (often mapped to an environment, team, or application domain).<\/li>\n<li><strong>Prometheus <code>remote_write<\/code> ingestion<\/strong> so you can keep scraping local to the workload (Kubernetes, EC2, on-prem, other clouds) and centralize durable storage in AWS.<\/li>\n<li><strong>PromQL querying<\/strong> through Prometheus-compatible query APIs.<\/li>\n<li><strong>AWS-native identity and access control<\/strong> using IAM (SigV4) for ingestion and querying.<\/li>\n<li><strong>Integrations<\/strong> with Grafana (including Amazon Managed Grafana), Kubernetes (Amazon EKS), and AWS observability tooling (commonly via AWS Distro for OpenTelemetry).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Major components<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Workspace<\/strong>: The primary logical container in Amazon Managed Service for Prometheus. 
A workspace has endpoints for ingesting and querying metrics.<\/li>\n<li><strong>Ingestion endpoint<\/strong>: Receives <code>remote_write<\/code> traffic from Prometheus servers\/agents\/collectors.<\/li>\n<li><strong>Query endpoint<\/strong>: Serves PromQL queries (commonly from Grafana).<\/li>\n<li><strong>IAM authorization<\/strong>: Controls who\/what can write metrics and who\/what can query.<\/li>\n<li><strong>(Optional \/ verify in official docs)<\/strong> Rule evaluation and alerting capabilities: AWS documentation has evolved over time; verify whether your region and account support managed rule groups and alerting features in Amazon Managed Service for Prometheus, and what the pricing dimensions are for those features.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Service type and scope<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Service type<\/strong>: Managed service for time-series metrics (Prometheus-compatible).<\/li>\n<li><strong>Scope<\/strong>: Workspaces are <strong>regional<\/strong> resources (you create a workspace in an AWS Region). 
Access is scoped by <strong>AWS account<\/strong> and <strong>IAM<\/strong>, and you can further isolate by creating multiple workspaces per account\/region.<\/li>\n<li><strong>Multi-environment design<\/strong>: Commonly, you use separate workspaces for <code>dev<\/code>, <code>test<\/code>, <code>prod<\/code>, or per business unit\/team.<\/li>\n<li><strong>How it fits into the AWS ecosystem<\/strong>:<\/li>\n<li><strong>Amazon EKS \/ Kubernetes<\/strong>: scrape cluster workloads locally and remote_write to the workspace.<\/li>\n<li><strong>Amazon Managed Grafana<\/strong>: visualize and alert on Prometheus metrics (Grafana data source for AMP).<\/li>\n<li><strong>AWS IAM + CloudTrail<\/strong>: govern and audit access.<\/li>\n<li><strong>AWS networking<\/strong>: route traffic over the internet, or via private networking patterns where supported (verify PrivateLink\/VPC endpoint support for your region).<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">3. Why use Amazon Managed Service for Prometheus?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Business reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Reduce operational toil<\/strong>: Avoid running, patching, scaling, and backing up Prometheus storage clusters.<\/li>\n<li><strong>Faster time-to-value<\/strong>: Provision a workspace quickly and connect existing Prometheus scrapers\/collectors.<\/li>\n<li><strong>Standardize observability<\/strong>: Provide a consistent metrics backend across teams and accounts.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Technical reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Prometheus compatibility<\/strong>: Works with Prometheus\u2019s data model, PromQL, and common exporters.<\/li>\n<li><strong>Decouple scraping from storage<\/strong>: Keep scrapes close to targets (reduces cross-network scrape complexity), but centralize storage and queries.<\/li>\n<li><strong>Elastic-ish backend behavior<\/strong>: Managed backends typically 
absorb spikes and growth better than a single Prometheus server (exact scaling characteristics are AWS-managed; verify performance guidance in docs).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Workspace isolation<\/strong>: Separate tenants and environments to limit blast radius.<\/li>\n<li><strong>Simplified HA<\/strong>: You can run multiple scrapers\/agents for resiliency while using a centralized managed store.<\/li>\n<li><strong>Centralized access control<\/strong>: IAM controls ingestion and querying.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/compliance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>IAM-authenticated endpoints<\/strong>: No shared static passwords required for basic access control.<\/li>\n<li><strong>Auditability<\/strong>: Management actions and many access patterns can be audited with CloudTrail (verify which data plane events are recorded in your setup).<\/li>\n<li><strong>Encryption<\/strong>: AWS-managed encryption at rest and TLS in transit (verify exact encryption model and key options in docs).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scalability\/performance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Handle higher ingestion volumes<\/strong> than a single Prometheus server is comfortable with, especially when metrics retention needs grow.<\/li>\n<li><strong>Enable multi-cluster, multi-account, multi-environment patterns<\/strong> more cleanly than ad hoc Prometheus federation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should choose it<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Choose Amazon Managed Service for Prometheus when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You already use Prometheus exporters and PromQL and want a managed, AWS-governed backend.<\/li>\n<li>You need centralized, durable metrics storage across many Kubernetes clusters, accounts, or regions.<\/li>\n<li>You want to pair it 
with Amazon Managed Grafana for a managed \u201cPrometheus + Grafana\u201d stack.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should not choose it<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Avoid (or reconsider) Amazon Managed Service for Prometheus when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You only need basic infrastructure metrics and are fine with <strong>Amazon CloudWatch Metrics<\/strong> dashboards and alarms.<\/li>\n<li>You rely heavily on <strong>Prometheus local features<\/strong> that assume a local TSDB (very low-latency local queries, ad hoc label exploration) and don\u2019t want remote-write architecture tradeoffs.<\/li>\n<li>Your primary need is <strong>logs or traces<\/strong> rather than metrics (consider CloudWatch Logs, AWS X-Ray, and OpenTelemetry pipelines).<\/li>\n<li>You have strict requirements that depend on features not supported by the managed backend (verify supported Prometheus API surface and limits in the official docs).<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">4. Where is Amazon Managed Service for Prometheus used?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Industries<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>SaaS and software companies running microservices at scale.<\/li>\n<li>Financial services and regulated industries needing governance and audit trails.<\/li>\n<li>Media\/gaming platforms with bursty workloads and strong SLO\/SLA monitoring.<\/li>\n<li>Healthcare and enterprise IT where standardized monitoring is required.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Team types<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>SRE and platform engineering teams centralizing observability.<\/li>\n<li>DevOps teams supporting multiple application squads.<\/li>\n<li>Security and compliance teams needing consistent access controls and audit logs.<\/li>\n<li>Operations teams migrating from legacy monitoring stacks.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Workloads<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Kubernetes workloads 
(Amazon EKS, self-managed K8s on EC2, on-prem).<\/li>\n<li>ECS\/EC2-based services running Prometheus exporters.<\/li>\n<li>Hybrid and multi-cloud workloads where Prometheus scraping occurs locally, with metrics centralized into AWS.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Architectures<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Multi-cluster Kubernetes platforms with a shared metrics backend.<\/li>\n<li>Multi-account AWS organizations where each account\/environment writes to designated workspaces.<\/li>\n<li>Event-driven \/ autoscaled architectures where cardinality and ingestion volume fluctuate.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Production vs dev\/test usage<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Dev\/test<\/strong>: single workspace, short-lived collectors, limited retention needs, light dashboards.<\/li>\n<li><strong>Production<\/strong>: multiple workspaces, strict IAM boundaries, private networking patterns, cost controls for high cardinality, standardized dashboards and alerting, and disaster recovery plans for collectors.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">5. 
Top Use Cases and Scenarios<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Below are realistic scenarios where Amazon Managed Service for Prometheus is commonly used.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Centralized Kubernetes metrics backend for many clusters<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Each cluster runs its own Prometheus; long retention and HA are complex and expensive.<\/li>\n<li><strong>Why this service fits<\/strong>: Remote write from each cluster into a managed workspace; consistent PromQL across all clusters.<\/li>\n<li><strong>Example scenario<\/strong>: A platform team runs 20 EKS clusters (prod\/stage\/dev) and centralizes metrics into per-environment workspaces.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Long-term retention for SLO and capacity planning<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Local Prometheus disks fill up; retaining months of metrics requires custom storage and maintenance.<\/li>\n<li><strong>Why this service fits<\/strong>: Managed storage with AWS-operated durability; queries still use PromQL.<\/li>\n<li><strong>Example scenario<\/strong>: Capacity planning dashboards require historical CPU\/memory\/latency trends across quarters.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Multi-tenant observability for internal platform customers<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: One shared Prometheus becomes a noisy multi-tenant system with weak isolation.<\/li>\n<li><strong>Why this service fits<\/strong>: Workspaces provide logical isolation; IAM policies restrict access.<\/li>\n<li><strong>Example scenario<\/strong>: Business unit A and B get separate workspaces and Grafana folders, with least-privilege IAM roles.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Standard metrics backend for microservices on ECS\/EC2<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: 
Teams run exporters but have inconsistent storage\/query tooling.<\/li>\n<li><strong>Why this service fits<\/strong>: Keep exporters and Prometheus scrape patterns; central backend for queries.<\/li>\n<li><strong>Example scenario<\/strong>: Node Exporter + application metrics scraped by a small Prometheus agent on each host, remote-written centrally.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) Hybrid monitoring (on-prem + AWS)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: On-prem workloads are monitored separately; unifying metrics is hard.<\/li>\n<li><strong>Why this service fits<\/strong>: Remote write can traverse secure connectivity (VPN\/Direct Connect); store metrics centrally.<\/li>\n<li><strong>Example scenario<\/strong>: On-prem Kubernetes and AWS EKS both remote_write to the same workspace (with careful label\/tenant design).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) Migration from self-managed Prometheus + Thanos\/Cortex-like stacks<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Existing stack is complex to operate; upgrades and scaling are risky.<\/li>\n<li><strong>Why this service fits<\/strong>: Managed backend reduces ops burden while keeping PromQL and Prometheus exporters.<\/li>\n<li><strong>Example scenario<\/strong>: Replace object-store-backed long-term storage components with Amazon Managed Service for Prometheus.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Governance-focused monitoring for regulated environments<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Need auditable access control, centralized policies, and controlled sharing.<\/li>\n<li><strong>Why this service fits<\/strong>: IAM policies + CloudTrail; workspace isolation; integrate with AWS Organizations patterns.<\/li>\n<li><strong>Example scenario<\/strong>: Production workspace access limited to SRE role; developers only see aggregated metrics via 
Grafana.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Observability for ephemeral workloads and autoscaling fleets<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Instances\/pods churn; scraping and retention become brittle.<\/li>\n<li><strong>Why this service fits<\/strong>: Local scraping continues; backend provides stability and consistent query history (subject to retention).<\/li>\n<li><strong>Example scenario<\/strong>: Spot-heavy fleets export metrics; collectors remote_write while instances come and go.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Metrics backbone for incident response and postmortems<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: During incidents, local Prometheus may overload or be unavailable.<\/li>\n<li><strong>Why this service fits<\/strong>: Central managed backend supports cross-service queries; access controlled.<\/li>\n<li><strong>Example scenario<\/strong>: Incident commander uses PromQL to correlate saturation and error rate trends across services.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) Organization-wide standardization on Grafana + Prometheus<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Different teams use different metrics systems; dashboards are fragmented.<\/li>\n<li><strong>Why this service fits<\/strong>: Prometheus-compatible backend pairs cleanly with Grafana; consistent approach across teams.<\/li>\n<li><strong>Example scenario<\/strong>: Adopt Amazon Managed Grafana with standardized dashboards and AMP workspaces per environment.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">6. Core Features<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The features below reflect common, current capabilities of Amazon Managed Service for Prometheus. 
Always confirm region-by-region availability in the official docs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Managed Prometheus workspaces<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Provides a logical container for metrics ingestion and querying.<\/li>\n<li><strong>Why it matters<\/strong>: Enables isolation, environment separation, and controlled access.<\/li>\n<li><strong>Practical benefit<\/strong>: Create separate workspaces for prod\/dev or per team; reduce blast radius.<\/li>\n<li><strong>Caveats<\/strong>: Workspaces are regional; cross-region designs require explicit architecture.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Prometheus <code>remote_write<\/code> ingestion endpoint<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Accepts Prometheus remote write traffic from Prometheus servers\/agents\/collectors.<\/li>\n<li><strong>Why it matters<\/strong>: Lets you keep local scraping (close to workloads) while centralizing storage.<\/li>\n<li><strong>Practical benefit<\/strong>: Minimizes cross-network scraping complexity; supports hybrid models.<\/li>\n<li><strong>Caveats<\/strong>: Remote write requires careful queue tuning and failure handling on the sender side (WAL growth, retries).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">PromQL query endpoint (Prometheus-compatible API)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Serves PromQL queries via Prometheus query APIs.<\/li>\n<li><strong>Why it matters<\/strong>: Grafana dashboards and alerts often rely on PromQL.<\/li>\n<li><strong>Practical benefit<\/strong>: Move backend without rewriting dashboards.<\/li>\n<li><strong>Caveats<\/strong>: Not every Prometheus API endpoint is necessarily supported exactly like OSS Prometheus; verify supported API surface.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">IAM (SigV4) authentication and authorization<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Uses AWS request signing and IAM policies to authorize <code>remote_write<\/code> and query actions.<\/li>\n<li><strong>Why it matters<\/strong>: Removes the need for shared passwords and enables least privilege.<\/li>\n<li><strong>Practical benefit<\/strong>: Use IAM roles for EKS service accounts (IRSA), EC2 instance roles, ECS task roles, etc.<\/li>\n<li><strong>Caveats<\/strong>: SigV4 signing can be a learning curve; many teams use a signing proxy or AWS-supported collectors.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Encryption in transit and at rest<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: TLS endpoints and AWS-managed encryption at rest.<\/li>\n<li><strong>Why it matters<\/strong>: Protects metrics (which can be sensitive due to labels, hostnames, internal topology).<\/li>\n<li><strong>Practical benefit<\/strong>: Meets baseline security expectations with minimal effort.<\/li>\n<li><strong>Caveats<\/strong>: If you require customer-managed keys (KMS CMKs) or specific compliance controls, verify current support.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Integration with Grafana (including Amazon Managed Grafana)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Grafana can use AMP as a Prometheus data source to build dashboards and alerts.<\/li>\n<li><strong>Why it matters<\/strong>: Grafana is a common UI for Prometheus metrics.<\/li>\n<li><strong>Practical benefit<\/strong>: Managed visualization reduces ops overhead and improves consistency.<\/li>\n<li><strong>Caveats<\/strong>: Grafana is a separate service with separate pricing and access configuration.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Kubernetes integration patterns (EKS + collectors)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Supports common K8s collection patterns (Prometheus Operator, Helm charts, 
OpenTelemetry Collector with Prometheus receiver + remote_write exporter).<\/li>\n<li><strong>Why it matters<\/strong>: Kubernetes is a primary environment for Prometheus metrics.<\/li>\n<li><strong>Practical benefit<\/strong>: Standardized cluster monitoring across many clusters.<\/li>\n<li><strong>Caveats<\/strong>: The managed service does not scrape targets for you; you still run collectors\/scrapers.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">(Optional \/ verify) Managed rules and alerting<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Some managed Prometheus offerings support recording\/alerting rules evaluated centrally.<\/li>\n<li><strong>Why it matters<\/strong>: Central evaluation can reduce per-cluster Prometheus load.<\/li>\n<li><strong>Practical benefit<\/strong>: Standard alert rules across fleets.<\/li>\n<li><strong>Caveats<\/strong>: Availability, configuration model, and pricing may differ by region and over time; verify in official docs and pricing pages.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">7. Architecture and How It Works<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">High-level architecture<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Amazon Managed Service for Prometheus uses a <strong>push-to-backend<\/strong> model:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>You run a scraper\/collector (Prometheus, agent, or OpenTelemetry Collector) near your workloads.<\/li>\n<li>The scraper collects metrics from exporters and <code>\/metrics<\/code> endpoints.<\/li>\n<li>The scraper sends metrics to Amazon Managed Service for Prometheus using Prometheus <code>remote_write<\/code>.<\/li>\n<li>Dashboards and operational queries are executed against the workspace query endpoint using PromQL.<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">This differs from \u201ccentral Prometheus scrapes everything\u201d designs. 
The remote_write model scales better across networks and accounts because scraping remains local.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Request\/data\/control flow<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Control plane<\/strong><\/li>\n<li>Create\/delete workspaces<\/li>\n<li>Configure aliases\/tags (where supported)<\/li>\n<li>Manage access with IAM<\/li>\n<li>Audit actions with CloudTrail<\/li>\n<li><strong>Data plane<\/strong><\/li>\n<li>Ingest: <code>remote_write<\/code> HTTP requests to the workspace endpoint (SigV4 signed)<\/li>\n<li>Query: PromQL requests from Grafana\/CLI to the query endpoint (SigV4 signed)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Integrations with related AWS services<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Amazon Managed Grafana<\/strong>: Prometheus data source to query metrics and build dashboards.<\/li>\n<li><strong>Amazon EKS<\/strong>: Run Prometheus collectors in-cluster; use IRSA to grant write\/query permissions.<\/li>\n<li><strong>AWS Distro for OpenTelemetry (ADOT)<\/strong>: Common choice to scrape Prometheus endpoints and export via Prometheus remote_write with IAM auth.<\/li>\n<li><strong>AWS CloudTrail<\/strong>: Audits workspace management API calls (and some access patterns depending on configuration and AWS support).<\/li>\n<li><strong>AWS IAM Identity Center (AWS SSO)<\/strong>: Often used to grant engineers access to Grafana and controlled roles.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Dependency services (typical)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>IAM (required)<\/li>\n<li>VPC\/networking components (for your collectors and (optional) private access)<\/li>\n<li>Compute for collectors (EKS\/ECS\/EC2\/on-prem)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/authentication model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Primary model<\/strong>: IAM policies + SigV4 signing.<\/li>\n<li><strong>Common implementation 
patterns<\/strong>:<\/li>\n<li>Use collectors that natively sign requests (for example, ADOT components) <strong>or<\/strong><\/li>\n<li>Use a SigV4 signing proxy that runs alongside Prometheus and signs outbound requests (use carefully; validate against official guidance).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Networking model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Collectors reach AMP endpoints over HTTPS.<\/li>\n<li>You can run collectors anywhere with network reachability to the service endpoints.<\/li>\n<li>For private connectivity options (like PrivateLink \/ interface VPC endpoints), verify current support and endpoint names in your Region via:<\/li>\n<li>VPC Console \u2192 Endpoints \u2192 \u201cCreate endpoint\u201d \u2192 search for the Prometheus workspace endpoint service<\/li>\n<li>Official documentation for Amazon Managed Service for Prometheus networking guidance<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Monitoring\/logging\/governance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Monitor <strong>collector health<\/strong>: remote_write queue, dropped samples, WAL growth, and scrape durations.<\/li>\n<li>Enforce <strong>label and cardinality governance<\/strong>: avoid high-cardinality labels (request IDs, user IDs, pod UID, etc.).<\/li>\n<li>Use <strong>workspace tagging<\/strong> and consistent naming\/aliases for cost allocation and governance.<\/li>\n<li>Use CloudTrail and IAM Access Analyzer to review access patterns.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Simple architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart LR\n  subgraph Workloads\n    A[Apps \/metrics endpoints]\n    B[Node Exporter \/ Kube-state-metrics]\n  end\n\n  subgraph Collector\n    P[Prometheus \/ OTel Collector]\n  end\n\n  subgraph AWS[AWS Region]\n    W[(Amazon Managed Service for Prometheus Workspace)]\n    G[&quot;Grafana (Amazon Managed Grafana or self-managed)&quot;]\n  end\n\n  A 
--&gt;|scrape| P\n  B --&gt;|scrape| P\n  P --&gt;|&quot;remote_write (SigV4)&quot;| W\n  G --&gt;|&quot;PromQL query (SigV4)&quot;| W\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Production-style architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart TB\n  subgraph Org[AWS Organization]\n    subgraph Shared[Shared Services Account]\n      AMG[Amazon Managed Grafana]\n    end\n\n    subgraph Prod[Production Account]\n      EKS1[EKS Cluster A]\n      EKS2[EKS Cluster B]\n      ADOT1[ADOT Collector \/ Prometheus Agent]\n      ADOT2[ADOT Collector \/ Prometheus Agent]\n      IRSA[IAM Roles for Service Accounts]\n    end\n\n    subgraph Obs[Observability Account]\n      AMP[(Amazon Managed Service for Prometheus\\nWorkspace: prod-metrics)]\n      CT[CloudTrail]\n      KMS[&quot;KMS (encryption at rest - AWS managed or CMK where supported)&quot;]\n    end\n  end\n\n  EKS1 --&gt; ADOT1\n  EKS2 --&gt; ADOT2\n  IRSA --&gt; ADOT1\n  IRSA --&gt; ADOT2\n  ADOT1 --&gt;|&quot;remote_write (SigV4)&quot;| AMP\n  ADOT2 --&gt;|&quot;remote_write (SigV4)&quot;| AMP\n  AMG --&gt;|&quot;PromQL query (SigV4)&quot;| AMP\n  CT --&gt; AMP\n  KMS --&gt; AMP\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">8. 
Prerequisites<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Before you start, confirm the following.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Account and billing<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An active <strong>AWS account<\/strong> with billing enabled.<\/li>\n<li>Permission to create and delete resources used in the lab (AMP workspace, IAM role, EC2 instance).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">IAM permissions (minimum)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">You need IAM permissions to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Create and manage an Amazon Managed Service for Prometheus workspace.<\/li>\n<li>Create IAM roles\/policies (for the collector instance role).<\/li>\n<li>Create and manage an EC2 instance and (optionally) Systems Manager access.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Common IAM actions you may encounter include (verify exact action names in official IAM docs for this service):<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>aps:CreateWorkspace<\/code>, <code>aps:DeleteWorkspace<\/code>, <code>aps:DescribeWorkspace<\/code>, <code>aps:ListWorkspaces<\/code><\/li>\n<li><code>aps:RemoteWrite<\/code><\/li>\n<li><code>aps:QueryMetrics<\/code>, <code>aps:GetSeries<\/code>, <code>aps:GetLabels<\/code>, <code>aps:GetMetricMetadata<\/code><\/li>\n<\/ul>\n\n\n\n<blockquote>\n<p>The service API prefix in IAM is commonly <code>aps<\/code>. 
Confirm current IAM action names here: https:\/\/docs.aws.amazon.com\/service-authorization\/latest\/reference\/list_amazonmanagedserviceforprometheus.html<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Tools<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AWS CLI v2<\/strong> (recommended): https:\/\/docs.aws.amazon.com\/cli\/latest\/userguide\/getting-started-install.html<\/li>\n<li>A terminal with <code>curl<\/code><\/li>\n<li>For the lab instance: <strong>Docker<\/strong> (installed on EC2)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Region availability<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Amazon Managed Service for Prometheus is regional and not available in every AWS Region. Verify supported Regions in official docs:<\/li>\n<li>https:\/\/docs.aws.amazon.com\/prometheus\/latest\/userguide\/what-is-Amazon-Managed-Service-Prometheus.html (and Region tables linked from there)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas\/limits<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Expect quotas around:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Number of workspaces per account\/region<\/li>\n<li>Ingestion rates<\/li>\n<li>Query concurrency<\/li>\n<li>Series\/label limits<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Quotas can change; verify current quotas and request increases if needed in:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>AWS Service Quotas console (if supported for this service)<\/li>\n<li>Service documentation on limits\/quotas<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Prerequisite services<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">For this tutorial, you will also use:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Amazon EC2<\/li>\n<li>AWS Systems Manager (recommended for secure access without inbound SSH)<\/li>\n<li>IAM instance profile role for the EC2 instance<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">9. Pricing \/ Cost<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Amazon Managed Service for Prometheus pricing is <strong>usage-based<\/strong> and varies by Region. 
Do not estimate with fixed numbers without checking the official pricing page for your Region.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Official pricing sources<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Pricing page: https:\/\/aws.amazon.com\/prometheus\/pricing\/<\/li>\n<li>AWS Pricing Calculator: https:\/\/calculator.aws\/#\/<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing dimensions (what you pay for)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Pricing commonly includes some combination of:\n&#8211; <strong>Metrics ingested<\/strong> (often measured as samples ingested via remote_write)\n&#8211; <strong>Metrics stored<\/strong> (time-series storage footprint over time)\n&#8211; <strong>Metrics queried<\/strong> (query processing based on samples scanned\/returned or query usage)\n&#8211; <strong>(If applicable; verify)<\/strong> rule evaluations \/ managed alerting components<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">AWS can adjust dimensions over time\u2014always confirm the current dimensions on the pricing page.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Free tier<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">AWS free tier eligibility changes by service and time. Check the pricing page for any free tier or trial details. 
If not explicitly listed, assume no free tier for meaningful production use.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Primary cost drivers<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Ingestion volume<\/strong>: more targets \u00d7 higher scrape frequency \u00d7 more metrics per target = higher ingestion.<\/li>\n<li><strong>Cardinality<\/strong>: high-cardinality labels explode the number of time series and storage\/query costs.<\/li>\n<li><strong>Retention needs<\/strong>: longer retention increases stored data.<\/li>\n<li><strong>Query patterns<\/strong>: \u201cwide\u201d queries over long time ranges can be expensive and slow.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Hidden or indirect costs<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Collector compute<\/strong>: EC2\/EKS\/ECS costs for Prometheus\/collectors and any signing proxy.<\/li>\n<li><strong>Network data transfer<\/strong>:<\/li>\n<li>Cross-AZ, cross-VPC, cross-account, cross-region, and internet egress can add cost.<\/li>\n<li>If sending from on-prem to AWS, you may have VPN\/Direct Connect costs.<\/li>\n<li><strong>Grafana costs<\/strong>: Amazon Managed Grafana is priced separately.<\/li>\n<li><strong>Logs<\/strong>: If you ship collector logs to CloudWatch Logs, that is separately billed.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cost optimization tips (high impact)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Control <strong>label cardinality<\/strong>:<\/li>\n<li>Avoid labels like <code>request_id<\/code>, <code>user_id<\/code>, full URL paths, pod UID, build timestamps.<\/li>\n<li>Prefer bounded labels like <code>service<\/code>, <code>namespace<\/code>, <code>cluster<\/code>, <code>status_code<\/code> (carefully).<\/li>\n<li>Reduce scrape frequency where it\u2019s not needed:<\/li>\n<li>Many exporters default to 15s; some metrics are fine at 30s or 60s.<\/li>\n<li>Drop unneeded metrics at the collector:<\/li>\n<li>Use 
relabeling\/metric_relabel_configs (Prometheus) or filtering processors (OTel).<\/li>\n<li>Split workspaces by environment and purpose:<\/li>\n<li>Keep noisy dev\/test separate from prod.<\/li>\n<li>Use recording rules (where supported) to precompute expensive queries (verify feature support).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Example low-cost starter estimate (how to estimate without inventing numbers)<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Measure your ingestion rate from your collector:\n   &#8211; Prometheus remote_write stats (e.g., samples\/s succeeded\/failed).<\/li>\n<li>Convert to the pricing unit:\n   &#8211; If pricing is \u201cper million samples ingested,\u201d estimate:<br\/>\n<code>samples_per_second * 60 * 60 * 24 * 30 \/ 1,000,000<\/code><\/li>\n<li>Estimate storage:\n   &#8211; Use your observed active series count and retention; check AWS guidance for storage estimation (verify in docs).<\/li>\n<li>Plug values into:\n   &#8211; AWS Pricing Calculator and the service pricing page for your Region.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Example production cost considerations<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">In production, expect costs to scale with:\n&#8211; Many clusters (10\u2013100+), each with kube-state-metrics, cAdvisor\/container metrics, service mesh metrics.\n&#8211; Longer retention windows.\n&#8211; High query concurrency (many Grafana users, many dashboards refreshing).\n&#8211; Additional derived metrics or recording rules.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A practical approach is to run a 1\u20132 week pilot with:\n&#8211; One production-like cluster\n&#8211; Real scrape configs\n&#8211; A few representative dashboards\nThen review ingestion, active series, query patterns, and update your cost model.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">10. 
Step-by-Step Hands-On Tutorial<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">This lab builds a small but real setup:\n&#8211; Create an Amazon Managed Service for Prometheus workspace\n&#8211; Run a tiny Prometheus on an EC2 instance\n&#8211; Use a SigV4 signing proxy so Prometheus can remote_write to AMP\n&#8211; Query the workspace to confirm metrics arrived<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This avoids standing up a full Kubernetes cluster, while still using real remote_write and PromQL.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Objective<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Ingest a small set of Prometheus metrics into <strong>Amazon Managed Service for Prometheus<\/strong> from a self-managed Prometheus running on EC2, then query those metrics successfully.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Lab Overview<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">You will:\n1. Create an AMP workspace.\n2. Create an IAM role for an EC2 instance to remote_write and query.\n3. Launch a small EC2 instance with Docker and Systems Manager access.\n4. Run:\n   &#8211; Node Exporter (to generate host metrics)\n   &#8211; Prometheus (to scrape and remote_write)\n   &#8211; A SigV4 proxy (to sign requests to AMP)\n5. Validate metrics ingestion using a PromQL query.\n6. Clean up resources to avoid ongoing charges.<\/p>\n\n\n\n<blockquote>\n<p>Cost note: EC2, data transfer, and AMP usage can incur charges. 
Use a small instance and delete resources immediately after the lab.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Choose a Region and create a workspace<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Pick a Region where Amazon Managed Service for Prometheus is available.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Option A: Create workspace in the AWS Console<\/h4>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Open the AWS console for <strong>Amazon Managed Service for Prometheus<\/strong>.<\/li>\n<li>Create a <strong>workspace<\/strong>.<\/li>\n<li>Note:\n   &#8211; <strong>Workspace ID<\/strong>\n   &#8211; <strong>Remote write URL<\/strong>\n   &#8211; <strong>Query URL<\/strong><\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">Expected outcome:\n&#8211; A workspace is created and shows an \u201cActive\u201d (or similar) status.\n&#8211; You have the endpoints needed for later steps.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Option B: Create workspace using AWS CLI<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">The CLI command group is typically <code>amp<\/code> (but verify your AWS CLI supports it in your version\/region). 
If <code>aws amp<\/code> is not available, use the console or verify the correct CLI naming in current AWS CLI docs.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Example (verify command names in official docs):<\/p>\n\n\n\n<pre><code class=\"language-bash\">aws amp create-workspace --alias \"lab-workspace\"\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Expected outcome:\n&#8211; CLI returns workspace metadata including an ID.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Create an IAM role for the EC2 instance (remote_write + query)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">You will attach an instance role that allows:\n&#8211; Writing metrics to the workspace (<code>RemoteWrite<\/code>)\n&#8211; Querying metrics for validation (<code>QueryMetrics<\/code> and related read actions)<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">2.1 Create an IAM policy<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Create a policy similar to the following, replacing:\n&#8211; <code>REGION<\/code>\n&#8211; <code>ACCOUNT_ID<\/code>\n&#8211; <code>WORKSPACE_ID<\/code><\/p>\n\n\n\n<blockquote>\n<p>Verify the exact resource ARN format for workspaces in the official docs. 
The example below reflects common AWS patterns but should be validated.<\/p>\n<\/blockquote>\n\n\n\n<pre><code class=\"language-json\">{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Sid\": \"WriteToWorkspace\",\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"aps:RemoteWrite\"\n      ],\n      \"Resource\": \"arn:aws:aps:REGION:ACCOUNT_ID:workspace\/WORKSPACE_ID\"\n    },\n    {\n      \"Sid\": \"QueryWorkspace\",\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"aps:QueryMetrics\",\n        \"aps:GetSeries\",\n        \"aps:GetLabels\",\n        \"aps:GetMetricMetadata\"\n      ],\n      \"Resource\": \"arn:aws:aps:REGION:ACCOUNT_ID:workspace\/WORKSPACE_ID\"\n    },\n    {\n      \"Sid\": \"DescribeWorkspace\",\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"aps:DescribeWorkspace\"\n      ],\n      \"Resource\": \"arn:aws:aps:REGION:ACCOUNT_ID:workspace\/WORKSPACE_ID\"\n    }\n  ]\n}\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Expected outcome:\n&#8211; You have a customer-managed IAM policy granting least-privilege AMP access.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">2.2 Create an IAM role for EC2<\/h4>\n\n\n\n<ol class=\"wp-block-list\">\n<li>IAM \u2192 Roles \u2192 Create role<\/li>\n<li>Trusted entity: <strong>AWS service<\/strong> \u2192 <strong>EC2<\/strong><\/li>\n<li>Attach:\n   &#8211; Your new AMP policy\n   &#8211; <strong>AmazonSSMManagedInstanceCore<\/strong> (so you can connect via Session Manager without SSH)<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">Expected outcome:\n&#8211; You have an EC2 role (instance profile) ready to attach at launch time.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Launch an EC2 instance (small + SSM access)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Launch a small Linux instance (for example, Amazon Linux). 
Ensure:\n&#8211; It has outbound internet access (via public subnet + IGW, or private subnet + NAT) to reach AMP endpoints.\n&#8211; You attach the IAM role created above.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Recommended security posture for a lab:\n&#8211; No inbound security group rules required if you use <strong>Systems Manager Session Manager<\/strong>.\n&#8211; Allow outbound HTTPS (TCP 443).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Expected outcome:\n&#8211; EC2 instance is running and shows as \u201cManaged\u201d in Systems Manager (if SSM agent is available and the role is attached).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Connect to the instance and install Docker<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Use AWS Systems Manager \u2192 Session Manager \u2192 Start session.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">On the instance, install and start Docker.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For Amazon Linux (commands vary by distro; verify for your AMI):<\/p>\n\n\n\n<pre><code class=\"language-bash\">sudo yum update -y\nsudo yum install -y docker\nsudo systemctl enable docker\nsudo systemctl start docker\n# Add the current user to the docker group. Session Manager sessions\n# usually run as ssm-user rather than ec2-user.\nsudo usermod -aG docker \"$(whoami)\"\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Reconnect your session (or run <code>newgrp docker<\/code>) so your user can run Docker without sudo:<\/p>\n\n\n\n<pre><code class=\"language-bash\">newgrp docker\ndocker version\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Expected outcome:\n&#8211; <code>docker version<\/code> succeeds.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5: Run a SigV4 signing proxy (for Prometheus remote_write and query)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Recent Prometheus releases can sign <code>remote_write<\/code> requests with AWS SigV4 natively (through a <code>sigv4<\/code> block in the <code>remote_write<\/code> configuration), but older versions cannot, and ad hoc <code>curl<\/code> queries still need a signer. 
A common pattern is to run a local proxy that:\n&#8211; Accepts HTTP from Prometheus\n&#8211; Signs the outgoing request with SigV4 using the instance role credentials\n&#8211; Sends it to the AMP endpoint<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A widely referenced tool is <strong>aws-sigv4-proxy<\/strong> (awslabs). Verify the current recommended approach in AWS documentation for Amazon Managed Service for Prometheus ingestion clients.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Official\/trusted reference starting points:\n&#8211; https:\/\/docs.aws.amazon.com\/prometheus\/latest\/userguide\/ingest-metrics.html (navigate to ingestion methods)\n&#8211; https:\/\/github.com\/awslabs\/aws-sigv4-proxy<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">5.1 Start the SigV4 proxy container<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Replace:\n&#8211; <code>AWS_REGION<\/code> (in both the <code>--region<\/code> and <code>--host<\/code> values) with the Region of your workspace\n&#8211; The upstream host, if the proxy\u2019s documentation or your workspace endpoint calls for a different value<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Example pattern, based on the flags documented in the aws-sigv4-proxy README at the time of writing (verify flags with the project docs):<\/p>\n\n\n\n<pre><code class=\"language-bash\"># Flags after the image name are passed to the proxy:\n#   --name is the SigV4 service name, --host is the upstream AMP endpoint,\n#   --port is the listen address (note the leading colon).\ndocker run -d --name sigv4-proxy \\\n  -p 8000:8000 \\\n  --restart unless-stopped \\\n  public.ecr.aws\/aws-observability\/aws-sigv4-proxy:latest \\\n  --name aps \\\n  --region AWS_REGION \\\n  --host aps-workspaces.AWS_REGION.amazonaws.com \\\n  --port :8000\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Expected outcome:\n&#8211; <code>docker ps<\/code> shows <code>sigv4-proxy<\/code> running.\n&#8211; The proxy listens on <code>localhost:8000<\/code>.<\/p>\n\n\n\n<blockquote>\n<p>If the image name\/tag differs, use the repository\u2019s documented image reference. 
Some environments build the proxy binary directly.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 6: Run Node Exporter (metrics source)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Run Node Exporter to expose host metrics on port 9100:<\/p>\n\n\n\n<pre><code class=\"language-bash\">docker run -d --name node-exporter \\\n  --net=host \\\n  --restart unless-stopped \\\n  prom\/node-exporter:latest\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Expected outcome:\n&#8211; Node Exporter is running.\n&#8211; You can confirm locally:\n  <code>curl -s http:\/\/127.0.0.1:9100\/metrics | head<\/code><\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 7: Configure and run Prometheus (scrape + remote_write to AMP via proxy)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Create a Prometheus config file:<\/p>\n\n\n\n<pre><code class=\"language-bash\">mkdir -p ~\/prometheus\ncat &gt; ~\/prometheus\/prometheus.yml &lt;&lt;'EOF'\nglobal:\n  scrape_interval: 15s\n\nscrape_configs:\n  - job_name: prometheus\n    static_configs:\n      - targets: [\"127.0.0.1:9090\"]\n\n  - job_name: node\n    static_configs:\n      - targets: [\"127.0.0.1:9100\"]\n\nremote_write:\n  - url: \"http:\/\/127.0.0.1:8000\/workspaces\/WORKSPACE_ID\/api\/v1\/remote_write\"\n    queue_config:\n      max_samples_per_send: 1000\n      max_shards: 5\n      capacity: 5000\nEOF\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Replace <code>WORKSPACE_ID<\/code> with your actual workspace ID.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Now run Prometheus. Host networking lets the container reach Node Exporter and the SigV4 proxy on <code>127.0.0.1<\/code>; on Docker\u2019s default bridge network, <code>127.0.0.1<\/code> would refer to the Prometheus container itself and both the scrape and remote_write would fail:<\/p>\n\n\n\n<pre><code class=\"language-bash\">docker run -d --name prometheus \\\n  --net=host \\\n  --restart unless-stopped \\\n  -v ~\/prometheus\/prometheus.yml:\/etc\/prometheus\/prometheus.yml:ro \\\n  prom\/prometheus:latest\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Expected outcome:\n&#8211; Prometheus is 
running.\n&#8211; You can open the Prometheus UI locally (optional) through Session Manager port forwarding. For a CLI-only lab, verify via logs:\n  <code>docker logs --tail=50 prometheus<\/code><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">7.1 Check Prometheus is scraping targets<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Inside the instance:<\/p>\n\n\n\n<pre><code class=\"language-bash\">curl -s \"http:\/\/127.0.0.1:9090\/api\/v1\/targets\" | head\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Expected outcome:\n&#8211; Each active target reports <code>\"health\":\"up\"<\/code> (this API returns JSON; you may not have <code>jq<\/code> installed).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">7.2 Check remote_write is not failing<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Prometheus exposes internal metrics that indicate remote_write health. Query locally:<\/p>\n\n\n\n<pre><code class=\"language-bash\"># Samples sent by remote_write. Older Prometheus releases exposed this as prometheus_remote_storage_succeeded_samples_total.\ncurl -s \"http:\/\/127.0.0.1:9090\/api\/v1\/query?query=prometheus_remote_storage_samples_total\" | head\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Expected outcome:\n&#8211; You see a counter value that increases over time (may start at 0, then rise).\n&#8211; The matching failure counter, <code>prometheus_remote_storage_samples_failed_total<\/code>, stays at 0 (verify metric names for your Prometheus version).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 8: Query Amazon Managed Service for Prometheus to confirm data arrived<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Now query the managed workspace through the same SigV4 proxy.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Try a basic <code>up<\/code> query:<\/p>\n\n\n\n<pre><code class=\"language-bash\">curl -s \"http:\/\/127.0.0.1:8000\/workspaces\/WORKSPACE_ID\/api\/v1\/query?query=up\" | head\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Expected outcome:\n&#8211; The response includes time series with <code>job=\"prometheus\"<\/code> and <code>job=\"node\"<\/code> returning <code>1<\/code>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Try a range query (last 5 minutes) for 
CPU idle seconds (Node Exporter):<\/p>\n\n\n\n<pre><code class=\"language-bash\">END=$(date +%s)\nSTART=$((END-300))\n\ncurl -s \"http:\/\/127.0.0.1:8000\/workspaces\/WORKSPACE_ID\/api\/v1\/query_range?query=rate(node_cpu_seconds_total%7Bmode%3D%22idle%22%7D%5B1m%5D)&amp;start=${START}&amp;end=${END}&amp;step=15\" | head\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Expected outcome:\n&#8211; The response includes a matrix of samples for the metric.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Validation<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Use the following checklist:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prometheus is scraping Node Exporter (<code>targets<\/code> show <code>up<\/code>).<\/li>\n<li>Prometheus shows successful remote_write samples increasing.<\/li>\n<li>The AMP query endpoint returns <code>up<\/code> series through the proxy.<\/li>\n<li>You can run a range query and receive datapoints.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">If all items pass, you have a working ingestion + query path.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Troubleshooting<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Common issues and fixes:<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">1) HTTP 403 \/ AccessDenied when writing or querying<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cause: IAM policy missing required <code>aps:*<\/code> actions or wrong workspace ARN.<\/li>\n<li>Fix:<\/li>\n<li>Confirm the EC2 instance role is attached.<\/li>\n<li>Confirm the policy uses the correct workspace resource ARN format (verify in official docs).<\/li>\n<li>Confirm region matches the workspace.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">2) SigV4 errors or signature mismatch<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cause: Wrong region, time drift, or incorrect proxy configuration.<\/li>\n<li>Fix:<\/li>\n<li>Ensure proxy region equals workspace region.<\/li>\n<li>Ensure instance time is correct 
(NTP).<\/li>\n<li>Verify proxy flags per official repo docs.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">3) No metrics returned, but Prometheus is scraping locally<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cause: remote_write failing silently, incorrect remote_write URL path, wrong workspace ID.<\/li>\n<li>Fix:<\/li>\n<li>Double-check the remote_write URL path includes <code>\/workspaces\/WORKSPACE_ID\/api\/v1\/remote_write<\/code>.<\/li>\n<li>Check Prometheus logs:\n    <code>docker logs --tail=200 prometheus<\/code><\/li>\n<li>Check proxy logs:\n    <code>docker logs --tail=200 sigv4-proxy<\/code><\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">4) Prometheus WAL grows and disk fills<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cause: remote_write cannot deliver (network\/IAM) and buffers in WAL.<\/li>\n<li>Fix:<\/li>\n<li>Resolve remote_write errors quickly.<\/li>\n<li>Reduce scrape frequency\/volume temporarily.<\/li>\n<li>Ensure outbound HTTPS works.<\/li>\n<li>Consider limiting retention on the local Prometheus (lab only).<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">5) High cardinality causes ingestion or query pain<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cause: Too many unique label combinations.<\/li>\n<li>Fix:<\/li>\n<li>Drop labels\/metrics at scrape time (<code>metric_relabel_configs<\/code>).<\/li>\n<li>Standardize labels and avoid unbounded dimensions.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Cleanup<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">To avoid ongoing charges, delete everything you created.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Stop and remove containers on EC2:\n   <code>docker rm -f prometheus node-exporter sigv4-proxy<\/code><\/li>\n<li>Terminate the EC2 instance.<\/li>\n<li>Delete the IAM role and policy created for the lab (if not reused).<\/li>\n<li>Delete the AMP workspace (console or CLI).<\/li>\n<li>Review 
billing:\n   &#8211; AMP ingestion\/query\/storage usage may continue briefly until fully stopped; verify in AWS Billing.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">11. Best Practices<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Use a remote_write architecture intentionally<\/strong>:<\/li>\n<li>Scrape locally (per cluster or per environment), remote_write centrally.<\/li>\n<li><strong>Separate workspaces by environment and tenancy<\/strong>:<\/li>\n<li><code>prod<\/code>, <code>nonprod<\/code>, and \u201cexperimentation\u201d should not share a workspace in most orgs.<\/li>\n<li><strong>Design for failure on the sender side<\/strong>:<\/li>\n<li>remote_write is asynchronous; ensure queue\/WAL settings are appropriate.<\/li>\n<li><strong>Standardize labels<\/strong>:<\/li>\n<li>Establish label conventions (<code>cluster<\/code>, <code>namespace<\/code>, <code>service<\/code>, <code>app<\/code>, <code>env<\/code>) and avoid duplicates.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">IAM\/security best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Least privilege<\/strong>:<\/li>\n<li>Writers need <code>aps:RemoteWrite<\/code> only.<\/li>\n<li>Readers need <code>aps:QueryMetrics<\/code> and read actions.<\/li>\n<li><strong>Use roles, not long-lived access keys<\/strong>:<\/li>\n<li>IRSA for EKS, task roles for ECS, instance roles for EC2.<\/li>\n<li><strong>Separate write and read identities<\/strong>:<\/li>\n<li>Prevent dashboards from accidentally being used to exfiltrate broader datasets.<\/li>\n<li><strong>Control Grafana access<\/strong>:<\/li>\n<li>Use SAML\/IAM Identity Center, team-based access, and audit.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cost best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Control cardinality first<\/strong> (largest cost\/perf lever).<\/li>\n<li><strong>Drop noisy 
metrics<\/strong>:<\/li>\n<li>Many exporter metrics aren\u2019t used; don\u2019t ingest what you won\u2019t query.<\/li>\n<li><strong>Tune scrape intervals<\/strong> per metric importance.<\/li>\n<li><strong>Use separate workspaces<\/strong> to prevent dev\/test noise from inflating prod costs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Performance best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Prefer recording rules<\/strong> for expensive dashboard queries (if supported; verify).<\/li>\n<li><strong>Avoid huge range queries<\/strong> on high-cardinality metrics.<\/li>\n<li><strong>Use dashboard variables carefully<\/strong> (they can generate many queries).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Reliability best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Run redundant collectors<\/strong> where needed:<\/li>\n<li>For Kubernetes, deploy collectors with replica count &gt; 1 for high availability (while avoiding double-scrape duplication\u2014design carefully).<\/li>\n<li><strong>Monitor remote_write health<\/strong>:<\/li>\n<li>Alert on sustained failures, high dropped samples, and WAL growth.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operations best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Version control your scrape configs<\/strong>:<\/li>\n<li>Use GitOps for Kubernetes.<\/li>\n<li><strong>Use consistent workspace naming<\/strong>:<\/li>\n<li>Example: <code>org-prod-platform-metrics<\/code>, <code>org-nonprod-app-metrics<\/code>.<\/li>\n<li><strong>Tag workspaces<\/strong> for cost allocation:<\/li>\n<li><code>Environment=prod<\/code>, <code>Owner=platform<\/code>, <code>CostCenter=...<\/code><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Governance\/tagging\/naming best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Apply tags at creation and enforce via IaC:<\/li>\n<li><code>Environment<\/code>, <code>BusinessUnit<\/code>, 
<code>DataClassification<\/code>, <code>Owner<\/code>, <code>Application<\/code><\/li>\n<li>Use AWS Organizations SCPs where appropriate to prevent unapproved workspace creation (verify governance needs).<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">12. Security Considerations<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Identity and access model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Amazon Managed Service for Prometheus uses <strong>IAM<\/strong> for authentication\/authorization.<\/li>\n<li>Typical access patterns:<\/li>\n<li><strong>Ingestion<\/strong>: collectors assume a role that can only <code>aps:RemoteWrite<\/code> to the target workspace.<\/li>\n<li><strong>Query<\/strong>: Grafana assumes a role with query permissions.<\/li>\n<li>Prefer <strong>separate roles<\/strong> for ingestion vs query.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Encryption<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>In transit<\/strong>: HTTPS\/TLS to ingestion and query endpoints.<\/li>\n<li><strong>At rest<\/strong>: AWS-managed encryption for stored metrics.<\/li>\n<li>If you require KMS customer-managed keys or specific key policies, verify current support in the service docs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network exposure<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If using public endpoints, ensure:<\/li>\n<li>Outbound egress is controlled (NAT, egress firewalls).<\/li>\n<li>IAM policies limit access even if endpoints are reachable.<\/li>\n<li>If private connectivity (PrivateLink\/interface endpoints) is supported and required, implement it and restrict security groups accordingly (verify endpoint support and names).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secrets handling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Avoid static AWS access keys on instances or in Kubernetes secrets.<\/li>\n<li>Use:<\/li>\n<li>IRSA (EKS)<\/li>\n<li>EC2 instance roles<\/li>\n<li>ECS task roles<\/li>\n<li>If using a proxy, 
ensure it sources credentials from the role (IMDSv2) rather than static environment variables.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Audit\/logging<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enable and review <strong>CloudTrail<\/strong> for workspace management actions.<\/li>\n<li>For collector-side troubleshooting:<\/li>\n<li>Capture Prometheus\/collector logs securely (CloudWatch Logs with retention).<\/li>\n<li>Consider SIEM ingestion for CloudTrail events if required.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compliance considerations<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Metrics can contain sensitive information via labels (service names, internal hostnames, tenant identifiers). Treat metrics as potentially sensitive data:\n&#8211; Apply data classification tagging.\n&#8211; Limit who can query raw labels and series.\n&#8211; Avoid embedding PII in labels.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Common security mistakes<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Granting <code>aps:*<\/code> on <code>*<\/code> to many roles.<\/li>\n<li>Using one shared workspace for all environments and teams.<\/li>\n<li>Storing AWS long-lived access keys in Prometheus config files.<\/li>\n<li>Exposing Grafana publicly without strong authentication.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secure deployment recommendations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use dedicated ingestion roles per cluster\/environment.<\/li>\n<li>Use private networking where required.<\/li>\n<li>Enforce label policies and drop sensitive labels at the collector.<\/li>\n<li>Use AWS Organizations guardrails for workspace creation and tagging.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">13. Limitations and Gotchas<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Always validate the latest limits in official documentation. 
Common real-world gotchas include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>You still run scrapers\/collectors<\/strong>: AMP does not scrape your targets; you must operate the collection layer.<\/li>\n<li><strong>Cardinality can explode quickly<\/strong>: Kubernetes labels, pod churn, and \u201chelpful\u201d labels can create massive series counts.<\/li>\n<li><strong>Remote write failure modes<\/strong>:<\/li>\n<li>Network\/IAM issues can cause WAL growth and disk pressure on collectors.<\/li>\n<li><strong>Query behavior differs from local Prometheus<\/strong>:<\/li>\n<li>Latency and supported API nuances can differ; verify supported endpoints.<\/li>\n<li><strong>Region scoping<\/strong>:<\/li>\n<li>Workspaces are regional; cross-region dashboards require explicit configuration and may incur data transfer.<\/li>\n<li><strong>Private networking support varies<\/strong>:<\/li>\n<li>If you require VPC endpoints\/PrivateLink, verify support in your Region and plan early.<\/li>\n<li><strong>Pricing surprises<\/strong>:<\/li>\n<li>Ingestion volume and high-cardinality labels are the most common reasons for unexpected costs.<\/li>\n<li><strong>Migration challenges<\/strong>:<\/li>\n<li>Migrating dashboards is usually easy (PromQL), but migrating alerting\/rules depends on your current tooling and what managed features you use (verify rule support).<\/li>\n<li><strong>Multi-tenant governance<\/strong>:<\/li>\n<li>Without strict workspace boundaries, teams can inadvertently impact each other with expensive queries or noisy ingestion.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">14. 
Comparison with Alternatives<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Amazon Managed Service for Prometheus is one option in a larger observability ecosystem.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Comparison table<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Option<\/th>\n<th>Best For<\/th>\n<th>Strengths<\/th>\n<th>Weaknesses<\/th>\n<th>When to Choose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Amazon Managed Service for Prometheus<\/strong><\/td>\n<td>Prometheus-compatible managed metrics backend in AWS<\/td>\n<td>PromQL compatibility, IAM auth, managed storage\/query, good fit with EKS + Grafana<\/td>\n<td>Must run collectors; cost\/cardinality management required; regional scope<\/td>\n<td>You want managed Prometheus backend with AWS governance<\/td>\n<\/tr>\n<tr>\n<td><strong>Amazon CloudWatch Metrics<\/strong><\/td>\n<td>AWS-native metrics for AWS services and apps<\/td>\n<td>Tight AWS integration, alarms, dashboards, many managed metrics sources<\/td>\n<td>PromQL not native; custom metrics and cardinality can be costly; different data model<\/td>\n<td>You are primarily monitoring AWS services and want CloudWatch alarms<\/td>\n<\/tr>\n<tr>\n<td><strong>Self-managed Prometheus (EC2\/EKS)<\/strong><\/td>\n<td>Full control, small setups, special requirements<\/td>\n<td>Full OSS feature set, local low-latency queries<\/td>\n<td>Ops burden: HA, scaling, storage, upgrades<\/td>\n<td>Small environments or when you need total control<\/td>\n<\/tr>\n<tr>\n<td><strong>Thanos\/Cortex\/Mimir (self-managed)<\/strong><\/td>\n<td>Large-scale Prometheus at enterprise scale<\/td>\n<td>Long-term storage, HA, global querying<\/td>\n<td>Significant operational complexity<\/td>\n<td>You have platform maturity and want OSS control at scale<\/td>\n<\/tr>\n<tr>\n<td><strong>Amazon Managed Grafana<\/strong><\/td>\n<td>Visualization and alerting UI<\/td>\n<td>Managed Grafana, integrates with AMP and other sources<\/td>\n<td>Not a metrics 
store; separate cost<\/td>\n<td>Pair with AMP for dashboards; or unify many datasources<\/td>\n<\/tr>\n<tr>\n<td><strong>Grafana Cloud \/ Datadog \/ New Relic<\/strong><\/td>\n<td>Fully managed observability across metrics\/logs\/traces<\/td>\n<td>End-to-end SaaS, strong UX, less DIY<\/td>\n<td>Vendor cost model; data egress; lock-in<\/td>\n<td>You want a SaaS observability suite rather than AWS-native<\/td>\n<\/tr>\n<tr>\n<td><strong>Google Cloud Managed Service for Prometheus \/ Azure Managed Prometheus<\/strong><\/td>\n<td>Managed Prometheus on other clouds<\/td>\n<td>Cloud-native integrations in their ecosystems<\/td>\n<td>Cross-cloud complexity if primary workloads are on AWS<\/td>\n<td>If your workloads primarily run on those clouds<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">15. Real-World Example<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Enterprise example: regulated multi-account Kubernetes platform<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: A bank runs dozens of EKS clusters across multiple AWS accounts. Each cluster had its own Prometheus with limited retention and inconsistent access. 
Auditors require strict access controls and traceability.<\/li>\n<li><strong>Proposed architecture<\/strong>:<\/li>\n<li>One observability account per environment tier.<\/li>\n<li>Amazon Managed Service for Prometheus workspaces: <code>prod<\/code>, <code>nonprod<\/code>.<\/li>\n<li>EKS collectors (Prometheus Operator or ADOT) remote_write to the appropriate workspace using IRSA.<\/li>\n<li>Amazon Managed Grafana in a shared services account, using IAM Identity Center and role assumption to query prod\/nonprod workspaces.<\/li>\n<li>Strict IAM policies: ingestion roles cannot query; query roles cannot write.<\/li>\n<li><strong>Why this service was chosen<\/strong>:<\/li>\n<li>Prometheus compatibility retained.<\/li>\n<li>Managed backend reduces operational risk and change management burden.<\/li>\n<li>IAM-based access supports least privilege and audit.<\/li>\n<li><strong>Expected outcomes<\/strong>:<\/li>\n<li>Standardized dashboards and SLO monitoring across clusters.<\/li>\n<li>Reduced outage risk from self-managed storage incidents.<\/li>\n<li>Improved compliance posture through centralized governance and auditing.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Startup\/small-team example: single EKS cluster with growth expectations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: A startup has one EKS cluster now, but expects rapid growth. 
They want Prometheus metrics for app and Kubernetes, but don\u2019t want to operate HA Prometheus + long-term storage.<\/li>\n<li><strong>Proposed architecture<\/strong>:<\/li>\n<li>One AMP workspace for <code>prod<\/code>.<\/li>\n<li>A lightweight in-cluster collector scraping kube-state-metrics and app <code>\/metrics<\/code>.<\/li>\n<li>Grafana (Amazon Managed Grafana or small self-managed Grafana) querying AMP.<\/li>\n<li><strong>Why this service was chosen<\/strong>:<\/li>\n<li>Minimal operations: no long-term TSDB scaling.<\/li>\n<li>PromQL dashboards remain portable.<\/li>\n<li><strong>Expected outcomes<\/strong>:<\/li>\n<li>Faster onboarding of new services with standard metrics patterns.<\/li>\n<li>Predictable operations with a managed backend; focus stays on product.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">16. FAQ<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p><strong>Is Amazon Managed Service for Prometheus the same as Prometheus?<\/strong><br\/>\n   It is a managed AWS service that is <strong>compatible with Prometheus ingestion (remote_write) and querying (PromQL)<\/strong>. You still run Prometheus scrapers\/collectors to collect metrics.<\/p>\n<\/li>\n<li>\n<p><strong>Do I still need to run Prometheus servers?<\/strong><br\/>\n   You need a component that scrapes targets and remote_writes metrics to the managed workspace (Prometheus, agent, or an OpenTelemetry Collector configured for Prometheus scraping).<\/p>\n<\/li>\n<li>\n<p><strong>Does it support PromQL?<\/strong><br\/>\n   Yes\u2014querying is PromQL-based through Prometheus-compatible query APIs.<\/p>\n<\/li>\n<li>\n<p><strong>How do I visualize metrics?<\/strong><br\/>\n   Commonly with Grafana, including <strong>Amazon Managed Grafana<\/strong>, configured to query the workspace.<\/p>\n<\/li>\n<li>\n<p><strong>How is access controlled?<\/strong><br\/>\n   With <strong>AWS IAM<\/strong>. 
Writers and readers are authorized via IAM policies and SigV4 request signing.<\/p>\n<\/li>\n<li>\n<p><strong>Can I isolate teams\/environments?<\/strong><br\/>\n   Yes\u2014use <strong>separate workspaces<\/strong> and IAM boundaries. This is a common best practice.<\/p>\n<\/li>\n<li>\n<p><strong>Is it regional or global?<\/strong><br\/>\n   Workspaces are <strong>regional<\/strong> resources. Plan multi-region observability intentionally.<\/p>\n<\/li>\n<li>\n<p><strong>What are the main cost drivers?<\/strong><br\/>\n   Ingestion volume (samples), storage footprint, query usage, and cardinality (number of unique time series).<\/p>\n<\/li>\n<li>\n<p><strong>What is \u201ccardinality\u201d and why does it matter?<\/strong><br\/>\n   Cardinality is the number of unique label combinations (time series). High cardinality increases ingestion, storage, and query costs and can degrade performance.<\/p>\n<\/li>\n<li>\n<p><strong>Can I send metrics from on-prem or another cloud?<\/strong><br\/>\n   Yes, if your collector can reach the workspace endpoints and authenticate with IAM. Many organizations do this over VPN\/Direct Connect and controlled egress.<\/p>\n<\/li>\n<li>\n<p><strong>Does AMP scrape targets for me?<\/strong><br\/>\n   In most setups, no: you provide the scraping\/collection layer. For Amazon EKS, AWS also offers an AWS managed collector (an agentless scraper) that can scrape cluster targets for you; verify current availability and limits in the official docs.<\/p>\n<\/li>\n<li>\n<p><strong>How do I handle remote_write outages?<\/strong><br\/>\n   Monitor remote_write success\/failure metrics on the collector, ensure adequate disk for WAL buffering, and design for retries\/backpressure.<\/p>\n<\/li>\n<li>\n<p><strong>Can I use Kubernetes IRSA with it?<\/strong><br\/>\n   Yes\u2014this is a recommended pattern: pods assume IAM roles to write\/query without static credentials.<\/p>\n<\/li>\n<li>\n<p><strong>Can I migrate from self-managed Prometheus without changing dashboards?<\/strong><br\/>\n   Often yes, because PromQL remains the query language. 
Some differences may exist in API compatibility or latency; validate with a pilot.<\/p>\n<\/li>\n<li>\n<p><strong>Should I use CloudWatch Metrics instead?<\/strong><br\/>\n   If your needs are mostly AWS service metrics, CloudWatch alarms, and simple dashboards, CloudWatch may be simpler. If you need Prometheus exporters\/PromQL and Kubernetes-centric observability, AMP is a strong fit.<\/p>\n<\/li>\n<li>\n<p><strong>Does it support alerts and recording rules?<\/strong><br\/>\n   Yes. Workspaces support Prometheus-compatible recording rules and alerting rules, along with a managed alert manager for routing notifications. Check the official docs for current limits and availability in your region, and the pricing page for any related charges.<\/p>\n<\/li>\n<li>\n<p><strong>How do I keep costs under control?<\/strong><br\/>\n   Start by controlling label cardinality, dropping unused metrics, tuning scrape intervals, and separating dev\/test from prod workspaces.<\/p>\n<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">17. Top Online Resources to Learn Amazon Managed Service for Prometheus<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Resource Type<\/th>\n<th>Name<\/th>\n<th>Why It Is Useful<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Official Documentation<\/td>\n<td>Amazon Managed Service for Prometheus User Guide \u2014 https:\/\/docs.aws.amazon.com\/prometheus\/latest\/userguide\/what-is-Amazon-Managed-Service-Prometheus.html<\/td>\n<td>Primary source for concepts, endpoints, IAM, and supported workflows<\/td>\n<\/tr>\n<tr>\n<td>Official Pricing<\/td>\n<td>Pricing page \u2014 https:\/\/aws.amazon.com\/prometheus\/pricing\/<\/td>\n<td>Current pricing dimensions by Region<\/td>\n<\/tr>\n<tr>\n<td>Pricing Tool<\/td>\n<td>AWS Pricing Calculator \u2014 https:\/\/calculator.aws\/#\/<\/td>\n<td>Build scenario-based estimates without guessing<\/td>\n<\/tr>\n<tr>\n<td>Getting Started<\/td>\n<td>Ingest and query metrics (AMP docs) \u2014 https:\/\/docs.aws.amazon.com\/prometheus\/latest\/userguide\/ingest-metrics.html<\/td>\n<td>Step-by-step 
ingestion patterns and supported clients<\/td>\n<\/tr>\n<tr>\n<td>Visualization<\/td>\n<td>Amazon Managed Grafana \u2014 https:\/\/docs.aws.amazon.com\/grafana\/latest\/userguide\/what-is-Amazon-Managed-Service-Grafana.html<\/td>\n<td>Common pairing for dashboards and alerting<\/td>\n<\/tr>\n<tr>\n<td>Observability Workshop<\/td>\n<td>AWS Observability Workshop \u2014 https:\/\/observability.workshop.aws\/<\/td>\n<td>Hands-on labs for metrics\/traces\/logs including Prometheus\/Grafana patterns<\/td>\n<\/tr>\n<tr>\n<td>GitHub (tooling)<\/td>\n<td>aws-sigv4-proxy \u2014 https:\/\/github.com\/awslabs\/aws-sigv4-proxy<\/td>\n<td>Common SigV4 signing proxy used in AMP integrations<\/td>\n<\/tr>\n<tr>\n<td>GitHub (collector)<\/td>\n<td>AWS OTel Collector \u2014 https:\/\/github.com\/aws-observability\/aws-otel-collector<\/td>\n<td>OpenTelemetry collector distribution widely used on AWS<\/td>\n<\/tr>\n<tr>\n<td>Reference Architecture<\/td>\n<td>AWS Architecture Center \u2014 https:\/\/aws.amazon.com\/architecture\/<\/td>\n<td>Patterns for multi-account, monitoring, and governance (search for Prometheus\/Grafana)<\/td>\n<\/tr>\n<tr>\n<td>Videos<\/td>\n<td>AWS YouTube channel \u2014 https:\/\/www.youtube.com\/user\/AmazonWebServices<\/td>\n<td>Search for \u201cAmazon Managed Service for Prometheus\u201d sessions and demos<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">18. Training and Certification Providers<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The following training providers may offer courses related to AWS observability, Prometheus, and platform engineering. 
Verify current course catalogs directly on their websites.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Institute<\/th>\n<th>Suitable Audience<\/th>\n<th>Likely Learning Focus<\/th>\n<th>Mode<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps engineers, SREs, platform teams<\/td>\n<td>Prometheus\/Grafana, AWS monitoring, DevOps practices<\/td>\n<td>check website<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>ScmGalaxy.com<\/td>\n<td>Beginners to intermediate engineers<\/td>\n<td>DevOps fundamentals, tools, CI\/CD, monitoring basics<\/td>\n<td>check website<\/td>\n<td>https:\/\/www.scmgalaxy.com\/<\/td>\n<\/tr>\n<tr>\n<td>CloudOpsNow.in<\/td>\n<td>Cloud\/ops practitioners<\/td>\n<td>Cloud operations, monitoring\/observability, AWS operations<\/td>\n<td>check website<\/td>\n<td>https:\/\/www.cloudopsnow.in\/<\/td>\n<\/tr>\n<tr>\n<td>SreSchool.com<\/td>\n<td>SREs and operations teams<\/td>\n<td>SRE practices, SLOs\/SLIs, reliability engineering<\/td>\n<td>check website<\/td>\n<td>https:\/\/www.sreschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>AiOpsSchool.com<\/td>\n<td>Ops + automation teams<\/td>\n<td>AIOps concepts, monitoring automation, incident workflows<\/td>\n<td>check website<\/td>\n<td>https:\/\/www.aiopsschool.com\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">19. Top Trainers<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">These sites are presented as trainer platforms\/resources. 
Verify offerings, credentials, and availability directly.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Platform\/Site<\/th>\n<th>Likely Specialization<\/th>\n<th>Suitable Audience<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>RajeshKumar.xyz<\/td>\n<td>DevOps\/cloud coaching and training resources<\/td>\n<td>Engineers seeking guided learning<\/td>\n<td>https:\/\/rajeshkumar.xyz\/<\/td>\n<\/tr>\n<tr>\n<td>devopstrainer.in<\/td>\n<td>DevOps tools and practices<\/td>\n<td>Beginners to intermediate DevOps practitioners<\/td>\n<td>https:\/\/www.devopstrainer.in\/<\/td>\n<\/tr>\n<tr>\n<td>devopsfreelancer.com<\/td>\n<td>Freelance DevOps help\/training resources<\/td>\n<td>Teams needing short-term expertise<\/td>\n<td>https:\/\/www.devopsfreelancer.com\/<\/td>\n<\/tr>\n<tr>\n<td>devopssupport.in<\/td>\n<td>DevOps support and training resources<\/td>\n<td>Ops teams needing hands-on help<\/td>\n<td>https:\/\/www.devopssupport.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">20. Top Consulting Companies<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">These organizations may provide consulting around DevOps, cloud operations, and observability. 
Confirm service details directly with each company.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Company Name<\/th>\n<th>Likely Service Area<\/th>\n<th>Where They May Help<\/th>\n<th>Consulting Use Case Examples<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>cotocus.com<\/td>\n<td>Cloud\/DevOps engineering services<\/td>\n<td>Platform engineering, observability rollouts<\/td>\n<td>Multi-account observability design; AMP + Grafana implementations<\/td>\n<td>https:\/\/cotocus.com\/<\/td>\n<\/tr>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps consulting and enablement<\/td>\n<td>Training + implementation support<\/td>\n<td>Prometheus\/Grafana adoption; CI\/CD + monitoring standardization<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>DEVOPSCONSULTING.IN<\/td>\n<td>DevOps consulting services<\/td>\n<td>Assessments, implementations, operations support<\/td>\n<td>Migration from self-managed Prometheus; cost\/cardinality governance<\/td>\n<td>https:\/\/www.devopsconsulting.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">21. 
Career and Learning Roadmap<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn before this service<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prometheus fundamentals:<\/li>\n<li>Metrics vs logs vs traces<\/li>\n<li>Prometheus data model (labels, time series)<\/li>\n<li>Scraping and exporters<\/li>\n<li>PromQL basics<\/li>\n<li>AWS fundamentals:<\/li>\n<li>IAM roles and policies<\/li>\n<li>VPC basics (subnets, routing, NAT\/IGW)<\/li>\n<li>CloudWatch basics (for broader AWS operations)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn after this service<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Advanced PromQL and dashboard design<\/li>\n<li>Recording rules and alerting strategies (and whether they run in Grafana, Prometheus, or managed services)<\/li>\n<li>OpenTelemetry pipelines for unified metrics\/logs\/traces<\/li>\n<li>Multi-account governance:<\/li>\n<li>AWS Organizations, SCPs, centralized logging, identity federation<\/li>\n<li>Cost optimization for observability data (cardinality management, sampling strategies where relevant)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Job roles that use it<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Site Reliability Engineer (SRE)<\/li>\n<li>DevOps Engineer<\/li>\n<li>Platform Engineer<\/li>\n<li>Cloud Operations Engineer<\/li>\n<li>Observability Engineer<\/li>\n<li>Solutions Architect (designing monitoring platforms)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certification path (AWS)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">There is no dedicated \u201cAmazon Managed Service for Prometheus\u201d certification. 
Relevant AWS certifications typically include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>AWS Certified SysOps Administrator \u2013 Associate<\/li>\n<li>AWS Certified DevOps Engineer \u2013 Professional<\/li>\n<li>AWS Certified Solutions Architect \u2013 Associate\/Professional<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">(Choose based on your role; verify the latest AWS certification tracks on https:\/\/aws.amazon.com\/certification\/)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Project ideas for practice<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>EKS observability baseline<\/strong>: kube-state-metrics + node-exporter + app metrics \u2192 AMP \u2192 Grafana dashboards.<\/li>\n<li><strong>Cardinality governance<\/strong>: implement metric relabeling rules and measure ingestion reduction.<\/li>\n<li><strong>Multi-account setup<\/strong>: one observability account with AMP workspaces, multiple workload accounts writing via role assumption.<\/li>\n<li><strong>Hybrid metrics<\/strong>: on-prem Prometheus remote_write to AMP over VPN, with strict IAM controls.<\/li>\n<li><strong>SLO dashboards<\/strong>: implement SLIs in PromQL and visualize error budgets.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">22. 
Glossary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AMP<\/strong>: Common shorthand for Amazon Managed Service for Prometheus.<\/li>\n<li><strong>Prometheus<\/strong>: Open-source monitoring system and time-series database.<\/li>\n<li><strong>PromQL<\/strong>: Prometheus Query Language used to query time-series metrics.<\/li>\n<li><strong>Exporter<\/strong>: A service that exposes metrics in Prometheus format (e.g., Node Exporter).<\/li>\n<li><strong>Scrape<\/strong>: Prometheus pulling metrics from a <code>\/metrics<\/code> endpoint on a schedule.<\/li>\n<li><strong>remote_write<\/strong>: Prometheus feature to push samples to a remote storage backend.<\/li>\n<li><strong>Time series<\/strong>: A sequence of timestamped metric samples identified by a metric name and labels.<\/li>\n<li><strong>Labels<\/strong>: Key\/value pairs that identify a time series (e.g., <code>service=\"checkout\"<\/code>).<\/li>\n<li><strong>Cardinality<\/strong>: The number of unique label combinations; drives the number of time series.<\/li>\n<li><strong>Workspace<\/strong>: Logical container in Amazon Managed Service for Prometheus that holds metrics and provides endpoints.<\/li>\n<li><strong>SigV4<\/strong>: AWS Signature Version 4 signing process for authenticating API requests.<\/li>\n<li><strong>IRSA<\/strong>: IAM Roles for Service Accounts (EKS feature) for assigning IAM roles to pods.<\/li>\n<li><strong>WAL<\/strong>: Write-Ahead Log used by Prometheus to buffer samples, especially when remote_write is enabled.<\/li>\n<li><strong>SSM Session Manager<\/strong>: AWS Systems Manager feature for secure shell-like access without inbound ports.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">23. Summary<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Amazon Managed Service for Prometheus is AWS\u2019s managed, Prometheus-compatible metrics backend in the <strong>Management and governance<\/strong> category. 
It provides workspaces with ingestion (remote_write) and query (PromQL) endpoints, governed by IAM, enabling teams to centralize metrics storage and querying without operating large Prometheus storage systems.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">It matters because Prometheus is widely adopted, but running it reliably at scale (HA, retention, storage growth, multi-tenant governance) is non-trivial. Amazon Managed Service for Prometheus fits well when you want to keep Prometheus exporters and PromQL while adopting AWS-native identity, access control, and managed operations.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Key cost and security takeaways:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Costs are driven by ingestion volume, storage footprint, query usage, and especially label cardinality; manage cardinality early.<\/li>\n<li>Secure access with least-privilege IAM roles (separate writers and readers), avoid static credentials, and consider private networking where required.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Use it when you want a managed Prometheus backend for Kubernetes and cloud-native workloads, especially at multi-cluster or multi-team scale. 
Next step: pair it with a visualization layer (often Amazon Managed Grafana) and standardize collector configurations and label governance across your environments.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Management and governance<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20,33],"tags":[],"class_list":["post-275","post","type-post","status-publish","format-standard","hentry","category-aws","category-management-and-governance"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/275","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/comments?post=275"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/275\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/media?parent=275"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/categories?post=275"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/tags?post=275"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}