{"id":628,"date":"2026-04-14T19:34:21","date_gmt":"2026-04-14T19:34:21","guid":{"rendered":"https:\/\/www.devopsschool.com\/tutorials\/google-cloud-container-optimized-os-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-compute\/"},"modified":"2026-04-14T19:34:21","modified_gmt":"2026-04-14T19:34:21","slug":"google-cloud-container-optimized-os-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-compute","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/tutorials\/google-cloud-container-optimized-os-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-compute\/","title":{"rendered":"Google Cloud Container-Optimized OS Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Compute"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Category<\/h2>\n\n\n\n<p>Compute<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Introduction<\/h2>\n\n\n\n<p><strong>What this service is<\/strong><br\/>\nContainer-Optimized OS is a Google-managed operating system image for Google Cloud Compute Engine virtual machines (VMs) that is specifically designed to run containers securely and efficiently.<\/p>\n\n\n\n<p><strong>Simple explanation (one paragraph)<\/strong><br\/>\nIf you want to run containers on a VM in Google Cloud without managing a general-purpose Linux distribution (packages, frequent configuration drift, large attack surface), Container-Optimized OS gives you a minimal OS that boots fast, stays locked down, and is tuned for container workloads.<\/p>\n\n\n\n<p><strong>Technical explanation (one paragraph)<\/strong><br\/>\nContainer-Optimized OS (often abbreviated as COS) is a hardened, minimal OS image maintained by Google, based on Chromium OS concepts (immutable \/ read-only root filesystem, verified boot design patterns, and automatic updates). 
It\u2019s intended to be used as the host OS for container runtimes (commonly <code>containerd<\/code>, and in some contexts Docker compatibility\u2014verify current runtime options in the official docs). COS integrates naturally with Compute Engine features (instance metadata, Managed Instance Groups, load balancing, service accounts, VPC networking) and is also a common node image choice for Google Kubernetes Engine (GKE) (for example, COS variants used with <code>containerd<\/code>).<\/p>\n\n\n\n<p><strong>What problem it solves<\/strong><br\/>\nIt solves the \u201cVM host management tax\u201d for container hosting: OS patching risk, drift across fleets, oversized base images, inconsistent security baselines, and operational toil when you only need a stable host to run containers.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">2. What is Container-Optimized OS?<\/h2>\n\n\n\n<p><strong>Official purpose<\/strong><br\/>\nContainer-Optimized OS is designed by Google to provide a secure, efficient, and maintainable host environment for running containers on Compute Engine.<\/p>\n\n\n\n<p><strong>Core capabilities<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Run containerized workloads on Compute Engine VMs with a minimal host OS footprint.<\/li>\n<li>Reduce host attack surface compared to a general-purpose Linux OS.<\/li>\n<li>Provide automated updates and a consistent base image across fleets.<\/li>\n<li>Support container-focused deployment patterns (for example, \u201crun a container as the VM workload\u201d via Compute Engine\u2019s container-on-VM workflows).<\/li>\n<\/ul>\n\n\n\n<p><strong>Major components (conceptual)<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Minimal OS userland<\/strong>: fewer packages\/tools than a general-purpose distro.<\/li>\n<li><strong>Hardened\/immutable design<\/strong>: read-only root filesystem patterns help reduce drift and persistence of unwanted changes.<\/li>\n<li><strong>Container runtime support<\/strong>: commonly <code>containerd<\/code> (and sometimes Docker-related tooling 
depending on the image family and use case\u2014verify current details).<\/li>\n<li><strong>Update system<\/strong>: designed for automated, reliable OS updates.<\/li>\n<li><strong>Compute Engine integration points<\/strong>: instance metadata, startup configuration patterns, logging\/monitoring integration paths, and compatibility with fleet constructs like Managed Instance Groups (MIGs).<\/li>\n<\/ul>\n\n\n\n<p><strong>Service type<\/strong><br\/>\nContainer-Optimized OS is an <strong>operating system image<\/strong> provided by Google Cloud for <strong>Compute Engine<\/strong>. It is not a separate hosted \u201cservice\u201d with its own control plane; you select it as the boot disk image for VMs (or implicitly via workflows that create COS-based instances).<\/p>\n\n\n\n<p><strong>Scope (how it\u2019s \u201cscoped\u201d in Google Cloud)<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Image availability<\/strong>: COS images are published by Google in the public <code>cos-cloud<\/code> image project and are accessible within projects when you create Compute Engine instances (subject to permissions).<\/li>\n<li><strong>Compute Engine resources<\/strong>: VMs are <strong>zonal<\/strong> resources; Managed Instance Groups can be <strong>zonal<\/strong> or <strong>regional<\/strong>; load balancers are <strong>global<\/strong> or <strong>regional<\/strong> depending on type.<\/li>\n<li><strong>Operational scope<\/strong>: you manage COS usage per project\/VPC\/instance template just like other Compute Engine images.<\/li>\n<\/ul>\n\n\n\n<p><strong>How it fits into the Google Cloud ecosystem<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Compute Engine<\/strong>: the primary place you use COS (single instances, MIGs, container-on-VM patterns).<\/li>\n<li><strong>GKE<\/strong>: COS is widely used as a node OS option (GKE manages nodes; you choose the node image type).<\/li>\n<li><strong>Artifact Registry<\/strong>: store container images securely and pull them into the container runtime on COS hosts.<\/li>\n<li><strong>Cloud Logging\/Monitoring<\/strong>: standard observability stack for VM and workload 
telemetry (implementation details depend on your chosen agents\/approach; verify COS support for specific agents).<\/li>\n<li><strong>VPC + Cloud Load Balancing + Cloud Armor<\/strong>: front and protect COS-based workloads.<\/li>\n<li><strong>IAM + Service Accounts<\/strong>: authorize workloads to call Google APIs without embedding credentials.<\/li>\n<\/ul>\n\n\n\n<p><strong>Service name status<\/strong><br\/>\nAs of the latest generally available Google Cloud documentation, the product is still called <strong>Container-Optimized OS<\/strong>. (If you are using it via GKE node images, you may see COS variants referenced by image type names; verify the current image type labels in GKE docs.)<\/p>\n\n\n\n<p>Official docs entry point: https:\/\/cloud.google.com\/container-optimized-os\/docs<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">3. Why use Container-Optimized OS?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Business reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Lower operational overhead<\/strong>: fewer OS-level tickets (patching cadence, baseline hardening, drift remediation) when your real product is the container workload.<\/li>\n<li><strong>Standardization<\/strong>: a consistent host OS across dev\/test\/prod and across teams reduces \u201csnowflake VM\u201d risk.<\/li>\n<li><strong>Faster time to production<\/strong>: fewer decisions about OS packages and configuration; focus on image build + deployment.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Technical reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Optimized for containers<\/strong>: COS is built for the \u201ccontainer is the unit of deployment\u201d model.<\/li>\n<li><strong>Reduced footprint<\/strong>: smaller OS surface area than a typical general-purpose distro.<\/li>\n<li><strong>Immutability patterns<\/strong>: a read-only root filesystem approach discourages ad-hoc changes on the host.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational 
reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Fleet-friendly<\/strong>: works well with instance templates and Managed Instance Groups; replace instances rather than repair them.<\/li>\n<li><strong>Predictable updates<\/strong>: designed to be updated regularly in a controlled way (pin image versions when necessary, or use channels; verify exact mechanics in docs).<\/li>\n<li><strong>Faster boot and simpler host<\/strong>: in many environments, COS boots quickly and has fewer moving parts.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security \/ compliance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Smaller attack surface<\/strong>: fewer packages and services.<\/li>\n<li><strong>Hardening patterns<\/strong>: immutable root filesystem design, strong defaults, and automatic updates reduce exposure windows.<\/li>\n<li><strong>Better separation of concerns<\/strong>: app dependencies go into container images rather than the host OS.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scalability \/ performance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Works well with MIG autoscaling<\/strong>: you can scale out stateless container workloads by adding instances.<\/li>\n<li><strong>Container-centric resource usage<\/strong>: host overhead is typically smaller than full-featured distros (workload-dependent).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should choose it<\/h3>\n\n\n\n<p>Choose Container-Optimized OS when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You primarily run <strong>one or more containers as the VM workload<\/strong>.<\/li>\n<li>You want <strong>standardized, hardened hosts<\/strong> with minimal customization.<\/li>\n<li>You plan to use <strong>MIGs<\/strong> for elasticity and immutable infrastructure practices.<\/li>\n<li>You want a stepping stone between \u201cserverless\u201d and \u201cfull Kubernetes\u201d:\n<ul class=\"wp-block-list\">\n<li>more control than Cloud Run<\/li>\n<li>less platform complexity than managing 
Kubernetes for small deployments<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should not choose it<\/h3>\n\n\n\n<p>Avoid or reconsider COS when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You need <strong>extensive OS customization<\/strong>, third-party agents that require package managers, or kernel\/module tinkering.<\/li>\n<li>You rely on <strong>interactive debugging<\/strong> with the many Linux tools that general-purpose distros install by default.<\/li>\n<li>Your workload expects a <strong>general-purpose VM<\/strong> environment (custom services, cron-heavy hosts, configuration management tools that assume a writable root filesystem).<\/li>\n<li>You want a managed container platform (consider <strong>Cloud Run<\/strong> or <strong>GKE Autopilot<\/strong>).<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">4. Where is Container-Optimized OS used?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Industries<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>SaaS and web<\/strong>: standardized container fleets behind load balancers.<\/li>\n<li><strong>Fintech and regulated industries<\/strong>: hardened baseline + controlled patching (always validate compliance needs against official attestations; COS itself isn\u2019t automatically a compliance certification).<\/li>\n<li><strong>Gaming and media<\/strong>: burstable stateless services or edge-like services on VM fleets.<\/li>\n<li><strong>Data platforms<\/strong>: containerized sidecars, lightweight services, ingestion endpoints (not the place to run full data stacks unless designed carefully).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Team types<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Platform engineering teams building VM-based container platforms.<\/li>\n<li>DevOps\/SRE teams maintaining fleets of stateless services.<\/li>\n<li>Security teams standardizing hardened VM images.<\/li>\n<li>App teams that want containers on VMs without adopting Kubernetes immediately.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Workloads<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>HTTP APIs and web front ends (Nginx, Envoy, app services).<\/li>\n<li>Background workers \/ job processors (pull from Pub\/Sub, process tasks).<\/li>\n<li>Proxies, gateways, and lightweight network appliances packaged as containers.<\/li>\n<li>Build runners or CI agents packaged in containers (be careful with privilege needs).<\/li>\n<li>Internal tools that don\u2019t justify Kubernetes overhead.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Architectures<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Single VM running a container with a public IP (small dev\/test).<\/li>\n<li>MIG of COS instances pulling images from Artifact Registry, fronted by Cloud Load Balancing.<\/li>\n<li>Blue\/green or canary using multiple MIGs or rolling updates of instance templates.<\/li>\n<li>Hybrid patterns: COS VMs for specific components; GKE for the rest.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Production vs dev\/test usage<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Dev\/test<\/strong>: quick, low-maintenance way to run containers on VMs; useful for validation and demos.<\/li>\n<li><strong>Production<\/strong>: common when you want VM-level control (custom networking, instance types, GPUs, specialized disks) but still want container immutability and a hardened host.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">5. 
Top Use Cases and Scenarios<\/h2>\n\n\n\n<p>Below are realistic scenarios where Container-Optimized OS on Google Cloud Compute Engine is a strong fit.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Single-container web service on a VM (simple hosting)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: You need to host a small web service quickly, but don\u2019t want to maintain Ubuntu patching and packages.<\/li>\n<li><strong>Why COS fits<\/strong>: Minimal host; run your container as the primary workload.<\/li>\n<li><strong>Example<\/strong>: A small internal dashboard served by <code>nginx<\/code> + a backend container on one VM for a dev environment.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Stateless API fleet with Managed Instance Groups<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Scale an API horizontally with predictable, repeatable hosts.<\/li>\n<li><strong>Why COS fits<\/strong>: Great with instance templates + MIG; immutable rollout by replacing instances.<\/li>\n<li><strong>Example<\/strong>: A regional MIG of COS instances runs <code>my-api:1.2.3<\/code>, autoscaled by CPU, fronted by an external HTTP(S) load balancer.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Edge proxy \/ gateway layer<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: You need high-performance L7 proxying with strong OS hardening.<\/li>\n<li><strong>Why COS fits<\/strong>: Minimal OS + containerized proxy simplifies patching and upgrades.<\/li>\n<li><strong>Example<\/strong>: Envoy containers in a MIG terminate mTLS and route traffic to internal services.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Batch\/worker nodes pulling tasks from Pub\/Sub<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Worker processes must scale up and down quickly and remain consistent.<\/li>\n<li><strong>Why COS fits<\/strong>: Fast to boot and easy to \u201creplace 
instead of fix.\u201d<\/li>\n<li><strong>Example<\/strong>: A MIG of worker VMs runs a container that pulls jobs from Pub\/Sub and writes results to Cloud Storage.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) Secure \u201cjump workload\u201d containers (not jump hosts)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: You need controlled administrative tools without turning a VM into a long-lived snowflake.<\/li>\n<li><strong>Why COS fits<\/strong>: Host stays minimal; tools live in container images; access is audited via IAM and OS Login\/IAP.<\/li>\n<li><strong>Example<\/strong>: Run a locked-down container image containing database admin CLI tools and short-lived credentials.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) CI\/CD self-hosted runners packaged as containers<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: You need runners that can be replaced easily and remain clean after jobs.<\/li>\n<li><strong>Why COS fits<\/strong>: Immutable host; runners in containers; replace on compromise.<\/li>\n<li><strong>Example<\/strong>: GitHub Actions runners or GitLab runners in a MIG where each instance is recycled frequently.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Dedicated network function appliances (containerized)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: You need custom routing, NAT helpers, or observability sidecars in a controlled environment.<\/li>\n<li><strong>Why COS fits<\/strong>: Predictable baseline and fewer host services.<\/li>\n<li><strong>Example<\/strong>: A containerized forward proxy or DNS caching tier.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Pre-GKE stepping stone for teams adopting containers<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Team wants containers but isn\u2019t ready for Kubernetes complexity.<\/li>\n<li><strong>Why COS fits<\/strong>: Container workflow with VM primitives 
(firewall, load balancer) is simpler than Kubernetes.<\/li>\n<li><strong>Example<\/strong>: Two services deployed as two MIGs; rollouts via instance template version changes.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Multi-tenant internal services with strict baseline controls<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Multiple teams run services on shared platform; need consistent OS baseline.<\/li>\n<li><strong>Why COS fits<\/strong>: Reduced drift; centralized image selection; strong defaults.<\/li>\n<li><strong>Example<\/strong>: Platform team provides an opinionated COS instance template and teams provide only container image + config.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) Specialized Compute Engine shapes (high-memory, local SSD, etc.)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Your workload needs VM-specific features but you still want containers.<\/li>\n<li><strong>Why COS fits<\/strong>: You get Compute Engine flexibility with containerized apps.<\/li>\n<li><strong>Example<\/strong>: A high-memory VM runs a containerized in-memory service with persistent disks for snapshots.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">11) Blue\/green rollouts using instance template versions<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: You need controlled rollouts with easy rollback.<\/li>\n<li><strong>Why COS fits<\/strong>: New template references new container image digest; rollback is simply switching MIG template.<\/li>\n<li><strong>Example<\/strong>: Two MIGs (blue and green) behind a load balancer; shift traffic gradually.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">12) Hardened internal developer preview environments<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: You want short-lived preview environments without long-term host maintenance.<\/li>\n<li><strong>Why COS fits<\/strong>: Easy to create and 
delete; predictable baseline.<\/li>\n<li><strong>Example<\/strong>: Per-branch preview service runs in a COS VM for a few hours, then deleted.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">6. Core Features<\/h2>\n\n\n\n<blockquote>\n<p>Note: Some implementation details (exact runtime, channels, update controls, logging agent support) can change over time. Where appropriate, this section calls out what to verify in official docs.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">6.1 Minimal, container-focused OS image<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Provides a slim host OS designed primarily to run containers.<\/li>\n<li><strong>Why it matters<\/strong>: Fewer packages and services typically reduce attack surface and patching scope.<\/li>\n<li><strong>Practical benefit<\/strong>: Less OS maintenance; smaller baseline to secure.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Not suited for workloads that assume a full Linux distro with package manager-based customization.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.2 Read-only \/ immutable root filesystem patterns<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Uses an immutable-style root filesystem (read-only root) design approach.<\/li>\n<li><strong>Why it matters<\/strong>: Reduces configuration drift and persistence of unauthorized host changes.<\/li>\n<li><strong>Practical benefit<\/strong>: Encourages immutable infrastructure (replace rather than patch-in-place).<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Installing host packages or modifying system files is intentionally constrained; you must plan for debugging and customization differently.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.3 Automatic updates (designed for consistent patching)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: COS is designed to receive updates from Google to address security and 
stability issues.<\/li>\n<li><strong>Why it matters<\/strong>: Shortens the exposure window for vulnerabilities and reduces manual patch operations.<\/li>\n<li><strong>Practical benefit<\/strong>: Better baseline hygiene across fleets.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Updates can require reboots; for production, use MIG rolling updates and capacity planning. Verify current controls for update strategy in official docs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.4 Support for container runtimes and OCI images<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Runs standard container images (OCI\/Docker image format).<\/li>\n<li><strong>Why it matters<\/strong>: Your build pipeline stays standard (Cloud Build, GitHub Actions, etc.).<\/li>\n<li><strong>Practical benefit<\/strong>: Build an image once and run the same image on COS or any other OCI-compatible host; pull it from Artifact Registry.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Runtime tooling differs across images and use cases (for example, <code>containerd<\/code> vs Docker). 
Verify the current recommended runtime and tooling in COS docs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.5 Tight integration with Compute Engine primitives<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: COS is used like other Compute Engine images and works with:\n<ul class=\"wp-block-list\">\n<li>instance templates and MIGs<\/li>\n<li>VPC networks and firewall rules<\/li>\n<li>load balancing<\/li>\n<li>service accounts<\/li>\n<li>metadata and startup configuration patterns<\/li>\n<\/ul>\n<\/li>\n<li><strong>Why it matters<\/strong>: Lets you build production architectures with standard Google Cloud Compute building blocks.<\/li>\n<li><strong>Practical benefit<\/strong>: Consistent operations with the rest of Compute Engine.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Some \u201ctraditional VM administration\u201d approaches (for example, configuration management tools that write to the root filesystem) are not a great fit.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.6 \u201cRun a container as the VM workload\u201d workflows<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Compute Engine supports deploying a container to a VM in a way that starts the container on boot (commonly done with <code>gcloud compute instances create-with-container<\/code> and\/or container declarations in instance metadata).<\/li>\n<li><strong>Why it matters<\/strong>: You can treat the VM as a container host appliance.<\/li>\n<li><strong>Practical benefit<\/strong>: Very fast path to \u201ccontainer on VM\u201d without building a custom image.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: This is not Kubernetes. Health checks, rollouts, and multi-container orchestration are more manual unless you build them (or use MIG patterns). 
Confirm current container declaration capabilities in the Compute Engine containers documentation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.7 Strong compatibility with immutable\/fleet operations (MIG)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Encourages immutable operations: update instance templates, roll instances.<\/li>\n<li><strong>Why it matters<\/strong>: Predictable deployments; simpler rollback; better reliability than repairing pets.<\/li>\n<li><strong>Practical benefit<\/strong>: Easier to standardize across teams.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Stateful workloads require extra design (persistent disks, careful draining, database patterns).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.8 Works well with Artifact Registry + private images<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Pull images securely from Artifact Registry with IAM-controlled access.<\/li>\n<li><strong>Why it matters<\/strong>: Avoid unauthenticated public pulls; control provenance.<\/li>\n<li><strong>Practical benefit<\/strong>: Enterprise-ready image governance.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: You must ensure the VM\u2019s service account has the right Artifact Registry permissions and that egress\/firewall allows registry access.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.9 Designed for secure boot patterns (verify features used)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: COS is designed with verified boot concepts (Chromium OS heritage).<\/li>\n<li><strong>Why it matters<\/strong>: Integrity of the host OS is central to container security.<\/li>\n<li><strong>Practical benefit<\/strong>: Better baseline trust.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Compute Engine also has Shielded VM features; confirm compatibility and best practices for COS + Shielded VM in official 
docs.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">7. Architecture and How It Works<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">High-level architecture<\/h3>\n\n\n\n<p>At its simplest, a Container-Optimized OS deployment is:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A <strong>Compute Engine VM<\/strong><\/li>\n<li>Booting from a <strong>COS image<\/strong><\/li>\n<li>Running one or more <strong>containers<\/strong> as the workload<\/li>\n<li>Connected to a <strong>VPC network<\/strong><\/li>\n<li>Observed via <strong>Cloud Logging\/Monitoring<\/strong> and governed via <strong>IAM<\/strong><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Control flow vs data flow<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Control plane (Google Cloud)<\/strong>:\n<ul class=\"wp-block-list\">\n<li>You define a VM or instance template referencing a COS image family\/version.<\/li>\n<li>You optionally provide container configuration (image, env vars, restart policy) via metadata or \u201ccreate-with-container\u201d.<\/li>\n<li>IAM decides who can create\/modify instances, firewall rules, and service accounts, and who can SSH (for example, via OS Login).<\/li>\n<\/ul>\n<\/li>\n<li><strong>Data plane (your workload)<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Traffic reaches a VM\u2019s external IP or a load balancer.<\/li>\n<li>The container receives traffic on its exposed port.<\/li>\n<li>The container calls other services (databases, Pub\/Sub, Cloud Storage) using the VM\u2019s service account credentials.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Integrations with related services<\/h3>\n\n\n\n<p>Common and practical integrations include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Artifact Registry<\/strong> for private container images.<\/li>\n<li><strong>Cloud Load Balancing<\/strong> for global\/regional front ends.<\/li>\n<li><strong>Managed Instance Groups<\/strong> for scaling and rolling updates.<\/li>\n<li><strong>Cloud DNS<\/strong> for naming.<\/li>\n<li><strong>Secret Manager<\/strong> (recommended) for secrets retrieved at runtime by the app, rather than stored in instance metadata.<\/li>\n<li><strong>Cloud Logging and Cloud 
Monitoring<\/strong> for logs\/metrics (agent approach varies; verify the recommended agent approach for COS in official docs).<\/li>\n<li><strong>Cloud Armor<\/strong> to protect HTTP(S) services from common attacks when using HTTP(S) load balancing.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Dependency services<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Compute Engine API<\/strong> is required.<\/li>\n<li>If using private images: <strong>Artifact Registry API<\/strong> and IAM bindings.<\/li>\n<li>If using load balancing: additional networking and load balancing APIs\/resources.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/authentication model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Google Cloud IAM<\/strong>: controls who can create\/modify instances and associated resources.<\/li>\n<li><strong>Service accounts<\/strong>: attached to VMs to grant workload access to Google APIs.<\/li>\n<li><strong>OS Login \/ IAM-based SSH<\/strong> (recommended): use IAM to control SSH access and log it.<\/li>\n<li><strong>Firewall rules<\/strong>: enforce network exposure at VPC level.<\/li>\n<li><strong>Container image security<\/strong>: depends on your build pipeline, scanning, and provenance controls.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Networking model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>VMs attach to a VPC network\/subnet.<\/li>\n<li>Ingress is controlled by firewall rules and (optionally) load balancers.<\/li>\n<li>Egress follows VPC routing\/NAT; consider Cloud NAT if instances without external IPs need outbound internet access.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Monitoring\/logging\/governance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Decide how you will:\n<ul class=\"wp-block-list\">\n<li>collect host and container logs<\/li>\n<li>collect metrics and traces<\/li>\n<li>patch\/roll instances safely (MIG rolling update)<\/li>\n<li>tag and label resources for cost allocation<\/li>\n<\/ul>\n<\/li>\n<li>The best practice 
is to treat COS instances as <strong>replaceable<\/strong> and to externalize state.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Simple architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart LR\n  User((User)) --&gt;|HTTP| FW[Firewall rule]\n  FW --&gt; VM[COS VM&lt;br\/&gt;Container-Optimized OS]\n  VM --&gt; C[Container&lt;br\/&gt;Web App]\n  C --&gt; GCP[(Google APIs&lt;br\/&gt;via Service Account)]\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Production-style architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart TB\n  subgraph Internet[Internet]\n    U((Users))\n  end\n\n  subgraph GCP[Google Cloud Project]\n    direction TB\n\n    LB[\"External HTTP(S) Load Balancer\"]\n    ARMOR[Cloud Armor Policy]\n    DNS[Cloud DNS]\n\n    subgraph VPC[VPC Network]\n      direction TB\n      MIG[Regional Managed Instance Group&lt;br\/&gt;COS instances]\n      HC[Health Checks]\n      FW2[Firewall Rules]\n      NAT[\"Cloud NAT (optional)\"]\n    end\n\n    AR[Artifact Registry&lt;br\/&gt;Private Images]\n    SM[Secret Manager]\n    LOG[Cloud Logging]\n    MON[Cloud Monitoring]\n    IAM[IAM + Service Accounts]\n  end\n\n  U --&gt;|DNS| DNS --&gt; LB\n  LB --&gt; ARMOR --&gt; MIG\n  HC --&gt; MIG\n  FW2 --&gt; MIG\n  MIG --&gt;|pull image| AR\n  MIG --&gt;|fetch secrets at runtime| SM\n  MIG --&gt; LOG\n  MIG --&gt; MON\n  IAM --&gt; MIG\n  MIG --&gt; NAT\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">8. 
Prerequisites<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Account \/ project requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A <strong>Google Cloud project<\/strong> with <strong>billing enabled<\/strong>.<\/li>\n<li>Ability to enable required APIs in the project.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Permissions \/ IAM roles<\/h3>\n\n\n\n<p>For a lab in a personal sandbox project, <strong>Project Owner<\/strong> is simplest.<\/p>\n\n\n\n<p>For least-privilege in a real environment, you typically need:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Permissions to create and manage Compute Engine instances (for example, <code>roles\/compute.instanceAdmin.v1<\/code>)<\/li>\n<li>Permissions to create firewall rules if you do that in the lab (for example, <code>roles\/compute.securityAdmin<\/code>; note that <code>roles\/compute.networkAdmin<\/code> does not grant firewall rule management)<\/li>\n<li>Permission to use a service account if attaching one (<code>roles\/iam.serviceAccountUser<\/code> on that service account)<\/li>\n<\/ul>\n\n\n\n<p>Exact least-privilege depends on your organization policies; verify with your IAM admins.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Billing requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Compute Engine resources incur charges (VM core\/RAM time, disks, IPs, load balancing, egress).<\/li>\n<li>Container-Optimized OS itself is an image; pricing is primarily for the underlying Compute Engine resources.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">CLI \/ tools<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cloud Shell<\/strong> (recommended) or local installation of:\n<ul class=\"wp-block-list\">\n<li><code>gcloud<\/code> CLI: https:\/\/cloud.google.com\/sdk\/docs\/install<\/li>\n<li>Optional: <code>curl<\/code> for testing endpoints.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Region availability<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>COS images are used in Compute Engine, which is available across many regions\/zones. 
Choose a zone close to your users and other dependencies.<\/li>\n<li>Some machine types and features are region\/zone dependent. Verify in Compute Engine docs if you need specific hardware.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas \/ limits<\/h3>\n\n\n\n<p>Common quotas to check:\n&#8211; vCPU quota in your chosen region\n&#8211; In-use IP addresses\n&#8211; Firewall rules quota (usually not an issue in small labs)\n&#8211; If using MIG\/LB later: forwarding rules and backend service quotas<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Prerequisite services<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Compute Engine API<\/strong> must be enabled.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">9. Pricing \/ Cost<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing model (accurate framing)<\/h3>\n\n\n\n<p>Container-Optimized OS does not have a separate SKU you pay for like a managed service. <strong>Your costs come from the Compute Engine resources you run COS on<\/strong>, plus any connected services (load balancer, disks, logs, egress, Artifact Registry, etc.).<\/p>\n\n\n\n<p>Primary official pricing references:\n&#8211; Compute Engine pricing: https:\/\/cloud.google.com\/compute\/pricing\n&#8211; Google Cloud Pricing Calculator: https:\/\/cloud.google.com\/products\/calculator<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing dimensions to understand<\/h3>\n\n\n\n<p>You typically pay for:\n1. <strong>VM runtime<\/strong>: vCPU + memory pricing by machine type and region.\n2. <strong>Boot disk and attached disks<\/strong>: persistent disk type (balanced, SSD, standard), size (GB-month), and IOPS\/throughput characteristics depending on disk type.\n3. <strong>Networking<\/strong>:\n   &#8211; Egress to the internet (often a major driver)\n   &#8211; Cross-region traffic\n   &#8211; Load balancer data processing (if used)\n4. 
<strong>External IP<\/strong>: depending on how the IP is used (ephemeral vs reserved, attached vs unused) pricing can vary; verify current external IP pricing in official docs.\n5. <strong>Operations suite (Logging\/Monitoring)<\/strong>: logs ingestion\/retention and metrics beyond free allocations.\n6. <strong>Artifact Registry<\/strong>:\n   &#8211; storage for container images\n   &#8211; network egress when pulling images across regions (and general network costs)\n7. <strong>Optional security<\/strong>:\n   &#8211; Cloud Armor policies and rules\n   &#8211; KMS usage if you add customer-managed encryption keys (CMEK) to disks or other resources<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Free tier (if applicable)<\/h3>\n\n\n\n<p>Google Cloud has an \u201cAlways Free\u201d tier for some resources in some regions (historically including a small VM). Eligibility and details change over time and vary by region and usage. <strong>Verify current Always Free eligibility<\/strong> in official docs before assuming a workload is free.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Cost drivers (what surprises teams)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Internet egress<\/strong> from serving traffic publicly can exceed compute costs.<\/li>\n<li><strong>Overprovisioned machine types<\/strong>: using a larger instance than necessary for a small container.<\/li>\n<li><strong>Log volume<\/strong>: chatty containers can generate expensive log ingestion.<\/li>\n<li><strong>Load balancer + multiple zones<\/strong>: great for reliability, but adds cost.<\/li>\n<li><strong>Image pull patterns<\/strong>: frequent instance recreation can cause frequent image pulls (and potential egress) if not regionally optimized.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Hidden\/indirect costs<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Engineering time for:<\/li>\n<li>secure image supply chain<\/li>\n<li>rollouts\/rollbacks<\/li>\n<li>secrets 
management<\/li>\n<li>observability<\/li>\n<li>If you move from \u201csingle VM\u201d to \u201cproduction fleet\u201d, costs often shift to:<\/li>\n<li>load balancing<\/li>\n<li>monitoring\/logging<\/li>\n<li>security controls (Armor, WAF-like policies)<\/li>\n<li>multi-zone redundancy<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How to optimize cost (practical)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Right-size instances (start small; measure CPU\/memory).<\/li>\n<li>Use <strong>Managed Instance Groups<\/strong> with autoscaling for variable traffic.<\/li>\n<li>Use <strong>Sustained Use Discounts<\/strong> automatically where applicable and evaluate <strong>Committed Use Discounts<\/strong> for steady workloads (Compute Engine pricing model; verify current discount applicability).<\/li>\n<li>Reduce log volume:<\/li>\n<li>tune application logging levels<\/li>\n<li>apply log exclusions in Cloud Logging if appropriate<\/li>\n<li>Keep Artifact Registry in the same region as your compute fleet to minimize latency and cross-region egress.<\/li>\n<li>Prefer private instances behind a load balancer + Cloud NAT if you don\u2019t need per-VM public IPs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Example low-cost starter estimate (no fabricated prices)<\/h3>\n\n\n\n<p>A minimal lab setup often includes:\n&#8211; 1 small VM instance (e.g., an E2-family small machine type)\n&#8211; 1 small boot disk\n&#8211; 1 firewall rule\n&#8211; Minimal internet egress (a few MB for testing)<\/p>\n\n\n\n<p>To estimate your real cost:\n1. Open the Pricing Calculator: https:\/\/cloud.google.com\/products\/calculator<br\/>\n2. Add \u201cCompute Engine\u201d.\n3. Select your region, machine type, usage hours (e.g., a few hours), disk type\/size.\n4. 
Add expected internet egress (even small amounts).\nBecause compute and network pricing are region-dependent, <strong>do not rely on a single universal number<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example production cost considerations<\/h3>\n\n\n\n<p>A typical production pattern (MIG + load balancer) adds:\n&#8211; Multiple instances across zones (or regional MIG)\n&#8211; Load balancer components (forwarding rules, proxies, backend service)\n&#8211; Health checks\n&#8211; Higher log and metric volume\n&#8211; Potential Cloud Armor usage\n&#8211; More egress volume<\/p>\n\n\n\n<p>Use the calculator with:\n&#8211; your steady-state instance count\n&#8211; expected peak scaling\n&#8211; expected requests\/GB egress\n&#8211; log volume (if you can estimate)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">10. Step-by-Step Hands-On Tutorial<\/h2>\n\n\n\n<p>This lab deploys a real container to a Compute Engine VM running Container-Optimized OS and exposes it over HTTP for quick validation.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Objective<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Create a Compute Engine VM that uses <strong>Container-Optimized OS<\/strong> and automatically runs an <code>nginx<\/code> container.<\/li>\n<li>Allow inbound HTTP traffic.<\/li>\n<li>Validate the service.<\/li>\n<li>Clean up resources to avoid ongoing charges.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Lab Overview<\/h3>\n\n\n\n<p>You will:\n1. Set your project and enable the Compute Engine API.\n2. Create a firewall rule to allow inbound TCP port 80.\n3. Create a COS-based VM using <code>gcloud compute instances create-with-container<\/code>.\n4. Validate with <code>curl<\/code>.\n5. Troubleshoot common issues.\n6. 
Delete the VM and firewall rule.<\/p>\n\n\n\n<blockquote>\n<p>Why <code>create-with-container<\/code>?<br\/>\nIt\u2019s the most beginner-friendly way to run a container as the \u201cmain\u201d VM workload on Container-Optimized OS without building a custom image.<\/p>\n<\/blockquote>\n\n\n\n<h4 class=\"wp-block-heading\">Expected cost<\/h4>\n\n\n\n<p>Low, if you:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>use a small VM<\/li>\n<li>keep the lab running only briefly<\/li>\n<li>generate minimal egress<\/li>\n<\/ul>\n\n\n\n<p>Always verify pricing for your region and account.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Select a project and zone, and enable the API<\/h3>\n\n\n\n<p>In <strong>Cloud Shell<\/strong>, check your active account and project:<\/p>\n\n\n\n<pre><code class=\"language-bash\">gcloud auth list\ngcloud config list project\n<\/code><\/pre>\n\n\n\n<p>Set your project:<\/p>\n\n\n\n<pre><code class=\"language-bash\">export PROJECT_ID=\"YOUR_PROJECT_ID\"\ngcloud config set project \"${PROJECT_ID}\"\n<\/code><\/pre>\n\n\n\n<p>Pick a zone (example: <code>us-central1-a<\/code>). 
Choose one close to you:<\/p>\n\n\n\n<pre><code class=\"language-bash\">export ZONE=\"us-central1-a\"\ngcloud config set compute\/zone \"${ZONE}\"\n<\/code><\/pre>\n\n\n\n<p>Enable the Compute Engine API:<\/p>\n\n\n\n<pre><code class=\"language-bash\">gcloud services enable compute.googleapis.com\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The Compute Engine API is enabled.<\/li>\n<li>Your <code>gcloud<\/code> default project and zone are set.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Create a firewall rule to allow HTTP (port 80)<\/h3>\n\n\n\n<p>Create a firewall rule that allows inbound TCP port 80 to instances with a specific network tag.<\/p>\n\n\n\n<pre><code class=\"language-bash\">gcloud compute firewall-rules create allow-http-80 \\\n  --direction=INGRESS \\\n  --priority=1000 \\\n  --network=default \\\n  --action=ALLOW \\\n  --rules=tcp:80 \\\n  --source-ranges=0.0.0.0\/0 \\\n  --target-tags=cos-http\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A firewall rule named <code>allow-http-80<\/code> exists in your project.<\/li>\n<li>The rule applies only to instances tagged <code>cos-http<\/code>, and it allows port 80 from any source address.<\/li>\n<\/ul>\n\n\n\n<p><strong>Verification<\/strong><\/p>\n\n\n\n<pre><code class=\"language-bash\">gcloud compute firewall-rules describe allow-http-80 --format=\"value(name,network,direction,allowed[].IPProtocol,allowed[].ports)\"\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Create a Container-Optimized OS VM that runs Nginx<\/h3>\n\n\n\n<p>Create a VM and specify a container image to run. 
This command will:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>create the VM<\/li>\n<li>use a COS-based container-VM workflow<\/li>\n<li>start the container on boot<\/li>\n<\/ul>\n\n\n\n<pre><code class=\"language-bash\">export VM_NAME=\"cos-nginx-1\"\n\ngcloud compute instances create-with-container \"${VM_NAME}\" \\\n  --tags=cos-http \\\n  --machine-type=e2-micro \\\n  --container-image=nginx:stable\n<\/code><\/pre>\n\n\n\n<p>Notes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>e2-micro<\/code> is a small machine type commonly used for labs, but availability and cost depend on region. If creation fails due to quota or availability, try <code>e2-small<\/code>.<\/li>\n<li>In this example the container image is pulled from a public registry (Docker Hub). For production, prefer <strong>Artifact Registry<\/strong> with IAM-controlled access.<\/li>\n<\/ul>\n\n\n\n<p><strong>Expected outcome<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A VM instance is created.<\/li>\n<li>The Nginx container starts automatically. Allow some time after the VM reports <code>RUNNING<\/code> for the image to be pulled and the container to start.<\/li>\n<li>The VM is attached to the <code>default<\/code> VPC network and receives an ephemeral external IP address, unless you changed these defaults.<\/li>\n<\/ul>\n\n\n\n<p><strong>Verification<\/strong><br\/>\nDescribe the instance and print its external IP:<\/p>\n\n\n\n<pre><code class=\"language-bash\">gcloud compute instances describe \"${VM_NAME}\" --format=\"get(networkInterfaces[0].accessConfigs[0].natIP)\"\n<\/code><\/pre>\n\n\n\n<p>Store it:<\/p>\n\n\n\n<pre><code class=\"language-bash\">export EXTERNAL_IP=\"$(gcloud compute instances describe \"${VM_NAME}\" --format=\"get(networkInterfaces[0].accessConfigs[0].natIP)\")\"\necho \"External IP: ${EXTERNAL_IP}\"\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Test the web server from Cloud Shell<\/h3>\n\n\n\n<p>Run:<\/p>\n\n\n\n<pre><code class=\"language-bash\">curl -i \"http:\/\/${EXTERNAL_IP}\/\"\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome<\/strong><br\/>\nYou receive an HTTP response (typically <code>HTTP\/1.1 200 OK<\/code>) followed by the Nginx welcome page HTML.<\/p>\n\n\n\n<p>If you want to see headers only:<\/p>\n\n\n\n<pre><code 
class=\"language-bash\">curl -I \"http:\/\/${EXTERNAL_IP}\/\"\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5: Basic operational checks (instance + container)<\/h3>\n\n\n\n<p>Check instance status:<\/p>\n\n\n\n<pre><code class=\"language-bash\">gcloud compute instances describe \"${VM_NAME}\" --format=\"value(status)\"\n<\/code><\/pre>\n\n\n\n<p>If you need to SSH for deeper debugging:<\/p>\n\n\n\n<pre><code class=\"language-bash\">gcloud compute ssh \"${VM_NAME}\"\n<\/code><\/pre>\n\n\n\n<p>Once connected, you can inspect system logs with <code>journalctl<\/code>, because COS uses systemd. The exact unit names and container supervisor depend on the container-on-VM implementation. To view the 200 most recent journal entries, which may include container startup messages, run:<\/p>\n\n\n\n<pre><code class=\"language-bash\">sudo journalctl --no-pager -n 200\n<\/code><\/pre>\n\n\n\n<p>If the container-on-VM workflow uses a dedicated service unit, you can list units and search for it:<\/p>\n\n\n\n<pre><code class=\"language-bash\">sudo systemctl list-units --type=service | head\nsudo systemctl list-units --type=service | grep -i -E \"container|konlet|docker|containerd\" || true\n<\/code><\/pre>\n\n\n\n<blockquote>\n<p>If you need a precise answer to \u201cwhich service starts the container\u201d for your chosen image family, <strong>verify in the official Compute Engine containers documentation<\/strong>, because the underlying components and naming can evolve.<\/p>\n<\/blockquote>\n\n\n\n<p>Exit SSH:<\/p>\n\n\n\n<pre><code class=\"language-bash\">exit\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Validation<\/h3>\n\n\n\n<p>You have successfully validated that:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Container-Optimized OS can run a container workload on Compute Engine.<\/li>\n<li>The workload is reachable over HTTP.<\/li>\n<li>You can operate it using standard Compute Engine tooling.<\/li>\n<\/ul>\n\n\n\n<p>A quick final validation 
summary:<\/p>\n\n\n\n<pre><code class=\"language-bash\">echo \"VM: ${VM_NAME}\"\necho \"IP: ${EXTERNAL_IP}\"\ncurl -sI \"http:\/\/${EXTERNAL_IP}\/\" | head -n 1\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Troubleshooting<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Issue: <code>curl<\/code> times out \/ cannot connect<\/h4>\n\n\n\n<p>Common causes and fixes:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p><strong>Firewall rule missing or wrong tag<\/strong><\/p>\n<p>Ensure the VM has the tag <code>cos-http<\/code>:<\/p>\n<pre><code class=\"language-bash\">gcloud compute instances describe \"${VM_NAME}\" --format=\"value(tags.items)\"\n<\/code><\/pre>\n<p>Ensure the firewall rule targets that tag:<\/p>\n<pre><code class=\"language-bash\">gcloud compute firewall-rules describe allow-http-80 --format=\"value(targetTags)\"\n<\/code><\/pre>\n<\/li>\n<li>\n<p><strong>Wrong IP<\/strong><\/p>\n<p>Re-check the external IP:<\/p>\n<pre><code class=\"language-bash\">gcloud compute instances describe \"${VM_NAME}\" --format=\"get(networkInterfaces[0].accessConfigs[0].natIP)\"\n<\/code><\/pre>\n<\/li>\n<li>\n<p><strong>Container not running<\/strong><\/p>\n<p>The image pull and container start can take a short while after boot, so wait briefly and retry. If it still fails, SSH in and inspect the logs with <code>journalctl<\/code> as shown in Step 5. Recreate the instance if needed. In an immutable model, replacing an instance is often faster than repairing it in place.<\/p>\n<\/li>\n<\/ol>\n\n\n\n<h4 class=\"wp-block-heading\">Issue: <code>create-with-container<\/code> fails with permissions error<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Ensure you have permissions to create instances.<\/li>\n<li>In managed organizations, an Organization Policy may block external IPs or public firewall rules. 
If so:\n<ul class=\"wp-block-list\">\n<li>use an internal load balancer or other private access patterns, or<\/li>\n<li>request policy exceptions in a sandbox project.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Issue: Machine type not available \/ quota exceeded<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Try a different zone. List the zones in a region, then rerun Step 3 with an added <code>--zone<\/code> flag (or update the <code>compute\/zone<\/code> default):\n<pre><code class=\"language-bash\">gcloud compute zones list --filter=\"region:(us-central1)\" --format=\"value(name)\"\n<\/code><\/pre>\n<\/li>\n<li>Try a different machine type (e.g., <code>e2-small<\/code>).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Cleanup<\/h3>\n\n\n\n<p>Delete the VM:<\/p>\n\n\n\n<pre><code class=\"language-bash\">gcloud compute instances delete \"${VM_NAME}\" --quiet\n<\/code><\/pre>\n\n\n\n<p>Delete the firewall rule:<\/p>\n\n\n\n<pre><code class=\"language-bash\">gcloud compute firewall-rules delete allow-http-80 --quiet\n<\/code><\/pre>\n\n\n\n<p>Verify cleanup. Both commands should list no matching resources:<\/p>\n\n\n\n<pre><code class=\"language-bash\">gcloud compute instances list --filter=\"name=${VM_NAME}\"\ngcloud compute firewall-rules list --filter=\"name=allow-http-80\"\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">11. 
Best Practices<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Prefer Managed Instance Groups for production<\/strong>:<\/li>\n<li>Enables rolling updates and autohealing.<\/li>\n<li>Makes \u201creplace instances\u201d the standard remediation.<\/li>\n<li><strong>Externalize state<\/strong>:<\/li>\n<li>Store data in managed services (Cloud SQL, Spanner, Firestore) or persistent disks designed for that purpose.<\/li>\n<li>Keep COS VMs as stateless as possible.<\/li>\n<li><strong>Use load balancers instead of per-VM public IPs<\/strong>:<\/li>\n<li>Better security posture and easier TLS, health checks, and scaling.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">IAM and security best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Use dedicated service accounts<\/strong> per workload and grant least privilege.<\/li>\n<li><strong>Use OS Login<\/strong> (and ideally IAP for SSH) to avoid unmanaged SSH keys.<\/li>\n<li><strong>Restrict firewall rules<\/strong>:<\/li>\n<li>Avoid <code>0.0.0.0\/0<\/code> unless necessary.<\/li>\n<li>Limit inbound ports; default deny.<\/li>\n<li><strong>Pin and verify container images<\/strong>:<\/li>\n<li>Prefer immutable image references (digests) in production rather than mutable tags like <code>latest<\/code>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cost best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Right-size aggressively; measure real CPU\/memory usage.<\/li>\n<li>Use autoscaling with MIGs for variable workloads.<\/li>\n<li>Minimize egress and cross-region pulls (keep Artifact Registry close to compute).<\/li>\n<li>Control log volume; implement log exclusions where appropriate.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Performance best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Keep container images small; optimize layer caching.<\/li>\n<li>Use regional placement to reduce latency to 
dependencies.<\/li>\n<li>Ensure health checks are representative and not overly expensive.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Reliability best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Run across multiple zones (regional MIG) for high availability.<\/li>\n<li>Use load balancer health checks and autohealing.<\/li>\n<li>Design for instance replacement during updates and failures.<\/li>\n<li>Implement graceful shutdown in your application so rolling updates don\u2019t drop requests.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operations best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Standardize instance templates and use labels for ownership and environment (<code>env=prod<\/code>, <code>team=payments<\/code>).<\/li>\n<li>Maintain a documented rollout process (update template, rolling update parameters, rollback).<\/li>\n<li>Keep a break-glass procedure for emergency access that is auditable and time-bound.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Governance\/tagging\/naming best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Consistent naming:<\/li>\n<li><code>svc-env-region-role-###<\/code> (example: <code>api-prod-uscentral1-web-001<\/code>)<\/li>\n<li>Use labels for:<\/li>\n<li>cost center<\/li>\n<li>data sensitivity tier<\/li>\n<li>owner\/oncall<\/li>\n<li>Track COS image family\/version and container image digest in deployment records.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">12. 
Security Considerations<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Identity and access model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>IAM<\/strong> governs:<\/li>\n<li>who can create\/modify\/delete instances, templates, firewall rules<\/li>\n<li>who can attach service accounts and what scopes\/permissions workloads get<\/li>\n<li><strong>Service accounts<\/strong> are the recommended way for apps to access Google Cloud APIs.<\/li>\n<li><strong>OS Login<\/strong> integrates Linux account access with IAM and helps centralize auditability.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Encryption<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>At rest<\/strong>: Compute Engine disks are encrypted by default. For stricter requirements, consider CMEK (customer-managed keys) for disks (verify the current CMEK support and configuration in Compute Engine docs).<\/li>\n<li><strong>In transit<\/strong>: Use TLS termination at the load balancer or in the container. Prefer managed certificates and modern TLS policies where applicable.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network exposure<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Avoid giving every instance a public IP in production.<\/li>\n<li>Use:<\/li>\n<li>External HTTP(S) Load Balancer (public entry)<\/li>\n<li>Private instances in subnets<\/li>\n<li>Cloud NAT for outbound access without inbound exposure<\/li>\n<li>Use firewall rules with least exposure and target tags\/service accounts.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secrets handling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Avoid storing secrets in:<\/li>\n<li>instance metadata<\/li>\n<li>container image layers<\/li>\n<li>source control<\/li>\n<li>Prefer <strong>Secret Manager<\/strong> and fetch secrets at runtime using the VM\u2019s service account identity.<\/li>\n<li>Rotate secrets and use short-lived credentials where possible.<\/li>\n<\/ul>\n\n\n\n<h3 
class=\"wp-block-heading\">Audit\/logging<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>Cloud Audit Logs<\/strong> for administrative actions (instance creation, firewall changes, IAM changes).<\/li>\n<li>Ensure you can attribute:<\/li>\n<li>who deployed a new container version<\/li>\n<li>who changed network exposure<\/li>\n<li>who accessed instances (OS Login + IAP logs, where used)<\/li>\n<li>For workload logs:<\/li>\n<li>centralize to Cloud Logging (agent\/collection method depends on your approach; verify the recommended method for COS).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compliance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>COS can support secure operations, but compliance depends on:<\/li>\n<li>your configuration<\/li>\n<li>identity controls<\/li>\n<li>logging\/retention<\/li>\n<li>vulnerability management<\/li>\n<li>network boundaries<br\/>\nAlways validate requirements against Google Cloud compliance documentation and your auditor\u2019s needs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Common security mistakes<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Leaving SSH open to the internet with weak key management.<\/li>\n<li>Using wide firewall rules (<code>0.0.0.0\/0<\/code>) for admin ports.<\/li>\n<li>Running containers as root unnecessarily.<\/li>\n<li>Pulling public images without provenance checks.<\/li>\n<li>Using mutable tags (<code>latest<\/code>) in production.<\/li>\n<li>Treating COS hosts as \u201cpet servers\u201d and making manual changes that aren\u2019t reproducible.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secure deployment recommendations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prefer MIG + LB architecture.<\/li>\n<li>Use private images in Artifact Registry with IAM.<\/li>\n<li>Use image scanning\/provenance in your CI pipeline (verify your chosen tooling).<\/li>\n<li>Restrict metadata exposure and avoid sensitive data in metadata.<\/li>\n<li>Implement runtime 
security controls in the application and container configuration (non-root user, read-only filesystem in container where possible).<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">13. Limitations and Gotchas<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Known limitations \/ design constraints<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Not a general-purpose Linux distro<\/strong>: package installation and customization are intentionally limited.<\/li>\n<li><strong>Debugging friction<\/strong>: fewer built-in tools; you may need \u201cdebug containers\u201d or dedicated debugging workflows.<\/li>\n<li><strong>Host persistence model differs<\/strong>: immutable root patterns mean some changes won\u2019t persist or are discouraged.<\/li>\n<li><strong>Multi-container orchestration is limited<\/strong> without Kubernetes or additional tooling: the simplest workflows assume \u201cone main container per VM\u201d or require you to build your own supervisor approach.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas and scaling constraints<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>vCPU quota and IP quota can block scaling.<\/li>\n<li>Load balancing quotas can surprise teams when moving to production patterns.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Regional constraints<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Some machine types and accelerators are zone-specific.<\/li>\n<li>Keep Artifact Registry and compute in compatible regions to avoid latency\/egress.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing surprises<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Internet egress for public services.<\/li>\n<li>Log ingestion volume from chatty containers.<\/li>\n<li>External IP charges depending on usage type (verify current billing rules).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compatibility issues<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Some third-party monitoring\/security agents assume they can 
install packages or write broadly to the filesystem.<\/li>\n<li>Kernel module requirements can be tricky; verify whether your workload needs specific kernel modules\/drivers.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational gotchas<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If you treat instances as mutable pets, you\u2019ll fight the platform.<\/li>\n<li>Updates\/reboots must be planned for (MIG rolling updates help).<\/li>\n<li>Container image pull failures (auth, network) can cause instances to come up \u201chealthy VM but unhealthy app.\u201d<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Migration challenges<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Moving from Ubuntu to COS may require:<\/li>\n<li>rebuilding host-installed software into container images<\/li>\n<li>redesigning log collection<\/li>\n<li>changing SSH\/debug habits<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Vendor-specific nuances<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>COS is deeply integrated with Google Cloud\u2019s Compute Engine model. If you need portability across clouds at the VM OS level, consider whether a more generic OS (or Kubernetes) is a better abstraction.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">14. 
Comparison with Alternatives<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">In Google Cloud (nearest options)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Ubuntu\/Debian on Compute Engine<\/strong>: flexible general-purpose OS, more host maintenance.<\/li>\n<li><strong>GKE Standard \/ Autopilot<\/strong>: managed Kubernetes; more features for orchestration and scale, but more platform complexity.<\/li>\n<li><strong>Cloud Run<\/strong>: serverless containers; simplest ops model but less VM-level control and some workload constraints.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">In other clouds (nearest conceptual peers)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AWS Bottlerocket<\/strong>: container-optimized OS for ECS\/EKS.<\/li>\n<li><strong>Azure Linux \/ CBL-Mariner-based container host patterns<\/strong>: Microsoft has container host OS patterns; exact product choices vary\u2014verify current Azure recommendations.<\/li>\n<li><strong>Self-managed minimal OS<\/strong>: Fedora CoreOS, Flatcar, etc., when you want an immutable OS with different ecosystem tradeoffs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Comparison table<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Option<\/th>\n<th>Best For<\/th>\n<th>Strengths<\/th>\n<th>Weaknesses<\/th>\n<th>When to Choose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Container-Optimized OS (Google Cloud)<\/strong><\/td>\n<td>Container workloads on Compute Engine VMs<\/td>\n<td>Minimal\/hardened host, designed for containers, good for MIG patterns<\/td>\n<td>Limited host customization, different debugging model<\/td>\n<td>You want containers on VMs with strong baseline and low OS toil<\/td>\n<\/tr>\n<tr>\n<td><strong>Ubuntu\/Debian on Compute Engine<\/strong><\/td>\n<td>Mixed workloads, custom agents, traditional VM ops<\/td>\n<td>Familiar tooling, package managers, broad compatibility<\/td>\n<td>Larger attack surface, more patching\/drift 
risk<\/td>\n<td>You need broad OS flexibility or legacy software<\/td>\n<\/tr>\n<tr>\n<td><strong>GKE Standard<\/strong><\/td>\n<td>Kubernetes-managed container platforms<\/td>\n<td>Rich orchestration, scaling, service discovery, policies<\/td>\n<td>Kubernetes operational overhead (though managed)<\/td>\n<td>You have multiple services and want Kubernetes features<\/td>\n<\/tr>\n<tr>\n<td><strong>GKE Autopilot<\/strong><\/td>\n<td>\u201cKubernetes with less ops\u201d<\/td>\n<td>Less node management, opinionated best practices<\/td>\n<td>Less infrastructure control, different cost model<\/td>\n<td>You want Kubernetes but don\u2019t want to manage nodes<\/td>\n<\/tr>\n<tr>\n<td><strong>Cloud Run<\/strong><\/td>\n<td>Stateless HTTP services and event-driven containers<\/td>\n<td>Very low ops, fast deploys, scale to zero<\/td>\n<td>Platform constraints (request\/response model, execution limits), less network control<\/td>\n<td>You want serverless simplicity and fit the model<\/td>\n<\/tr>\n<tr>\n<td><strong>AWS Bottlerocket (AWS)<\/strong><\/td>\n<td>Container hosts in AWS<\/td>\n<td>Minimal immutable OS for containers<\/td>\n<td>Different cloud, different integrations<\/td>\n<td>Multi-cloud comparison; choose if you\u2019re on AWS<\/td>\n<\/tr>\n<tr>\n<td><strong>Fedora CoreOS \/ Flatcar (self-managed)<\/strong><\/td>\n<td>Immutable OS approach with broader control<\/td>\n<td>Strong immutability story, flexible environments<\/td>\n<td>You manage lifecycle and integration<\/td>\n<td>You need an immutable OS but prefer non-cloud-vendor images<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">15. Real-World Example<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Enterprise example: Secure API fleet on Compute Engine with controlled rollout<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: A large enterprise has strict security requirements and wants to reduce VM drift. 
They run containerized APIs that must integrate with existing VPCs, shared load balancers, and IAM.<\/li>\n<li><strong>Proposed architecture<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Artifact Registry for private images<\/li>\n<li>A Cloud Build pipeline that builds and signs images (the signing approach depends on your chosen tooling; verify)<\/li>\n<li>A regional MIG of COS instances whose instance template references a specific, pinned COS image rather than an image family, so hosts change only when the template changes<\/li>\n<li>External HTTP(S) Load Balancer + Cloud Armor in front<\/li>\n<li>Workloads use service accounts to access Pub\/Sub and Cloud SQL<\/li>\n<li>Centralized logging\/monitoring with alerting tied to SLOs<\/li>\n<\/ul>\n<\/li>\n<li><strong>Why COS was chosen<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Minimal host OS reduces attack surface and drift<\/li>\n<li>Automated updates fit enterprise patching goals when paired with MIG rolling updates<\/li>\n<li>Clear separation: \u201chost is appliance, app is container\u201d<\/li>\n<\/ul>\n<\/li>\n<li><strong>Expected outcomes<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Faster, safer rollouts (template update \u2192 rolling update)<\/li>\n<li>A shorter exposure window for OS vulnerabilities and fewer manual patch cycles<\/li>\n<li>Consistent baseline across environments<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Startup\/small-team example: Simple container hosting without Kubernetes<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: A startup needs a reliable service host for one API and one worker, but Kubernetes is too heavy for the current team size.<\/li>\n<li><strong>Proposed architecture<\/strong>:\n<ul class=\"wp-block-list\">\n<li>COS VM(s) running container workloads<\/li>\n<li>A small MIG for the API behind a load balancer<\/li>\n<li>Worker service on a separate MIG without public ingress<\/li>\n<li>Artifact Registry for images<\/li>\n<li>Secret Manager for API keys<\/li>\n<\/ul>\n<\/li>\n<li><strong>Why COS was chosen<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Less operational work than patching and maintaining Ubuntu hosts<\/li>\n<li>Simpler than Kubernetes while still enabling immutable deployments<\/li>\n<\/ul>\n<\/li>\n<li><strong>Expected outcomes<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Simple deploy pipeline: build image \u2192 update template \u2192 roll<\/li>\n<li>Lower operational burden and a predictable environment<\/li>\n<li>A clear growth path to GKE later if needed<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">16. FAQ<\/h2>\n\n\n\n<p>1) <strong>Is Container-Optimized OS a separate billed product?<\/strong><br\/>\nNo. You pay for the Compute Engine resources (VMs, disks, network, etc.). COS is an OS image you choose for instances.<\/p>\n\n\n\n<p>2) <strong>Can I SSH into a COS VM?<\/strong><br\/>\nYes, you can SSH like other Compute Engine VMs (subject to IAM and network controls). Use OS Login\/IAP where possible for better security.<\/p>\n\n\n\n<p>3) <strong>Can I install packages with <code>apt<\/code> or <code>yum<\/code>?<\/strong><br\/>\nNo. COS does not include a package manager and is not meant to be managed like a general-purpose distro. Put dependencies in container images instead; for ad hoc debugging on the host, COS provides the <code>toolbox<\/code> utility.<\/p>\n\n\n\n<p>4) <strong>How do OS updates work?<\/strong><br\/>\nCOS is designed to receive automated updates. For production, plan for reboots and use MIG rolling updates. Verify current update controls\/channels in official docs.<\/p>\n\n\n\n<p>5) <strong>Is COS only for a single container per VM?<\/strong><br\/>\nThe Compute Engine container-on-VM workflow (<code>create-with-container<\/code>) deploys one container per VM, but you can run additional containers by other means, such as cloud-init. If you need multi-container orchestration with service discovery and rollouts, consider GKE.<\/p>\n\n\n\n<p>6) <strong>Should I use COS or GKE?<\/strong><br\/>\nUse COS on Compute Engine when you want VM-based control and simpler operations for a smaller set of services. Use GKE when you need Kubernetes orchestration features, multi-service scheduling, and Kubernetes-native policies.<\/p>\n\n\n\n<p>7) <strong>Does COS work with Managed Instance Groups?<\/strong><br\/>\nYes.
COS is often used with MIGs for autohealing, autoscaling, and rolling updates.<\/p>\n\n\n\n<p>8) <strong>How do I pull private images from Artifact Registry?<\/strong><br\/>\nAttach a service account to the VM that can read the repository (for example, through the Artifact Registry Reader role, <code>roles\/artifactregistry.reader<\/code>), and make sure the VM can reach the registry endpoint; VMs without external IP addresses need Private Google Access or Cloud NAT. Verify the exact required IAM role(s) in the Artifact Registry docs.<\/p>\n\n\n\n<p>9) <strong>Where should I store secrets for COS workloads?<\/strong><br\/>\nUse Secret Manager and fetch secrets at runtime using the VM\u2019s service account identity. Avoid embedding secrets in metadata or images.<\/p>\n\n\n\n<p>10) <strong>How do I handle persistent storage?<\/strong><br\/>\nPrefer managed services (for example, Cloud SQL or Cloud Storage) for state. If you must persist files on the VM, use persistent disks or other Google Cloud storage products, and design carefully so that instance replacement does not lose state.<\/p>\n\n\n\n<p>11) <strong>Is COS \u201cmore secure\u201d than Ubuntu by default?<\/strong><br\/>\nIt\u2019s designed with a smaller footprint and hardened patterns, which can reduce attack surface. Security still depends heavily on your container image, IAM, network exposure, and operational practices.<\/p>\n\n\n\n<p>12) <strong>Can I run non-container workloads on COS?<\/strong><br\/>\nCOS is intended for containers. If you need general-purpose workloads or host-installed software, use a general-purpose OS image.<\/p>\n\n\n\n<p>13) <strong>How do I do blue\/green deployments with COS?<\/strong><br\/>\nCommonly: create a new instance template that references the new container image (ideally by digest), create a second (green) MIG from it, and switch traffic by changing the load balancer backends. Updating an existing MIG with a controlled rolling update is simpler, but that is a rolling deployment rather than blue\/green.<\/p>\n\n\n\n<p>14) <strong>How do I observe container logs and metrics?<\/strong><br\/>\nUse Cloud Logging and Cloud Monitoring. On COS, built-in log collection is typically enabled through instance metadata (for example, by setting <code>google-logging-enabled<\/code> to <code>true<\/code>), but options can differ by image version and deployment method.
Verify the current recommended method in official docs.<\/p>\n\n\n\n<p>15) <strong>What\u2019s the difference between COS on Compute Engine and COS as GKE node image?<\/strong><br\/>\nOn Compute Engine, you manage the VM lifecycle and container startup method. On GKE, Google (or you, depending on mode) manages nodes and Kubernetes orchestrates containers.<\/p>\n\n\n\n<p>16) <strong>Can I use COS for highly regulated environments?<\/strong><br\/>\nPossibly, but you must validate the entire system (IAM, logging, encryption, network boundaries, patching processes) against your compliance framework. Don\u2019t assume compliance from OS choice alone.<\/p>\n\n\n\n<p>17) <strong>Do I need a public IP for a COS VM?<\/strong><br\/>\nNo. Many production designs use private VMs behind a load balancer, and use Cloud NAT for outbound access.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">17. Top Online Resources to Learn Container-Optimized OS<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Resource Type<\/th>\n<th>Name<\/th>\n<th>Why It Is Useful<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Official documentation<\/td>\n<td>Container-Optimized OS docs \u2014 https:\/\/cloud.google.com\/container-optimized-os\/docs<\/td>\n<td>Primary source for COS concepts, images, security model, and operations<\/td>\n<\/tr>\n<tr>\n<td>Official release notes<\/td>\n<td>Container-Optimized OS release notes \u2014 https:\/\/cloud.google.com\/container-optimized-os\/docs\/release-notes<\/td>\n<td>Track security fixes, version changes, and behavioral updates<\/td>\n<\/tr>\n<tr>\n<td>Official Compute Engine containers guide<\/td>\n<td>Deploying containers on VMs (Compute Engine) \u2014 https:\/\/cloud.google.com\/compute\/docs\/containers<\/td>\n<td>Authoritative guide for <code>create-with-container<\/code> and container declaration patterns<\/td>\n<\/tr>\n<tr>\n<td>Official pricing<\/td>\n<td>Compute Engine pricing \u2014 
https:\/\/cloud.google.com\/compute\/pricing<\/td>\n<td>COS cost is primarily Compute Engine cost; this is the base pricing reference<\/td>\n<\/tr>\n<tr>\n<td>Pricing calculator<\/td>\n<td>Google Cloud Pricing Calculator \u2014 https:\/\/cloud.google.com\/products\/calculator<\/td>\n<td>Build a region-specific estimate including disks, egress, and load balancing<\/td>\n<\/tr>\n<tr>\n<td>Architecture guidance<\/td>\n<td>Google Cloud Architecture Center \u2014 https:\/\/cloud.google.com\/architecture<\/td>\n<td>Patterns for MIGs, load balancing, security, and operations<\/td>\n<\/tr>\n<tr>\n<td>Observability<\/td>\n<td>Cloud Operations suite docs \u2014 https:\/\/cloud.google.com\/products\/operations<\/td>\n<td>Logging\/Monitoring patterns that apply to VM\/container architectures<\/td>\n<\/tr>\n<tr>\n<td>Container image registry<\/td>\n<td>Artifact Registry docs \u2014 https:\/\/cloud.google.com\/artifact-registry\/docs<\/td>\n<td>Secure private image storage and IAM-controlled access<\/td>\n<\/tr>\n<tr>\n<td>Security\/IAM<\/td>\n<td>IAM overview \u2014 https:\/\/cloud.google.com\/iam\/docs\/overview<\/td>\n<td>Correct identity model for VM\/container workloads<\/td>\n<\/tr>\n<tr>\n<td>Tutorials (official)<\/td>\n<td>Compute Engine tutorials \u2014 https:\/\/cloud.google.com\/compute\/docs\/tutorials<\/td>\n<td>VM patterns that often pair well with COS (MIGs, LBs, networking)<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">18. 
Training and Certification Providers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Institute<\/th>\n<th>Suitable Audience<\/th>\n<th>Likely Learning Focus<\/th>\n<th>Mode<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps engineers, SREs, platform teams<\/td>\n<td>DevOps tooling, cloud operations, CI\/CD, container operations<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>ScmGalaxy.com<\/td>\n<td>Beginner to intermediate DevOps learners<\/td>\n<td>SCM, DevOps fundamentals, build\/release practices<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.scmgalaxy.com\/<\/td>\n<\/tr>\n<tr>\n<td>CloudOpsNow.in<\/td>\n<td>Cloud ops practitioners<\/td>\n<td>Cloud operations, monitoring, automation<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.cloudopsnow.in\/<\/td>\n<\/tr>\n<tr>\n<td>SreSchool.com<\/td>\n<td>SREs and reliability-focused engineers<\/td>\n<td>SRE practices, reliability engineering, observability<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.sreschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>AiOpsSchool.com<\/td>\n<td>Ops teams exploring AIOps<\/td>\n<td>AIOps concepts, automation, monitoring analytics<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.aiopsschool.com\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">19.
Top Trainers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Platform\/Site<\/th>\n<th>Likely Specialization<\/th>\n<th>Suitable Audience<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>RajeshKumar.xyz<\/td>\n<td>DevOps\/Cloud training content (verify offering)<\/td>\n<td>Individuals and teams seeking guided learning<\/td>\n<td>https:\/\/www.rajeshkumar.xyz\/<\/td>\n<\/tr>\n<tr>\n<td>devopstrainer.in<\/td>\n<td>DevOps training (verify course catalog)<\/td>\n<td>Beginners to advanced DevOps learners<\/td>\n<td>https:\/\/www.devopstrainer.in\/<\/td>\n<\/tr>\n<tr>\n<td>devopsfreelancer.com<\/td>\n<td>Freelance DevOps guidance\/services (treat as a resource platform unless verified)<\/td>\n<td>Teams needing short-term expert help<\/td>\n<td>https:\/\/www.devopsfreelancer.com\/<\/td>\n<\/tr>\n<tr>\n<td>devopssupport.in<\/td>\n<td>DevOps support\/training resources (verify scope)<\/td>\n<td>Engineers needing practical support<\/td>\n<td>https:\/\/www.devopssupport.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">20. 
Top Consulting Companies<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Company Name<\/th>\n<th>Likely Service Area<\/th>\n<th>Where They May Help<\/th>\n<th>Consulting Use Case Examples<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>cotocus.com<\/td>\n<td>DevOps and cloud consulting (verify exact offerings)<\/td>\n<td>Cloud migration, CI\/CD, infrastructure automation<\/td>\n<td>COS-based MIG design, secure container hosting on Compute Engine, rollout\/rollback automation<\/td>\n<td>https:\/\/www.cotocus.com\/<\/td>\n<\/tr>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps consulting and enablement<\/td>\n<td>Platform engineering, training + implementation<\/td>\n<td>Designing container-on-VM reference architectures, setting up Artifact Registry + CI pipelines<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>DEVOPSCONSULTING.IN<\/td>\n<td>DevOps consulting (verify exact offerings)<\/td>\n<td>Operational readiness, automation, reliability practices<\/td>\n<td>MIG + load balancer production setup, logging\/monitoring baseline, security review<\/td>\n<td>https:\/\/www.devopsconsulting.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">21. 
Career and Learning Roadmap<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn before Container-Optimized OS<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Google Cloud fundamentals:\n<ul class=\"wp-block-list\">\n<li>projects, billing, IAM, service accounts<\/li>\n<li>VPC networking and firewall rules<\/li>\n<\/ul>\n<\/li>\n<li>Compute Engine basics:\n<ul class=\"wp-block-list\">\n<li>instances, images, disks<\/li>\n<li>instance templates and MIGs (recommended)<\/li>\n<\/ul>\n<\/li>\n<li>Container fundamentals:\n<ul class=\"wp-block-list\">\n<li>Docker\/OCI images, registries<\/li>\n<li>container networking and ports<\/li>\n<li>basic security (non-root, minimal images)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn after Container-Optimized OS<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Production architectures:\n<ul class=\"wp-block-list\">\n<li>external HTTP(S) load balancing<\/li>\n<li>Cloud Armor basics<\/li>\n<li>multi-zone design and SLOs<\/li>\n<\/ul>\n<\/li>\n<li>CI\/CD and supply chain:\n<ul class=\"wp-block-list\">\n<li>Cloud Build or other CI<\/li>\n<li>Artifact Registry permissions and lifecycle policies<\/li>\n<li>vulnerability scanning and provenance (verify your selected tooling)<\/li>\n<\/ul>\n<\/li>\n<li>Kubernetes (optional but a common next step):\n<ul class=\"wp-block-list\">\n<li>GKE Standard\/Autopilot<\/li>\n<li>deployment strategies, services, ingress, policies<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Job roles that use it<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cloud engineer (Compute Engine + container hosting)<\/li>\n<li>DevOps engineer \/ platform engineer<\/li>\n<li>SRE (especially VM fleet operations)<\/li>\n<li>Security engineer (hardened baseline, workload identity, network boundaries)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certification path (if available)<\/h3>\n\n\n\n<p>There is no \u201cContainer-Optimized OS certification\u201d specifically. Relevant Google Cloud certifications typically include:<\/p>\n<ul class=\"wp-block-list\">\n<li>Associate Cloud Engineer<\/li>\n<li>Professional Cloud Architect<\/li>\n<li>Professional Cloud DevOps Engineer<\/li>\n<\/ul>\n<p>Verify current certification names and requirements: https:\/\/cloud.google.com\/learn\/certification<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Project ideas for practice<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Build a small service and deploy it to a COS MIG behind a load balancer.<\/li>\n<li>Implement blue\/green via two MIGs and controlled traffic switching.<\/li>\n<li>Store images in Artifact Registry and restrict access via service accounts.<\/li>\n<li>Implement Secret Manager integration and rotate secrets.<\/li>\n<li>Add Cloud Monitoring alerts on HTTP error rate and latency.<\/li>\n<li>Create a cost dashboard using labels for team\/environment.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">22. Glossary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Artifact Registry<\/strong>: Google Cloud service to store container images and other artifacts with IAM-based access control.<\/li>\n<li><strong>Compute Engine<\/strong>: Google Cloud\u2019s IaaS VM service.<\/li>\n<li><strong>Container image<\/strong>: A packaged filesystem and metadata used to run a container (OCI\/Docker format).<\/li>\n<li><strong>Container runtime<\/strong>: Software that runs containers on a host (commonly <code>containerd<\/code>, with Docker Engine used in some contexts; verify for your COS image).<\/li>\n<li><strong>COS<\/strong>: Common abbreviation for Container-Optimized OS.<\/li>\n<li><strong>Firewall rule (VPC)<\/strong>: Network rule controlling allowed\/denied traffic to VM instances.<\/li>\n<li><strong>IAM<\/strong>: Identity and Access Management, which controls permissions in Google Cloud.<\/li>\n<li><strong>Instance template<\/strong>: A reusable VM configuration used by Managed Instance Groups.<\/li>\n<li><strong>Managed Instance Group (MIG)<\/strong>: A group of identical VMs
managed as a single entity for scaling, autohealing, and rolling updates.<\/li>\n<li><strong>OS Login<\/strong>: IAM-integrated method for managing SSH access to VMs.<\/li>\n<li><strong>Service account<\/strong>: A Google identity used by workloads to access Google Cloud APIs.<\/li>\n<li><strong>Shielded VM<\/strong>: Compute Engine features for protecting against boot-level and rootkit attacks (verify COS compatibility and best practices).<\/li>\n<li><strong>VPC<\/strong>: Virtual Private Cloud network in Google Cloud.<\/li>\n<li><strong>Workload identity (VM)<\/strong>: Using a VM\u2019s service account credentials to access Google Cloud APIs without static keys.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">23. Summary<\/h2>\n\n\n\n<p>Container-Optimized OS is a Google-managed, container-focused operating system image for <strong>Google Cloud Compute Engine<\/strong>. It matters because it reduces OS maintenance overhead, limits host attack surface, and aligns well with immutable infrastructure practices\u2014especially when combined with <strong>Managed Instance Groups<\/strong> for rolling updates and autohealing.<\/p>\n\n\n\n<p>Cost-wise, COS itself is not a separate billed service; your spend is driven by <strong>Compute Engine VM runtime, disks, networking (especially egress), load balancing, and observability<\/strong>. Security-wise, COS helps by providing a minimal and hardened baseline, but real security still depends on <strong>IAM least privilege, firewall design, image provenance, secrets handling, and logging\/auditing<\/strong>.<\/p>\n\n\n\n<p>Use Container-Optimized OS when you want to run containers on VMs with a strong baseline and straightforward operations. 
If you need full orchestration and Kubernetes-native features, plan for <strong>GKE<\/strong>; if you want maximum simplicity and your workload fits, consider <strong>Cloud Run<\/strong>.<\/p>\n\n\n\n<p>Next step: take the lab further by putting your COS instances into a <strong>Managed Instance Group<\/strong> behind an <strong>HTTP(S) load balancer<\/strong>, using <strong>Artifact Registry<\/strong> (private images) and <strong>Secret Manager<\/strong> (runtime secrets).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Compute<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[26,51],"tags":[],"class_list":["post-628","post","type-post","status-publish","format-standard","hentry","category-compute","category-google-cloud"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/628","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/comments?post=628"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/628\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/media?parent=628"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/categories?post=628"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/tags?post=628"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}