Category
Compute
1. Introduction
What this service is
Container-Optimized OS is a Google-managed operating system image for Google Cloud Compute Engine virtual machines (VMs) that is specifically designed to run containers securely and efficiently.
Simple explanation (one paragraph)
If you want to run containers on a VM in Google Cloud without managing a general-purpose Linux distribution (packages, frequent configuration drift, large attack surface), Container-Optimized OS gives you a minimal OS that boots fast, stays locked down, and is tuned for container workloads.
Technical explanation (one paragraph)
Container-Optimized OS (often abbreviated as COS) is a hardened, minimal OS image maintained by Google, based on Chromium OS concepts (immutable / read-only root filesystem, verified boot design patterns, and automatic updates). It’s intended to be used as the host OS for container runtimes (commonly containerd, and in some contexts Docker compatibility—verify current runtime options in the official docs). COS integrates naturally with Compute Engine features (instance metadata, Managed Instance Groups, load balancing, service accounts, VPC networking) and is also a common node OS choice for Google Kubernetes Engine (GKE) node images (for example, COS variants used with containerd).
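To see which COS images Google currently publishes, you can query the public cos-cloud image project. This is a CLI sketch that requires an authenticated gcloud session, and the families shown (for example cos-stable, cos-beta, cos-dev) will vary over time:

```shell
# List Container-Optimized OS images published in Google's public
# cos-cloud image project. Release-channel families such as
# cos-stable track different update cadences (verify current families).
gcloud compute images list \
  --project=cos-cloud \
  --no-standard-images
```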
What problem it solves
It solves the “VM host management tax” for container hosting: OS patching risk, drift across fleets, oversized base images, inconsistent security baselines, and operational toil when you only need a stable host to run containers.
2. What is Container-Optimized OS?
Official purpose
Container-Optimized OS is designed by Google to provide a secure, efficient, and maintainable host environment for running containers on Compute Engine.
Core capabilities
– Run containerized workloads on Compute Engine VMs with a minimal host OS footprint.
– Reduce host attack surface compared to a general-purpose Linux OS.
– Provide automated updates and a consistent base image across fleets.
– Support container-focused deployment patterns (for example, “run a container as the VM workload” via Compute Engine’s container-on-VM workflows).
Major components (conceptual)
– Minimal OS userland: fewer packages/tools than a general-purpose distro.
– Hardened/immutable design: read-only root filesystem patterns help reduce drift and persistence of unwanted changes.
– Container runtime support: commonly containerd (and sometimes Docker-related tooling depending on the image family and use case—verify current details).
– Update system: designed for automated, reliable OS updates.
– Compute Engine integration points: instance metadata, startup configuration patterns, logging/monitoring integration paths, and compatibility with fleet constructs like Managed Instance Groups (MIGs).
Service type
Container-Optimized OS is an operating system image provided by Google Cloud for Compute Engine. It is not a separate hosted “service” with its own control plane; you select it as the boot disk image for VMs (or implicitly via workflows that create COS-based instances).
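Selecting COS as the boot disk image is an ordinary instance-creation choice. A minimal sketch (the VM name, zone, and machine type below are placeholders):

```shell
# Create a plain Compute Engine VM that boots from the latest image
# in the cos-stable family. No container is configured yet; this only
# selects Container-Optimized OS as the host OS.
gcloud compute instances create my-cos-vm \
  --zone=us-central1-a \
  --machine-type=e2-small \
  --image-family=cos-stable \
  --image-project=cos-cloud
```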
Scope (how it’s “scoped” in Google Cloud)
– Image availability: COS images are published by Google and are accessible within projects when you create Compute Engine instances (subject to permissions).
– Compute Engine resources: VMs are zonal resources; Managed Instance Groups can be zonal or regional; load balancers are global or regional depending on type.
– Operational scope: you manage COS usage per project/VPC/instance template just like other Compute Engine images.
How it fits into the Google Cloud ecosystem
– Compute Engine: primary place you use COS—single instances, MIGs, container-on-VM patterns.
– GKE: COS is widely used as a node OS option (GKE manages nodes; you choose node image type).
– Artifact Registry: store container images securely and pull from COS-hosted runtimes.
– Cloud Logging/Monitoring: standard observability stack for VM and workload telemetry (implementation details depend on your chosen agents/approach; verify COS support for specific agents).
– VPC + Cloud Load Balancing + Cloud Armor: front and secure COS-based workloads.
– IAM + Service Accounts: authorize workloads to call Google APIs without embedding credentials.
Service name status
As of the latest generally available Google Cloud documentation, the product is still called Container-Optimized OS. (If you are using it via GKE node images, you may see COS variants referenced by image type names; verify the current image type labels in GKE docs.)
Official docs entry point: https://cloud.google.com/container-optimized-os/docs
3. Why use Container-Optimized OS?
Business reasons
- Lower operational overhead: fewer OS-level tickets (patching cadence, baseline hardening, drift remediation) when your real product is the container workload.
- Standardization: a consistent host OS across dev/test/prod and across teams reduces “snowflake VM” risk.
- Faster time to production: fewer decisions about OS packages and configuration; focus on image build + deployment.
Technical reasons
- Optimized for containers: COS is built for the “container is the unit of deployment” model.
- Reduced footprint: smaller OS surface area than a typical general-purpose distro.
- Immutability patterns: a read-only root filesystem approach discourages ad-hoc changes on the host.
Operational reasons
- Fleet-friendly: works well with instance templates and Managed Instance Groups; replace instances rather than repair them.
- Predictable updates: designed to be updated regularly in a controlled way (pin image versions when necessary, or use channels—verify exact mechanics in docs).
- Faster boot and simpler host: in many environments, COS boots quickly and has fewer moving parts.
Security / compliance reasons
- Smaller attack surface: fewer packages and services.
- Hardening patterns: immutable root filesystem design, strong defaults, and automatic updates reduce exposure windows.
- Better separation of concerns: app dependencies go into container images rather than the host OS.
Scalability / performance reasons
- Works well with MIG autoscaling: you can scale out stateless container workloads by adding instances.
- Container-centric resource usage: host overhead is typically smaller than full-featured distros (workload-dependent).
When teams should choose it
Choose Container-Optimized OS when:
– You primarily run one or more containers as the VM workload.
– You want standardized, hardened hosts with minimal customization.
– You plan to use MIGs for elasticity and immutable infrastructure practices.
– You want a stepping stone between “serverless” and “full Kubernetes”:
  – more control than Cloud Run
  – less platform complexity than managing Kubernetes for small deployments
When teams should not choose it
Avoid or reconsider COS when:
– You need extensive OS customization, third-party agents that require package managers, or kernel/module tinkering.
– You rely on interactive debugging with many common Linux tools installed by default.
– Your workload expects a general-purpose VM environment (custom services, cron-heavy hosts, configuration management tools that assume writable root).
– You want a managed container platform (consider Cloud Run or GKE Autopilot).
4. Where is Container-Optimized OS used?
Industries
- SaaS and web: standardized container fleets behind load balancers.
- Fintech and regulated industries: hardened baseline + controlled patching (always validate compliance needs against official attestations; COS itself isn’t automatically a compliance certification).
- Gaming and media: burstable stateless services or edge-like services on VM fleets.
- Data platforms: containerized sidecars, lightweight services, ingestion endpoints (not the place to run full data stacks unless designed carefully).
Team types
- Platform engineering teams building VM-based container platforms.
- DevOps/SRE teams maintaining fleets of stateless services.
- Security teams standardizing hardened VM images.
- App teams that want containers on VMs without adopting Kubernetes immediately.
Workloads
- HTTP APIs and web front ends (Nginx, Envoy, app services).
- Background workers / job processors (pull from Pub/Sub, process tasks).
- Proxies, gateways, and lightweight network appliances packaged as containers.
- Build runners or CI agents packaged in containers (be careful with privilege needs).
- Internal tools that don’t justify Kubernetes overhead.
Architectures
- Single VM running a container with a public IP (small dev/test).
- MIG of COS instances pulling images from Artifact Registry, fronted by Cloud Load Balancing.
- Blue/green or canary using multiple MIGs or rolling updates of instance templates.
- Hybrid patterns: COS VMs for specific components; GKE for the rest.
Production vs dev/test usage
- Dev/test: quick, low-maintenance way to run containers on VMs; useful for validation and demos.
- Production: common when you want VM-level control (custom networking, instance types, GPUs, specialized disks) but still want container immutability and a hardened host.
5. Top Use Cases and Scenarios
Below are realistic scenarios where Container-Optimized OS on Google Cloud Compute Engine is a strong fit.
1) Single-container web service on a VM (simple hosting)
- Problem: You need to host a small web service quickly, but don’t want to maintain Ubuntu patching and packages.
- Why COS fits: Minimal host; run your container as the primary workload.
- Example: A small internal dashboard served by nginx plus a backend container on one VM for a dev environment.
2) Stateless API fleet with Managed Instance Groups
- Problem: Scale an API horizontally with predictable, repeatable hosts.
- Why COS fits: Great with instance templates + MIG; immutable rollout by replacing instances.
- Example: A regional MIG of COS instances runs my-api:1.2.3, autoscaled by CPU, fronted by an external HTTP(S) load balancer.
3) Edge proxy / gateway layer
- Problem: You need high-performance L7 proxying with strong OS hardening.
- Why COS fits: Minimal OS + containerized proxy simplifies patching and upgrades.
- Example: Envoy containers in a MIG terminate mTLS and route traffic to internal services.
4) Batch/worker nodes pulling tasks from Pub/Sub
- Problem: Worker processes must scale up and down quickly and remain consistent.
- Why COS fits: Fast to boot and easy to “replace instead of fix.”
- Example: A MIG of worker VMs runs a container that pulls jobs from Pub/Sub and writes results to Cloud Storage.
5) Secure “jump workload” containers (not jump hosts)
- Problem: You need controlled administrative tools without turning a VM into a long-lived snowflake.
- Why COS fits: Host stays minimal; tools live in container images; access is audited via IAM and OS Login/IAP.
- Example: Run a locked-down container image containing database admin CLI tools and short-lived credentials.
6) CI/CD self-hosted runners packaged as containers
- Problem: You need runners that can be replaced easily and remain clean after jobs.
- Why COS fits: Immutable host; runners in containers; replace on compromise.
- Example: GitHub Actions runners or GitLab runners in a MIG where each instance is recycled frequently.
7) Dedicated network function appliances (containerized)
- Problem: You need custom routing, NAT helpers, or observability sidecars in a controlled environment.
- Why COS fits: Predictable baseline and fewer host services.
- Example: A containerized forward proxy or DNS caching tier.
8) Pre-GKE stepping stone for teams adopting containers
- Problem: Team wants containers but isn’t ready for Kubernetes complexity.
- Why COS fits: Container workflow with VM primitives (firewall, load balancer) is simpler than Kubernetes.
- Example: Two services deployed as two MIGs; rollouts via instance template version changes.
9) Multi-tenant internal services with strict baseline controls
- Problem: Multiple teams run services on shared platform; need consistent OS baseline.
- Why COS fits: Reduced drift; centralized image selection; strong defaults.
- Example: Platform team provides an opinionated COS instance template and teams provide only container image + config.
10) Specialized Compute Engine shapes (high-memory, local SSD, etc.)
- Problem: Your workload needs VM-specific features but you still want containers.
- Why COS fits: You get Compute Engine flexibility with containerized apps.
- Example: A high-memory VM runs a containerized in-memory service with persistent disks for snapshots.
11) Blue/green rollouts using instance template versions
- Problem: You need controlled rollouts with easy rollback.
- Why COS fits: New template references new container image digest; rollback is simply switching MIG template.
- Example: Two MIGs (blue and green) behind a load balancer; shift traffic gradually.
12) Hardened internal developer preview environments
- Problem: You want short-lived preview environments without long-term host maintenance.
- Why COS fits: Easy to create and delete; predictable baseline.
- Example: Per-branch preview service runs in a COS VM for a few hours, then deleted.
6. Core Features
Note: Some implementation details (exact runtime, channels, update controls, logging agent support) can change over time. Where appropriate, this section calls out what to verify in official docs.
6.1 Minimal, container-focused OS image
- What it does: Provides a slim host OS designed primarily to run containers.
- Why it matters: Fewer packages and services typically reduce attack surface and patching scope.
- Practical benefit: Less OS maintenance; smaller baseline to secure.
- Limitations/caveats: Not suited for workloads that assume a full Linux distro with package manager-based customization.
6.2 Read-only / immutable root filesystem patterns
- What it does: Uses an immutable-style root filesystem (read-only root) design approach.
- Why it matters: Reduces configuration drift and persistence of unauthorized host changes.
- Practical benefit: Encourages immutable infrastructure (replace rather than patch-in-place).
- Limitations/caveats: Installing host packages or modifying system files is intentionally constrained; you must plan for debugging and customization differently.
6.3 Automatic updates (designed for consistent patching)
- What it does: COS is designed to receive updates from Google to address security and stability issues.
- Why it matters: Shortens exposure window to vulnerabilities and reduces manual patch operations.
- Practical benefit: Better baseline hygiene across fleets.
- Limitations/caveats: Updates can require reboots; for production, use MIG rolling updates and capacity planning. Verify current controls for update strategy in official docs.
6.4 Support for container runtimes and OCI images
- What it does: Runs standard container images (OCI/Docker image format).
- Why it matters: Your build pipeline stays standard (Cloud Build, GitHub Actions, etc.).
- Practical benefit: Build once, run anywhere containers; pull from Artifact Registry.
- Limitations/caveats: Runtime tooling differs across images and use cases (for example, containerd vs Docker). Verify the current recommended runtime and tooling in COS docs.
6.5 Tight integration with Compute Engine primitives
- What it does: COS is used like other Compute Engine images and works with:
- instance templates and MIGs
- VPC networks and firewall rules
- load balancing
- service accounts
- metadata and startup configuration patterns
- Why it matters: Lets you build production architectures with standard Google Cloud Compute building blocks.
- Practical benefit: Consistent operations with the rest of Compute Engine.
- Limitations/caveats: Some “traditional VM administration” approaches (configuration management writing to root) are not a great fit.
6.6 “Run a container as the VM workload” workflows
- What it does: Compute Engine supports deploying a container to a VM in a way that starts the container on boot (commonly done with gcloud compute instances create-with-container and/or container declarations in instance metadata).
- Why it matters: You can treat the VM as a container host appliance.
- Practical benefit: Very fast path to “container on VM” without building a custom image.
- Limitations/caveats: This is not Kubernetes. Health checks, rollouts, and multi-container orchestration are more manual unless you build them (or use MIG patterns). Confirm current container declaration capabilities in the Compute Engine containers documentation.
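A minimal sketch of this workflow, assuming a private image in Artifact Registry (the VM name, image path, and environment variable below are placeholders):

```shell
# Create a COS-based VM whose primary workload is a container that
# starts on boot. --container-env and --container-restart-policy are
# optional refinements of the basic create-with-container flow.
gcloud compute instances create-with-container web-1 \
  --zone=us-central1-a \
  --container-image=us-docker.pkg.dev/PROJECT_ID/my-repo/my-app:1.0.0 \
  --container-env=APP_ENV=production \
  --container-restart-policy=always
```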
6.7 Strong compatibility with immutable/fleet operations (MIG)
- What it does: Encourages immutable operations: update instance templates, roll instances.
- Why it matters: Predictable deployments; simpler rollback; better reliability than repairing pets.
- Practical benefit: Easier to standardize across teams.
- Limitations/caveats: Stateful workloads require extra design (persistent disks, careful draining, database patterns).
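The “update the template, roll the instances” motion can be sketched with a MIG rolling update (the MIG and template names are placeholders; verify current rollout options such as surge and unavailable limits in the MIG documentation):

```shell
# Start a rolling update that replaces instances in a regional MIG
# with instances built from a new template (which typically references
# a new container image version).
gcloud compute instance-groups managed rolling-action start-update my-mig \
  --region=us-central1 \
  --version=template=my-template-v2 \
  --max-unavailable=1
```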
6.8 Works well with Artifact Registry + private images
- What it does: Pull images securely from Artifact Registry with IAM-controlled access.
- Why it matters: Avoid unauthenticated public pulls; control provenance.
- Practical benefit: Enterprise-ready image governance.
- Limitations/caveats: You must ensure the VM’s service account has the right Artifact Registry permissions and that egress/firewall allows registry access.
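A sketch of the IAM side, assuming the VM runs as a dedicated service account (repository, location, and account names are placeholders):

```shell
# Allow the VM's service account to pull images from a specific
# Artifact Registry repository.
gcloud artifacts repositories add-iam-policy-binding my-repo \
  --location=us-central1 \
  --member="serviceAccount:vm-sa@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/artifactregistry.reader"
```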
6.9 Designed for secure boot patterns (verify features used)
- What it does: COS is designed with verified boot concepts (Chromium OS heritage).
- Why it matters: Integrity of the host OS is central to container security.
- Practical benefit: Better baseline trust.
- Limitations/caveats: Compute Engine also has Shielded VM features; confirm compatibility and best practices for COS + Shielded VM in official docs.
7. Architecture and How It Works
High-level architecture
At its simplest, Container-Optimized OS is:
– A Compute Engine VM
– Booting from a COS image
– Running one or more containers as the workload
– Connected to a VPC network
– Observed via Cloud Logging/Monitoring and governed via IAM
Control flow vs data flow
- Control plane (Google Cloud):
- You define a VM or instance template referencing a COS image family/version.
- You optionally provide container configuration (image, env vars, restart policy) via metadata or “create-with-container”.
- IAM decides who can create/modify instances, firewall rules, service accounts, and who can SSH (for example via OS Login).
- Data plane (your workload):
- Traffic hits a VM external IP or a load balancer.
- The container receives traffic on its exposed port.
- The container calls other services (databases, Pub/Sub, Storage) using service account credentials.
Integrations with related services
Common and practical integrations include:
– Artifact Registry for private container images.
– Cloud Load Balancing for global/regional front ends.
– Managed Instance Groups for scaling and rolling updates.
– Cloud DNS for naming.
– Secret Manager (recommended) for secrets retrieved at runtime by the app, rather than stored in instance metadata.
– Cloud Logging and Cloud Monitoring for logs/metrics (agent approach varies; verify the recommended agent approach for COS in official docs).
– Cloud Armor to protect HTTP(S) services from common attacks when using HTTP(S) load balancing.
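For the Secret Manager pattern, the app (or an entrypoint script) fetches secrets at runtime instead of reading them from instance metadata. A CLI sketch, assuming the secret name is a placeholder and the VM’s service account has roles/secretmanager.secretAccessor:

```shell
# Fetch the latest version of a secret at container startup instead
# of baking it into instance metadata or the image.
DB_PASSWORD="$(gcloud secrets versions access latest --secret=db-password)"
export DB_PASSWORD
```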
Dependency services
- Compute Engine API is required.
- If using private images: Artifact Registry API and IAM bindings.
- If using load balancing: additional networking and load balancing APIs/resources.
Security/authentication model
- Google Cloud IAM: controls who can create/modify instances and associated resources.
- Service accounts: attached to VMs to grant workload access to Google APIs.
- OS Login / IAM-based SSH (recommended): use IAM to control SSH access and log it.
- Firewall rules: enforce network exposure at VPC level.
- Container image security: depends on your build pipeline, scanning, and provenance controls.
Networking model
- VMs attach to a VPC network/subnet.
- Ingress is controlled by firewall rules and (optionally) load balancers.
- Egress follows VPC routing/NAT; consider Cloud NAT if you want private instances without external IPs.
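If you drop external IPs, outbound access (for example, pulling public container images) needs Cloud NAT. A minimal sketch (router and NAT names and the region are placeholders):

```shell
# Create a Cloud Router and a Cloud NAT gateway so private COS VMs
# (no external IPs) can still make outbound internet connections.
gcloud compute routers create cos-router \
  --network=default \
  --region=us-central1

gcloud compute routers nats create cos-nat \
  --router=cos-router \
  --region=us-central1 \
  --auto-allocate-nat-external-ips \
  --nat-all-subnet-ip-ranges
```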
Monitoring/logging/governance considerations
- Decide how you will:
- collect host and container logs
- collect metrics and traces
- patch/roll instances safely (MIG rolling update)
- tag and label resources for cost allocation
- The best practice is to treat COS instances as replaceable and to externalize state.
Simple architecture diagram (Mermaid)
flowchart LR
User((User)) -->|HTTP| FW[Firewall rule]
FW --> VM[COS VM<br/>Container-Optimized OS]
VM --> C[Container<br/>Web App]
C --> GCP[(Google APIs<br/>via Service Account)]
Production-style architecture diagram (Mermaid)
flowchart TB
subgraph Internet[Internet]
U((Users))
end
subgraph GCP[Google Cloud Project]
direction TB
LB["External HTTP(S) Load Balancer"]
ARMOR[Cloud Armor Policy]
DNS[Cloud DNS]
subgraph VPC[VPC Network]
direction TB
MIG[Regional Managed Instance Group<br/>COS instances]
HC[Health Checks]
FW2[Firewall Rules]
NAT["Cloud NAT (optional)"]
end
AR[Artifact Registry<br/>Private Images]
SM[Secret Manager]
LOG[Cloud Logging]
MON[Cloud Monitoring]
IAM[IAM + Service Accounts]
end
U -->|DNS| DNS --> LB
LB --> ARMOR --> MIG
HC --> MIG
FW2 --> MIG
MIG -->|pull image| AR
MIG -->|fetch secrets at runtime| SM
MIG --> LOG
MIG --> MON
IAM --> MIG
MIG --> NAT
8. Prerequisites
Account / project requirements
- A Google Cloud project with billing enabled.
- Ability to enable required APIs in the project.
Permissions / IAM roles
For a lab in a personal sandbox project, Project Owner is simplest.
For least-privilege in a real environment, you typically need:
– Permissions to create and manage Compute Engine instances (for example, roles/compute.instanceAdmin.v1)
– Permissions to create firewall rules if you do that in the lab (for example, roles/compute.securityAdmin or roles/compute.networkAdmin)
– Permission to use a service account if attaching one (roles/iam.serviceAccountUser on that service account)
Exact least-privilege depends on your organization policies; verify with your IAM admins.
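Granting one of these roles looks like the following (the project ID and principal are placeholders; in real organizations, prefer groups and run your own least-privilege review):

```shell
# Grant instance administration on the project to a user.
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="user:dev@example.com" \
  --role="roles/compute.instanceAdmin.v1"

# Allow that user to attach a specific service account to VMs.
gcloud iam service-accounts add-iam-policy-binding \
  vm-sa@PROJECT_ID.iam.gserviceaccount.com \
  --member="user:dev@example.com" \
  --role="roles/iam.serviceAccountUser"
```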
Billing requirements
- Compute Engine resources incur charges (VM core/RAM time, disks, IPs, load balancing, egress).
- Container-Optimized OS itself is an image; pricing is primarily for the underlying Compute Engine resources.
CLI / tools
- Cloud Shell (recommended) or a local installation of the gcloud CLI: https://cloud.google.com/sdk/docs/install
- Optional: curl for testing endpoints.
Region availability
- COS images are used in Compute Engine, which is available across many regions/zones. Choose a zone close to your users and other dependencies.
- Some machine types and features are region/zone dependent. Verify in Compute Engine docs if you need specific hardware.
Quotas / limits
Common quotas to check:
– vCPU quota in your chosen region
– In-use IP addresses
– Firewall rules quota (usually not an issue in small labs)
– If using MIG/LB later: forwarding rules and backend service quotas
Prerequisite services
- Compute Engine API must be enabled.
9. Pricing / Cost
Pricing model (accurate framing)
Container-Optimized OS does not have a separate SKU you pay for like a managed service. Your costs come from the Compute Engine resources you run COS on, plus any connected services (load balancer, disks, logs, egress, Artifact Registry, etc.).
Primary official pricing references:
– Compute Engine pricing: https://cloud.google.com/compute/pricing
– Google Cloud Pricing Calculator: https://cloud.google.com/products/calculator
Pricing dimensions to understand
You typically pay for:
1. VM runtime: vCPU + memory pricing by machine type and region.
2. Boot disk and attached disks: persistent disk type (balanced, SSD, standard), size (GB-month), and IOPS/throughput characteristics depending on disk type.
3. Networking:
   – Egress to the internet (often a major driver)
   – Cross-region traffic
   – Load balancer data processing (if used)
4. External IP: depending on how the IP is used (ephemeral vs reserved, attached vs unused), pricing can vary; verify current external IP pricing in official docs.
5. Operations suite (Logging/Monitoring): logs ingestion/retention and metrics beyond free allocations.
6. Artifact Registry:
   – storage for container images
   – network egress when pulling images across regions (and general network costs)
7. Optional security:
   – Cloud Armor policies and rules
   – KMS usage if you add customer-managed encryption keys (CMEK) to disks or other resources
Free tier (if applicable)
Google Cloud has an “Always Free” tier for some resources in some regions (historically including a small VM). Eligibility and details change over time and vary by region and usage. Verify current Always Free eligibility in official docs before assuming a workload is free.
Cost drivers (what surprises teams)
- Internet egress from serving traffic publicly can exceed compute costs.
- Overprovisioned machine types: using a larger instance than necessary for a small container.
- Log volume: chatty containers can generate expensive log ingestion.
- Load balancer + multiple zones: great for reliability, but adds cost.
- Image pull patterns: frequent instance recreation can cause frequent image pulls (and potential egress) if not regionally optimized.
Hidden/indirect costs
- Engineering time for:
- secure image supply chain
- rollouts/rollbacks
- secrets management
- observability
- If you move from “single VM” to “production fleet”, costs often shift to:
- load balancing
- monitoring/logging
- security controls (Armor, WAF-like policies)
- multi-zone redundancy
How to optimize cost (practical)
- Right-size instances (start small; measure CPU/memory).
- Use Managed Instance Groups with autoscaling for variable traffic.
- Take advantage of Sustained Use Discounts (applied automatically where applicable) and evaluate Committed Use Discounts for steady workloads (verify current discount applicability in the Compute Engine pricing docs).
- Reduce log volume:
- tune application logging levels
- apply log exclusions in Cloud Logging if appropriate
- Keep Artifact Registry in the same region as your compute fleet to minimize latency and cross-region egress.
- Prefer private instances behind a load balancer + Cloud NAT if you don’t need per-VM public IPs.
Example low-cost starter estimate (no fabricated prices)
A minimal lab setup often includes:
– 1 small VM instance (e.g., an E2-family small machine type)
– 1 small boot disk
– 1 firewall rule
– Minimal internet egress (a few MB for testing)
To estimate your real cost:
1. Open the Pricing Calculator: https://cloud.google.com/products/calculator
2. Add “Compute Engine”.
3. Select your region, machine type, usage hours (e.g., a few hours), disk type/size.
4. Add expected internet egress (even small amounts).
Because compute and network pricing are region-dependent, do not rely on a single universal number.
Example production cost considerations
A typical production pattern (MIG + load balancer) adds:
– Multiple instances across zones (or a regional MIG)
– Load balancer components (forwarding rules, proxies, backend service)
– Health checks
– Higher log and metric volume
– Potential Cloud Armor usage
– More egress volume
Use the calculator with:
– your steady-state instance count
– expected peak scaling
– expected requests/GB egress
– log volume (if you can estimate it)
10. Step-by-Step Hands-On Tutorial
This lab deploys a real container to a Compute Engine VM running Container-Optimized OS and exposes it over HTTP for quick validation.
Objective
- Create a Compute Engine VM that uses Container-Optimized OS and automatically runs an nginx container.
- Allow inbound HTTP traffic.
- Validate the service.
- Clean up resources to avoid ongoing charges.
Lab Overview
You will:
1. Set your project and enable the Compute Engine API.
2. Create a firewall rule to allow inbound TCP port 80.
3. Create a COS-based VM using gcloud compute instances create-with-container.
4. Validate with curl.
5. Troubleshoot common issues.
6. Delete the VM and firewall rule.
Why create-with-container?
It’s the most beginner-friendly way to run a container as the “main” VM workload on Container-Optimized OS without building a custom image.
Expected cost
Low, if you:
– use a small VM
– keep the lab running only briefly
– generate minimal egress
Always verify pricing for your region and account.
Step 1: Select a project, region, and enable the API
In Cloud Shell, run:
gcloud auth list
gcloud config list project
Set your project:
export PROJECT_ID="YOUR_PROJECT_ID"
gcloud config set project "${PROJECT_ID}"
Pick a zone (example: us-central1-a). Choose one close to you:
export ZONE="us-central1-a"
gcloud config set compute/zone "${ZONE}"
Enable the Compute Engine API:
gcloud services enable compute.googleapis.com
Expected outcome
– Compute Engine API is enabled.
– Your gcloud default project and zone are set.
Step 2: Create a firewall rule to allow HTTP (port 80)
Create a firewall rule that allows inbound TCP:80 to instances with a specific network tag.
gcloud compute firewall-rules create allow-http-80 \
--direction=INGRESS \
--priority=1000 \
--network=default \
--action=ALLOW \
--rules=tcp:80 \
--source-ranges=0.0.0.0/0 \
--target-tags=cos-http
Expected outcome
– A firewall rule named allow-http-80 exists in your project.
– Only instances tagged cos-http will be reachable on port 80.
Verification
gcloud compute firewall-rules describe allow-http-80 --format="value(name,network,direction,allowed[].IPProtocol,allowed[].ports)"
Step 3: Create a Container-Optimized OS VM that runs Nginx
Create a VM and specify a container image to run. This command will:
– create the VM
– use a COS-based container-VM workflow
– start the container on boot
export VM_NAME="cos-nginx-1"
gcloud compute instances create-with-container "${VM_NAME}" \
--tags=cos-http \
--machine-type=e2-micro \
--container-image=nginx:stable
Notes:
– e2-micro is a small machine type commonly used for labs, but availability and cost depend on region. If it fails due to quota or availability, try e2-small.
– The container image is pulled from a public registry in this example. For production, prefer Artifact Registry with IAM-controlled access.
Expected outcome
– A VM instance is created.
– The Nginx container starts automatically.
– The VM has an external IP (by default, in the default VPC unless you changed defaults).
Verification
Describe the instance and capture its external IP:
gcloud compute instances describe "${VM_NAME}" --format="get(networkInterfaces[0].accessConfigs[0].natIP)"
Store it:
export EXTERNAL_IP="$(gcloud compute instances describe "${VM_NAME}" --format="get(networkInterfaces[0].accessConfigs[0].natIP)")"
echo "External IP: ${EXTERNAL_IP}"
Step 4: Test the web server from Cloud Shell
Run:
curl -i "http://${EXTERNAL_IP}/"
Expected outcome
– You receive an HTTP response (typically HTTP/1.1 200 OK) and see the Nginx welcome HTML.
If you want to see headers only:
curl -I "http://${EXTERNAL_IP}/"
Step 5: Basic operational checks (instance + container)
Check instance status:
gcloud compute instances describe "${VM_NAME}" --format="value(status)"
If you need to SSH for deeper debugging:
gcloud compute ssh "${VM_NAME}"
Once connected, you can inspect system logs with journalctl (available on many systemd-based systems). The exact unit names and container supervisor depend on the container-on-VM implementation. If you don’t immediately see container logs, use:
sudo journalctl --no-pager -n 200
If the container-on-VM workflow uses a dedicated service unit, you can list units and search:
sudo systemctl list-units --type=service | head
sudo systemctl list-units --type=service | grep -i -E "container|konlet|docker|containerd" || true
If you require a precise “which service starts the container” answer for your chosen image family, verify in the official Compute Engine containers documentation, because the underlying components and naming can evolve.
Exit SSH:
exit
Validation
You have successfully validated that:
– Container-Optimized OS can run a container workload on Compute Engine.
– The workload is reachable over HTTP.
– You can operate it using standard Compute Engine tooling.
A quick final validation summary:
echo "VM: ${VM_NAME}"
echo "IP: ${EXTERNAL_IP}"
curl -I "http://${EXTERNAL_IP}/" | head -n 1
Troubleshooting
Issue: curl times out / cannot connect
Common causes and fixes:
1. Firewall rule missing or wrong tag
– Ensure the VM has the tag cos-http:
gcloud compute instances describe "${VM_NAME}" --format="value(tags.items)"
– Ensure the firewall rule targets that tag:
gcloud compute firewall-rules describe allow-http-80 --format="value(targetTags)"
2. Wrong IP
– Re-check the external IP:
gcloud compute instances describe "${VM_NAME}" --format="get(networkInterfaces[0].accessConfigs[0].natIP)"
3. Container not running
– SSH in and inspect logs (journalctl) as shown above.
– Recreate the instance if needed (in immutable style, replacing is often faster than deep repair).
Issue: create-with-container fails with permissions error
- Ensure you have permissions to create instances.
- In managed orgs, Organization Policy may block external IPs or public firewall rules. If so:
- Use an internal load balancer / private access patterns
- Or request policy exceptions in a sandbox project
Issue: Machine type not available / quota exceeded
- Try a different zone:
gcloud compute zones list --filter="region:(us-central1)" --format="value(name)"
- Try a different machine type (e.g., e2-small).
Cleanup
Delete the VM:
gcloud compute instances delete "${VM_NAME}" --quiet
Delete the firewall rule:
gcloud compute firewall-rules delete allow-http-80 --quiet
Verify cleanup:
gcloud compute instances list --filter="name=${VM_NAME}"
gcloud compute firewall-rules list --filter="name=allow-http-80"
11. Best Practices
Architecture best practices
- Prefer Managed Instance Groups for production:
- Enables rolling updates and autohealing.
- Makes “replace instances” the standard remediation.
- Externalize state:
- Store data in managed services (Cloud SQL, Spanner, Firestore) or persistent disks designed for that purpose.
- Keep COS VMs as stateless as possible.
- Use load balancers instead of per-VM public IPs:
- Better security posture and easier TLS, health checks, and scaling.
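As an illustrative sketch of the MIG pattern above (the names api-template, api-mig, and api-health-check, plus the image path, are placeholders — verify current flags in the gcloud reference):

```shell
# Create a reusable instance template that runs the container.
gcloud compute instance-templates create-with-container api-template \
  --machine-type=e2-small \
  --container-image=us-central1-docker.pkg.dev/my-project/my-repo/api@sha256:<digest> \
  --tags=cos-http

# Create a regional MIG from the template for multi-zone resilience.
gcloud compute instance-groups managed create api-mig \
  --region=us-central1 \
  --size=3 \
  --template=api-template

# Attach autohealing via a pre-created health check.
gcloud compute instance-groups managed update api-mig \
  --region=us-central1 \
  --health-check=api-health-check \
  --initial-delay=120
```

With this in place, "replace instances" becomes the standard remediation: delete an unhealthy VM and the MIG recreates it from the template.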
IAM and security best practices
- Use dedicated service accounts per workload and grant least privilege.
- Use OS Login (and ideally IAP for SSH) to avoid unmanaged SSH keys.
- Restrict firewall rules:
  - Avoid 0.0.0.0/0 unless necessary.
  - Limit inbound ports; default deny.
- Pin and verify container images:
  - Prefer immutable image references (digests) in production rather than mutable tags like latest.
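A hedged sketch of the firewall and digest-pinning points (the rule name, source ranges, and image path are examples; 130.211.0.0/22 and 35.191.0.0/16 are Google load balancer/health check ranges — verify against current docs before relying on them):

```shell
# Restrict HTTP ingress to known source ranges and a target tag, not 0.0.0.0/0.
gcloud compute firewall-rules create allow-http-restricted \
  --allow=tcp:80 \
  --source-ranges=130.211.0.0/22,35.191.0.0/16 \
  --target-tags=cos-http

# Reference the container image by immutable digest instead of a mutable tag.
gcloud compute instances create-with-container web-1 \
  --container-image=us-central1-docker.pkg.dev/my-project/my-repo/web@sha256:<digest>
```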
Cost best practices
- Right-size aggressively; measure real CPU/memory usage.
- Use autoscaling with MIGs for variable workloads.
- Minimize egress and cross-region pulls (keep Artifact Registry close to compute).
- Control log volume; implement log exclusions where appropriate.
Performance best practices
- Keep container images small; optimize layer caching.
- Use regional placement to reduce latency to dependencies.
- Ensure health checks are representative and not overly expensive.
Reliability best practices
- Run across multiple zones (regional MIG) for high availability.
- Use load balancer health checks and autohealing.
- Design for instance replacement during updates and failures.
- Implement graceful shutdown in your application so rolling updates don’t drop requests.
Operations best practices
- Standardize instance templates and use labels for ownership and environment (env=prod,team=payments).
- Maintain a documented rollout process (update template, rolling update parameters, rollback).
- Keep a break-glass procedure for emergency access that is auditable and time-bound.
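For the labeling practice above, a minimal example (label values mirror the env=prod,team=payments convention; flags are standard gcloud but verify against the current reference):

```shell
# Apply ownership/environment labels to an existing instance.
gcloud compute instances add-labels "${VM_NAME}" --labels=env=prod,team=payments

# Labels then drive filtering for inventory and cost attribution.
gcloud compute instances list --filter="labels.env=prod" \
  --format="table(name,labels.list())"
```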
Governance/tagging/naming best practices
- Consistent naming: svc-env-region-role-### (example: api-prod-uscentral1-web-001)
- Use labels for:
- cost center
- data sensitivity tier
- owner/oncall
- Track COS image family/version and container image digest in deployment records.
12. Security Considerations
Identity and access model
- IAM governs:
- who can create/modify/delete instances, templates, firewall rules
- who can attach service accounts and what scopes/permissions workloads get
- Service accounts are the recommended way for apps to access Google Cloud APIs.
- OS Login integrates Linux account access with IAM and helps centralize auditability.
Encryption
- At rest: Compute Engine disks are encrypted by default. For stricter requirements, consider CMEK (customer-managed keys) for disks (verify the current CMEK support and configuration in Compute Engine docs).
- In transit: Use TLS termination at the load balancer or in the container. Prefer managed certificates and modern TLS policies where applicable.
Network exposure
- Avoid giving every instance a public IP in production.
- Use:
- External HTTP(S) Load Balancer (public entry)
- Private instances in subnets
- Cloud NAT for outbound access without inbound exposure
- Use firewall rules with least exposure and target tags/service accounts.
Secrets handling
- Avoid storing secrets in:
- instance metadata
- container image layers
- source control
- Prefer Secret Manager and fetch secrets at runtime using the VM’s service account identity.
- Rotate secrets and use short-lived credentials where possible.
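A sketch of runtime secret retrieval using the VM’s service account (the secret name api-key and project my-project are placeholders; run this from your application container, since tooling availability on the COS host itself is limited):

```shell
# Option A: gcloud, if present in your tooling container.
gcloud secrets versions access latest --secret=api-key

# Option B: raw REST call using the metadata-server token (no gcloud needed).
TOKEN="$(curl -s -H 'Metadata-Flavor: Google' \
  'http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token' \
  | python3 -c 'import sys,json; print(json.load(sys.stdin)["access_token"])')"
curl -s -H "Authorization: Bearer ${TOKEN}" \
  "https://secretmanager.googleapis.com/v1/projects/my-project/secrets/api-key/versions/latest:access" \
  | python3 -c 'import sys,json,base64; print(base64.b64decode(json.load(sys.stdin)["payload"]["data"]).decode())'
```

The key property: no static credential is baked into the image or metadata; access is governed by the service account’s IAM bindings.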
Audit/logging
- Use Cloud Audit Logs for administrative actions (instance creation, firewall changes, IAM changes).
- Ensure you can attribute:
- who deployed a new container version
- who changed network exposure
- who accessed instances (OS Login + IAP logs, where used)
- For workload logs:
- centralize to Cloud Logging (agent/collection method depends on your approach; verify the recommended method for COS).
Compliance considerations
- COS can support secure operations, but compliance depends on:
- your configuration
- identity controls
- logging/retention
- vulnerability management
- network boundaries
Always validate requirements against Google Cloud compliance documentation and your auditor’s needs.
Common security mistakes
- Leaving SSH open to the internet with weak key management.
- Using wide firewall rules (0.0.0.0/0) for admin ports.
- Running containers as root unnecessarily.
- Pulling public images without provenance checks.
- Using mutable tags (latest) in production.
- Treating COS hosts as “pet servers” and making manual changes that aren’t reproducible.
Secure deployment recommendations
- Prefer MIG + LB architecture.
- Use private images in Artifact Registry with IAM.
- Use image scanning/provenance in your CI pipeline (verify your chosen tooling).
- Restrict metadata exposure and avoid sensitive data in metadata.
- Implement runtime security controls in the application and container configuration (non-root user, read-only filesystem in container where possible).
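As one way to express the non-root/read-only recommendation (shown with plain docker run flags; adapt to whatever supervisor your container-on-VM workflow uses — the UID, ports, and image path are placeholders):

```shell
# Runtime hardening flags: non-root user, read-only root FS, minimal capabilities.
docker run -d \
  --read-only \
  --user 1000:1000 \
  --cap-drop=ALL \
  --tmpfs /tmp \
  -p 80:8080 \
  us-central1-docker.pkg.dev/my-project/my-repo/web@sha256:<digest>
```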
13. Limitations and Gotchas
Known limitations / design constraints
- Not a general-purpose Linux distro: package installation and customization are intentionally limited.
- Debugging friction: fewer built-in tools; you may need “debug containers” or dedicated debugging workflows.
- Host persistence model differs: immutable root patterns mean some changes won’t persist or are discouraged.
- Multi-container orchestration is limited without Kubernetes or additional tooling: the simplest workflows assume “one main container per VM” or require you to build your own supervisor approach.
Quotas and scaling constraints
- vCPU quota and IP quota can block scaling.
- Load balancing quotas can surprise teams when moving to production patterns.
Regional constraints
- Some machine types and accelerators are zone-specific.
- Keep Artifact Registry and compute in compatible regions to avoid latency/egress.
Pricing surprises
- Internet egress for public services.
- Log ingestion volume from chatty containers.
- External IP charges depending on usage type (verify current billing rules).
Compatibility issues
- Some third-party monitoring/security agents assume they can install packages or write broadly to the filesystem.
- Kernel module requirements can be tricky; verify whether your workload needs specific kernel modules/drivers.
Operational gotchas
- If you treat instances as mutable pets, you’ll fight the platform.
- Updates/reboots must be planned for (MIG rolling updates help).
- Container image pull failures (auth, network) can cause instances to come up “healthy VM but unhealthy app.”
Migration challenges
- Moving from Ubuntu to COS may require:
- rebuilding host-installed software into container images
- redesigning log collection
- changing SSH/debug habits
Vendor-specific nuances
- COS is deeply integrated with Google Cloud’s Compute Engine model. If you need portability across clouds at the VM OS level, consider whether a more generic OS (or Kubernetes) is a better abstraction.
14. Comparison with Alternatives
In Google Cloud (nearest options)
- Ubuntu/Debian on Compute Engine: flexible general-purpose OS, more host maintenance.
- GKE Standard / Autopilot: managed Kubernetes; more features for orchestration and scale, but more platform complexity.
- Cloud Run: serverless containers; simplest ops model but less VM-level control and some workload constraints.
In other clouds (nearest conceptual peers)
- AWS Bottlerocket: container-optimized OS for ECS/EKS.
- Azure Linux / CBL-Mariner-based container host patterns: Microsoft has container host OS patterns; exact product choices vary—verify current Azure recommendations.
- Self-managed minimal OS: Fedora CoreOS, Flatcar, etc., when you want an immutable OS with different ecosystem tradeoffs.
Comparison table
| Option | Best For | Strengths | Weaknesses | When to Choose |
|---|---|---|---|---|
| Container-Optimized OS (Google Cloud) | Container workloads on Compute Engine VMs | Minimal/hardened host, designed for containers, good for MIG patterns | Limited host customization, different debugging model | You want containers on VMs with strong baseline and low OS toil |
| Ubuntu/Debian on Compute Engine | Mixed workloads, custom agents, traditional VM ops | Familiar tooling, package managers, broad compatibility | Larger attack surface, more patching/drift risk | You need broad OS flexibility or legacy software |
| GKE Standard | Kubernetes-managed container platforms | Rich orchestration, scaling, service discovery, policies | Kubernetes operational overhead (though managed) | You have multiple services and want Kubernetes features |
| GKE Autopilot | “Kubernetes with less ops” | Less node management, opinionated best practices | Less infrastructure control, different cost model | You want Kubernetes but don’t want to manage nodes |
| Cloud Run | Stateless HTTP services and event-driven containers | Very low ops, fast deploys, scale to zero | Platform constraints (request/response model, execution limits), less network control | You want serverless simplicity and fit the model |
| AWS Bottlerocket (AWS) | Container hosts in AWS | Minimal immutable OS for containers | Different cloud, different integrations | Multi-cloud comparison; choose if you’re on AWS |
| Fedora CoreOS / Flatcar (self-managed) | Immutable OS approach with broader control | Strong immutability story, flexible environments | You manage lifecycle and integration | You need an immutable OS but prefer non-cloud-vendor images |
15. Real-World Example
Enterprise example: Secure API fleet on Compute Engine with controlled rollout
- Problem: A large enterprise has strict security requirements and wants to reduce VM drift. They run containerized APIs that must integrate with existing VPCs, shared load balancers, and IAM.
- Proposed architecture:
- Artifact Registry for private images
- Cloud Build pipeline builds and signs images (signing approach depends on chosen tooling; verify)
- Regional MIG of COS instances using an instance template that references a pinned COS image family/version
- External HTTP(S) Load Balancer + Cloud Armor in front
- Workloads use service accounts to access Pub/Sub and Cloud SQL
- Centralized logging/monitoring with alerting tied to SLOs
- Why COS was chosen:
- Minimal host OS reduces attack surface and drift
- Automated updates fit enterprise patching goals when paired with MIG rolling updates
- Clear separation: “host is appliance, app is container”
- Expected outcomes:
- Faster, safer rollouts (template update → rolling update)
- Reduced OS vulnerabilities window and fewer manual patch cycles
- Consistent baseline across environments
Startup/small-team example: Simple container hosting without Kubernetes
- Problem: A startup needs a reliable service host for one API and one worker, but Kubernetes is too heavy for current team size.
- Proposed architecture:
- COS VM(s) running container workloads
- A small MIG for the API behind a load balancer
- Worker service on a separate MIG without public ingress
- Artifact Registry for images
- Secret Manager for API keys
- Why COS was chosen:
- Reduced ops compared to Ubuntu patch management
- Easier than Kubernetes while still enabling immutable deployments
- Expected outcomes:
- Simple deploy pipeline: build image → update template → roll
- Lower operational burden and predictable environment
- A clear growth path to GKE later if needed
16. FAQ
1) Is Container-Optimized OS a separate billed product?
No. You pay for the Compute Engine resources (VMs, disks, network, etc.). COS is an OS image you choose for instances.
2) Can I SSH into a COS VM?
Yes, you can SSH like other Compute Engine VMs (subject to IAM and network controls). Use OS Login/IAP where possible for better security.
3) Can I install packages with apt or yum?
Typically, no—COS is not meant to be managed like a general-purpose distro. Put dependencies in container images instead.
4) How do OS updates work?
COS is designed to receive automated updates. For production, plan for reboots and use MIG rolling updates. Verify current update controls/channels in official docs.
5) Is COS only for a single container per VM?
Many common workflows assume one primary container, but you can run additional containers depending on your chosen approach. If you need multi-container orchestration with service discovery and rollouts, consider GKE.
6) Should I use COS or GKE?
Use COS on Compute Engine when you want VM-based control and simpler operations for a smaller set of services. Use GKE when you need Kubernetes orchestration features, multi-service scheduling, and Kubernetes-native policies.
7) Does COS work with Managed Instance Groups?
Yes. COS is often used with MIGs for autohealing, autoscaling, and rolling updates.
8) How do I pull private images from Artifact Registry?
Attach a service account to the VM with Artifact Registry read permissions and ensure network access to the registry endpoint. Verify the exact required IAM role(s) in Artifact Registry docs.
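A sketch of the IAM grant (repository, location, and service account are placeholders; roles/artifactregistry.reader is the commonly documented read role, but verify current role names in the Artifact Registry docs):

```shell
# Grant the VM's service account read access to one repository.
gcloud artifacts repositories add-iam-policy-binding my-repo \
  --location=us-central1 \
  --member="serviceAccount:my-vm-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/artifactregistry.reader"
```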
9) Where should I store secrets for COS workloads?
Use Secret Manager and fetch secrets at runtime using the VM’s service account identity. Avoid embedding secrets in metadata or images.
10) How do I handle persistent storage?
Prefer managed services. If you must persist files, use persistent disks or other Google Cloud storage products. Design carefully so instance replacement does not lose state.
11) Is COS “more secure” than Ubuntu by default?
It’s designed with a smaller footprint and hardened patterns, which can reduce attack surface. Security still depends heavily on your container image, IAM, network exposure, and operational practices.
12) Can I run non-container workloads on COS?
COS is intended for containers. If you need general-purpose workloads or host-installed software, use a general-purpose OS image.
13) How do I do blue/green deployments with COS?
Commonly: create a new instance template (new container image digest), roll a new MIG or update an existing MIG with controlled rollout, and switch traffic via load balancer backends.
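A sketch of the in-place controlled rollout variant (names api-template-v2 and api-mig are placeholders; verify flags in the gcloud rolling-action reference):

```shell
# New template pinned to the new image digest.
gcloud compute instance-templates create-with-container api-template-v2 \
  --container-image=us-central1-docker.pkg.dev/my-project/my-repo/api@sha256:<new-digest>

# Roll the MIG forward one instance at a time, keeping capacity constant.
gcloud compute instance-groups managed rolling-action start-update api-mig \
  --region=us-central1 \
  --version=template=api-template-v2 \
  --max-surge=1 --max-unavailable=0
```

For true blue/green, create a second MIG instead and shift traffic between load balancer backends.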
14) How do I observe container logs and metrics?
Use Cloud Logging/Monitoring. The exact agent/collection method depends on COS support and your chosen approach. Verify the current recommended method in official docs.
15) What’s the difference between COS on Compute Engine and COS as GKE node image?
On Compute Engine, you manage the VM lifecycle and container startup method. On GKE, Google (or you, depending on mode) manages nodes and Kubernetes orchestrates containers.
16) Can I use COS for highly regulated environments?
Possibly, but you must validate the entire system (IAM, logging, encryption, network boundaries, patching processes) against your compliance framework. Don’t assume compliance from OS choice alone.
17) Do I need a public IP for a COS VM?
No. Many production designs use private VMs behind a load balancer, and use Cloud NAT for outbound access.
17. Top Online Resources to Learn Container-Optimized OS
| Resource Type | Name | Why It Is Useful |
|---|---|---|
| Official documentation | Container-Optimized OS docs — https://cloud.google.com/container-optimized-os/docs | Primary source for COS concepts, images, security model, and operations |
| Official release notes | Container-Optimized OS release notes — https://cloud.google.com/container-optimized-os/docs/release-notes | Track security fixes, version changes, and behavioral updates |
| Official Compute Engine containers guide | Deploying containers on VMs (Compute Engine) — https://cloud.google.com/compute/docs/containers | Authoritative guide for create-with-container and container declaration patterns |
| Official pricing | Compute Engine pricing — https://cloud.google.com/compute/pricing | COS cost is primarily Compute Engine cost; this is the base pricing reference |
| Pricing calculator | Google Cloud Pricing Calculator — https://cloud.google.com/products/calculator | Build a region-specific estimate including disks, egress, and load balancing |
| Architecture guidance | Google Cloud Architecture Center — https://cloud.google.com/architecture | Patterns for MIGs, load balancing, security, and operations |
| Observability | Cloud Operations suite docs — https://cloud.google.com/products/operations | Logging/Monitoring patterns that apply to VM/container architectures |
| Container image registry | Artifact Registry docs — https://cloud.google.com/artifact-registry/docs | Secure private image storage and IAM-controlled access |
| Security/IAM | IAM overview — https://cloud.google.com/iam/docs/overview | Correct identity model for VM/container workloads |
| Tutorials (official) | Compute Engine tutorials — https://cloud.google.com/compute/docs/tutorials | VM patterns that often pair well with COS (MIGs, LBs, networking) |
18. Training and Certification Providers
| Institute | Suitable Audience | Likely Learning Focus | Mode | Website URL |
|---|---|---|---|---|
| DevOpsSchool.com | DevOps engineers, SREs, platform teams | DevOps tooling, cloud operations, CI/CD, container operations | Check website | https://www.devopsschool.com/ |
| ScmGalaxy.com | Beginners to intermediate DevOps learners | SCM, DevOps fundamentals, build/release practices | Check website | https://www.scmgalaxy.com/ |
| CLoudOpsNow.in | Cloud ops practitioners | Cloud operations, monitoring, automation | Check website | https://www.cloudopsnow.in/ |
| SreSchool.com | SREs and reliability-focused engineers | SRE practices, reliability engineering, observability | Check website | https://www.sreschool.com/ |
| AiOpsSchool.com | Ops teams exploring AIOps | AIOps concepts, automation, monitoring analytics | Check website | https://www.aiopsschool.com/ |
19. Top Trainers
| Platform/Site | Likely Specialization | Suitable Audience | Website URL |
|---|---|---|---|
| RajeshKumar.xyz | DevOps/Cloud training content (verify offering) | Individuals and teams seeking guided learning | https://www.rajeshkumar.xyz/ |
| devopstrainer.in | DevOps training (verify course catalog) | Beginners to advanced DevOps learners | https://www.devopstrainer.in/ |
| devopsfreelancer.com | Freelance DevOps guidance/services (treat as a resource platform unless verified) | Teams needing short-term expert help | https://www.devopsfreelancer.com/ |
| devopssupport.in | DevOps support/training resources (verify scope) | Engineers needing practical support | https://www.devopssupport.in/ |
20. Top Consulting Companies
| Company Name | Likely Service Area | Where They May Help | Consulting Use Case Examples | Website URL |
|---|---|---|---|---|
| cotocus.com | DevOps and cloud consulting (verify exact offerings) | Cloud migration, CI/CD, infrastructure automation | COS-based MIG design, secure container hosting on Compute Engine, rollout/rollback automation | https://www.cotocus.com/ |
| DevOpsSchool.com | DevOps consulting and enablement | Platform engineering, training + implementation | Designing container-on-VM reference architectures, setting up Artifact Registry + CI pipelines | https://www.devopsschool.com/ |
| DEVOPSCONSULTING.IN | DevOps consulting (verify exact offerings) | Operational readiness, automation, reliability practices | MIG + load balancer production setup, logging/monitoring baseline, security review | https://www.devopsconsulting.in/ |
21. Career and Learning Roadmap
What to learn before Container-Optimized OS
- Google Cloud fundamentals:
- projects, billing, IAM, service accounts
- VPC networking and firewall rules
- Compute Engine basics:
- instances, images, disks
- instance templates and MIGs (recommended)
- Containers fundamentals:
- Docker/OCI images, registries
- container networking and ports
- basic security (non-root, minimal images)
What to learn after Container-Optimized OS
- Production architectures:
- external HTTP(S) load balancing
- Cloud Armor basics
- multi-zone design and SLOs
- CI/CD and supply chain:
- Cloud Build or other CI
- Artifact Registry permissions and lifecycle policies
- vulnerability scanning and provenance (verify your selected tooling)
- Kubernetes (optional but common next step):
- GKE Standard/Autopilot
- deployment strategies, services, ingress, policies
Job roles that use it
- Cloud engineer (Compute Engine + container hosting)
- DevOps engineer / platform engineer
- SRE (especially VM fleet operations)
- Security engineer (hardened baseline, workload identity, network boundaries)
Certification path (if available)
There is no “Container-Optimized OS certification” specifically. Relevant Google Cloud certifications typically include:
– Associate Cloud Engineer
– Professional Cloud Architect
– Professional Cloud DevOps Engineer
Verify current certification names and requirements: https://cloud.google.com/learn/certification
Project ideas for practice
- Build a small service and deploy it to a COS MIG behind a load balancer.
- Implement blue/green via two MIGs and controlled traffic switching.
- Store images in Artifact Registry and restrict access via service accounts.
- Implement Secret Manager integration and rotate secrets.
- Add Cloud Monitoring alerts on HTTP error rate and latency.
- Create a cost dashboard using labels for team/environment.
22. Glossary
- Artifact Registry: Google Cloud service to store container images and other artifacts with IAM-based access control.
- Compute Engine: Google Cloud’s IaaS VM service.
- Container image: A packaged filesystem and metadata used to run a container (OCI/Docker format).
- Container runtime: Software that runs containers on a host (commonly containerd; Docker Engine historically in some contexts—verify for your COS image).
- COS: Common abbreviation for Container-Optimized OS.
- Firewall rule (VPC): Network rule controlling allowed/denied traffic to VM instances.
- IAM: Identity and Access Management, controls permissions in Google Cloud.
- Instance template: A reusable VM configuration used by Managed Instance Groups.
- Managed Instance Group (MIG): A group of identical VMs managed as a single entity for scaling, autohealing, and rolling updates.
- OS Login: IAM-integrated method for managing SSH access to VMs.
- Service account: A Google identity used by workloads to access Google Cloud APIs.
- Shielded VM: Compute Engine features for protecting against boot-level and rootkit attacks (verify COS compatibility and best practices).
- VPC: Virtual Private Cloud network in Google Cloud.
- Workload identity (VM): Using a VM’s service account credentials to access Google Cloud APIs without static keys.
23. Summary
Container-Optimized OS is a Google-managed, container-focused operating system image for Google Cloud Compute Engine. It matters because it reduces OS maintenance overhead, limits host attack surface, and aligns well with immutable infrastructure practices—especially when combined with Managed Instance Groups for rolling updates and autohealing.
Cost-wise, COS itself is not a separate billed service; your spend is driven by Compute Engine VM runtime, disks, networking (especially egress), load balancing, and observability. Security-wise, COS helps by providing a minimal and hardened baseline, but real security still depends on IAM least privilege, firewall design, image provenance, secrets handling, and logging/auditing.
Use Container-Optimized OS when you want to run containers on VMs with a strong baseline and straightforward operations. If you need full orchestration and Kubernetes-native features, plan for GKE; if you want maximum simplicity and your workload fits, consider Cloud Run.
Next step: take the lab further by putting your COS instances into a Managed Instance Group behind an HTTP(S) load balancer, using Artifact Registry (private images) and Secret Manager (runtime secrets).