1. Introduction
VM Manager is Google Cloud’s operating-system (OS) management toolkit for Compute Engine virtual machine fleets. It helps you keep VMs patched, understand what software is installed, and enforce baseline OS configuration—without having to log in to each instance.
In simple terms: VM Manager lets you patch and audit many VMs from one place. You define which VMs should be updated (for example, “all production web servers”), when updates should run (maintenance windows), and how results should be reported.
Technically, VM Manager is primarily a Google Cloud Console experience plus APIs (most notably the OS Config API) that coordinate patching and inventory collection through an OS Config agent running inside each VM. VM Manager integrates with Compute Engine (instances, labels, instance groups), Cloud Logging (audit and operational visibility), and IAM (access control). It is designed for both ad hoc operations (run a patch job now) and scheduled operations (recurring patch deployments).
The core problem VM Manager solves is operational control and compliance at scale: reducing exposure from unpatched vulnerabilities, producing inventory evidence for audits, and standardizing OS configuration across many VMs—without building and maintaining your own patch orchestration platform.
Service naming note (important): In current Google Cloud documentation and APIs, many VM Manager capabilities are implemented by the OS Config service (OS Config API). VM Manager remains a widely used name in the Console and Compute documentation as the “suite” that provides these OS management workflows.
2. What is VM Manager?
VM Manager is a set of Google Cloud Compute capabilities used to manage operating systems for Compute Engine VMs. Its official purpose is to provide centralized tools for:
- OS patch management (on-demand patch jobs and scheduled patch deployments)
- OS inventory (what OS version, packages, and software are installed)
- OS configuration policy (enforcing desired state, commonly via OS policies / OS policy assignments)
Core capabilities
- Target groups of VMs using zones/regions, instance names, or labels
- Execute patch operations across fleets with reporting and success/failure status
- Collect and view inventory metadata from the guest OS
- Enforce baseline configuration (for example, “package X must be installed”, “service Y must be running”)
- Provide API-driven automation via the OS Config API, enabling Infrastructure-as-Code and CI/CD integration
Major components
- Google Cloud Console (VM Manager UI): Human-friendly workflow for patching and viewing inventory.
- OS Config API: The control plane API that schedules patching, inventory collection, and policy evaluation.
- OS Config agent: Guest agent that runs inside the VM to apply patches, report results, and collect inventory.
- IAM + Audit Logs: Controls who can run patch jobs, view inventory, and change policy; logs changes.
Service type
- Managed control plane + guest agent model:
  - The control plane is hosted and managed by Google.
  - The agent runs in your VM and executes OS-level actions locally (package manager, Windows Update, file/service state).
Scope (how it’s organized)
- Project-scoped: You manage VM Manager operations within a Google Cloud project.
- Fleet-wide across zones/regions in a project: You can target VMs across multiple zones.
- Control plane is effectively global: You access it via APIs/Console; your VMs can be in any supported region/zone.
How it fits into the Google Cloud ecosystem
VM Manager sits in the Compute operational layer:
- Works directly with Compute Engine instances (and their labels/service accounts)
- Complements monitoring agents like the Google Cloud Ops Agent (VM Manager is about OS management; Ops Agent is about telemetry)
- Supports governance with IAM, Cloud Audit Logs, and Cloud Logging sinks
- Helps security posture by reducing patch lag and improving inventory evidence
3. Why use VM Manager?
Business reasons
- Reduce risk and downtime: Faster response to security advisories and critical vulnerabilities.
- Audit readiness: Inventory and patch reports help demonstrate control for internal governance and external audits.
- Operational efficiency: Centralize patch orchestration and reduce manual SSH/RDP work.
Technical reasons
- Fleet targeting: Use labels and projects to apply actions to hundreds/thousands of VMs consistently.
- Repeatable operations: Schedule recurring patch deployments and standardize configuration.
- API-driven automation: Integrate patching into change management pipelines.
Operational reasons
- Visibility into success/failure: Get job-level and per-VM results rather than guessing what updated.
- Controlled rollout: Reduce blast radius by patching canaries first and scaling out.
- Reduced toil: Minimize ad hoc patch scripts and “snowflake” VM handling.
Security/compliance reasons
- Least-privilege operations with IAM: Separate “who can execute patch jobs” from “who can view results”.
- Centralized auditing: Use Cloud Audit Logs to track patch job creation and configuration changes.
- Baseline enforcement: OS policies can ensure required packages/services/files are present.
Scalability/performance reasons
- Designed for fleets: Avoid serial manual patching. Use orchestration built for many instances.
- Minimal control-plane overhead: VM actions happen locally on VMs; the control plane coordinates.
When teams should choose VM Manager
Choose VM Manager when you:
- Run Compute Engine VM fleets and need patch/inventory at scale
- Need scheduled patch windows and consistent reporting
- Want a Google-native approach instead of operating a separate patch orchestration platform
- Prefer managing VMs with labels and projects rather than per-VM scripts
When teams should not choose VM Manager
VM Manager may not be the best fit when:
- You use immutable infrastructure exclusively (rebuild images and replace instances, never patch in-place)
- Your workloads primarily run in GKE (use container image scanning/patching and node management instead)
- You need advanced cross-platform configuration management beyond OS-level policies (you may prefer Ansible/Chef/Puppet/Salt for richer state models and app-layer orchestration)
- Your OS/distribution is not supported by the OS Config agent (verify the supported OS list in the official docs)
4. Where is VM Manager used?
Industries
- Finance and insurance (strict patch SLAs, audit trails)
- Healthcare (compliance evidence, change windows)
- Retail and e-commerce (large web/app fleets with controlled maintenance)
- SaaS and technology (DevOps automation, fleet standardization)
- Public sector (central governance and reporting)
Team types
- Platform engineering teams managing shared VM fleets
- SRE/operations teams owning uptime and patch cadence
- Security engineering teams measuring patch compliance and inventory
- DevOps teams integrating patching into release/change workflows
- IT operations teams managing Windows and Linux servers in the cloud
Workloads
- Web/app tiers on Compute Engine
- Batch processing and schedulers
- Windows line-of-business apps
- Bastion/jump hosts (hardened and tightly controlled)
- Multi-tier enterprise apps with patch windows
Architectures and contexts
- Multi-zone regional deployments with maintenance windows
- Shared VPC environments (central network, multiple service projects)
- Organizations with separate dev/test/prod projects
- Mixed OS fleets (Linux + Windows)
Production vs dev/test usage
- Production: Emphasis on maintenance windows, canary rollouts, strong reporting, and change control.
- Dev/test: Faster cadence, more frequent patching, experimentation with OS policies and automation.
5. Top Use Cases and Scenarios
Below are realistic VM Manager use cases commonly seen in Google Cloud Compute environments.
1) Monthly patch window for production web fleet
- Problem: Production VMs must patch monthly during a maintenance window.
- Why VM Manager fits: Scheduled patch deployments + label targeting + centralized results.
- Example: Patch all VMs with labels env=prod and role=web every Sunday 02:00–04:00.
2) Emergency out-of-band patch for a critical CVE
- Problem: A critical vulnerability requires immediate patching across internet-facing VMs.
- Why VM Manager fits: Run an ad hoc patch job now, track failures quickly.
- Example: Patch all VMs labeled for the dmz subnet within 2 hours, then re-run the job for any failures.
3) Patch compliance reporting for audit
- Problem: Auditors ask for evidence of patching and OS baselines.
- Why VM Manager fits: Central patch job history + per-instance results + inventory.
- Example: Export patch job logs to BigQuery for monthly compliance reporting.
4) Inventory discovery after migration to Google Cloud
- Problem: After a lift-and-shift migration, the team doesn’t know what packages are installed.
- Why VM Manager fits: OS inventory provides package lists and OS details at scale.
- Example: Identify which VMs still run older OpenSSL packages post-migration.
5) Enforce baseline packages and services
- Problem: Security baseline requires certain agents/tools installed (for example, endpoint monitoring).
- Why VM Manager fits: OS policy assignments enforce a desired state.
- Example: Ensure a logging agent package is installed and a service is running on all prod VMs.
6) Canary patch rollout
- Problem: Patching all VMs at once risks widespread outage.
- Why VM Manager fits: Target small canary label first; then broaden scope.
- Example: Patch canary=true VMs on Friday, evaluate, then patch the rest on Saturday.
7) Standardize OS configuration for golden image drift control
- Problem: Teams patch and change VMs manually, causing configuration drift.
- Why VM Manager fits: OS policies re-apply desired state and reduce drift.
- Example: Enforce that SSH settings and a required config file exist on every VM.
8) Patch management for Windows fleets
- Problem: Windows Update coordination is inconsistent across many VMs.
- Why VM Manager fits: Central patch orchestration and reporting for Windows updates.
- Example: Apply security updates to all Windows Server instances in a maintenance window with reboots controlled.
9) Separate duties: security defines policy, ops executes patching
- Problem: Need separation of duties for compliance.
- Why VM Manager fits: IAM roles can separate “define deployment” from “execute job”.
- Example: Security team creates patch deployments; operations runs emergency patch jobs.
10) Reduce NAT/egress surprises by controlling patch source
- Problem: OS updates cause unexpected outbound data and costs.
- Why VM Manager fits: You can design patching schedules and repositories; VM Manager orchestrates timing.
- Example: Patch at off-peak hours and use internal mirrors where appropriate (architecture-dependent).
6. Core Features
Feature availability can evolve; confirm details in official docs when implementing.
1) Patch jobs (on-demand patch execution)
- What it does: Executes a patch operation immediately against targeted VMs.
- Why it matters: Useful for emergency fixes or one-time maintenance.
- Practical benefit: Centralized execution and per-VM results without manual login.
- Limitations/caveats:
  - Requires a supported OS and a functioning OS Config agent.
  - Patching can require reboots; careless settings can cause downtime.
  - VMs must have network access to required update repositories (or configured internal mirrors).
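As a rough sketch of the on-demand flow, a patch job can also be started from the command line. The helper name `patch_by_labels` is hypothetical, and the flags shown for `gcloud compute os-config patch-jobs execute` should be checked against `--help` for your SDK version. The `--dry-run` flag is kept on so the agents are contacted without applying updates; remove it only once the target set looks right.

```shell
# Hypothetical helper: start an on-demand patch job for VMs that carry ALL given labels.
# Usage: patch_by_labels DISPLAY_NAME LABELS   (LABELS is comma-separated key=value pairs)
patch_by_labels() {
  display_name=$1
  labels=$2
  gcloud compute os-config patch-jobs execute \
    --display-name="$display_name" \
    --instance-filter-group-labels="$labels" \
    --reboot-config=default \
    --duration=3600s \
    --dry-run
}
# Example call (not run here): patch_by_labels emergency-fix env=prod,role=web
```

Keeping the reboot behavior and duration explicit makes the blast radius of each run visible in review, rather than relying on defaults.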
2) Patch deployments (scheduled patching)
- What it does: Defines a recurring schedule (for example, weekly or monthly) for patching a target set of VMs.
- Why it matters: Enables routine patch hygiene and predictable maintenance.
- Practical benefit: Standardizes patch cadence with reporting.
- Limitations/caveats:
  - Requires careful coordination with application maintenance windows.
  - Evaluate impact on managed instance groups and autoscaling (patching during scale events can complicate outcomes).
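A scheduled deployment can be described in a file and registered with gcloud. The sketch below assumes the OS Config v1 PatchDeployment field names (`instanceFilter`, `patchConfig`, `recurringSchedule`); the labels, time zone, window, and the helper name `create_patch_deployment` are illustrative choices, not values from this guide's environment.

```shell
# Write a weekly patch deployment definition to a temp file (all values are illustrative).
DEPLOYMENT_FILE=$(mktemp)
cat > "$DEPLOYMENT_FILE" <<'EOF'
{
  "instanceFilter": {
    "groupLabels": [{ "labels": { "env": "prod", "role": "web" } }]
  },
  "patchConfig": { "rebootConfig": "DEFAULT" },
  "duration": "7200s",
  "recurringSchedule": {
    "timeZone": { "id": "Etc/UTC" },
    "timeOfDay": { "hours": 2 },
    "frequency": "WEEKLY",
    "weekly": { "dayOfWeek": "SUNDAY" }
  }
}
EOF

# Hypothetical helper: register the schedule under the given deployment ID.
create_patch_deployment() {
  gcloud compute os-config patch-deployments create "$1" --file="$DEPLOYMENT_FILE"
}
# Example call (not run here): create_patch_deployment pd-prod-web-weekly
```

Storing the definition as a file lets the schedule go through the same review process as other infrastructure changes.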
3) Targeting by labels, zones, and instance selection
- What it does: Lets you patch or enforce policy on a defined fleet segment.
- Why it matters: Precision reduces blast radius.
- Practical benefit: Operational workflows map cleanly to labels like env, tier, owner, patch_group.
- Limitations/caveats:
  - Label hygiene becomes critical; inconsistent labels create gaps.
  - Consider Shared VPC and multi-project segmentation strategies.
4) OS inventory management
- What it does: Collects OS and software inventory data from VMs (for example, OS version and installed packages).
- Why it matters: You can’t manage what you can’t measure—inventory helps with audits and vulnerability response.
- Practical benefit: Quickly locate where a specific package/version exists across fleets.
- Limitations/caveats:
  - Inventory is typically “best effort” and depends on agent health.
  - Inventory does not automatically equal vulnerability assessment; it’s data you can use in your security processes.
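For a quick check on a single VM, inventory can be read from the command line and filtered for a package of interest. This is a minimal sketch: `inventory_grep` is a hypothetical helper, and it assumes `gcloud compute instances os-inventory describe` is available in your SDK and that inventory collection is already working for the VM.

```shell
# Hypothetical helper: print inventory lines that mention a package on one VM.
# Usage: inventory_grep VM_NAME ZONE PACKAGE
inventory_grep() {
  vm=$1
  zone=$2
  package=$3
  gcloud compute instances os-inventory describe "$vm" --zone="$zone" \
    | grep -i -- "$package"
}
# Example call (not run here): inventory_grep vmmanager-lab-1 us-central1-a openssl
```

The function returns a non-zero status when nothing matches, which makes it easy to use in a loop over instance names.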
5) OS policies / OS policy assignments (desired state enforcement)
- What it does: Enforces baseline configuration such as packages, repositories, files, and services.
- Why it matters: Reduces configuration drift and strengthens compliance.
- Practical benefit: Codify baselines (“these packages must be installed”) and apply consistently.
- Limitations/caveats:
  - Policy design must account for OS differences and app constraints.
  - Avoid overly broad policies that can break workloads.
  - If you previously used “guest policies,” verify current recommended approach in official docs.
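To make the desired-state idea concrete, the sketch below writes an OS policy assignment that keeps one apt package installed on Debian VMs labeled env=prod. It assumes the OS Config v1 OSPolicyAssignment field names; `example-agent` is a placeholder package name, and the rollout numbers and the helper `create_os_policy_assignment` are illustrative.

```shell
# Write an OS policy assignment to a temp file (package name and rollout values are placeholders).
POLICY_FILE=$(mktemp)
cat > "$POLICY_FILE" <<'EOF'
osPolicies:
  - id: ensure-agent-installed
    mode: ENFORCEMENT
    resourceGroups:
      - inventoryFilters:
          - osShortName: debian
        resources:
          - id: install-agent
            pkg:
              desiredState: INSTALLED
              apt:
                name: example-agent
instanceFilter:
  inclusionLabels:
    - labels:
        env: prod
rollout:
  disruptionBudget:
    percent: 10
  minWaitDuration: 300s
EOF

# Hypothetical helper: create the assignment in one zone.
# Usage: create_os_policy_assignment ASSIGNMENT_ID ZONE
create_os_policy_assignment() {
  gcloud compute os-config os-policy-assignments create "$1" \
    --location="$2" --file="$POLICY_FILE"
}
# Example call (not run here): create_os_policy_assignment baseline-agent us-central1-a
```

The disruption budget limits how many VMs are changed at once, which is the policy-side equivalent of a canary rollout.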
6) Centralized reporting and job status
- What it does: Shows patch job progress and per-instance outcomes (success, failure, not applicable).
- Why it matters: Operations needs actionable results, not just “we ran updates.”
- Practical benefit: Faster remediation: rerun jobs for failed instances only.
- Limitations/caveats:
  - Failures often require OS-level troubleshooting (package locks, repo issues).
  - Reports depend on agent communication and VM reachability.
7) API-first automation (OS Config API)
- What it does: Enables programmatic creation of patch jobs/deployments and policy assignments.
- Why it matters: Integrates with CI/CD, change management, and ChatOps.
- Practical benefit: Repeatable workflows; easy to standardize across projects.
- Limitations/caveats:
  - Requires careful IAM and change controls.
  - Consider rate limits/quotas (verify in official docs).
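For pipelines that call the API directly instead of using gcloud, a patch job can be requested over REST. The sketch assumes the v1 `patchJobs:execute` method and its `displayName`, `dryRun`, and `instanceFilter` request fields; the helper name and the label value are hypothetical, and the caller's credentials must be allowed to execute patch jobs.

```shell
# Hypothetical helper: request a dry-run patch job through the OS Config REST API.
# Usage: execute_patch_job_rest PROJECT_ID
execute_patch_job_rest() {
  project=$1
  # Reuse the credentials of the active gcloud account for the API call.
  token=$(gcloud auth print-access-token)
  curl -sS -X POST \
    -H "Authorization: Bearer ${token}" \
    -H "Content-Type: application/json" \
    -d '{"displayName":"api-demo","dryRun":true,"instanceFilter":{"groupLabels":[{"labels":{"env":"lab"}}]}}' \
    "https://osconfig.googleapis.com/v1/projects/${project}/patchJobs:execute"
}
# Example call (not run here): execute_patch_job_rest YOUR_PROJECT_ID
```

In CI, a service account token would normally replace the interactive gcloud login used here.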
8) Integration with IAM and Cloud Audit Logs
- What it does: Controls who can manage patching/inventory/policies; logs administrative changes.
- Why it matters: Supports governance and investigations.
- Practical benefit: Clear accountability and traceability.
- Limitations/caveats:
  - Ensure least privilege; “Project Owner” is convenient but not a best practice for production.
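One way to move away from Project Owner is to bind narrower OS Config roles to groups. The sketch assumes the predefined roles `roles/osconfig.patchDeploymentAdmin` and `roles/osconfig.patchJobExecutor` are available in your organization; the helper name and group addresses are hypothetical, and role names should be confirmed in the current IAM documentation.

```shell
# Hypothetical helper: let one group define schedules and another run patch jobs.
# Usage: grant_patch_roles PROJECT_ID SECURITY_GROUP_EMAIL OPS_GROUP_EMAIL
grant_patch_roles() {
  project=$1
  sec_group=$2
  ops_group=$3
  # Security team: create and modify patch deployments (schedules).
  gcloud projects add-iam-policy-binding "$project" \
    --member="group:${sec_group}" --role="roles/osconfig.patchDeploymentAdmin"
  # Operations team: execute on-demand patch jobs.
  gcloud projects add-iam-policy-binding "$project" \
    --member="group:${ops_group}" --role="roles/osconfig.patchJobExecutor"
}
# Example call (not run here): grant_patch_roles my-project sec@example.com ops@example.com
```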
7. Architecture and How It Works
High-level architecture
VM Manager uses a control-plane + agent pattern:
- An operator (or automation) defines a patch job or policy in the Console or via OS Config API.
- The OS Config control plane identifies target instances based on project/zone/labels.
- Each VM’s OS Config agent receives instructions, performs OS-level actions (apt/yum/zypper or Windows Update), and reports status back.
- Results are visible in the Console and available via APIs; administrative actions are captured in Cloud Audit Logs.
Request/data/control flow
- Control flow: Operator → OS Config API → orchestration → OS Config agent → OS actions.
- Data flow (inventory): OS Config agent → OS Config API → inventory stored/queried → Console/API consumers.
- Results flow: Agent → OS Config API → patch job results → Console/reporting/logs.
Integrations with related services
- Compute Engine: Instances, labels, service accounts, metadata.
- Cloud Logging: Operational logs (and optional sinks to BigQuery/Cloud Storage/Pub/Sub).
- Cloud Audit Logs: Tracks who created/modified patch deployments and policies.
- VPC networking: Internet access or Private Google Access paths for agent communication and package repositories.
- Cloud NAT (common): Enables private VMs to reach update repos and Google APIs without external IPs.
Dependency services
- OS Config API (must be enabled)
- Compute Engine API (for instance discovery and targeting)
- Guest environment inside VM (supported OS + OS Config agent + package manager availability)
Security/authentication model
- Admin/operator access: IAM controls access to OS Config resources and Compute resources.
- VM agent authentication: The OS Config agent uses the VM’s identity and connectivity to communicate with OS Config endpoints. The exact permission model can vary by configuration; validate with official docs for your environment.
Networking model
VMs need:
- Connectivity to Google APIs used by OS Config (for example, osconfig.googleapis.com), often via internet, Private Google Access, or controlled egress.
- Connectivity to OS package repositories:
  - Debian/Ubuntu apt mirrors
  - RHEL/CentOS yum/dnf repositories
  - SUSE repositories
  - Windows Update endpoints (or your organization’s update infrastructure)
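These connectivity requirements can be probed from inside a Linux VM before a patch window. The helper below is a hypothetical sketch that treats any completed HTTPS exchange as "reachable", regardless of HTTP status; the Debian mirror URL in the example is only one possible repository and should be replaced with whatever your VMs actually use.

```shell
# Hypothetical helper: report whether an HTTPS endpoint answers within 10 seconds.
# Any HTTP status counts as reachable; only connection-level failures count as unreachable.
check_endpoint() {
  if curl -sS -o /dev/null --max-time 10 "$1"; then
    echo "reachable: $1"
  else
    echo "UNREACHABLE: $1"
  fi
}
# Example calls (run inside the VM, not run here):
#   check_endpoint https://osconfig.googleapis.com
#   check_endpoint https://deb.debian.org
```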
Monitoring/logging/governance considerations
- Export patch job outcomes and administrative activity to centralized logging.
- Use labels for ownership and environment segmentation.
- Consider org policies and change approvals for who can execute fleet-wide patch jobs.
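Exporting administrative activity can be sketched as a Cloud Logging sink. The filter below assumes OS Config audit entries carry `protoPayload.serviceName="osconfig.googleapis.com"`; the sink name, dataset, and helper name are hypothetical, and the BigQuery dataset must already exist.

```shell
# Hypothetical helper: route OS Config audit log entries to a BigQuery dataset.
# Usage: create_osconfig_audit_sink PROJECT_ID DATASET
create_osconfig_audit_sink() {
  project=$1
  dataset=$2
  gcloud logging sinks create osconfig-audit-sink \
    "bigquery.googleapis.com/projects/${project}/datasets/${dataset}" \
    --project="$project" \
    --log-filter='protoPayload.serviceName="osconfig.googleapis.com"'
}
# Example call (not run here): create_osconfig_audit_sink my-project patch_audit
```

After the sink is created, its writer identity still needs write access to the dataset before entries start to arrive.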
Simple architecture diagram
flowchart LR
  Admin["Operator / CI Pipeline"] -->|Console / OS Config API| OSConfig["OS Config control plane<br/>(VM Manager workflows)"]
  OSConfig -->|Patch instructions| Agent1["OS Config agent<br/>on VM A"]
  OSConfig -->|Patch instructions| Agent2["OS Config agent<br/>on VM B"]
  Agent1 -->|apt/yum/Windows Update| Repos["OS update repositories"]
  Agent2 -->|apt/yum/Windows Update| Repos
  Agent1 -->|Status + inventory| OSConfig
  Agent2 -->|Status + inventory| OSConfig
  OSConfig --> Results["VM Manager reports<br/>(Console/API)"]
Production-style architecture diagram
flowchart TB
  subgraph Org["Google Cloud Organization"]
    subgraph NetProj["Network Project (Shared VPC)"]
      VPC["VPC + Subnets"]
      NAT["Cloud NAT / Egress Controls"]
      PGA["Private Google Access (optional)"]
    end
    subgraph ProdProj["Production Service Project"]
      CE["Compute Engine VM fleets<br/>(labels: env=prod, app=...)"]
      OSConf["OS Config API<br/>(VM Manager control plane)"]
      Logs["Cloud Logging + Audit Logs"]
      Sink["Log Sink to BigQuery/Storage/PubSub"]
      IAM["IAM Roles + Service Accounts"]
    end
  end
  Admin["Ops/Sec/Automation"] --> OSConf
  OSConf --> CE
  CE --> NAT
  CE --> PGA
  NAT --> Repos["External OS repos<br/>or internal mirrors"]
  PGA --> GoogleAPIs["Google APIs<br/>(osconfig.googleapis.com)"]
  OSConf --> Logs
  Logs --> Sink
  IAM --> OSConf
  IAM --> CE
8. Prerequisites
Before you start using VM Manager in Google Cloud Compute, ensure the following.
Account/project requirements
- A Google Cloud account with access to a project.
- Billing enabled on the project (Compute Engine resources require billing).
Permissions / IAM roles
For a hands-on lab, the simplest is:
- Project Owner on a dedicated sandbox project.
For production, use least privilege. Common role patterns include:
- Roles to manage OS Config/VM Manager resources (patch jobs/deployments, policies)
- Roles to view inventory and results
- Roles to view/operate Compute Engine instances as needed
Role names and exact permissions can change; verify the recommended IAM roles in the official VM Manager / OS Config docs:
- https://cloud.google.com/compute/docs/osconfig
APIs to enable
- Compute Engine API
- OS Config API
You can enable them via Console or gcloud.
Tools
- Google Cloud SDK (gcloud)
- SSH client (the built-in gcloud compute ssh is sufficient)
- Optional: curl for API calls
Region availability
- Compute Engine is regional/zonal.
- OS Config/VM Manager control plane is accessed globally; availability depends on Google Cloud service availability. Verify if you have strict region residency requirements.
Quotas/limits
- Compute Engine quotas (vCPUs, instances, disks)
- OS Config/VM Manager quotas (patch job/deployment API limits, agent interactions). Verify in official docs if you expect very large fleets or heavy automation.
Prerequisite services/config
- VM instances must run a supported OS and have the OS Config agent installed and running.
- VMs must have outbound access to:
- OS update repositories (or internal mirrors)
- Google APIs required by OS Config (direct internet, Private Google Access, or controlled egress)
9. Pricing / Cost
Pricing model (what you pay for)
VM Manager functionality itself is generally treated as part of Google Cloud’s management capabilities for Compute Engine and is commonly documented as no additional charge beyond the resources it manages. However, pricing and billing SKUs can evolve—verify current billing behavior in official docs.
Start with:
- VM Manager / OS Config documentation: https://cloud.google.com/compute/docs/osconfig
- Compute Engine pricing: https://cloud.google.com/compute/pricing
- Google Cloud Pricing Calculator: https://cloud.google.com/products/calculator
Pricing dimensions to understand
Even if VM Manager does not add a direct fee, patching and inventory drive costs indirectly:
- Compute Engine runtime
  - Patching consumes CPU, memory, and disk I/O.
  - Reboots can cause availability impacts and sometimes additional operational overhead.
- Network egress and NAT
  - Downloading OS packages can generate outbound traffic.
  - Private VMs commonly use Cloud NAT; NAT has its own cost model and can become a cost driver.
- Storage and snapshots (optional)
  - Best practice often includes backups/snapshots before major patching; snapshots cost money.
- Cloud Logging
  - Patch results and operational logs may increase logging volume (especially with large fleets).
  - Exporting logs to BigQuery/Storage has additional costs.
- Internal mirrors/repositories (architecture-dependent)
  - Hosting apt/yum mirrors or Windows update infrastructure has compute/storage/network costs.
Free tier
- Compute Engine has specific free tier/always free options in some regions for certain small VM types, but eligibility changes and is region-specific. Verify on the official pricing page:
  - https://cloud.google.com/free
  - https://cloud.google.com/compute/pricing
Cost drivers (what increases spend)
- Fleet size and patch frequency
- Download volume (large updates, many packages)
- Using Cloud NAT heavily for updates
- High logging verbosity and long retention
- Frequent snapshots or large disk sizes
Hidden/indirect costs to plan for
- Downtime cost if patching triggers reboots unexpectedly
- Incident response cost if patching breaks dependencies
- Change management overhead for regulated industries
- Extra capacity if you need redundancy during patch windows
How to optimize cost
- Prefer label-based targeting to avoid patching everything unnecessarily.
- Use canary groups to reduce failed broad rollouts.
- Patch during off-peak to reduce performance impact and avoid scaling surges.
- For large fleets, consider internal repository mirrors (balanced against operational cost).
- Control logging retention and use log sinks carefully.
Example low-cost starter estimate (conceptual)
A minimal lab typically uses:
- One small VM (for example, an e2-small class instance) for 30–60 minutes
- Standard persistent disk
- Minimal network traffic for package updates
- Limited logs
Exact pricing depends on region, machine type, disk size, and traffic. Use:
- https://cloud.google.com/products/calculator
Example production cost considerations
In production (hundreds/thousands of VMs):
- Cloud NAT and egress can become significant during patch windows.
- Logging volume can grow quickly if you centralize results for compliance.
- You may budget for pre-patch snapshots and rollback capacity.
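A back-of-the-envelope volume estimate helps size the NAT and egress discussion before looking up prices. The fleet size and per-VM download size below are made-up inputs, not measurements, and the helper `patch_download_gib` is hypothetical; multiply the result by the current rates from the pricing pages to get a cost.

```shell
# Hypothetical helper: estimate total patch download volume in GiB (integer, rounded down).
# Usage: patch_download_gib VM_COUNT MIB_PER_VM
patch_download_gib() {
  vm_count=$1
  mib_per_vm=$2
  echo $(( vm_count * mib_per_vm / 1024 ))
}

# 1000 VMs each downloading about 300 MiB of updates in one window.
patch_download_gib 1000 300   # prints 292
```

If the result is large, that is a signal to evaluate internal mirrors or staggered windows.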
10. Step-by-Step Hands-On Tutorial
This lab is designed to be safe and low-cost, while still being “real”: you will create a VM, confirm the OS Config agent, view inventory, run a patch job, and verify results.
Objective
Use VM Manager in Google Cloud to:
1. Create a small Compute Engine VM with a patch target label.
2. Confirm OS Config/VM Manager prerequisites.
3. View OS inventory.
4. Run an on-demand patch job and review results.
Lab Overview
- Cloud: Google Cloud
- Category: Compute
- Service: VM Manager (via OS Config)
- Resources created:
  - 1 Compute Engine VM
  - No additional firewall rules are required for this lab
- Estimated time: 30–60 minutes
- Cost: Mostly the VM runtime + minimal network; keep VM small and delete after.
Step 1: Create/select a project and enable required APIs
Option A: Console
- In Google Cloud Console, select or create a project.
- Ensure billing is enabled.
- Go to APIs & Services → Library.
- Enable:
  - Compute Engine API
  - OS Config API
Expected outcome: APIs are enabled for the project.
Option B: gcloud
gcloud auth login
gcloud config set project YOUR_PROJECT_ID
gcloud services enable compute.googleapis.com
gcloud services enable osconfig.googleapis.com
Expected outcome: The APIs are enabled without errors.
Step 2: Create a small VM and add patch targeting labels
Create a Debian-based VM (common for labs). Choose a region/zone close to you.
export ZONE=us-central1-a
export VM_NAME=vmmanager-lab-1
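# enable-osconfig=TRUE activates the OS Config agent for VM Manager; confirm current setup requirements in the official docs.
gcloud compute instances create "$VM_NAME" \
--zone="$ZONE" \
--machine-type=e2-small \
--image-family=debian-12 \
--image-project=debian-cloud \
--boot-disk-size=20GB \
--labels=env=lab,patch_group=demo \
--metadata=enable-osconfig=TRUE \
--scopes=https://www.googleapis.com/auth/cloud-platform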
Expected outcome:
- The VM is created.
- It has labels env=lab and patch_group=demo.
Why the scope flag matters: Some guest agents and Google APIs rely on VM OAuth scopes for access tokens. Using cloud-platform avoids scope-related failures in many labs. In production, you should tighten scopes and IAM where possible—validate with your organization’s standards.
Step 3: Verify the OS Config agent is installed and running
Many Google-provided images include the OS Config agent by default, but you should verify.
- SSH into the VM:
gcloud compute ssh "$VM_NAME" --zone="$ZONE"
- On the VM, check for the agent service. For Debian/Ubuntu, try:
sudo systemctl status google-osconfig-agent || true
If the service exists and is active, you should see active (running).
If it’s not found, the agent may not be installed. Exit SSH and consult the official installation instructions for your OS in the OS Config docs: https://cloud.google.com/compute/docs/osconfig
Expected outcome: OS Config agent is running.
Note: If your VM has no outbound connectivity (no external IP and no Cloud NAT or Private Google Access), the agent may run but fail to communicate. In real environments, fix this by giving the VM a suitable egress path.
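Another thing worth checking when the agent is installed but stays idle is the `enable-osconfig` metadata key, which VM Manager setup relies on at the instance or project level. A minimal sketch for a single existing VM follows; the helper name `enable_vm_manager` is hypothetical, and your organization may prefer to set the key project-wide instead.

```shell
# Hypothetical helper: turn on VM Manager for one existing VM through instance metadata.
# Usage: enable_vm_manager VM_NAME ZONE
enable_vm_manager() {
  vm=$1
  zone=$2
  gcloud compute instances add-metadata "$vm" --zone="$zone" \
    --metadata=enable-osconfig=TRUE
}
# Example call (not run here): enable_vm_manager "$VM_NAME" "$ZONE"
```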
Step 4: View OS inventory in VM Manager
- In Google Cloud Console, go to Compute Engine → VM Manager.
- Open Inventory (wording can vary slightly by Console updates).
- Filter or search for your instance name: vmmanager-lab-1.
Expected outcome:
- You can see inventory details for the VM (OS details and installed packages/software categories depending on what’s collected).
Verification tip:
- If inventory is empty, wait a few minutes and refresh.
- If it never appears, troubleshoot agent health and connectivity (see Troubleshooting section).
Step 5: Run an on-demand patch job (VM Manager)
You can do this entirely in the Console (recommended for beginners).
- Go to Compute Engine → VM Manager → Patching.
- Choose Create patch job (or similar).
- Configure:
  - Name: patchjob-lab-1
  - Instance filter / Target: by label
    - env=lab
    - patch_group=demo
  - Patch type: choose the defaults appropriate for your OS (for Debian, this uses apt under the hood).
  - Reboot options: choose a safe option for a lab, for example, reboot if required or never reboot (behavior depends on OS and settings).
- Start the patch job.
Expected outcome:
- Patch job enters a running state.
- The VM reports progress and eventually success/failure.
What is happening: The OS Config control plane instructs the OS Config agent to run OS package updates locally, then report results back.
Step 6: Review patch job results
- Open the patch job details in VM Manager.
- Review per-VM results:
  - Success / failure
  - Any error messages
  - Whether a reboot was required/performed (if shown)
Expected outcome:
- Your VM shows a completed status.
- If it failed, you have an error message to act on.
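The same results can be pulled from the command line, which is handy for scripts and tickets. This sketch assumes the `describe` and `list-instance-details` subcommands of `gcloud compute os-config patch-jobs`; the helper name is hypothetical, and the job ID is whatever the Console or `patch-jobs list` shows for your run.

```shell
# Hypothetical helper: print the overall state of a patch job, then per-VM details.
# Usage: show_patch_results PATCH_JOB_ID
show_patch_results() {
  job_id=$1
  # Overall job state (for example SUCCEEDED or COMPLETED_WITH_ERRORS).
  gcloud compute os-config patch-jobs describe "$job_id" --format='value(state)'
  # One row per targeted VM with its individual patch state.
  gcloud compute os-config patch-jobs list-instance-details "$job_id"
}
# Example call (not run here): show_patch_results YOUR_PATCH_JOB_ID
```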
Step 7 (Optional): Validate from inside the VM
SSH back in and confirm package updates occurred.
gcloud compute ssh "$VM_NAME" --zone="$ZONE"
On Debian/Ubuntu, you can inspect APT history:
sudo tail -n 50 /var/log/apt/history.log || true
You can also run:
sudo apt-get update
sudo apt-get -s upgrade | head -n 40
Expected outcome:
- You see evidence of updates in logs or fewer pending upgrades in the simulated output.
Validation
Use this checklist:
- VM exists and is labeled correctly:
gcloud compute instances describe "$VM_NAME" --zone="$ZONE" \
--format="value(labels)"
- OS Config agent is running:
gcloud compute ssh "$VM_NAME" --zone="$ZONE" --command \
"systemctl is-active google-osconfig-agent && systemctl status google-osconfig-agent --no-pager | head"
- Inventory appears in Console under VM Manager.
- Patch job completes and shows per-VM results in VM Manager.
Troubleshooting
Common issues and realistic fixes:
- Inventory never appears
  - Causes:
    - OS Config agent not installed/running.
    - VM has no egress path to Google APIs (osconfig endpoints).
    - OS is unsupported.
  - Fix:
    - Confirm agent service status (systemctl status google-osconfig-agent).
    - Ensure VM can reach Google APIs (internet, Cloud NAT, or Private Google Access depending on design).
    - Verify supported OS list in official docs: https://cloud.google.com/compute/docs/osconfig
- Patch job fails with repository/network errors
  - Causes:
    - No access to Debian/Ubuntu mirrors (or relevant repos).
    - DNS or firewall restrictions.
  - Fix:
    - Check DNS and outbound rules.
    - If VM has no external IP, confirm Cloud NAT configuration (production pattern).
    - Consider internal mirrors for locked-down environments.
- Patch job reports package manager lock
  - Causes:
    - Another process is running apt/dpkg/yum.
  - Fix:
    - Wait and re-run, or investigate running processes.
    - Avoid overlapping patch jobs.
- Unexpected reboot or service disruption
  - Causes:
    - Reboot settings or package updates that require restart.
  - Fix:
    - Use maintenance windows.
    - Canary first; scale out gradually.
    - Test in staging and use app-level health checks.
- Permissions errors in Console
  - Causes:
    - Insufficient IAM permissions to create patch jobs or view inventory.
  - Fix:
    - For lab: use Project Owner on a sandbox project.
    - For production: grant appropriate OS Config roles (verify current role names in docs).
Cleanup
To avoid ongoing charges, delete the VM:
gcloud compute instances delete "$VM_NAME" --zone="$ZONE"
Optional: If you created a dedicated project for this lab, delete the entire project (most reliable cleanup).
11. Best Practices
Architecture best practices
- Prefer labels as the primary targeting mechanism (env, app, tier, patch_group, owner).
- Use canary + phased rollout:
  - Patch a small group first, validate, then expand.
- Align patch scheduling with:
  - Application maintenance windows
  - Load balancer draining strategies
  - Managed instance group update strategies (if used)
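The canary-then-fleet pattern can be scripted as a gate: patch the canary group, and continue only if that job succeeded. This is a sketch under assumptions: it relies on `patch-jobs execute` waiting for completion and printing the finished job so that `--format='value(state)'` yields a state such as SUCCEEDED, which should be confirmed for your SDK version. The helper name and label values are hypothetical.

```shell
# Hypothetical helper: patch a canary group first and stop unless it fully succeeds.
# Usage: canary_then_fleet CANARY_LABELS FLEET_LABELS   (comma-separated key=value pairs)
canary_then_fleet() {
  canary_labels=$1
  fleet_labels=$2
  # Assumption: the command blocks until the job finishes and reports its final state.
  state=$(gcloud compute os-config patch-jobs execute \
    --display-name=canary \
    --instance-filter-group-labels="$canary_labels" \
    --format='value(state)')
  if [ "$state" != "SUCCEEDED" ]; then
    echo "canary ended in state '${state}'; not patching the wider fleet" >&2
    return 1
  fi
  gcloud compute os-config patch-jobs execute \
    --display-name=fleet \
    --instance-filter-group-labels="$fleet_labels"
}
# Example call (not run here): canary_then_fleet canary=true env=prod
```

In practice a soak period and application health checks would sit between the two jobs; the state check alone only shows that patching itself completed.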
IAM/security best practices
- Avoid broad roles in production. Prefer:
  - Separate roles for “create/modify patch deployments” vs “execute patch job”.
  - Separate roles for “view inventory” vs “change policy”.
- Use dedicated automation identities (service accounts) for scheduled operations.
- Require approvals for fleet-wide patch actions (process + IAM).
Cost best practices
- Plan patch windows to avoid triggering unnecessary autoscaling.
- For private fleets, evaluate Cloud NAT and egress cost during patch bursts.
- Control logging:
  - Keep only necessary fields
  - Set retention policies appropriate for compliance
  - Export summaries rather than all verbose logs when possible
Performance best practices
- Avoid running patch jobs during peak traffic.
- Stagger patching across zones/tiers to keep capacity.
- Watch for disk space constraints on `/var` or package caches.
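Staggering by zone can be reasoned about with a simple grouping step. The sketch below assumes you already have a mapping of VM name to zone (for example, from an instance listing); the function name `waves_by_zone` and the sample data are hypothetical. OS Config patch rollout settings also offer a zone-by-zone mode, which may remove the need for client-side planning; verify the current options in the official docs.

```python
from collections import defaultdict


def waves_by_zone(vm_zones):
    """Group VM names into one wave per zone, in sorted zone order.

    vm_zones: dict mapping VM name -> zone (sample data is hypothetical).
    Patching one zone per wave leaves the other zones serving traffic.
    """
    by_zone = defaultdict(list)
    for name, zone in vm_zones.items():
        by_zone[zone].append(name)
    return [(zone, sorted(names)) for zone, names in sorted(by_zone.items())]


fleet = {
    "web-a1": "us-central1-a",
    "web-a2": "us-central1-a",
    "web-b1": "us-central1-b",
    "web-c1": "us-central1-c",
}
for zone, names in waves_by_zone(fleet):
    print(zone, names)
```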
Reliability best practices
- Use multi-zone designs so you can patch one zone while others serve traffic.
- Maintain rollback options:
- Snapshots (where appropriate)
- Immutable rebuild strategy for critical tiers
- Treat patching as a change with testing and validation.
Operations best practices
- Track patch SLAs and failures:
- Re-run patch jobs for failed instances only
- Establish a runbook for common failures
- Document exemptions (where patching must be delayed) with time-bound exceptions.
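Re-running only the failed instances starts with triaging per-instance results. The sketch below assumes each result is a dict with `name` and `state` keys; that shape and the `triage` function are hypothetical. The state strings follow the patch job instance states as I understand them from the OS Config API, so verify the exact names in the current reference before relying on them.

```python
# States where a simple re-run is a reasonable first step (assumption).
RETRY_STATES = {"FAILED", "TIMED_OUT"}
# States where a re-run alone is unlikely to help, e.g. the agent is missing.
INVESTIGATE_STATES = {"NO_AGENT_DETECTED"}


def triage(details):
    """Split per-instance patch results into retry and investigate lists."""
    retry = sorted(d["name"] for d in details if d["state"] in RETRY_STATES)
    investigate = sorted(
        d["name"] for d in details if d["state"] in INVESTIGATE_STATES
    )
    return {"retry": retry, "investigate": investigate}


results = [
    {"name": "vm-1", "state": "SUCCEEDED"},
    {"name": "vm-2", "state": "FAILED"},
    {"name": "vm-3", "state": "NO_AGENT_DETECTED"},
    {"name": "vm-4", "state": "TIMED_OUT"},
]
print(triage(results))
```

The `retry` list can then be used as the instance-name target of a follow-up patch job, while the `investigate` list feeds the failure runbook.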
Governance/tagging/naming best practices
- Standardize labels: `env=dev|test|prod`, `app=<name>`, `owner=<team>`, `patch_group=<group>`, `compliance=<regime>`.
- Standardize patch deployment names: `pd-<env>-<app>-monthly`.
- Use separate projects (or folders) for lifecycle separation (dev/test/prod).
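A label taxonomy is easier to enforce when it is checked automatically, for example in CI or before creating a VM. The sketch below is one possible check, assuming the required keys and allowed `env` values listed above; the function names `label_problems` and `deployment_name` are hypothetical. The character rule reflects the general Google Cloud label value format (lowercase letters, digits, underscores, hyphens, up to 63 characters); confirm the current constraints in the official docs.

```python
import re

# Required keys and allowed env values are assumptions taken from the
# taxonomy above; compliance=<regime> is treated as optional here.
REQUIRED_KEYS = ("env", "app", "owner", "patch_group")
ALLOWED_ENVS = {"dev", "test", "prod"}
VALUE_PATTERN = re.compile(r"[a-z0-9_-]{1,63}")


def label_problems(labels):
    """Return a list of human-readable problems; empty means compliant."""
    problems = []
    for key in REQUIRED_KEYS:
        value = labels.get(key)
        if value is None:
            problems.append(f"missing label: {key}")
        elif not VALUE_PATTERN.fullmatch(value):
            problems.append(f"invalid value for {key}: {value!r}")
    env = labels.get("env")
    if env is not None and env not in ALLOWED_ENVS:
        problems.append(f"env must be one of {sorted(ALLOWED_ENVS)}")
    return problems


def deployment_name(env, app, cadence="monthly"):
    """Build a name following the pd-<env>-<app>-<cadence> convention."""
    return f"pd-{env}-{app}-{cadence}"


print(label_problems({"env": "stage", "app": "web"}))
print(deployment_name("prod", "web"))
```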
12. Security Considerations
Identity and access model
- VM Manager administrative actions are controlled via IAM.
- Prefer:
- Least privilege roles
- Separate identities for human operators vs automation
- Organization policies to restrict who can create service accounts or assign broad roles
Encryption
- VM Manager control plane uses Google Cloud’s standard encryption for data in transit and at rest (platform behavior).
- Your VM disk encryption follows Compute Engine settings (Google-managed keys by default; CMEK optional depending on requirements).
Network exposure
- Patching requires outbound connectivity to repositories and Google APIs.
- Avoid giving every VM a public IP if not required:
- Use Cloud NAT for outbound internet access from private VMs
- Use Private Google Access for Google APIs where applicable
- Restrict outbound traffic with firewall policies and/or secure egress designs where needed, but ensure required endpoints remain reachable.
Secrets handling
- VM Manager is not a secrets manager.
- Don’t store credentials in scripts or OS policies.
- Use Secret Manager for secrets and retrieve them securely at runtime (application-level pattern).
Audit/logging
- Ensure Cloud Audit Logs are enabled and retained appropriately:
- Track creation/modification of patch deployments and policies
- Centralize logs with sinks for compliance:
- BigQuery for reporting
- Cloud Storage for archival
- Pub/Sub for automation triggers
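When creating a sink for change tracking, the key input is the log filter. The helper below builds a filter string for Admin Activity audit entries produced by the OS Config service. It is a sketch under assumptions: the function name is hypothetical, and the filter relies on the standard audit log name format and the `osconfig.googleapis.com` service name. Test the filter in Logs Explorer before attaching it to a sink.

```python
def osconfig_audit_filter(project_id):
    """Build a Cloud Logging filter for OS Config Admin Activity audit logs.

    project_id is your own project ID; the value used below is a placeholder.
    """
    log_name = f"projects/{project_id}/logs/cloudaudit.googleapis.com%2Factivity"
    return (
        f'logName="{log_name}" '
        'AND protoPayload.serviceName="osconfig.googleapis.com"'
    )


print(osconfig_audit_filter("my-sandbox-project"))
```

The same filter string can be reused for each destination (BigQuery, Cloud Storage, Pub/Sub), so reporting, archival, and automation all see the same set of events.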
Compliance considerations
- VM Manager can support compliance initiatives by providing:
- Evidence of patch operations
- Inventory evidence of installed software
- Compliance still requires:
- Defined policy (what to patch, when, and acceptable exceptions)
- Evidence retention and reporting workflows
- Change management approvals and documentation
Common security mistakes
- Running patch jobs as broad “Owner” identities in production.
- Patching everything at once without canaries and maintenance windows.
- Allowing unmanaged outbound access while lacking visibility into what was downloaded.
- Assuming “inventory collected” equals “vulnerability-free.”
Secure deployment recommendations
- Combine VM Manager with:
- Centralized logging and audit retention
- Strong IAM role separation
- Network egress control (NAT, proxy, internal mirrors if needed)
- Regular compliance reporting and exception management
13. Limitations and Gotchas
These are common real-world issues to plan for (verify any environment-specific constraints in official docs).
- Supported OS constraints: OS Config agent and patching support specific OS distributions/versions. Unsupported OS images may not work.
- Agent dependency: Patch/inventory relies on the OS Config agent being installed, running, and able to communicate.
- Network dependency: If a VM can’t reach required repos or Google APIs, patching/inventory fails.
- Reboot behavior: Some patches require reboot; incorrect reboot settings can cause outage or incomplete patch state.
- Package manager locks: apt/dpkg/yum locks can cause patch failures.
- Repository configuration drift: Custom repos or pinned packages can cause inconsistent outcomes across “similar” VMs.
- Immutable vs in-place tension: If your organization practices immutable rebuilds, in-place patching may conflict with your standards.
- Windows specifics: Windows update behavior differs from Linux; patch timing and reboot behavior require careful scheduling.
- Scale and quotas: Large fleets may hit API limits or operational timeouts; design for batching and staged rollouts.
- Inventory ≠ vulnerability scanning: Inventory is data; vulnerability assessment and prioritization require additional security tooling/process.
14. Comparison with Alternatives
VM Manager is one option in a broader operations and configuration landscape.
| Option | Best For | Strengths | Weaknesses | When to Choose |
|---|---|---|---|---|
| Google Cloud VM Manager (OS Config) | Patching/inventory/baseline OS policy on Compute Engine | Google-native, label targeting, console + API, integrates with IAM/Audit Logs | Focused on OS-level management; relies on agent and network/repo access | You want Google Cloud–native OS management for Compute Engine fleets |
| Immutable infrastructure (golden images + instance replacement) | High-control environments with standardized builds | Predictable, reduces drift, easier rollback via redeploy | Requires mature image pipeline; may not fit legacy stateful servers | You can rebuild and redeploy instead of patching in-place |
| Configuration management tools (Ansible/Chef/Puppet/Salt) | Rich configuration state and app-layer automation | Very flexible; large ecosystem; cross-cloud | You operate the tooling; must design security and reporting | You need deeper configuration/app orchestration than VM Manager provides |
| Google Cloud Ops Agent + Monitoring workflows | Observability (metrics/logs/traces) | Great telemetry; integrates with alerting | Not a patch/config orchestrator | You need monitoring; pair it with VM Manager for OS management |
| AWS Systems Manager (Patch Manager, Inventory) | OS management on AWS EC2 | Mature OS management suite in AWS | Not applicable to Google Cloud VMs directly | You run fleets primarily in AWS |
| Azure Update Manager / Azure Automation | OS patching for Azure VMs | Integrated with Azure governance | Not applicable to Google Cloud VMs directly | You run fleets primarily in Azure |
| WSUS / on-prem update infrastructure | Windows-heavy enterprises with strict control | Central Windows update control | Operational overhead; hybrid complexity | You must tightly control Windows update sources in regulated environments |
15. Real-World Example
Enterprise example: regulated financial services patch governance
- Problem: A bank runs 2,000+ Linux and Windows VMs on Compute Engine. Regulators require monthly patching with evidence, plus emergency patch capability for critical CVEs.
- Proposed architecture:
- Separate prod and non-prod projects under an organization folder.
- VM Manager patch deployments scheduled per tier: `canary` group first, then `prod` web/app tiers, then batch tiers.
- Label strategy: `env`, `tier`, `app`, `owner`, `patch_group`, `criticality`.
- Private subnets with Cloud NAT and controlled egress to approved repositories (or internal mirrors).
- Cloud Logging sinks export:
- OS Config patch job outcomes to BigQuery for compliance dashboards
- Audit Logs for change tracking
- Why VM Manager was chosen:
- Native integration with Compute Engine targeting and IAM.
- Central visibility and standardized job execution.
- Reduces need for custom patch orchestration tooling.
- Expected outcomes:
- Shorter patch cycles, measurable compliance posture
- Faster emergency response with repeatable runbooks
- Better audit evidence and reduced manual effort
Startup/small-team example: small SaaS fleet hygiene
- Problem: A startup runs ~25 VMs (web, workers, databases). They need basic patch hygiene without building a full configuration management platform.
- Proposed architecture:
- Use VM Manager to schedule weekly patch deployments for non-prod and monthly for prod.
- Canary patch on one VM per tier.
- Minimal logging: keep patch job results; alert on repeated failures via operational process.
- Why VM Manager was chosen:
- Low operational overhead; quick to adopt.
- Console workflow is easy for a small team; API is available later.
- Expected outcomes:
- Reduced vulnerability exposure
- Less time spent manually updating servers
- Clear view of patch failures and fleet status
16. FAQ
- Is VM Manager a separate product from OS Config?
  VM Manager is commonly presented as the operational “suite” in Compute Engine. Many capabilities are implemented via the OS Config API. Use VM Manager in the Console for workflows, and the OS Config API for automation.
- Does VM Manager work for all Compute Engine OS images?
  No. Support depends on OS Config agent support for specific OS distributions and versions. Verify supported OS lists in official docs.
- Do I need to install an agent?
  Typically yes: VM Manager relies on the OS Config agent inside the VM. Some Google-provided images include it by default, but you should verify.
- Can VM Manager patch VMs with no external IP address?
  Yes, if you provide outbound access via Cloud NAT and/or Private Google Access (and ensure access to OS repositories). The VM still must reach required endpoints.
- Can I patch only security updates?
  Patch configuration can often distinguish patch classifications and package sources depending on OS. Verify current patch configuration options in the OS Config patch management docs.
- How do I avoid patching everything at once?
  Use labels to define canary groups and schedule phased patching (canary first, then expand). Avoid broad “all instances” targeting.
- Will patching reboot my VM?
  Some updates require reboot. VM Manager provides reboot behavior settings, but you must plan maintenance windows and redundancy.
- Can I patch Managed Instance Groups (MIGs) with VM Manager?
  You can target MIG instances by label or other selectors. However, many teams prefer immutable rebuilds + rolling updates for MIGs. Choose the approach that matches your operational model.
- Does VM Manager replace Ansible/Chef/Puppet?
  Not entirely. VM Manager focuses on OS patching, inventory, and baseline OS policies. Full configuration management tools provide broader app-layer orchestration.
- Can I export patch reports for compliance?
  Yes. Combine patch job results visibility with Cloud Logging/Audit Logs exports (for example, to BigQuery). The exact export method depends on your logging strategy.
- How do I troubleshoot patch failures?
  Start with patch job error details, then check VM-level logs (agent status, package manager logs, repository reachability, DNS).
- Can I run patching across multiple projects?
  VM Manager operations are typically project-scoped. For multi-project operations, use consistent labeling, central automation, and appropriate IAM across projects.
- Is inventory real-time?
  Inventory collection depends on agent reporting cadence and connectivity; it’s not typically real-time. Treat it as operational data, not instantaneous truth.
- Can I use VM Manager for application updates too?
  VM Manager focuses on OS-level management. Application deployments are usually handled by CI/CD, configuration management, or orchestration systems.
- Is there a direct cost for VM Manager?
  VM Manager is often documented as not having an extra charge, but you still pay for Compute, network, logging, and related resources. Always verify current billing terms in official docs.
17. Top Online Resources to Learn VM Manager
| Resource Type | Name | Why It Is Useful |
|---|---|---|
| Official documentation | VM Manager / OS Config documentation: https://cloud.google.com/compute/docs/osconfig | Primary source for supported OSes, agent behavior, patching and policy features |
| Official documentation | Compute Engine documentation: https://cloud.google.com/compute/docs | Context for VM lifecycle, networking, IAM, and operations |
| Official pricing | Compute Engine pricing: https://cloud.google.com/compute/pricing | Understand VM/disk/network costs that dominate patching operations |
| Official pricing tool | Google Cloud Pricing Calculator: https://cloud.google.com/products/calculator | Build region-specific estimates for VMs, NAT, logging exports |
| Official docs/tutorials | Google Cloud SDK install: https://cloud.google.com/sdk/docs/install | Required for gcloud workflows in this tutorial |
| Official reference | OS Config API reference (start from docs and API explorer): https://cloud.google.com/compute/docs/osconfig | Needed for automation at scale and CI/CD integration |
| Official learning | Google Cloud Skills Boost: https://www.cloudskillsboost.google | Hands-on labs (search for OS Config / patching / VM operations) |
| Official videos | Google Cloud Tech YouTube: https://www.youtube.com/@googlecloudtech | Product walkthroughs and best practices (search for OS Config / VM patching) |
| Official architecture guidance | Google Cloud Architecture Center: https://cloud.google.com/architecture | Patterns for operations, governance, logging, and fleet management (not VM Manager-specific, but highly relevant) |
18. Training and Certification Providers
| Institute | Suitable Audience | Likely Learning Focus | Mode | Website URL |
|---|---|---|---|---|
| DevOpsSchool.com | DevOps engineers, SREs, platform teams | DevOps practices, cloud operations, automation fundamentals | Check website | https://www.devopsschool.com/ |
| ScmGalaxy.com | Beginners to intermediate engineers | SCM, DevOps tooling, CI/CD concepts | Check website | https://www.scmgalaxy.com/ |
| CloudOpsNow.in | Cloud operations teams | Cloud operations, monitoring, operational readiness | Check website | https://www.cloudopsnow.in/ |
| SreSchool.com | SREs, reliability engineers | SRE practices, reliability engineering, incident response | Check website | https://www.sreschool.com/ |
| AiOpsSchool.com | Ops teams exploring AIOps | AIOps concepts, automation, operations analytics | Check website | https://www.aiopsschool.com/ |
19. Top Trainers
| Platform/Site | Likely Specialization | Suitable Audience | Website URL |
|---|---|---|---|
| RajeshKumar.xyz | DevOps/cloud training content (verify specifics) | Beginners to intermediate | https://rajeshkumar.xyz/ |
| devopstrainer.in | DevOps tooling and practices (verify specifics) | DevOps engineers | https://www.devopstrainer.in/ |
| devopsfreelancer.com | Freelance DevOps consulting/training offerings (verify specifics) | Teams needing targeted help | https://www.devopsfreelancer.com/ |
| devopssupport.in | DevOps support and training resources (verify specifics) | Operations/DevOps teams | https://www.devopssupport.in/ |
20. Top Consulting Companies
| Company Name | Likely Service Area | Where They May Help | Consulting Use Case Examples | Website URL |
|---|---|---|---|---|
| cotocus.com | Cloud/DevOps consulting (verify service catalog) | Cloud migrations, operations setup, automation | Designing patch governance, setting up fleet labels, operational runbooks | https://cotocus.com/ |
| DevOpsSchool.com | DevOps consulting and enablement | DevOps process/tooling adoption | Building standardized patch pipelines and compliance reporting workflows | https://www.devopsschool.com/ |
| DEVOPSCONSULTING.IN | DevOps consulting services (verify service catalog) | CI/CD, automation, operations | Integrating VM patching with change management and monitoring | https://www.devopsconsulting.in/ |
21. Career and Learning Roadmap
What to learn before VM Manager
- Compute Engine fundamentals:
- Instances, images, disks, metadata, instance groups
- Linux/Windows administration basics:
- Package managers (apt/yum/dnf/zypper), services, logging, reboots
- Networking essentials:
- VPC, subnets, routes, firewall rules, DNS
- Cloud NAT and Private Google Access concepts
- IAM fundamentals:
- Roles, service accounts, least privilege, audit logs
What to learn after VM Manager
- Fleet governance and automation:
- OS Config API automation patterns
- Policy-as-code and change management workflows
- Observability:
- Google Cloud Ops Agent, Cloud Monitoring alerting, log-based metrics
- Security posture:
- Vulnerability management lifecycle, asset inventory strategies, patch SLAs
- Advanced compute operations:
- Immutable images pipelines, rolling replacements, managed instance groups operations
Job roles that use it
- Cloud Engineer / Cloud Operations Engineer
- DevOps Engineer
- Site Reliability Engineer (SRE)
- Platform Engineer
- Security Engineer (vulnerability and patch governance)
- Systems Administrator (Linux/Windows) in cloud environments
Certification path (if available)
VM Manager itself isn’t typically a standalone certification topic, but it aligns strongly with:
- Google Cloud Associate Cloud Engineer
- Google Cloud Professional Cloud Architect
- Google Cloud Professional Cloud DevOps Engineer
Verify current certification outlines on the official site: https://cloud.google.com/learn/certification
Project ideas for practice
- Build a patch governance model for a 3-tier app:
- labels, canaries, schedules, reporting
- Create a compliance dashboard:
- export patch results to BigQuery and build a basic report
- Design private patching:
- private subnets + Cloud NAT + controlled egress + patch schedules
- Implement a baseline OS policy:
- enforce specific packages/services and validate drift handling
22. Glossary
- Compute Engine: Google Cloud service for running virtual machines.
- VM Manager: Compute-focused suite for OS patching, inventory, and configuration management across VM fleets.
- OS Config: The Google Cloud service/API that implements many VM Manager features.
- OS Config agent: Guest agent installed on the VM to execute patching/inventory/policy tasks.
- Patch job: An on-demand patch execution against selected VMs.
- Patch deployment: A scheduled/repeating patch configuration applied to a VM set.
- Inventory: Collected OS/software details from VMs (OS version, packages, etc.).
- Label: Key/value metadata attached to resources (like VMs) used for targeting and organization.
- Canary: A small subset of VMs patched first to validate safety before broad rollout.
- Cloud NAT: Managed NAT service enabling outbound internet access for private VMs.
- Private Google Access: Allows private VMs to reach Google APIs without external IPs (configuration-dependent).
- IAM: Identity and Access Management; controls who can do what in a project.
- Cloud Audit Logs: Logs of administrative actions and API access for governance and investigations.
23. Summary
VM Manager in Google Cloud (Compute) is the practical, fleet-oriented way to manage OS patching, OS inventory, and baseline OS configuration for Compute Engine VMs—primarily powered by the OS Config service and its guest agent.
It matters because patching and inventory are core operational controls: they reduce vulnerability exposure, support compliance evidence, and cut manual toil. VM Manager fits best when you operate VM fleets and need centralized, repeatable OS operations with IAM controls and auditability.
Cost-wise, the biggest drivers are usually Compute runtime, network egress/NAT, and logging volume—not a separate “VM Manager license.” Security-wise, the most important design points are least-privilege IAM, controlled rollout (canary/maintenance windows), and ensuring secure, controlled outbound access to required repositories and Google APIs.
Next learning step: move from ad hoc patch jobs to scheduled patch deployments, define a label taxonomy, and (when ready) automate workflows using the OS Config API starting from https://cloud.google.com/compute/docs/osconfig.