Google Cloud Pub/Sub Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Data Analytics and Pipelines

Category

Data analytics and pipelines

1. Introduction

Pub/Sub is Google Cloud’s fully managed asynchronous messaging service for event ingestion and distribution. It’s commonly used as the “front door” for streaming data analytics and pipelines, and as the backbone for event-driven architectures.

In simple terms: producers publish messages to a topic, and one or more consumers receive those messages through subscriptions. Producers and consumers don’t need to know about each other, and they can scale independently.

Technically, Pub/Sub implements a durable publish/subscribe messaging pattern with features such as push or pull delivery, message retention, ordering (when enabled), filtering, dead-letter topics, and integrations across Google Cloud services. It is designed for high-throughput, low-latency event ingestion and fan-out, supporting many-to-many communication.
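The decoupled, many-to-many pattern can be sketched with a toy in-memory model (illustrative only; real Pub/Sub is a managed, durable service, and `ToyPubSub` is a made-up name for this sketch):

```python
from collections import defaultdict
from typing import Callable, Dict, List

class ToyPubSub:
    """Minimal in-memory sketch of topics, subscriptions, and fan-out."""

    def __init__(self) -> None:
        # topic name -> list of subscriber callbacks (one per subscription)
        self.subscriptions: Dict[str, List[Callable[[bytes, dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[bytes, dict], None]) -> None:
        self.subscriptions[topic].append(callback)

    def publish(self, topic: str, data: bytes, **attributes: str) -> None:
        # Every subscription on the topic receives its own copy (fan-out).
        for callback in self.subscriptions[topic]:
            callback(data, dict(attributes))

bus = ToyPubSub()
received = []
bus.subscribe("orders", lambda d, a: received.append(("billing", d, a)))
bus.subscribe("orders", lambda d, a: received.append(("shipping", d, a)))
bus.publish("orders", b'{"orderId": "o-1"}', region="us")
print(received)
```

Note that the publisher only knows the topic name, never the subscribers; adding a third consumer is one more `subscribe` call with no producer change.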

Pub/Sub solves common problems in data analytics and pipelines, such as:

  • Decoupling data producers from downstream processing systems
  • Buffering bursts of events while consumers scale out
  • Reliably delivering event streams to multiple consumers (fan-out)
  • Building resilient ETL/ELT streaming pipelines (e.g., into BigQuery via Dataflow)
  • Enabling event-driven microservices without running messaging infrastructure

Naming note: You may still see “Cloud Pub/Sub” in older articles and code samples. The product is commonly referenced as “Pub/Sub” in Google Cloud documentation and the console. Verify current naming in the official docs if you are aligning to internal standards: https://cloud.google.com/pubsub/docs/overview


2. What is Pub/Sub?

Pub/Sub is a managed messaging service on Google Cloud for ingesting and delivering event streams. Its official purpose is to let applications publish events (messages) and deliver them asynchronously to subscribers, with durable storage for a configurable retention window and scalable delivery semantics.

Core capabilities

  • Publish messages to a topic (events can include a payload and attributes)
  • Deliver messages to subscribers via:
    • Pull subscriptions (subscribers pull messages)
    • Push subscriptions (Pub/Sub pushes to an HTTPS endpoint)
  • Scale to high throughput and large numbers of publishers/subscribers
  • Support common resilience patterns: retries, dead-letter topics, replay/seek (snapshots), and buffering
  • Integrate with many Google Cloud services (Dataflow, Cloud Run, Cloud Functions, BigQuery patterns, Logging/Monitoring, IAM)

Major components

  • Topic: A named resource that accepts published messages.
  • Subscription: A named resource attached to a topic that delivers that topic’s messages to subscribers.
  • Publisher: Any client that sends messages to a topic.
  • Subscriber: Any client that receives and processes messages from a subscription.
  • Push endpoint (push subscriptions): A publicly reachable HTTPS service (often Cloud Run) that receives POST requests.
  • Dead-letter topic (DLT/DLQ): A topic to receive messages that can’t be processed successfully after a configured number of delivery attempts.
  • Snapshot: A point-in-time state of a subscription, used for replay/seek workflows.
  • Schema (optional): A definition for message validation (e.g., Avro/Protocol Buffers). Verify current schema support details in docs: https://cloud.google.com/pubsub/docs/schemas

Service type

  • Fully managed, serverless messaging service (you don’t manage brokers, partitions, or clusters for standard Pub/Sub).

Scope and resource model

  • Pub/Sub resources are project-scoped in Google Cloud.
  • Topics and subscriptions are project-level resources: you don’t pin them to a single VM zone. However, Pub/Sub provides controls such as message storage policies (data residency constraints), and regional endpoints or behaviors may apply depending on configuration. Confirm the latest location and residency semantics in the official docs: https://cloud.google.com/pubsub/docs/locations

How it fits into Google Cloud

In the Google Cloud ecosystem, Pub/Sub often sits between:

  • Ingestion sources: applications, IoT devices, logs/telemetry collectors, Cloud Run services, on-prem systems
  • Stream processing: Dataflow (Apache Beam), Dataproc (Spark), custom consumers on GKE/Compute Engine
  • Storage/analytics sinks: BigQuery, Cloud Storage, Bigtable, Spanner, Elasticsearch (self-managed), third-party platforms

For data analytics and pipelines, Pub/Sub is frequently the ingestion layer that decouples upstream event production from downstream transformations and analytics.


3. Why use Pub/Sub?

Business reasons

  • Faster delivery of data products: teams can add new consumers without changing producer code.
  • Reduced operational burden: no need to operate a messaging cluster for many common scenarios.
  • Improved resilience: messages can buffer during spikes or downstream outages, reducing data loss risk.

Technical reasons

  • Decoupling: producers and consumers evolve independently.
  • Fan-out: multiple subscriptions can receive the same topic’s events for different use cases (analytics, monitoring, ML features, auditing).
  • Backpressure handling: retention and subscriber scaling help absorb bursty traffic.
  • Delivery controls: acknowledgments, retries, dead-letter topics, message filtering, ordering (when enabled).

Operational reasons

  • Elastic scaling: handle bursty ingestion and variable consumption rates.
  • Managed reliability: reduces the complexity of patching, scaling, and monitoring broker fleets.
  • Observability hooks: integrates with Cloud Monitoring metrics and Cloud Logging / Audit Logs.

Security/compliance reasons

  • IAM-based access control for topics and subscriptions.
  • Encryption at rest and in transit (Google-managed by default; CMEK options may be available—verify in docs).
  • Auditability via Cloud Audit Logs for admin and data access patterns (depending on configuration and log types enabled).
  • Data residency controls via storage policies (confirm details in official docs).

Scalability/performance reasons

  • Designed for high-throughput ingestion and delivery.
  • Supports parallelism through multiple subscribers and flow control settings in client libraries.

When teams should choose it

Choose Pub/Sub when you need:

  • Asynchronous event ingestion for streaming pipelines
  • Decoupled microservices and event-driven workflows
  • Multi-consumer fan-out
  • Managed messaging without running brokers

When teams should not choose it

Avoid or reconsider Pub/Sub when:

  • You need strict exactly-once processing end-to-end across complex pipelines without carefully designed idempotency (Pub/Sub can offer exactly-once delivery for some subscription modes, but “exactly-once processing” still requires application design; verify the latest constraints in the docs).
  • You require very long-term storage of messages as a system of record (use BigQuery/Cloud Storage instead; Pub/Sub is a transport/buffer).
  • You need complex stream reprocessing across large windows beyond Pub/Sub retention (consider storing raw events durably in Cloud Storage/BigQuery).
  • You require Kafka protocol compatibility (consider Kafka on GKE/Compute Engine, or a managed Kafka offering from a partner).


4. Where is Pub/Sub used?

Industries

  • Fintech and banking: transaction events, fraud signals, audit pipelines
  • Retail/e-commerce: clickstreams, orders, inventory updates
  • Media/ads: event tracking, near-real-time analytics
  • Healthcare: event-driven integrations (with careful compliance controls)
  • Manufacturing/IoT: device telemetry ingestion
  • SaaS: product analytics, webhook processing, workflow orchestration

Team types

  • Data engineering teams building streaming pipelines
  • Platform teams implementing event buses
  • Backend application teams building microservices
  • SRE/operations teams centralizing telemetry
  • Security teams building detection pipelines

Workloads

  • Streaming ETL/ELT (Pub/Sub → Dataflow → BigQuery)
  • Event-driven microservices (Pub/Sub → Cloud Run)
  • Near-real-time monitoring/alert enrichment
  • Log ingestion pipelines
  • Asynchronous task distribution (when “event messaging” fits better than task queues)

Architectures

  • Event-driven architecture (EDA)
  • CQRS/event sourcing adjunct pipelines (Pub/Sub as a transport, not the event store)
  • Streaming analytics
  • Hybrid ingestion (on-prem → Pub/Sub → Google Cloud analytics)

Real-world deployment contexts

  • Production: multi-topic event bus, DLQs, monitoring dashboards, SLOs for consumer lag, IAM boundaries
  • Dev/test: smaller topics, shorter retention, ephemeral subscriptions, emulator usage for local tests (verify Pub/Sub emulator capabilities in official docs)

5. Top Use Cases and Scenarios

Below are realistic scenarios where Pub/Sub is commonly the right fit for data analytics and pipelines on Google Cloud.

1) Streaming ingestion into BigQuery (via Dataflow)

  • Problem: You need near-real-time analytics over application events.
  • Why Pub/Sub fits: Durable ingestion buffer; Dataflow can read from Pub/Sub and write to BigQuery with transforms.
  • Example: Web app publishes click events to clicks-topic; Dataflow aggregates and writes to BigQuery tables for dashboards.

2) Event-driven microservices fan-out

  • Problem: Multiple services need to react to the same business event.
  • Why Pub/Sub fits: Multiple subscriptions can consume the same topic independently.
  • Example: orders-topic feeds billing, shipping, email notifications, and analytics services via separate subscriptions.

3) Decoupling batch producers from real-time consumers

  • Problem: Upstream systems produce bursts; downstream systems can’t keep up.
  • Why Pub/Sub fits: Buffers bursts; consumers scale out; retention covers downtime windows.
  • Example: Nightly job publishes thousands of “recompute” events; subscribers process them over hours.

4) Centralized audit/event pipeline

  • Problem: You need a unified stream of key business events for audit and compliance reporting.
  • Why Pub/Sub fits: Central event topic with controlled subscribers; retention allows short-term replay.
  • Example: “Account updated” and “Privilege changed” events published to an audit topic; consumers store to BigQuery and Cloud Storage.

5) IoT telemetry ingestion

  • Problem: Many devices send small messages continuously; ingestion must scale.
  • Why Pub/Sub fits: Designed for high-throughput event ingestion; supports multiple processing paths.
  • Example: Device telemetry published to telemetry-topic; one subscriber detects anomalies; another stores raw events.

6) Webhook ingestion and smoothing

  • Problem: Third-party webhooks arrive unpredictably and can spike.
  • Why Pub/Sub fits: Convert synchronous HTTP intake into asynchronous processing.
  • Example: Cloud Run endpoint validates webhook and publishes to Pub/Sub; downstream workers process without timing out the webhook sender.

7) Dead-letter handling for “poison messages”

  • Problem: Some messages consistently fail processing and block progress.
  • Why Pub/Sub fits: Dead-letter topics isolate failures after N delivery attempts.
  • Example: A malformed JSON event is retried; after max attempts it lands in DLQ for inspection.

8) Cross-environment event distribution (dev/test/prod)

  • Problem: You want consistent event contracts across environments.
  • Why Pub/Sub fits: Topics/subscriptions per environment; schemas help enforce message structure.
  • Example: orders-v1 schema enforced across orders-dev, orders-stage, orders-prod.

9) Real-time feature updates for ML systems

  • Problem: ML features must be updated as events happen.
  • Why Pub/Sub fits: Low-latency event distribution to feature pipelines.
  • Example: “User clicked item” events consumed by a service that updates an online feature store (implementation varies).

10) Pipeline branching by message attributes (filtering)

  • Problem: Different consumers only need subsets of events.
  • Why Pub/Sub fits: Subscription filters reduce downstream load and cost.
  • Example: events-topic contains many event types; one subscription filters only eventType="purchase".

11) Near-real-time cache invalidation

  • Problem: Cache entries must be invalidated when data changes.
  • Why Pub/Sub fits: Publish invalidation events; multiple caches/services react.
  • Example: Product updates publish productId invalidation messages; cache services subscribe.

12) Data quality and anomaly detection sidecar

  • Problem: You need real-time checks without slowing main pipeline.
  • Why Pub/Sub fits: Add a new subscription for DQ checks without touching producers.
  • Example: A DQ service subscribes to raw events and flags schema drift or missing fields.

6. Core Features

This section focuses on important, current Pub/Sub capabilities. Always validate feature availability and limitations in the official docs because some features vary by subscription type, region, or client library.

Topics and subscriptions

  • What it does: Topics receive published messages; subscriptions deliver them to consumers.
  • Why it matters: Clean separation enables fan-out, independent scaling, and access control.
  • Practical benefit: Add a new downstream system by creating a subscription—no producer changes.
  • Caveats: Subscription configuration impacts delivery, retries, and costs.

Push and pull delivery

  • What it does:
    • Pull: subscribers poll or stream-pull messages and ack/nack.
    • Push: Pub/Sub sends HTTPS POST requests to an endpoint.
  • Why it matters: Lets you choose between consumer-controlled flow (pull) and simpler webhooks (push).
  • Practical benefit: Pull fits worker pools; push fits HTTP services (Cloud Run).
  • Caveats: Push requires a reachable HTTPS endpoint and careful auth; pull requires managing subscriber processes.

At-least-once delivery (default behavior)

  • What it does: Messages may be delivered more than once; subscribers must handle duplicates.
  • Why it matters: Enables reliability under retries and transient failures.
  • Practical benefit: Fewer lost events; supports robust pipeline design.
  • Caveats: Consumers should implement idempotency (e.g., de-dup keys) when needed.
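Because redelivery is always possible under at-least-once semantics, consumers commonly de-duplicate on a business key. A minimal sketch, assuming an `eventId` field in the payload (the in-memory set is for illustration; production systems typically use a database or cache with a TTL):

```python
import json

processed_ids: set = set()  # in production: Redis/Firestore/DB with expiry

def handle_event(raw: bytes) -> bool:
    """Process an event at most once, keyed on eventId.

    Returns True if the event was processed, False if it was a duplicate.
    """
    event = json.loads(raw)
    event_id = event["eventId"]
    if event_id in processed_ids:
        return False  # duplicate delivery: ack it, but skip reprocessing
    processed_ids.add(event_id)
    # ... real side effects (write to DB, call an API) go here ...
    return True

assert handle_event(b'{"eventId": "e-1"}') is True
assert handle_event(b'{"eventId": "e-1"}') is False  # redelivery ignored
```

The key design point is that the duplicate is still acknowledged; only the side effect is skipped, so the message stops being redelivered.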

Exactly-once delivery (subscription feature)

  • What it does: Prevents acknowledged messages from being redelivered (within the exactly-once model).
  • Why it matters: Reduces duplicate processing for pull subscribers.
  • Practical benefit: Simplifies consumer logic in some cases.
  • Caveats: Availability and requirements can depend on client libraries and subscription type. Verify current constraints and how to enable it in docs: https://cloud.google.com/pubsub/docs/exactly-once-delivery

Message ordering (ordering keys)

  • What it does: Preserves order of messages that share an ordering key, when ordering is enabled.
  • Why it matters: Some workflows require per-entity ordering (e.g., per order ID).
  • Practical benefit: Avoids complex reordering in consumers.
  • Caveats: Ordering reduces throughput for a given key and can increase latency; requires publisher discipline and correct key selection. Verify latest ordering behavior: https://cloud.google.com/pubsub/docs/ordering
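Ordering applies per key, not globally: messages for different keys may interleave, but within one key, delivery order matches publish order. A toy simulation of that contract (not the real client library, which you enable via publisher ordering options; see the docs linked above):

```python
from collections import defaultdict

def deliver(published):
    """Group (ordering_key, payload) pairs; within a key, order is preserved."""
    per_key = defaultdict(list)
    for key, payload in published:
        per_key[key].append(payload)
    return per_key

# Two order lifecycles interleaved at publish time.
published = [
    ("order-1", "created"), ("order-2", "created"),
    ("order-1", "paid"), ("order-1", "shipped"), ("order-2", "cancelled"),
]
delivered = deliver(published)
assert delivered["order-1"] == ["created", "paid", "shipped"]
assert delivered["order-2"] == ["created", "cancelled"]
```

Choosing the key is the design decision: a per-entity key (e.g., an order ID) keeps related events ordered while still allowing parallelism across entities.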

Message retention and replay (seek/snapshots)

  • What it does: Retain messages for a configured duration; allow replay by seeking to a timestamp or snapshot.
  • Why it matters: Enables recovery from consumer bugs or backfills.
  • Practical benefit: Reprocess last N hours of events without needing a separate event store (within retention).
  • Caveats: Retention increases storage costs; replay can amplify downstream costs.

Acknowledgments, ack deadlines, and retry behavior

  • What it does: Subscriber acks confirm processing; ack deadlines define how long before redelivery if not acked.
  • Why it matters: Core reliability mechanism for handling failures and slow processing.
  • Practical benefit: Consumers can scale and manage long-running tasks by extending ack deadlines (via client libraries).
  • Caveats: Poor ack management causes duplicates and increased costs. Push subscriptions have different retry semantics—verify docs: https://cloud.google.com/pubsub/docs/subscriber
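The ack deadline behaves like a lease: if a delivered message is not acked before its deadline, it becomes eligible for redelivery. A toy model of that rule (timestamps and deadlines are illustrative; real client libraries manage lease extension for you):

```python
def due_for_redelivery(deliveries, now):
    """Return IDs of messages whose ack deadline passed without an ack.

    deliveries: message_id -> (delivered_at_seconds, ack_deadline_seconds, acked)
    """
    return [
        mid for mid, (delivered_at, deadline, acked) in deliveries.items()
        if not acked and now - delivered_at > deadline
    ]

deliveries = {
    "m1": (100.0, 20, True),    # acked in time: never redelivered
    "m2": (100.0, 20, False),   # deadline expired unacked: redelivered
    "m3": (118.0, 20, False),   # deadline not yet reached at now=125
}
assert due_for_redelivery(deliveries, now=125.0) == ["m2"]
```

This is why a consumer that crashes after side effects but before acking produces duplicates, and why the idempotency pattern above matters.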

Dead-letter topics (DLQ)

  • What it does: Routes messages that exceed max delivery attempts to a dead-letter topic.
  • Why it matters: Prevents poison messages from endlessly retrying.
  • Practical benefit: Operational clarity; separate remediation workflow.
  • Caveats: Requires IAM permissions between subscription and dead-letter topic; DLQ itself must be monitored.

Subscription filtering

  • What it does: Filters messages delivered to a subscription based on attributes.
  • Why it matters: Reduces unnecessary downstream processing.
  • Practical benefit: Lower compute and simpler consumers.
  • Caveats: Requires consistent use of attributes by publishers; filtering rules must be tested.
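Filters match on message attributes, not on the payload. A sketch of the matching semantics for a simple equality filter such as `attributes.region="us"` (the real filter syntax supports more operators; check the docs):

```python
def matches_filter(attributes: dict, key: str, expected: str) -> bool:
    """Equality match on one attribute, like the filter attributes.region="us"."""
    return attributes.get(key) == expected

messages = [
    {"region": "us", "eventType": "purchase"},
    {"region": "eu", "eventType": "signup"},
    {"eventType": "purchase"},  # attribute missing: no match
]
delivered = [m for m in messages if matches_filter(m, "region", "us")]
assert delivered == [{"region": "us", "eventType": "purchase"}]
```

Because a missing attribute never matches, publishers must set filterable attributes consistently, which is the caveat noted above.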

Schemas (Avro/Protocol Buffers) and validation

  • What it does: Defines and validates message structure.
  • Why it matters: Helps prevent malformed events and contract drift.
  • Practical benefit: Safer evolution of event types.
  • Caveats: Schema enforcement is optional and requires coordination; verify current support and limitations.

IAM integration

  • What it does: Controls who can publish, subscribe, administer, and view resources.
  • Why it matters: Messaging is often a central integration point; it must be tightly controlled.
  • Practical benefit: Least-privilege patterns; separate publisher/subscriber identities.
  • Caveats: Misconfigured IAM is a common cause of failures and security incidents.

Observability (metrics, logs, audit)

  • What it does: Exposes metrics (e.g., backlog, throughput), logs for admin activity, and operational signals.
  • Why it matters: Streaming pipelines require active monitoring and alerting.
  • Practical benefit: SLOs around consumer lag and error rates.
  • Caveats: Logging can create cost; choose log levels/types intentionally.

7. Architecture and How It Works

High-level service architecture

Pub/Sub follows a managed broker pattern:

  1. A publisher sends messages to a topic using authenticated Google Cloud APIs.
  2. Pub/Sub durably stores the message for the topic (within retention and policies).
  3. Each subscription on the topic tracks delivery state independently.
  4. Subscribers receive messages (push or pull).
  5. Subscribers acknowledge (ack) messages after processing; unacked messages are retried.
  6. Messages exceeding retry attempts can be moved to a dead-letter topic (if configured).

Request/data/control flow

  • Data plane: publish requests, message delivery to subscribers, ack/nack operations.
  • Control plane: create/update topics/subscriptions, IAM policies, schemas, DLQ configuration.

Integrations with related services

Common Google Cloud integrations in data analytics and pipelines:

  • Dataflow: native Pub/Sub IO connectors for streaming pipelines.
  • BigQuery: common sink via Dataflow; direct ingestion patterns exist, but confirm the current recommended approach in the docs.
  • Cloud Run / Cloud Functions: event-driven processing, typically via push subscriptions or Eventarc (depending on trigger model; verify current guidance).
  • Cloud Storage: store raw events or batch outputs; notifications can feed Pub/Sub in some patterns (verify current Cloud Storage notification options).
  • Cloud Monitoring and Cloud Logging: metrics dashboards, alerting, audit trails.

Dependency services

  • IAM, Service Usage API (enabling Pub/Sub API)
  • Cloud Monitoring/Logging for observability
  • Optional: Cloud KMS for CMEK (verify current support and setup)

Security/authentication model

  • Publishers/subscribers authenticate using:
    • User credentials (for development in Cloud Shell)
    • Service accounts (for production workloads)
  • Authorization is via IAM roles on topics/subscriptions.
  • Push subscriptions to an endpoint can use authentication tokens (OIDC) to secure the HTTP receiver; verify current push authentication model: https://cloud.google.com/pubsub/docs/push

Networking model

  • Pub/Sub is accessed via Google APIs endpoints over HTTPS.
  • Workloads in VPC can access Pub/Sub using normal outbound internet routes or with Private Google Access where applicable; perimeter controls may be applied using VPC Service Controls (verify supported configurations).
  • Push delivery requires the target endpoint to be reachable and able to validate auth tokens.

Monitoring/logging/governance considerations

  • Monitor subscription backlog and oldest unacked message age to detect lag.
  • Track publish and ack error rates.
  • Use Cloud Audit Logs for admin actions and review IAM changes.
  • Use naming conventions and labels for environment, owner, data classification, and cost center.

Simple architecture diagram

flowchart LR
  A[Publisher App] -->|Publish messages| T[Pub/Sub Topic]
  T --> S1[Subscription A]
  T --> S2[Subscription B]
  S1 -->|Pull| C1[Subscriber Service A]
  S2 -->|Push HTTPS| C2[Subscriber Service B]

Production-style architecture diagram

flowchart TB
  subgraph Producers
    P1[Cloud Run API]
    P2[Batch Job]
    P3[On-prem Connector]
  end

  subgraph Messaging
    T[Pub/Sub Topic: events]
    F[Subscription Filter: purchases]
    G[Subscription: all-events]
    DLQ[Dead-letter Topic]
    DLS[DLQ Subscription]
  end

  subgraph Processing
    DF[Dataflow Streaming Pipeline]
    CR[Cloud Run Consumer]
    DQ[Data Quality Service]
  end

  subgraph Storage_Analytics
    BQ[BigQuery]
    GCS[Cloud Storage Raw Archive]
  end

  subgraph Ops
    CM[Cloud Monitoring Alerts]
    CL[Cloud Logging / Audit Logs]
  end

  P1 --> T
  P2 --> T
  P3 --> T

  T --> F
  T --> G

  F --> DF
  G --> CR
  G --> DQ

  DF --> BQ
  DF --> GCS

  CR -. failed after attempts .-> DLQ
  DLQ --> DLS

  T --> CL
  F --> CM
  G --> CM
  DF --> CM

8. Prerequisites

Before starting the hands-on lab and using Pub/Sub in Google Cloud:

Account/project requirements

  • A Google Cloud account
  • A Google Cloud project with billing enabled (Pub/Sub usage is billable beyond free tier/quotas)

Permissions / IAM roles

For the lab, your user or service account typically needs:

  • roles/pubsub.admin (create/manage topics/subscriptions)
  • Or more limited roles:
    • roles/pubsub.editor (if applicable)
    • roles/pubsub.publisher (publish)
    • roles/pubsub.subscriber (consume)
  • Permission to enable APIs: roles/serviceusage.serviceUsageAdmin (or project Owner)

In production, prefer separate service accounts for publishers and subscribers with least privilege.

CLI/SDK/tools

  • Cloud Shell (recommended) or local setup with:
  • Google Cloud CLI (gcloud)
  • Python 3.10+ (or your language runtime)
  • Pub/Sub client library (for the tutorial): google-cloud-pubsub

APIs to enable

  • Pub/Sub API
    Official docs: https://cloud.google.com/pubsub/docs/quickstart-client-libraries

Region availability

  • Pub/Sub is generally available across Google Cloud, but location/residency features and CMEK support can vary. Verify location support: https://cloud.google.com/pubsub/docs/locations

Quotas/limits

Pub/Sub enforces quotas (requests per second, message sizes, subscriptions per topic, etc.). Quotas can change and can be project/region dependent.

  • Check quotas: https://cloud.google.com/pubsub/quotas

Prerequisite services (optional, depending on your architecture)

  • Cloud Monitoring/Logging for dashboards and alerts
  • Cloud KMS if using CMEK
  • Dataflow/BigQuery/Cloud Run if building end-to-end pipelines

9. Pricing / Cost

Pub/Sub pricing is usage-based. Exact prices vary by SKU and can change, so use official sources for current rates.

  • Official pricing page: https://cloud.google.com/pubsub/pricing
  • Pricing calculator: https://cloud.google.com/products/calculator

Pricing dimensions (what you pay for)

Common pricing dimensions for Pub/Sub include (verify the latest breakdown on the pricing page):

  • Data volume:
    • Data published to topics (ingress)
    • Data delivered to subscriptions (egress within the Pub/Sub service)
  • Message delivery: delivery volume scales with fan-out (multiple subscriptions increase delivered bytes).
  • Message storage/retention: retaining messages longer can incur storage charges (depending on current SKUs).
  • Network egress: delivering to subscribers outside Google Cloud regions or to the public internet (push endpoints, on-prem consumers) can incur network egress charges.
  • Additional features: some advanced capabilities may have pricing implications; confirm on the pricing page if applicable (e.g., exactly-once delivery, dedicated capacity).

Free tier (if applicable)

Google Cloud often provides free usage tiers for some services, but details and limits can change. Check the Pub/Sub pricing page for any free tier, free operations, or monthly free volume.

Cost drivers (the biggest levers)

  • Fan-out multiplier: 1 published message delivered to 5 subscriptions counts as 5 deliveries.
  • Message size: large payloads and verbose attributes cost more.
  • Retention: longer retention increases storage.
  • Cross-region and internet delivery: egress charges can dominate.
  • Retry behavior: repeatedly failing consumers increase redeliveries and costs.
  • Subscriber inefficiency: slow consumers cause backlog; may lead to longer retention usage and higher compute costs downstream.
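The fan-out multiplier is often the biggest lever, so it is worth making it concrete. A rough, illustrative estimator of billable data volume (no real prices are embedded; multiply the GiB figures by current rates from the pricing page):

```python
def monthly_throughput_gib(published_gib: float, subscription_count: int) -> dict:
    """Estimate billable volume: data is published once, delivered once per subscription."""
    delivered = published_gib * subscription_count
    return {
        "published_gib": published_gib,
        "delivered_gib": delivered,
        "total_gib": published_gib + delivered,
    }

# 100 GiB/month published to a topic with 5 subscriptions.
est = monthly_throughput_gib(published_gib=100.0, subscription_count=5)
assert est["delivered_gib"] == 500.0   # each subscription receives a full copy
assert est["total_gib"] == 600.0
```

This is why consolidating consumers or filtering subscriptions can reduce cost even when the published volume stays the same.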

Hidden or indirect costs

  • Downstream compute: Dataflow/Cloud Run/GKE costs can exceed Pub/Sub costs.
  • Logging: verbose application logs and audit logs retention can become significant.
  • Operational overhead: dashboards, alerting, and incident response time.

Network/data transfer implications

  • Pull subscribers running outside Google Cloud (or in different regions) can incur egress.
  • Push subscriptions to public endpoints also involve internet egress.
  • Within Google Cloud, network paths can still incur charges depending on source/destination and region—validate with pricing docs and calculator.

How to optimize cost

  • Minimize message payload size (store large payloads in Cloud Storage; publish pointers/URIs).
  • Use subscription filtering to reduce downstream delivery and compute.
  • Keep retention as low as your recovery requirements allow.
  • Fix poison messages quickly; use DLQs to avoid infinite retries.
  • Batch publish messages where appropriate (client libraries support batching).
  • Use compression at the application layer if your consumers can handle it (trade CPU vs bytes).
  • Avoid unnecessary fan-out; consider a single processing pipeline that writes to multiple sinks if it reduces delivery duplication.
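Application-layer compression trades CPU for bytes, and repetitive JSON event batches compress well. A quick sketch of the size reduction (gzip from the Python standard library; your consumers must agree to decompress):

```python
import gzip
import json

# A repetitive batch of small events, similar to typical telemetry.
events = [{"eventId": f"e-{i}", "eventType": "purchase", "region": "us"} for i in range(200)]
raw = json.dumps(events).encode("utf-8")
compressed = gzip.compress(raw)

print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes")
assert len(compressed) < len(raw)                          # fewer billable bytes
assert json.loads(gzip.decompress(compressed)) == events   # lossless round trip
```

Since Pub/Sub treats the payload as opaque bytes, nothing special is required on the service side; the contract between publisher and subscriber (e.g., a `contentEncoding` attribute, which is a naming convention you would define yourself) carries the information.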

Example low-cost starter estimate (conceptual)

A small lab setup might include:

  • One topic, two subscriptions
  • A few thousand small messages (hundreds of KB to a few MB total)
  • A pull subscriber running briefly in Cloud Shell

This is typically very low cost, often near free tier levels (if available), but do not assume: always confirm with the pricing page and your billing reports.

Example production cost considerations

In production, plan for:

  • Daily/weekly message volume (GiB) × number of subscriptions (fan-out)
  • Peak throughput (to avoid quota issues and to size downstream compute)
  • Retention needs (hours vs. days)
  • Cross-region egress
  • Error rates and retries
  • Cost attribution by team: use labels, separate projects, or billing export analysis


10. Step-by-Step Hands-On Tutorial

This lab builds a small but realistic Pub/Sub workflow using:

  • A topic for events
  • A “main” subscription with a dead-letter topic
  • A filtered subscription (only some events)
  • A Python pull subscriber that intentionally fails some messages so you can observe retries and DLQ behavior

This is beginner-friendly, executable in Cloud Shell, and designed to stay low-cost.

Objective

  1. Create a Pub/Sub topic and subscriptions.
  2. Publish sample messages with attributes.
  3. Consume messages with a Python subscriber (ack successes, nack failures).
  4. Observe retries and dead-letter routing.
  5. Validate filtered subscription behavior.
  6. Clean up resources.

Lab Overview

You will create:

  • Topic: events-topic
  • Dead-letter topic: events-dlq-topic
  • Subscription (main): events-sub with DLQ configured
  • Subscription (DLQ): events-dlq-sub
  • Subscription (filtered): events-us-sub, which only receives messages where region="us"

You will publish messages like:

  • Payload: JSON string
  • Attributes: region, eventType, shouldFail

Then you’ll run a Python subscriber that:

  • Nacks the message (to trigger retries) if shouldFail=true
  • Acks it otherwise

After max delivery attempts, failing messages should appear in the DLQ subscription.
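The retry-then-DLQ flow you will observe can be modeled as a per-message delivery-attempt counter. A toy model of the expected behavior with max-delivery-attempts set to 5 (real redelivery timing varies and is managed by the service):

```python
MAX_DELIVERY_ATTEMPTS = 5

def run_deliveries(should_fail: bool):
    """Simulate a message that is nacked every attempt until it dead-letters."""
    dlq = []
    for attempt in range(1, MAX_DELIVERY_ATTEMPTS + 1):
        if not should_fail:
            return attempt, dlq   # acked on the first successful attempt
        # nack: message becomes eligible for redelivery
    dlq.append("message")         # attempts exhausted: routed to dead-letter topic
    return MAX_DELIVERY_ATTEMPTS, dlq

attempts, dlq = run_deliveries(should_fail=True)
assert attempts == 5 and dlq == ["message"]      # e-1003 should end up here
attempts, dlq = run_deliveries(should_fail=False)
assert attempts == 1 and dlq == []               # e-1001/e-1002 ack immediately
```

Keeping the attempt ceiling low in the lab makes the dead-letter behavior visible within minutes; production values are usually higher to tolerate transient failures.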

Notes:

  • Exact timing of retries and DLQ forwarding can vary. Expect a few minutes for DLQ behavior to become visible.
  • Commands below use the gcloud CLI. Run them from Cloud Shell for easiest setup.


Step 1: Set your project and enable the Pub/Sub API

1) Open Google Cloud Console → Cloud Shell.

2) Set environment variables:

export PROJECT_ID="$(gcloud config get-value project)"
echo "PROJECT_ID=${PROJECT_ID}"

If PROJECT_ID is empty, set it:

gcloud config set project YOUR_PROJECT_ID
export PROJECT_ID="YOUR_PROJECT_ID"

3) Enable the Pub/Sub API:

gcloud services enable pubsub.googleapis.com

Expected outcome

  • The Pub/Sub API is enabled for the project.

Verify

gcloud services list --enabled --filter="name:pubsub.googleapis.com"

Step 2: Create topics (main + dead-letter)

Create the main topic:

gcloud pubsub topics create events-topic

Create the DLQ topic:

gcloud pubsub topics create events-dlq-topic

Expected outcome

  • Two topics exist in your project.

Verify

gcloud pubsub topics list

Step 3: Create subscriptions (main with DLQ, DLQ subscription, and filtered subscription)

1) Create the main subscription with a dead-letter policy.

Set a relatively small max delivery attempts so you can see DLQ behavior quickly (for real production you may want higher values):

gcloud pubsub subscriptions create events-sub \
  --topic=events-topic \
  --ack-deadline=20 \
  --dead-letter-topic=events-dlq-topic \
  --max-delivery-attempts=5

2) Create a subscription on the dead-letter topic:

gcloud pubsub subscriptions create events-dlq-sub \
  --topic=events-dlq-topic

3) Create a filtered subscription (only region=us):

gcloud pubsub subscriptions create events-us-sub \
  --topic=events-topic \
  --filter='attributes.region="us"'

Expected outcome

  • events-sub receives all messages (with DLQ handling).
  • events-us-sub receives only messages with attribute region=us.
  • events-dlq-sub receives dead-lettered messages.

Verify

gcloud pubsub subscriptions list
gcloud pubsub subscriptions describe events-sub

If events-sub creation fails due to permissions on the dead-letter topic, verify IAM. Pub/Sub needs permission to publish to the dead-letter topic. Official DLQ docs include required roles/permissions: https://cloud.google.com/pubsub/docs/dead-letter-topics


Step 4: Publish sample messages with attributes

Publish a few messages. We’ll include attributes to drive filtering and failure behavior.

Publish a successful US purchase event:

gcloud pubsub topics publish events-topic \
  --message='{"eventId":"e-1001","eventType":"purchase","amount":42.50}' \
  --attribute=region=us,eventType=purchase,shouldFail=false

Publish a successful EU signup event:

gcloud pubsub topics publish events-topic \
  --message='{"eventId":"e-1002","eventType":"signup","plan":"free"}' \
  --attribute=region=eu,eventType=signup,shouldFail=false

Publish a failing US event (will be nacked by our subscriber):

gcloud pubsub topics publish events-topic \
  --message='{"eventId":"e-1003","eventType":"purchase","amount":13.37}' \
  --attribute=region=us,eventType=purchase,shouldFail=true

Expected outcome
– Three messages are now available for delivery on subscriptions.

Verify (quick pull from the filtered subscription)

This pulls up to 5 messages if available and auto-acks them (good for quick checks; not how you’d run production consumers):

gcloud pubsub subscriptions pull events-us-sub --limit=5 --auto-ack

You should see only messages where region=us (depending on timing and whether another subscriber consumed them).


Step 5: Create a Python pull subscriber (ack successes, nack failures)

1) Install the Pub/Sub Python client library in Cloud Shell:

python3 -m pip install --user --upgrade google-cloud-pubsub

2) Create a subscriber script:

cat > subscriber.py <<'PY'
import json
import os
import time
from google.cloud import pubsub_v1

project_id = os.environ.get("PROJECT_ID")
subscription_id = os.environ.get("SUBSCRIPTION_ID", "events-sub")

if not project_id:
    raise SystemExit("PROJECT_ID env var is required")

subscription_path = f"projects/{project_id}/subscriptions/{subscription_id}"

subscriber = pubsub_v1.SubscriberClient()

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    attrs = dict(message.attributes or {})
    data = message.data.decode("utf-8", errors="replace")

    should_fail = attrs.get("shouldFail", "false").lower() == "true"

    print("\n--- Received message ---")
    print(f"Message ID: {message.message_id}")
    print(f"Publish time: {message.publish_time}")
    print(f"Attributes: {attrs}")
    print(f"Data: {data}")

    # Simulate processing
    time.sleep(1)

    if should_fail:
        print("Simulated failure -> nack (will retry, may go to DLQ after max attempts)")
        message.nack()
        return

    # Example idempotency hint: use eventId from payload if present
    try:
        payload = json.loads(data)
        print(f"Parsed eventId: {payload.get('eventId')}")
    except Exception:
        pass

    print("Processed OK -> ack")
    message.ack()

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
print(f"Listening on {subscription_path}... Press Ctrl+C to stop.")

try:
    streaming_pull_future.result()
except KeyboardInterrupt:
    streaming_pull_future.cancel()
    print("Stopped.")
PY

3) Run the subscriber:

export PROJECT_ID="$(gcloud config get-value project)"
export SUBSCRIPTION_ID="events-sub"
python3 subscriber.py

Expected outcome
– The subscriber prints received messages.
– Messages with shouldFail=false are acked and stop retrying.
– Messages with shouldFail=true are nacked and retried until they reach the max delivery attempts, then they should be published to the dead-letter topic.
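The redelivery-then-DLQ control flow can be sketched locally. This is a simplified model of the policy configured in Step 3 (--max-delivery-attempts=5); real Pub/Sub counts attempts server-side and applies its own retry backoff, which this sketch omits:

```python
# Local model of Step 3's dead-letter policy: a message that is nacked
# every time is redelivered until delivery attempts reach the configured
# maximum, then routed to the DLQ topic. Real Pub/Sub adds backoff and
# tracks attempts server-side; this only illustrates the control flow.

MAX_DELIVERY_ATTEMPTS = 5  # matches --max-delivery-attempts=5

def deliver_until_dlq(process, max_attempts: int = MAX_DELIVERY_ATTEMPTS):
    """Redeliver while process() nacks; return (attempts, dead_lettered)."""
    for attempt in range(1, max_attempts + 1):
        if process():          # True = ack -> done
            return attempt, False
        # False = nack -> message becomes eligible for redelivery
    return max_attempts, True  # attempts exhausted -> sent to events-dlq-topic

always_fail = lambda: False    # behaves like a shouldFail=true message
print(deliver_until_dlq(always_fail))  # (5, True)
```

A message that succeeds on a later attempt (for example, after a transient outage) is acked normally and never reaches the DLQ.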


Step 6: Observe dead-letter behavior

While the subscriber is running (or after stopping it), wait a bit for retries and DLQ routing (often a couple of minutes).

Then pull from the DLQ subscription:

gcloud pubsub subscriptions pull events-dlq-sub --limit=10 --auto-ack

Expected outcome
– The failing message (eventId e-1003) eventually appears in events-dlq-sub.

If you don’t see it yet:
– Wait longer and try again.
– Confirm that events-sub has the dead-letter policy configured:

gcloud pubsub subscriptions describe events-sub --format="flattened(deadLetterPolicy)"

– Confirm your subscriber is nacking the message (or not acking it).
– Verify --max-delivery-attempts is set and the message is being retried.


Step 7: Validate subscription filtering

Publish two more messages—one US and one EU:

gcloud pubsub topics publish events-topic \
  --message='{"eventId":"e-1004","eventType":"purchase","amount":9.99}' \
  --attribute=region=us,eventType=purchase,shouldFail=false

gcloud pubsub topics publish events-topic \
  --message='{"eventId":"e-1005","eventType":"purchase","amount":19.99}' \
  --attribute=region=eu,eventType=purchase,shouldFail=false

Pull from the US-only subscription:

gcloud pubsub subscriptions pull events-us-sub --limit=10 --auto-ack

Pull from the main subscription (if your Python subscriber is stopped):

gcloud pubsub subscriptions pull events-sub --limit=10 --auto-ack

Expected outcome
– events-us-sub returns only the US message(s).
– events-sub can return both US and EU messages (unless already consumed).


Validation

Use these checks to confirm your setup is correct.

1) List topics/subscriptions:

gcloud pubsub topics list
gcloud pubsub subscriptions list

2) Inspect subscription configuration:

gcloud pubsub subscriptions describe events-sub
gcloud pubsub subscriptions describe events-us-sub

3) Check backlog/metrics in the Console:
– Go to Google Cloud Console → Pub/Sub → Subscriptions
– Open events-sub
– Look for message backlog and delivery metrics
(Exact metric names and UI can change; verify with Cloud Monitoring docs if needed.)


Troubleshooting

Common issues and fixes:

1) PERMISSION_DENIED when creating the subscription with a dead-letter policy
– Cause: missing permissions for dead-letter topic usage.
– Fix: ensure you have admin rights for the lab, or follow the required IAM bindings in the DLQ docs: https://cloud.google.com/pubsub/docs/dead-letter-topics

2) No messages received by subscriber
– Confirm you’re listening to the correct subscription:

echo $PROJECT_ID
echo $SUBSCRIPTION_ID

– Publish a new test message and watch logs.
– Ensure the subscription exists and is attached to the topic.

3) Messages keep retrying even after “success”
– Check that your subscriber code calls message.ack() and does not crash before acking.
– If your subscriber process terminates before ack, the message will be redelivered.

4) DLQ message never appears
– Ensure your subscriber is consistently failing the same message (nack or no ack).
– Ensure --max-delivery-attempts is set low enough for the lab.
– Wait longer; delivery attempts may not happen instantly.

5) Local Python dependency issues in Cloud Shell
– Re-run:

python3 -m pip install --user --upgrade google-cloud-pubsub

– Ensure you’re using python3.


Cleanup

Delete lab resources to avoid ongoing costs.

gcloud pubsub subscriptions delete events-sub
gcloud pubsub subscriptions delete events-us-sub
gcloud pubsub subscriptions delete events-dlq-sub

gcloud pubsub topics delete events-topic
gcloud pubsub topics delete events-dlq-topic

Expected outcome
– Topics and subscriptions are removed.

Verify:

gcloud pubsub topics list
gcloud pubsub subscriptions list

11. Best Practices

Architecture best practices

  • Design for idempotency: assume duplicate delivery (at-least-once). Use unique event IDs and de-dup logic where necessary.
  • Separate topics by event domain: e.g., orders-events, payments-events, not one giant topic for everything.
  • Use attributes intentionally: keep payload stable; use attributes for routing/filtering metadata (e.g., eventType, tenantId, region, schemaVersion).
  • Apply fan-out thoughtfully: every additional subscription increases delivery volume and cost. Consider whether a downstream pipeline can branch internally.
  • Use DLQs for poison messages: configure dead-letter topics and build operational playbooks for DLQ remediation.
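The idempotency guidance above can be sketched concretely. This is a hedged, minimal illustration: it dedupes on the eventId field used throughout this lab, and the in-memory set stands in for what would be a durable store (database table with TTL, Redis, etc.) in production:

```python
# Hedged sketch of an idempotent consumer for at-least-once delivery:
# dedupe on a stable business key (eventId) before applying side effects.
# In production, "processed_ids" would be a durable store, not a set.

processed_ids: set = set()
applied = []  # stands in for real side effects (DB writes, emails, ...)

def handle_event(event: dict) -> bool:
    """Apply the event's side effects once; return False for duplicates."""
    event_id = event["eventId"]
    if event_id in processed_ids:
        return False               # duplicate redelivery -> safely ignored
    processed_ids.add(event_id)
    applied.append(event_id)       # side effect runs only on first delivery
    return True

handle_event({"eventId": "e-1001", "eventType": "purchase"})
handle_event({"eventId": "e-1001", "eventType": "purchase"})  # redelivery
print(applied)  # ['e-1001'] - the side effect ran exactly once
```

The key property: redelivering the same message any number of times leaves the system in the same state as delivering it once.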

IAM/security best practices

  • Least privilege:
  • Publishers: roles/pubsub.publisher on specific topics
  • Subscribers: roles/pubsub.subscriber on specific subscriptions
  • Admin tasks: limited to platform team
  • Use separate service accounts per workload and environment.
  • Avoid user credentials in production; use service accounts with workload identity where appropriate.

Cost best practices

  • Control message size: store large payloads outside Pub/Sub and publish references.
  • Tune retention to real recovery needs.
  • Prevent runaway retries: set sane retry/DLQ policies and monitor error rates.
  • Use filtering to avoid delivering irrelevant events to consumers.

Performance best practices

  • Batch publishing: use client library batching settings to improve throughput.
  • Use flow control in subscribers: cap outstanding messages/bytes to avoid memory pressure.
  • Parallelize subscribers: scale horizontally; ensure processing is stateless where possible.
  • Use ordering keys only when required: ordering can reduce effective parallelism per key.
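To make the batching bullet concrete, here is a local illustration of the semantics: buffer messages and flush when a count or byte threshold is reached. The Python client exposes equivalent knobs (e.g., via pubsub_v1.types.BatchSettings); the class and thresholds below are made-up lab values for illustration, not the library's defaults or API:

```python
# Local illustration of publisher-side batching semantics: buffer
# messages and flush a whole batch (one RPC) when either a message-count
# or byte-size threshold is crossed. Thresholds are illustrative only.

class BatchingPublisher:
    def __init__(self, max_messages: int = 3, max_bytes: int = 1024):
        self.max_messages = max_messages
        self.max_bytes = max_bytes
        self.buffer = []
        self.flushed_batches = []  # each entry models one publish RPC

    def publish(self, data: bytes) -> None:
        self.buffer.append(data)
        too_many = len(self.buffer) >= self.max_messages
        too_big = sum(len(d) for d in self.buffer) >= self.max_bytes
        if too_many or too_big:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flushed_batches.append(self.buffer)
            self.buffer = []

pub = BatchingPublisher()
for i in range(7):
    pub.publish(f'{{"eventId":"e-{i}"}}'.encode())
pub.flush()  # drain any tail on shutdown
print([len(b) for b in pub.flushed_batches])  # [3, 3, 1]
```

The tradeoff is throughput versus latency: larger batches mean fewer RPCs, but messages wait longer before being sent, which is why real clients also flush on a max-latency timer.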

Reliability best practices

  • Set SLOs: backlog size, oldest unacked message age, subscriber error rate.
  • Plan for consumer outages: retention should cover expected recovery times.
  • Test failover: intentionally stop subscribers and validate recovery and replay behavior.
  • Use DLQ + alerting: DLQ growth should page or create incidents.

Operations best practices

  • Standardize naming:
  • topic: {env}.{domain}.{event} (example: prod.orders.events)
  • subscription: {env}.{consumer}.{topic} (example: prod.analytics.orders.events.sub)
  • Label resources: owner, cost center, environment, data classification.
  • Document contracts: schemas, attribute conventions, versioning strategy.
  • Automate provisioning: use Terraform or other IaC to manage topics/subscriptions.
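A small helper can enforce the naming standard above at provisioning time. The {env}.{domain}.{event} pattern is this article's suggested convention, not a Pub/Sub requirement, so adapt the validation to your own standard:

```python
# Helper enforcing the suggested naming convention:
#   topic:        {env}.{domain}.{event}
#   subscription: {env}.{consumer}.{topic}.sub
# The patterns are this article's convention, not a Pub/Sub rule.

VALID_ENVS = {"dev", "stage", "prod"}

def topic_name(env: str, domain: str, event: str) -> str:
    if env not in VALID_ENVS:
        raise ValueError(f"unknown env: {env}")
    return f"{env}.{domain}.{event}"

def subscription_name(env: str, consumer: str, topic: str) -> str:
    if env not in VALID_ENVS:
        raise ValueError(f"unknown env: {env}")
    return f"{env}.{consumer}.{topic}.sub"

print(topic_name("prod", "orders", "events"))                   # prod.orders.events
print(subscription_name("prod", "analytics", "orders.events"))  # prod.analytics.orders.events.sub
```

Wiring a check like this into your IaC pipeline (or a pre-merge lint) prevents ad-hoc names from creeping into the project.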

Governance/tagging/naming best practices

  • Use consistent labels:
  • env=dev|stage|prod
  • team=data-platform
  • pii=false|true
  • cost-center=1234
  • Define a topic lifecycle process: creation, versioning, deprecation, migration.

12. Security Considerations

Identity and access model

  • Pub/Sub uses IAM for authorization.
  • Common IAM patterns:
  • Grant publish rights on a topic to producer service accounts only.
  • Grant subscribe rights on a subscription to consumer service accounts only.
  • Restrict topic/subscription admin rights to platform/security teams.

Recommended roles (verify exact role names in IAM docs):
– roles/pubsub.publisher
– roles/pubsub.subscriber
– roles/pubsub.viewer
– roles/pubsub.admin

Encryption

  • In transit: Pub/Sub uses TLS for API communication.
  • At rest: encrypted by default with Google-managed keys.
  • CMEK (Customer-managed encryption keys): Pub/Sub may support CMEK for certain resources/configurations; availability can vary. Verify CMEK support and setup steps: https://cloud.google.com/pubsub/docs/encryption

Network exposure

  • Pub/Sub is accessed via Google APIs over HTTPS.
  • For subscribers/publishers in VPC environments:
  • Use organization policies and egress controls as needed.
  • Consider VPC Service Controls for data exfiltration risk reduction (verify compatibility and supported services).
  • For push subscriptions:
  • Endpoints must be internet-reachable (or reachable via appropriate networking) and must validate authentication tokens.

Secrets handling

  • Avoid embedding service account keys in code.
  • Prefer:
  • Workload Identity (GKE)
  • Service account attached to Cloud Run / Compute Engine
  • Short-lived credentials via ADC (Application Default Credentials)
  • Use Secret Manager for application secrets unrelated to Pub/Sub credentials.

Audit/logging

  • Use Cloud Audit Logs to track:
  • Topic/subscription creation/deletion
  • IAM policy changes
  • Schema changes (if used)
  • Consider enabling and routing logs to a central logging project if you operate at scale.

Compliance considerations

  • For regulated data, validate:
  • Data residency/location controls (message storage policies)
  • Retention duration and deletion behavior
  • Access boundaries (projects, folders, org policies, VPC-SC)
  • Pub/Sub is often part of a larger compliance story; ensure downstream systems also comply.

Common security mistakes

  • Granting roles/pubsub.admin broadly to developers and workloads
  • Using a single shared service account for many services
  • Push endpoints without authentication/authorization
  • Publishing sensitive payloads without classification and retention controls
  • No monitoring for DLQ growth (can hide attacks or data quality failures)

Secure deployment recommendations

  • Use separate projects for dev/test/prod.
  • Use per-team topics only when boundaries matter; otherwise central platform-managed topics with strict IAM.
  • Enforce schema validation for critical event domains.
  • Use DLQs and alerts to detect abnormal failure patterns.

13. Limitations and Gotchas

Always confirm details in official docs because limits and behaviors can change.

Known limitations / common constraints

  • Message size limits exist (payload + attributes). Verify current max size: https://cloud.google.com/pubsub/quotas
  • At-least-once delivery means duplicates are possible unless exactly-once delivery is enabled and used correctly (and even then, end-to-end exactly-once processing is not automatic).
  • Ordering is per ordering key; ordering across different keys is not guaranteed.
  • Retention is not archival: Pub/Sub is not designed as a long-term event store.

Quotas

  • Quotas exist for:
  • publish rate, pull rate
  • subscriptions per topic, topics per project
  • outstanding messages/bytes
  • Review and request increases where needed: https://cloud.google.com/pubsub/quotas

Regional constraints

  • Location/residency features and CMEK may have constraints.
  • Cross-region subscribers can incur egress costs and add latency.
  • Verify Pub/Sub location guidance: https://cloud.google.com/pubsub/docs/locations

Pricing surprises

  • Fan-out multiplies delivery volume costs.
  • Retention and replay can increase storage and delivery charges.
  • Retries due to failing subscribers can significantly increase delivery volume.

Compatibility issues

  • Some features depend on client library versions (exactly-once delivery support, flow control behavior).
  • Always pin and update client libraries carefully in production.

Operational gotchas

  • Poor ack handling causes redeliveries and cost spikes.
  • Misconfigured push endpoints can cause repeated delivery attempts.
  • DLQ without alerting can silently accumulate failures.

Migration challenges

  • Migrating from Kafka/RabbitMQ may require adapting message keying, ordering assumptions, and consumer group semantics.
  • Pub/Sub’s model (topic + independent subscriptions) is different from Kafka partitions and consumer groups. Plan carefully.

Vendor-specific nuances

  • Pub/Sub integrates deeply with Google Cloud IAM and monitoring; this is a benefit, but it also means you should design around Google Cloud operational patterns.

14. Comparison with Alternatives

Pub/Sub is one option in Google Cloud and among cloud providers. The best choice depends on delivery semantics, throughput, protocol needs, and operational constraints.

Comparison table

| Option | Best For | Strengths | Weaknesses | When to Choose |
| --- | --- | --- | --- | --- |
| Pub/Sub (Google Cloud) | Event ingestion, fan-out, streaming pipelines | Fully managed, scalable, native Google Cloud integrations, DLQ/filtering/ordering options | Not a long-term event store; duplicates possible; non-Kafka protocol | Default choice for event-driven architectures and data analytics pipelines on Google Cloud |
| Pub/Sub Lite (Google Cloud) | Cost-sensitive, very high-throughput streaming (where applicable) | Lower-cost model for certain patterns; regional | Different operational model and constraints vs Pub/Sub; feature parity may differ | Consider if your use case matches Lite’s model and you’ve verified current status and fit in official docs |
| Cloud Tasks (Google Cloud) | Task queueing, request scheduling | Good for HTTP task execution, retries, scheduling | Not a pub/sub fan-out event bus | When you need task execution semantics rather than event streaming |
| Eventarc (Google Cloud) | Event routing from Google Cloud sources to services | Simplifies event triggers to Cloud Run; integrates with Google Cloud events | Not a general-purpose high-throughput ingestion layer by itself | When you want managed event routing/triggers rather than building consumer plumbing |
| Kafka (self-managed on GKE/Compute Engine) | Kafka protocol, long retention, stream replay | Strong ecosystem, partitions, consumer groups, long retention | Operational overhead, scaling/patching complexity | When you require Kafka compatibility, long replay windows, or existing Kafka tooling |
| AWS SNS + SQS | Pub/sub + queueing on AWS | Mature services; flexible patterns | Different ecosystem; multi-service composition | When operating primarily on AWS |
| AWS Kinesis | High-throughput streaming on AWS | Tight integration with AWS analytics | Different API model and cost structure | When on AWS and needing managed streaming |
| Azure Event Hubs / Service Bus | Messaging/streaming on Azure | Strong Azure integration | Different semantics | When operating primarily on Azure |
| RabbitMQ (self-managed/managed elsewhere) | Traditional messaging, routing patterns | Flexible routing, familiar AMQP | Operational overhead; scaling limits for very high throughput | When you need AMQP/routing semantics and accept operational tradeoffs |

15. Real-World Example

Enterprise example: Streaming analytics for a retail platform

  • Problem: A large retailer needs near-real-time analytics of purchases and browsing behavior, plus multiple downstream consumers (fraud, recommendations, BI dashboards). They want reliability, replay within a limited window, and strong IAM separation between teams.
  • Proposed architecture:
  • Microservices publish events to domain topics (orders-events, clickstream-events)
  • Subscription filtering routes subsets (e.g., only purchases) to a Dataflow pipeline
  • Dataflow validates schemas, enriches events, writes to BigQuery
  • Separate subscriptions feed fraud detection services and operational monitoring
  • Dead-letter topics capture poison messages; remediation pipeline stores failed payloads to Cloud Storage with incident tickets
  • Why Pub/Sub was chosen:
  • Managed ingestion and fan-out at scale
  • Tight Google Cloud integration (Dataflow + BigQuery)
  • Independent subscriptions per team with IAM boundaries
  • DLQs and retention for operational safety
  • Expected outcomes:
  • Reduced coupling and faster onboarding of new consumers
  • Near-real-time dashboards in BigQuery
  • Improved reliability under traffic spikes
  • Clear operational model for failures (DLQ + alerts)

Startup/small-team example: Webhook processing for a SaaS product

  • Problem: A small SaaS team receives webhooks from payment and CRM providers. Webhooks spike during billing cycles. They need to process events reliably without timing out the webhook sender and without running message brokers.
  • Proposed architecture:
  • Cloud Run service receives webhook, verifies signature, publishes normalized event to Pub/Sub
  • Cloud Run worker(s) subscribe (push or pull) and process events asynchronously
  • DLQ captures events that fail repeatedly; team reviews and replays
  • Why Pub/Sub was chosen:
  • Minimal ops effort
  • Scales automatically during spikes
  • Clean separation between webhook intake and background processing
  • Expected outcomes:
  • Fewer webhook failures/timeouts
  • Better reliability during peak spikes
  • Simple path to add analytics subscription later (fan-out)

16. FAQ

1) Is Pub/Sub the same as Kafka?
No. Pub/Sub is a managed messaging service with topics and independent subscriptions; Kafka is a distributed log with partitions and consumer groups. They solve similar streaming problems but have different operational models and semantics.

2) Does Pub/Sub guarantee exactly-once processing?
Not automatically. Pub/Sub can provide exactly-once delivery in supported scenarios, but end-to-end exactly-once processing still requires idempotent design, transactional sinks, or de-duplication strategies. Verify exactly-once delivery details: https://cloud.google.com/pubsub/docs/exactly-once-delivery

3) What’s the difference between a topic and a subscription?
A topic is where messages are published. A subscription is a delivery configuration that receives messages from a topic and tracks ack state independently.

4) Can multiple subscribers read from the same subscription?
Yes. Multiple subscriber instances can share a subscription to scale out processing (competing consumers). Each message is delivered to one of the subscribers for that subscription.

5) How do I broadcast the same message to multiple systems?
Create multiple subscriptions on the same topic (fan-out). Each subscription receives a copy of the message stream.

6) What happens if my subscriber crashes?
Unacked messages become eligible for redelivery after the ack deadline. Your application should be idempotent and able to process duplicates.

7) How do push subscriptions work?
Pub/Sub sends HTTPS POST requests to your endpoint. Your service must return appropriate success responses and handle retries. Secure the endpoint using authentication (OIDC token) per docs: https://cloud.google.com/pubsub/docs/push

8) Should I use push or pull?
Use pull for worker pools, fine-grained flow control, and many data processing pipelines. Use push for HTTP-based event handling (e.g., Cloud Run) where you want simpler delivery and don’t want polling logic.

9) How do I handle poison messages?
Use dead-letter topics and alert on DLQ growth. Also store failing payloads and add tools/workflows to replay after fixing bugs.

10) Can I filter messages so a subscription only receives some events?
Yes, with subscription filtering based on message attributes. Ensure producers publish consistent attributes.

11) Where should I put large payloads?
Prefer storing large data in Cloud Storage (or another datastore) and publishing a reference (object path, ID) to Pub/Sub. This reduces messaging costs and avoids size limits.
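This pattern is often called the "claim check": the message carries only a reference that consumers use to fetch the payload. A minimal sketch of an envelope builder is below; the bucket/object names are illustrative, and the envelope shape is a convention you define, not a Pub/Sub API:

```python
# "Claim check" sketch: keep the large payload in object storage and
# publish only a small reference envelope. The gs:// path is illustrative
# and the envelope fields are a team convention, not a Pub/Sub API.
import hashlib
import json

def make_reference_message(bucket: str, object_path: str, payload: bytes) -> bytes:
    """Build a small Pub/Sub message body pointing at the stored payload."""
    envelope = {
        "gcsUri": f"gs://{bucket}/{object_path}",
        "sizeBytes": len(payload),
        # checksum lets consumers detect an overwritten/mismatched object
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    return json.dumps(envelope).encode("utf-8")

big_payload = b"x" * 5_000_000  # ~5 MB report that should not ride in a message
msg = make_reference_message("example-bucket", "reports/2024/r1.bin", big_payload)
print(len(msg) < 1024)  # True: the published message stays tiny
```

Consumers then download the object, verify the checksum, and process it, keeping messaging costs flat regardless of payload size.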

12) How do I replay messages?
Use retention plus seek-to-timestamp or snapshots to reset subscription state. Replay is bounded by retention. Verify replay/seek details in the docs: https://cloud.google.com/pubsub/docs/replay-overview

13) How do I monitor subscriber lag?
Track backlog metrics (undelivered/unacked messages) and oldest unacked message age in Cloud Monitoring. Create alerts for sustained growth.

14) Is Pub/Sub regional or global?
Pub/Sub resources are managed by Google Cloud and are not tied to a specific VM zone. Data residency and location controls exist (message storage policy). Verify current location semantics: https://cloud.google.com/pubsub/docs/locations

15) What’s a good retention period?
Long enough to recover from expected outages and deploy rollbacks (often hours to a couple days), but not so long that cost and operational replay risk become high. Choose based on your incident recovery objectives.

16) Can I use Pub/Sub for request/response RPC?
It’s possible but usually not ideal. Pub/Sub is optimized for asynchronous messaging. For synchronous request/response, use HTTP/gRPC directly.

17) How do I secure access between teams?
Use separate topics/subscriptions by domain or environment and grant IAM roles only to required service accounts. Consider separate projects for stronger isolation.


17. Top Online Resources to Learn Pub/Sub

| Resource Type | Name | Why It Is Useful |
| --- | --- | --- |
| Official documentation | Pub/Sub Overview — https://cloud.google.com/pubsub/docs/overview | Authoritative description of concepts, features, and core behavior |
| Official quickstart | Pub/Sub client library quickstarts — https://cloud.google.com/pubsub/docs/quickstarts | Step-by-step getting started for multiple languages |
| Official pricing | Pub/Sub pricing — https://cloud.google.com/pubsub/pricing | Current pricing dimensions and rates (don’t rely on third-party summaries) |
| Pricing calculator | Google Cloud Pricing Calculator — https://cloud.google.com/products/calculator | Estimate costs for different volumes, regions, and architectures |
| Architecture guidance | Cloud Architecture Center (search event-driven / streaming) — https://cloud.google.com/architecture | Reference architectures and best practices for event-driven systems |
| Dead-letter topics | DLQ docs — https://cloud.google.com/pubsub/docs/dead-letter-topics | Required IAM, configuration details, and operational guidance |
| Ordering | Message ordering — https://cloud.google.com/pubsub/docs/ordering | Correct setup and constraints for ordering keys |
| Exactly-once delivery | Exactly-once delivery — https://cloud.google.com/pubsub/docs/exactly-once-delivery | How it works, limitations, and supported clients |
| Subscriber guidance | Subscriber overview — https://cloud.google.com/pubsub/docs/subscriber | Ack deadlines, retries, flow control, and client behavior |
| GitHub samples (official) | GoogleCloudPlatform Pub/Sub samples — https://github.com/GoogleCloudPlatform | Practical code examples (search within the org for Pub/Sub client samples) |
| Video (official) | Google Cloud Tech YouTube — https://www.youtube.com/@googlecloudtech | Product explainers and architecture talks (search for “Pub/Sub”) |
| Hands-on labs | Google Cloud Skills Boost (search Pub/Sub labs) — https://www.cloudskillsboost.google/ | Guided labs with temporary projects and step-by-step instructions |

18. Training and Certification Providers

Below are training providers. Verify course outlines, delivery modes, and schedules on their websites.

1) DevOpsSchool.com
Suitable audience: DevOps engineers, SREs, cloud engineers, developers
Likely learning focus: DevOps/cloud automation; may include Google Cloud messaging and pipelines in broader tracks
Mode: check website
Website: https://www.devopsschool.com/

2) ScmGalaxy.com
Suitable audience: beginners to intermediate IT professionals
Likely learning focus: software configuration management and DevOps fundamentals; may offer cloud-related modules
Mode: check website
Website: https://www.scmgalaxy.com/

3) CloudOpsNow.in
Suitable audience: cloud operations and platform teams
Likely learning focus: cloud ops practices, reliability, monitoring, and operational runbooks
Mode: check website
Website: https://www.cloudopsnow.in/

4) SreSchool.com
Suitable audience: SREs, production engineers, operations teams
Likely learning focus: SRE practices, observability, incident response, reliability architecture
Mode: check website
Website: https://www.sreschool.com/

5) AiOpsSchool.com
Suitable audience: operations teams adopting AIOps practices
Likely learning focus: AIOps concepts, monitoring automation, incident analytics
Mode: check website
Website: https://www.aiopsschool.com/


19. Top Trainers

These are trainer-focused sites/platforms. Verify specific Pub/Sub or Google Cloud coverage on each site.

1) RajeshKumar.xyz
Likely specialization: DevOps/cloud training content (verify on site)
Suitable audience: engineers seeking practical training
Website: https://rajeshkumar.xyz/

2) devopstrainer.in
Likely specialization: DevOps and cloud training
Suitable audience: beginners to intermediate DevOps/cloud learners
Website: https://www.devopstrainer.in/

3) devopsfreelancer.com
Likely specialization: DevOps consulting/training resources (verify offerings)
Suitable audience: teams/individuals looking for hands-on guidance
Website: https://www.devopsfreelancer.com/

4) devopssupport.in
Likely specialization: DevOps support and training resources
Suitable audience: operations teams and engineers needing implementation support
Website: https://www.devopssupport.in/


20. Top Consulting Companies

Descriptions below are kept generic; verify exact offerings directly with each firm.

1) cotocus.com
Likely service area: cloud/DevOps consulting (verify service catalog)
Where they may help: architecture reviews, implementation support, operationalization
Consulting use case examples: – Designing an event-driven architecture with Pub/Sub topics/subscriptions and IAM boundaries
– Implementing DLQ strategy and monitoring/alerting dashboards
Website: https://cotocus.com/

2) DevOpsSchool.com
Likely service area: DevOps and cloud consulting/training
Where they may help: platform enablement, CI/CD, cloud best practices, skills development
Consulting use case examples: – Building a reference Pub/Sub-based ingestion layer for data analytics and pipelines
– Creating Terraform modules for topics/subscriptions with standardized naming/labels
Website: https://www.devopsschool.com/

3) DEVOPSCONSULTING.IN
Likely service area: DevOps and cloud consulting
Where they may help: implementation and operational support for cloud platforms
Consulting use case examples: – Migrating event workloads to Google Cloud Pub/Sub with monitoring and cost controls
– Designing subscriber scaling and reliability patterns (ack handling, retries, DLQ)
Website: https://www.devopsconsulting.in/


21. Career and Learning Roadmap

What to learn before Pub/Sub

  • Google Cloud fundamentals: projects, IAM, billing, APIs
  • Basic networking and identity concepts: service accounts, OAuth, least privilege
  • Event-driven architecture basics: async messaging, retries, idempotency
  • Data analytics and pipelines fundamentals: streaming vs batch, ETL/ELT, schema evolution

What to learn after Pub/Sub

  • Dataflow (Apache Beam) for streaming transformations and windowing
  • BigQuery for analytics, partitioning, and streaming ingestion patterns
  • Cloud Run for event-driven compute
  • Cloud Monitoring/Logging for SLOs and alerting on pipeline health
  • Terraform for IaC management of topics/subscriptions and IAM
  • Security: VPC Service Controls, org policies, audit logging strategies

Job roles that use Pub/Sub

  • Cloud engineer / platform engineer
  • Data engineer / analytics engineer
  • Backend engineer (microservices)
  • DevOps engineer / SRE
  • Security engineer (event-driven detections, audit pipelines)
  • Solutions architect

Certification path (Google Cloud)

Google Cloud certifications change over time. Pub/Sub knowledge is relevant for:
– Associate Cloud Engineer (foundational services and operations)
– Professional Cloud Architect (architecture decisions and tradeoffs)
– Professional Data Engineer (streaming pipelines with Pub/Sub/Dataflow/BigQuery)

Verify current Google Cloud certification tracks: https://cloud.google.com/learn/certification

Project ideas for practice

  • Build a clickstream ingestion pipeline: Pub/Sub → Dataflow → BigQuery with dashboards.
  • Implement an event-driven order workflow with 3 services consuming different subscriptions.
  • Add schema validation and versioning to an event domain.
  • Implement DLQ remediation tooling: DLQ → Cloud Run job to reprocess after fix.
  • Add Monitoring alerts for backlog growth and DLQ spikes; define SLOs.

22. Glossary

  • Ack (Acknowledgment): A confirmation from subscriber to Pub/Sub that a message was processed successfully.
  • Ack deadline: The time allowed for a subscriber to ack a message before it becomes eligible for redelivery.
  • At-least-once delivery: Delivery guarantee where messages may be delivered multiple times; consumers must handle duplicates.
  • Dead-letter topic (DLQ): A topic where messages are sent after repeated delivery failures.
  • Fan-out: Pattern where one published message is delivered to multiple independent consumers via multiple subscriptions.
  • Filtering: Subscription feature to only receive messages matching attribute-based expressions.
  • Message attributes: Key/value metadata attached to a message, often used for routing/filtering.
  • Ordering key: A key used to preserve the order of messages within that key’s stream (when ordering is enabled).
  • Publisher: Client that sends messages to a Pub/Sub topic.
  • Pull subscription: Subscriber pulls messages from Pub/Sub and controls flow/ack.
  • Push subscription: Pub/Sub pushes messages to an HTTPS endpoint.
  • Retention: How long Pub/Sub stores messages for delivery/replay.
  • Schema: A formal definition of message structure (e.g., Avro/Protobuf) used for validation and compatibility.
  • Seek: Reset a subscription to a timestamp or snapshot to replay messages.
  • Snapshot: A saved point-in-time cursor/state of a subscription for replay.

23. Summary

Pub/Sub is Google Cloud’s managed publish/subscribe messaging service and a foundational component for data analytics pipelines and event-driven architectures. It provides durable ingestion, scalable fan-out, and operational features like retries, dead-letter topics, retention-based replay, filtering, and integrations with services such as Dataflow and Cloud Run.

Cost and security require intentional design:
– Cost is driven by data volume, fan-out (deliveries per subscription), retention, retries, and network egress. Use filtering, control payload size, and monitor retry/DLQ rates.
– Security relies on IAM least privilege, service accounts, encryption controls (including CMEK where applicable), and audit logging.

Use Pub/Sub when you need decoupled, scalable event distribution and ingestion on Google Cloud. Pair it with Dataflow/BigQuery for streaming analytics, or with Cloud Run for event-driven services. Next step: build a small streaming pipeline (Pub/Sub → Dataflow → BigQuery) and add production-grade monitoring and DLQ remediation playbooks using the official docs: https://cloud.google.com/pubsub/docs/overview