Azure Event Hubs Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Analytics

Category

Analytics

1. Introduction

Azure Event Hubs is a fully managed, high-throughput event ingestion service designed to collect, buffer, and stream large volumes of events and telemetry into Azure—often as the front door for real-time Analytics pipelines.

In simple terms: producers (apps, devices, services) send streams of events to Event Hubs, and multiple consumers (analytics jobs, stream processors, data platforms) read those events independently at their own pace.

Technically, Event Hubs is a distributed, partitioned streaming platform that supports high ingest rates, parallel consumption via partitions, offset-based reads, configurable retention, and integrations with Azure analytics services. It supports AMQP-based clients and offers a Kafka-compatible endpoint for many Kafka workloads.

The core problem it solves is reliable, scalable event ingestion: decoupling event producers from downstream processing systems so that analytics and data platforms can ingest large streams without being directly coupled to device fleets, microservices, or logs.

Service status / naming: Azure Event Hubs is an active Azure service and remains the current official name. It is part of Azure’s broader messaging and data ingestion ecosystem. (Verify any recent tier/feature changes in official docs and release notes before production rollout.)


2. What is Event Hubs?

Official purpose

Azure Event Hubs is intended for big data streaming ingestion—collecting millions of events per second (scale depends on tier, configuration, and region) from websites, apps, IoT devices, infrastructure, and SaaS systems, and making them available to downstream consumers for processing and Analytics.

Official docs: https://learn.microsoft.com/azure/event-hubs/

Core capabilities (what it does)

  • High-throughput ingestion of event streams.
  • Partitioned event storage to enable parallel consumption.
  • Offset-based consumption with consumer groups.
  • Configurable retention (within tier limits).
  • Event Hubs Capture to automatically land data in Azure Storage / Azure Data Lake Storage for batch Analytics.
  • Security via Azure AD (Entra ID) RBAC and SAS.
  • Networking via public endpoints, firewall rules, and Private Endpoints (Private Link).
  • Kafka-compatible endpoint (Event Hubs for Apache Kafka) for many Kafka client applications.

Major components

  • Event Hubs namespace: A container for Event Hubs and related entities (authorization rules, network settings). Deployed into an Azure subscription and resource group.
  • Event hub: The actual stream (topic-like) entity you send events to and read from.
  • Partitions: Ordered, append-only logs. Events are written to partitions; consumers scale by reading partitions in parallel.
  • Consumer groups: Separate “views” of the stream. Each consumer group has its own offsets/checkpoints.
  • Producer: An app/service sending events to an event hub.
  • Consumer: An app/service reading events, typically using checkpoints stored externally (often Azure Blob Storage when using the Event Processor client model).
  • Throughput/Capacity model: The way you scale ingestion and egress, depending on tier (details in Pricing section).

Service type

  • Fully managed PaaS (Platform as a Service) for streaming ingestion and event distribution.
  • Not a general-purpose database; it’s a streaming ingestion buffer with retention and replay.

Scope and availability model

  • Regional service: an Event Hubs namespace is created in a specific Azure region.
  • Resource model: deployed in an Azure subscription and resource group; accessed via namespace FQDN endpoints.
  • High availability and resiliency depend on tier and configuration (for example, zone redundancy may be available for certain tiers/regions—verify in official docs for your tier and region).

How it fits into the Azure ecosystem (Analytics focus)

Event Hubs commonly sits at the ingestion layer in Azure analytics architectures:

  • Stream processing: Azure Stream Analytics, Azure Databricks, Apache Spark, Azure Functions
  • Storage for Analytics: Azure Data Lake Storage Gen2, Azure Blob Storage (via Capture), Azure Synapse
  • Real-time analytics: Azure Data Explorer (Kusto) can ingest from Event Hubs
  • Observability: route logs/telemetry to Event Hubs and then to SIEM/analytics systems


3. Why use Event Hubs?

Business reasons

  • Faster time-to-insight: ingest event data continuously for real-time dashboards and anomaly detection.
  • Decouple producers and consumers: teams can add new analytics consumers without changing producers.
  • Managed scale: avoid managing Kafka clusters for many common ingestion scenarios.

Technical reasons

  • High ingest with partitioned parallelism.
  • Multiple consumer groups allow multiple independent processing apps (e.g., fraud detection + BI + archiving) to read the same event stream.
  • Replay capability within retention: consumers can reprocess data for backfills, bug fixes, and model retraining.
  • Protocol flexibility: AMQP clients; Kafka endpoint for many Kafka clients.

Operational reasons

  • Managed service: reduces operational overhead vs self-managed brokers.
  • Integrates with Azure Monitor metrics and logs for monitoring and alerting.
  • Strong identity, networking, and governance integration with Azure.

Security/compliance reasons

  • Azure AD (Entra ID) RBAC for fine-grained access management.
  • Private Link and firewall rules help limit public exposure.
  • Encryption at rest is handled by Azure; customer-managed keys may be available in certain tiers (verify in official docs for your tier).

Scalability/performance reasons

Scale is primarily achieved by:

  • Increasing partitions for parallelism
  • Increasing capacity units (tier-dependent) for ingress/egress throughput
  • Using efficient batching and appropriate partition keys

When teams should choose Event Hubs

Choose Event Hubs when you need:

  • High-volume event ingestion and streaming
  • Multiple independent consumers
  • Buffering with replay (retention-based)
  • Tight Azure integration for Analytics pipelines
  • Kafka client compatibility without running Kafka clusters (validate Kafka feature parity for your use case)

When teams should not choose Event Hubs

Avoid (or reconsider) Event Hubs when:

  • You need message-level acknowledgements, dead-letter queues, FIFO with strict per-message semantics, scheduled delivery, or complex routing rules: consider Azure Service Bus.
  • You need device provisioning, device identity, and IoT-specific features: consider Azure IoT Hub.
  • You need push-based event routing from Azure services with filtering: consider Azure Event Grid.
  • You need long-term storage as the system of record: use ADLS/Blob/a database; Event Hubs retention is limited and cost-optimized for streaming, not indefinite storage.


4. Where is Event Hubs used?

Industries

  • IoT and manufacturing (sensor telemetry ingestion)
  • Finance (market data, risk signals, fraud analytics)
  • Retail/e-commerce (clickstream analytics, personalization)
  • Media/gaming (player events, session analytics)
  • Transportation/logistics (vehicle telemetry, route optimization)
  • Security/IT operations (log/metric streaming, SIEM pipelines)

Team types

  • Data engineering and Analytics teams
  • Platform engineering and SRE teams
  • Application developers building event-driven systems
  • Security engineering teams streaming audit logs

Workloads

  • Telemetry ingestion for dashboards and alerting
  • Stream processing and anomaly detection
  • Data ingestion for lakehouse architectures
  • Log streaming from infrastructure and apps

Architectures

  • Event ingestion layer for a lakehouse (Event Hubs → ADLS/Delta)
  • Real-time analytics (Event Hubs → Stream Analytics / Databricks → ADX/Synapse)
  • Event-driven microservices (Event Hubs as the event backbone for high-volume event streams)
  • Hybrid ingestion (on-prem systems → Event Hubs via VPN/ExpressRoute + Private Link)

Production vs dev/test usage

  • Production: careful capacity planning, RBAC, Private Link, diagnostic settings, alerting, and consumer scaling patterns.
  • Dev/test: smaller tiers/capacity, fewer partitions, shorter retention, and simplified security (but still avoid embedding secrets in code).

5. Top Use Cases and Scenarios

Below are realistic Event Hubs scenarios with the “why” and a concrete example.

1) IoT telemetry ingestion for real-time Analytics

  • Problem: Millions of devices produce telemetry; downstream analytics must scale independently.
  • Why Event Hubs fits: High-throughput ingestion + partitioned consumption + multiple consumer groups.
  • Example: Smart meters send readings to Event Hubs; Stream Analytics computes rolling averages; Capture archives raw data to ADLS.

2) Clickstream ingestion for product Analytics

  • Problem: Web/app click events spike unpredictably; analytics and data science need consistent ingestion.
  • Why it fits: Buffers spikes; supports replay; integrates with Databricks/Spark.
  • Example: Mobile app sends JSON events; Databricks Structured Streaming enriches and writes to Delta Lake.

3) Centralized application log streaming

  • Problem: Microservices logs are scattered; security and SRE teams need centralized processing.
  • Why it fits: High-volume ingestion and multiple consumers (SIEM, troubleshooting, cost analytics).
  • Example: Apps push structured logs to Event Hubs; one consumer writes to ADX for query; another forwards to SIEM.

4) Security audit pipeline for near-real-time detection

  • Problem: Need fast detection of suspicious patterns across systems.
  • Why it fits: Low-latency streaming to analytics engines; consumer isolation by groups.
  • Example: Identity/audit events sent to Event Hubs; Azure Functions triggers enrichment; ADX runs detection queries.

5) Streaming ingestion into Azure Data Explorer

  • Problem: Need fast, interactive queries over incoming telemetry.
  • Why it fits: Event Hubs is a common ingestion source for ADX.
  • Example: Devices stream metrics to Event Hubs; ADX ingests and provides dashboards and ad-hoc queries.

6) Kafka client migration without running Kafka

  • Problem: Applications are built on Kafka APIs; ops team doesn’t want to manage Kafka clusters.
  • Why it fits: Kafka-compatible endpoint supports many Kafka producers/consumers (validate required features).
  • Example: Java services using Kafka producer API publish to Event Hubs; a Spark job reads via Kafka connector.

7) Order/event stream fan-out to multiple processing apps

  • Problem: Many downstream services need the same event stream for different purposes.
  • Why it fits: Consumer groups let each service maintain independent offsets.
  • Example: “Orders” event hub is consumed by fraud scoring, customer notifications, and data lake archiving.

8) Telematics ingestion for fleet Analytics

  • Problem: Vehicles produce location and diagnostics continuously; must handle intermittent connectivity.
  • Why it fits: Event Hubs handles ingestion bursts; downstream processing scales.
  • Example: Telematics gateway batches events to Event Hubs; consumer enriches with geofencing.

9) Real-time feature generation for ML

  • Problem: ML models require streaming features and near-real-time aggregation.
  • Why it fits: Stream processors can consume from Event Hubs and produce features to online stores.
  • Example: Event Hubs → Databricks streaming → feature store / Redis; model inference uses updated features.

10) Change data capture (CDC) event distribution

  • Problem: Downstream services need database change events without tight coupling.
  • Why it fits: Event Hubs can be a distribution layer for CDC outputs from connectors.
  • Example: Debezium outputs changes; events go to Event Hubs; consumers update caches and analytics stores.

11) Streaming integration hub for SaaS events

  • Problem: Need to unify event streams from multiple SaaS tools.
  • Why it fits: Central ingestion with routing by event hub/entity and consumer groups.
  • Example: Webhooks land in an API; API forwards normalized events to Event Hubs for processing.

12) Batch + streaming dual path using Capture

  • Problem: Need both real-time processing and long-term raw storage.
  • Why it fits: Capture provides automatic landing in storage while consumers process in real time.
  • Example: Real-time anomaly detection via Functions; raw events archived in ADLS via Capture.

6. Core Features

Namespaces and event hubs (entities)

  • What it does: Organizes event streaming resources; namespaces contain event hubs and policies.
  • Why it matters: Enables consistent network/security configuration and governance per domain.
  • Practical benefit: Separate namespaces per environment (dev/test/prod) or business unit.
  • Caveats: Namespace-level settings (network, auth) affect all event hubs within it.

Partitions (parallelism + ordering within partition)

  • What it does: Events are appended to partitions; ordering is guaranteed within a partition.
  • Why it matters: Allows horizontal scaling of both ingestion and consumption.
  • Practical benefit: Consumers can process partitions in parallel; producers can route related events using a partition key.
  • Caveats: No global ordering across partitions; changing partition count after creation may be limited or tier-dependent (verify in official docs).
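To make the partition-key behavior concrete, here is a simplified, stdlib-only sketch of key-to-partition routing. The Event Hubs service uses its own internal hash, so the mapping below will not match real partition assignments; it only illustrates the invariant that matters: the same key always lands on the same partition, which preserves per-key ordering while different keys spread across partitions for parallelism.

```python
import hashlib

def pick_partition(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition index (illustrative only).

    Event Hubs uses its own internal hash, so real assignments differ.
    The important property is stability: identical keys always map to
    the same partition, preserving per-key ordering.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

# Events for one device always route to one partition (ordered per device),
# while the device fleet as a whole spreads across partitions.
assignments = {f"device-{i:03d}": pick_partition(f"device-{i:03d}", 4)
               for i in range(6)}
```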

Consumer groups

  • What it does: Separate offset/checkpoint tracking per group.
  • Why it matters: Multiple apps can independently read the same stream without interfering.
  • Practical benefit: Add new analytics pipelines without impacting existing ones.
  • Caveats: Too many consumer groups increase management overhead; enforce naming conventions.

Retention and replay

  • What it does: Stores events for a configured retention window; consumers can replay by reading from earlier offsets.
  • Why it matters: Supports reprocessing and recovery from downstream outages.
  • Practical benefit: Fix a bug in stream processing and re-run the pipeline from a prior offset.
  • Caveats: Retention limits depend on tier and configuration. Verify retention maximums for your tier in official docs.
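Retention-based replay and independent consumer groups can be modeled in a few lines of stdlib Python: the "partition" is an append-only list, and each consumer group is simply its own offset into it. This is a conceptual sketch of the model, not the SDK API.

```python
class PartitionLog:
    """Toy append-only log: a stand-in for one Event Hubs partition."""

    def __init__(self):
        self.events = []  # retained events (retention expiry not modeled)

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1  # the event's offset

    def read_from(self, offset):
        """Replay: any consumer can re-read from an earlier offset."""
        return self.events[offset:]

log = PartitionLog()
for i in range(5):
    log.append({"seq": i})

# Two consumer groups track independent offsets over the same stream.
offsets = {"analytics": 0, "archiver": 3}
analytics_view = log.read_from(offsets["analytics"])  # all 5 events
archiver_view = log.read_from(offsets["archiver"])    # last 2 events
```

Rewinding an offset (for a backfill or bug fix) is just reading from an earlier position, which is exactly what replay within the retention window means.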

Event Hubs Capture

  • What it does: Automatically writes incoming events to Azure Storage / ADLS Gen2 in batches (time/size windows).
  • Why it matters: Creates a durable landing zone for batch Analytics and compliance archiving.
  • Practical benefit: Near “set-and-forget” raw data archival without building a consumer.
  • Caveats: Adds storage and transaction costs; file format/partitioning settings affect downstream processing efficiency.

Kafka-compatible endpoint (Event Hubs for Apache Kafka)

  • What it does: Allows Kafka clients to produce/consume using Kafka protocol semantics.
  • Why it matters: Reduces migration effort for Kafka-based apps.
  • Practical benefit: Keep Kafka client libraries while using Azure-managed service.
  • Caveats: Not all Kafka features are identical. Validate required semantics (e.g., transactions/exactly-once) against current Microsoft documentation.
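To illustrate the migration path, the helper below assembles the connection settings a Kafka client typically needs for the Event Hubs Kafka endpoint: port 9093, SASL_SSL with the PLAIN mechanism, the literal username "$ConnectionString", and the namespace connection string as the password. The function name and dict keys follow kafka-python conventions and are illustrative; verify the exact settings for your client library against current Microsoft documentation.

```python
def kafka_settings_for_event_hubs(namespace: str, connection_string: str) -> dict:
    """Build typical Kafka client settings for the Event Hubs Kafka endpoint.

    Pattern commonly shown for Event Hubs: SASL_SSL + PLAIN, with the
    literal username "$ConnectionString" and the full connection string
    as the password. Key names follow kafka-python; adapt as needed.
    """
    return {
        "bootstrap_servers": f"{namespace}.servicebus.windows.net:9093",
        "security_protocol": "SASL_SSL",
        "sasl_mechanism": "PLAIN",
        "sasl_plain_username": "$ConnectionString",
        "sasl_plain_password": connection_string,
    }

settings = kafka_settings_for_event_hubs(
    "ehns-demo", "Endpoint=sb://ehns-demo.servicebus.windows.net/;...")
# e.g. producer = kafka.KafkaProducer(**settings)  # requires kafka-python
```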

Authentication and authorization (Azure AD RBAC and SAS)

  • What it does: Supports Azure AD (Entra ID) role-based access and Shared Access Signatures (SAS) via policies.
  • Why it matters: Centralized, auditable identity is safer and easier to govern than shared keys.
  • Practical benefit: Use managed identities for Azure-hosted consumers/producers.
  • Caveats: Legacy SAS keys are powerful—treat like secrets; rotate and avoid embedding in code.

Networking controls (firewall, virtual network integration, Private Link)

  • What it does: Restricts access by IP rules and/or Private Endpoints.
  • Why it matters: Reduces data exfiltration risk and public exposure.
  • Practical benefit: Keep ingestion private over Private Link from VNets.
  • Caveats: Private Link introduces DNS considerations (private DNS zones) and may affect client connectivity if misconfigured.

Geo-disaster recovery (DR) / alias (where supported)

  • What it does: Provides a way to pair namespaces and use an alias for failover (capability and behavior depend on tier and current docs).
  • Why it matters: Improves business continuity for critical ingestion pipelines.
  • Practical benefit: Fail over producers/consumers by switching alias to secondary namespace.
  • Caveats: Understand what metadata vs data is replicated and what “failover” means for your scenario—verify in official docs.

Monitoring and diagnostics integration

  • What it does: Emits metrics to Azure Monitor; supports diagnostic settings to Log Analytics/Event Hubs/Storage (options depend on Azure Monitor capabilities).
  • Why it matters: Streaming systems must be monitored for backlog, throttling, failures, and consumer lag.
  • Practical benefit: Alert on throttled requests, server errors, or unexpected throughput patterns.
  • Caveats: Diagnostic logs can generate significant log ingestion costs.

7. Architecture and How It Works

High-level architecture

Event Hubs sits between producers and consumers:

  1. Producers send events to a specific event hub in a namespace.
  2. Events are distributed across partitions (based on partition key or round-robin).
  3. Consumers read from partitions using offsets. Each consumer group tracks its own position.
  4. Optionally, Capture writes events to Storage/ADLS for batch Analytics.
  5. Downstream systems (Stream Analytics, Databricks, ADX, Functions) process events and store results.

Request/data/control flow

  • Data plane: producers send events (batched for efficiency); consumers receive events (pull model) and checkpoint offsets externally (a common pattern).
  • Control plane: Azure Resource Manager (ARM) operations create namespaces/event hubs and configure networking, policies, Capture, and diagnostics.

Common integrations (Azure Analytics ecosystem)

  • Azure Stream Analytics: native connectors for Event Hubs input.
  • Azure Functions: Event Hubs trigger for serverless processing.
  • Azure Databricks / Apache Spark: read from Event Hubs for structured streaming (via supported connectors).
  • Azure Data Explorer (Kusto): ingest from Event Hubs for fast query.
  • Azure Synapse Analytics: integrate via Spark or pipelines (depends on design).

Dependency services (typical)

  • Azure Storage: for Event Processor checkpointing and/or Capture output.
  • Azure Monitor / Log Analytics: for metrics and diagnostic logs.
  • Azure Key Vault: to store connection strings if SAS is used (or store app secrets).

Security/authentication model

  • Azure AD (Entra ID): preferred for Azure-hosted workloads via managed identities. Assign roles (for example, “Azure Event Hubs Data Sender” / “Azure Event Hubs Data Receiver”; verify exact role names in Azure docs).
  • SAS (Shared Access Signatures): namespace- or event-hub-level authorization rules; keys/connection strings grant Send, Listen, or Manage rights depending on the policy.

Networking model

  • Default: public endpoint secured with authentication.
  • Hardened: firewall rules (allowed IP ranges), Private Endpoints (Private Link) for private access from VNets, and careful DNS configuration for private endpoint FQDN resolution.

Monitoring/logging/governance considerations

  • Track: ingress/egress throughput, server errors, throttling, and consumer lag (Event Hubs metrics help, but lag is usually measured in your consumer app).
  • Use: Azure Monitor metrics and alerts, plus diagnostic logs for audit and troubleshooting (mind the cost).
  • Governance: apply naming standards and tags, separate prod vs non-prod subscriptions/resource groups, lock down “Owner” rights, and use least privilege.
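Because consumer lag is measured in your own consumer, a common approach with the Python SDK is to receive with track_last_enqueued_event_properties=True and compare sequence numbers. The helper below is a minimal sketch of that arithmetic; the receive wiring is shown in comments and assumes the azure-eventhub client used later in this guide.

```python
def partition_lag(last_enqueued_sequence: int, current_sequence: int) -> int:
    """Approximate consumer lag, in events, for one partition.

    last_enqueued_sequence: sequence number of the newest event in the
    partition; current_sequence: sequence number of the event just
    processed. Clamped at zero for the fully-caught-up case.
    """
    return max(0, last_enqueued_sequence - current_sequence)

# Sketch of how this fits an azure-eventhub on_event callback (assumed wiring):
#   client.receive(on_event=on_event, track_last_enqueued_event_properties=True)
#   props = partition_context.last_enqueued_event_properties
#   lag = partition_lag(props["sequence_number"], event.sequence_number)
lag = partition_lag(120, 95)  # 25 events behind on this partition
```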

Simple architecture diagram (Mermaid)

flowchart LR
  P[Producers\nApps/Devices/Services] --> EH[(Azure Event Hubs)]
  EH --> C1[Consumer Group A\nStream Processing]
  EH --> C2[Consumer Group B\nMonitoring/SIEM]
  EH --> CAP["Capture (optional)"] --> ADLS[(Azure Storage / ADLS Gen2)]

Production-style architecture diagram (Mermaid)

flowchart TB
  subgraph OnPrem["On-prem / Edge"]
    D1[Devices / Agents]
    GW[Gateway / Collector]
    D1 --> GW
  end

  subgraph AzureNet["Azure Virtual Network"]
    PE["Private Endpoint\n(Event Hubs)"]
    FN["Azure Functions\n(Enrichment)"]
    DBX["Databricks / Spark\n(Streaming ETL)"]
    MON[Azure Monitor\nMetrics/Alerts]
  end

  subgraph EHNS["Azure Event Hubs Namespace (Regional)"]
    EH1[(Event hub: telemetry)]
    EH2[(Event hub: logs)]
  end

  subgraph DataPlatform["Analytics & Storage"]
    ADLS[(ADLS Gen2 / Blob)]
    ADX[(Azure Data Explorer)]
    SYN[(Synapse / Lakehouse)]
  end

  GW -->|AMQP or Kafka| PE --> EH1
  EH1 --> FN --> EH2
  EH1 --> DBX --> SYN
  EH1 --> ADX
  EH1 -->|Capture| ADLS

  EHNS --> MON

8. Prerequisites

Account/subscription/tenant requirements

  • An active Azure subscription with permissions to create resources.
  • A resource group in a supported region.

Permissions / IAM roles

For the lab:

  • Control plane: ability to create the resource group, Event Hubs namespace, and event hub (e.g., Contributor on the resource group).
  • Data plane (Azure AD): assign appropriate Event Hubs data roles to your identity (verify role names and scope).
  • Data plane (SAS): permission to list keys/connection strings for an authorization rule.

Billing requirements

  • Event Hubs is a paid service (tier-based). Ensure billing is enabled.

Tools needed

  • Azure CLI: https://learn.microsoft.com/cli/azure/install-azure-cli
  • Python 3.9+ (3.10+ recommended)
  • Python packages:
  • azure-eventhub
  • Optional:
  • pipx or virtualenv tooling

Region availability

  • Event Hubs is available in many Azure regions, but not every feature/tier is available everywhere.
  • Verify the region supports your required tier (Basic/Standard/Premium/Dedicated) and features (e.g., zones, private link) in official docs.

Quotas/limits (high level)

Common limits you must plan for (details are tier-dependent):

  • Maximum number of partitions per event hub
  • Maximum retention period
  • Throughput/capacity scaling limits per namespace
  • Number of consumer groups
  • Concurrent connections and request limits

Always validate current limits in official documentation: https://learn.microsoft.com/azure/event-hubs/event-hubs-quotas

Prerequisite services (optional but common)

  • Azure Storage account: for Event Processor checkpointing (recommended for many consumer patterns) and for Event Hubs Capture output.

9. Pricing / Cost

Event Hubs pricing is tiered and usage-based; exact costs vary by:

  • Region
  • Tier/SKU (Basic, Standard, Premium, Dedicated)
  • Capacity configuration (throughput units, processing units, or dedicated capacity)
  • Data volume and feature usage (Capture, networking, monitoring)

Official pricing page (always confirm latest):
https://azure.microsoft.com/pricing/details/event-hubs/

Azure Pricing Calculator:
https://azure.microsoft.com/pricing/calculator/

Pricing dimensions (what you pay for)

Pricing details change over time, but commonly include:

1) Tier / capacity units

  • Basic / Standard often scale with Throughput Units (TU) (capacity allocated per hour).
  • Premium often scales with Processing Units (PU) (dedicated resources within a shared cluster model).
  • Dedicated uses reserved capacity units for very high throughput and isolation.

Verify exact capacity model and billing meter names for your chosen tier on the official pricing page.

2) Ingress and/or operations

Depending on tier, you may see charges based on:

  • Number of events/operations ingested (often per million events)
  • Data throughput and request operations

Because these meters vary by SKU and can change, do not assume a specific combination—confirm for your tier in the pricing page.

3) Capture and Storage

If you enable Event Hubs Capture:

  • You pay for Azure Storage (data at rest) and storage transactions.
  • Capture may create many files/transactions depending on time/size window settings.

4) Networking

  • Public ingress to Event Hubs is included in the service cost, but Private Endpoints (Private Link) can add cost (Private Link billing) and operational complexity.
  • Data transfer: intra-region vs inter-region data movement can affect your overall bill, and egress from Azure to the internet or across regions may be charged (depends on traffic path and Azure bandwidth pricing).

5) Monitoring and logging

  • Azure Monitor metrics are generally available, but diagnostic logs exported to Log Analytics can incur ingestion and retention costs.
  • Exporting diagnostics to Storage or Event Hubs can incur storage and transaction costs.

Cost drivers (what makes bills go up)

  • Overprovisioned throughput/capacity units (running 24/7)
  • High ingress volume (events/sec, KB per event)
  • Many consumer apps causing higher egress and reads
  • Long retention settings (where the tier allows them), which can increase storage-related charges in some tiers
  • Capture output configuration producing many small files and high transaction counts
  • Extensive diagnostic logging to Log Analytics

Hidden or indirect costs

  • Downstream consumers: Stream Analytics, Databricks, Functions, ADX ingestion all have their own costs.
  • Checkpoint storage (Blob) for consumer offset tracking.
  • Private DNS and network operations overhead for Private Link.

How to optimize cost

  • Right-size the tier: use lower tiers for dev/test and non-critical streams; use Premium/Dedicated only when you need strict isolation, higher throughput, or advanced capabilities.
  • Avoid “always-on” high capacity for bursty workloads: consider autoscale options where supported (e.g., auto-inflate in some tiers; verify availability and behavior).
  • Batch producer sends to reduce protocol overhead and improve throughput efficiency.
  • Limit diagnostic logs to what you truly need; set retention policies in Log Analytics.
  • Design Capture settings (time/size windows) to avoid too many small files.
  • Separate high-volume noisy streams from critical streams (different event hubs/namespaces) for easier cost attribution.
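The Capture-window trade-off above is easy to quantify: with time-based windows, each partition emits roughly one file per window, so small windows multiply file and transaction counts. A rough stdlib estimate (assuming every window flushes a file, which makes this an upper bound):

```python
def capture_files_per_day(partition_count: int, window_seconds: int) -> int:
    """Upper-bound estimate of Capture output files per day.

    Assumes every time window flushes one file per partition. Capture
    also flushes on size and its handling of empty windows depends on
    configuration, so treat this as a rough upper bound.
    """
    windows_per_day = 86400 // window_seconds
    return partition_count * windows_per_day

# 4 partitions with 5-minute windows: up to 4 * 288 = 1152 files/day,
# versus 4 * 24 = 96 files/day with 1-hour windows.
files_small_window = capture_files_per_day(4, 300)
files_large_window = capture_files_per_day(4, 3600)
```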

Example low-cost starter estimate (conceptual)

A low-cost lab setup typically includes:

  • One namespace in the Basic or Standard tier (depending on requirements)
  • A small number of partitions
  • Public network access
  • No Capture and minimal diagnostics

Because pricing varies by region and SKU, use the Pricing Calculator and set:

  • Tier and capacity (e.g., 1 TU or equivalent for the minimum configuration in your region)
  • Expected ingress events/day
  • Optional features (Capture, Private Link, logging)
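To see how those calculator inputs combine, here is an illustrative monthly estimate for a Standard-style configuration. The rates, meter structure, and function below are placeholders for illustration only, not real Azure prices; plug in the current per-TU-hour and per-million-events rates for your region from the pricing page.

```python
def monthly_estimate(throughput_units: int, tu_hour_rate: float,
                     events_per_day: int, per_million_events_rate: float) -> float:
    """Combine two common capacity + ingress meters (illustrative only).

    tu_hour_rate and per_million_events_rate are PLACEHOLDERS: look up
    current values for your region and tier on the Azure pricing page.
    """
    hours_per_month = 730  # Azure's conventional billing month
    capacity_cost = throughput_units * tu_hour_rate * hours_per_month
    ingress_cost = (events_per_day * 30 / 1_000_000) * per_million_events_rate
    return round(capacity_cost + ingress_cost, 2)

# Example with made-up rates: 1 TU at $0.03/hour plus 10M events/day
# at $0.028 per million events.
cost = monthly_estimate(1, 0.03, 10_000_000, 0.028)
```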

Example production cost considerations (conceptual)

For production, budget for:

  • Capacity to cover peak ingest (plus headroom)
  • Multiple consumer groups and read traffic
  • Capture storage and transactions (if enabled)
  • Monitoring/logging
  • Network hardening (Private Link)
  • DR strategy (paired namespaces, multi-region consumers; verify supported patterns)


10. Step-by-Step Hands-On Tutorial

Objective

Create an Azure Event Hubs namespace and an event hub, then:

  1. Send events using Python.
  2. Receive events using Python.
  3. Validate in the Azure portal/metrics.
  4. Clean up resources to avoid ongoing cost.

This lab uses SAS connection strings for simplicity. For production, prefer Azure AD RBAC + managed identities where possible.

Lab Overview

You will build this:

  • A producer script that sends a small batch of JSON events to Event Hubs.
  • A consumer script that reads events from the beginning of the partition(s) for a short time.

You will use:

  • Azure CLI to provision resources
  • The Python azure-eventhub SDK to send/receive events

Step 1: Sign in and select subscription

  1. Install Azure CLI if needed: https://learn.microsoft.com/cli/azure/install-azure-cli
  2. Sign in:
az login
  3. (Optional) Select the correct subscription:
az account set --subscription "<SUBSCRIPTION_ID_OR_NAME>"
az account show --output table

Expected outcome: Azure CLI shows your active subscription.


Step 2: Create a resource group

Choose a region that supports Event Hubs and is close to your consumers.

RG="rg-eh-lab"
LOCATION="eastus"   # change if needed

az group create \
  --name "$RG" \
  --location "$LOCATION"

Expected outcome: Resource group is created successfully.


Step 3: Create an Event Hubs namespace

Pick a globally unique namespace name.

EHNS="ehns$RANDOM$RANDOM"  # quick uniqueness; you can also choose a DNS-safe name
SKU="Standard"             # Standard is commonly used for labs; verify your subscription supports it

az eventhubs namespace create \
  --resource-group "$RG" \
  --name "$EHNS" \
  --location "$LOCATION" \
  --sku "$SKU"

Expected outcome: Namespace is created. This can take a minute.

Verify:

az eventhubs namespace show \
  --resource-group "$RG" \
  --name "$EHNS" \
  --query "{name:name, location:location, sku:sku.name, status:status}"

Step 4: Create an event hub and consumer group

Create an event hub with a small number of partitions (start small for labs).

EH="telemetry"
PARTITIONS=2
RETENTION_DAYS=1  # keep small for lab; allowed values depend on tier

az eventhubs eventhub create \
  --resource-group "$RG" \
  --namespace-name "$EHNS" \
  --name "$EH" \
  --partition-count $PARTITIONS \
  --message-retention $RETENTION_DAYS

Note: newer Azure CLI versions may rename this flag (e.g., to an hours-based retention parameter such as --retention-time-in-hours); run az eventhubs eventhub create --help if the command rejects it.

Create a dedicated consumer group for the lab consumer:

CG="lab-consumer"

az eventhubs eventhub consumer-group create \
  --resource-group "$RG" \
  --namespace-name "$EHNS" \
  --eventhub-name "$EH" \
  --name "$CG"

Expected outcome: Event hub and consumer group exist.

Verify:

az eventhubs eventhub show \
  --resource-group "$RG" \
  --namespace-name "$EHNS" \
  --name "$EH" \
  --query "{name:name, partitions:partitionCount, retention:messageRetentionInDays}"

Step 5: Get a connection string (SAS) for sending/receiving

Event Hubs namespaces typically have default authorization rules (often RootManageSharedAccessKey). For a lab, you can use it, but for production create least-privilege policies (Send/Listen).

List authorization rules:

az eventhubs namespace authorization-rule list \
  --resource-group "$RG" \
  --namespace-name "$EHNS" \
  --output table

Fetch the primary connection string for a rule (commonly RootManageSharedAccessKey):

RULE="RootManageSharedAccessKey"

CONN_STR=$(az eventhubs namespace authorization-rule keys list \
  --resource-group "$RG" \
  --namespace-name "$EHNS" \
  --name "$RULE" \
  --query primaryConnectionString \
  --output tsv)

echo "$CONN_STR"

Expected outcome: You have a connection string. Treat it like a secret.


Step 6: Create a Python virtual environment and install SDK

python3 -m venv .venv
source .venv/bin/activate   # Linux/macOS
# .\.venv\Scripts\activate  # Windows PowerShell

pip install --upgrade pip
pip install azure-eventhub

Expected outcome: azure-eventhub installs successfully.


Step 7: Write and run a producer script (send events)

Create producer.py:

import os
import json
import time
from azure.eventhub import EventHubProducerClient, EventData

CONNECTION_STR = os.environ["EVENTHUB_CONNECTION_STRING"]
EVENTHUB_NAME = os.environ["EVENTHUB_NAME"]

producer = EventHubProducerClient.from_connection_string(
    conn_str=CONNECTION_STR,
    eventhub_name=EVENTHUB_NAME,
)

def main():
    with producer:
        batch = producer.create_batch()
        for i in range(10):
            payload = {
                "deviceId": "device-001",
                "sequence": i,
                "temperatureC": 20 + i * 0.5,
                "ts": time.time(),
            }
            batch.add(EventData(json.dumps(payload)))
        producer.send_batch(batch)
    print("Sent 10 events.")

if __name__ == "__main__":
    main()

Set environment variables and run:

export EVENTHUB_CONNECTION_STRING="$CONN_STR"
export EVENTHUB_NAME="$EH"

python producer.py

Expected outcome: Sent 10 events.


Step 8: Write and run a consumer script (receive events)

Create consumer.py:

import os
import asyncio
from azure.eventhub.aio import EventHubConsumerClient

CONNECTION_STR = os.environ["EVENTHUB_CONNECTION_STRING"]
EVENTHUB_NAME = os.environ["EVENTHUB_NAME"]
CONSUMER_GROUP = os.environ["EVENTHUB_CONSUMER_GROUP"]

async def on_event(partition_context, event):
    # Print the event body
    body = event.body_as_str(encoding="UTF-8")
    print(f"Partition: {partition_context.partition_id}  Offset: {event.offset}  Body: {body}")

    # In real apps, checkpoint to durable storage to persist progress.
    # This simple lab does not checkpoint.

async def main():
    client = EventHubConsumerClient.from_connection_string(
        conn_str=CONNECTION_STR,
        consumer_group=CONSUMER_GROUP,
        eventhub_name=EVENTHUB_NAME,
    )

    async with client:
        # Read from the beginning of the retention window
        await client.receive(
            on_event=on_event,
            starting_position="-1",
            max_wait_time=10,  # seconds
        )

if __name__ == "__main__":
    asyncio.run(main())

Run it:

export EVENTHUB_CONNECTION_STRING="$CONN_STR"
export EVENTHUB_NAME="$EH"
export EVENTHUB_CONSUMER_GROUP="$CG"

python consumer.py

Expected outcome: You should see several lines printed with partition IDs, offsets, and JSON payloads.

Notes:
  • If you run the consumer before sending, it may wait up to max_wait_time and exit with no events printed.
  • If you run the producer again, then rerun the consumer, you should see the events again (because we start from -1 and are not checkpointing).


Step 9 (Optional): Validate in Azure Portal and metrics

  1. Go to the Azure portal.
  2. Navigate to your namespace: Event Hubs Namespace → Event Hubs → telemetry.
  3. Check Metrics for:
  • Incoming Messages / Incoming Requests
  • Successful Requests
  • Server Errors / Throttled Requests (should be near zero for this lab)

Expected outcome: Metrics reflect send/receive activity (metrics can take a short time to appear).


Validation

Use these quick checks:

  1. Confirm the event hub exists:

az eventhubs eventhub show \
  --resource-group "$RG" \
  --namespace-name "$EHNS" \
  --name "$EH" \
  --query "name" \
  --output tsv

  2. Confirm producer success:
  • Producer prints Sent 10 events.
  • Portal metrics show incoming activity (may lag).

  3. Confirm consumer output:
  • Consumer prints event bodies and offsets.


Troubleshooting

Error: Unauthorized / authentication failures

  • Cause: Wrong connection string, insufficient policy rights, or mixing up event-hub-level and namespace-level connection strings.
  • Fix:
  • Re-check EVENTHUB_CONNECTION_STRING.
  • Ensure the authorization rule has Send for producer and Listen for consumer (or Manage for lab use).
  • Prefer creating two separate rules (Send-only and Listen-only) for better practice.

Error: Name or service not known / DNS resolution issues

  • Cause: Network restrictions, DNS issues, or Private Endpoint DNS not set up.
  • Fix:
  • For a public lab, ensure the namespace allows public access and your client network can reach it.
  • If using Private Link, configure private DNS zone for privatelink.servicebus.windows.net (verify the exact zone in official docs) and ensure your VM/client resolves correctly.

Consumer receives nothing

  • Cause: No events in retention window, wrong consumer group, or consumer started before producer and timed out quickly.
  • Fix:
  • Run producer again.
  • Increase max_wait_time.
  • Ensure you used the correct consumer group name.

Throttling / ServerBusy / throughput exceeded

  • Cause: Capacity too small for your load.
  • Fix:
  • Reduce event rate for lab.
  • Increase capacity units (tier-dependent).
  • Batch sends and optimize payload size.
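Throttling is transient by design, so producers can also absorb ServerBusy responses with client-side exponential backoff on top of the SDK's built-in retries. A library-agnostic sketch (with_backoff is a hypothetical helper, not part of the SDK):

```python
import random
import time

def with_backoff(fn, max_attempts: int = 5, base_delay: float = 0.5, sleep=time.sleep):
    """Call fn(), retrying failures with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Delay doubles per attempt; random jitter avoids synchronized
            # retry storms from many producers at once.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            sleep(delay)
```

The same wrapper can surround producer.send_batch; keep the wrapped operation idempotent on the consumer side, since a retry can resend events that were actually accepted.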

Cleanup

To avoid ongoing charges, delete the resource group (this deletes the namespace and event hub):

az group delete --name "$RG" --yes --no-wait

Expected outcome: Resource group deletion starts; resources are removed shortly after.


11. Best Practices

Architecture best practices

  • Design for replay: downstream consumers should be able to reprocess from offsets; keep idempotent writes where possible.
  • Separate streams by domain: use different event hubs (or namespaces) for unrelated workloads to reduce blast radius.
  • Plan partitioning early: partitions are central to scaling. Choose partition count based on expected parallelism and future growth.
  • Use Capture for raw archive: if you need long-term storage, use Capture to ADLS/Blob and treat Event Hubs as ingestion + short-term replay.

IAM/security best practices

  • Prefer Azure AD RBAC + managed identities for Azure-hosted producers/consumers.
  • If using SAS:
  • Create least-privilege policies: Send-only and Listen-only
  • Rotate keys regularly
  • Store secrets in Azure Key Vault
  • Apply RBAC at the smallest practical scope (event hub vs namespace vs resource group), balanced with manageability.

Cost best practices

  • Right-size tier and capacity; avoid paying for unused throughput.
  • Use batching and efficient serialization (consider Avro/Protobuf where appropriate).
  • Keep diagnostic logs targeted; set Log Analytics retention intentionally.
  • Evaluate Capture file sizing to avoid small-file problems (cost + performance downstream).

Performance best practices

  • Batch sends to reduce overhead and increase throughput.
  • Use partition keys to preserve order for related event streams and distribute load.
  • Keep event size small and consistent; large payloads reduce throughput and can increase costs.
  • Scale consumers horizontally using partition-based processing patterns.

Reliability best practices

  • Implement retries with exponential backoff for transient errors.
  • Use multiple consumer instances with cooperative partition balancing (supported in many SDK patterns).
  • Use checkpointing to a durable store (commonly Azure Blob Storage) for consumer progress.

Operations best practices

  • Monitor:
  • incoming/outgoing requests
  • errors
  • throttling
  • consumer lag (app-level)
  • Use Azure Policy, tags, and locks for governance.
  • Separate environments (dev/test/prod) and restrict production changes.

Governance/tagging/naming best practices

  • Naming: include environment and region, e.g., ehns-prod-eus-telemetry
  • Tags: env, owner, costCenter, dataClassification, system
  • Maintain an inventory of namespaces, event hubs, consumer groups, and their owners.

12. Security Considerations

Identity and access model

  • Best: Azure AD (Entra ID) with RBAC roles for data plane access.
  • Alternative: SAS authorization rules with shared keys.
  • Principle of least privilege:
  • Separate send and receive permissions.
  • Avoid using “manage” keys in applications.

Official docs (security/auth):
https://learn.microsoft.com/azure/event-hubs/authorize-access-event-hubs

Encryption

  • Data is encrypted at rest by Azure (platform-managed keys by default).
  • Customer-managed keys (CMK) may be supported depending on tier and configuration—verify in official docs for your tier.

Network exposure

  • Prefer private access for production:
  • Private Endpoint (Private Link)
  • Disable public network access if your architecture allows
  • If public access is required:
  • Restrict with firewall rules
  • Use TLS and strong auth
  • Monitor access patterns

Secrets handling

  • Never commit connection strings to Git.
  • Store SAS connection strings in Key Vault or environment-secured secret stores.
  • Rotate secrets and implement a rollover strategy.

Audit/logging

  • Enable diagnostic settings strategically:
  • Audit management-plane actions via Azure Activity Log
  • Use diagnostic logs for operational troubleshooting when necessary
  • Consider forwarding relevant logs to your SIEM (cost-aware).

Compliance considerations

  • Data residency: the namespace region determines where event data is processed and stored during the retention window.
  • Data classification: treat event streams as sensitive—apply tags, access controls, and network protections accordingly.
  • For regulated workloads, validate compliance offerings and certifications in Azure compliance documentation and Event Hubs-specific guidance.

Common security mistakes

  • Using RootManageSharedAccessKey in production apps
  • Leaving public access open without firewall restrictions
  • Over-permissioned identities (Owner/Contributor) for data plane tasks
  • No key rotation or secret governance
  • Missing alerts for throttling/errors that indicate abuse or misconfiguration

Secure deployment recommendations

  • Use Azure AD + managed identities where possible.
  • Private Link for critical streams.
  • Separate namespaces for prod vs dev/test.
  • Centralize monitoring and enforce policies (Azure Policy) to prevent insecure configurations.

13. Limitations and Gotchas

Always confirm current service limits in official docs:
https://learn.microsoft.com/azure/event-hubs/event-hubs-quotas

Common limitations and gotchas include:

  • Ordering is per partition only: no global ordering.
  • Exactly-once processing is not guaranteed by default: design consumers for at-least-once and idempotency.
  • Partition count planning: partitions determine parallelism; changing partitions later may be limited or operationally complex (verify current behavior).
  • Retention is not indefinite: Event Hubs is not long-term storage.
  • Consumer lag visibility: you often need consumer-side metrics/checkpointing to measure lag accurately.
  • Capture creates many files if configured with small time/size windows (downstream small-file problem).
  • Kafka compatibility is not perfect parity: validate required Kafka features and client compatibility.
  • Private Link DNS: misconfigured private DNS is a common cause of outages.
  • Cross-region considerations: producers/consumers in different regions can add latency and bandwidth cost.
  • Quota errors: hitting throughput/capacity limits can manifest as throttling/server busy responses.

14. Comparison with Alternatives

How to choose the right tool

Event Hubs is primarily for high-throughput event streaming ingestion. Alternatives differ by semantics (queues vs streams), push vs pull, retention, and operational model.

  • Azure Event Hubs – best for high-throughput streaming ingestion for Analytics. Strengths: partitioned scale, replay within retention, multiple consumer groups, Azure Analytics integrations, Kafka endpoint. Weaknesses: not a queue, limited retention, ordering only per partition. Choose for telemetry, clickstreams, and log streams with multiple analytics consumers.
  • Azure Service Bus – best for enterprise messaging (queues/topics). Strengths: dead-lettering, sessions, FIFO patterns, message settlement, scheduled delivery. Weaknesses: not optimized for huge streaming throughput. Choose for business workflows, command messaging, and integration patterns requiring message semantics.
  • Azure Event Grid – best for push-based event routing for Azure resources and SaaS. Strengths: push delivery, filtering, lightweight integration events. Weaknesses: not for high-volume streaming telemetry; different semantics. Choose for reactive integrations and “something happened” events from Azure services.
  • Azure IoT Hub – best for IoT device connectivity and management. Strengths: device identity, device management, IoT protocols, per-device security. Weaknesses: IoT-specific; not a generic streaming bus. Choose for device fleets and IoT solutions needing device-level features.
  • Azure Data Explorer (direct ingestion) – best for fast analytics queries. Strengths: powerful query engine; near real-time analytics. Weaknesses: not a message bus; ingestion costs. Choose when interactive analytics is the primary goal and you already chose ADX.
  • Apache Kafka (self-managed) – best for full Kafka control and ecosystem. Strengths: full feature set, broad ecosystem. Weaknesses: ops overhead; scaling, patching, and availability burden. Choose when you require Kafka features not supported in Event Hubs or need on-prem control.
  • AWS Kinesis Data Streams – best for streaming ingestion on AWS. Strengths: managed streaming on AWS. Weaknesses: different ecosystem. Choose if you are primarily on AWS.
  • Google Cloud Pub/Sub – best for global messaging/streaming on GCP. Strengths: simple model, global service. Weaknesses: different ecosystem and semantics. Choose if you are primarily on GCP.

15. Real-World Example

Enterprise example: Manufacturing telemetry + real-time Analytics

  • Problem: A manufacturer collects telemetry from factories worldwide. They need real-time monitoring (seconds), long-term storage for model training, and independent consumer teams (quality, operations, R&D).
  • Proposed architecture:
  • Edge gateway batches telemetry and sends to Event Hubs (regional namespaces per geography).
  • Consumer group 1: Stream processing (Databricks or Stream Analytics) computes real-time KPIs and writes to Azure Data Explorer for dashboards.
  • Consumer group 2: An enrichment service adds reference data and writes curated streams to downstream systems.
  • Capture writes raw events to ADLS Gen2 for lakehouse storage and reprocessing.
  • Private connectivity using Private Link; identities via Azure AD managed identities.
  • Why Event Hubs was chosen:
  • High-volume ingestion, replay for backfills, and fan-out to multiple analytics teams.
  • Strong Azure ecosystem integration and managed operations.
  • Expected outcomes:
  • Reliable ingestion even during spikes.
  • Faster incident detection and operational insights.
  • Reduced operational overhead vs self-managed streaming infrastructure.

Startup/small-team example: SaaS product clickstream Analytics

  • Problem: A startup wants clickstream analytics and anomaly detection with minimal ops. They need to ingest events from their web app and run near-real-time dashboards.
  • Proposed architecture:
  • Web app posts click events to a small ingestion API.
  • API publishes to Event Hubs.
  • A small consumer (Functions or container app) aggregates metrics and stores them in a database for dashboards.
  • Optional Capture to Blob for raw archives once volume grows.
  • Why Event Hubs was chosen:
  • Managed ingestion and replay without managing Kafka.
  • Easy to add a second consumer later (e.g., experimentation platform).
  • Expected outcomes:
  • Quick implementation with predictable scaling path.
  • Low operational overhead and straightforward cost attribution.

16. FAQ

  1. Is Event Hubs a queue?
    Not exactly. Event Hubs is a streaming ingestion service with partitioned logs and retention-based replay. For queue semantics (dead-lettering, settlement, FIFO sessions), look at Azure Service Bus.

  2. What’s the difference between Event Hubs and Event Grid?
    Event Grid is a push-based event routing service for integration events. Event Hubs is for high-throughput telemetry/event streaming with retention and replay.

  3. How does Event Hubs scale?
    Scaling depends on tier: typically by increasing capacity (e.g., throughput/processing units) and designing partitioning for parallelism.

  4. Do events guarantee ordering?
    Ordering is guaranteed within a partition only. There is no global ordering across partitions.

  5. How long does Event Hubs store data?
    Data is stored for a configured retention period, subject to tier limits. Verify exact retention max for your tier in official docs.

  6. Can I replay events from yesterday?
    Yes, if your retention window includes that time. Consumers can read from earlier offsets/timestamps within retention.

  7. How do multiple applications read the same events?
    Use consumer groups. Each consumer group maintains independent offsets.

  8. What is checkpointing and why do I need it?
    Checkpointing stores progress (offset) so a consumer can resume after restarts without reprocessing. Many SDK patterns store checkpoints in Azure Blob Storage.

  9. Does Event Hubs support exactly-once processing?
    Generally, consumers should assume at-least-once delivery and design for idempotency. Kafka-specific semantics depend on compatibility features—verify your requirements against official docs.

  10. When should I use the Kafka endpoint?
    When you already have Kafka clients/connectors and want to reduce migration effort. Validate compatibility and required Kafka features.

  11. Is Capture a replacement for a consumer?
    Capture is for archival/landing data to storage. You still use consumers for real-time processing and business logic.

  12. Can I restrict Event Hubs to a VNet only?
    Yes, using Private Endpoints and disabling public network access (where supported). Plan DNS carefully.

  13. How do I secure producer/consumer apps in Azure?
    Prefer managed identity + Azure AD RBAC. Avoid storing connection strings where possible.

  14. What’s the best way to monitor consumer lag?
    Combine Event Hubs metrics with consumer-side checkpoint metrics (last processed offset/time) and alert on backlog growth.

  15. Can I use Event Hubs for commands to devices/services?
    It’s possible but often not ideal; commands usually require queue semantics and acknowledgements. Consider Service Bus or IoT Hub depending on scenario.

  16. How many partitions do I need?
    Enough to support current and future parallelism. Start with expected consumer parallelism and throughput needs, then validate with load testing. Partition changes later may be constrained—verify current capabilities.

  17. Does Event Hubs support schema management?
    Azure offers schema registry capabilities associated with Event Hubs in some contexts. Availability and tier support can change—verify in official docs before relying on it.


17. Top Online Resources to Learn Event Hubs

  • Official documentation: Event Hubs documentation – https://learn.microsoft.com/azure/event-hubs/ – canonical reference for concepts, security, SDKs, and operations.
  • Official quotas/limits: Event Hubs quotas – https://learn.microsoft.com/azure/event-hubs/event-hubs-quotas – critical for capacity planning and avoiding production surprises.
  • Official pricing: Event Hubs pricing – https://azure.microsoft.com/pricing/details/event-hubs/ – current tier and meter details; region-specific pricing.
  • Pricing calculator: Azure Pricing Calculator – https://azure.microsoft.com/pricing/calculator/ – build scenario-based estimates for capacity + usage.
  • Security guidance: Authorize access to Event Hubs – https://learn.microsoft.com/azure/event-hubs/authorize-access-event-hubs – Azure AD RBAC vs SAS, role guidance, best practices.
  • Kafka endpoint docs: Event Hubs for Apache Kafka – https://learn.microsoft.com/azure/event-hubs/azure-event-hubs-kafka-overview – understand Kafka compatibility and configuration.
  • Capture docs: Event Hubs Capture – https://learn.microsoft.com/azure/event-hubs/event-hubs-capture-overview – configure Capture and understand output and tradeoffs.
  • SDK docs (Python): Azure Event Hubs client library for Python – https://learn.microsoft.com/azure/developer/python/sdk/azure-sdk-library-usage?tabs=linux – SDK usage patterns and authentication approaches (verify Event Hubs-specific pages from here).
  • Architecture guidance: Azure Architecture Center – https://learn.microsoft.com/azure/architecture/ – reference architectures and best practices for analytics pipelines.
  • Samples: Azure SDK for Python GitHub – https://github.com/Azure/azure-sdk-for-python – official SDK source and examples (search for eventhub samples).

18. Training and Certification Providers

  • DevOpsSchool.com – for engineers, DevOps, SRE, and architects; likely focus: Azure, DevOps, cloud operations, pipelines, platform fundamentals (mode: check website) – https://www.devopsschool.com/
  • ScmGalaxy.com – for developers and DevOps practitioners; likely focus: SCM, DevOps practices, CI/CD foundations (mode: check website) – https://www.scmgalaxy.com/
  • CLoudOpsNow.in – for cloud engineers and operations teams; likely focus: cloud operations, monitoring, reliability practices (mode: check website) – https://www.cloudopsnow.in/
  • SreSchool.com – for SREs and platform teams; likely focus: SRE principles, reliability engineering, observability (mode: check website) – https://www.sreschool.com/
  • AiOpsSchool.com – for ops, SRE, and IT analysts; likely focus: AIOps concepts, automation, monitoring analytics (mode: check website) – https://www.aiopsschool.com/

19. Top Trainers

  • RajeshKumar.xyz – DevOps/cloud training content (verify offerings); for beginners to intermediate engineers – https://rajeshkumar.xyz/
  • devopstrainer.in – DevOps tooling and practices (verify offerings); for DevOps engineers and developers – https://www.devopstrainer.in/
  • devopsfreelancer.com – freelance DevOps support/training resources (verify offerings); for teams needing practical guidance – https://www.devopsfreelancer.com/
  • devopssupport.in – operational support and DevOps guidance (verify offerings); for ops/DevOps teams – https://www.devopssupport.in/

20. Top Consulting Companies

  • cotocus.com – cloud/DevOps consulting (verify exact portfolio); may help with architecture reviews, cloud migrations, and operational setup; example use cases: event ingestion architecture, CI/CD for stream processors, observability setup – https://cotocus.com/
  • DevOpsSchool.com – DevOps and cloud consulting/training; may help with platform engineering enablement and DevOps practices; example use cases: landing zone setup, IaC pipelines, governance for Event Hubs environments – https://www.devopsschool.com/
  • DEVOPSCONSULTING.IN – DevOps consulting (verify exact services); may help with DevOps adoption, automation, and reliability; example use cases: production readiness for event streaming, monitoring/alerting strategy – https://devopsconsulting.in/

21. Career and Learning Roadmap

What to learn before Event Hubs

  • Azure fundamentals:
  • Resource groups, regions, subscriptions
  • Identity basics (Azure AD/Entra ID), RBAC
  • Networking basics (VNets, Private Link concepts)
  • Streaming fundamentals:
  • Partitions, offsets, retention, consumer groups
  • At-least-once processing and idempotency
  • Basic Python/.NET/Java skills for implementing producers/consumers

What to learn after Event Hubs (to build Analytics systems)

  • Stream processing:
  • Azure Stream Analytics
  • Azure Functions (Event Hubs trigger)
  • Apache Spark streaming (Databricks)
  • Storage/lakehouse:
  • ADLS Gen2, Delta Lake concepts
  • Real-time Analytics:
  • Azure Data Explorer ingestion + KQL
  • Production engineering:
  • Observability (Azure Monitor), alerting, SLOs
  • IaC (Bicep/Terraform) and CI/CD

Job roles that use Event Hubs

  • Cloud engineer / platform engineer
  • Data engineer / Analytics engineer
  • SRE / operations engineer
  • Backend engineer (event-driven systems)
  • Security engineer (log pipelines)

Certification path (Azure)

Microsoft certification offerings evolve. Relevant paths typically include:

  • Azure Fundamentals (AZ-900)
  • Azure Data Engineer (DP-203) for Analytics pipelines
  • Azure Developer (AZ-204) for application integration patterns
  • Azure Solutions Architect (AZ-305) for architecture design

Verify current certification details: https://learn.microsoft.com/credentials/

Project ideas for practice

  • Build a clickstream pipeline: Event Hubs → consumer → ADLS (raw) + dashboard store
  • Implement checkpointing with Blob Storage using an Event Processor pattern
  • Create a Kafka-client producer app writing to Event Hubs Kafka endpoint
  • Build cost dashboards: estimate per-event cost and optimize batch sizing
  • Add Private Link and test connectivity from a VNet-only compute environment

22. Glossary

  • Event: A single record/message sent by a producer (often JSON/Avro/Protobuf bytes).
  • Namespace: Azure resource container for Event Hubs entities and shared configuration.
  • Event hub: A named stream under a namespace, similar to a topic in streaming systems.
  • Partition: An ordered append-only log segment within an event hub; enables parallelism.
  • Partition key: Value used to map related events to the same partition for ordering.
  • Consumer group: A logical group that maintains its own offsets for reading the stream.
  • Offset: Position of an event within a partition.
  • Retention: How long events are kept and available for consumption/replay.
  • Checkpointing: Persisting the last processed offset so a consumer can resume reliably.
  • SAS (Shared Access Signature): Token/keys for Event Hubs access via authorization rules.
  • Azure AD (Entra ID) RBAC: Role-based access control using Azure identities.
  • Capture: Automatic delivery of Event Hubs data to Storage/ADLS for archival/batch use.
  • Private Endpoint (Private Link): Private IP access to a PaaS service within a VNet.
  • Throttling: Service limiting requests when capacity is exceeded.
  • Consumer lag: How far behind consumers are relative to the latest events.

23. Summary

Azure Event Hubs is Azure’s managed service for high-throughput event streaming ingestion, widely used as the entry point for Analytics pipelines. It provides partitioned scale, consumer groups for fan-out, retention-based replay, and deep integrations with Azure stream processing and data platforms.

Cost and operations hinge on choosing the right tier/capacity model, managing ingress/egress volume, and controlling add-ons like Capture, Private Link, and diagnostic logs. Security is strongest with Azure AD RBAC + managed identities, plus private networking where appropriate; avoid overusing shared SAS keys.

Use Event Hubs when you need scalable ingestion and multiple consumers; choose alternatives like Service Bus, Event Grid, or IoT Hub when you need different messaging semantics or device management features.

Next step: extend the lab by adding checkpointing to Azure Blob Storage, enable targeted Azure Monitor alerts, and test a downstream Analytics consumer (Stream Analytics, Databricks, or Azure Data Explorer) end-to-end.