{"id":129,"date":"2026-04-12T22:25:19","date_gmt":"2026-04-12T22:25:19","guid":{"rendered":"https:\/\/www.devopsschool.com\/tutorials\/aws-amazon-kinesis-data-streams-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-analytics\/"},"modified":"2026-04-12T22:25:19","modified_gmt":"2026-04-12T22:25:19","slug":"aws-amazon-kinesis-data-streams-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-analytics","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/tutorials\/aws-amazon-kinesis-data-streams-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-analytics\/","title":{"rendered":"AWS Amazon Kinesis Data Streams Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Analytics"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Category<\/h2>\n\n\n\n<p>Analytics<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Introduction<\/h2>\n\n\n\n<p>Amazon Kinesis Data Streams is an AWS Analytics service for capturing, storing, and processing streaming data in real time. It provides a durable, ordered stream of records that producers write to and consumers read from, enabling near-real-time analytics, alerting, and event-driven processing.<\/p>\n\n\n\n<p>In simple terms: you create a stream, send events (records) to it continuously, and one or more applications read those events to process them\u2014often within seconds. Typical examples include clickstream events, IoT telemetry, application logs, and financial transactions.<\/p>\n\n\n\n<p>Technically, Amazon Kinesis Data Streams is a managed distributed log service. Data is grouped into <strong>shards<\/strong> that provide capacity. Records are ordered <strong>within<\/strong> a shard, and shard assignment is controlled by a record\u2019s <strong>partition key<\/strong>. 
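<\/p>\n\n\n\n<p>To make the shard\/partition-key relationship concrete, here is a minimal, self-contained Python sketch. It is illustrative only\u2014the hash-to-shard mapping is simplified, not the service\u2019s exact internals\u2014but it shows the key guarantee: records that share a partition key land in the same shard and keep their relative order, while no order is defined across shards.<\/p>

```python
import hashlib
from collections import defaultdict

def shard_for(partition_key, shard_count):
    # Illustrative routing: stable hash of the key, modulo the shard count.
    digest = hashlib.md5(partition_key.encode('utf-8')).hexdigest()
    return int(digest, 16) % shard_count

SHARD_COUNT = 4
shards = defaultdict(list)  # shard id -> records in arrival order

events = [
    ('device-1', 't=1 temp=20'),
    ('device-2', 't=1 rpm=900'),
    ('device-1', 't=2 temp=21'),
    ('device-1', 't=3 temp=22'),
]
for key, payload in events:
    shards[shard_for(key, SHARD_COUNT)].append((key, payload))

# device-1's records all sit in one shard, in the order they were written.
d1 = shard_for('device-1', SHARD_COUNT)
print([p for k, p in shards[d1] if k == 'device-1'])
# -> ['t=1 temp=20', 't=2 temp=21', 't=3 temp=22']
```

\n\n\n\n<p>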
Consumers read via shard iterators or enhanced fan-out, while AWS handles replication and durability across multiple Availability Zones in a region.<\/p>\n\n\n\n<p>It solves the problem of building reliable, scalable streaming ingestion and replay. Instead of running your own Kafka-like cluster and managing brokers, partitions, replication, and scaling, you use a managed AWS service that integrates tightly with other AWS Analytics and compute services.<\/p>\n\n\n\n<blockquote>\n<p>Naming note (important): Amazon Kinesis is a family of streaming services. <strong>Amazon Kinesis Data Streams<\/strong> remains the correct current name for this service. Related services have been renamed: for example, <strong>Kinesis Data Firehose<\/strong> is now <strong>Amazon Data Firehose<\/strong>, and <strong>Kinesis Data Analytics<\/strong> is now <strong>Amazon Managed Service for Apache Flink<\/strong>. This tutorial focuses only on <strong>Amazon Kinesis Data Streams<\/strong>.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">2. 
What is Amazon Kinesis Data Streams?<\/h2>\n\n\n\n<p><strong>Official purpose (scope):<\/strong> Amazon Kinesis Data Streams is designed to ingest and store large volumes of streaming data, allowing multiple applications to process the data concurrently and replay it within the configured retention period.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Core capabilities<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Real-time ingestion<\/strong> from many producers (apps, devices, agents, services).<\/li>\n<li><strong>Durable, ordered storage<\/strong> of records for a configurable retention window (default 24 hours; extended retention available up to 365 days\u2014verify current limits in official docs for your region).<\/li>\n<li><strong>Multiple consumer patterns<\/strong>:<\/li>\n<li>Shared throughput consumers using <code>GetRecords<\/code><\/li>\n<li><strong>Enhanced fan-out (EFO)<\/strong> consumers using <code>SubscribeToShard<\/code> for dedicated read throughput per consumer per shard<\/li>\n<li><strong>Replay and reprocessing<\/strong> by reading from an earlier position (sequence number \/ timestamp).<\/li>\n<li><strong>Elastic scaling<\/strong>:<\/li>\n<li><strong>Provisioned mode<\/strong> (you control shard count)<\/li>\n<li><strong>On-demand mode<\/strong> (AWS manages capacity scaling for you)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Major components<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Stream<\/strong>: The named resource you create (e.g., <code>orders-stream<\/code>).<\/li>\n<li><strong>Shard<\/strong>: A throughput and ordering unit inside a stream. Records in a shard are strictly ordered.<\/li>\n<li><strong>Record<\/strong>: A data blob (up to 1 MB) plus partition key and metadata (sequence number, approximate arrival timestamp).<\/li>\n<li><strong>Partition key<\/strong>: Determines which shard a record goes to (via hashing). 
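<\/li>\n<\/ul>\n\n\n\n<p>Routing works over a 128-bit hash-key space: each shard owns a contiguous range, and a record goes to the shard whose range contains the MD5 hash of its partition key. The sketch below approximates this with evenly split ranges (real streams report each shard\u2019s actual <code>StartingHashKey<\/code>\/<code>EndingHashKey<\/code> via <code>ListShards<\/code>); a shard split simply divides one range in two.<\/p>

```python
import hashlib

HASH_SPACE = 2 ** 128  # partition keys hash into a 128-bit key space

def md5_hash_key(partition_key):
    # The record's hash key is the MD5 digest of its UTF-8 partition key.
    return int.from_bytes(hashlib.md5(partition_key.encode('utf-8')).digest(), 'big')

def shard_ranges(shard_count):
    # Evenly split ranges, as in an evenly sharded stream; real boundaries
    # come from the ListShards API, not this calculation.
    step = HASH_SPACE // shard_count
    return [(i * step,
             HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1)
            for i in range(shard_count)]

def route(partition_key, ranges):
    h = md5_hash_key(partition_key)
    for i, (lo, hi) in enumerate(ranges):
        if lo <= h <= hi:
            return i
    raise AssertionError('hash key outside all ranges')

ranges = shard_ranges(4)
# The same key always routes to the same shard index.
assert route('device-1', ranges) == route('device-1', ranges)
```

\n\n\n\n<ul class=\"wp-block-list\">\n<li>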
Controls ordering boundaries and hotspot risk.<\/li>\n<li><strong>Producers<\/strong>: Apps\/services writing data (SDK, Kinesis Producer Library, agents).<\/li>\n<li><strong>Consumers<\/strong>: Apps\/services reading data (SDK, Kinesis Client Library, AWS Lambda, Amazon Managed Service for Apache Flink, custom apps).<\/li>\n<li><strong>Retention<\/strong>: How long Kinesis keeps records available for replay.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Service type<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Managed streaming data service<\/strong> (durable stream storage + read APIs).<\/li>\n<li>Not a database; not a query engine by itself. It\u2019s an ingestion + ordered log primitive used by Analytics pipelines.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scope: regional vs global<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Regional service<\/strong>: Streams are created in an AWS Region and data stays in that region unless you replicate it yourself.<\/li>\n<li><strong>Highly available within a region<\/strong>: AWS replicates stream data across multiple Availability Zones for durability (verify exact durability statements in official docs for compliance requirements).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How it fits into the AWS ecosystem<\/h3>\n\n\n\n<p>Amazon Kinesis Data Streams is commonly used as the \u201cfront door\u201d for streaming ingestion in AWS Analytics architectures, feeding:\n&#8211; <strong>AWS Lambda<\/strong> for event-driven processing\n&#8211; <strong>Amazon Managed Service for Apache Flink<\/strong> for streaming analytics\n&#8211; <strong>Amazon Data Firehose<\/strong> for delivery to S3, Redshift, OpenSearch, Splunk, and more\n&#8211; <strong>Amazon S3<\/strong> (via Firehose or consumer apps) for a data lake\n&#8211; <strong>Amazon DynamoDB \/ Aurora \/ OpenSearch<\/strong> as downstream stores for operational analytics\n&#8211; <strong>Amazon CloudWatch<\/strong> for metrics and alarms 
(service metrics + consumer app metrics)<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">3. Why use Amazon Kinesis Data Streams?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Business reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Faster time-to-insight<\/strong>: Move from batch analytics to near-real-time dashboards and alerts.<\/li>\n<li><strong>Lower operational overhead<\/strong>: Avoid managing streaming clusters and their patching\/scaling.<\/li>\n<li><strong>Event replay<\/strong>: Reprocess historical events within retention to fix bugs or rebuild derived datasets.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Technical reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Ordered processing boundaries<\/strong>: Order is guaranteed within a shard, enabling per-key sequencing (e.g., per customer, per device).<\/li>\n<li><strong>Backpressure handling<\/strong>: Consumers can fall behind while data remains available for replay.<\/li>\n<li><strong>Multiple consumers<\/strong>: Different teams\/apps can independently consume the same stream (shared throughput or enhanced fan-out).<\/li>\n<li><strong>Integration<\/strong>: Works cleanly with AWS compute and Analytics services.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Managed capacity options<\/strong>:<\/li>\n<li>On-demand for unpredictable traffic<\/li>\n<li>Provisioned for steady workloads and fine cost control<\/li>\n<li><strong>Built-in metrics<\/strong> in CloudWatch (throughput, iterator age, throttling).<\/li>\n<li><strong>Ecosystem tools<\/strong>: Kinesis Client Library (KCL), Kinesis Producer Library (KPL), and SDKs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/compliance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>IAM-based access control<\/strong> and AWS CloudTrail auditing.<\/li>\n<li><strong>Encryption at 
rest<\/strong> via AWS Key Management Service (AWS KMS).<\/li>\n<li><strong>Private connectivity<\/strong> using VPC interface endpoints (AWS PrivateLink) where supported.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scalability\/performance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Horizontal scaling via shards (provisioned) or automatic scaling (on-demand).<\/li>\n<li>High ingestion rates with many parallel producers.<\/li>\n<li>Enhanced fan-out for dedicated consumer throughput at scale.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should choose it<\/h3>\n\n\n\n<p>Choose Amazon Kinesis Data Streams when you need:\n&#8211; Streaming ingestion with <strong>durable storage and replay<\/strong>\n&#8211; <strong>Near-real-time<\/strong> processing\n&#8211; <strong>Ordering per key<\/strong> (partition key)\n&#8211; Multiple concurrent consumers\n&#8211; Tight AWS integration<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should not choose it<\/h3>\n\n\n\n<p>Consider alternatives when:\n&#8211; You need <strong>simple message queuing<\/strong> with per-message acknowledgements and dead-letter queues as the primary pattern (consider <strong>Amazon SQS<\/strong>).\n&#8211; You need <strong>event routing and SaaS integrations<\/strong> rather than a durable log (consider <strong>Amazon EventBridge<\/strong>).\n&#8211; You want a managed Kafka-compatible ecosystem with Kafka tooling and semantics (consider <strong>Amazon MSK<\/strong>).\n&#8211; You only need <strong>delivery to destinations<\/strong> (S3\/Redshift\/OpenSearch) without custom consumers (consider <strong>Amazon Data Firehose<\/strong>).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">4. 
Where is Amazon Kinesis Data Streams used?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Industries<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>E-commerce and retail (clickstream, cart\/checkout events)<\/li>\n<li>FinTech (transaction streams, fraud detection, audit pipelines)<\/li>\n<li>Media and gaming (player events, matchmaking telemetry)<\/li>\n<li>Manufacturing\/IoT (sensor data ingestion, anomaly detection)<\/li>\n<li>Healthcare (device telemetry, operational monitoring\u2014ensure compliance design)<\/li>\n<li>AdTech\/MarTech (impressions, bidding telemetry, attribution signals)<\/li>\n<li>Cybersecurity (security event pipelines, SIEM ingestion)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Team types<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data engineering teams building streaming data lakes<\/li>\n<li>Platform teams providing shared ingestion primitives<\/li>\n<li>SRE\/observability teams streaming logs\/metrics\/events<\/li>\n<li>Application teams implementing event-driven features<\/li>\n<li>Security teams collecting and correlating events<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Workloads<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Streaming ETL\/ELT<\/li>\n<li>Real-time analytics (windowed aggregations, anomaly detection)<\/li>\n<li>Operational monitoring and alerting<\/li>\n<li>Event sourcing and CQRS-style pipelines (within constraints)<\/li>\n<li>Log aggregation and stream processing<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Architectures<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Streaming ingestion \u2192 processing (Lambda\/Flink) \u2192 data lake (S3)<\/li>\n<li>Streaming ingestion \u2192 enrichment \u2192 operational store (DynamoDB\/OpenSearch)<\/li>\n<li>Multi-consumer pub\/sub pipelines (billing + analytics + fraud all reading same stream)<\/li>\n<li>Replay-based backfills and bug-fix reprocessing<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Production vs dev\/test usage<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><strong>Production<\/strong>: used as a shared ingestion backbone with strong IAM controls, encryption, tagging, alarms, and runbooks.<\/li>\n<li><strong>Dev\/test<\/strong>: used for pipeline prototyping, load testing shard scaling, validating schema evolution, and consumer checkpoint logic.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">5. Top Use Cases and Scenarios<\/h2>\n\n\n\n<p>Below are realistic scenarios where Amazon Kinesis Data Streams is a strong fit.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Clickstream ingestion for near-real-time dashboards<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Web\/mobile events arrive continuously; batch ETL is too slow for product decisions.<\/li>\n<li><strong>Why it fits:<\/strong> High-throughput ingestion + multiple consumers + replay for backfills.<\/li>\n<li><strong>Example:<\/strong> Website events go into <code>clickstream-stream<\/code>; one consumer aggregates sessions to a dashboard, another writes raw events to S3.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) IoT telemetry ingestion with per-device ordering<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Devices send time-series events; you must preserve ordering per device.<\/li>\n<li><strong>Why it fits:<\/strong> Partition key can be <code>deviceId<\/code> to keep per-device event order within a shard.<\/li>\n<li><strong>Example:<\/strong> Factory sensors send readings to Kinesis; a consumer detects anomalies and triggers alerts.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Fraud detection feature pipeline<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> You need low-latency risk scoring from transaction events.<\/li>\n<li><strong>Why it fits:<\/strong> Stream processing can enrich and score events in seconds; replay supports model re-training 
pipelines.<\/li>\n<li><strong>Example:<\/strong> Card transactions stream into Kinesis; a Lambda\/Flink job enriches with user risk signals and flags suspicious activity.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Security event ingestion for SIEM<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Consolidate security logs across accounts\/services; handle bursty traffic.<\/li>\n<li><strong>Why it fits:<\/strong> On-demand mode handles bursts; retention allows reprocessing after rule updates.<\/li>\n<li><strong>Example:<\/strong> CloudTrail\/event collectors publish to Kinesis; downstream consumers normalize and send to OpenSearch\/S3.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) Application event bus for microservices (within ordering boundaries)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Multiple services must react to domain events with loose coupling.<\/li>\n<li><strong>Why it fits:<\/strong> Durable log + multiple consumers; reprocessing supports new consumers.<\/li>\n<li><strong>Example:<\/strong> <code>orders-stream<\/code> emits order lifecycle events; billing, fulfillment, and analytics services consume independently.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) Real-time personalization signals<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> You want to update user profiles quickly based on behavior.<\/li>\n<li><strong>Why it fits:<\/strong> Stream can feed real-time feature stores \/ profile stores.<\/li>\n<li><strong>Example:<\/strong> App events stream in; consumer updates DynamoDB user profile and triggers recommendation refresh.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Streaming ETL to a data lake<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Batch loads cause delays and large ETL windows.<\/li>\n<li><strong>Why it fits:<\/strong> Continuous ingest + delivery to S3 via consumers or 
Firehose.<\/li>\n<li><strong>Example:<\/strong> Kinesis \u2192 consumer transforms JSON to Parquet \u2192 writes to S3 partitioned by time.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Operational telemetry for SRE (custom events)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> High-cardinality operational events are hard to handle with metrics-only systems.<\/li>\n<li><strong>Why it fits:<\/strong> Stream captures detailed events; consumers generate aggregates\/alerts.<\/li>\n<li><strong>Example:<\/strong> Services publish deployment events, error traces, and feature flags to Kinesis; consumer computes SLO burn rates.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Real-time leaderboards and counters<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> You need low-latency event aggregation and periodic snapshots.<\/li>\n<li><strong>Why it fits:<\/strong> Partitioning by game\/region; stream processing updates aggregates.<\/li>\n<li><strong>Example:<\/strong> Player actions stream in; consumer updates Redis\/DynamoDB counters for live leaderboards.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) ML online feature pipeline (training + inference)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Keep features consistent between real-time inference and offline training.<\/li>\n<li><strong>Why it fits:<\/strong> Same stream feeds online feature computation and raw archival to S3.<\/li>\n<li><strong>Example:<\/strong> Stream events are enriched and written to an online store; also persisted to S3 for offline training.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">11) Data replication and cache invalidation events<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Multiple caches\/indexes must be updated when primary data changes.<\/li>\n<li><strong>Why it fits:<\/strong> Fan-out to multiple consumers; ordered updates per 
entity.<\/li>\n<li><strong>Example:<\/strong> Product updates stream in; one consumer updates OpenSearch, another invalidates CDN cache keys.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">12) \u201cReplay to recover\u201d after downstream outage<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Downstream DB\/search cluster outage causes lost processing in push-only systems.<\/li>\n<li><strong>Why it fits:<\/strong> Kinesis retains data so consumers can catch up.<\/li>\n<li><strong>Example:<\/strong> OpenSearch is unavailable for 30 minutes; consumer resumes later and replays the backlog.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">6. Core Features<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">1) Streams with durable retention and replay<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Stores records for a retention period and allows consumers to read from earlier offsets (sequence numbers) or timestamps.<\/li>\n<li><strong>Why it matters:<\/strong> Enables reprocessing, backfills, and consumer recovery.<\/li>\n<li><strong>Practical benefit:<\/strong> Fix a parsing bug and replay the last 6 hours without asking producers to resend.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Retention is time-based, not size-based; long retention increases cost and may require planning for compliance.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Shards for ordered, parallel throughput (provisioned mode)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Shards are the unit of capacity and ordering. 
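<\/li>\n<\/ul>\n\n\n\n<p>Teams usually size a provisioned stream backwards from the commonly documented per-shard limits\u2014roughly 1 MB\/s or 1,000 records\/s for writes and 2 MB\/s for shared-throughput reads (verify current values in the official quotas documentation). A rough sizing helper under those assumed limits:<\/p>

```python
import math

# Commonly documented per-shard limits (assumptions; verify in AWS docs):
WRITE_MB_S = 1.0        # ingest bandwidth per shard
WRITE_RECORDS_S = 1000  # ingest records/sec per shard
READ_MB_S = 2.0         # egress bandwidth per shard (shared by non-EFO consumers)

def estimate_shards(in_mb_s, in_records_s, out_mb_s):
    # A provisioned stream needs enough shards for its worst dimension.
    return max(
        math.ceil(in_mb_s / WRITE_MB_S),
        math.ceil(in_records_s / WRITE_RECORDS_S),
        math.ceil(out_mb_s / READ_MB_S),
        1,
    )

# Example: 3 MB/s in, 2,500 records/s, one shared consumer reading 3 MB/s.
print(estimate_shards(3.0, 2500, 3.0))  # -> 3
```

\n\n\n\n<ul class=\"wp-block-list\">\n<li>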
Each shard supports a defined write\/read throughput.<\/li>\n<li><strong>Why it matters:<\/strong> Lets you scale horizontally and control concurrency.<\/li>\n<li><strong>Practical benefit:<\/strong> Increase shard count to handle ingestion spikes in provisioned mode.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Hot partitions can overload a shard; you must design partition keys carefully.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) On-demand capacity mode<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> AWS manages scaling to match throughput without pre-provisioning shards.<\/li>\n<li><strong>Why it matters:<\/strong> Reduces operational burden when traffic is spiky or unpredictable.<\/li>\n<li><strong>Practical benefit:<\/strong> Start small and handle bursts without resharding workflows.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Pricing and scaling behavior differs from provisioned; verify current on-demand pricing dimensions and any quotas in the official pricing\/docs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Multiple consumer models (shared throughput vs enhanced fan-out)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Supports classic polling consumers (<code>GetRecords<\/code>) and Enhanced Fan-Out (EFO) with dedicated throughput per consumer per shard (<code>SubscribeToShard<\/code>).<\/li>\n<li><strong>Why it matters:<\/strong> Prevents one consumer from starving others; supports low-latency fan-out.<\/li>\n<li><strong>Practical benefit:<\/strong> Run analytics, fraud, and archiving consumers concurrently without contention by using EFO where needed.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> EFO has its own pricing dimension and quotas; consumers must be designed to handle scale and retries.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) Server-side encryption (SSE) with AWS KMS<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What 
it does:<\/strong> Encrypts stream data at rest using AWS KMS keys (AWS-managed or customer-managed).<\/li>\n<li><strong>Why it matters:<\/strong> Meets many security and compliance requirements.<\/li>\n<li><strong>Practical benefit:<\/strong> Enforce encryption by policy and control access via KMS key policies.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> KMS usage can add cost and may require key policy\/IAM alignment to avoid access issues.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) IAM integration for fine-grained access<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Controls who can create streams, put records, get records, describe streams, and manage encryption.<\/li>\n<li><strong>Why it matters:<\/strong> Streaming systems often become central shared infrastructure.<\/li>\n<li><strong>Practical benefit:<\/strong> Separate producer and consumer permissions; restrict access per environment (dev\/test\/prod).<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Cross-account access needs careful IAM role design; verify whether resource-based policies are applicable in your setup in official docs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Scaling and resharding (provisioned mode)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Adjusts shard count (split\/merge) to match throughput needs.<\/li>\n<li><strong>Why it matters:<\/strong> Right-size capacity and cost.<\/li>\n<li><strong>Practical benefit:<\/strong> Scale up during business hours and scale down later (if your workload supports it).<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Resharding changes shard topology; consumers must handle shard closures and new shards (KCL does this).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Monitoring with Amazon CloudWatch<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Exposes metrics like incoming bytes\/records, throttles, 
iterator age, and per-shard metrics (enhanced monitoring options).<\/li>\n<li><strong>Why it matters:<\/strong> Streaming failures often show up as consumer lag or throttling.<\/li>\n<li><strong>Practical benefit:<\/strong> Alarm on <code>IteratorAgeMilliseconds<\/code> to detect falling-behind consumers.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> You still need application-level logs\/metrics for end-to-end troubleshooting.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Integrations: Lambda, Firehose, Flink, SDKs, KCL\/KPL<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Connects to common AWS consumers and producer libraries.<\/li>\n<li><strong>Why it matters:<\/strong> Reduces custom plumbing.<\/li>\n<li><strong>Practical benefit:<\/strong> Use Lambda for simple event processing; use Flink for complex stateful analytics; use Firehose for managed delivery to S3\/OpenSearch\/Redshift.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Each integration has its own limits and cost model; design for backpressure and retries.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) Exactly-once semantics are not inherent<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Kinesis provides at-least-once delivery; duplicates can occur (e.g., retries).<\/li>\n<li><strong>Why it matters:<\/strong> Downstream systems must handle duplicates.<\/li>\n<li><strong>Practical benefit:<\/strong> Design idempotent consumers and deduplication keys.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> If you require strict exactly-once end-to-end, you must build it using transactional sinks, idempotency, or frameworks that support it (and confirm constraints).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">7. 
Architecture and How It Works<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">High-level architecture<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Producers<\/strong> call <code>PutRecord<\/code>\/<code>PutRecords<\/code> to write events with a <strong>partition key<\/strong>.<\/li>\n<li>Kinesis hashes the partition key and routes the record to a <strong>shard<\/strong>.<\/li>\n<li>Records are durably stored for the retention period.<\/li>\n<li><strong>Consumers<\/strong> read from shards:\n   &#8211; Shared throughput consumers poll with <code>GetShardIterator<\/code> + <code>GetRecords<\/code>\n   &#8211; EFO consumers use <code>SubscribeToShard<\/code> for push-style delivery per consumer per shard<\/li>\n<li>Consumers process records and write to downstream stores or trigger actions.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Data flow vs control flow<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Control plane:<\/strong> create stream, update retention, update shard count (provisioned), enable encryption, tagging.<\/li>\n<li><strong>Data plane:<\/strong> put records, get records, subscribe to shards.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Integrations with related AWS services<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AWS Lambda<\/strong>: event source mapping polls Kinesis and invokes your function with batches.<\/li>\n<li><strong>Amazon Data Firehose<\/strong>: can read from Kinesis Data Streams and deliver to destinations (S3, Redshift, OpenSearch, Splunk).<\/li>\n<li><strong>Amazon Managed Service for Apache Flink<\/strong> (formerly Kinesis Data Analytics): reads streams for stateful processing.<\/li>\n<li><strong>AWS Glue \/ Lake Formation<\/strong>: often used downstream for cataloging and governance (data stored in S3).<\/li>\n<li><strong>Amazon CloudWatch<\/strong>: metrics\/alarms; logs for consumers\/producers are in CloudWatch Logs if you configure them.<\/li>\n<li><strong>AWS CloudTrail<\/strong>: records API calls for 
auditing.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Dependency services (common)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AWS KMS<\/strong> for encryption at rest (SSE-KMS).<\/li>\n<li><strong>Amazon DynamoDB<\/strong> (for KCL checkpointing\/leases).<\/li>\n<li><strong>Amazon VPC<\/strong> endpoints for private connectivity (where supported).<\/li>\n<li><strong>IAM<\/strong> for authentication\/authorization.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/authentication model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Requests are authenticated via <strong>AWS Signature Version 4<\/strong> using IAM principals (roles\/users).<\/li>\n<li>Authorization is enforced via <strong>IAM policies<\/strong> (and KMS policies if SSE-KMS is enabled).<\/li>\n<li>Use least privilege: producers can <code>kinesis:PutRecord*<\/code>; consumers can <code>kinesis:Get*<\/code>, <code>kinesis:Describe*<\/code>, and checkpoint store permissions if using KCL.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Networking model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Kinesis endpoints are public by default (AWS service endpoint).<\/li>\n<li>For private access from a VPC, use <strong>VPC interface endpoints (AWS PrivateLink)<\/strong> for Kinesis Data Streams where available in your region. 
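<\/li>\n<\/ul>\n\n\n\n<p>The least-privilege producer\/consumer split described under the security model above can be sketched as IAM policy documents. The stream ARN is a placeholder, and KMS plus KCL (DynamoDB\/CloudWatch) permissions are deliberately omitted:<\/p>

```python
import json

# Placeholder ARN for illustration only.
STREAM_ARN = 'arn:aws:kinesis:us-east-1:123456789012:stream/orders-stream'

def producer_policy(stream_arn):
    # Write-side permissions only; SSE-KMS streams also need kms:GenerateDataKey.
    return {
        'Version': '2012-10-17',
        'Statement': [{
            'Effect': 'Allow',
            'Action': ['kinesis:PutRecord', 'kinesis:PutRecords',
                       'kinesis:DescribeStreamSummary'],
            'Resource': stream_arn,
        }],
    }

def consumer_policy(stream_arn):
    # Read-side permissions only; KCL apps additionally need DynamoDB
    # lease-table and CloudWatch permissions (not shown).
    return {
        'Version': '2012-10-17',
        'Statement': [{
            'Effect': 'Allow',
            'Action': ['kinesis:GetShardIterator', 'kinesis:GetRecords',
                       'kinesis:DescribeStreamSummary', 'kinesis:ListShards'],
            'Resource': stream_arn,
        }],
    }

print(json.dumps(producer_policy(STREAM_ARN), indent=2))
```

\n\n\n\n<ul class=\"wp-block-list\">\n<li>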
Verify endpoint name and availability in the VPC documentation for your region.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Monitoring\/logging\/governance<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>CloudWatch metrics<\/strong>: throughput, throttles, iterator age, etc.<\/li>\n<li><strong>CloudTrail logs<\/strong>: who changed stream configuration or called APIs.<\/li>\n<li><strong>Tagging<\/strong>: tag streams with <code>Environment<\/code>, <code>Owner<\/code>, <code>CostCenter<\/code>, <code>DataClassification<\/code>.<\/li>\n<li><strong>Schema governance<\/strong>: Kinesis itself is schema-agnostic; enforce schemas at producers\/consumers using a schema registry pattern (e.g., AWS Glue Schema Registry\u2014verify suitability for your serialization formats).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Simple architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart LR\n  P[\"Producers&lt;br\/&gt;(apps, devices, agents)\"] --&gt;|\"PutRecord\/PutRecords\"| KDS[Amazon Kinesis Data Streams]\n  KDS --&gt;|\"GetRecords (shared)\"| C1[\"Consumer A&lt;br\/&gt;(custom app)\"]\n  KDS --&gt;|\"Enhanced Fan-Out\"| C2[\"Consumer B&lt;br\/&gt;(analytics app)\"]\n  C1 --&gt; S3[\"Amazon S3&lt;br\/&gt;(raw\/archive)\"]\n  C2 --&gt; DB[(\"Operational Store&lt;br\/&gt;DynamoDB\/OpenSearch\")]\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Production-style architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart TB\n  subgraph VPC[\"VPC (private subnets)\"]\n    EKS[\"EKS \/ EC2 Consumers&lt;br\/&gt;KCL-based apps\"]\n    LMB[\"AWS Lambda&lt;br\/&gt;stream processor\"]\n  end\n\n  subgraph AWS[\"AWS Analytics Platform (Region)\"]\n    KDS[\"Amazon Kinesis Data Streams&lt;br\/&gt;(SSE-KMS enabled)\"]\n    CW[\"Amazon CloudWatch&lt;br\/&gt;Metrics &amp; Alarms\"]\n    CT[AWS CloudTrail]\n    KMS[AWS KMS CMK]\n    DDB[(\"Amazon DynamoDB&lt;br\/&gt;KCL checkpoints\")]\n    FH[Amazon Data Firehose]\n    S3[Amazon S3 Data Lake]\n    
OS[(Amazon OpenSearch Service)]\n  end\n\n  Producers[Producers&lt;br\/&gt;(microservices\/IoT\/agents)] --&gt;|PutRecords| KDS\n  KMS --&gt; KDS\n  KDS --&gt;|EFO \/ GetRecords| EKS\n  KDS --&gt;|Event Source Mapping| LMB\n  EKS --&gt; DDB\n  KDS --&gt; FH --&gt; S3\n  LMB --&gt; OS\n  KDS --&gt; CW\n  KDS --&gt; CT\n  EKS --&gt; CW\n  LMB --&gt; CW\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">8. Prerequisites<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">AWS account and billing<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An <strong>AWS account<\/strong> with billing enabled.<\/li>\n<li>Ability to create IAM roles\/policies, Kinesis streams, and (optionally) Lambda functions.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Permissions \/ IAM<\/h3>\n\n\n\n<p>At minimum for the hands-on lab:\n&#8211; <code>kinesis:CreateStream<\/code>, <code>kinesis:DeleteStream<\/code>, <code>kinesis:DescribeStreamSummary<\/code>, <code>kinesis:ListShards<\/code>\n&#8211; <code>kinesis:PutRecord<\/code>, <code>kinesis:PutRecords<\/code>\n&#8211; <code>kinesis:GetShardIterator<\/code>, <code>kinesis:GetRecords<\/code>\n&#8211; If enabling encryption with a customer-managed key: <code>kms:CreateKey<\/code> (optional), <code>kms:Encrypt<\/code>, <code>kms:Decrypt<\/code>, <code>kms:GenerateDataKey<\/code>, and key policy permissions<\/p>\n\n\n\n<p>If you also do the optional Lambda integration:\n&#8211; <code>lambda:CreateFunction<\/code>, <code>lambda:CreateEventSourceMapping<\/code>, <code>iam:CreateRole<\/code>, <code>iam:PassRole<\/code>, plus CloudWatch Logs permissions.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Tools<\/h3>\n\n\n\n<p>Choose one:\n&#8211; <strong>AWS CloudShell<\/strong> (recommended for beginners): includes AWS CLI and common tools, no local setup.\n&#8211; Or local workstation with:\n  &#8211; AWS CLI v2 configured (<code>aws configure<\/code>)\n  &#8211; Python 3.9+ and <code>boto3<\/code> (if you use 
the Python scripts)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Region availability<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Amazon Kinesis Data Streams is available in many AWS Regions, but always verify availability in your target region (especially for VPC endpoints and enhanced fan-out quotas).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas\/limits (must verify for your account\/region)<\/h3>\n\n\n\n<p>Check <strong>Service Quotas<\/strong> for Amazon Kinesis Data Streams, including:\n&#8211; Streams per region\n&#8211; Shards per stream (provisioned) or on-demand limits\n&#8211; API call rates\n&#8211; Enhanced fan-out consumer limits<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Prerequisite services (optional)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AWS KMS<\/strong> if using customer-managed encryption keys<\/li>\n<li><strong>Amazon DynamoDB<\/strong> if using KCL applications for checkpointing<\/li>\n<li><strong>Amazon CloudWatch<\/strong> (available by default) for metrics\/alarms<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">9. Pricing \/ Cost<\/h2>\n\n\n\n<p>Amazon Kinesis Data Streams pricing is <strong>usage-based<\/strong> and depends on capacity mode and enabled features. 
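For intuition about the provisioned-mode "PUT payload unit" dimension, here is a rough Python sketch. It assumes the commonly cited 25 KB rounding unit and a 30-day month; both are assumptions to verify against the current pricing page, and the numbers are illustrative, not price quotes.

```python
import math

# Assumption: provisioned-mode ingestion is billed in "PUT payload units",
# where each record is rounded UP to a multiple of 25 KB. Verify the
# current unit size on the official pricing page before relying on this.
PUT_UNIT_BYTES = 25 * 1024

def put_payload_units(record_size_bytes: int) -> int:
    """Units billed for one record: size rounded up to the next 25 KB unit."""
    return math.ceil(record_size_bytes / PUT_UNIT_BYTES)

def monthly_units(records_per_second: float, avg_record_bytes: int) -> int:
    """Approximate PUT payload units over a 30-day month."""
    seconds = 30 * 24 * 3600
    return int(records_per_second * seconds) * put_payload_units(avg_record_bytes)

# A 2 KB record still bills one full 25 KB unit -- small records round up,
# which is why KPL-style aggregation can reduce ingestion cost.
print(put_payload_units(2 * 1024))    # 1 unit
print(put_payload_units(60 * 1024))   # 3 units
print(monthly_units(100, 2 * 1024))
```

Because every small record rounds up to a full unit, aggregating many small records into fewer larger ones (e.g., via the KPL, where it fits your stack) can cut the ingestion line item noticeably.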
Exact prices vary by <strong>region<\/strong>, so use official sources for numbers.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Official pricing references<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Pricing page: https:\/\/aws.amazon.com\/kinesis\/data-streams\/pricing\/<\/li>\n<li>AWS Pricing Calculator: https:\/\/calculator.aws\/#\/<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing dimensions (typical)<\/h3>\n\n\n\n<p>Pricing differs by <strong>capacity mode<\/strong>:<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Provisioned mode (common dimensions)<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Shard hours<\/strong>: number of shards \u00d7 hours running.<\/li>\n<li><strong>PUT payload units<\/strong>: ingestion requests are billed in units based on payload size (typically per 25 KB unit, aggregated across records; verify in pricing page).<\/li>\n<li><strong>Extended data retention<\/strong>: additional charge for retention beyond the default (often per shard-hour or per GB-month equivalent depending on model; verify current pricing).<\/li>\n<li><strong>Enhanced fan-out<\/strong>:<\/li>\n<li><strong>Consumer-shard hours<\/strong> (per consumer per shard per hour)<\/li>\n<li><strong>Data retrieval<\/strong> (per GB retrieved via EFO; verify current dimension)<\/li>\n<li><strong>Optional features<\/strong> and API usage can add cost (confirm on pricing page).<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">On-demand mode (common dimensions)<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Stream hours<\/strong> (time the stream exists) and\/or <strong>ingested data volume<\/strong> and <strong>retrieved data volume<\/strong>, depending on the current AWS pricing model for on-demand in your region.<\/li>\n<li>EFO and extended retention may have additional charges.<\/li>\n<li>Because on-demand pricing has evolved over time, <strong>verify current on-demand pricing dimensions<\/strong> on the official pricing page before 
committing.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Free tier<\/h3>\n\n\n\n<p>AWS occasionally offers limited free tier usage for some services, but <strong>do not assume<\/strong> Kinesis Data Streams is meaningfully free for production. Check the AWS Free Tier page and the Kinesis pricing page for current eligibility.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Main cost drivers<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Throughput and volume<\/strong>: total GB ingested and retrieved.<\/li>\n<li><strong>Shard count<\/strong> (provisioned) and how long streams run.<\/li>\n<li><strong>Number of consumers<\/strong> (especially with EFO).<\/li>\n<li><strong>Retention period<\/strong> (extended retention can become significant).<\/li>\n<li><strong>Downstream services<\/strong>:<\/li>\n<li>Lambda invocation and duration (if using Lambda)<\/li>\n<li>Firehose delivery charges<\/li>\n<li>S3 storage and requests<\/li>\n<li>OpenSearch indexing\/storage<\/li>\n<li>DynamoDB read\/write capacity (KCL checkpoints)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Hidden\/indirect costs to plan for<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Data transfer<\/strong>:<\/li>\n<li>Intra-region data transfer is often lower, but cross-AZ or cross-region patterns can add cost depending on architecture.<\/li>\n<li>Internet egress applies if consumers are outside AWS.<\/li>\n<li>Always validate with AWS data transfer pricing for your case.<\/li>\n<li><strong>KMS<\/strong>: customer-managed key usage can add KMS request costs.<\/li>\n<li><strong>Operational duplication<\/strong>: multiple consumers reading full streams multiplies retrieval.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How to optimize cost (practical)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prefer <strong>on-demand<\/strong> for spiky, unknown workloads; prefer <strong>provisioned<\/strong> when you can predict and optimize shard count.<\/li>\n<li>Use <strong>KPL 
aggregation<\/strong> (where appropriate) to reduce PUT payload units and API overhead (verify KPL fit and language support).<\/li>\n<li>Avoid unnecessary <strong>EFO<\/strong> consumers; use EFO only for consumers that truly need dedicated throughput\/low latency.<\/li>\n<li>Set retention to the <strong>minimum<\/strong> that meets recovery and replay requirements.<\/li>\n<li>Downsample or filter early (e.g., filter noisy events in a first-stage consumer).<\/li>\n<li>Use CloudWatch alarms to detect over-provisioning (low utilization) or under-provisioning (throttles).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Example low-cost starter estimate (conceptual)<\/h3>\n\n\n\n<p>A dev\/test setup might include:\n&#8211; 1 on-demand stream (or 1 shard provisioned stream)\n&#8211; Low ingestion volume (KBs\/sec)\n&#8211; 24-hour retention\n&#8211; One consumer app polling at low rate<\/p>\n\n\n\n<p>To estimate:\n1. Choose region in the pricing calculator.\n2. Add Kinesis Data Streams.\n3. Enter expected GB ingested\/day, GB retrieved\/day, stream hours, and any EFO usage.\n4. 
Add downstream services (Lambda, S3, Firehose) if used.<\/p>\n\n\n\n<p>Because pricing differs by region and mode, <strong>use the calculator<\/strong> rather than copying numbers from blogs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example production cost considerations<\/h3>\n\n\n\n<p>For production, the biggest levers are:\n&#8211; Sustained shard-hours (provisioned) or sustained GB ingestion (on-demand)\n&#8211; Number of parallel consumers and EFO adoption\n&#8211; Extended retention (especially multi-day or months)\n&#8211; Downstream indexing\/search (OpenSearch) and data lake storage (S3) growth<\/p>\n\n\n\n<p>A practical cost review checklist:\n&#8211; Can you reduce consumer fan-out by sharing a derived stream?\n&#8211; Are partition keys balanced (to avoid adding shards just for hotspots)?\n&#8211; Are you over-retaining in Kinesis instead of writing to S3 for long-term?<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">10. Step-by-Step Hands-On Tutorial<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Objective<\/h3>\n\n\n\n<p>Create an Amazon Kinesis Data Streams stream, publish sample events, read them back as a consumer, monitor key metrics, and clean up\u2014all using AWS CloudShell and Python (<code>boto3<\/code>) to keep the lab executable and low cost.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Lab Overview<\/h3>\n\n\n\n<p>You will:\n1. Create a stream in <strong>on-demand mode<\/strong> (minimal planning).\n2. Write sample JSON events into the stream.\n3. Discover shards and read events back.\n4. Verify ingestion\/consumption using CloudWatch metrics.\n5. 
Clean up resources.<\/p>\n\n\n\n<p>Estimated time: 30\u201345 minutes.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Choose a region and open AWS CloudShell<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In the AWS Console, switch to a region you want to use (e.g., <code>us-east-1<\/code>).<\/li>\n<li>Open <strong>CloudShell<\/strong>.<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome:<\/strong> You have a terminal with AWS CLI credentials already set.<\/p>\n\n\n\n<p>Run:<\/p>\n\n\n\n<pre><code class=\"language-bash\">aws sts get-caller-identity\naws configure list\n<\/code><\/pre>\n\n\n\n<p>If <code>get-caller-identity<\/code> fails, your session or permissions are not ready.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Create an Amazon Kinesis Data Streams stream (on-demand)<\/h3>\n\n\n\n<p>Set variables:<\/p>\n\n\n\n<pre><code class=\"language-bash\">export AWS_REGION=\"$(aws configure get region)\"\nexport STREAM_NAME=\"kds-lab-stream\"\necho \"Region: $AWS_REGION  Stream: $STREAM_NAME\"\n<\/code><\/pre>\n\n\n\n<p>Create the stream in <strong>on-demand<\/strong> mode:<\/p>\n\n\n\n<pre><code class=\"language-bash\">aws kinesis create-stream \\\n  --stream-name \"$STREAM_NAME\" \\\n  --stream-mode-details StreamMode=ON_DEMAND\n<\/code><\/pre>\n\n\n\n<p>Wait until it becomes active:<\/p>\n\n\n\n<pre><code class=\"language-bash\">aws kinesis wait stream-exists --stream-name \"$STREAM_NAME\"\n\naws kinesis describe-stream-summary --stream-name \"$STREAM_NAME\" \\\n  --query 'StreamDescriptionSummary.{Status:StreamStatus,Mode:StreamModeDetails.StreamMode,ARN:StreamARN}'\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong> <code>Status<\/code> is <code>ACTIVE<\/code> and mode shows <code>ON_DEMAND<\/code>.<\/p>\n\n\n\n<p>If you prefer provisioned mode for learning shard concepts, you can create a stream with shards instead (verify CLI parameters in your 
version\/docs), but on-demand keeps the lab simpler.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Put sample records into the stream (Python producer)<\/h3>\n\n\n\n<p>Create a producer script:<\/p>\n\n\n\n<pre><code class=\"language-bash\">cat &gt; producer.py &lt;&lt;'PY'\nimport json, time, uuid, random\nimport boto3\n\nSTREAM_NAME = \"kds-lab-stream\"\n\nkinesis = boto3.client(\"kinesis\")\n\ndef make_event(i: int):\n    return {\n        \"eventId\": str(uuid.uuid4()),\n        \"eventType\": \"sensor_reading\",\n        \"deviceId\": f\"device-{random.randint(1,5)}\",\n        \"reading\": round(random.random() * 100, 3),\n        \"seq\": i,\n        \"ts\": int(time.time() * 1000)\n    }\n\ndef main():\n    for i in range(1, 21):\n        event = make_event(i)\n        data = json.dumps(event).encode(\"utf-8\")\n        partition_key = event[\"deviceId\"]  # preserves ordering per deviceId (within a shard)\n        resp = kinesis.put_record(\n            StreamName=STREAM_NAME,\n            Data=data,\n            PartitionKey=partition_key\n        )\n        print(f\"PutRecord: seq={event['seq']} deviceId={partition_key} -&gt; ShardId={resp['ShardId']} SequenceNumber={resp['SequenceNumber']}\")\n        time.sleep(0.2)\n\nif __name__ == \"__main__\":\n    main()\nPY\n<\/code><\/pre>\n\n\n\n<p>Run it:<\/p>\n\n\n\n<pre><code class=\"language-bash\">python3 producer.py\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong> You see 20 successful <code>PutRecord<\/code> outputs with <code>ShardId<\/code> and <code>SequenceNumber<\/code>.<\/p>\n\n\n\n<p>If you receive <code>AccessDeniedException<\/code>, your IAM principal lacks <code>kinesis:PutRecord<\/code>.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Read records back (Python consumer using shard iterators)<\/h3>\n\n\n\n<p>Create a consumer script that:\n&#8211; Lists shards\n&#8211; Starts from 
<code>TRIM_HORIZON<\/code> (beginning of retention window)\n&#8211; Reads records for a short period<\/p>\n\n\n\n<pre><code class=\"language-bash\">cat &gt; consumer.py &lt;&lt;'PY'\nimport time, json\nimport boto3\n\nSTREAM_NAME = \"kds-lab-stream\"\nkinesis = boto3.client(\"kinesis\")\n\ndef list_shard_ids():\n    shard_ids = []\n    resp = kinesis.list_shards(StreamName=STREAM_NAME)\n    for s in resp.get(\"Shards\", []):\n        shard_ids.append(s[\"ShardId\"])\n    return shard_ids\n\ndef read_from_shard(shard_id, seconds=15):\n    it = kinesis.get_shard_iterator(\n        StreamName=STREAM_NAME,\n        ShardId=shard_id,\n        ShardIteratorType=\"TRIM_HORIZON\"\n    )[\"ShardIterator\"]\n\n    end = time.time() + seconds\n    total = 0\n\n    while time.time() &lt; end and it:\n        out = kinesis.get_records(ShardIterator=it, Limit=100)\n        it = out.get(\"NextShardIterator\")\n\n        records = out.get(\"Records\", [])\n        for r in records:\n            data = r[\"Data\"]\n            try:\n                evt = json.loads(data.decode(\"utf-8\"))\n                print(f\"Got: deviceId={evt.get('deviceId')} seq={evt.get('seq')} ts={evt.get('ts')} pk={r.get('PartitionKey')} sn={r.get('SequenceNumber')}\")\n            except Exception:\n                print(f\"Got non-JSON record: {data!r}\")\n            total += 1\n\n        # polite polling\n        time.sleep(0.5)\n\n    print(f\"Shard {shard_id}: read {total} records\")\n\ndef main():\n    shard_ids = list_shard_ids()\n    print(\"Shards:\", shard_ids)\n    for sid in shard_ids:\n        read_from_shard(sid)\n\nif __name__ == \"__main__\":\n    main()\nPY\n<\/code><\/pre>\n\n\n\n<p>Run it:<\/p>\n\n\n\n<pre><code class=\"language-bash\">python3 consumer.py\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong> You see the events you wrote in Step 3, printed as they\u2019re read from the stream. 
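If you are curious why particular `deviceId` values landed on particular shards, this sketch mirrors the documented routing rule: Kinesis takes the MD5 hash of the partition key as a 128-bit integer and routes the record to the open shard whose hash key range contains it. The two-shard ranges below are hypothetical; real ranges come from `list_shards` (`HashKeyRange`).

```python
import hashlib

def hash_key(partition_key: str) -> int:
    """128-bit integer Kinesis derives from a partition key (MD5 of UTF-8 bytes)."""
    return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)

def shard_for_key(partition_key: str, shard_ranges: dict) -> str:
    """Return the shard whose hash key range contains the key's hash."""
    hk = hash_key(partition_key)
    for shard_id, (start, end) in shard_ranges.items():
        if start <= hk <= end:
            return shard_id
    raise ValueError("no shard covers this hash key")

# Hypothetical 2-shard stream splitting the 128-bit space in half;
# a real app would read these ranges from list_shards().
MAX_HASH = 2**128 - 1
ranges = {
    "shardId-000000000000": (0, MAX_HASH // 2),
    "shardId-000000000001": (MAX_HASH // 2 + 1, MAX_HASH),
}

for key in ["device-1", "device-2", "device-1"]:
    print(key, "->", shard_for_key(key, ranges))
```

Because the hash of a given key never changes, all records for one partition key land on one shard (while the shard map is unchanged), which is exactly what makes per-key ordering possible.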
Order is guaranteed within each shard, but not globally across shards.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5: Verify using CloudWatch metrics<\/h3>\n\n\n\n<p>In the AWS Console:\n1. Go to <strong>CloudWatch \u2192 Metrics<\/strong>\n2. Browse to <strong>Kinesis \u2192 Stream Metrics<\/strong>\n3. Select your stream and view:\n   &#8211; <code>IncomingBytes<\/code>, <code>IncomingRecords<\/code>\n   &#8211; <code>GetRecords.Bytes<\/code>, <code>GetRecords.Records<\/code>\n   &#8211; <code>IteratorAgeMilliseconds<\/code> (important for lag)<\/p>\n\n\n\n<p><strong>Expected outcome:<\/strong> Spikes in <code>IncomingRecords<\/code> after running the producer and activity in read metrics after running the consumer.<\/p>\n\n\n\n<p>Tip: <code>IteratorAgeMilliseconds<\/code> should stay low in this lab. If it grows, consumers are falling behind.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Validation<\/h3>\n\n\n\n<p>Use the CLI to confirm stream status and basic attributes:<\/p>\n\n\n\n<pre><code class=\"language-bash\">aws kinesis describe-stream-summary --stream-name \"$STREAM_NAME\" \\\n  --query 'StreamDescriptionSummary.{Status:StreamStatus,RetentionHours:RetentionPeriodHours,Mode:StreamModeDetails.StreamMode}'\n<\/code><\/pre>\n\n\n\n<p>Re-run:\n&#8211; <code>python3 producer.py<\/code> then <code>python3 consumer.py<\/code>\nto confirm repeatability.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Troubleshooting<\/h3>\n\n\n\n<p><strong>1) <code>AccessDeniedException<\/code><\/strong>\n&#8211; Cause: missing IAM permissions.\n&#8211; Fix: ensure your role\/user has Kinesis actions for create\/put\/get\/list\/describe. 
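As a concrete starting point, here is a sketch of a scoped lab policy generated with Python. The region, account id, and stream name in the ARN are placeholders to substitute for your own; extend the action list if you also run the optional KMS or Lambda steps.

```python
import json

# Placeholder ARN -- substitute your region, account id, and stream name.
LAB_STREAM_ARN = "arn:aws:kinesis:us-east-1:123456789012:stream/kds-lab-stream"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "KdsLabStreamAccess",
            "Effect": "Allow",
            "Action": [
                "kinesis:CreateStream",
                "kinesis:DeleteStream",
                "kinesis:DescribeStreamSummary",
                "kinesis:ListShards",
                "kinesis:PutRecord",
                "kinesis:PutRecords",
                "kinesis:GetShardIterator",
                "kinesis:GetRecords",
            ],
            "Resource": LAB_STREAM_ARN,
        },
        {
            # ListStreams does not support resource-level scoping,
            # so it needs a wildcard resource.
            "Sid": "KdsLabListStreams",
            "Effect": "Allow",
            "Action": "kinesis:ListStreams",
            "Resource": "*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```

Save the printed JSON to a file and attach it with `aws iam put-user-policy` or `aws iam put-role-policy` (`--policy-document file://policy.json`), depending on which principal CloudShell is using.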
For CloudShell, confirm which IAM role is used.<\/p>\n\n\n\n<p><strong>2) <code>ResourceNotFoundException<\/code><\/strong>\n&#8211; Cause: wrong region or wrong stream name.\n&#8211; Fix: verify region and list streams:<\/p>\n\n\n\n<pre><code class=\"language-bash\">aws kinesis list-streams\n<\/code><\/pre>\n\n\n\n<p><strong>3) Throttling \/ <code>ProvisionedThroughputExceededException<\/code><\/strong>\n&#8211; Cause: too much read\/write for shard capacity (mostly in provisioned mode; on-demand can also throttle under some conditions\/quotas).\n&#8211; Fix: reduce producer rate, improve partition key distribution, increase capacity (provisioned shards), or verify quotas.<\/p>\n\n\n\n<p><strong>4) Consumer reads nothing<\/strong>\n&#8211; Causes:\n  &#8211; You used <code>LATEST<\/code> iterator type (reads only new records)\n  &#8211; Retention expired (unlikely in this lab)\n&#8211; Fix: ensure <code>TRIM_HORIZON<\/code> and run consumer soon after producing.<\/p>\n\n\n\n<p><strong>5) JSON decode errors<\/strong>\n&#8211; Cause: producer sent non-JSON or binary.\n&#8211; Fix: standardize encoding and include schema\/version fields.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Cleanup<\/h3>\n\n\n\n<p>Delete the stream to avoid ongoing charges:<\/p>\n\n\n\n<pre><code class=\"language-bash\">aws kinesis delete-stream --stream-name \"$STREAM_NAME\"\n<\/code><\/pre>\n\n\n\n<p>Verify it no longer appears:<\/p>\n\n\n\n<pre><code class=\"language-bash\">aws kinesis list-streams --query \"StreamNames[?@=='$STREAM_NAME']\"\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong> The stream is deleted (may take a short time).<\/p>\n\n\n\n<p>Also remove local files (optional):<\/p>\n\n\n\n<pre><code class=\"language-bash\">rm -f producer.py consumer.py\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">11. 
Best Practices<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Start with clear consumer requirements<\/strong>: latency, number of consumers, replay needs, retention, ordering boundaries.<\/li>\n<li><strong>Design partition keys intentionally<\/strong>:<\/li>\n<li>Use keys that distribute load (avoid a single \u201chot\u201d key).<\/li>\n<li>Keep ordering requirements realistic (ordering is per shard, not global).<\/li>\n<li><strong>Separate raw and derived streams<\/strong>:<\/li>\n<li>Keep a \u201craw events\u201d stream.<\/li>\n<li>Optionally produce derived\/filtered streams for specific use cases to reduce fan-out costs.<\/li>\n<li><strong>Use S3 as the long-term source of truth<\/strong>:<\/li>\n<li>Kinesis is for streaming + short\/medium retention. For long-term retention, write to S3.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">IAM\/security best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>least privilege<\/strong> for producers vs consumers.<\/li>\n<li>Prefer <strong>IAM roles<\/strong> (for EC2\/ECS\/EKS\/Lambda) over long-lived access keys.<\/li>\n<li>If using customer-managed KMS keys:<\/li>\n<li>Align <strong>KMS key policy<\/strong> and IAM policies.<\/li>\n<li>Restrict key usage to required principals and contexts.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cost best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Choose <strong>on-demand<\/strong> for unknown\/spiky workloads; <strong>provisioned<\/strong> for predictable steady throughput.<\/li>\n<li>Right-size retention\u2014don\u2019t use extended retention as a substitute for archival.<\/li>\n<li>Use aggregation where appropriate (e.g., KPL) to reduce request overhead.<\/li>\n<li>Avoid unnecessary EFO consumers; use shared throughput if acceptable.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Performance best practices<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>Use <code>PutRecords<\/code> for batching where possible to reduce overhead.<\/li>\n<li>Keep record sizes efficient; compress payloads where it makes sense (but balance CPU cost).<\/li>\n<li>Monitor and fix <strong>hot shards<\/strong> by improving key distribution (prefix randomization, composite keys, or shard-mapping strategies).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Reliability best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Build consumers to handle:<\/li>\n<li>Retries and exponential backoff<\/li>\n<li>Partial batch failures<\/li>\n<li>Duplicates (idempotency)<\/li>\n<li>Resharding events (KCL handles many cases)<\/li>\n<li>Track consumer lag using <code>IteratorAgeMilliseconds<\/code>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operations best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Tag streams consistently: <code>App<\/code>, <code>Env<\/code>, <code>Team<\/code>, <code>CostCenter<\/code>, <code>DataClass<\/code>.<\/li>\n<li>Create CloudWatch alarms for:<\/li>\n<li><code>WriteProvisionedThroughputExceeded<\/code> \/ <code>ReadProvisionedThroughputExceeded<\/code> (provisioned)<\/li>\n<li><code>IteratorAgeMilliseconds<\/code> (consumer lag)<\/li>\n<li>Sudden drop in <code>IncomingRecords<\/code> (producer failure)<\/li>\n<li>Use runbooks:<\/li>\n<li>What to do when consumers fall behind<\/li>\n<li>How to scale (provisioned)<\/li>\n<li>How to rotate keys\/permissions safely<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Governance\/tagging\/naming best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Naming convention example:<\/li>\n<li><code>org-env-domain-purpose-stream<\/code> (e.g., <code>acme-prod-orders-events-stream<\/code>)<\/li>\n<li>Tagging convention:<\/li>\n<li><code>Environment=prod<\/code>, <code>Owner=data-platform<\/code>, <code>PII=false<\/code>, <code>RetentionClass=24h<\/code><\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 
class=\"wp-block-heading\">12. Security Considerations<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Identity and access model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>IAM policies<\/strong> control Kinesis API actions (create, put, get, describe, list, delete).<\/li>\n<li>Producers and consumers should use separate roles.<\/li>\n<li>For cross-account patterns, typically use <strong>assume-role<\/strong> with explicit trust and permissions; verify whether <strong>resource-based policies<\/strong> are supported\/appropriate for your use case in the latest Kinesis Data Streams documentation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Encryption<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>In transit:<\/strong> Use TLS (default for AWS SDK\/CLI endpoints).<\/li>\n<li><strong>At rest:<\/strong> Enable <strong>server-side encryption (SSE)<\/strong> using AWS KMS.<\/li>\n<li>AWS-managed key is simplest.<\/li>\n<li>Customer-managed key provides stronger control and audit boundaries.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network exposure<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>By default, Kinesis uses AWS public endpoints.<\/li>\n<li>For private connectivity from VPC workloads, use <strong>VPC interface endpoints<\/strong> (PrivateLink) where available. 
Combine with:<\/li>\n<li>Security groups (endpoint ENIs)<\/li>\n<li>VPC endpoint policies (when supported) to restrict allowed actions<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secrets handling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Avoid static credentials in code.<\/li>\n<li>Use IAM roles (instance profiles, task roles, IRSA for EKS).<\/li>\n<li>If you must use credentials (not recommended), store in AWS Secrets Manager and rotate.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Audit\/logging<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>AWS CloudTrail<\/strong> to audit management and data-plane API usage (confirm which events are logged and how in CloudTrail docs).<\/li>\n<li>Log consumer processing outcomes (success\/failure counts, poison-pill events).<\/li>\n<li>Consider a structured logging approach with correlation IDs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compliance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Kinesis is often part of pipelines handling regulated data. 
Controls commonly required:<\/li>\n<li>Encryption at rest and in transit<\/li>\n<li>IAM least privilege and separation of duties<\/li>\n<li>Data classification and tagging<\/li>\n<li>Retention controls (avoid retaining sensitive data longer than necessary)<\/li>\n<li>Centralized audit logs (CloudTrail + SIEM)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Common security mistakes<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Over-broad IAM permissions like <code>kinesis:*<\/code> on <code>*<\/code>.<\/li>\n<li>Not restricting who can <strong>read<\/strong> streams (data exfiltration risk).<\/li>\n<li>Misconfigured KMS key policies causing outages during consumer deployment.<\/li>\n<li>Sending sensitive data unencrypted at the application layer when required by policy (Kinesis encrypts at rest, but you may need field-level encryption\/tokenization).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secure deployment recommendations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enforce encryption at rest via policy and guardrails (e.g., SCPs where appropriate).<\/li>\n<li>Use dedicated streams per environment and data sensitivity tier.<\/li>\n<li>Use VPC endpoints and restrict egress for private workloads.<\/li>\n<li>Implement idempotent consumers and dead-letter handling downstream (even though Kinesis itself is not a queue with DLQ semantics).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">13. Limitations and Gotchas<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Known limitations \/ behavioral gotchas<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Ordering is per shard<\/strong>, not global across the stream.<\/li>\n<li><strong>At-least-once delivery<\/strong>: duplicates can occur; consumers must be idempotent.<\/li>\n<li><strong>Record size limit<\/strong>: a single record is limited (commonly 1 MB). 
Large payloads should be stored in S3 with pointers in the stream.<\/li>\n<li><strong>Retention is time-based<\/strong>: once expired, data cannot be replayed from the stream.<\/li>\n<li><strong>Hot shards<\/strong>: poor partition key design can throttle a single shard while others are idle.<\/li>\n<li><strong>Resharding complexity (provisioned)<\/strong>: shard IDs change; consumers must handle shard closure and discovery. KCL helps.<\/li>\n<li><strong>Multiple consumers cost\/throughput<\/strong>: shared throughput consumers contend; EFO improves throughput isolation but adds cost.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas to watch (verify in Service Quotas)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Max streams per region<\/li>\n<li>Shards per stream (provisioned)<\/li>\n<li>EFO consumer registrations per stream<\/li>\n<li>API rate limits<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Regional constraints<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Some features (or VPC endpoint availability) can vary by region. 
Verify for your chosen region.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing surprises<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Extended retention for long durations can become expensive compared to archiving in S3.<\/li>\n<li>Enhanced fan-out charges scale with consumers \u00d7 shards \u00d7 hours.<\/li>\n<li>Downstream services (OpenSearch, Lambda, data transfer) often exceed the Kinesis line item.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compatibility issues<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>KCL requires DynamoDB for checkpointing; locking down DynamoDB can break consumers.<\/li>\n<li>Some serialization formats require careful schema evolution practices (e.g., Avro\/Protobuf); Kinesis doesn\u2019t enforce schema.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Migration challenges<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Migrating from Kafka to Kinesis requires rethinking partitioning, offset management, and consumer group semantics.<\/li>\n<li>Re-partitioning strategy changes can break ordering assumptions.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Vendor-specific nuances<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Kinesis APIs and semantics are AWS-specific (though conceptually similar to a distributed log).<\/li>\n<li>Some consumer approaches (EFO vs polling) require different operational tuning.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">14. Comparison with Alternatives<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">How Amazon Kinesis Data Streams compares<\/h3>\n\n\n\n<p>Kinesis Data Streams is best understood as a managed, durable streaming log with ordering-per-shard and retention-based replay. 
Here\u2019s how it stacks up.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Option<\/th>\n<th>Best For<\/th>\n<th>Strengths<\/th>\n<th>Weaknesses<\/th>\n<th>When to Choose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Amazon Kinesis Data Streams<\/strong><\/td>\n<td>Durable ingestion + replay + multiple consumers in AWS<\/td>\n<td>Tight AWS integration, on-demand or provisioned capacity, EFO option, strong metrics<\/td>\n<td>Ordering only per shard, duplicates possible, retention cost, AWS-specific semantics<\/td>\n<td>You need a managed stream with replay and multi-consumer processing in AWS<\/td>\n<\/tr>\n<tr>\n<td><strong>Amazon Data Firehose<\/strong><\/td>\n<td>Managed delivery to S3\/Redshift\/OpenSearch\/Splunk<\/td>\n<td>Minimal ops, buffering\/retry, transforms, delivery destinations<\/td>\n<td>Not designed for custom multi-consumer replay like KDS<\/td>\n<td>You mainly need delivery to storage\/analytics destinations without custom consumers<\/td>\n<\/tr>\n<tr>\n<td><strong>Amazon MSK (Managed Kafka)<\/strong><\/td>\n<td>Kafka ecosystem compatibility<\/td>\n<td>Kafka APIs\/tooling, consumer groups, broad ecosystem<\/td>\n<td>More operational overhead and cost\/complexity than KDS for some teams<\/td>\n<td>You need Kafka semantics\/tools or multi-cloud portability of Kafka clients<\/td>\n<\/tr>\n<tr>\n<td><strong>Amazon SQS<\/strong><\/td>\n<td>Work queues with acknowledgements<\/td>\n<td>Simple, robust queueing, DLQs, per-message visibility<\/td>\n<td>Not an ordered log with replay; fan-out needs SNS<\/td>\n<td>You need task distribution and per-message processing with ack\/DLQ<\/td>\n<\/tr>\n<tr>\n<td><strong>Amazon EventBridge<\/strong><\/td>\n<td>Event routing\/integration<\/td>\n<td>Routing rules, SaaS integrations, event buses<\/td>\n<td>Not a durable replay log; throughput\/latency profile differs<\/td>\n<td>You need routing to many targets and integration patterns<\/td>\n<\/tr>\n<tr>\n<td><strong>Azure Event 
Hubs<\/strong><\/td>\n<td>Azure-native streaming ingestion<\/td>\n<td>Similar log-style ingestion, partitions, replay<\/td>\n<td>Different cloud ecosystem<\/td>\n<td>You\u2019re primarily on Azure<\/td>\n<\/tr>\n<tr>\n<td><strong>Google Cloud Pub\/Sub<\/strong><\/td>\n<td>GCP event ingestion<\/td>\n<td>Simple pub\/sub, global scale<\/td>\n<td>Semantics differ from durable shard log; replay model differs<\/td>\n<td>You\u2019re primarily on GCP<\/td>\n<\/tr>\n<tr>\n<td><strong>Apache Kafka (self-managed)<\/strong><\/td>\n<td>Full control, on-prem, custom needs<\/td>\n<td>Maximum control, ecosystem<\/td>\n<td>High ops burden, scaling, upgrades, security<\/td>\n<td>You require self-managed control or on-prem constraints<\/td>\n<\/tr>\n<tr>\n<td><strong>Apache Pulsar \/ Redpanda (self-managed\/managed)<\/strong><\/td>\n<td>Alternative streaming platforms<\/td>\n<td>Performance\/feature advantages depending on product<\/td>\n<td>Operational and ecosystem tradeoffs<\/td>\n<td>You need a non-AWS-native streaming platform<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">15. Real-World Example<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Enterprise example: Global e-commerce event backbone<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> A global retailer needs near-real-time insight into user behavior and operational KPIs, while also archiving raw events for compliance and ML training. 
Multiple teams want to consume events independently (fraud, personalization, analytics, data lake ingestion).<\/li>\n<li><strong>Proposed architecture:<\/strong><\/li>\n<li>Producers (web\/mobile\/backend) publish to Amazon Kinesis Data Streams with partition key = <code>customerId<\/code> or <code>sessionId<\/code> (depending on ordering needs).<\/li>\n<li>A Flink application (Amazon Kinesis Data Analytics for Apache Flink) performs stateful enrichment and aggregation.<\/li>\n<li>Amazon Data Firehose delivers raw events to Amazon S3 (data lake) with prefix partitioning by date\/hour.<\/li>\n<li>A dedicated consumer indexes selected events into Amazon OpenSearch Service for near-real-time search\/analytics.<\/li>\n<li>CloudWatch alarms track iterator age and throttling; CloudTrail audits access.<\/li>\n<li><strong>Why Kinesis Data Streams was chosen:<\/strong><\/li>\n<li>Durable, replayable ingestion; multiple consumers; tight AWS Analytics integration; strong operational metrics.<\/li>\n<li><strong>Expected outcomes:<\/strong><\/li>\n<li>Seconds-level latency for dashboards and fraud signals<\/li>\n<li>Reliable backfills by replaying the retention window<\/li>\n<li>Reduced operational overhead compared to self-managed streaming clusters<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Startup\/small-team example: IoT telemetry + alerting<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> A startup collects telemetry from thousands of devices. 
Traffic is bursty and unpredictable, and they need to alert on anomalies quickly without running complex infrastructure.<\/li>\n<li><strong>Proposed architecture:<\/strong><\/li>\n<li>Devices publish telemetry to an API service that batches and writes to a Kinesis data stream (on-demand capacity mode).<\/li>\n<li>AWS Lambda consumes the stream, performs simple threshold checks, and sends alerts (SNS\/email\/Slack integration via webhook).<\/li>\n<li>Raw telemetry is periodically copied to S3 for later analysis.<\/li>\n<li><strong>Why Kinesis Data Streams was chosen:<\/strong><\/li>\n<li>On-demand scaling reduces capacity planning; replay supports debugging; managed service fits a small team.<\/li>\n<li><strong>Expected outcomes:<\/strong><\/li>\n<li>Near-real-time alerts<\/li>\n<li>Simple operations and clear cost model tied to usage<\/li>\n<li>Ability to add new consumers later (e.g., ML scoring) without changing producers<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">16. FAQ<\/h2>\n\n\n\n<p>1) <strong>Is Amazon Kinesis Data Streams a message queue?<\/strong><br\/>\nNot exactly. It\u2019s a durable streaming log with retention and replay. Queues (like SQS) focus on message acknowledgment and work distribution; Kinesis focuses on streaming ingestion, ordered shards, and multi-consumer replay.<\/p>\n\n\n\n<p>2) <strong>How is ordering guaranteed?<\/strong><br\/>\nOrdering is guaranteed <strong>within a shard<\/strong>. Records with the same partition key always hash to the same shard (until a reshard changes the shard\u2019s hash-key range), which preserves per-key ordering. There is no global ordering across all shards.<\/p>\n\n\n\n<p>3) <strong>Can multiple applications read the same stream?<\/strong><br\/>\nYes. Multiple consumers can read simultaneously. With shared throughput polling, they share shard read throughput. 
With enhanced fan-out, each registered consumer gets dedicated throughput per shard (with separate pricing).<\/p>\n\n\n\n<p>4) <strong>What\u2019s the difference between on-demand and provisioned mode?<\/strong><br\/>\nProvisioned mode requires you to choose shard count (capacity). On-demand mode lets AWS manage scaling. Pricing dimensions differ\u2014verify current details on the official pricing page.<\/p>\n\n\n\n<p>5) <strong>How long is data retained?<\/strong><br\/>\nThe default is 24 hours. Extended retention (up to 365 days) is available at additional cost. Always confirm current limits and pricing for your region.<\/p>\n\n\n\n<p>6) <strong>Is Kinesis Data Streams serverless?<\/strong><br\/>\nIt\u2019s a managed service. You don\u2019t manage servers, but you do manage stream configuration (or choose on-demand) and your producers and consumers.<\/p>\n\n\n\n<p>7) <strong>Does it support exactly-once delivery?<\/strong><br\/>\nKinesis provides at-least-once delivery. Consumers can see duplicates, so design idempotency\/deduplication downstream.<\/p>\n\n\n\n<p>8) <strong>What happens if my consumer falls behind?<\/strong><br\/>\nIt can catch up by reading from earlier positions as long as records are still within the retention window. Monitor <code>IteratorAgeMilliseconds<\/code> to detect lag.<\/p>\n\n\n\n<p>9) <strong>How do I scale Kinesis Data Streams?<\/strong><br\/>\nIn provisioned mode, you scale by increasing shard count (resharding). In on-demand mode, scaling is handled by AWS, within service quotas.<\/p>\n\n\n\n<p>10) <strong>How do I avoid hot shards?<\/strong><br\/>\nUse partition keys with good cardinality and distribution. Avoid a single constant key. Consider composite keys (e.g., <code>customerId#randomBucket<\/code>) if strict per-customer ordering isn\u2019t required.<\/p>\n\n\n\n<p>11) <strong>Can I write compressed data?<\/strong><br\/>\nYes\u2014records are opaque blobs. 
Many teams compress JSON (gzip) to reduce bytes, but this increases CPU cost and complicates debugging; test carefully.<\/p>\n\n\n\n<p>12) <strong>How do I send very large events?<\/strong><br\/>\nA single record\u2019s data blob is limited to 1 MB, so store larger payloads in S3 and put a pointer (S3 bucket\/key, version, checksum) into the stream.<\/p>\n\n\n\n<p>13) <strong>Does AWS Lambda support Kinesis Data Streams as an event source?<\/strong><br\/>\nYes. Lambda can poll Kinesis and invoke your function with batches. You must design for retries, duplicates, and partial failures.<\/p>\n\n\n\n<p>14) <strong>How do consumers coordinate shard processing?<\/strong><br\/>\nIf you use KCL, it uses DynamoDB for lease coordination and checkpoints. Without KCL, you must implement shard discovery and checkpointing yourself.<\/p>\n\n\n\n<p>15) <strong>Can I replicate a stream across regions?<\/strong><br\/>\nNot automatically as a built-in feature. Common patterns include consumer-based replication (read in one region, write to another) or writing to S3 and using cross-region replication. Verify best practices for your latency\/compliance goals.<\/p>\n\n\n\n<p>16) <strong>Is Kinesis Data Streams part of AWS Analytics?<\/strong><br\/>\nYes. It\u2019s a foundational AWS Analytics ingestion service used to build streaming analytics pipelines.<\/p>\n\n\n\n<p>17) <strong>Should I choose Amazon MSK instead?<\/strong><br\/>\nChoose MSK when you need Kafka compatibility and Kafka ecosystem tools. Choose Kinesis Data Streams when you prefer a native AWS managed stream with simple integration and don\u2019t require Kafka APIs.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">17. 
Top Online Resources to Learn Amazon Kinesis Data Streams<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Resource Type<\/th>\n<th>Name<\/th>\n<th>Why It Is Useful<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Official Documentation<\/td>\n<td>Amazon Kinesis Data Streams Developer Guide: https:\/\/docs.aws.amazon.com\/streams\/latest\/dev\/introduction.html<\/td>\n<td>Canonical reference for concepts (shards, retention, iterators), APIs, limits, and patterns<\/td>\n<\/tr>\n<tr>\n<td>Official API Reference<\/td>\n<td>Kinesis API Reference: https:\/\/docs.aws.amazon.com\/kinesis\/latest\/APIReference\/Welcome.html<\/td>\n<td>Exact API behavior, request\/response fields, error codes<\/td>\n<\/tr>\n<tr>\n<td>Official Pricing<\/td>\n<td>Pricing page: https:\/\/aws.amazon.com\/kinesis\/data-streams\/pricing\/<\/td>\n<td>Current pricing dimensions (mode-specific) and region-specific rates<\/td>\n<\/tr>\n<tr>\n<td>Pricing Tool<\/td>\n<td>AWS Pricing Calculator: https:\/\/calculator.aws\/#\/<\/td>\n<td>Build estimates including downstream services like Lambda, S3, OpenSearch<\/td>\n<\/tr>\n<tr>\n<td>Monitoring Docs<\/td>\n<td>CloudWatch metrics for Kinesis (navigate from docs): https:\/\/docs.aws.amazon.com\/streams\/latest\/dev\/monitoring.html<\/td>\n<td>Metric definitions and monitoring recommendations (iterator age, throttling)<\/td>\n<\/tr>\n<tr>\n<td>Security Docs<\/td>\n<td>Security in Kinesis Data Streams: https:\/\/docs.aws.amazon.com\/streams\/latest\/dev\/security.html<\/td>\n<td>IAM, encryption, VPC endpoints, compliance considerations<\/td>\n<\/tr>\n<tr>\n<td>Architecture Center<\/td>\n<td>AWS Architecture Center: https:\/\/aws.amazon.com\/architecture\/<\/td>\n<td>Patterns and reference architectures for streaming and Analytics pipelines<\/td>\n<\/tr>\n<tr>\n<td>Learning Path<\/td>\n<td>AWS Streaming Data Solutions (AWS docs\/architecture content\u2014search within AWS): https:\/\/aws.amazon.com\/streaming-data\/<\/td>\n<td>Service 
selection guidance across Kinesis, MSK, Firehose<\/td>\n<\/tr>\n<tr>\n<td>SDK Documentation<\/td>\n<td>Boto3 Kinesis Client: https:\/\/boto3.amazonaws.com\/v1\/documentation\/api\/latest\/reference\/services\/kinesis.html<\/td>\n<td>Practical API usage examples for Python<\/td>\n<\/tr>\n<tr>\n<td>Official Samples (AWS)<\/td>\n<td>AWS Samples on GitHub: https:\/\/github.com\/aws-samples<\/td>\n<td>Find maintained examples for Kinesis consumers\/producers (verify repo relevance\/updates)<\/td>\n<\/tr>\n<tr>\n<td>Workshops<\/td>\n<td>AWS Workshops: https:\/\/workshops.aws\/<\/td>\n<td>Hands-on labs; search for Kinesis\/streaming workshops<\/td>\n<\/tr>\n<tr>\n<td>Video (Official)<\/td>\n<td>AWS YouTube Channel: https:\/\/www.youtube.com\/@amazonwebservices<\/td>\n<td>Recorded sessions on Kinesis patterns, streaming Analytics, best practices<\/td>\n<\/tr>\n<tr>\n<td>Service Overview<\/td>\n<td>Product page: https:\/\/aws.amazon.com\/kinesis\/data-streams\/<\/td>\n<td>Feature overview and links to docs\/announcements<\/td>\n<\/tr>\n<tr>\n<td>CLI Reference<\/td>\n<td>AWS CLI Command Reference (kinesis): https:\/\/docs.aws.amazon.com\/cli\/latest\/reference\/kinesis\/<\/td>\n<td>Exact CLI syntax for create\/put\/get\/list operations<\/td>\n<\/tr>\n<tr>\n<td>Reputable Community<\/td>\n<td>AWS re:Post (Kinesis topics): https:\/\/repost.aws\/tags\/TAo6LZxYQjQ9y3Kp1mQmQWZg\/amazon-kinesis<\/td>\n<td>Practical troubleshooting patterns from AWS community (validate answers)<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">18. 
Training and Certification Providers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Institute<\/th>\n<th>Suitable Audience<\/th>\n<th>Likely Learning Focus<\/th>\n<th>Mode<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps engineers, cloud engineers, architects, developers<\/td>\n<td>AWS + DevOps + cloud-native tooling; may include streaming\/Analytics modules<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>ScmGalaxy.com<\/td>\n<td>Beginners to intermediate practitioners<\/td>\n<td>Software configuration management, DevOps, CI\/CD; may touch AWS operations<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.scmgalaxy.com\/<\/td>\n<\/tr>\n<tr>\n<td>CloudOpsNow.in<\/td>\n<td>Cloud operations and platform teams<\/td>\n<td>CloudOps practices, operations, monitoring, cost awareness<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.cloudopsnow.in\/<\/td>\n<\/tr>\n<tr>\n<td>SreSchool.com<\/td>\n<td>SREs, reliability engineers, platform engineers<\/td>\n<td>Reliability engineering, observability, incident response, production readiness<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.sreschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>AiOpsSchool.com<\/td>\n<td>Ops teams adopting AIOps<\/td>\n<td>AIOps concepts, monitoring analytics, automation<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.aiopsschool.com\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">19. 
Top Trainers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Platform\/Site<\/th>\n<th>Likely Specialization<\/th>\n<th>Suitable Audience<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>RajeshKumar.xyz<\/td>\n<td>DevOps\/cloud training content (verify specific offerings)<\/td>\n<td>Learners seeking guided training resources<\/td>\n<td>https:\/\/rajeshkumar.xyz\/<\/td>\n<\/tr>\n<tr>\n<td>devopstrainer.in<\/td>\n<td>DevOps and cloud training<\/td>\n<td>Individuals\/teams looking for practical DevOps upskilling<\/td>\n<td>https:\/\/www.devopstrainer.in\/<\/td>\n<\/tr>\n<tr>\n<td>devopsfreelancer.com<\/td>\n<td>Freelance DevOps consulting\/training style resources<\/td>\n<td>Teams needing targeted help or mentoring<\/td>\n<td>https:\/\/www.devopsfreelancer.com\/<\/td>\n<\/tr>\n<tr>\n<td>devopssupport.in<\/td>\n<td>DevOps support\/training resources<\/td>\n<td>Practitioners needing operational guidance<\/td>\n<td>https:\/\/www.devopssupport.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">20. 
Top Consulting Companies<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Company<\/th>\n<th>Likely Service Area<\/th>\n<th>Where They May Help<\/th>\n<th>Consulting Use Case Examples<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>cotocus.com<\/td>\n<td>Cloud\/DevOps consulting (verify exact service catalog)<\/td>\n<td>Architecture reviews, cloud migrations, platform modernization<\/td>\n<td>Design a streaming ingestion pipeline; implement monitoring and cost controls for Kinesis-based Analytics<\/td>\n<td>https:\/\/cotocus.com\/<\/td>\n<\/tr>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps and cloud consulting\/training services<\/td>\n<td>Skills enablement + implementation support<\/td>\n<td>Build producer\/consumer patterns; set up CI\/CD for stream processors; define runbooks and alarms<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>DEVOPSCONSULTING.IN<\/td>\n<td>DevOps consulting<\/td>\n<td>DevOps process\/tooling and cloud operations<\/td>\n<td>Implement secure IAM patterns for stream access; set up observability for streaming workloads<\/td>\n<td>https:\/\/www.devopsconsulting.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">21. 
Career and Learning Roadmap<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn before Amazon Kinesis Data Streams<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Core AWS fundamentals: IAM, VPC basics, CloudWatch, KMS<\/li>\n<li>Basic distributed systems concepts: throughput, latency, backpressure<\/li>\n<li>Data formats and serialization: JSON, Avro\/Protobuf basics (optional but helpful)<\/li>\n<li>A programming SDK: Python (boto3), Java, or Node.js<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn after<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Consumer frameworks<\/strong>: Kinesis Client Library (KCL), enhanced fan-out patterns<\/li>\n<li><strong>Stream processing<\/strong>: Amazon Kinesis Data Analytics for Apache Flink (stateful processing, windows, checkpoints)<\/li>\n<li><strong>Delivery pipelines<\/strong>: Amazon Data Firehose to S3\/Redshift\/OpenSearch<\/li>\n<li><strong>Data lake governance<\/strong>: AWS Glue Data Catalog, Lake Formation, partitioning strategies<\/li>\n<li><strong>Observability<\/strong>: end-to-end tracing\/logging\/metrics for streaming systems<\/li>\n<li><strong>Security<\/strong>: cross-account patterns, SCPs, KMS key management, data classification<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Job roles that use it<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data Engineer (streaming)<\/li>\n<li>Cloud Engineer \/ Platform Engineer<\/li>\n<li>DevOps \/ SRE (observability pipelines)<\/li>\n<li>Solutions Architect<\/li>\n<li>Backend Engineer (event-driven systems)<\/li>\n<li>Security Engineer (security event pipelines)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certification path (AWS)<\/h3>\n\n\n\n<p>There is no \u201cKinesis-only\u201d certification, but it appears across AWS exams and real-world roles. 
Relevant AWS certifications to consider:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>AWS Certified Solutions Architect \u2013 Associate\/Professional<\/li>\n<li>AWS Certified Developer \u2013 Associate<\/li>\n<li>AWS Certified Data Engineer \u2013 Associate (verify the current AWS certification catalog for your region)<\/li>\n<li>AWS Certified DevOps Engineer \u2013 Professional<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Project ideas for practice<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Build a clickstream pipeline: producer \u2192 Kinesis \u2192 Lambda \u2192 DynamoDB \u2192 dashboard<\/li>\n<li>Implement a KCL consumer with DynamoDB checkpoints and scale-out workers<\/li>\n<li>Compare shared-throughput vs. enhanced fan-out consumer latency<\/li>\n<li>Implement schema versioning with a <code>schemaVersion<\/code> field and backward-compatible consumer parsing<\/li>\n<li>Archive raw stream events to S3 and query with Athena (downstream)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">22. 
Glossary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Analytics (AWS category):<\/strong> Services and patterns that help ingest, process, store, and analyze data for insights.<\/li>\n<li><strong>Amazon Kinesis Data Streams:<\/strong> AWS managed service for ingesting and storing streaming data with ordered shards and replay.<\/li>\n<li><strong>Stream:<\/strong> A named Kinesis resource that stores a sequence of records.<\/li>\n<li><strong>Shard:<\/strong> A capacity and ordering unit inside a stream.<\/li>\n<li><strong>Record:<\/strong> A data blob plus metadata (partition key, sequence number, timestamps).<\/li>\n<li><strong>Partition key:<\/strong> A string used to route records to shards; determines ordering boundaries and affects load distribution.<\/li>\n<li><strong>Sequence number:<\/strong> A unique identifier assigned to each record within a shard.<\/li>\n<li><strong>Retention period:<\/strong> Time window during which records remain available to read.<\/li>\n<li><strong>Consumer:<\/strong> Application that reads records from the stream and processes them.<\/li>\n<li><strong>Producer:<\/strong> Application that writes records into the stream.<\/li>\n<li><strong>Shard iterator:<\/strong> A pointer used by consumers to read records from a shard.<\/li>\n<li><strong>TRIM_HORIZON:<\/strong> Iterator type to read from the oldest available record in the retention window.<\/li>\n<li><strong>LATEST:<\/strong> Iterator type to read only new records arriving after the iterator is created.<\/li>\n<li><strong>Enhanced fan-out (EFO):<\/strong> Consumer mode providing dedicated throughput per consumer per shard using subscribe-style APIs.<\/li>\n<li><strong>KCL (Kinesis Client Library):<\/strong> Library that simplifies building scalable consumers (shard discovery, leases, checkpoints).<\/li>\n<li><strong>Checkpointing:<\/strong> Storing progress (last processed sequence number) so consumers can resume after restart.<\/li>\n<li><strong>Hot shard:<\/strong> A 
shard receiving disproportionate traffic due to uneven partition key distribution.<\/li>\n<li><strong>At-least-once delivery:<\/strong> Delivery guarantee where messages may be delivered more than once; consumers must handle duplicates.<\/li>\n<li><strong>SSE-KMS:<\/strong> Server-side encryption at rest using AWS Key Management Service.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">23. Summary<\/h2>\n\n\n\n<p>Amazon Kinesis Data Streams is an AWS Analytics service for ingesting, storing, and replaying streaming data with shard-based ordering and scalable throughput. It fits best as the durable ingestion backbone for real-time pipelines feeding Lambda, Flink streaming analytics, Firehose delivery, and downstream stores like S3 and OpenSearch.<\/p>\n\n\n\n<p>Key points to keep in mind:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cost<\/strong> is driven by capacity mode (on-demand vs provisioned), data volume, retention, and consumer fan-out (especially enhanced fan-out).<\/li>\n<li><strong>Security<\/strong> depends on least-privilege IAM, SSE-KMS encryption, audit via CloudTrail, and (where needed) private connectivity using VPC endpoints.<\/li>\n<li><strong>Correct partition key design<\/strong> is critical to avoid hot shards and throttling.<\/li>\n<li>Use it when you need <strong>replayable streaming ingestion<\/strong> and multiple consumers; consider SQS\/EventBridge\/Firehose\/MSK when your primary needs are queuing, routing, delivery-only, or Kafka compatibility.<\/li>\n<\/ul>\n\n\n\n<p>Next step: extend the lab by implementing a KCL-based consumer with DynamoDB checkpointing and adding CloudWatch alarms on iterator age to make your pipeline 
production-ready.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Analytics<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21,20],"tags":[],"class_list":["post-129","post","type-post","status-publish","format-standard","hentry","category-analytics","category-aws"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/129","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/comments?post=129"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/129\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/media?parent=129"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/categories?post=129"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/tags?post=129"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}