{"id":52306,"date":"2025-09-03T02:40:27","date_gmt":"2025-09-03T02:40:27","guid":{"rendered":"https:\/\/www.devopsschool.com\/blog\/?p=52306"},"modified":"2026-02-21T08:15:11","modified_gmt":"2026-02-21T08:15:11","slug":"kafka-consumers-partition-a-complete-guide","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/blog\/kafka-consumers-partition-a-complete-guide\/","title":{"rendered":"Kafka: Consumers &amp; Partition: A Complete Guide"},"content":{"rendered":"\n<p>Here\u2019s your end-to-end, no-gaps guide. I\u2019ll keep the vocabulary crisp and the mental model consistent.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The layers <\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Topic<\/strong>: named stream of records.<\/li>\n\n\n\n<li><strong>Partition<\/strong>: an ordered shard of a topic. Ordering is guaranteed <em>within<\/em> a partition.<\/li>\n\n\n\n<li><strong>Consumer group<\/strong>: a logical application reading a topic. Within the group, each record is delivered to exactly one consumer.<\/li>\n\n\n\n<li><strong>Consumer process (a.k.a. worker \/ pod \/ container \/ JVM)<\/strong>: a running OS process that joins a consumer group.<\/li>\n\n\n\n<li><strong>KafkaConsumer instance<\/strong>: the <strong>client object<\/strong> in your code (e.g., <code>new KafkaConsumer&lt;&gt;(props)<\/code> in Java) that talks to brokers, polls, and tracks offsets. Kafka assigns partitions to <strong>instances<\/strong>.<\/li>\n\n\n\n<li><strong>Thread<\/strong>: execution lane inside a process. You may run zero, one, or many KafkaConsumer instances per process\u2014typically one per thread.<\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Rule of exclusivity:<\/strong> At any moment, <strong>one partition is owned by exactly one KafkaConsumer instance in a group<\/strong>. 
No two instances in the same group read the same partition concurrently.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Architecture: Topic \u2192 Partitions \u2192 Consumer Group \u2192 Processes \u2192 Threads \u2192 KafkaConsumer instances<\/h2>\n\n\n\n<div class=\"wp-block-merpress-mermaidjs diagram-source-mermaid\"><pre class=\"mermaid\">flowchart TB\n  %% ===== Topic &amp; Partitions =====\n  subgraph T[\"Topic: orders (6 partitions)\"]\n    direction LR\n    P0([\"Partition P0\\n(offsets: 0..\u221e)\"])\n    P1([\"Partition P1\\n(offsets: 0..\u221e)\"])\n    P2([\"Partition P2\\n(offsets: 0..\u221e)\"])\n    P3([\"Partition P3\\n(offsets: 0..\u221e)\"])\n    P4([\"Partition P4\\n(offsets: 0..\u221e)\"])\n    P5([\"Partition P5\\n(offsets: 0..\u221e)\"])\n  end\n\n  %% ===== Consumer Group =====\n  subgraph G[\"Consumer Group: orders-service\"]\n    direction TB\n\n    %% ---- Process A (Worker\/Pod\/Container) ----\n    subgraph A[\"Consumer Process A (Worker A \/ Pod)\"]\n      direction TB\n      ATh1[\"Thread 1\\nKafkaConsumer #1\"]\n      ATh2[\"Thread 2\\nKafkaConsumer #2\"]\n    end\n\n    %% ---- Process B (Worker\/Pod\/Container) ----\n    subgraph B[\"Consumer Process B (Worker B \/ Pod)\"]\n      direction TB\n      BTh1[\"Thread 1\\nKafkaConsumer #3\"]\n    end\n  end\n\n  %% ===== Active assignment (example) =====\n  P0 -- \"assigned to\" --&gt; ATh1\n  P3 -- \"assigned to\" --&gt; ATh1\n\n  P1 -- \"assigned to\" --&gt; ATh2\n  P4 -- \"assigned to\" --&gt; ATh2\n\n  P2 -- \"assigned to\" --&gt; BTh1\n  P5 -- \"assigned to\" --&gt; BTh1\n\n  %% ===== Notes \/ Rules =====\n  Rules[\"Rules:\n  \u2022 One partition \u2192 at most one active KafkaConsumer instance in the group.\n  \u2022 A KafkaConsumer instance may own 1..N partitions.\n  \u2022 KafkaConsumer is NOT thread-safe \u2192 one thread owns poll\/commit for that instance.\n  \u2022 Ordering is guaranteed per partition (by offset), not across partitions.\"]\n  classDef note fill:#fff,stroke:#bbb,stroke-dasharray: 
3 3,color:#333;\n  Rules:::note\n  G --- Rules\n<\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Partition assignment scenarios (how partitions are spread across instances)<\/h2>\n\n\n\n<div class=\"wp-block-merpress-mermaidjs diagram-source-mermaid\"><pre class=\"mermaid\">flowchart LR\n  %% ===== Scenario A =====\n  subgraph S1[\"Scenario A: 7 partitions (P0..P6), 3 consumer instances (C1..C3)\"]\n    direction TB\n    C1A[\"C1 \u279c P0, P3, P6\"]\n    C2A[\"C2 \u279c P1, P4\"]\n    C3A[\"C3 \u279c P2, P5\"]\n  end\n\n  %% ===== Scenario B =====\n  subgraph S2[\"Scenario B: Add C4 (cooperative-sticky rebalance)\"]\n    direction TB\n    C1B[\"C1 \u279c P0, P6\"]\n    C2B[\"C2 \u279c P1, P4\"]\n    C3B[\"C3 \u279c P2\"]\n    C4B[\"C4 \u279c P3, P5\"]\n  end\n\n  %% ===== Scenario C =====\n  subgraph S3[\"Scenario C: 4 partitions (P0..P3), 6 consumer instances (C1..C6)\"]\n    direction TB\n    C1C[\"C1 \u279c P0\"]\n    C2C[\"C2 \u279c P1\"]\n    C3C[\"C3 \u279c P2\"]\n    C4C[\"C4 \u279c P3\"]\n    C5C[\"C5 \u279c (idle)\"]\n    C6C[\"C6 \u279c (idle)\"]\n  end\n\n  classDef idle fill:#fafafa,stroke:#ccc,color:#999;\n  C5C:::idle; C6C:::idle\n<\/pre><img decoding=\"async\" src=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2025\/09\/merpress-1.png\" alt=\"\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Best practices &amp; limitations (mind map)<\/h2>\n\n\n\n<div class=\"wp-block-merpress-mermaidjs diagram-source-mermaid\"><pre class=\"mermaid\">mindmap\n  root((Partitioning and assignment))\n    Assignment strategies\n      Range\n        Contiguous ranges per topic\n        Can skew when topics or partition counts differ\n      Round robin\n        Even spread over time\n        Less stable across rebalances\n      Sticky\n        Keeps prior mapping to reduce churn\n      Cooperative sticky\n        Incremental rebalance with fewer pauses\n        Recommended default in modern clients\n    Best practices\n      Plan partition count\n        
Estimate peak throughput in MB per second\n        Target around 1 to 3 MB per second per partition\n        Add 1.5 to 2x headroom\n      Keying\n        Key by entity to preserve per key order\n        Avoid hot keys and ensure good spread\n        Use a custom consistent hash partitioner if needed\n      Stability\n        Use static membership with group.instance.id\n        Use cooperative sticky assignor\n      Ordering safe processing\n        Per partition single thread executors\n        Or per key serial queues on top of a worker pool\n        Track contiguous offsets and commit after processing\n      Backpressure\n        Bounded queues with pause and resume when saturated\n        Keep poll loop fast and do IO on worker threads\n      Consumer safety\n        KafkaConsumer is not thread safe\n        One consumer instance per poll and commit thread\n        Disable auto commit and commit on completion\n    Limitations and trade offs\n      No global order across partitions\n      Max parallelism per group equals number of partitions\n      Hot partitions bottleneck throughput\n      Too many partitions increase metadata and rebalance time\n      Changing partition count remaps hash of key modulo N\n<\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">How messages land in partitions (with\/without keys)<\/h2>\n\n\n\n<div class=\"wp-block-merpress-mermaidjs diagram-source-mermaid\"><pre class=\"mermaid\">flowchart TB\n  subgraph Prod[\"Producer\"]\n    direction TB\n    K1[\"send(key=A, value=m1)\"]\n    K2[\"send(key=B, value=m2)\"]\n    K3[\"send(key=A, value=m3)\"]\n    K4[\"send(key=C, value=m4)\"]\n  end\n\n  subgraph Partitions[\"Topic: orders (3 partitions)\"]\n    direction LR\n    X0[\"P0\\noffsets 0..\u221e\"]\n    X1[\"P1\\noffsets 0..\u221e\"]\n    X2[\"P2\\noffsets 0..\u221e\"]\n  end\n\n  note1[\"Default partitioner:\\npartition = hash(key) % numPartitions\\n(no key \u21d2 sticky batching across partitions)\"]\n  classDef note 
fill:#fff,stroke:#bbb,stroke-dasharray: 3 3,color:#333;\n  note1:::note\n\n  Prod --- note1\n  K1 --&gt; X1\n  K2 --&gt; X0\n  K3 --&gt; X1\n  K4 --&gt; X2\n\n  subgraph Order[\"Per-partition order\"]\n    direction LR\n    X1o[\"P1 offsets:\\n0: m1(A)\\n1: m3(A)\"]\n    X0o[\"P0 offsets:\\n0: m2(B)\"]\n    X2o[\"P2 offsets:\\n0: m4(C)\"]\n  end\n\n  X1 --&gt; X1o\n  X0 --&gt; X0o\n  X2 --&gt; X2o\n<\/pre><\/div>\n\n\n\n<h1 class=\"wp-block-heading\">What is a partition?<\/h1>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Core idea:<\/strong> A partition is <strong>one append-only log<\/strong> that belongs to a topic.<\/li>\n\n\n\n<li><strong>Physical &amp; logical:<\/strong>\n<ul class=\"wp-block-list\">\n<li><em>Logical<\/em> shard for routing and parallelism (clients see \u201cP0, P1, \u2026\u201d).<\/li>\n\n\n\n<li><em>Physical<\/em> files on a broker (leader) with <strong>replicas<\/strong> on other brokers for HA.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Offsets:<\/strong> Every record in a partition gets a strictly increasing <strong>offset<\/strong> (0,1,2,\u2026).<br>Ordering is guaranteed <strong>within that partition<\/strong> (by offset). 
There is <strong>no global order across partitions<\/strong>.<\/li>\n<\/ul>\n\n\n\n<h1 class=\"wp-block-heading\">How messages are placed into partitions<\/h1>\n\n\n\n<p><strong>Producers<\/strong> decide the target partition using a <strong>partitioner<\/strong>:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>If you set a key<\/strong> (<code>producer.send(topic, key, value)<\/code>):<br>\u2192 Kafka uses a hash of the key: <code>partition = hash(key) % numPartitions<\/code>.\n<ul class=\"wp-block-list\">\n<li>All messages with the <strong>same key<\/strong> go to the <strong>same partition<\/strong> \u2192 preserves per-key order.<\/li>\n\n\n\n<li>Watch out for <strong>hot keys<\/strong> (one key getting huge traffic) \u2192 one partition becomes a bottleneck.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>If you don\u2019t set a key<\/strong> (key = null):\n<ul class=\"wp-block-list\">\n<li>The default Java producer uses a <strong>sticky partitioner<\/strong> (since Kafka 2.4): it sends a <em>batch<\/em> to one partition to maximize throughput, then switches.<\/li>\n\n\n\n<li>Older behavior was round-robin. 
Either way, <strong>distribution is roughly even over time<\/strong> but <strong>not ordered across partitions<\/strong>.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>You can also plug a <strong>custom partitioner<\/strong> if you need special routing (e.g., consistent hashing).<\/p>\n<\/blockquote>\n\n\n\n<h1 class=\"wp-block-heading\">Example: 7 messages into a topic with 3 partitions<\/h1>\n\n\n\n<p>Assume topic <code>orders<\/code> has partitions: <strong>P0, P1, P2<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A) No keys (illustrative round-robin for clarity)<\/h2>\n\n\n\n<p>Messages: <code>m1, m2, m3, m4, m5, m6, m7<\/code><\/p>\n\n\n\n<p>A simple spread might look like:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"HTTP\" data-shcb-language-slug=\"http\"><span><code class=\"hljs language-http\"><span class=\"hljs-attribute\">P0<\/span>: m1 (offset 0), m4 (1), m7 (2)\n<span class=\"hljs-attribute\">P1<\/span>: m2 (offset 0), m5 (1)\n<span class=\"hljs-attribute\">P2<\/span>: m3 (offset 0), m6 (1)\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">HTTP<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">http<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Notes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Each partition has its <strong>own offset sequence<\/strong> starting at 0.<\/li>\n\n\n\n<li>Consumers reading P0 will always see m1 \u2192 m4 \u2192 m7.<br>But <strong>there\u2019s no defined order between<\/strong> (say) m2 on P1 and m3 on P2.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">B) With keys (so per-key order holds)<\/h2>\n\n\n\n<p>Suppose keys: <code>A,B,A,C,B,A,G<\/code> for messages <code>m1..m7<\/code>.<br>(Using a 
simple illustration of <code>hash(key)%3<\/code>, say A\u2192P1, B\u2192P0, C\u2192P2, G\u2192P2.)<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"HTTP\" data-shcb-language-slug=\"http\"><span><code class=\"hljs language-http\"><span class=\"hljs-attribute\">P0<\/span>: m2(B, offset 0), m5(B, 1)\n<span class=\"hljs-attribute\">P1<\/span>: m1(A, offset 0), m3(A, 1), m6(A, 2)\n<span class=\"hljs-attribute\">P2<\/span>: m4(C, offset 0), m7(G, 1)\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">HTTP<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">http<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Notes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>All <strong>A<\/strong> events land on <strong>P1<\/strong>, so A\u2019s order is preserved (m1 \u2192 m3 \u2192 m6).<\/li>\n\n\n\n<li>All <strong>B<\/strong> events land on <strong>P0<\/strong>, etc.<\/li>\n\n\n\n<li>If you later <strong>change the partition count<\/strong>, <code>hash(key) % numPartitions<\/code> changes \u2192 <strong>future records for the same key may map to a different partition<\/strong> (ordering across time can \u201cjump\u201d partitions). 
Plan for that (see best practices).<\/li>\n<\/ul>\n\n\n\n<h1 class=\"wp-block-heading\">Why partitions matter (consumers)<\/h1>\n\n\n\n<ul class=\"wp-block-list\">\n<li>In a <strong>consumer group<\/strong>, <strong>one partition \u2192 one active consumer instance<\/strong> at a time.<br>Max parallelism per topic per group = <strong>number of partitions<\/strong>.<\/li>\n\n\n\n<li>A single <strong>KafkaConsumer instance<\/strong> can own <strong>multiple partitions<\/strong>, and it will deliver records from each partition in the correct per-partition order.<\/li>\n<\/ul>\n\n\n\n<h1 class=\"wp-block-heading\">Best practices for partitioning<\/h1>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Choose keys deliberately<\/strong>\n<ul class=\"wp-block-list\">\n<li>If you need per-entity order (e.g., per user\/order\/account), <strong>use that entity ID as the key<\/strong>.<\/li>\n\n\n\n<li>Ensure keys are well-distributed to avoid <strong>hot partitions<\/strong>.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Plan partition count with headroom<\/strong>\n<ul class=\"wp-block-list\">\n<li>Start with enough partitions for peak throughput + <strong>1.5\u20132\u00d7<\/strong> headroom.<\/li>\n\n\n\n<li>Rule of thumb capacity: <strong>~1\u20133 MB\/s per partition<\/strong> (varies by infra).<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Avoid frequent partition increases<\/strong> if strict per-key order must hold over long time frames.\n<ul class=\"wp-block-list\">\n<li>Adding partitions can remap keys \u2192 future events for a key may go to a new partition.<\/li>\n\n\n\n<li>If you must scale up, consider creating a <strong>new topic<\/strong> with more partitions and migrating producers\/consumers in a controlled way (or implement a <strong>custom consistent-hash partitioner<\/strong>).<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Monitor skew<\/strong>\n<ul class=\"wp-block-list\">\n<li>Watch per-partition <strong>bytes in\/out<\/strong>, <strong>lag<\/strong>, and 
<strong>message rate<\/strong>.<\/li>\n\n\n\n<li>If a few partitions are hot, re-evaluate your <strong>key choice<\/strong> or <strong>increase partitions<\/strong> (with the caveat above).<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h1 class=\"wp-block-heading\">Limitations &amp; gotchas<\/h1>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>No global ordering:<\/strong> Only within a partition. Don\u2019t build logic that needs cross-partition order.<\/li>\n\n\n\n<li><strong>Hot keys:<\/strong> One key can bottleneck an entire partition (and thus your group\u2019s parallelism).<\/li>\n\n\n\n<li><strong>Too many partitions:<\/strong> More parallelism headroom, but higher broker memory\/file usage, metadata load, rebalance time, and recovery time.<\/li>\n\n\n\n<li><strong>Changing partition count:<\/strong> Remaps keys (default partitioner) \u2192 can disrupt long-lived per-key ordering assumptions.<\/li>\n<\/ul>\n\n\n\n<h1 class=\"wp-block-heading\">TL;DR<\/h1>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A <strong>partition<\/strong> is the <strong>storage + routing unit<\/strong> of a topic: an append-only log with its own offsets.<\/li>\n\n\n\n<li><strong>Producers<\/strong> choose the partition (by <strong>key hash<\/strong> or <strong>sticky\/round-robin<\/strong> when keyless).<\/li>\n\n\n\n<li><strong>Consumers<\/strong> read partitions independently; <strong>order only exists within a partition<\/strong>.<\/li>\n\n\n\n<li>Pick <strong>keys<\/strong> and <strong>partition counts<\/strong> intentionally to balance <strong>order<\/strong>, <strong>throughput<\/strong>, and <strong>operational overhead<\/strong>.<\/li>\n<\/ul>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">\u201cKafkaConsumer is NOT thread-safe\u201d \u2014 what this actually means<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You <strong>must not<\/strong> call <code>poll()<\/code>, <code>commitSync()<\/code>, 
<code>seek()<\/code>, etc. on the <strong>same<\/strong> KafkaConsumer instance from multiple threads.<\/li>\n\n\n\n<li>Safe patterns:\n<ol class=\"wp-block-list\">\n<li><strong>One consumer instance on one thread<\/strong> (single-threaded polling).<\/li>\n\n\n\n<li><strong>One consumer instance per thread<\/strong> (multi-consumer, multi-thread).<\/li>\n\n\n\n<li><strong>One polling thread + worker pool<\/strong>: the poller thread owns the consumer and hands work to other threads; only the poller commits.<\/li>\n<\/ol>\n<\/li>\n<\/ul>\n\n\n\n<p>Unsafe patterns (don\u2019t do these): two threads calling <code>poll()<\/code> on the same instance; one thread polling while another commits on the same instance.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">How partitions get assigned (and re-assigned)<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>When a consumer group starts (or changes), the <strong>group coordinator<\/strong> performs an <strong>assignment<\/strong>: it maps topic partitions \u2192 consumer instances.<\/li>\n\n\n\n<li><strong>Triggers for rebalance:<\/strong> a consumer joins\/leaves\/crashes, subscriptions change, partitions change, a consumer stops heartbeating (e.g., blocked poll loop).<\/li>\n\n\n\n<li><strong>Lifecycle callbacks<\/strong> (Java): <code>onPartitionsRevoked<\/code> \u2192 <strong>you must finish\/flush and commit<\/strong> work for those partitions \u2192 then <code>onPartitionsAssigned<\/code>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Assignment strategies (high-level)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Range<\/strong>: assigns contiguous ranges per topic\u2014can skew if topics\/partitions per topic differ.<\/li>\n\n\n\n<li><strong>RoundRobin<\/strong>: cycles through partitions \u2192 consumers\u2014more even across topics.<\/li>\n\n\n\n<li><strong>Sticky<\/strong>: tries to keep prior assignments stable across rebalances (reduces 
churn).<\/li>\n\n\n\n<li><strong>CooperativeSticky<\/strong>: <strong>best default<\/strong> today\u2014incremental rebalances; consumers give up partitions gradually, minimizing stop-the-world pauses.<\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Best practice:<\/strong> Use <strong>CooperativeSticky<\/strong> + <strong>static membership<\/strong> (stable <code>group.instance.id<\/code>) so transient restarts don\u2019t trigger full rebalances.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">What an assignment can look like (concrete scenarios)<\/h2>\n\n\n\n<p><strong>Example A: 7 partitions, 3 consumer instances (C1, C2, C3)<\/strong><br>A balanced round-robin\/sticky outcome might be:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>C1: P0, P3, P6<\/li>\n\n\n\n<li>C2: P1, P4<\/li>\n\n\n\n<li>C3: P2, P5<\/li>\n<\/ul>\n\n\n\n<p><strong>Example B: add a 4th consumer (C4) \u2192 incremental rebalance<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Existing owners relinquish just enough partitions to restore balance:\n<ul class=\"wp-block-list\">\n<li>C1: P0, P6<\/li>\n\n\n\n<li>C2: P1, P4<\/li>\n\n\n\n<li>C3: P2<\/li>\n\n\n\n<li><strong>C4: P3, P5<\/strong><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p><strong>Example C: more consumers than partitions<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>4 partitions, 6 consumers \u2192 2 consumers sit <strong>idle<\/strong> (assigned nothing). 
Adding instances beyond partition count <strong>does not<\/strong> increase read parallelism.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Concurrency <em>inside<\/em> a worker (safe blueprints)<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Pattern A \u2014 <strong>Per-partition serial executors<\/strong> (simple &amp; safe)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Each assigned partition is processed by its <strong>own single-thread lane<\/strong>.<\/li>\n\n\n\n<li>Preserves partition ordering trivially.<\/li>\n\n\n\n<li>Good when you key by entity (e.g., <code>orderId<\/code>) so ordering matters.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Pattern B \u2014 <strong>Per-key serialization over a shared pool<\/strong> (higher throughput)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Maintain tiny FIFO queues per <strong>key<\/strong> (or per partition+key) and execute them on a larger thread pool.<\/li>\n\n\n\n<li>Each key is processed serially; many keys run in parallel.<\/li>\n\n\n\n<li>Add LRU capping to avoid unbounded key maps.<\/li>\n\n\n\n<li>Good for I\/O-bound workloads with hot partitions\/keys.<\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Don\u2019t<\/strong> spray records from the same partition to an unkeyed pool\u2014this breaks ordering.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">CPU-bound vs I\/O-bound sizing (practical math)<\/h2>\n\n\n\n<p>Let:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>R<\/code> = records\/sec <em>per consumer instance<\/em><\/li>\n\n\n\n<li><code>T<\/code> = average processing time per record (seconds)<\/li>\n<\/ul>\n\n\n\n<p><strong>Required concurrency \u2248 <code>R \u00d7 T<\/code><\/strong> (Little\u2019s Law).<\/p>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><strong>CPU-bound<\/strong> \u2192 thread pool \u2248 <strong>#vCPUs<\/strong> of the machine (oversubscribing just burns context switches). Consider batching records to use caches well.<\/li>\n\n\n\n<li><strong>I\/O-bound<\/strong> \u2192 pool \u2248 <code>R \u00d7 T \u00d7 safety<\/code> (safety = 1.5\u20133\u00d7), with <strong>per-partition or per-key caps<\/strong> to preserve order.<\/li>\n<\/ul>\n\n\n\n<p><strong>Example (I\/O-bound):<\/strong><br>R = 400 rec\/s, T = 50 ms (0.05s) \u2192 <code>R\u00d7T = 20<\/code> concurrent.<br>With safety \u00d72 \u2192 ~40 async slots\/threads total. If the instance has 8 partitions, cap \u2248 5 in-flight per partition (8\u00d75=40).<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Commit strategy (don\u2019t lose or duplicate)<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>enable.auto.commit=false<\/code><\/li>\n\n\n\n<li>Track <strong>highest contiguous processed offset<\/strong> per partition.<\/li>\n\n\n\n<li>Only <strong>commit<\/strong> when all records up to that offset are completed.<\/li>\n\n\n\n<li>In <code>onPartitionsRevoked<\/code>, <strong>finish in-flight work for those partitions<\/strong> (or fail fast to DLQ) and <strong>commit<\/strong>; otherwise you\u2019ll reprocess on reassignment.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Backpressure &amp; stability<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Keep the poll loop responsive; don\u2019t block it on I\/O\/CPU.<\/li>\n\n\n\n<li>Use <strong>bounded<\/strong> executor queues. 
If full, <strong>pause<\/strong> the partition(s) (<code>consumer.pause(tps)<\/code>) and <strong>resume<\/strong> when drained.<\/li>\n\n\n\n<li>Tune:\n<ul class=\"wp-block-list\">\n<li><code>max.poll.records<\/code> (e.g., 100\u20131000)<\/li>\n\n\n\n<li><code>max.poll.interval.ms<\/code> high enough for bursts (e.g., 10\u201330 min)<\/li>\n\n\n\n<li><code>session.timeout.ms<\/code> \/ <code>heartbeat.interval.ms<\/code> for network realities<\/li>\n\n\n\n<li><code>fetch.max.bytes<\/code> \/ <code>max.partition.fetch.bytes<\/code> to avoid giant polls<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Prefer <strong>cooperative-sticky<\/strong> assignor. Enable <strong>static membership<\/strong> to avoid rebalances on restarts.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Partition count: choosing, best practices, and limits<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Choosing a number<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Estimate peak throughput: msgs\/sec \u00d7 avg size \u2192 MB\/s.<\/li>\n\n\n\n<li>Target <strong>per-partition<\/strong> throughput (conservative baseline): <strong>1\u20133 MB\/s<\/strong> (varies by infra).<\/li>\n\n\n\n<li>Partitions \u2248 <code>(needed MB\/s) \/ (target MB\/s per partition)<\/code> \u2192 round up.<\/li>\n\n\n\n<li>Add <strong>1.5\u20132\u00d7 headroom<\/strong> for spikes and rebalances.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Parallelism ceiling<\/strong> is <strong>#partitions<\/strong>. 
Don\u2019t run more active consumers than partitions.<\/li>\n\n\n\n<li><strong>Pre-create<\/strong> partitions with growth headroom (increasing later is possible but usually triggers rebalances and requires producer\/consumer awareness).<\/li>\n\n\n\n<li><strong>Key by entity<\/strong> to distribute load uniformly and preserve per-key order (avoid hot keys).<\/li>\n\n\n\n<li><strong>Monitor skew<\/strong>: per-partition lag, bytes in\/out, processing latency. Re-key or add partitions if one partition is hot.<\/li>\n\n\n\n<li><strong>Keep counts reasonable<\/strong>: very high partition counts increase broker memory\/FD usage, recovery time, metadata size, and rebalance duration.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Limitations and trade-offs<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Too few partitions<\/strong> \u2192 can\u2019t scale reads; hotspots.<\/li>\n\n\n\n<li><strong>Too many partitions<\/strong> \u2192 longer leader elections, slower startup, large metadata, longer catch-up after failures, more open files on brokers.<\/li>\n\n\n\n<li><strong>One partition \u21d2 one active consumer in a group<\/strong>: extra consumers sit idle.<\/li>\n\n\n\n<li><strong>Ordering vs. throughput<\/strong>: strict ordering reduces concurrency (by partition or by key).<\/li>\n\n\n\n<li><strong>Exactly-once<\/strong> is only native <strong>within Kafka<\/strong> (transactions). 
When writing to external DBs, use idempotent upserts\/outbox patterns for \u201ceffectively once.\u201d<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Configuration checklist (consumer-side)<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>group.id=your-app-group<\/code><\/li>\n\n\n\n<li><code>enable.auto.commit=false<\/code><\/li>\n\n\n\n<li><code>partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor<\/code><\/li>\n\n\n\n<li><code>max.poll.records=100..1000<\/code> (tune)<\/li>\n\n\n\n<li><code>max.poll.interval.ms=600000..1800000<\/code> (10\u201330 min if you have bursts)<\/li>\n\n\n\n<li><code>session.timeout.ms=10000..45000<\/code>, <code>heartbeat.interval.ms\u2248(session\/3)<\/code><\/li>\n\n\n\n<li><code>fetch.max.bytes<\/code>, <code>max.partition.fetch.bytes<\/code> sized to your records<\/li>\n\n\n\n<li><strong>Static membership<\/strong> (Java): set <code>group.instance.id<\/code> to a stable value per instance<\/li>\n\n\n\n<li>For backpressure: use <code>pause()\/resume()<\/code> and bounded queues<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Two canonical designs (pick one and stick to it)<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">A) One consumer instance per thread (strict order, simple)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>N threads \u2192 N KafkaConsumer instances \u2192 Kafka assigns partitions to each.<\/li>\n\n\n\n<li>Each instance processes assigned partitions <strong>serially<\/strong> or via <strong>per-partition single-thread executors<\/strong>.<\/li>\n\n\n\n<li>Easiest way to keep ordering and keep rebalances straightforward.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">B) One consumer thread + worker pool (I\/O heavy)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Single KafkaConsumer instance on a dedicated poller 
thread.<\/li>\n\n\n\n<li>Poller dispatches to a <strong>bounded<\/strong> worker pool.<\/li>\n\n\n\n<li>Maintain <strong>per-partition or per-key serialization<\/strong> layer to preserve required order.<\/li>\n\n\n\n<li>Commit only after workers complete up to the commit barrier.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Worked sizing example<\/h2>\n\n\n\n<p><strong>Goal:<\/strong> ~10 MB\/s input. Avg record size 5 KB \u2192 ~2,000 records\/sec total.<br>Plan for 2 consumer processes, expect fairly even split \u2192 ~1,000 rec\/s per process.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Target 2 MB\/s per partition \u2192 need ~5 partitions. Add headroom \u00d72 \u2192 <strong>10 partitions<\/strong>.<\/li>\n\n\n\n<li>With 2 processes, expect ~5 partitions per process.<\/li>\n\n\n\n<li>If workload is I\/O-bound with T \u2248 40 ms: <code>R\u00d7T = 1000\u00d70.04 = 40<\/code> concurrent.<\/li>\n\n\n\n<li>Pool \u2248 40\u201380 (safety 1.5\u20132\u00d7). 
Cap <strong>per partition<\/strong> in-flight at 8\u201312.<\/li>\n\n\n\n<li>Set <code>max.poll.records=500<\/code>, cooperative-sticky assignor, static membership, bounded queues, pause\/resume.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">Do\u2019s and Don\u2019ts (fast checklist)<\/h2>\n\n\n\n<p><strong>Do<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>CooperativeStickyAssignor<\/strong> + <strong>static membership<\/strong>.<\/li>\n\n\n\n<li>Keep the poll loop responsive; <strong>commit manually<\/strong> after processing.<\/li>\n\n\n\n<li><strong>Bound queues<\/strong> and use <strong>pause\/resume<\/strong> for backpressure.<\/li>\n\n\n\n<li>Use <strong>per-partition<\/strong> or <strong>per-key<\/strong> serialization to preserve order.<\/li>\n\n\n\n<li>Monitor <strong>lag per partition<\/strong>, <strong>processing latency<\/strong>, <strong>rebalance events<\/strong>.<\/li>\n<\/ul>\n\n\n\n<p><strong>Don\u2019t<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Share a KafkaConsumer instance across threads calling <code>poll()<\/code>\/<code>commit()<\/code>.<\/li>\n\n\n\n<li>Over-provision consumers beyond the number of partitions (they\u2019ll idle).<\/li>\n\n\n\n<li>Commit offsets <strong>before<\/strong> the corresponding work is finished.<\/li>\n\n\n\n<li>Let long work starve heartbeats (<code>max.poll.interval.ms<\/code> timeouts \u2192 rebalances).<\/li>\n\n\n\n<li>Jump to hundreds of partitions without a real need; it raises broker and ops overhead.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<h2 class=\"wp-block-heading\">TL;DR mental model<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Group<\/strong> = the team.<\/li>\n\n\n\n<li><strong>Process (worker)<\/strong> = a team member.<\/li>\n\n\n\n<li><strong>KafkaConsumer instance<\/strong> = the <em>identity<\/em> that Kafka assigns partitions 
to.<\/li>\n\n\n\n<li><strong>Thread<\/strong> = how the member does the work.<\/li>\n\n\n\n<li><strong>Partitions<\/strong> = the parallel lanes. Max lane count = max read parallelism.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Here\u2019s your end-to-end, no-gaps guide. I\u2019ll keep the vocabulary crisp and the mental model consistent. The layers \u201cKafkaConsumer is NOT thread-safe\u201d \u2014 what this actually means Unsafe patterns (don\u2019t do&#8230; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"_joinchat":[],"footnotes":""},"categories":[2],"tags":[],"class_list":["post-52306","post","type-post","status-publish","format-standard","hentry","category-uncategorised"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/52306","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=52306"}],"version-history":[{"count":5,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/52306\/revisions"}],"predecessor-version":[{"id":59563,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/52306\/revisions\/59563"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=52306"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=52306"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/b
log\/wp-json\/wp\/v2\/tags?post=52306"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}