
Kafka Consumers & Partitions: A Complete Guide

Here’s your end-to-end, no-gaps guide. I’ll keep the vocabulary crisp and the mental model consistent.

The layers

  • Topic: named stream of records.
  • Partition: an ordered shard of a topic. Ordering is guaranteed within a partition.
  • Consumer group: a logical application reading a topic. Within a group, each record is delivered to exactly one member.
  • Consumer process (a.k.a. worker / pod / container / JVM): a running OS process that joins a consumer group.
  • KafkaConsumer instance: the client object in your code (e.g., new KafkaConsumer<>(props) in Java) that talks to brokers, polls, tracks offsets. Kafka assigns partitions to instances.
  • Thread: execution lane inside a process. You may run zero, one, or many KafkaConsumer instances per process, typically one per thread.

Rule of exclusivity: At any moment, one partition is owned by exactly one KafkaConsumer instance in a group. No two instances in the same group read the same partition concurrently.

Architecture: Topic → Partitions → Consumer Group → Processes → Threads → KafkaConsumer instances

flowchart TB
  %% ===== Topic & Partitions =====
  subgraph T["Topic: orders (6 partitions)"]
    direction LR
    P0(["Partition P0\n(offsets: 0..∞)"])
    P1(["Partition P1\n(offsets: 0..∞)"])
    P2(["Partition P2\n(offsets: 0..∞)"])
    P3(["Partition P3\n(offsets: 0..∞)"])
    P4(["Partition P4\n(offsets: 0..∞)"])
    P5(["Partition P5\n(offsets: 0..∞)"])
  end

  %% ===== Consumer Group =====
  subgraph G["Consumer Group: orders-service"]
    direction TB

    %% ---- Process A (Worker/Pod/Container) ----
    subgraph A["Consumer Process A (Worker A / Pod)"]
      direction TB
      ATh1["Thread 1\nKafkaConsumer #1"]
      ATh2["Thread 2\nKafkaConsumer #2"]
    end

    %% ---- Process B (Worker/Pod/Container) ----
    subgraph B["Consumer Process B (Worker B / Pod)"]
      direction TB
      BTh1["Thread 1\nKafkaConsumer #3"]
    end
  end

  %% ===== Active assignment (example) =====
  P0 -- "assigned to" --> ATh1
  P3 -- "assigned to" --> ATh1

  P1 -- "assigned to" --> ATh2
  P4 -- "assigned to" --> ATh2

  P2 -- "assigned to" --> BTh1
  P5 -- "assigned to" --> BTh1

  %% ===== Notes / Rules =====
  Rules["Rules:
  • One partition → at most one active KafkaConsumer instance in the group.
  • A KafkaConsumer instance may own 1..N partitions.
  • KafkaConsumer is NOT thread-safe → one thread owns poll/commit for that instance.
  • Ordering is guaranteed per partition (by offset), not across partitions."]
  classDef note fill:#fff,stroke:#bbb,stroke-dasharray: 3 3,color:#333;
  Rules:::note
  G --- Rules

Partition assignment scenarios (how partitions are spread across instances)

flowchart LR
  %% ===== Scenario A =====
  subgraph S1["Scenario A: 7 partitions (P0..P6), 3 consumer instances (C1..C3)"]
    direction TB
    C1A["C1 ➜ P0, P3, P6"]
    C2A["C2 ➜ P1, P4"]
    C3A["C3 ➜ P2, P5"]
  end

  %% ===== Scenario B =====
  subgraph S2["Scenario B: Add C4 (cooperative-sticky rebalance)"]
    direction TB
    C1B["C1 ➜ P0, P6"]
    C2B["C2 ➜ P1, P4"]
    C3B["C3 ➜ P2"]
    C4B["C4 ➜ P3, P5"]
  end

  %% ===== Scenario C =====
  subgraph S3["Scenario C: 4 partitions (P0..P3), 6 consumer instances (C1..C6)"]
    direction TB
    C1C["C1 ➜ P0"]
    C2C["C2 ➜ P1"]
    C3C["C3 ➜ P2"]
    C4C["C4 ➜ P3"]
    C5C["C5 ➜ (idle)"]
    C6C["C6 ➜ (idle)"]
  end

  classDef idle fill:#fafafa,stroke:#ccc,color:#999;
  C5C:::idle; C6C:::idle

Best practices & limitations (mind map)

mindmap
  root((Partitioning and assignment))
    Assignment strategies
      Range
        Contiguous ranges per topic
        Can skew when topics or partition counts differ
      Round robin
        Even spread over time
        Less stable across rebalances
      Sticky
        Keeps prior mapping to reduce churn
      Cooperative sticky
        Incremental rebalance with fewer pauses
        Recommended default in modern clients
    Best practices
      Plan partition count
        Estimate peak throughput in MB per second
        Target around 1 to 3 MB per second per partition
        Add 1.5 to 2x headroom
      Keying
        Key by entity to preserve per key order
        Avoid hot keys and ensure good spread
        Use a custom consistent hash partitioner if needed
      Stability
        Use static membership with group.instance.id
        Use cooperative sticky assignor
      Ordering safe processing
        Per partition single thread executors
        Or per key serial queues on top of a worker pool
        Track contiguous offsets and commit after processing
      Backpressure
        Bounded queues with pause and resume when saturated
        Keep poll loop fast and do IO on worker threads
      Consumer safety
        KafkaConsumer is not thread safe
        One consumer instance per poll and commit thread
        Disable auto commit and commit on completion
    Limitations and trade offs
      No global order across partitions
      Max parallelism per group equals number of partitions
      Hot partitions bottleneck throughput
      Too many partitions increase metadata and rebalance time
      Changing partition count remaps hash of key modulo N

How messages land in partitions (with/without keys)

flowchart TB
  subgraph Prod["Producer"]
    direction TB
    K1["send(key=A, value=m1)"]
    K2["send(key=B, value=m2)"]
    K3["send(key=A, value=m3)"]
    K4["send(key=C, value=m4)"]
  end

  subgraph Partitions["Topic: orders (3 partitions)"]
    direction LR
    X0["P0\noffsets 0..∞"]
    X1["P1\noffsets 0..∞"]
    X2["P2\noffsets 0..∞"]
  end

  note1["Default partitioner:\npartition = hash(key) % numPartitions\n(no key ⇒ sticky batching across partitions)"]
  classDef note fill:#fff,stroke:#bbb,stroke-dasharray: 3 3,color:#333;
  note1:::note

  Prod --- note1
  K1 --> X1
  K2 --> X0
  K3 --> X1
  K4 --> X2

  subgraph Order["Per-partition order"]
    direction LR
    X1o["P1 offsets:\n0: m1(A)\n1: m3(A)"]
    X0o["P0 offsets:\n0: m2(B)"]
    X2o["P2 offsets:\n0: m4(C)"]
  end

  X1 --> X1o
  X0 --> X0o
  X2 --> X2o

What is a partition?

  • Core idea: A partition is one append-only log that belongs to a topic.
  • Physical & logical:
    • Logical shard for routing and parallelism (clients see “P0, P1, …”).
    • Physical files on a broker (leader) with replicas on other brokers for HA.
  • Offsets: Every record in a partition gets a strictly increasing offset (0, 1, 2, …).
    Ordering is guaranteed within that partition (by offset). There is no global order across partitions.

How messages are placed into partitions

Producers decide the target partition using a partitioner:

  1. If you set a key (producer.send(topic, key, value)):
    → Kafka uses a hash of the key: partition = hash(key) % numPartitions.
    • All messages with the same key go to the same partition → preserves per-key order.
    • Watch out for hot keys (one key getting huge traffic) → one partition becomes a bottleneck.
  2. If you don’t set a key (key = null):
    • The default Java producer uses a sticky partitioner (since Kafka 2.4): it sends a batch to one partition to maximize throughput, then switches.
    • Older behavior was round-robin. Either way, distribution is roughly even over time but not ordered across partitions.

You can also plug a custom partitioner if you need special routing (e.g., consistent hashing).
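As a sketch of the keyed path, the snippet below mimics partition = hash(key) % numPartitions using String.hashCode() so it runs standalone. This is only an illustration: the real default partitioner hashes the serialized key bytes with murmur2, so actual partition numbers will differ.

```java
import java.util.List;

public class KeyedPartitionDemo {
    // Illustrative stand-in for partition = hash(key) % numPartitions.
    // Real Kafka uses murmur2 over the serialized key bytes; String.hashCode()
    // is used here only so the example is self-contained.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Same key always maps to the same partition; different keys spread out.
        for (String key : List.of("A", "B", "A", "C")) {
            System.out.println("key=" + key + " -> P" + partitionFor(key, 3));
        }
    }
}
```

The masking with 0x7fffffff avoids a negative result when hashCode() is negative, which a plain `%` would otherwise produce in Java.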

Example: 7 messages into a topic with 3 partitions

Assume topic orders has partitions: P0, P1, P2.

A) No keys (illustrative round-robin for clarity)

Messages: m1, m2, m3, m4, m5, m6, m7

A simple spread might look like:

P0: m1 (offset 0), m4 (1), m7 (2)
P1: m2 (offset 0), m5 (1)
P2: m3 (offset 0), m6 (1)

Notes:

  • Each partition has its own offset sequence starting at 0.
  • Consumers reading P0 will always see m1 → m4 → m7.
    But there’s no defined order between (say) m2 on P1 and m3 on P2.

B) With keys (so per-key order holds)

Suppose keys: A,B,A,C,B,A,G for messages m1..m7.
(Using a simple illustration of hash(key)%3, say A→P1, B→P0, C→P2, G→P2.)

P0: m2(B, offset 0), m5(B, 1)
P1: m1(A, offset 0), m3(A, 1), m6(A, 2)
P2: m4(C, offset 0), m7(G, 1)

Notes:

  • All A events land on P1, so A’s order is preserved (m1 → m3 → m6).
  • All B events land on P0, etc.
  • If you later change the partition count, hash(key) % numPartitions changes → future records for the same key may map to a different partition (ordering across time can “jump” partitions). Plan for that (see best practices).
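That last caveat is easy to demonstrate. The sketch below uses String.hashCode() as a stand-in for the producer’s key hash (real Kafka uses murmur2, so concrete numbers differ), but the remapping effect of changing numPartitions is the same.

```java
public class RepartitionDemo {
    // Stand-in for the default partitioner's hash(key) % numPartitions.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        String key = "A";                  // "A".hashCode() == 65
        int before = partitionFor(key, 3); // 65 % 3 == 2 -> P2
        int after  = partitionFor(key, 4); // 65 % 4 == 1 -> P1
        System.out.println("before=P" + before + ", after=P" + after);
        // Same key, different partition once the count changes:
        // per-key ordering across the resize is no longer guaranteed.
    }
}
```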

Why partitions matter (consumers)

  • In a consumer group, one partition → one active consumer instance at a time.
    Max parallelism per topic per group = number of partitions.
  • A single KafkaConsumer instance can own multiple partitions, and it will deliver records from each partition in the correct per-partition order.

Best practices for partitioning

  • Choose keys deliberately
    • If you need per-entity order (e.g., per user/order/account), use that entity ID as the key.
    • Ensure keys are well-distributed to avoid hot partitions.
  • Plan partition count with headroom
    • Start with enough partitions for peak throughput + 1.5–2× headroom.
    • Rule of thumb capacity: ~1–3 MB/s per partition (varies by infra).
  • Avoid frequent partition increases if strict per-key order must hold over long time frames.
    • Adding partitions can remap keys → future events for a key may go to a new partition.
    • If you must scale up, consider creating a new topic with more partitions and migrating producers/consumers in a controlled way (or implement a custom consistent-hash partitioner).
  • Monitor skew
    • Watch per-partition bytes in/out, lag, and message rate.
    • If a few partitions are hot, re-evaluate your key choice or increase partitions (with the caveat above).

Limitations & gotchas

  • No global ordering: Only within a partition. Don’t build logic that needs cross-partition order.
  • Hot keys: One key can bottleneck an entire partition (and thus your group’s parallelism).
  • Too many partitions: More parallelism, but increases broker memory/files, metadata load, rebalance time, recovery time.
  • Changing partition count: Remaps keys (default partitioner) → can disrupt long-lived per-key ordering assumptions.

TL;DR

  • A partition is the storage + routing unit of a topic: an append-only log with its own offsets.
  • Producers choose the partition (by key hash or sticky/round-robin when keyless).
  • Consumers read partitions independently; order only exists within a partition.
  • Pick keys and partition counts intentionally to balance order, throughput, and operational overhead.

“KafkaConsumer is NOT thread-safe”: what this actually means

  • You must not call poll(), commitSync(), seek(), etc. on the same KafkaConsumer instance from multiple threads.
  • Safe patterns:
    1. One consumer instance on one thread (single-threaded polling).
    2. One consumer instance per thread (multi-consumer, multi-thread).
    3. One polling thread + worker pool: the poller thread owns the consumer and hands work to other threads; only the poller commits.

Unsafe patterns (don’t do these): two threads calling poll() on the same instance; one thread polling while another commits on the same instance.
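Safe pattern 3 can be simulated without a broker. In the sketch below a string array stands in for poll() results, so none of these names are Kafka APIs; the shape is the point: one thread owns the (would-be) consumer and the commit decision, and workers only process what is handed to them.

```java
import java.util.concurrent.*;

public class PollerWorkerSketch {
    public static void main(String[] args) throws Exception {
        // Bounded handoff queue = built-in backpressure between poller and workers.
        BlockingQueue<String> handoff = new ArrayBlockingQueue<>(100);
        ExecutorService workers = Executors.newFixedThreadPool(4);
        CountDownLatch done = new CountDownLatch(3);

        // The poller thread is the ONLY thread that would touch the KafkaConsumer:
        // it polls, enqueues work, and (after workers report back) commits.
        Thread poller = new Thread(() -> {
            for (String record : new String[] {"m1", "m2", "m3"}) { // stand-in for poll()
                try { handoff.put(record); } catch (InterruptedException e) { return; }
            }
        });
        poller.start();

        for (int i = 0; i < 3; i++) {
            workers.execute(() -> {
                try {
                    String record = handoff.take(); // workers never see the consumer
                    System.out.println("processed " + record);
                    done.countDown();
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
        }
        if (!done.await(5, TimeUnit.SECONDS)) throw new IllegalStateException("workers stalled");
        poller.join();
        workers.shutdown();
    }
}
```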


How partitions get assigned (and re-assigned)

  • When a consumer group starts (or changes), the group coordinator performs an assignment: it maps topic partitions → consumer instances.
  • Triggers for rebalance: a consumer joins/leaves/crashes, subscriptions change, partitions change, a consumer stops heartbeating (e.g., blocked poll loop).
  • Lifecycle callbacks (Java): onPartitionsRevoked → you must finish/flush and commit work for those partitions → then onPartitionsAssigned.

Assignment strategies (high-level)

  • Range: assigns contiguous ranges per topic; can skew if topics/partitions per topic differ.
  • RoundRobin: assigns partitions to consumers in a cycle; more even across topics.
  • Sticky: tries to keep prior assignments stable across rebalances (reduces churn).
  • CooperativeSticky: the best default today; incremental rebalances where consumers give up partitions gradually, minimizing stop-the-world pauses.

Best practice: Use CooperativeSticky + static membership (stable group.instance.id) so transient restarts don’t trigger full rebalances.


What an assignment can look like (concrete scenarios)

Example A: 7 partitions, 3 consumer instances (C1, C2, C3)
A balanced round-robin/sticky outcome might be:

  • C1: P0, P3, P6
  • C2: P1, P4
  • C3: P2, P5

Example B: add a 4th consumer (C4) → incremental rebalance

  • With cooperative-sticky, some current owners each relinquish one partition to C4 while the rest of the mapping stays put:
    • C1: P0, P6
    • C2: P1, P4
    • C3: P2
    • C4: P3, P5

Example C: more consumers than partitions

  • 4 partitions, 6 consumers → 2 consumers sit idle (assigned nothing). Adding instances beyond the partition count does not increase read parallelism.

Concurrency inside a worker (safe blueprints)

Pattern A – Per-partition serial executors (simple & safe)

  • Each assigned partition is processed by its own single-thread lane.
  • Preserves partition ordering trivially.
  • Good when you key by entity (e.g., orderId) so ordering matters.
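A minimal sketch of Pattern A with plain java.util.concurrent (no Kafka types; partition numbers and records are simulated):

```java
import java.util.concurrent.*;

public class PerPartitionLanes {
    // One single-thread executor per assigned partition: work submitted for a
    // partition runs strictly in submission (i.e., offset) order.
    private final ConcurrentMap<Integer, ExecutorService> lanes = new ConcurrentHashMap<>();

    void submit(int partition, Runnable work) {
        lanes.computeIfAbsent(partition, p -> Executors.newSingleThreadExecutor())
             .execute(work);
    }

    void shutdown() throws InterruptedException {
        for (ExecutorService lane : lanes.values()) {
            lane.shutdown();
            lane.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        PerPartitionLanes lanes = new PerPartitionLanes();
        StringBuilder p0Log = new StringBuilder(); // written only from P0's lane
        for (int offset = 0; offset < 5; offset++) {
            int o = offset;
            lanes.submit(0, () -> p0Log.append(o));
            lanes.submit(1, () -> { /* independent lane, free to interleave */ });
        }
        lanes.shutdown();
        System.out.println(p0Log); // per-partition order preserved: 01234
    }
}
```

On partition revocation you would drain and remove that partition’s lane before committing; that bookkeeping is omitted here.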

Pattern B – Per-key serialization over a shared pool (higher throughput)

  • Maintain tiny FIFO queues per key (or per partition+key) and execute them on a larger thread pool.
  • Each key is processed serially; many keys run in parallel.
  • Add LRU capping to avoid unbounded key maps.
  • Good for I/O-bound workloads with hot partitions/keys.

Don’t spray records from the same partition across an unkeyed pool; this breaks ordering.
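Pattern B can be sketched by chaining a CompletableFuture per key over a shared pool: each key’s chain runs serially while unrelated keys run in parallel. All names here are illustrative, and the LRU capping mentioned above is omitted for brevity.

```java
import java.util.concurrent.*;

public class PerKeySerializer {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);
    // Tail of the work chain per key; appending to the tail keeps each key
    // serial while different keys run in parallel on the shared pool.
    private final ConcurrentMap<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

    CompletableFuture<Void> submit(String key, Runnable work) {
        return tails.compute(key, (k, tail) ->
            (tail == null ? CompletableFuture.<Void>completedFuture(null) : tail)
                .thenRunAsync(work, pool));
    }

    void shutdown() { pool.shutdown(); }

    public static void main(String[] args) throws Exception {
        PerKeySerializer s = new PerKeySerializer();
        StringBuilder aLog = new StringBuilder(); // touched only by key "A"'s chain
        CompletableFuture<Void> lastA = null;
        for (int i = 0; i < 5; i++) {
            int n = i;
            lastA = s.submit("A", () -> aLog.append(n));
            s.submit("B", () -> { /* runs concurrently with A's chain */ });
        }
        lastA.get(5, TimeUnit.SECONDS);
        System.out.println(aLog); // per-key order preserved: 01234
        s.shutdown();
    }
}
```

Using compute() keeps the read-tail/append-new-tail step atomic per key, so two threads submitting the same key can never race into parallel execution.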


CPU-bound vs I/O-bound sizing (practical math)

Let:

  • R = records/sec per consumer instance
  • T = average processing time per record (seconds)

Required concurrency ≈ R × T (Little’s Law).

  • CPU-bound → thread pool ≈ #vCPUs of the machine (oversubscribing just burns context switches). Consider batching records to use caches well.
  • I/O-bound → pool ≈ R × T × safety (safety = 1.5–3×), with per-partition or per-key caps to preserve order.

Example (I/O-bound):
R = 400 rec/s, T = 50 ms (0.05 s) → R×T = 20 concurrent.
With safety ×2 → ~40 async slots/threads total. If the instance has 8 partitions, cap ≈ 5 in-flight per partition (8×5 = 40).
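The arithmetic above as a tiny helper (milliseconds are used so the example numbers stay exact in floating point):

```java
public class PoolSizing {
    // Little's Law: required concurrency ≈ arrival rate × service time,
    // scaled by a safety factor for bursts.
    static int poolSize(double recordsPerSec, double avgMillisPerRecord, double safety) {
        return (int) Math.ceil(recordsPerSec * avgMillisPerRecord / 1000.0 * safety);
    }

    public static void main(String[] args) {
        // The example from the text: 400 rec/s at 50 ms each.
        System.out.println(poolSize(400, 50, 1.0)); // 20 concurrent
        System.out.println(poolSize(400, 50, 2.0)); // 40 with ×2 safety
    }
}
```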


Commit strategy (don’t lose or duplicate)

  • enable.auto.commit=false
  • Track highest contiguous processed offset per partition.
  • Only commit when all records up to that offset are completed.
  • In onPartitionsRevoked, finish in-flight work for those partitions (or fail fast to a DLQ) and commit; otherwise you’ll reprocess on reassignment.
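A minimal sketch of the "highest contiguous offset" bookkeeping, in plain Java with no Kafka types; in real code you would feed commitPosition() into commitSync for that partition:

```java
import java.util.TreeSet;

public class OffsetTracker {
    // Tracks the highest contiguous processed offset so the commit position
    // never jumps past an unfinished record.
    private long committed = -1;                        // highest contiguous completed offset
    private final TreeSet<Long> done = new TreeSet<>(); // completed out of order

    synchronized void markDone(long offset) {
        done.add(offset);
        while (!done.isEmpty() && done.first() == committed + 1) {
            committed = done.pollFirst();
        }
    }

    // Kafka commits the NEXT offset to read, hence committed + 1.
    synchronized long commitPosition() { return committed + 1; }

    public static void main(String[] args) {
        OffsetTracker t = new OffsetTracker();
        t.markDone(0); t.markDone(2);            // offset 1 still in flight
        System.out.println(t.commitPosition());  // 1: cannot commit past the gap
        t.markDone(1);                           // gap filled
        System.out.println(t.commitPosition());  // 3: offsets 0..2 all done
    }
}
```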

Backpressure & stability

  • Keep the poll loop responsive; don’t block it on I/O/CPU.
  • Use bounded executor queues. If full, pause the partition(s) (consumer.pause(tps)) and resume when drained.
  • Tune:
    • max.poll.records (e.g., 100–1000)
    • max.poll.interval.ms high enough for bursts (e.g., 10–30 min)
    • session.timeout.ms / heartbeat.interval.ms for network realities
    • fetch.max.bytes / max.partition.fetch.bytes to avoid giant polls
  • Prefer cooperative-sticky assignor. Enable static membership to avoid rebalances on restarts.

Partition count: choosing, best practices, and limits

Choosing a number

  1. Estimate peak throughput: msgs/sec × avg size → MB/s.
  2. Target per-partition throughput (conservative baseline): 1–3 MB/s (varies by infra).
  3. Partitions ≈ (needed MB/s) / (target MB/s per partition) → round up.
  4. Add 1.5–2× headroom for spikes and rebalances.
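The four steps collapse to one formula; the example values below are the baselines from the text, not universal constants:

```java
public class PartitionMath {
    // partitions ≈ ceil(peak MB/s ÷ per-partition MB/s × headroom)
    static int partitionsNeeded(double peakMBps, double perPartitionMBps, double headroom) {
        return (int) Math.ceil(peakMBps / perPartitionMBps * headroom);
    }

    public static void main(String[] args) {
        // 10 MB/s peak, conservative 2 MB/s per partition, ×2 headroom:
        System.out.println(partitionsNeeded(10, 2, 2)); // 10 partitions
    }
}
```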

Best practices

  • Parallelism ceiling is #partitions. Don’t run more active consumers than partitions.
  • Pre-create partitions with growth headroom (increasing later is possible but usually triggers rebalances and requires producer/consumer awareness).
  • Key by entity to distribute load uniformly and preserve per-key order (avoid hot keys).
  • Monitor skew: per-partition lag, bytes in/out, processing latency. Re-key or add partitions if one partition is hot.
  • Keep counts reasonable: very high partition counts increase broker memory/FD usage, recovery time, metadata size, and rebalance duration.

Limitations and trade-offs

  • Too few partitions → can’t scale reads; hotspots.
  • Too many partitions → longer leader elections, slower startup, large metadata, longer catch-up after failures, more open files on brokers.
  • One partition ⇒ one active consumer in a group: extra consumers sit idle.
  • Ordering vs. throughput: strict ordering reduces concurrency (by partition or by key).
  • Exactly-once is only native within Kafka (transactions). When writing to external DBs, use idempotent upserts/outbox patterns for “effectively once.”

Configuration checklist (consumer-side)

  • group.id=your-app-group
  • enable.auto.commit=false
  • partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
  • max.poll.records=100..1000 (tune)
  • max.poll.interval.ms=600000..1800000 (10–30 min if you have bursts)
  • session.timeout.ms=10000..45000, heartbeat.interval.ms ≈ (session / 3)
  • fetch.max.bytes, max.partition.fetch.bytes sized to your records
  • Static membership (Java): set group.instance.id to a stable value per instance
  • For backpressure: use pause()/resume() and bounded queues
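The checklist as a Properties object. The broker address, group name, and instance id are placeholders, and every numeric value is a starting point to tune rather than a recommended default:

```java
import java.util.Properties;

public class ConsumerProps {
    static Properties consumerProps() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092");   // placeholder
        p.setProperty("group.id", "your-app-group");
        p.setProperty("enable.auto.commit", "false");           // commit manually
        p.setProperty("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        p.setProperty("max.poll.records", "500");
        p.setProperty("max.poll.interval.ms", "600000");        // 10 min
        p.setProperty("session.timeout.ms", "30000");
        p.setProperty("heartbeat.interval.ms", "10000");        // ≈ session / 3
        p.setProperty("group.instance.id", "worker-1");         // static membership
        return p;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps());
    }
}
```

This Properties object is what you would pass to `new KafkaConsumer<>(props)` along with your key/value deserializers.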

Two canonical designs (pick one and stick to it)

A) One consumer instance per thread (strict order, simple)

  • N threads → N KafkaConsumer instances → Kafka assigns partitions to each.
  • Each instance processes assigned partitions serially or via per-partition single-thread executors.
  • Easiest way to keep ordering and keep rebalances straightforward.

B) One consumer thread + worker pool (I/O heavy)

  • Single KafkaConsumer instance on a dedicated poller thread.
  • Poller dispatches to a bounded worker pool.
  • Maintain per-partition or per-key serialization layer to preserve required order.
  • Commit only after workers complete up to the commit barrier.

Worked sizing example

Goal: ~10 MB/s input. Avg record size 5 KB → ~2,000 records/sec total.
Plan for 2 consumer processes, expect a fairly even split → ~1,000 rec/s per process.

  • Target 2 MB/s per partition → need ~5 partitions. Add headroom ×2 → 10 partitions.
  • With 2 processes, expect ~5 partitions per process.
  • If the workload is I/O-bound with T ≈ 40 ms: R×T = 1000 × 0.04 = 40 concurrent.
  • Pool ≈ 40–80 (safety 1.5–2×). Cap per-partition in-flight at 8–12.
  • Set max.poll.records=500, cooperative-sticky assignor, static membership, bounded queues, pause/resume.

Do’s and Don’ts (fast checklist)

Do

  • Use CooperativeStickyAssignor + static membership.
  • Keep the poll loop responsive; commit manually after processing.
  • Bound queues and use pause/resume for backpressure.
  • Use per-partition or per-key serialization to preserve order.
  • Monitor lag per partition, processing latency, rebalance events.

Don’t

  • Share a KafkaConsumer instance across threads calling poll()/commit().
  • Over-provision consumers beyond the number of partitions (they’ll idle).
  • Commit offsets before the corresponding work is finished.
  • Let long work starve heartbeats (max.poll.interval.ms timeouts → rebalances).
  • Jump to hundreds of partitions without a real need; it raises broker and ops overhead.

TL;DR mental model

  • Group = the team.
  • Process (worker) = a team member.
  • KafkaConsumer instance = the identity that Kafka assigns partitions to.
  • Thread = how the member does the work.
  • Partitions = the parallel lanes. Max lane count = max read parallelism.
