Top 50 Kafka Interview Questions & Answers

1) What is Apache Kafka?

Apache Kafka is an open-source, distributed publish-subscribe messaging system. It is written in Scala and Java, was originally developed at LinkedIn, and is now maintained by the Apache Software Foundation. Its design is based on the transaction (commit) log.

2) List the main components of Kafka.

The most important elements of Kafka are:
Topic –
A Kafka topic is a named collection of messages.

Producer –
Producers publish messages to a Kafka topic.

Consumer –
Consumers subscribe to one or more topics, then read and process the messages from them.

Brokers –
Brokers are the servers that store the messages of the topics and serve them to clients.

3) Explain the role of the offset.

Each message in a partition is given a sequential ID number called an offset. The offset uniquely identifies the message within that partition.
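To make the idea concrete, here is a minimal sketch of a partition as an append-only list, where the offset is simply the position of a message in that list. The `PartitionLog` class and its method names are hypothetical and only model the concept; they are not part of any Kafka client.

```python
class PartitionLog:
    """Toy in-memory model of one partition: an append-only list of messages."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Store the message and return the offset it was assigned."""
        offset = len(self._messages)  # next sequential ID in this partition
        self._messages.append(message)
        return offset

    def read(self, offset):
        """Return the message stored at the given offset."""
        return self._messages[offset]


log = PartitionLog()
first = log.append("order-created")
second = log.append("order-paid")
print(first, second)     # offsets are handed out in arrival order
print(log.read(second))  # a message is looked up by its offset
```

Because offsets are only unique within one partition, a message in a real cluster is identified by the combination of topic, partition, and offset.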

4) What is a Consumer Group?

Consumer groups are a core concept in Apache Kafka. Every consumer group consists of one or more consumers that jointly consume a set of subscribed topics.

5) What is the role of the ZooKeeper in Kafka?

Apache Kafka is a distributed system that was originally built to use ZooKeeper. ZooKeeper's main role is to coordinate the different nodes in a cluster, for example by tracking which brokers are alive and electing the controller. Older consumer clients also committed their offsets to ZooKeeper periodically, so that consumption could resume from the last committed offset if a node failed.

6) Is it possible to use Kafka without ZooKeeper?

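In older Kafka versions, no: brokers depend on ZooKeeper for cluster coordination, and if ZooKeeper is down, the cluster cannot operate normally. Newer Kafka releases can run in KRaft mode, which replaces ZooKeeper with a Raft-based controller quorum built into Kafka, so ZooKeeper is no longer required.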

7) What do you know about Partition in Kafka?

A topic is split into partitions, and every Kafka broker hosts some of them. Each copy of a partition on a broker is either the leader or a follower replica for that partition.

8) Why is Kafka technology significant to use?

Kafka has several advantages that make it significant to use:

  • High throughput: Kafka does not need large hardware to handle high-velocity, high-volume data. It can support a throughput of thousands of messages per second.
  • Low latency: Kafka can handle these messages with latency in the range of milliseconds, which most new use cases demand.
  • Fault tolerance: Kafka is resistant to node or machine failure within a cluster.
  • Durability: Kafka replicates messages across brokers, so messages are not lost when a single machine fails.
  • Scalability: Kafka can be scaled out on the fly, without downtime, by adding nodes.

9) What are the main APIs of Kafka?

Apache Kafka has 4 main APIs:

  • Producer API
  • Consumer API
  • Streams API
  • Connector API

10) What are consumers or users?

A Kafka consumer subscribes to one or more topics, then reads and processes the messages from them. Consumers label themselves with a consumer group name.

Within each subscribing consumer group, each record published to a topic is delivered to one consumer instance. Consumer instances can run in separate processes or on separate machines.

11) Explain the concept of Leader and Follower.

For every partition in Kafka, one server acts as the Leader, and zero or more servers act as Followers.

12) What ensures load balancing of the server in Kafka?

The Leader handles all read and write requests for a partition, whereas the Followers passively replicate the Leader.

If the Leader fails, one of the Followers takes over the role of the Leader. Because leadership for different partitions is spread across the brokers, this arrangement keeps the load balanced across the servers.
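The failover and load-spreading ideas above can be sketched in a few lines. This is a simplified model, not Kafka's controller logic: the function names are hypothetical, and picking the first remaining in-sync replica as the new leader is an assumption made for illustration.

```python
def elect_leader(isr, failed_broker):
    """Return (new_leader, new_isr) after failed_broker drops out.

    isr is an ordered list of broker ids with the current leader first.
    Assumption for this sketch: the first surviving replica becomes leader.
    """
    remaining = [broker for broker in isr if broker != failed_broker]
    if not remaining:
        raise RuntimeError("no in-sync replica left to take over")
    return remaining[0], remaining


def leaders_per_broker(partition_leaders):
    """Count how many partitions each broker currently leads."""
    counts = {}
    for leader in partition_leaders.values():
        counts[leader] = counts.get(leader, 0) + 1
    return counts


# Broker 1 leads the partition and then fails; broker 2 takes over.
new_leader, new_isr = elect_leader([1, 2, 3], failed_broker=1)
print(new_leader, new_isr)

# Leadership spread over brokers is what balances the read/write load.
print(leaders_per_broker({"orders-0": 1, "orders-1": 2, "orders-2": 3}))
```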

13) What roles do Replicas and the ISR play?

Replicas are the list of nodes that replicate the log for a particular partition, irrespective of whether they play the role of the Leader.

ISR stands for In-Sync Replicas. It is the set of replicas that are in sync with the leader.

14) Why are Replications critical in Kafka?

Because of Replication, we can be sure that published messages are not lost and can be consumed in the event of any machine error, program error or frequent software upgrades.

15) If a Replica stays out of the ISR for a long time, what does it signify?

It implies that the Follower cannot fetch data as fast as the Leader accumulates it.

16) Can we use Apache Kafka without ZooKeeper? / Is it possible to use Kafka without ZooKeeper?

In older Kafka versions, it is not possible to sideline ZooKeeper: brokers rely on it for coordination, and the cluster cannot operate normally while ZooKeeper is down. Newer Kafka releases can run in KRaft mode, which uses a Raft-based controller quorum inside Kafka instead of ZooKeeper.

17) What is the role of offset in Apache Kafka?

An offset is a sequential ID number assigned to each message in a partition. It uniquely identifies the message within that partition.

18) What are the key benefits of Apache Kafka over the other traditional techniques?

The following are key benefits of Apache Kafka over other traditional messaging techniques:

  • Kafka is Fast: A single Kafka broker can serve thousands of clients by handling megabytes of reads and writes per second.
  • Kafka is Scalable: Data can be partitioned and spread over a cluster of machines to handle larger data volumes.
  • Kafka is Durable: Messages are persisted and replicated within the cluster to prevent data loss.
  • Kafka is Distributed by Design: Its distributed design provides fault tolerance and supports durability.
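As a rough illustration of how data can be partitioned over a cluster, the sketch below maps a message key to a partition number with a hash. The function name `choose_partition` is hypothetical, and CRC32 is used here only because it ships with Python; Kafka's own default partitioner uses a different hash function.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # The same key always hashes to the same partition, so all
    # messages for one key stay in order within that partition.
    return zlib.crc32(key) % num_partitions


for key in (b"user-1", b"user-2", b"user-1"):
    print(key, "->", choose_partition(key, 6))
```

The design point is that a stable hash keeps all messages for one key on one partition while different keys spread over the cluster.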

19) What do you understand by the terms leader and follower in the Kafka environment?

The terms leader and follower describe the roles that servers play for each partition, which helps keep the system available and the load balanced across servers. Some important points about leaders and followers in Kafka:

  • For every partition, one server plays the role of leader, and the remaining replica servers act as followers.
  • The leader is responsible for executing all read and write requests for the partition, and the followers replicate the leader.
  • If a fault occurs and the leader can no longer function, one of the followers takes over the leader's role, which keeps the system stable and helps balance the load across servers.

20) Why is Kafka technology significant to use? / What are some key advantages of using Kafka?

The following are some key advantages of Kafka that make it significant to use:

  • High throughput with modest hardware: Apache Kafka does not require large hardware to handle a huge amount of data. It can handle high-velocity, high-volume data and support a throughput of thousands of messages per second.
  • Fault tolerance: Kafka is resistant to node or machine failure within a cluster.
  • Scalability: Kafka can be scaled out by adding nodes, without downtime.
  • Low latency: Kafka can handle many messages with latency in the range of milliseconds, which most new use cases demand.
  • Durability: Kafka replicates messages so that they are not lost when a single machine fails.

21) What does it mean if a replica stays out of the ISR for a very long time?

If a replica stays out of the ISR for a very long time, it means that the follower cannot fetch data as fast as the leader is receiving it. In other words, the follower is not able to keep up with the leader.

22) What is the role of Kafka producer API?

The Kafka producer API exposes the producer functionality through a single API call to the client. In the legacy Scala client, it combines the functionality of kafka.producer.SyncProducer and kafka.producer.async.AsyncProducer.

23) What is the maximum size of a message that Kafka can receive?

By default, the maximum size of a Kafka message is about 1 MB (megabyte), but it can be changed through the broker settings (for example, message.max.bytes).

24) What do you understand by geo-replication in Kafka?

In Kafka, geo-replication is a feature that lets you copy messages from one cluster to other data centers or cloud regions. Using geo-replication, you can replicate your data and store it in several locations around the globe if required. Geo-replication is accomplished with Kafka's MirrorMaker tool, and it is commonly used for backup and recovery.

25) What is the purpose of the retention period in the Kafka cluster?

Within the Kafka cluster, all published records are retained for the retention period, whether or not they have been consumed. Once records are older than the configured retention period, they are discarded. The main purpose of discarding records from the Kafka cluster is to free up disk space.
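A time-based retention check can be pictured as follows. This is a simplified sketch with a hypothetical `apply_retention` function that filters individual records; a real broker removes whole log segments rather than single records, and the period is set through configuration such as `log.retention.hours` on the broker or `retention.ms` on a topic.

```python
def apply_retention(records, now_ms, retention_ms):
    """Keep only records that are still inside the retention period.

    records is a list of (timestamp_ms, value) tuples. Whether a record
    was consumed plays no role: only its age matters.
    """
    cutoff = now_ms - retention_ms
    return [(ts, value) for ts, value in records if ts >= cutoff]


records = [(1_000, "a"), (5_000, "b"), (9_000, "c")]
kept = apply_retention(records, now_ms=10_000, retention_ms=5_000)
print(kept)  # records older than the cutoff are gone
```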

26) When does the broker leave the ISR?

The ISR is the set of replicas that are fully caught up with the leader, which means every replica in the ISR has all the committed messages. The ISR should include all replicas unless there is a real failure. A replica is dropped from the ISR if it falls too far behind the leader.
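One way to picture the "falls too far behind" rule is a time-based lag check, similar in spirit to the broker setting `replica.lag.time.max.ms`. The function and variable names below are hypothetical, and the logic is a simplified sketch rather than the broker's actual bookkeeping.

```python
def in_sync_replicas(last_caught_up_ms, now_ms, max_lag_ms):
    """Return the broker ids that caught up with the leader recently enough.

    last_caught_up_ms maps broker id -> the last time (in ms) that broker
    was fully caught up with the leader's log.
    """
    return sorted(
        broker
        for broker, caught_up_at in last_caught_up_ms.items()
        if now_ms - caught_up_at <= max_lag_ms
    )


last_caught_up = {1: 10_000, 2: 9_500, 3: 2_000}
# Broker 3 has not caught up for 8 seconds, so it leaves the ISR.
print(in_sync_replicas(last_caught_up, now_ms=10_000, max_lag_ms=1_000))
```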

27) What do you understand by the term “Log Anatomy” in Apache Kafka?

Log anatomy is a way of viewing a partition as a log. A data source writes messages to the log, and one or more consumers read from it whenever they want. The data source can keep appending to the log while different consumers read it at different offsets at the same time.

28) What are the ways to tune Kafka for optimal performance?

There are mainly three ways to tune Kafka for optimal performance:

  • Tuning Kafka producers
  • Tuning Kafka brokers
  • Tuning Kafka consumers

29) What are the use cases of Kafka monitoring?

Following are the use cases of Apache Kafka monitoring:

  • Apache Kafka monitoring can keep track of system resource consumption, such as memory, CPU, and disk utilization, over time.
  • Apache Kafka monitoring is used to watch threads and JVM usage. Kafka relies on the Java garbage collector to free up memory, so monitoring how often garbage collection runs helps keep the cluster responsive.
  • It can be used to determine which applications are causing excessive demand, and identifying performance bottlenecks helps solve performance issues quickly.
  • It tracks broker, controller, and replication statistics so that the status of partitions and replicas can be adjusted if required.

30) How can you get exactly-once messaging from Kafka during data production?

To get exactly-once messaging from Kafka, you have to take care of two things: avoiding duplicates during data consumption and avoiding duplication during data production.

31) Explain how you can reduce churn in the ISR. When does a broker leave the ISR?

The ISR is the set of replicas that are fully caught up with the leader; in other words, the ISR has all the messages that are committed. The ISR should always include all replicas until there is a real failure. A replica is dropped from the ISR if it falls too far behind the leader.

32) Why is replication required in Kafka?

Replication of messages in Kafka ensures that a published message is not lost and can still be consumed in case of machine error, program error, or the more common software upgrades.

33) What does it indicate if a replica stays out of the ISR for a long time?

If a replica remains out of ISR for an extended time, it indicates that the follower is unable to fetch data as fast as data accumulated at the leader.

34) Mention what happens if the preferred replica is not in the ISR?

If the preferred replica is not in the ISR, the controller will fail to move leadership to the preferred replica.

35) What is a consumer group?

Consumer groups are a core Apache Kafka concept. Each Kafka consumer group comprises one or more consumers that jointly consume a set of subscribed topics.

36) When does the Producer encounter a Queue-Full Exception?

Typically, a Queue-Full Exception arises when the Producer sends messages faster than the Broker can handle them. Because the Producer does not block in this situation, users need to add enough brokers to share the additional load.

37) How does Kafka define the terms “leader” and “follower”?

Each partition in Kafka has a single server acting as the Leader and zero or more servers acting as Followers.

The Leader is responsible for all read and write operations on the partition, while the Followers passively replicate the Leader.

38) Describe a partition in Kafka.

Each Kafka broker hosts a number of partitions. Each partition copy on a broker is either the leader or a follower replica for a topic partition.

39) What is the ZooKeeper’s function in Kafka?

Apache Kafka is a distributed system that was originally designed with ZooKeeper in mind. ZooKeeper's primary function is to provide coordination among the nodes in the cluster. Older consumer clients also committed their offsets to ZooKeeper regularly, which made it possible to resume from the previously committed offsets if a node failed.

40) Can Kafka be used without ZooKeeper?

In older Kafka versions, the answer is no: brokers depend on ZooKeeper, and if ZooKeeper is unavailable for whatever reason, the cluster cannot operate normally. Newer Kafka releases can run without ZooKeeper in KRaft mode, where a Raft-based controller quorum inside Kafka takes over its role.

41) What is a topic in Kafka?

A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it. For each topic, the Kafka cluster maintains a partitioned log.

42) What is Geo-Replication in Kafka?

Kafka MirrorMaker provides geo-replication support for your clusters. With MirrorMaker, messages are replicated across multiple datacenters or cloud regions. You can use this in active/passive scenarios for backup and recovery, or in active/active scenarios to place data closer to your users or to support data locality requirements.

43) What is the maximum size of a message that a Kafka server can receive?

By default, the maximum size of a message that a Kafka server can receive is 1000000 bytes.

44) What is the traditional method of message transfer?

The traditional method of message transfer includes two models:

  • Queuing: A pool of consumers may read from the server, and each message goes to one of them.
  • Publish-Subscribe: Messages are broadcast to all consumers.

Kafka offers a single consumer abstraction that generalizes both of the above: the consumer group.
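The way consumer groups cover both models can be sketched with a toy delivery function. The names here are hypothetical, consumer names are assumed to be unique across groups, and messages are rotated one by one between group members for simplicity, whereas Kafka distributes work between group members per partition.

```python
from collections import defaultdict


def deliver(messages, groups):
    """Deliver every message to each group, but to only one member per group.

    groups maps a group name to its list of consumer names.
    Returns a dict of consumer name -> list of messages received.
    """
    received = defaultdict(list)
    for index, message in enumerate(messages):
        for members in groups.values():
            # Queue behaviour inside a group: one member gets the message.
            chosen = members[index % len(members)]
            received[chosen].append(message)
    return dict(received)


groups = {"billing": ["b1", "b2"], "audit": ["a1"]}
result = deliver(["m0", "m1", "m2", "m3"], groups)
print(result["a1"])                # the audit group sees every message
print(result["b1"], result["b2"])  # billing members split the messages
```

With a single group, this behaves like a queue; with one consumer per group, it behaves like publish-subscribe.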

45) What are the benefits of Apache Kafka over the traditional technique?

Apache Kafka has the following benefits over traditional messaging techniques:

  • Fast: A single Kafka broker can serve thousands of clients by handling megabytes of reads and writes per second.
  • Scalable: Data is partitioned and spread over a cluster of machines to handle larger data volumes.
  • Durable: Messages are persisted and replicated within the cluster to prevent data loss.
  • Distributed by Design: It provides fault tolerance and durability.

46) What does ISR stand for in the Kafka environment?

ISR stands for in-sync replicas.

It is the set of replicas that are in sync with the leader.

47) How does the process of assigning partitions to consumers work?

When a consumer wants to join a group, it sends a JoinGroup request to the group coordinator. The first consumer to join the group becomes the group leader. The leader receives a list of all consumers in the group from the group coordinator and is responsible for assigning a subset of partitions to each consumer. It uses an implementation of PartitionAssignor to decide which partitions should be handled by which consumer.

After deciding on the partition assignment, the consumer group leader sends the list of assignments to the group coordinator, which sends this information to all the consumers. Each consumer only sees its own assignment; the leader is the only client process that has the full list of consumers in the group and their assignments. This process repeats every time a rebalance happens.
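The assignment step performed by the group leader can be illustrated with a small round-robin strategy. This is a sketch of one possible strategy under simplifying assumptions, with a hypothetical function name; it stands in for a PartitionAssignor implementation and does not reproduce any particular one.

```python
def assign_round_robin(partitions, consumers):
    """Spread partitions over consumers one at a time, in sorted order.

    Returns a dict of consumer name -> list of assigned partitions.
    """
    if not consumers:
        raise ValueError("at least one consumer is required")
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for index, partition in enumerate(sorted(partitions)):
        assignment[ordered[index % len(ordered)]].append(partition)
    return assignment


# The group leader computes the full plan; each consumer is later
# told only its own entry by the group coordinator.
plan = assign_round_robin([0, 1, 2, 3, 4], ["c2", "c1"])
print(plan)
```

Rerunning the function with a changed consumer list models what happens on a rebalance.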

48) Why is replication required in Kafka?

Replication of messages in Kafka ensures that a published message is not lost and can still be consumed in case of machine error, program error, or the more common software upgrades.

49) What does it indicate if a replica stays out of the ISR for a long time?

If a replica remains out of ISR for an extended time, it indicates that the follower is unable to fetch data as fast as data accumulated at the leader.

50) How can you get exactly-once messaging from Kafka during data production?

To get exactly-once messaging from Kafka, you have to take care of two things: avoiding duplicates during data consumption and avoiding duplication during data production. Here are two ways to get exactly-once semantics during data production:

  • Use a single writer per partition, and every time you get a network error, check the last message in that partition to see whether your last write succeeded.
  • Include a primary key (a UUID or something similar) in the message and de-duplicate on the consumer.
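The second approach, de-duplicating on the consumer with a unique key, might look like the sketch below. The message layout and the `DedupConsumer` class are hypothetical, and the set of seen IDs is kept in memory only; a real consumer would need to persist it to survive restarts.

```python
import uuid


def make_message(payload):
    """Attach a unique ID so that retried sends can be recognised later."""
    return {"id": str(uuid.uuid4()), "payload": payload}


class DedupConsumer:
    """Processes each message ID at most once."""

    def __init__(self):
        self._seen_ids = set()
        self.processed = []

    def handle(self, message):
        """Return True if the message was processed, False if it was a duplicate."""
        if message["id"] in self._seen_ids:
            return False
        self._seen_ids.add(message["id"])
        self.processed.append(message["payload"])
        return True


consumer = DedupConsumer()
message = make_message("charge card 42")
consumer.handle(message)
consumer.handle(message)   # the producer retried after a network error
print(consumer.processed)  # the payload appears only once
```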

Rajesh Kumar