Apache Kudu is a free and open source columnar storage system developed for the Apache Hadoop ecosystem. It is a storage engine for structured data that supports low-latency, millisecond-scale random access to individual rows together with efficient analytical access patterns. It was created to bridge the gap between the widely used Hadoop Distributed File System (HDFS) and the HBase NoSQL database. Kudu is storage for fast analytics on fast data: it combines fast inserts and updates with efficient columnar scans to enable multiple real-time analytic workloads across a single storage layer.
The open source project to build Apache Kudu began as an internal project at Cloudera. The first version, Apache Kudu 1.0, was released on 19 September 2016.
What are the main advantages of Apache Kudu?
- Enables real-time analytics on fast data
Apache Kudu merges the upsides of HBase and Parquet. It is as fast as HBase at ingesting data and almost as quick as Parquet when it comes to analytics queries. It supports multiple query types, allowing you to perform the following operations:
- Look up a single row by its key.
- Look up a range of keys, which are stored in sorted key order.
- Run arbitrary queries across as many columns as needed.
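The three access patterns above can be illustrated with a small in-memory sketch. This is not the Kudu client API: `SortedTable`, its methods, and the sample rows are hypothetical names invented for illustration, and the sketch only assumes that rows are kept sorted by primary key.

```python
from bisect import bisect_left, bisect_right


class SortedTable:
    """Toy stand-in for a table sorted by primary key (not the Kudu API)."""

    def __init__(self, rows):
        # rows: iterable of (key, {column: value}) pairs
        ordered = sorted(rows, key=lambda kv: kv[0])
        self.keys = [k for k, _ in ordered]
        self.rows = [r for _, r in ordered]

    def get(self, key):
        """Point lookup: binary search for a single primary key."""
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.rows[i]
        return None

    def range_scan(self, low, high):
        """Range lookup: rows with low <= key <= high, in key order."""
        start = bisect_left(self.keys, low)
        stop = bisect_right(self.keys, high)
        return list(zip(self.keys[start:stop], self.rows[start:stop]))

    def scan(self, columns, predicate=lambda row: True):
        """Analytic scan: project chosen columns from rows matching a predicate."""
        return [{c: row[c] for c in columns} for row in self.rows if predicate(row)]


table = SortedTable([
    (3, {"host": "c", "cpu": 0.91, "mem": 0.40}),
    (1, {"host": "a", "cpu": 0.20, "mem": 0.55}),
    (2, {"host": "b", "cpu": 0.75, "mem": 0.62}),
])

one_row = table.get(2)                                       # lookup by key
key_range = table.range_scan(1, 2)                           # lookup by key range
busy_hosts = table.scan(["host"], lambda r: r["cpu"] > 0.7)  # query on a non-key column
```

A real columnar engine stores each column separately so that the last kind of query reads only the columns it needs; the sketch keeps whole rows for brevity.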
- Fully distributed and fault tolerant
Apache Kudu uses the Raft consensus algorithm to keep replicated data consistent, and it can be scaled horizontally, up or down, as required. It also supports in-place updates.
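As a rough illustration of why a Raft-based system tolerates failures: Raft makes progress as long as a majority of replicas is reachable. The arithmetic can be sketched as follows; the function names and the replica counts are illustrative assumptions, not Kudu settings.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a Raft majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be lost while a majority is still reachable."""
    return replicas - majority(replicas)


# Illustrative replica counts only
for n in (3, 5):
    print(f"{n} replicas: majority={majority(n)}, tolerated failures={tolerated_failures(n)}")
```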
- Supports SQL when used with Spark or Impala
Do you want to access data via SQL? Then you’ll be happy to hear that Apache Kudu has tight integration with Apache Impala as well as Spark. As a result, you can use these tools to insert, query, update and delete data in Kudu tables using their SQL syntax. Moreover, with Impala as the bridge, you can use JDBC or ODBC to connect existing or new applications to your Kudu data, whatever language or framework they are written in, and even business intelligence tools.
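A minimal sketch of what this looks like in practice, assuming Impala SQL and a Python DB-API 2.0 cursor. The table name `metrics`, its columns, and the partition count are hypothetical. `RecordingCursor` is a stand-in so the example runs without a cluster; in a real deployment the cursor would come from an Impala driver instead.

```python
# Impala SQL for a hypothetical Kudu-backed table named "metrics".
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS metrics (
  host STRING,
  ts BIGINT,
  value DOUBLE,
  PRIMARY KEY (host, ts)
)
PARTITION BY HASH (host) PARTITIONS 4
STORED AS KUDU
"""
INSERT_ROW = "INSERT INTO metrics VALUES ('web-1', 1000, 0.42)"
UPDATE_ROW = "UPDATE metrics SET value = 0.50 WHERE host = 'web-1' AND ts = 1000"
SELECT_ROWS = "SELECT host, ts, value FROM metrics WHERE host = 'web-1'"
DELETE_ROW = "DELETE FROM metrics WHERE host = 'web-1' AND ts = 1000"


def run_demo(cursor):
    """Insert, update, query, then delete one row through a DB-API cursor."""
    cursor.execute(CREATE_TABLE)
    cursor.execute(INSERT_ROW)
    cursor.execute(UPDATE_ROW)
    cursor.execute(SELECT_ROWS)
    rows = cursor.fetchall()
    cursor.execute(DELETE_ROW)
    return rows


class RecordingCursor:
    """Stand-in cursor that records SQL instead of sending it to Impala."""

    def __init__(self):
        self.executed = []

    def execute(self, sql):
        self.executed.append(sql)

    def fetchall(self):
        return []  # no cluster behind this stub, so no rows come back


cursor = RecordingCursor()
rows = run_demo(cursor)
```

With a live cluster, the same `run_demo` function could be handed a cursor from a DB-API driver such as impyla (an assumption about your setup), and the `SELECT` would return the updated row.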
- Simplified architecture
Kudu fills the gap between HDFS and Apache HBase, which was formerly bridged with complex hybrid architectures, easing the burden on both architects and developers.
Common use cases of Apache Kudu
Kudu is designed to excel at use cases that require the combination of random reads/writes and the ability to do fast analytic scans—which previously required the creation of complex Lambda architectures. When combined with the broader Hadoop ecosystem, Kudu enables a variety of use cases, including:
- IoT and time series data
- Machine data analytics (network security, health, etc.)
- Online reporting