Teradata Partitioned Primary Index (PPI) 

What is the Concept behind Partitioning a Table?

  • Each Table in Teradata has a Primary Index unless it is a NoPI table.
  • The Primary Index is the mechanism that allows Teradata to physically distribute the rows of a table across the AMPs.
  • AMPs Sort their rows by the Row-ID, so the system can perform a lightning fast Binary Search since the rows are in Row-ID Order.
  • Partitioning merely tells the AMP to sort its tables’ rows by the Partition first, but then sort the rows by Row-ID within the partition.
  • Partitioning queries will involve all AMPs, but partitioned tables are designed to prevent FULL Table Scans.

Advantages and Disadvantages of Partitioning

Advantages

  • Increase query efficiency by avoiding full table scans
  • PPI’s are defined on a table in order to increase query efficiency by avoiding full table scans without the overhead and maintenance costs of secondary indexes.
  • Deleting large volumes of rows in entire partitions can be extremely fast.
  • For range based queries we can effectively remove SI and use PPI, thus saving overhead of SI subtable.
  • It can access a subset of a large table quickly.
  • Maximum partitions allowed by Teradata – 65,535

Disadvantages

  • Partitioned rows are2 or 8 bytes longer. Table uses more PERM space.
  • A PI access may be degraded if the partitioning column is not part of the PI
  • Joins to non-partitioned tables with the same PI may be degraded.
  • The PI can’t be defined as unique when the partitioning column is not part of the PI.

Creating a PPI Table with Simple Partitioning

  • A PPI table has the AMPs sort (order) the rows on the table by the Partition.
  • This allows for people to avoid Full Table Scans.

A Visual Display of Simple Partitioning

An SQL Example that explains Simple Partitioning

Creating a PPI Table with RANGE_N

  • The expression is evaluated and is mapped into one of a list of specified ranges.
  • Ranges are listed in increasing order and must not overlap with each other.
  • Result is the data row being placed into a partition associated with that range.

Examples:

Example 1 – Creating a PPI Table with RANGE_N Partitioning per Month

Above Example sort the rows on each AMP by order_date and create a different partition every month

A Visual of One Year of Data with Range_N per Month

Each AMP sorts their rows by Month (of Order_Date).

An SQL Example explaining Range_N Partitioning per Month

The above query wants to see orders in the first quarter, so each AMP reads 3 partitions.

Example 2Creating a PPI Table with RANGE_N Partitioning per Day

Above Example sort the rows on each AMP by order_date and create a different partition every Day

A Visual of Range_N Partitioning Per Day

There would be 365 partitions in total for an entire year of Partitioning per Day. Each AMP holds the orders assigned to them in daily partitions on the day of the order date.

An SQL Example that explains Range_N Partitioning per Day

Creating a PPI Table with CASE_N

If an Order_Total is < 1000, it will go into Partition 1. If an Order_Total is between 1000 and 4999.99, it will go in Partition 2. The NO Case partition is for anything falling through and the UNKNOWN partition is for NULL values in the Total.

A Visual of Case_N Partitioning

There are six partitions for this table. Orders go into partitions based on Order_Total.

An SQL Example that explains CASE_N Partitioning

All AMPs retrieve, but each only reads partition 5 

Note: NO CASE, NO RANGE, and UNKNOWN options are also available which we can use in RANGE_N and CASE_N Partitioning

Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x