{"id":27721,"date":"2022-02-25T14:09:51","date_gmt":"2022-02-25T14:09:51","guid":{"rendered":"https:\/\/www.devopsschool.com\/blog\/?p=27721"},"modified":"2022-12-23T06:48:39","modified_gmt":"2022-12-23T06:48:39","slug":"hashing-of-the-primary-index","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/blog\/hashing-of-the-primary-index\/","title":{"rendered":"Hashing of the Teradata Primary Index"},"content":{"rendered":"\n<figure class=\"wp-block-table is-style-stripes\"><table class=\"has-vivid-cyan-blue-background-color has-background\"><thead><tr><th>#<\/th><th>Contents<\/th><\/tr><\/thead><tbody><tr><td>1.<\/td><td>The Hashing Formula Facts<\/td><\/tr><tr><td>2.<\/td><td>Important Hashing formula Terminology<\/td><\/tr><tr><td>3.<\/td><td>How Hashing Algorithm decides which AMP has the Row?<\/td><\/tr><tr><td>4.<\/td><td>Hashing Theory For Non-&nbsp;Unique Primary Indexes have Skewed Data<\/td><\/tr><tr><td>5.<\/td><td>What Uniqueness Value?<\/td><\/tr><tr><td>6.<\/td><td>What is ROW ID?<\/td><\/tr><tr><td>7.<\/td><td>Why each AMP Sorts their rows by the Row-ID?<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-hashing-formula-facts\">The Teradata Hashing Formula Facts <\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>There is one Hashing Formula in Teradata, and it is consistent.<\/li><li>The concept is to take the value of a row\u2019s Primary Index and run it through the Hash Formula.<\/li><li>It will produce a Row Hash number.<\/li><li>That Row Hash will stay with the row forever and reside as the first part of the row.<\/li><li>The Row Hash also determines which AMP owns the row.<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-127.png\" alt=\"\" class=\"wp-image-27724\" width=\"452\" height=\"229\" srcset=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-127.png 838w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-127-300x152.png 300w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-127-768x389.png 768w\" sizes=\"auto, (max-width: 452px) 100vw, 452px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"important-hashing-formula-terminology\">Important Hashing formula Terminology<\/h2>\n\n\n\n<p id=\"row-hash\"><strong>Row Hash<\/strong> :- The primary Index row goes through Secret hash Formula which gives output number as Row Hash<\/p>\n\n\n\n<p id=\"hash-map\"><strong>Hash Map<\/strong> :-The Row hash number goes to a bucket in the Hash Map and is assigned to an AMP.<\/p>\n\n\n\n<p id=\"hash-bucket\"><strong>Hash Bucket<\/strong>:- If a Row Hash number is 5. Teradata counted over to bucket 5 in the Hash Map, and it has the number two (2) inside that bucket. This means that this row will go to AMP 2.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-hashing-algorithm-decides-which-amp-has-the-row\">How Hashing Algorithm decides which  AMP has the Row?<\/h2>\n\n\n\n<p><strong>Hashing Algorithm Process:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-128.png\" alt=\"\" class=\"wp-image-27729\" width=\"559\" height=\"335\" srcset=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-128.png 823w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-128-300x180.png 300w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-128-768x460.png 768w\" sizes=\"auto, (max-width: 559px) 100vw, 559px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\"><li>A row will be placed on an AMP after the Parsing Engine (PE) hashes the row\u2019s Primary Index value.<\/li><li>The output of the Hashing Algorithm is a row\u2019s Row Hash.<\/li><li>The Row hash goes to a bucket in the Hash Map and is assigned to an AMP.<\/li><li>The above example hashed Emp_No 1002 (Primary Index value) and the output was a Row Hash of 5. Teradata counted over to bucket 5 in the Hash Map, and it has the number two (2) inside that bucket. This means that this row will go to AMP 2.<\/li><li>Than for examples if hashed Emp_No 1003 (Primary Index value) and the output was a Row Hash of 13. Teradata counted over to bucket 9 in the Hash Map, and it has the number one (3) inside that bucket. This means that this row will go to AMP 3<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-129.png\" alt=\"\" class=\"wp-image-27731\" width=\"464\" height=\"249\" srcset=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-129.png 803w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-129-300x161.png 300w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-129-768x412.png 768w\" sizes=\"auto, (max-width: 464px) 100vw, 464px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"hashing-theory-for-non-unique-primary-indexes-have-skewed-data\">Hashing Theory For Non-&nbsp;Unique Primary Indexes have Skewed Data<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>For Example, if we made Last_Name the Primary Index for a table. Here is an example of how it would distribute. Notice all duplicates have the same Row Hash.<\/li><li>The Hash Formula is consistent so every Smith has the same Row Hash and the same goes for each Jones and each Patel. Therefore, duplicate values land on the same AMP.<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-130.png\" alt=\"\" class=\"wp-image-27732\" width=\"484\" height=\"238\" srcset=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-130.png 785w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-130-300x148.png 300w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-130-768x378.png 768w\" sizes=\"auto, (max-width: 484px) 100vw, 484px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-uniqueness-value\">What Uniqueness Value ?<\/h2>\n\n\n\n<p>Each AMP will place a Uniqueness Value after the row hash to track duplicate values.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-131.png\" alt=\"\" class=\"wp-image-27733\" width=\"525\" height=\"318\" srcset=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-131.png 788w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-131-300x182.png 300w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-131-768x465.png 768w\" sizes=\"auto, (max-width: 525px) 100vw, 525px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-is-row-id\">What is ROW ID?<\/h2>\n\n\n\n<p>The&nbsp;Row Hash and Uniqueness Value make up the Row-ID.<\/p>\n\n\n\n<p>Row-ID equals the Row Hash of the Primary Index column and the Uniqueness Value<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-132.png\" alt=\"\" class=\"wp-image-27734\" width=\"280\" height=\"278\" srcset=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-132.png 767w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-132-300x297.png 300w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-132-150x150.png 150w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-132-80x80.png 80w\" sizes=\"auto, (max-width: 280px) 100vw, 280px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"a-row-id-example-for-a-unique-primary-index\">A&nbsp;Row-ID Example for a Unique Primary Index<\/h3>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-133.png\" alt=\"\" class=\"wp-image-27735\" width=\"536\" height=\"209\" srcset=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-133.png 829w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-133-300x117.png 300w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-133-768x299.png 768w\" sizes=\"auto, (max-width: 536px) 100vw, 536px\" \/><\/figure>\n\n\n\n<p>The two things for this&nbsp;Unique&nbsp;Primary Index&nbsp;(UPI)&nbsp;example:<\/p>\n\n\n\n<p>1) The&nbsp;Uniqueness Value&nbsp;on each Row-ID is 1.<\/p>\n\n\n\n<p>2) Each AMP&nbsp;sorts&nbsp;their rows by the&nbsp;Row-ID&nbsp;.<\/p>\n\n\n\n<p>Row Hash and the Uniqueness Value make up the Row-ID. AMPs sort by the Row-ID.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"a-row-id-example-for-a-non-unique-primary-index-nupi\">A&nbsp;Row-ID Example for a Non-Unique Primary Index (NUPI)<\/h3>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-134.png\" alt=\"\" class=\"wp-image-27736\" width=\"470\" height=\"291\" srcset=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-134.png 837w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-134-300x186.png 300w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-134-768x476.png 768w\" sizes=\"auto, (max-width: 470px) 100vw, 470px\" \/><\/figure>\n\n\n\n<p>Row Hash and the Uniqueness Value make up the Row-ID. AMPs sort by the Row-ID.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"why-each-amp-sorts-their-rows-by-the-row-id\">Why each AMP Sorts their rows by the Row-ID?<\/h2>\n\n\n\n<p>Two&nbsp;Reasons why each AMP Sorts their rows by the Row-ID<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>AMPs sort rows by Row-ID so like data is grouped together<\/li><li>For Binary searches<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-135.png\" alt=\"\" class=\"wp-image-27737\" width=\"419\" height=\"283\" srcset=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-135.png 809w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-135-300x203.png 300w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-135-768x519.png 768w\" sizes=\"auto, (max-width: 419px) 100vw, 419px\" \/><\/figure>\n\n\n\n<p><strong>AMPs&nbsp;sort their rows by Row-ID to Group like Data<\/strong><\/p>\n\n\n\n<p>All of the Smiths are lumped together because of the sorting by Row-ID.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-136.png\" alt=\"\" class=\"wp-image-27741\" width=\"529\" height=\"351\" srcset=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-136.png 796w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-136-300x199.png 300w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-136-768x509.png 768w\" sizes=\"auto, (max-width: 529px) 100vw, 529px\" \/><\/figure>\n\n\n\n<p><strong>AMPs&nbsp;sort their rows by Row-ID to do a Binary Search<\/strong><\/p>\n\n\n\n<p>A Binary Search knows the Row-IDs are in numeric order. It&#8217;s like you using a phone book. Go to the middle first and then go up or down in chunks to find things quickly.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-137.png\" alt=\"\" class=\"wp-image-27742\" width=\"487\" height=\"309\" srcset=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-137.png 749w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-137-300x190.png 300w\" sizes=\"auto, (max-width: 487px) 100vw, 487px\" \/><\/figure>\n\n\n\n<p><strong>Important Point:<\/strong><\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p><em><strong>Null&nbsp;Values all Hash to the Same AMP<\/strong><\/em><\/p><cite>If there are NULL values in the Primary Index, you could find this is the reason for your skew. A Table with a Unique Primary Index can have only one Null value, but a NUPI table can have many NULL values, and each NULL value hashes to the same AMP.<\/cite><\/blockquote>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-138.png\" alt=\"\" class=\"wp-image-27743\" width=\"460\" height=\"277\" srcset=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-138.png 805w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-138-300x181.png 300w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/02\/image-138-768x463.png 768w\" sizes=\"auto, (max-width: 460px) 100vw, 460px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p><strong><em>Checking data distribution of given columns on the AMP<\/em><\/strong><\/p><cite>SELECT HASHAMP(HASHBUCKET(HASHROW([columnlist]))),COUNT(*) FROM [databasename].[tablename] GROUP BY 1 ORDER BY 2 DESC;<\/cite><\/blockquote>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p># Contents 1. The Hashing Formula Facts 2. Important Hashing formula Terminology 3. How Hashing Algorithm decides which AMP has the Row? 4. Hashing Theory For Non-&nbsp;Unique Primary Indexes have&#8230; <\/p>\n","protected":false},"author":48,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_joinchat":[],"footnotes":""},"categories":[7338],"tags":[],"class_list":["post-27721","post","type-post","status-publish","format-standard","hentry","category-teradata"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/27721","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/users\/48"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=27721"}],"version-history":[{"count":5,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/27721\/revisions"}],"predecessor-version":[{"id":27867,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/27721\/revisions\/27867"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=27721"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=27721"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=27721"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}