Teradata Architecture

Contents

The Teradata database Architecture

  • PARSING ENGINE (PE)
  • Message Passing Layer (MPL)
  • The Access Module Processor (AMP)
  • VIRTUAL DISK (Vdisk)

What is Teradata SMP and MPP System?

BYNET and its Types?

What is inside a Teradata Node?

Additional Information

*****************************************************************************************************

The Teradata database Architecture:

The Teradata database architecture consists mainly of the Parsing Engine (PE), the Message Passing Layer (MPL), the Access Module Processor (AMP), and the Virtual Disk (Vdisk), as shown in the image below:

1. The Parsing Engine

The Parsing Engine (PE) takes the user's SQL and builds a plan for each AMP to follow to retrieve the data. Parallel processing is all about each AMP doing an equal share of the work: if all AMPs start at the same time and finish at the same time, they are performing true parallel processing. All communication is done over the BYNET.

The Parsing Engine is responsible for:

  • Managing individual sessions (up to 120)
  • Parsing and Optimizing your SQL requests
  • Dispatching the optimized plan to the AMPs
  • Input conversion (EBCDIC / ASCII), if necessary
  • Sending the answer set response back to the requesting client

The following are the processes performed by PE:

  • Parser: Checks the syntax of the query and, if it is valid, forwards the query to the Session Handler.
  • Session Handler: Performs the security checks, such as verifying the logon credentials and whether the user has permission to execute the query.
  • Optimizer: Finds the best possible plan for executing the query.
  • Dispatcher: Forwards the steps of the plan to the AMPs.
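
The four stages listed above can be pictured as a small pipeline. The following is only a toy sketch, not Teradata's implementation: the `Session` class, every function name, the one-statement SQL grammar, and the idea of one "scan" step per AMP are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical session record; the field names are illustrative."""
    user: str
    logged_on: bool
    allowed_tables: set


def parse(sql):
    """Parser: toy syntax check that accepts only 'SELECT <cols> FROM <table>'."""
    tokens = sql.strip().rstrip(";").split()
    if len(tokens) != 4 or tokens[0].upper() != "SELECT" or tokens[2].upper() != "FROM":
        raise SyntaxError(f"cannot parse: {sql!r}")
    return {"columns": tokens[1].split(","), "table": tokens[3]}


def check_session(session, request):
    """Session Handler: verify the logon state and the user's permission."""
    if not session.logged_on:
        raise PermissionError("user is not logged on")
    if request["table"] not in session.allowed_tables:
        raise PermissionError(f"{session.user} may not read {request['table']}")


def optimize(request, amp_count):
    """Optimizer: a trivial plan with one scan step for every AMP."""
    return [
        {"amp": amp, "step": "scan", "table": request["table"],
         "columns": request["columns"]}
        for amp in range(amp_count)
    ]


def dispatch(plan):
    """Dispatcher: hand each step to its AMP (modelled here as a dict keyed by AMP)."""
    return {step["amp"]: step for step in plan}


def run_parsing_engine(sql, session, amp_count):
    request = parse(sql)
    check_session(session, request)
    plan = optimize(request, amp_count)
    return dispatch(plan)


alice = Session(user="alice", logged_on=True, allowed_tables={"employee"})
steps = run_parsing_engine("SELECT name,salary FROM employee;", alice, amp_count=4)
print(steps[0])
```

A query that fails the syntax check never reaches the later stages, which mirrors the order of the list above.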

2. Message Passing Layer

The Message Passing Layer (MPL) or Communications Layer handles the internal communication of the Teradata Database. All communication between PEs and AMPs is done via the Message Passing Layer.

The Message Passing Layer or Communications Layer is responsible for:

  • Carrying messages between the AMPs and PEs
  • Point-to-Point, Multi-Cast, and Broadcast communications
  • Merging answer sets back to the PE
  • Making Teradata parallelism possible

Important:

Depending on the nature of the dispatch request, the communication may be a:
Broadcast – message is routed to all AMPs and PEs on the system
Multi-Cast – message is routed to a group of AMPs
Point-to-Point – message is routed to one specific AMP or PE on the system
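
As a rough illustration of the three message types, the sketch below picks the recipients of a message. The `route` function and the vproc names such as "AMP-0" are hypothetical; the real BYNET software is not exposed as an API like this.

```python
def route(kind, vprocs, targets=()):
    """Return the vprocs that should receive a message of the given kind."""
    unknown = [t for t in targets if t not in vprocs]
    if unknown:
        raise ValueError(f"unknown vprocs: {unknown}")
    if kind == "broadcast":
        # Every AMP and PE on the system receives the message.
        return list(vprocs)
    if kind == "multicast":
        # Only the named group of AMPs receives the message.
        return [v for v in vprocs if v in targets]
    if kind == "point-to-point":
        # Exactly one AMP or PE receives the message.
        if len(targets) != 1:
            raise ValueError("point-to-point needs exactly one target")
        return list(targets)
    raise ValueError(f"unknown message kind: {kind!r}")


system = ["PE-0", "AMP-0", "AMP-1", "AMP-2"]
print(route("broadcast", system))
print(route("multicast", system, targets=["AMP-0", "AMP-2"]))
print(route("point-to-point", system, targets=["AMP-1"]))
```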

Message Passing Layer or Communications Layer is a combination of:

  • Parallel Database Extensions (PDE) Software
  • BYNET Software
  • BYNET Hardware for MPP systems

Important:

The PDE and BYNET software are used on both multi-node MPP systems and single-node SMP systems; the BYNET hardware is used only on MPP systems.

The Parallel Database Extension (PDE) controls the Access Module Processors (AMPs) and Parsing Engines (PEs), which are referred to as Virtual Processors (Vprocs) and reside in the node's memory.

3. The Access Module Processor (AMP)

The Access Module Processor (AMP) is responsible for managing a portion of the database. Each AMP controls some portion of every table on the system. AMPs do all of the physical work associated with generating an answer set, including sorting, aggregating, formatting, and converting.

The AMPs are responsible for:

  • Accessing storage using Teradata's File System Software
  • Lock management
  • Sorting rows
  • Aggregating columns
  • Join processing
  • Output conversion and formatting
  • Creating answer set for client
  • Disk space management
  • Accounting
  • Special utility protocols
  • Recovery processing

Each AMP also runs the Teradata File System Software, which:

  • Translates DatabaseID/TableID/RowID into location on storage
  • Controls a portion of physical storage
  • Allocates storage space by “Cylinders”
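
To make the division of labor concrete, here is a minimal sketch in which each AMP aggregates only the rows it owns and the partial results are then merged into one answer set for the PE. The table contents, the placement of rows on three AMPs, and both function names are invented for illustration; how Teradata actually places rows on AMPs is not covered in this section.

```python
from collections import defaultdict


def amp_local_aggregate(rows):
    """One AMP: total and count of salaries per department, over its own rows only."""
    partial = defaultdict(lambda: [0, 0])  # department -> [total, count]
    for dept, salary in rows:
        partial[dept][0] += salary
        partial[dept][1] += 1
    return dict(partial)


def merge_partials(partials):
    """Merge the per-AMP partial results into (total, count, average) per department."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for dept, (total, count) in partial.items():
            merged[dept][0] += total
            merged[dept][1] += count
    return {
        dept: (total, count, total / count)
        for dept, (total, count) in sorted(merged.items())
    }


# Hypothetical portions of one table, keyed by AMP number.
amp_rows = {
    0: [("sales", 100), ("hr", 50)],
    1: [("sales", 300)],
    2: [("hr", 70), ("it", 90)],
}

partials = [amp_local_aggregate(rows) for rows in amp_rows.values()]
answer_set = merge_partials(partials)
print(answer_set)
```

No AMP ever reads another AMP's rows; only the small partial results travel over the Message Passing Layer.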

4. Virtual Disks 

  • Teradata provides virtual disk storage for each AMP. The storage area of an AMP is called its Virtual Disk, or Vdisk.
  • The Vdisk is backed by the actual physical storage units that the AMP accesses and manipulates to serve user requests.
  • A Virtual Disk is the disk space associated with an AMP. Tables and data rows are stored in this space. A virtual disk is usually assigned to two or more disk drives in a disk array. This concept will be discussed in detail later in the course.

What is Teradata SMP and MPP System?

To understand Teradata SMP and MPP systems, we first have to know what a Teradata node is.

What is Node?

  • The “compute or processing node” is the basic building block of the hardware for a Teradata system.
  • The processing node contains CPUs, memory, and I/O. Physically, a node is a computer that has its own CPUs, memory, I/O, power supplies, fans, etc. and has internal disks for its operating system (e.g., Linux) and Teradata software.
  • The physical nodes are independent of each other, but are interconnected with Teradata software and the BYNET.
  • Teradata is the database application that executes on one or more processing nodes and makes a multi-node system appear as a single database system to users.
  • Each node is effectively an SMP (Symmetrical Multi-Processing) node.
  • Depending on the configuration, a varying number of AMPs and/or PEs can execute on a node.

SMP System

  • SMP stands for symmetric multi-processing.
  • A single-node system is known as a Teradata SMP system.
  • Symmetric multi-processing means that every CPU performs equally, and all CPUs share a pool of memory and operate under one operating system. Each node is designed to operate at maximum performance.
  • AMPs and PEs are called Virtual Processors because each is a process that lives inside a node's memory.
  • Each node is attached via a network to a disk farm.

MPP System

  • MPP stands for Massively Parallel Processing.
  • Two or more connected SMP nodes become one MPP system.
  • When nodes are connected to the BYNETs, they become part of one large Teradata system.
  • Each node is connected to the BYNETs, so the system in the figure above has 8 Parsing Engines and 80 AMPs in total, even though they run on physically separate hardware nodes.
  • When a customer wants to grow their system, they add nodes, which in turn add Parsing Engines, AMPs, and disks. Two SMP nodes connected via the BYNETs are now one Massively Parallel Processing (MPP) system.
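
The way node counts add up can be sketched as follows. The per-node figures used here (4 nodes, each with 2 PEs and 20 AMPs) are hypothetical values chosen only so that the totals match the 8 Parsing Engines and 80 AMPs mentioned above; the `Node` class and `system_totals` function are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One SMP node with the vprocs that run on it (illustrative model)."""
    pes: int
    amps: int


def system_totals(nodes):
    """Summarize a system: one node is SMP, two or more connected nodes are MPP."""
    if not nodes:
        raise ValueError("a system needs at least one node")
    return {
        "kind": "SMP" if len(nodes) == 1 else "MPP",
        "nodes": len(nodes),
        "pes": sum(node.pes for node in nodes),
        "amps": sum(node.amps for node in nodes),
    }


system = [Node(pes=2, amps=20) for _ in range(4)]  # hypothetical configuration
print(system_totals(system))

# Growing the system: one more node adds its PEs and AMPs to the totals.
system.append(Node(pes=2, amps=20))
print(system_totals(system))
```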

What is BYNET and its Types?

BYNET

  • The Message Passing Layer, also called the BYNET, is the networking layer of a Teradata system. It carries communication between PEs and AMPs and also between nodes. It receives the execution plan from the Parsing Engine and sends it to the AMPs. Similarly, it receives the results from the AMPs and sends them to the Parsing Engine.
  • Each node has an internal BYNET communication system within the node, so the PEs and AMPs can communicate. A single node is a Symmetric Multiprocessing (SMP) node, and if the Teradata system is a single-node system, it won't have a physical BYNET. Once multiple SMP nodes are connected to produce a Massively Parallel Processing (MPP) system, two physical BYNET boards connect the nodes together.

Types of BYNET

  • The Boardless BYNET
  • The Physical BYNET

The Boardless BYNET

  • The Boardless BYNET connects the Parsing Engines (PEs) and Access Module Processors (AMPs) so that they can communicate with each other.
  • The Boardless BYNET exists within a node.

The Physical BYNET

The Physical BYNET connects two or more SMP nodes to produce an MPP system.

In more detail: once multiple SMP nodes are connected to produce a Massively Parallel Processing (MPP) system, two physical BYNET boards connect the nodes together.

There are two Physical BYNETs in Teradata:

  • BYNET 0
  • BYNET 1

The benefits of having two Physical BYNETs in Teradata are:

  1. If one BYNET fails, the second one can take its place.
  2. When the data volume is large, both BYNETs can be active at once, which improves communication between the PEs and AMPs and speeds up processing.
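
Both benefits can be shown with a small model of the two networks. This is a sketch under stated assumptions: the `DualBynet` class is hypothetical, and alternating messages between the two BYNETs (round-robin) is an assumption made for illustration, since the text only says that both can be active.

```python
class DualBynet:
    """Toy model of BYNET 0 and BYNET 1 with load sharing and failover."""

    def __init__(self):
        self.up = {0: True, 1: True}
        self._sent = 0  # number of messages routed so far

    def fail(self, bynet_id):
        """Mark one BYNET as failed."""
        self.up[bynet_id] = False

    def pick(self):
        """Choose the BYNET that carries the next message."""
        available = [b for b in (0, 1) if self.up[b]]
        if not available:
            raise RuntimeError("no BYNET available")
        # Assumed policy: alternate between whichever BYNETs are still up.
        choice = available[self._sent % len(available)]
        self._sent += 1
        return choice


net = DualBynet()
print([net.pick() for _ in range(4)])  # both networks share the traffic

net.fail(0)
print([net.pick() for _ in range(3)])  # BYNET 1 carries everything
```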

What is inside a Teradata Node?

Gateway and channel-driver software run as processes. Users connecting from a mainframe access Teradata through the channel, and all other users use the LAN gateway. The Parallel Database Extension (PDE) controls the Access Module Processors (AMPs) and Parsing Engines (PEs), which are referred to as Virtual Processors (Vprocs) and reside in the node's memory. The operating system running the node is Linux.

Additional Information:

Important Features of Teradata Architecture:

Teradata Parallelism:

  • Each PE can handle up to 120 sessions in parallel.
  • Each session can handle multiple requests.
  • The Message Passing Layer can handle all message activity in parallel.
  • Each AMP can perform up to 80 tasks in parallel (can be configured for more).
  • All AMPs can work together in parallel to service any request.
  • Each AMP can work on several requests in parallel.
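
The limits above translate into simple capacity arithmetic. The sketch below applies them to the 8-PE, 80-AMP system mentioned earlier; the `capacity` function is illustrative, and the result is an upper bound rather than a measured figure.

```python
SESSIONS_PER_PE = 120  # per-PE session limit stated above
TASKS_PER_AMP = 80     # per-AMP default stated above; can be configured higher


def capacity(pe_count, amp_count, tasks_per_amp=TASKS_PER_AMP):
    """Upper bounds on concurrent sessions and AMP tasks for a system."""
    return {
        "max_sessions": pe_count * SESSIONS_PER_PE,
        "max_amp_tasks": amp_count * tasks_per_amp,
    }


print(capacity(pe_count=8, amp_count=80))
```

For 8 PEs and 80 AMPs this gives at most 8 × 120 = 960 sessions and 80 × 80 = 6400 concurrent AMP tasks.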

Linear Growth and Expandability:

  • If you double the number of AMPs while the number of users stays the same, performance doubles.
  • If you double both the number of users and the number of AMPs, performance stays the same.
  • As the number of Parsing Engines increases, the number of SQL requests that can be supported increases.
  • When you add AMPs, the data is spread out more evenly while you also add processing power to handle it.
  • When you add disks, you add space for each AMP to store and process more information.
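
The first two rules describe an idealized linear model, which can be written down directly. The function below is a simplification for illustration only: it assumes performance is proportional to the number of AMPs per user and ignores every other factor.

```python
def relative_performance(amps, users, base_amps, base_users):
    """Idealized linear scaling: performance relative to a baseline system."""
    amp_growth = amps / base_amps
    user_growth = users / base_users
    return amp_growth / user_growth


# Baseline: 40 AMPs serving 100 users (hypothetical numbers).
print(relative_performance(amps=80, users=100, base_amps=40, base_users=100))
print(relative_performance(amps=80, users=200, base_amps=40, base_users=100))
```

The first call models doubling the AMPs with the same users, and the second models doubling both.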
