IBM PureData System for Analytics, Version 7.1

Clustered base tables

A clustered base table (CBT) is a user table that contains data that is organized by using one to four organizing keys. An organizing key is a column of the table that you specify for clustering the table records; IBM® Netezza® uses the organizing keys to group records within the table and save them in the same or nearby extents. Netezza also creates zone maps for the organizing columns to accelerate the performance of queries on that table that restrict using the organizing keys.

The following figure shows a simple model of a table, such as a transaction table. In its unorganized form, the data is organized by the date and time that each transaction occurred, and the color indicates a unique transaction. If your queries on the table most often query by date/time restrictions, those queries run well because the date/time organization matches the common restrictions of the queries.

However, if most queries restrict on transaction type, you can increase query performance by organizing the records by transaction type. Queries that restrict on transaction type will have improved performance because the records are organized and grouped by the key restriction; the query can obtain the relevant records more quickly, whereas they would have to scan much more of the table in the date/time organization to find the relevant transactions. By organizing the data in the table so that commonly filtered data is located in the same or nearby disk extents, your queries can take advantage of zone maps to eliminate unnecessary disk scans to find the relevant records.

Figure 1. Organizing tables with CBTs

CBTs are most often used for large fact or event tables that can have millions or billions of rows. If the table does not have a record organization that matches the types of queries that run against it, scanning the records of such a large table requires a lengthy processing time as full disk scans can be needed to gather the relevant records. By reorganizing the table to match your queries against it, you can group the records to take advantage of zone maps and improve performance.

CBTs offer several benefits:



Feedback | Copyright IBM Corporation 2014 | Last updated: 2014-02-28