
Preventing transitive data skew

Troubleshooting


Problem

Transitive skew is skew introduced during query execution. It is identified by reviewing usage of the swap partition (as opposed to the primary/mirror partitions, as is the case with fixed skew), and it clears up once the query completes.

Diagnosing The Problem

How does transitive data skew affect system performance?

Transitive data skew, if significant enough and sustained for a prolonged period, can seriously impact query performance by distributing query processing unevenly, negating some of the benefits of Netezza's massively parallel architecture. The best defense is understanding why it occurs: planning, updating statistics, and leveraging the sampling mechanisms provided by Netezza are all part of preventing it.

The most common cause of data skew is a lack of collocation between like objects. When two objects are joined on a column that is not the distribution key for either, the Optimizer must decide how best to collocate them so that every device has visibility to the rows needed to accurately process the join. In this situation, the Optimizer has only two methods at its disposal:


1. Broadcast one of the tables
2. Re-distribute both tables on the column(s) they are being joined on
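
To make the situation concrete, here is a minimal sketch with hypothetical tables (the names sales, customers, and region_code are illustrative, not from a real schema). Neither table is distributed on the join column, so the Optimizer must pick one of the two methods above:

  -- Both tables carry region_code, but neither is distributed on it.
  CREATE TABLE sales (
      sale_id     BIGINT,
      region_code INT,
      amount      NUMERIC(12,2)
  ) DISTRIBUTE ON (sale_id);

  CREATE TABLE customers (
      cust_id     INT,
      region_code INT,
      cust_name   VARCHAR(100)
  ) DISTRIBUTE ON (cust_id);

  -- This join forces the Optimizer to broadcast one table or
  -- re-distribute both on region_code before the join can run:
  SELECT c.cust_name, SUM(s.amount)
  FROM sales s
  JOIN customers c ON s.region_code = c.region_code
  GROUP BY c.cust_name;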

If option 1 is chosen, we would not expect the swap partition on one or a few devices to grow larger than the others, because the broadcast copies an equal amount of data to every device in the system. However, if the table being broadcast is very large (total size greater than the swap partition on a single device), the query could fail with the following error:

SPU swap partition : Disk temporary work space is full

However, if the table whose distribution was preserved is already skewed, then we would expect to see some processing skew, with the skewed devices taking the longest to complete particular snippet operations.

If option 2 is chosen and the tables involved are large, it is easy to see how this could produce transitive data skew. Typically, the columns with the highest cardinality are chosen as distribution keys. Joining to another large table on a column without high cardinality can result in a single device, or a few devices, becoming the target of the re-distribution task. When this happens, those devices take inordinately longer to process this snippet and any other snippet that might be running concurrently. Worse, if the column has very low cardinality, you could again encounter the same error:

SPU swap partition : Disk temporary work space is full
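
Low cardinality in a candidate join or distribution column can be spotted ahead of time with ordinary SQL. A quick sketch against the hypothetical sales table above (names are illustrative): a small distinct count, or one dominant value, means a re-distribution on that column would funnel most rows to a few devices.

  -- How many distinct values does the candidate column have?
  SELECT COUNT(DISTINCT region_code) FROM sales;

  -- Are a few values dominant? Each heavy value hashes to a single
  -- device, which would receive a disproportionate share of the rows.
  SELECT region_code, COUNT(*) AS row_cnt
  FROM sales
  GROUP BY region_code
  ORDER BY row_cnt DESC
  LIMIT 10;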

Resolving The Problem

Recommendations for Defending Against Transitive Data Skew
1. Frequently review your query history, paying close attention to longer-running queries. Run these queries during off-peak hours and review the plans produced (see the EXPLAIN sketch after this list). Are they spending a long time in a snippet that includes a re-distribution task? If so, it may be time to revisit the distribution keys chosen for the tables involved. If further analysis is required, open a support incident and attach the query plan and the schema(s) involved.
2. Generate statistics regularly (see the GENERATE STATISTICS sketch after this list). In addition to cleaning up zone map information, generating statistics recalculates dispersion information when appropriate. When this information is available, the Optimizer can use it to recalculate the estimated cost of distributing a table on a given column.
3. Ensure the Netezza-developed sampling techniques are enabled (the confirmation commands are collected in a sketch after this list). The following features all use real-time sampling to better understand the data being queried, helping ensure that an appropriate plan is produced:
1. JIT Statistics -- sample scans are used to better estimate the number of rows surviving a particular restriction. This is only used when the table's row count is greater than jit_stats_min_rows and the table is involved in a join and has a restriction against it (for example, WHERE column1 = 5). To confirm it is enabled, run 'show enable_jit_stats'.
2. JIT Dispersion (NPS 4.6+) -- for tables with more rows than jit_disp_min_rows, the system uses sample scans to estimate dispersion rather than calculating this statistic when statistics are generated. To confirm it is enabled, run 'show enable_jit_dispersion'.
3. JIT Join (NPS 4.6+) -- this feature uses sampling to better estimate the number of rows that will survive a particular join.  To confirm this is enabled, run 'show enable_jit_join'.
4. Avoid setting distribution keys on very small dimension tables (see the DISTRIBUTE ON RANDOM sketch after this list). This ensures that the Optimizer never inadvertently chooses to preserve the distribution of a small dimension table and redistribute the larger table instead.
5. Periodically review your Fact Relation Planner settings to make sure they properly account for increases in data volume (see the final sketch after this list). The setting of most concern is factrel_size_threshold; this parameter is the row count above which a table is designated as a fact table. The designation does not prevent data movement of these tables, but it does increase the perceived cost of moving them. By increasing that cost, the Optimizer is more likely to consider alternatives in search of a more effective way to collocate data.
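
For recommendation 1, plans can be produced from an nzsql session with EXPLAIN; the query below is a placeholder standing in for one of your long-running queries (it reuses the hypothetical tables from the earlier sketch). Snippets containing a broadcast or re-distribution step are the ones to scrutinize.

  -- Produce a detailed plan without executing the query:
  EXPLAIN VERBOSE
  SELECT c.cust_name, SUM(s.amount)
  FROM sales s
  JOIN customers c ON s.region_code = c.region_code
  GROUP BY c.cust_name;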
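
For recommendation 2, statistics are refreshed with GENERATE STATISTICS (the table name here is the hypothetical one from the earlier sketches):

  -- Full statistics, including dispersion where appropriate, for one table:
  GENERATE STATISTICS ON sales;

  -- Or for every table in the current database:
  GENERATE STATISTICS;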
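
For recommendation 3, the three confirmation commands can be run together in one nzsql session. The SET form for enabling a feature per session is an assumption based on the usual Postgres-style variable handling in NPS; system-wide changes may go through the system registry instead, so verify on your release.

  SHOW enable_jit_stats;
  SHOW enable_jit_dispersion;
  SHOW enable_jit_join;

  -- Assumption: if a feature reports 'off', it can typically be
  -- enabled for the current session like this:
  SET enable_jit_stats = on;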
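
For recommendation 4, a small dimension table can be created with DISTRIBUTE ON RANDOM (table and column names are illustrative). Random distribution spreads the rows round-robin and leaves the Optimizer free to broadcast the small table rather than preserving its distribution:

  CREATE TABLE region_dim (
      region_code INT,
      region_name VARCHAR(50)
  ) DISTRIBUTE ON RANDOM;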
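
For recommendation 5, the sketch below assumes factrel_size_threshold is exposed through SHOW and SET like the JIT variables above; on some releases it may instead be a registry setting, so verify before relying on this.

  -- Inspect the current fact-table row-count threshold:
  SHOW factrel_size_threshold;

  -- Raise it when data volumes have grown well past the old value
  -- (the figure below is purely illustrative):
  SET factrel_size_threshold = 100000000;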

[{"Product":{"code":"SSULQD","label":"IBM PureData System"},"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Component":null,"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"1.0.0","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Historical Number

NZ735420

Document Information

Modified date:
17 October 2019

UID

swg21515681