Collecting statistics with the statistics manager

The collection of statistics is handled by a separate component called the statistics manager. The query optimizer uses statistical information to determine the best access plan for a query. Because the query optimizer bases its choice of access plan on the statistical information collected for the table, it is important that this information is current.

On many platforms, statistics collection is a manual process that is the responsibility of the database administrator. With IBM® i products, the database statistics collection process is handled automatically, and only rarely is it necessary to update statistics manually.

The statistics manager does not actually run or optimize the query. Instead, it controls access to the metadata and other information that is required to optimize the query, and it uses this information to answer questions posed by the query optimizer. The answers can be derived from table header information, from existing indexes, or from single-column statistics.
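The idea of answering one optimizer question from several possible sources can be sketched in Python. This is only an illustration, not the IBM i implementation: the names TableMetadata and estimate_distinct_values are hypothetical, and the order in which the sources are tried is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TableMetadata:
    """Hypothetical container for the three kinds of information named above."""
    row_count: int  # table header information
    # column name -> number of distinct keys recorded in an existing index
    index_distinct_keys: Dict[str, int] = field(default_factory=dict)
    # column name -> number of distinct values from a single-column statistic
    column_stats: Dict[str, int] = field(default_factory=dict)


def estimate_distinct_values(meta: TableMetadata, column: str) -> Tuple[int, str]:
    """Return (estimate, source) for the number of distinct values in a column.

    The preference order below is an assumption for illustration only.
    """
    if column in meta.column_stats:
        return meta.column_stats[column], "column statistic"
    if column in meta.index_distinct_keys:
        return meta.index_distinct_keys[column], "index"
    # Only the table header is available: the row count is an upper bound
    # on the number of distinct values, so use it as a last resort.
    return meta.row_count, "table header"
```

In this sketch the function never fails to return a value. When no statistic or index covers the column, it falls back to the coarsest information available.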

The statistics manager must always provide an answer to the questions from the query optimizer. It uses the best method available to provide the answers. For example, it could use a single-column statistic or perform a key range estimate over an index. Along with the answer, the statistics manager returns a confidence level, which the optimizer can use to allow greater latitude in its sizing algorithms. For example, if the statistics manager reports low confidence in the number of groups estimated for a grouping request, the optimizer can increase the size of the temporary hash table that it allocates.
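The way a confidence level can feed into a sizing decision can be sketched as follows. This is a minimal illustration under stated assumptions, not the optimizer's actual algorithm: the Estimate type, the 0.0 to 1.0 confidence scale, the linear padding rule, and the max_padding parameter are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Estimate:
    """Hypothetical answer returned to the optimizer."""
    value: int         # estimated number of groups
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def hash_table_size(groups: Estimate, max_padding: float = 4.0) -> int:
    """Size a temporary hash table, padding more when confidence is lower.

    Assumed rule: the padding factor grows linearly from 1.0 at full
    confidence to max_padding at zero confidence.
    """
    if not 0.0 <= groups.confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    padding = 1.0 + (max_padding - 1.0) * (1.0 - groups.confidence)
    return math.ceil(groups.value * padding)
```

Under these assumptions, an estimate of 1000 groups at full confidence yields a table sized for 1000 entries, while the same estimate at a confidence of 0.5 yields a table sized for 2500 entries.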