Statistics manager

In CQE, the retrieval of statistics is a function of the Optimizer. When the Optimizer needs to know information about a table, it looks at the table description to retrieve the row count and table size. If an index is available, the Optimizer might extract information about the data in the table. In SQE, the collection and management of statistics is handled by a separate component called the statistics manager. The statistics manager leverages all the same statistical sources as CQE, but adds more sources and capabilities.

The statistics manager does not actually run or optimize the query. Instead, it controls the access to the metadata and other information that is required to optimize the query. It uses this information to answer questions posed by the query optimizer. The statistics manager always provides answers to the optimizer. In cases where it cannot provide an answer based on actual existing statistics information, it is designed to provide a predefined answer.

The Statistics manager typically gathers and tracks the following information:

Cardinality of values
The number of unique or distinct occurrences of a specific value in a single column or multiple columns of a table.
Selectivity
Also known as a histogram, this information is an indication of how many rows are selected by any given selection predicate or combination of predicates. Using sampling techniques, it describes the selectivity and distribution of values in a given column of the table.
Frequent values
The top nn most frequent values of a column together with a count of how frequently each value occurs. This information is obtained by using statistical sampling techniques. Built-in algorithms eliminate the possibility of data skewing. For example, NULL values and default values that can influence the statistical values are not taken into account.
Metadata information
Includes the total number of rows in the table, indexes that exist over the table, and which indexes are useful for implementing the particular query.
Estimate of IO operation
An estimate of the amount of IO operations that are required to process the table or the identified index.

The Statistics manager uses a hybrid approach to manage database statistics. Most of this information can be obtained from existing indexes. In cases where the required statistics cannot be gathered from existing indexes, statistical information is constructed on single columns of a table and stored internally. By default, this information is collected automatically by the system, but you can manually control the collection of statistics. Unlike indexes, however, statistics are not maintained immediately as data in the tables change.