IBM InfoSphere Federation Server, Version 10.5

Code sets, collating sequences, and national language support

When you create a federated database, you specify the code set, territory, and collating sequence. This information controls the language in which data is stored and the sequence in which character data is sorted.

A code set is a set of unique bit patterns that map to the characters of a specific natural language. IBM® products use the term code page as a synonym for code set. A territory identifies a locale and specifies region-specific information for the specified code set. If you do not specify these options, the database uses the language and collating sequence of the DB2® client that you use to create the database.

Before you create the federated database, determine which code set and territory values to specify. After you create the database, you cannot change these values.

To choose a code set for the federated database, evaluate the code set that each remote data source uses, and choose a code set for the federated database that corresponds to it. If the federated database will access multiple data sources, evaluate the code sets of all of the remote data sources. If the data sources use different or incompatible code sets, specify Unicode as the code set for the federated database.
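The selection rule above can be sketched in a few lines of Python. The function name `choose_code_set` and the use of plain strings for code set names are assumptions made for illustration; they are not part of any DB2 interface. The sketch treats any disagreement between data sources as a reason to fall back to Unicode. Deciding whether two different code sets are in fact compatible is left to you.

```python
def choose_code_set(data_source_code_sets):
    """Suggest a code set for a federated database (illustrative only).

    If every remote data source reports the same code set, reuse it.
    If they differ, fall back to Unicode (UTF-8).
    """
    # Normalize names so that "iso8859-1" and "ISO8859-1" count as the same.
    distinct = {name.strip().upper() for name in data_source_code_sets}
    if not distinct:
        raise ValueError("at least one data source code set is required")
    if len(distinct) == 1:
        return distinct.pop()
    return "UTF-8"


print(choose_code_set(["ISO8859-1", "iso8859-1"]))  # ISO8859-1
print(choose_code_set(["ISO8859-1", "IBM-943"]))    # UTF-8
```

Because the code set cannot be changed after the database is created, a check of this kind is worth doing before you issue the CREATE DATABASE command.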

For many data sources, the first time that a wrapper connects to a data source, the wrapper performs these tasks:

Determines the code page and territory of the federated database.
Maps the code set and territory to a data source client locale, if the data source supports one.
Sets an environment variable, calls a data source API to tell the data source what the client locale is, or prepares to perform code set conversion.
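The tasks above can be pictured as a table lookup followed by an environment update. The following Python sketch is a simplified model, not the wrapper's actual implementation: the names `CLIENT_LOCALES` and `client_environment` are hypothetical, and the single table entry is the Oracle locale that this topic uses as its example.

```python
import os

# Hypothetical lookup table. Only the (819, "US") entry comes from this
# topic; a real wrapper would carry many more mappings.
CLIENT_LOCALES = {
    (819, "US"): "American_America.WE8ISO8859P1",
}


def client_environment(code_page, territory, base_env=None):
    """Return a copy of the environment with NLS_LANG set, if a mapping exists."""
    env = dict(os.environ if base_env is None else base_env)
    locale_name = CLIENT_LOCALES.get((code_page, territory.upper()))
    if locale_name is not None:
        env["NLS_LANG"] = locale_name
    return env


print(client_environment(819, "us", base_env={}))
# {'NLS_LANG': 'American_America.WE8ISO8859P1'}
```

The function returns a new dictionary instead of changing `os.environ`, so the sketch has no side effects on the running process. When no mapping exists, the environment is returned unchanged, which corresponds to the case where the wrapper prepares to perform the conversion itself.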

Code page conversion involves converting character data between the code page of the data source database and the code page of the federated database. Some data sources perform code page conversion. For some data sources that do not perform code page conversion, the wrapper performs the conversion.

For example, if the federated database uses code page 819, territory US, the equivalent Oracle client locale is American_America.WE8ISO8859P1. The Oracle wrapper automatically sets the NLS_LANG environment variable to the Oracle client locale value. Then when data is sent from the Oracle database to the wrapper, the Oracle database converts the data from code set American_America.WE8ISO8859P1 to code page 819. When data is sent from the wrapper to the Oracle database, the Oracle server or client converts the data from code page 819 to the code set that the Oracle database uses.
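To make code page conversion concrete, the following Python sketch re-encodes a string between code page 819 (ISO 8859-1) and UTF-8. It uses Python's built-in codecs, not anything from the federated server, and the helper name `convert` is an assumption for illustration. Python registers `cp819` as an alias of its ISO 8859-1 codec.

```python
import codecs

# "cp819" resolves to Python's ISO 8859-1 (Latin-1) codec.
print(codecs.lookup("cp819").name)  # iso8859-1


def convert(data, source_code_page, target_code_page):
    """Re-encode bytes from one code page to another."""
    return data.decode(source_code_page).encode(target_code_page)


latin1_bytes = "café".encode("cp819")
print(latin1_bytes)                             # b'caf\xe9'
print(convert(latin1_bytes, "cp819", "utf-8"))  # b'caf\xc3\xa9'
```

The same character occupies one byte in code page 819 and two bytes in UTF-8, which is why data must be converted, not simply copied, when the two sides use different code pages.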

The collating sequence is related to the language that the federated server and the data source server support. To specify the collating sequence for the federated database, include the COLLATE USING option on the CREATE DATABASE command. If the federated database and the data source use the same collating sequence, set the COLLATING_SEQUENCE server option to 'Y' when you issue the CREATE SERVER statement.

The collating sequence that you specify for the federated database affects where queries that involve character sorting or character comparisons are performed. By default, the federated database uses a case-sensitive collating sequence. However, some data sources use a case-insensitive collating sequence as the default. A data source might also allow the collating sequence to be customized or might offer multiple options for setting the default code page.

If the collating sequences of the federated database and the data source differ, a query might not return the results that you expect. For example, if the query involves character sorting, the correct rows are returned, but not necessarily in the order that you expect. If the query involves character comparisons, incorrect results might be returned.
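The effect of mismatched collating sequences can be illustrated with a short Python sketch. It is a simplified model under two assumptions: code point ordering stands in for a case-sensitive collating sequence, and case folding stands in for a case-insensitive one. Actual collating sequences on a federated server or data source can be more elaborate. The helper name `matches` is hypothetical.

```python
names = ["apple", "Banana", "cherry", "banana"]

# Case-sensitive ordering by code point: uppercase letters sort first.
case_sensitive = sorted(names)
# Case-insensitive ordering: compare case-folded keys instead.
case_insensitive = sorted(names, key=str.casefold)

print(case_sensitive)    # ['Banana', 'apple', 'banana', 'cherry']
print(case_insensitive)  # ['apple', 'Banana', 'banana', 'cherry']


def matches(rows, value, case_sensitive=True):
    """Return the rows that equal value under the chosen comparison rule."""
    if case_sensitive:
        return [row for row in rows if row == value]
    return [row for row in rows if row.casefold() == value.casefold()]


print(matches(names, "banana"))                        # ['banana']
print(matches(names, "banana", case_sensitive=False))  # ['Banana', 'banana']
```

Sorting returns the same rows in a different order, whereas the comparison returns a different set of rows. This is the distinction between unexpected ordering and incorrect results.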

Where character-based operations are performed has an effect on query performance. When the collating sequences differ, the federated server performs character sorting and character comparisons locally to ensure that consistently sorted result sets are returned. For best results, set the federated database collating sequence to the same sequence that the data source uses. Then, where possible, the query optimizer pushes down character-based operations to the data source, so that the data source, not the federated database, performs the operations.
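The following Python sketch models where a character sort runs, based only on the COLLATING_SEQUENCE server option. It is a deliberate simplification: the function names are hypothetical, any value other than 'Y' is treated as "not the same sequence", and the real query optimizer weighs other factors that this topic does not cover. The data source is assumed to sort without regard to case, and the federated server by code point.

```python
def sort_location(collating_sequence_option):
    """Return where a character sort would run in this simplified model."""
    if collating_sequence_option.strip().upper() == "Y":
        return "data source"       # eligible for pushdown
    return "federated server"      # performed locally


def run_order_by(rows, collating_sequence_option, remote_key=str.casefold):
    """Simulate ORDER BY on character data.

    remote_key models the data source's collating sequence
    (case-insensitive by assumption).
    """
    if sort_location(collating_sequence_option) == "data source":
        # The data source sorts; the result is passed through unchanged.
        return sorted(rows, key=remote_key)
    # The federated server sorts locally, by code point in this model.
    return sorted(rows)


rows = ["apple", "Banana", "cherry"]
print(run_order_by(rows, "N"))  # ['Banana', 'apple', 'cherry']
print(run_order_by(rows, "Y"))  # ['apple', 'Banana', 'cherry']
```

The second call shows why the option should be set to 'Y' only when the sequences really match: the sort is pushed down, and the rows come back in the data source's order, not the federated database's order.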
