DB2 Version 10.1 for Linux, UNIX, and Windows

Applications connected to Unicode databases

Applications from any code page environment can connect to a Unicode database. For applications that connect to a Unicode database, the database manager converts character string data between the application code page and the database code page (UTF-8).

When DB2® for Linux, UNIX, and Windows converts characters from a code page to UTF-8, the total number of bytes that represent the characters can expand or shrink, depending on the code page and the code points of the characters. 7-bit ASCII remains invariant in UTF-8, and each ASCII character requires one byte. Each non-ASCII character requires two to four bytes in UTF-8. For more information about UTF-8 conversions, see the Unicode Standard.
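To see how the byte count changes, you can compare the length of the same string in a legacy code page and in UTF-8. The following minimal sketch uses Python codecs as stand-ins for DB2 code pages. The mapping is an assumption: "cp1252" approximates Windows code page 1252, and "cp932" approximates the Japanese code page that DB2 calls IBM-943. The helper name byte_lengths is hypothetical.

```python
# Compare byte lengths of the same text in a legacy code page and in UTF-8.
# Python codec names are assumed stand-ins for DB2 code pages.

def byte_lengths(text: str, codec: str) -> tuple[int, int]:
    """Return (bytes in `codec`, bytes in UTF-8) for `text`."""
    return len(text.encode(codec)), len(text.encode("utf-8"))

samples = [
    ("ABC", "cp1252"),   # 7-bit ASCII: one byte per character in both
    ("café", "cp1252"),  # 'é' is 1 byte in cp1252 but 2 bytes in UTF-8
    ("日本", "cp932"),    # each kanji is 2 bytes in cp932 but 3 in UTF-8
    ("ｱｲ", "cp932"),     # half-width katakana: 1 byte in cp932, 3 in UTF-8
]

for text, codec in samples:
    legacy, utf8 = byte_lengths(text, codec)
    print(f"{text!r}: {codec}={legacy} bytes, UTF-8={utf8} bytes")
```

Column lengths that are sized in bytes for the application code page might therefore be too small for the same data in UTF-8.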

Note: The information that applies to applications in mixed code sets also applies to applications that connect to Unicode databases.

For a Unicode database, GRAPHIC data is in UTF-16 big-endian order. If you use the command line processor to retrieve graphic data, the graphic characters are also converted to the client code page. This conversion allows the command line processor to display graphic characters in the current font. Data loss can occur whenever the database manager converts UTF-16 characters to a client code page. Characters that the database manager cannot convert to a valid character in the client code page are replaced with the default substitution character in that code page.
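The following minimal sketch shows why converting UTF-16 GRAPHIC data to a client code page can lose data. It decodes big-endian UTF-16 bytes and re-encodes each character in a client code page, writing a substitution byte for any character that has no mapping. The function name graphic_to_client is hypothetical, Python's "cp1252" codec stands in for the client code page, and the substitution byte X'1A' is an assumption; the actual substitution character depends on the code page.

```python
# Hypothetical helper: convert GRAPHIC bytes (UTF-16 big-endian) to a
# client code page, substituting characters that cannot be represented.
SUB = b"\x1a"  # assumed substitution character for an ASCII-based SBCS code page

def graphic_to_client(graphic: bytes, codec: str, sub: bytes = SUB) -> bytes:
    text = graphic.decode("utf-16-be")
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codec)
        except UnicodeEncodeError:
            # Data loss: the original character cannot be recovered later.
            out += sub
    return bytes(out)

stored = "Zürich 東京".encode("utf-16-be")  # GRAPHIC data as held in the database
client = graphic_to_client(stored, "cp1252")
print(client)  # b'Z\xfcrich \x1a\x1a'
```

The two kanji have no equivalent in code page 1252, so each one becomes a substitution byte, while 'ü' maps to a valid single byte.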

Starting with DB2 Version 8, the database manager checks the code page setting of the client, and performs all required conversions for UTF-16 GRAPHIC data. For example, if a non-Unicode application sends GRAPHIC data, DB2 for Linux, UNIX, and Windows converts the GRAPHIC data to UTF-16 before the data is stored in a Unicode database. Conversely, if a non-Unicode application requests GRAPHIC data from a Unicode database, DB2 for Linux, UNIX, and Windows converts the GRAPHIC data to the code page of the application before the application can access the data.
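The two directions of this conversion can be sketched as a pair of functions. This is an illustration under stated assumptions, not the database manager's implementation: the names to_database and to_application are hypothetical, and Python's "cp932" codec stands in for a Japanese application code page such as IBM-943.

```python
APP_CODEC = "cp932"  # assumed application code page (stand-in for IBM-943)

def to_database(app_bytes: bytes, app_codec: str = APP_CODEC) -> bytes:
    """Inbound: application code page -> UTF-16BE before the data is stored."""
    return app_bytes.decode(app_codec).encode("utf-16-be")

def to_application(db_bytes: bytes, app_codec: str = APP_CODEC) -> bytes:
    """Outbound: UTF-16BE from the database -> application code page.

    Raises UnicodeEncodeError if a stored character has no mapping in
    the application code page (the database manager would substitute it).
    """
    return db_bytes.decode("utf-16-be").encode(app_codec)

app_value = "東京".encode(APP_CODEC)  # GRAPHIC data as the application sends it
stored = to_database(app_value)        # UTF-16BE: U+6771 U+4EAC
assert to_application(stored) == app_value
```

Data that originated in the application code page always survives the round trip; loss occurs only for stored characters that the application code page cannot represent.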

Note: The following restrictions apply:

When the DB2 Export utility is used to export DBCLOB data to an EUC code page with the LOBSINFILE or LOBSINSEPFILE file type modifier, the resulting LOB file contains EUC data instead of UCS-2 data.
When GRAPHIC data is retrieved from a Unicode database to an application whose code page is not SBCS, EUC, or Unicode, DB2 for Linux, UNIX, and Windows substitutes a DBCS blank for each ASCII blank (U+0020) that pads the UTF-16 GRAPHIC column. The substitution is performed because pure DBCS code pages have no equivalent to the UTF-16 ASCII blank.
When DATE, TIME, and TIMESTAMP data is retrieved from a Unicode database as a GRAPHIC data type to an application whose code page is not SBCS, EUC, or Unicode, DB2 for Linux, UNIX, and Windows replaces the characters of these values with the substitution character. The substitution is performed because the UTF-16 representations of these values contain SBCS characters, such as digits and separators, that have no equivalent in pure DBCS code pages.

Before Version 8, DB2 for Linux, UNIX, and Windows did not perform any automatic conversion of UTF-16 GRAPHIC data. Non-Unicode applications performed the necessary conversions to and from Unicode themselves, or set the WCHARTYPE CONVERT option and used wchar_t. If a Version 7 client connects to a DB2 Version 8 server, the database manager, by default, does not perform data conversion for UTF-16 GRAPHIC data. If you want to override this default behavior, you can set the DB2GRAPHICUNICODESERVER registry variable to OFF.

For applications that connect to DBCS databases, GRAPHIC data is converted between the application DBCS code page and the database DBCS code page.
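For example, an application that uses a Shift-JIS code page might connect to a database that uses Japanese EUC. The following minimal sketch assumes Python's "cp932" and "euc_jp" codecs as stand-ins for those code pages; the function name convert_graphic is hypothetical. Python pivots through Unicode, whereas the database manager may use direct conversion tables.

```python
def convert_graphic(data: bytes, from_codec: str, to_codec: str) -> bytes:
    """Convert GRAPHIC bytes from one DBCS code page to another.

    Decoding to a Python str and re-encoding is an assumed pivot approach;
    it raises an error if a character has no mapping in `to_codec`.
    """
    return data.decode(from_codec).encode(to_codec)

app_bytes = "東京".encode("cp932")  # application code page (Shift-JIS family)
db_bytes = convert_graphic(app_bytes, "cp932", "euc_jp")
print(db_bytes)  # b'\xc5\xec\xb5\xfe'
```

The characters are unchanged; only their byte representation differs between the two code pages.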