Applications from any code page environment can connect
to a Unicode database. For applications that connect to a Unicode
database, the database manager converts character string data between
the application code page and the database code page (UTF-8).
When DB2® for Linux, UNIX, and Windows converts characters from a
code page to UTF-8, the total number of bytes that represent the
characters can expand or shrink, depending on the code page and the
code points of the characters. 7-bit ASCII remains invariant in UTF-8,
and each ASCII character requires one byte. Each non-ASCII character
requires two or more bytes in UTF-8. For more information about UTF-8
conversions, see the Unicode standard documents.
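The change in byte count can be illustrated with a short Python sketch. It uses Python's built-in codecs as stand-ins for the database manager's own conversion tables, which might differ in detail, and the helper name utf8_size_change is hypothetical. The sample code pages (cp1252, shift_jis, gb18030) are chosen only for illustration.

```python
# Illustrative only: Python codecs approximate, but are not, the DB2
# conversion tables. utf8_size_change is a hypothetical helper name.

def utf8_size_change(text, source_codec):
    """Return (byte length in the source code page, byte length in UTF-8)."""
    source_bytes = text.encode(source_codec)
    # Convert the way a client-to-database conversion would: decode from
    # the application code page, then encode to UTF-8.
    utf8_bytes = source_bytes.decode(source_codec).encode("utf-8")
    return len(source_bytes), len(utf8_bytes)

samples = [
    ("A", "cp1252"),          # 7-bit ASCII: size is unchanged
    ("\u00e9", "cp1252"),     # Latin small e with acute: expands
    ("\u65e5", "shift_jis"),  # a kanji character: expands
    ("\u00c0", "gb18030"),    # Latin capital A with grave: shrinks
]

for text, codec in samples:
    before, after = utf8_size_change(text, codec)
    print(f"U+{ord(text):04X} from {codec}: {before} byte(s) -> {after} byte(s) in UTF-8")
```

When sizing CHAR and VARCHAR columns in a Unicode database, a check of this kind on representative data can help estimate how much extra room the converted strings might need.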
Note: The information that applies
to applications in mixed code sets also applies to applications that
connect to Unicode databases.
For a Unicode database, GRAPHIC data is in UTF-16 big-endian order.
If you use the command line processor to retrieve graphic data, the
graphic characters are also converted to the client code page. This
conversion allows the command line processor to display graphic
characters in the current font. Data loss can occur whenever
the database manager converts UTF-16 characters to a client code page. Characters
that the database manager cannot convert to a valid character in the
client code page are replaced with the default substitution character
in that code page.
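The following Python sketch illustrates both points: the UTF-16 big-endian byte layout, and the loss that occurs when characters have no equivalent in the client code page. It is an approximation under stated assumptions: Python's "replace" error handler substitutes a question mark, whereas the substitution character that the database manager uses depends on the client code page. The function names and the cp1252 client code page are hypothetical choices for illustration.

```python
# Illustrative only: function names are hypothetical, and Python's '?'
# replacement stands in for a code page's default substitution character.

def to_graphic_bytes(text):
    """Encode text as UTF-16 big-endian (high-order byte first)."""
    return text.encode("utf-16-be")

def to_client_code_page(graphic_bytes, client_codec):
    """Convert UTF-16BE bytes to a client code page, substituting
    a replacement for every character that cannot be converted."""
    text = graphic_bytes.decode("utf-16-be")
    return text.encode(client_codec, errors="replace")

stored = to_graphic_bytes("\u65e5\u672c\u8a9e donn\u00e9es")
print(stored[:2].hex())  # first character U+65E5, high-order byte first

client_bytes = to_client_code_page(stored, "cp1252")
print(client_bytes.decode("cp1252"))  # the three kanji are replaced
```

The substituted characters cannot be recovered from the converted result, which is why the conversion is lossy for any character outside the client code page.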
Starting with DB2 Version 8, the database manager checks the code page
setting of the client, and performs all required conversions for
UTF-16 GRAPHIC data. For example, if a non-Unicode application sends
GRAPHIC data, DB2 for Linux, UNIX, and Windows converts the GRAPHIC
data to UTF-16 before the data is stored in a Unicode database.
Conversely, if a non-Unicode application requests GRAPHIC data from a
Unicode database, DB2 for Linux, UNIX, and Windows converts the GRAPHIC
data to the code page of the application before the application can
access the data.
Note: The following restrictions apply:
- When the DB2 Export utility is used to export DBCLOB data to an EUC
code page with the LOBSINFILE or LOBSINSEPFILE file type modifier, the
resulting LOB file contains EUC data instead of UCS-2 data.
- When GRAPHIC data is retrieved from a Unicode database to a non-SBCS,
non-EUC, or non-Unicode application, DB2 for Linux, UNIX, and Windows substitutes
an ASCII blank character (U+0020) for each blank that is padded to
the UTF-16 GRAPHIC column. The substitution is performed because pure
DBCS code pages have no equivalent to the UTF-16 blank.
- When DATE, TIME, and TIMESTAMP data is retrieved from a Unicode
database as a GRAPHIC data type to a non-SBCS, non-EUC, or non-Unicode
application, DB2 for Linux, UNIX, and Windows converts
these data types to the substitution character. The substitution is
performed because the UTF-16 data types contain SBCS characters that
have no equivalent in pure DBCS code pages.
Before Version 8, DB2 for Linux, UNIX, and Windows did
not perform any automatic conversion of UTF-16 GRAPHIC data. Non-Unicode
applications performed the necessary conversions to and from Unicode
themselves, or set the WCHARTYPE CONVERT option and used wchar_t.
If a Version 7 client connects to a DB2 Version 8 server, the database
manager, by default, does not perform data conversion for UTF-16
GRAPHIC data. If you want to override this default behavior, you can
set the DB2GRAPHICUNICODESERVER registry variable to OFF.
For applications that connect to DBCS databases, GRAPHIC data is
converted between the application DBCS code page and the database
DBCS code page.