Help with multibyte locales

Question & Answer

Question

What do you need to know to set up multi-byte or different locale databases in IBM Informix Servers?

Cause

You currently have a typical en_us (US English) database, which uses the default locale and needs no extra setup. You want to expand your company database into another country and use a multibyte character set or a different locale for the database. What is needed for the setup?

Answer

There are several issues to consider when setting up a database with a non-US locale. They are summarized here and discussed below:

An understanding of codesets and their impact on a database.
An understanding of default vs nondefault locale codesets.
An understanding of Unicode vs language-specific codeset usage.

Codesets and their impact

The ASCII codeset has 128 characters (95 of them printable), each encoded in 7 bits of a single byte. Many characters used in other languages cannot be represented by a single byte value. One example of a multibyte codeset is the Japanese Shift JIS codeset, which encodes Kanji and other Japanese characters; its Informix locale name is ja_jp.sjis. SJIS encodes each character in 1 or 2 bytes, and some codesets use up to 4 bytes to represent a single character. There are several issues to consider in using any codeset:
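To see how the number of bytes per character varies by codeset, the following Python sketch encodes a few sample characters with Python's built-in codecs. The codec names (`ascii`, `shift_jis`, `gb18030`, `utf-8`) are Python's, not Informix locale names, and the sample characters are arbitrary choices for illustration.

```python
# Compare how many bytes a character needs in several codesets.
# Codec names are Python's built-in codecs, used here only for illustration.
SAMPLES = {
    "A": "Latin capital A",
    "日": "CJK ideograph (Japanese/Chinese)",
    "😀": "emoji outside the Basic Multilingual Plane",
}
CODECS = ["ascii", "shift_jis", "gb18030", "utf-8"]


def byte_length(char: str, codec: str):
    """Return the encoded length in bytes, or None if the codeset cannot represent the character."""
    try:
        return len(char.encode(codec))
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for char, description in SAMPLES.items():
        sizes = {codec: byte_length(char, codec) for codec in CODECS}
        print(f"{description}: {sizes}")
```

In this sketch the Latin letter takes 1 byte everywhere, the ideograph takes 2 bytes in Shift JIS and GB18030 but 3 in UTF-8, and the emoji cannot be encoded in ASCII or Shift JIS at all while UTF-8 and GB18030 need 4 bytes.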

Column definitions: Because the number of bytes per character varies, choose the data type of each character column carefully: CHAR or NCHAR (or VARCHAR or NVARCHAR). The data type you choose for a column determines its collation order.

The CHAR data type follows code set order.
The NCHAR data type follows the collation order determined by DB_LOCALE and can be single-byte or multibyte.
The LVARCHAR data type can be single-byte or multibyte and can hold data in the code set of the client or database locale, provided that you write input and output support functions that interpret the LVARCHAR data in the correct locale.
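The difference between code set order (CHAR) and a locale-aware collation (NCHAR) can be illustrated in Python. The `locale_like_key` function below is a hypothetical, much-simplified stand-in for a locale collation (it ignores accents and case); it is not the algorithm that Informix GLS or ICU actually uses.

```python
import unicodedata

WORDS = ["zebra", "Apple", "éclair", "apple", "Eclair"]


def code_set_key(word: str, codeset: str = "utf-8") -> bytes:
    """CHAR-style ordering: compare the raw encoded bytes of each value."""
    return word.encode(codeset)


def locale_like_key(word: str) -> tuple:
    """Hypothetical NCHAR-style ordering: strip accents and ignore case first,
    then break ties on the original string so the sort is deterministic."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), word)


if __name__ == "__main__":
    print("code set order:   ", sorted(WORDS, key=code_set_key))
    print("locale-like order:", sorted(WORDS, key=locale_like_key))
```

In byte order every uppercase letter sorts before every lowercase letter and "éclair" lands after "zebra"; the locale-like key keeps the variants of each word together, which is the kind of behavior a locale collation provides for NCHAR columns.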

Default vs nondefault codesets

A default codeset is relative to the language base in which it is used. Some languages share the letter characters of English but also have character variations, and those variations are the reason some codesets are nondefault. Some considerations for default and nondefault codesets:

Default codesets depend on the platform and allow for character variations. If your database connects with another default codeset, the connection is supported automatically.
Nondefault codesets may or may not support default codesets. If your database connects with a nondefault codeset, it may or may not work correctly. Nondefault locales that work with the default English UNIX codeset (8859-1) include British English (en_gb.8859-1), French (fr_fr.8859-1), Spanish (es_es.8859-1), and German (de_de.8859-1).
Nondefault locales, such as Japanese SJIS (ja_jp.sjis), Korean (ko_kr.ksc), and Chinese (zh_cn.gb), contain multibyte codesets. (The unified Chinese codeset is GB18030-2000.)
When the client and server use different nondefault locales, data movement can be complex. A character may have no mapped equivalent between the client and server code sets at the time of transfer. In some cases, inserting a nondefault character into a database with a different locale means that the result does not display correctly or is replaced by an incorrect character. Informix Servers support only one locale per database. When data is converted from one code set to another, the conversion is handled by a code-set conversion object (.cvo) file, and it can only happen if the appropriate rules, locales, and .cvo files are present. For more information, see the section Performing Code Set Conversion in the IBM Informix GLS User's Guide (GLS = Global Language Support).
Unicode codesets permit much greater flexibility. Rather than using a separate standard for each language, Unicode assigns a unique number to every character, no matter what the platform, program, or language. If you want a database that handles two or more locales at the same time, use a Unicode locale. Unicode text can be encoded as UTF-8 (8-bit code units), UTF-16, or UTF-32: English and European languages mostly use UTF-8, some Asian locales use UTF-16, and some codesets allow for UTF-32. Informix Servers work only with UTF-8.
Collation order determines how the characters of a codeset are arranged for sorting and indexing, and the locale in use determines that order. When Unicode is in use, GLS for Unicode (GLU) is a feature that lets your application use the International Components for Unicode (ICU) libraries instead of the usual GLS libraries. The main advantage of the ICU libraries is that they take the locale into account when collating Unicode characters; the GLS libraries do not. To force use of GLS for Unicode library collation, set GL_USEGLU=1 in the client and server environment, or compile your application with the -glu option of the esql command. For more information on compiling, see the ESQL/C Programmer's Guide.
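When a character has no equivalent in the target codeset, a conversion must either fail or substitute something. The Python sketch below imitates that choice with the standard codecs. It does not use Informix .cvo files, and the `?` replacement is simply Python's behavior for `errors="replace"`, not necessarily what a GLS conversion produces.

```python
def convert(text: str, target_codeset: str, strict: bool = False) -> bytes:
    """Convert text to a target codeset.

    strict=True raises UnicodeEncodeError for unmappable characters;
    strict=False substitutes each one with '?' (Python's 'replace' handler).
    """
    errors = "strict" if strict else "replace"
    return text.encode(target_codeset, errors=errors)


if __name__ == "__main__":
    # 'é' exists in ISO 8859-1 (Python codec "latin-1"); the Japanese ideographs do not.
    text = "Café 東京"
    print(convert(text, "latin-1"))  # b'Caf\xe9 ??'
    try:
        convert(text, "latin-1", strict=True)
    except UnicodeEncodeError as exc:
        print("first unmappable character at index", exc.start)
```

The silent substitution in the first call is the "incorrect character substitution" case described above; the strict call corresponds to a conversion that refuses the data instead.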

Codeset installation and setup

If you want an entire software instance in a specific language, first set the operating-system environment variable LANG for that language. For example, during installation a user can specify, when prompted, the French language as spoken in Canada. The code set automatically defaults to ISO8859-1. With this information, the system sets the default locale, specified by the LANG environment variable, to fr_CA (fr for French and CA for Canada). Every process uses this locale unless the LC_* or LANG environment variables are modified. (Note: The default locale assumes the 7-bit ASCII character set; however, Extended Parallel Server (XPS) rejects any filename that is not 7-bit ASCII.)
The available codesets are in the $INFORMIXDIR/gls directory. Many additional languages, especially UTF-8 language sets, are included in the International Language Support (ILS) product, available as a separate purchase.
Three environment variables are involved in any change of locale: CLIENT_LOCALE, DB_LOCALE, and SERVER_LOCALE.
The setup steps for using different languages and codesets are documented in the IBM Informix GLS User's Guide.
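The locale names in this note follow a language_territory.codeset pattern (for example ja_jp.sjis or en_gb.8859-1). The Python sketch below is a hypothetical pre-flight check, not an Informix tool: `parse_locale` and `check_locales` are illustrative names. It parses the three locale variables and reports whether the client and database codesets differ, which is the situation in which code set conversion is needed. It assumes all three variables are set explicitly and does not model any server defaults.

```python
from typing import Mapping, NamedTuple

LOCALE_VARS = ("CLIENT_LOCALE", "DB_LOCALE", "SERVER_LOCALE")


class Locale(NamedTuple):
    language: str
    territory: str
    codeset: str


def parse_locale(name: str) -> Locale:
    """Split a name such as 'ja_jp.sjis' into language, territory, and codeset.

    In this sketch any '@modifier' suffix is discarded and names are lowercased.
    """
    base = name.split("@", 1)[0].lower()
    lang_territory, dot, codeset = base.partition(".")
    language, underscore, territory = lang_territory.partition("_")
    if not (dot and underscore and language and territory and codeset):
        raise ValueError(f"unexpected locale name: {name!r}")
    return Locale(language, territory, codeset)


def check_locales(env: Mapping[str, str]) -> list:
    """Return human-readable findings about the three locale variables."""
    missing = [var for var in LOCALE_VARS if not env.get(var)]
    if missing:
        return [f"not set: {', '.join(missing)}"]
    parsed = {var: parse_locale(env[var]) for var in LOCALE_VARS}
    findings = []
    if parsed["CLIENT_LOCALE"].codeset != parsed["DB_LOCALE"].codeset:
        findings.append("client and database codesets differ: code set conversion is required")
    return findings


if __name__ == "__main__":
    example = {
        "CLIENT_LOCALE": "ja_jp.sjis",
        "DB_LOCALE": "ja_jp.utf8",
        "SERVER_LOCALE": "ja_jp.utf8",
    }
    for finding in check_locales(example) or ["no issues found"]:
        print(finding)
```

A check like this only flags where conversion will occur; whether the conversion actually works still depends on the .cvo files installed under $INFORMIXDIR/gls.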

See also:
ESQL/C Programmer's Guide

Related Information

Changing the database locale

Enabling Code set conversion between replicates

Fundamentals for Global Language Support

Steps to set up multi-byte or different locale database

Product: Informix Servers. Versions: 11.5, 11.7, 12.1. Platforms: AIX, HP-UX, Linux, OS X, Solaris, Windows.
