Unicode considerations for printer files

Be aware of these Unicode considerations for positional entries and keyword entries for printer files.

Unicode is a universal encoding scheme for written characters and text, which enables the exchange of data internationally. A Unicode field can contain all types of characters used on the IBM® i operating system, including ideographic (DBCS) characters. In this topic, the term code unit means the minimal bit combination that can represent a unit of encoded text for processing or interchange.

DDS printer files support two transformation formats (encoding forms) of Unicode:

  • UTF-16 is a 16-bit encoding form that is designed to provide code values for over a million characters. It is a superset of UCS-2. UTF-16 data is stored in graphic data types. The CCSID value for data in UTF-16 format is 1200.

    A UTF-16 code unit is 2 bytes in length. A UTF-16 character can be 1 or 2 code units (2 or 4 bytes) in length. A UTF-16 data string can contain any character, including UTF-16 surrogates and combining characters.

  • UCS-2 is the Universal Character Set coded in 2 octets, which means that each character is represented in 16 bits. UCS-2 data is stored in graphic data types. The CCSID value for data in UCS-2 format is 13488.

    UCS-2 is a subset of UTF-16 and cannot represent all of the characters that Unicode now defines. UCS-2 is identical to UTF-16 except that UTF-16 also supports combining characters and surrogates. If you do not need combining characters or surrogates, you might choose to use UCS-2.
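
The relationship between characters, code units, and bytes in UTF-16 can be illustrated with a short sketch. The example below is written in Python purely for illustration; it is not IBM i or DDS code, and the helper names utf16_code_units and needs_surrogates are hypothetical. It assumes big-endian byte order so that each pair of bytes reads as one code unit.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the 16-bit UTF-16 code units that encode text."""
    # Encode without a byte order mark, big-endian, so every
    # 2 bytes of the result form exactly one code unit.
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def needs_surrogates(text: str) -> bool:
    """True when any character lies above U+FFFF and so needs a surrogate pair."""
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    samples = {
        "Latin capital A": "A",
        "CJK ideograph U+4E2D": "\u4e2d",
        "CJK Extension B ideograph U+20000": "\U00020000",
    }
    for label, text in samples.items():
        units = utf16_code_units(text)
        print(label,
              [f"{u:04X}" for u in units],
              f"{len(units)} code unit(s), {2 * len(units)} byte(s),",
              "surrogate pair" if needs_surrogates(text) else "no surrogates")
```

The first two samples each occupy one code unit (2 bytes). The third lies outside the 16-bit range, so UTF-16 encodes it as a surrogate pair of two code units (4 bytes); data of that kind calls for UTF-16 rather than UCS-2.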