z/OS XL C Support for the double-byte character set

The number of characters in some languages, such as Japanese or Korean, is larger than 256, the number of distinct values that can be encoded in a single byte. Characters in such languages are represented in computers by sequences of bytes and are called multibyte characters. This topic explains how the z/OS® XL C compiler supports multibyte characters.

Note: The z/OS XL C++ compiler does not have native support for multibyte characters. The support described here is provided by z/OS XL C; from C++, you can take advantage of it by using interlanguage calls to C code. For more information, see Using Linkage Specifications in C or C++.

The z/OS XL C compiler supports the IBM® EBCDIC encoding of multibyte characters, in which each natural language character is uniquely represented by one to four bytes. The number of bytes that encode a single character depends on the global shift state information. If a stream is in initial shift state, one multibyte character is represented by a byte or sequence of bytes that has the following characteristics:

- It starts with the byte containing the shift-out (0x0e) character.
- The shift-out character is followed by 2 bytes that encode the value of the character.
- These bytes may be followed by a byte containing the shift-in (0x0f) character.

If the sequence of bytes ends with the shift-in character, the state remains initial, and the sequence represents a 4-byte multibyte character. Otherwise, the stream is left in shift state.

Multibyte characters of various lengths can be normalized by z/OS XL C library functions and encoded in units of a single length. Such normalized characters are called wide characters; in z/OS XL C, they are represented by two bytes. You can convert between multibyte format and wide character format with string conversion functions such as wcstombs(), mbstowcs(), wcsrtombs(), and mbsrtowcs(), as well as with the family of wide character I/O functions.

MB_CUR_MAX is defined in the stdlib.h header file. Depending on its value, one of the following applies:

- When MB_CUR_MAX is 1, all bytes are treated as single-byte characters; shift-out and shift-in characters are treated as data as well.
- When MB_CUR_MAX is 4:
  - On input, the wide character I/O functions read multibyte characters from the streams and convert them to wide characters.
  - On output, they convert wide characters to multibyte characters and write them to the output streams.

Binary and text streams have an orientation; streams opened with type=record or type=blocked do not. There are three possible orientations of a binary or text stream:

Non-oriented: A stream that has been associated with an open file before any I/O operation is performed. The first I/O operation on a non-oriented stream will set the orientation of the stream. The fwide() function may be used to set the orientation of a stream before any I/O operation is performed. You can use the setbuf() and setvbuf() functions only when I/O has not yet been performed on a stream. When you use these functions, the orientation of the stream is not affected. When you perform one of the wide character input/output operations on a non-oriented stream, the stream becomes wide-oriented. When you perform one of the byte input/output operations on a non-oriented stream, the stream becomes byte-oriented.
Wide-oriented: A stream on which any wide character input/output functions are guaranteed to operate correctly. Conceptually, wide-oriented streams are sequences of wide characters. The external file associated with a wide-oriented stream is a sequence of multibyte characters. Using byte I/O functions on a wide-oriented stream results in undefined behavior. A stream opened for record I/O or blocked I/O cannot be wide-oriented.
Byte-oriented: A stream on which any byte input/output functions are guaranteed to operate properly. Using wide character I/O functions on a byte-oriented stream results in undefined behavior. Byte-oriented streams have minimal support for multibyte characters.

Calls to the clearerr(), feof(), ferror(), fflush(), fgetpos(), or ftell() functions do not change the orientation. Other functions that do not change the orientation are ftello(), fsetpos(), fseek(), fseeko(), rewind(), fldata(), and fileno(). Also, the perror() function does not affect the orientation of the stderr stream.

Once you have established a stream's orientation, the only way to change it is to make a successful call to the freopen() function, which removes a stream's orientation.

The wchar.h header file defines the WEOF macro and declares the functions that support wide character input and output. WEOF expands to a constant expression of type wint_t. Certain wide character input functions, such as fgetwc(), return WEOF when end-of-file is reached on the stream.

Note: The behavior of the wide character I/O functions is affected by the LC_CTYPE category of the current locale and by the setting of MB_CUR_MAX. Perform wide character input and output under the same LC_CTYPE setting. If you change the setting between reading from a file and writing to it, or vice versa, the results may be undefined. If you change it back to the original setting, however, the documented behavior applies again. See the introduction of this topic for a discussion of the effects of MB_CUR_MAX.