The number of characters in some languages such as Japanese or
Korean is larger than 256, the number of distinct values that can
be encoded in a single byte. The characters in such languages are
represented in computers by a sequence of bytes, and are called multibyte
characters. This topic explains how the z/OS® XL C compiler supports
multibyte characters.
Note: The z/OS XL C++ compiler does not have native support for multibyte
characters. The support described here is what z/OS XL C provides; for C++,
you can take advantage of this support by using interlanguage calls to C
code. Please refer to Using Linkage Specifications in C or C++ for more
information.
The z/OS XL C compiler supports the IBM® EBCDIC encoding of multibyte
characters, in which each natural language character is uniquely represented
by one to four bytes. The number of bytes that encode a single character
depends on the global shift state information. If a stream is in initial
shift state, one multibyte character is represented by a byte or sequence of
bytes that has the following characteristics:
- It starts with the byte containing the shift-out (0x0e)
character.
- The shift-out character is followed by 2 bytes that encode the
value of the character.
- These bytes may be followed by a byte containing the shift-in
(0x0f) character.
If the sequence of bytes ends with the shift-in character, the state remains
initial, making this sequence represent a 4-byte multibyte character.
Multibyte characters of various lengths can be normalized by the z/OS XL C
library functions and encoded in units of a single fixed length. Such
normalized characters are called wide characters; in z/OS XL C they are
represented by two bytes. Conversions between multibyte format and wide
character format can be performed by string conversion functions such as
wcstombs(), mbstowcs(), wcsrtombs(), and mbsrtowcs(), as well as by the
family of wide character I/O functions.
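The conversion between multibyte and wide character format can be sketched as a round trip through mbstowcs() and wcstombs(). This is a minimal illustration, not z/OS-specific code: the helper name roundtrip_ok and the MAX_WIDE capacity are assumptions made for the example, and the result depends on the LC_CTYPE locale in effect when the helper is called.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

#define MAX_WIDE 64  /* illustrative capacity, not a library limit */

/* Hypothetical helper: converts a multibyte string to wide characters and
   back, and returns 1 if the result matches the input, 0 otherwise.
   Inputs longer than MAX_WIDE characters are truncated and report 0. */
int roundtrip_ok(const char *mbs)
{
    wchar_t wide[MAX_WIDE + 1];
    char back[MAX_WIDE * 4 + 1];  /* assumes at most 4 bytes per character */

    /* Multibyte to wide; (size_t)-1 signals an invalid sequence. */
    size_t nwide = mbstowcs(wide, mbs, MAX_WIDE);
    if (nwide == (size_t)-1)
        return 0;
    wide[nwide] = L'\0';  /* not terminated when exactly MAX_WIDE are written */

    /* Wide back to multibyte; (size_t)-1 signals an unconvertible character. */
    size_t nbytes = wcstombs(back, wide, sizeof back - 1);
    if (nbytes == (size_t)-1)
        return 0;
    back[nbytes] = '\0';

    return strcmp(mbs, back) == 0;
}
```

A caller would typically select the locale first with setlocale(LC_CTYPE, ...) and then call roundtrip_ok() on the string of interest. In a stateful encoding, an input that contains redundant shift sequences may convert correctly and still fail to compare equal after the round trip.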
MB_CUR_MAX is defined in the stdlib.h header file. Depending on its value,
one of the following happens:
- When MB_CUR_MAX is 1, all bytes are considered single-byte characters;
shift-out and shift-in characters are treated as data as well.
- When MB_CUR_MAX is 4:
  - On input, the wide character I/O functions read multibyte characters
    from the stream and convert them to wide characters.
  - On output, they convert wide characters to multibyte characters and
    write them to the output stream.
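Because MB_CUR_MAX reflects the current locale, a program can check it at run time before deciding how to treat shift-out and shift-in bytes. The following is a small sketch; the helper name mb_cur_max_for is an assumption made for the example, and which locale names produce a value of 4 depends on the locales installed on the system.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: selects the named LC_CTYPE locale and returns the
   resulting MB_CUR_MAX, or -1 if the locale cannot be set. */
long mb_cur_max_for(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    /* MB_CUR_MAX is evaluated at run time and tracks LC_CTYPE. */
    return (long)MB_CUR_MAX;
}
```

For the "C" locale the value is expected to be 1, which corresponds to the first case above: every byte is a single-byte character.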
Both binary and text streams have orientation. Streams opened with
type=record or type=blocked do not. There are three possible orientations
of a stream:
- Non-oriented
  A stream that has been associated with an open file, before any I/O
  operation is performed on it. The first I/O operation on a non-oriented
  stream sets the orientation of the stream: a wide character input/output
  operation makes the stream wide-oriented, and a byte input/output
  operation makes it byte-oriented. The fwide() function may be used to set
  the orientation of a stream before any I/O operation is performed. You can
  use the setbuf() and setvbuf() functions only when I/O has not yet been
  performed on a stream; these functions do not affect the orientation of
  the stream.
- Wide-oriented
  A stream on which any wide character input/output functions are
  guaranteed to operate correctly. Conceptually, wide-oriented streams are
  sequences of wide characters. The external file associated with a
  wide-oriented stream is a sequence of multibyte characters. Using byte I/O
  functions on a wide-oriented stream results in undefined behavior. A
  stream opened for record I/O or blocked I/O cannot be wide-oriented.
- Byte-oriented
  A stream on which any byte input/output functions are guaranteed to
  operate properly. Using wide character I/O functions on a byte-oriented
  stream results in undefined behavior. Byte-oriented streams have minimal
  support for multibyte characters.
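The three orientations can be observed with fwide(), which only queries the orientation when its mode argument is 0. The sketch below is written against the standard C library and has not been verified on z/OS; the helper names orientation_of and first_io_sets_orientation are assumptions made for the example.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: returns -1 for byte-oriented, 0 for non-oriented,
   and 1 for wide-oriented. A mode of 0 queries without changing anything. */
int orientation_of(FILE *fp)
{
    int r = fwide(fp, 0);
    return (r > 0) - (r < 0);
}

/* Hypothetical demo: records the orientation of a temporary stream before
   any I/O, after one wide character write, and after a later attempt to
   request byte orientation. Returns 0 on success, -1 on failure. */
int first_io_sets_orientation(int *before, int *after_io, int *after_fwide)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    *before = orientation_of(fp);      /* no I/O yet */

    if (fputwc(L'A', fp) == WEOF) {    /* first I/O is a wide operation */
        fclose(fp);
        return -1;
    }
    *after_io = orientation_of(fp);

    (void)fwide(fp, -1);               /* has no effect once oriented */
    *after_fwide = orientation_of(fp);

    fclose(fp);
    return 0;
}
```

The final step illustrates that fwide() can set an orientation only on a stream that does not have one yet.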
Calls to the clearerr(), feof(), ferror(), fflush(), fgetpos(), or ftell() functions
do not change the orientation. Other functions that do not change
the orientation are ftello(), fsetpos(), fseek(), fseeko(), rewind(), fldata(),
and fileno(). Also, the perror() function does
not affect the orientation of the stderr stream.
Once you have established a stream's orientation, the only way
to change it is to make a successful call to the freopen() function,
which removes a stream's orientation.
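A short sketch of this reset follows. It is an illustration under assumptions rather than z/OS-specific code: the helper name orientation_after_freopen is invented for the example, and the path argument is a hypothetical scratch file that the helper creates and then removes.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: makes a stream wide-oriented, reopens it with
   freopen(), and reports the orientation afterwards as -1 (byte),
   0 (none), or 1 (wide). Returns -2 if any step fails. */
int orientation_after_freopen(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -2;

    if (fwide(fp, 1) <= 0) {           /* request wide orientation */
        fclose(fp);
        remove(path);
        return -2;
    }

    /* A failed freopen() closes the original stream, so fp must not be
       used or closed again in that case. */
    FILE *reopened = freopen(path, "r", fp);
    if (reopened == NULL) {
        remove(path);
        return -2;
    }

    int r = fwide(reopened, 0);        /* query only */
    fclose(reopened);
    remove(path);
    return (r > 0) - (r < 0);
}
```

After a successful freopen() the stream is expected to be non-oriented again, so the next I/O operation or fwide() call determines its new orientation.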
The wchar.h header file declares the WEOF macro and the functions that
support wide character input and output. The macro expands to a constant
expression of type wint_t. Certain functions return WEOF when end-of-file
is reached on the stream.
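A typical use of WEOF is as the loop terminator when reading with fgetwc(). The following is a minimal sketch; the helper name count_wide_chars is an assumption made for the example. Note that fgetwc() also returns WEOF on a read or encoding error, so the sketch checks ferror() to tell the two cases apart.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: reads wide characters until WEOF and returns how
   many were read, or -1 if the loop ended because of an error rather than
   end-of-file. */
long count_wide_chars(FILE *fp)
{
    long count = 0;
    wint_t wc;  /* wint_t can hold every wide character and WEOF */

    while ((wc = fgetwc(fp)) != WEOF)
        count++;

    return ferror(fp) ? -1 : count;
}
```

The result of fgetwc() is stored in a wint_t rather than a wchar_t so that WEOF remains distinguishable from valid characters.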
Note: The behavior of the wide character I/O functions is
affected by the LC_CTYPE category of the current locale,
and the setting of MB_CUR_MAX. Wide-character input and
output should be performed under the same LC_CTYPE setting.
If you change the setting between when you read from a file and when
you write to it, or vice versa, you may get undefined behavior. If
you change it back to the original setting, however, you will get
the behavior that is documented. See the introduction of this topic for a discussion
of the effects of MB_CUR_MAX.
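One way to follow this advice is to save the LC_CTYPE setting before changing it and restore it before continuing wide character I/O on a stream that was used under the original setting. The sketch below assumes nothing beyond standard setlocale(); the helper names save_ctype_locale and restore_ctype_locale are invented for the example.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of the current LC_CTYPE locale
   name, or NULL on failure. A copy is needed because the string returned
   by setlocale() may be overwritten by a later setlocale() call. */
char *save_ctype_locale(void)
{
    const char *current = setlocale(LC_CTYPE, NULL);  /* query only */
    if (current == NULL)
        return NULL;

    char *copy = malloc(strlen(current) + 1);
    if (copy != NULL)
        strcpy(copy, current);
    return copy;
}

/* Hypothetical helper: restores a setting saved by save_ctype_locale() and
   frees the copy. Returns 1 on success, 0 on failure. */
int restore_ctype_locale(char *saved)
{
    int ok = saved != NULL && setlocale(LC_CTYPE, saved) != NULL;
    free(saved);
    return ok;
}
```

A program would call save_ctype_locale() before switching locales for unrelated work, then call restore_ctype_locale() before it resumes reading from or writing to the stream.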