Some string manipulation functions operate on bytes rather than characters. When characters are treated as bytes, it is easy to inadvertently truncate a multibyte character, split a multibyte character into its individual bytes, or lose one half of a control code pair, such as the Shift In (SI) code that closes a Shift Out (SO) sequence.
Guideline G3: Never split a multibyte character into bytes.
There are several ways to adjust the resulting string so that every character in it stays intact. First drop any partial character, then:
- If the string length is fixed, pad the string with a neutral single-byte character such as SPACE or NULL.
- Otherwise, leave the resulting string shorter.
Example: On an IBM mainframe host, runs of double-byte characters are enclosed between the SO and SI control characters. To extract six consecutive bytes, starting at the second byte, from a mixed single-byte and double-byte string, you can do the following:
| Function | Intermediate Result | Final Result |
|---|---|---|
| Extract( "ssSOd1d2d3SIs", 2, 6 ) | sSOd1d2 | sSOd1SI_ or sSOd1SI\0, or sSOd1SI if the resulting string length is not fixed at 6 bytes |

where
- s is a single-byte character
- d1 to d3 are three double-byte characters
- SO is the Shift Out control character
- SI is the Shift In control character
- _ is the single-byte SPACE character
- \0 is the single-byte NULL character

The intermediate result ends in double-byte mode without a closing SI. Dropping d2 makes room for SI. If the length is fixed at 6 bytes, the remaining byte is then padded.
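The mainframe case can be sketched in C as follows. This is a minimal sketch, not an IBM API. It assumes EBCDIC mixed data in which SO is 0x0E and SI is 0x0F. The function `extract_mixed` and its parameters are hypothetical.

The shift state at any byte depends on the SO and SI codes that precede it, so the sketch scans from the beginning of the string. It copies only whole characters, and it reserves one byte for the closing SI whenever it emits a double-byte character.

```c
#include <stddef.h>

#define EBCDIC_SO 0x0E /* Shift Out: following bytes are double-byte characters */
#define EBCDIC_SI 0x0F /* Shift In: return to single-byte characters */

/*
 * Hypothetical helper (assumption, not a library function): extract at most
 * n bytes starting at 1-based byte position `start` of a mixed EBCDIC string.
 * Only whole characters and balanced SO/SI pairs are kept. `out` must hold at
 * least n bytes. If pad >= 0, the result is padded with that byte to exactly
 * n bytes. Returns the number of bytes written.
 */
size_t extract_mixed(const unsigned char *src, size_t srclen,
                     size_t start, size_t n, unsigned char *out, int pad)
{
    size_t pos = 0, len = 0, end;
    size_t start0 = start > 0 ? start - 1 : 0;
    int src_dbcs = 0; /* shift state of the source at pos */
    int out_dbcs = 0; /* shift state of the output written so far */

    /* Walk whole characters up to the start position to learn the shift
       state. If start0 is inside a double-byte character, pos ends up past
       it, so the partial character is skipped. */
    while (pos < start0 && pos < srclen) {
        if (src[pos] == EBCDIC_SO)      { src_dbcs = 1; pos++; }
        else if (src[pos] == EBCDIC_SI) { src_dbcs = 0; pos++; }
        else                            pos += src_dbcs ? 2 : 1;
    }
    end = start0 + n;
    if (end > srclen)
        end = srclen;

    while (pos < end) {
        size_t need;
        /* Source shift codes only change state; the output emits its own. */
        if (src[pos] == EBCDIC_SO) { src_dbcs = 1; pos++; continue; }
        if (src[pos] == EBCDIC_SI) { src_dbcs = 0; pos++; continue; }
        if (pos + (src_dbcs ? 2 : 1) > end)
            break; /* character straddles the end of the byte range */
        if (src_dbcs) /* SO if needed + 2 bytes + room for the closing SI */
            need = (out_dbcs ? 0 : 1) + 2 + 1;
        else          /* SI if needed + 1 byte */
            need = (out_dbcs ? 1 : 0) + 1;
        if (len + need > n)
            break; /* no room left for a well-formed result */
        if (src_dbcs) {
            if (!out_dbcs) { out[len++] = EBCDIC_SO; out_dbcs = 1; }
            out[len++] = src[pos];
            out[len++] = src[pos + 1];
            pos += 2;
        } else {
            if (out_dbcs) { out[len++] = EBCDIC_SI; out_dbcs = 0; }
            out[len++] = src[pos++];
        }
    }
    if (out_dbcs)
        out[len++] = EBCDIC_SI; /* close the double-byte run */
    if (pad >= 0)
        while (len < n)
            out[len++] = (unsigned char)pad;
    return len;
}
```

Consider the example string with start 2, length 6, and a pad byte of 0x40 (the EBCDIC SPACE). The result is s, SO, d1, SI, and one SPACE. Passing -1 as the pad leaves the 5-byte result unpadded. If the start position falls inside a double-byte run, the sketch emits its own SO so that the result is still well formed.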
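Example: On the IBM PC, no shift codes are used. Each double-byte character is recognized by the value of its first (lead) byte. To extract six consecutive bytes, starting at the second byte, from a mixed single-byte and double-byte string, you can do the following: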
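| Function | Intermediate Result | Final Result |
|---|---|---|
| Extract( "ssd1d2d3s", 2, 6 ) | sd1d2d | sd1d2_ or sd1d2\0, or sd1d2 if the resulting string length is not fixed at 6 bytes |

where
- s is a single-byte character
- d1 to d3 are three double-byte characters
- d is the first byte of d3
- _ is the single-byte SPACE character
- \0 is the single-byte NULL character

The last byte of the intermediate result is only the first half of d3, so it is dropped. The result is then padded back to 6 bytes or left one byte shorter.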
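A similar sketch covers the PC case. It assumes the Shift_JIS (code page 932) lead-byte ranges 0x81 to 0x9F and 0xE0 to 0xFC. Other PC DBCS code pages use different ranges. The names `is_lead_byte` and `extract_dbcs` are hypothetical.

Shift_JIS trail bytes can also fall inside the lead-byte range. For that reason, the sketch finds character boundaries by scanning from the start of the string rather than by inspecting the byte at the start position.

```c
#include <stddef.h>

/* Assumption: Shift_JIS (code page 932) lead-byte ranges. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/*
 * Hypothetical helper: extract at most n bytes starting at 1-based byte
 * position `start`, keeping only whole characters. `out` must hold at least
 * n bytes. If pad >= 0, the result is padded with that byte to exactly n bytes.
 * Returns the number of bytes written.
 */
size_t extract_dbcs(const unsigned char *src, size_t srclen,
                    size_t start, size_t n, unsigned char *out, int pad)
{
    size_t pos = 0, len = 0;
    size_t start0 = start > 0 ? start - 1 : 0;
    size_t end = start0 + n;

    if (end > srclen)
        end = srclen;

    /* Step over whole characters; if start0 is a trail byte, skip past it. */
    while (pos < start0 && pos < srclen)
        pos += is_lead_byte(src[pos]) ? 2 : 1;

    /* Copy whole characters only; stop before one that straddles `end`. */
    while (pos < end) {
        size_t size = is_lead_byte(src[pos]) ? 2 : 1;
        if (pos + size > end)
            break;
        out[len++] = src[pos++];
        if (size == 2)
            out[len++] = src[pos++];
    }
    if (pad >= 0)
        while (len < n)
            out[len++] = (unsigned char)pad;
    return len;
}
```

Consider the example with start 2 and length 6. The copy stops before the lead byte of d3, leaving s, d1, and d2 (5 bytes). A pad byte such as ' ' then brings the result to 6 bytes.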
To help developers manipulate MBCS strings, X/Open and ISO/IEC have defined a set of C runtime library functions that process wchar_t (wide-character) strings. The regular C string functions, by contrast, process char (byte) strings.
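Once a string has been converted to wide characters (for example with mbstowcs), each character normally occupies one wchar_t element. A character-oriented extract therefore cannot split a character. The sketch below uses wcslen and wmemcpy from <wchar.h>; the function `wextract` is a hypothetical name.

The sketch assumes that every character fits in a single wchar_t. That holds for the characters used here. On platforms with a 16-bit wchar_t, characters outside the Basic Multilingual Plane occupy two units.

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper: copy up to `count` characters of `src`, starting at
 * 1-based character position `start`, into `dst`. `dstcap` is the capacity
 * of `dst` in wchar_t units, including the terminating null.
 * Returns the number of characters copied.
 */
size_t wextract(const wchar_t *src, size_t start, size_t count,
                wchar_t *dst, size_t dstcap)
{
    size_t len = wcslen(src);
    size_t n = 0;

    if (dstcap == 0)
        return 0;
    if (start >= 1 && start <= len) {
        n = len - (start - 1);  /* characters available from start */
        if (n > count)
            n = count;
        if (n > dstcap - 1)
            n = dstcap - 1;     /* leave room for the terminator */
        wmemcpy(dst, src + (start - 1), n);
    }
    dst[n] = L'\0';
    return n;
}
```

Here positions and lengths count characters, not bytes. Extracting three characters from position 2 of a wide string laid out like "ssd1d2d3s" therefore yields s, d1, and d2, with no partial character to repair.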