G3: Manipulating MBCS data
Manipulating MBCS data stream
Some string manipulation functions operate in terms of bytes instead of characters. When we treat characters as bytes, it is easy to inadvertently truncate a non-single-byte character, split a non-single-byte character into its individual bytes, or lose the other half of a control code pair such as the Shift In control code.
Guideline G3
There are several ways to adjust the resultant string so that every character in it stays intact. First drop any incomplete character and restore any missing control code. Then, if the string length is fixed, pad the string with a neutral single-byte character such as SPACE or NULL. Otherwise, let the resultant string be shorter than requested.
Example: To extract six consecutive bytes, starting at the second byte, from a mixed single-byte and double-byte string on the IBM mainframe host computer, you can do the following:
| Function | Intermediate result | Final result |
|---|---|---|
| Extract( "ssSOd1d2d3SIs", 2, 6 ) | sSOd1d2 | sSOd1SI_ or sSOd1SI\0, or sSOd1SI if the resultant string length is not fixed to 6 bytes |

where:
- s is a single-byte character
- d1 to d3 are three double-byte characters
- SO is the Shift Out control character
- SI is the Shift In control character
- _ is the single-byte SPACE character
- \0 is the single-byte NULL character
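The host example above can be sketched in C as follows. This is a minimal illustration, not a definitive implementation: the function name `extract_sosi` and its parameters are hypothetical, SO and SI are taken to be the byte values 0x0E and 0x0F, and the sketch assumes that `start` (1-based, as in the table) falls on a character boundary in single-byte mode. It always keeps one byte in reserve for the closing SI while inside a double-byte run.

```c
#include <stddef.h>

#define SO 0x0E /* Shift Out: switches to double-byte mode */
#define SI 0x0F /* Shift In: switches back to single-byte mode */

/* Hypothetical helper: copy at most `count` bytes from `src`, starting at the
 * 1-based byte `start`, without splitting a double-byte character and without
 * leaving an SO unmatched. If `pad` is nonzero, the result is filled up to
 * `count` bytes with `pad_byte`. Returns the number of bytes written.
 * Assumption: `start` is a character boundary in single-byte mode. */
size_t extract_sosi(const unsigned char *src, size_t src_len,
                    size_t start, size_t count,
                    unsigned char *dst, int pad, unsigned char pad_byte)
{
    size_t i, out = 0;
    int dbcs = 0; /* nonzero while inside an SO ... SI run */

    if (start == 0)
        return 0;

    for (i = start - 1; i < src_len; ) {
        if (!dbcs) {
            if (src[i] == SO) {
                /* Opening a run needs SO + one character + SI = 4 bytes. */
                if (out + 4 > count)
                    break;
                dst[out++] = SO;
                dbcs = 1;
                i++;
            } else {
                if (out + 1 > count)
                    break;
                dst[out++] = src[i++];
            }
        } else if (src[i] == SI) {
            dst[out++] = SI; /* room for SI was reserved earlier */
            dbcs = 0;
            i++;
        } else {
            /* A double-byte character needs 2 bytes plus 1 reserved for SI. */
            if (i + 1 >= src_len || out + 3 > count)
                break;
            dst[out++] = src[i++];
            dst[out++] = src[i++];
        }
    }

    if (dbcs)
        dst[out++] = SI; /* restore the lost half of the SO/SI pair */

    if (pad)
        while (out < count)
            dst[out++] = pad_byte;

    return out;
}
```

Called with the 11-byte string from the table, `start` 2, `count` 6, and an EBCDIC SPACE (0x40) as the pad byte, the sketch yields the 6-byte result sSOd1SI_; with padding turned off it yields the 5-byte result sSOd1SI.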
Example: To extract six consecutive bytes, starting at the second byte, from a mixed single-byte and double-byte string on the IBM PC, you can do the following:
| Function | Intermediate result | Final result |
|---|---|---|
| Extract( "ssd1d2d3s", 2, 6 ) | sd1d2d | sd1d2_ or sd1d2\0, or sd1d2 if the resultant string length is not fixed to 6 bytes |

where:
- s is a single-byte character
- d1 to d3 are three double-byte characters
- the trailing d in the intermediate result is the orphaned first byte of d3
- _ is the single-byte SPACE character
- \0 is the single-byte NULL character
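The PC example above can be sketched in C in a similar way. This is only an illustration under stated assumptions: the names `extract_dbcs` and `is_lead_byte` are hypothetical, the lead-byte ranges are those of Shift-JIS (0x81 to 0x9F and 0xE0 to 0xFC), chosen here because the text does not name a code page, and `start` (1-based) is assumed to fall on a character boundary.

```c
#include <stddef.h>

/* Hypothetical lead-byte test. The ranges are the Shift-JIS ones and are an
 * assumption of this sketch; substitute the ranges of the code page in use. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical helper: copy at most `count` bytes from `src`, starting at the
 * 1-based byte `start`, and stop before any character that would not fit
 * whole. If `pad` is nonzero, the result is filled up to `count` bytes with
 * `pad_byte`. Returns the number of bytes written.
 * Assumption: `start` is a character boundary. */
size_t extract_dbcs(const unsigned char *src, size_t src_len,
                    size_t start, size_t count,
                    unsigned char *dst, int pad, unsigned char pad_byte)
{
    size_t i, out = 0;

    if (start == 0)
        return 0;

    for (i = start - 1; i < src_len; ) {
        size_t width = is_lead_byte(src[i]) ? 2 : 1;

        /* Stop rather than copy only the first byte of a character. */
        if (i + width > src_len || out + width > count)
            break;

        while (width-- > 0)
            dst[out++] = src[i++];
    }

    if (pad)
        while (out < count)
            dst[out++] = pad_byte;

    return out;
}
```

For the 9-byte string in the table, `start` 2 and `count` 6 give sd1d2_ when padding with SPACE, and the 5-byte string sd1d2 when padding is turned off.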
To aid the developer in manipulating MBCS strings, X/Open and ISO/IEC have defined a series of C runtime library string manipulation functions that process wchar_t strings, as opposed to the regular C string functions that process char (or byte) strings.
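As a brief illustration of the difference, the sketch below does the same kind of extraction on a wchar_t string using the standard `wcslen` and `wcsncpy` functions from `<wchar.h>`. The function name `extract_wide` is hypothetical. Positions and lengths here count wide characters rather than bytes, so no character can be split, provided that each character fits in a single wchar_t on the platform in use (an assumption of this sketch).

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copy up to `count` wide characters from `src`,
 * starting at the 1-based character position `start`, into `dst`.
 * `dst` must have room for `count` + 1 wide characters.
 * Returns the number of characters copied, excluding the terminator. */
size_t extract_wide(const wchar_t *src, size_t start, size_t count,
                    wchar_t *dst)
{
    size_t len = wcslen(src); /* length in characters, not bytes */
    size_t n;

    if (start == 0 || start > len) {
        dst[0] = L'\0';
        return 0;
    }

    n = len - (start - 1);
    if (n > count)
        n = count;

    wcsncpy(dst, src + (start - 1), n); /* copies whole characters only */
    dst[n] = L'\0';
    return n;
}
```

For example, `extract_wide(L"ss\u3042\u3044\u3046s", 2, 4, dst)` copies one single-byte-range character and three ideographic-range characters, each as one unit. Converting between char strings and wchar_t strings is typically done with the standard `mbstowcs` and `wcstombs` functions, whose behavior depends on the current locale.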