
Globalize your On Demand Business

Manipulating MBCS data stream

Some string manipulation functions operate on bytes rather than characters. When characters are treated as bytes, it is easy to inadvertently truncate a multibyte character, split a multibyte character into its individual bytes, or lose one half of a control code pair, such as the Shift In control code that closes a Shift Out.

Guideline G3


Never split a multibyte character into bytes.


When an operation would split a character, there are several ways to adjust the resultant string so that every character stays intact. If the string length is fixed, drop the partial character and pad the string with neutral single-byte characters such as SPACE or NULL. Otherwise, shorten the resultant string.

Example: On the IBM mainframe host computer, extract six consecutive bytes, starting at the second byte, from a mixed single-byte and double-byte string:

Function:             Extract( "ssSOd1d2d3SIs", 2, 6 )
Intermediate result:  sSOd1d2
Final result:         sSOd1SI_ or
                      sSOd1SI\0 or
                      sSOd1SI if the resultant string length is not fixed to 6 bytes

The intermediate result has lost its closing SI, so d2 is dropped to make room for it.

where

  • s is a single-byte character
  • d1 to d3 are three double-byte characters
  • SO is the Shift Out control character
  • SI is the Shift In control character
  • _ is the single-byte SPACE character
  • \0 is the single-byte NULL character
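The repair step in the host example can be sketched in C. This is a minimal illustration rather than an IBM-supplied routine: the function name `repair_host_extract` is hypothetical, SO and SI are assumed to be the EBCDIC values 0x0E and 0x0F, and the extracted bytes are assumed to begin in single-byte mode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed EBCDIC shift codes: Shift Out opens a double-byte run,
   Shift In closes it. */
enum { HOST_SO = 0x0E, HOST_SI = 0x0F };

/* Hypothetical helper: repairs, in place, n bytes that were cut out of
   a mixed string by a byte-oriented extract. The buffer is assumed to
   start in single-byte mode. Returns the number of meaningful bytes;
   everything after them is filled with pad. */
size_t repair_host_extract(unsigned char *buf, size_t n, unsigned char pad)
{
    size_t i = 0;
    int dbcs = 0;              /* nonzero while inside an SO...SI run */

    assert(buf != NULL);
    while (i < n) {
        if (!dbcs) {
            if (buf[i] == HOST_SO) {
                if (n - i < 4)
                    break;     /* no room for SO + one character + SI */
                dbcs = 1;
            }
            i++;
        } else if (buf[i] == HOST_SI) {
            dbcs = 0;          /* the run is already closed properly */
            i++;
        } else if (n - i > 2) {
            i += 2;            /* whole character fits, SI still fits after it */
        } else {
            buf[i++] = HOST_SI; /* drop this character and close the run */
            dbcs = 0;
            break;
        }
    }
    memset(buf + i, pad, n - i);   /* pad the fixed-length remainder */
    return i;
}
```

For a variable-length result, a caller could use only the first `used` bytes returned by the function instead of the padded buffer.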


Example: On the IBM PC, extract six consecutive bytes, starting at the second byte, from a mixed single-byte and double-byte string:

Function:             Extract( "ssd1d2d3s", 2, 6 )
Intermediate result:  sd1d2d
Final result:         sd1d2_ or
                      sd1d2\0 or
                      sd1d2 if the resultant string length is not fixed to 6 bytes

The trailing d in the intermediate result is the first byte of d3, which has been split and must be dropped.

where

  • s is a single-byte character
  • d1 to d3 are three double-byte characters
  • _ is the single-byte SPACE character
  • \0 is the single-byte NULL character
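The PC example can be sketched in C as an extract routine that never copies half of a character. This is an illustration under stated assumptions: the name `dbcs_extract` and its parameters are hypothetical, and `is_lead_byte` assumes Shift-JIS (code page 932) lead-byte ranges, which other DBCS code pages do not share. The scan starts at the beginning of the string because a trail byte can have the same value as a lead byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: Shift-JIS lead-byte ranges. Replace for other code pages. */
static int is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Hypothetical helper: copies the characters that lie entirely inside
   the byte window [start, start + len) of src into out, where start is
   1-based as in the example. Characters that straddle either edge of
   the window are dropped. If fixed is nonzero, out is padded with pad
   up to len bytes. Returns the number of bytes written to out, which
   must have room for len bytes. */
size_t dbcs_extract(const unsigned char *src, size_t srclen,
                    size_t start, size_t len,
                    unsigned char *out, int fixed, unsigned char pad)
{
    size_t begin, end, i = 0, n = 0;

    assert(src != NULL && out != NULL && start >= 1);
    begin = start - 1;
    end = begin + len;

    /* Walk character by character from the start of the string so
       that every boundary is known. */
    while (i < srclen && i < end) {
        size_t w = (is_lead_byte(src[i]) && i + 1 < srclen) ? 2 : 1;
        if (i >= begin && i + w <= end) {
            memcpy(out + n, src + i, w);
            n += w;
        }
        i += w;
    }
    if (fixed) {
        while (n < len)
            out[n++] = pad;    /* fill the space left by dropped bytes */
    }
    return n;
}
```

Called on a nine-byte string laid out like "ssd1d2d3s" with start 2 and length 6, this keeps s, d1, and d2, then either pads one byte or returns a five-byte result, matching the final results listed above.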


To aid the developer in manipulating MBCS strings, X/Open and ISO/IEC have defined a series of C runtime library string manipulation functions that process wchar_t strings, as opposed to the regular C string functions that process char (or byte) strings.
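As a brief sketch of the difference, the same kind of extract written against wchar_t strings counts characters instead of bytes, so it cannot cut a character in half. The helper name `wide_extract` is hypothetical; `wcslen` and `wmemcpy` are standard functions declared in `<wchar.h>`. The sketch assumes that each character occupies exactly one wchar_t element, and that multibyte input has already been converted, for example with the standard `mbstowcs` function.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copies up to count characters of src, starting
   at the 1-based character position start, into out and terminates
   the result. out must have room for count + 1 elements. Returns the
   number of characters copied. */
size_t wide_extract(const wchar_t *src, size_t start, size_t count,
                    wchar_t *out)
{
    size_t srclen, n;

    assert(src != NULL && out != NULL && start >= 1);
    srclen = wcslen(src);
    if (start > srclen) {          /* nothing to copy */
        out[0] = L'\0';
        return 0;
    }
    n = srclen - (start - 1);      /* characters available from start */
    if (n > count)
        n = count;
    wmemcpy(out, src + (start - 1), n);
    out[n] = L'\0';
    return n;
}
```

Here the start position and count are measured in characters, not bytes, so no padding or shortening step is needed to protect character boundaries.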

