G5: Buffer space considerations
When an MBCS string is uploaded from the IBM PC to IBM z/OS using the EBCDIC encoding scheme, the resulting string may become longer because Shift Out (SO) and Shift In (SI) control codes are inserted. Conversely, when a string is downloaded from z/OS to the IBM PC, the resulting string may become shorter. If the target buffer has a fixed size and the resulting string is longer or shorter than the buffer, the string must be truncated or padded safely, as described in Guideline G3 - Manipulating MBCS data stream.
Allocate enough space to accommodate the resultant MBCS string after coded character set conversion.
Example: The table below shows the number of bytes needed for the different encodings of the 9-character data string ''.
|Encoding for Japanese||Length||Hex representation|
|EBCDIC||14||C1 C2 C3 0E 4F58 48F2 5780 0F F1 F2 F3|
|PC (Shift-JIS)||12||41 42 43 8ABF 8E9A FA8D 31 32 33|
|EUC||13||41 42 43 B4C1 BBFA 8FB4C7 31 32 33|
|UTF-16||18||0041 0042 0043 6F22 5B57 5393 0031 0032 0033|
|UTF-8||15||41 42 43 E6BCA2 E5AD97 E58E93 31 32 33|
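The Unicode rows of the table can be checked with a few lines of Python. This is only a sketch: it decodes the UTF-8 hex from the table rather than assuming the sample text, and it covers UTF-8 and UTF-16 only, because the available EBCDIC, Shift-JIS, and EUC codecs differ between platforms. The variable names are illustrative.

```python
# UTF-8 row of the table, with the spaces removed.
utf8_hex = "414243E6BCA2E5AD97E58E93313233"
text = bytes.fromhex(utf8_hex).decode("utf-8")

# Byte length of the same text in each Unicode encoding form.
# "utf-16-be" is used so that no byte order mark is added.
sizes = {name: len(text.encode(name)) for name in ("utf-8", "utf-16-be")}
utf16_hex = text.encode("utf-16-be").hex().upper()

print(len(text), sizes)
print(utf16_hex)
```

The character count stays at 9 while the byte count changes with the encoding, which is why a buffer should be sized for the target encoding and not for the source.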
Example: In the strings below, s represents one byte of a single-byte character and dd represents the two bytes of a double-byte character.
|IBM PC||IBM EBCDIC|
|X'ssssdddddd' (10 bytes)||X'ssssSOddddddSI' (12 bytes)|
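The growth shown in this table can be estimated before conversion. The following sketch uses the same s/dd notation and assumes that the converter emits one Shift Out and one Shift In around each run of double-byte characters. The function name `ebcdic_mixed_length` is hypothetical and not part of any library.

```python
import re

def ebcdic_mixed_length(pattern: str) -> int:
    """Return the EBCDIC byte length for a PC string written in s/dd notation.

    Each character of `pattern` stands for one byte: 's' is a single-byte
    character and each 'dd' pair is one double-byte character.
    Assumption: every run of double-byte characters costs one SO and one SI.
    """
    if set(pattern) - {"s", "d"}:
        raise ValueError("pattern may contain only 's' and 'd'")
    runs = re.findall(r"d+", pattern)
    if any(len(run) % 2 for run in runs):
        raise ValueError("each double-byte character needs two 'd' bytes")
    # PC length plus two shift codes per double-byte run.
    return len(pattern) + 2 * len(runs)

print(ebcdic_mixed_length("ssssdddddd"))    # the example in the table above
print(ebcdic_mixed_length("sssddddddsss"))  # shape of the Japanese sample string
```

Strings that alternate between single-byte and double-byte characters grow the most, because every run pays for its own pair of shift codes.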
If the target string buffer is fixed at 10 bytes, the resulting 12-byte string is truncated to X'ssssSOddddd'. The last byte is now only the first half of a double-byte character, and the closing Shift In has been lost. At best this byte is an invalid character; at worst it is misinterpreted as a single-byte character. As discussed in Guideline G3 - Manipulating MBCS data stream, the truncation must be compensated for by replacing the last byte with the Shift In control code.
|IBM PC||IBM EBCDIC|
|X'ssssdddddd' (10 bytes)||X'ssssSOddddd' (10 bytes, before compensation)|
|||X'ssssSOddddSI' (10 bytes, after compensation)|
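The compensation step can be sketched in Python as follows. This is an illustration and not the method prescribed by Guideline G3: it assumes SO is X'0E', SI is X'0F', the pad byte is the EBCDIC space X'40', and the input is a well-formed mixed string. The function names `truncate_ebcdic_mixed` and `fit_ebcdic_mixed` are hypothetical.

```python
SO, SI, EBCDIC_SPACE = 0x0E, 0x0F, 0x40

def truncate_ebcdic_mixed(data: bytes, limit: int) -> bytes:
    """Cut `data` to at most `limit` bytes without splitting a double-byte
    character and without leaving a double-byte run open."""
    out = bytearray()
    in_dbcs = False
    i = 0
    while i < len(data):
        b = data[i]
        if not in_dbcs:
            if b == SO:
                # Need room for SO, one double-byte character, and SI.
                if len(out) + 4 > limit:
                    break
                out.append(SO)
                in_dbcs = True
            else:
                if len(out) + 1 > limit:
                    break
                out.append(b)
            i += 1
        elif b == SI:
            out.append(SI)  # room for SI was reserved earlier
            in_dbcs = False
            i += 1
        else:
            # Two bytes for the character plus one reserved for SI.
            if i + 1 >= len(data) or len(out) + 3 > limit:
                break
            out += data[i:i + 2]
            i += 2
    if in_dbcs:
        out.append(SI)  # compensation: close the open double-byte run
    return bytes(out)

def fit_ebcdic_mixed(data: bytes, size: int) -> bytes:
    """Truncate safely, then pad with EBCDIC spaces to exactly `size` bytes."""
    return truncate_ebcdic_mixed(data, size).ljust(size, bytes([EBCDIC_SPACE]))

# X'ssssSOddddddSI' (12 bytes) forced into a 10-byte buffer.
source = bytes.fromhex("C1C2C3C4" "0E" "4F5848F25780" "0F")
print(fit_ebcdic_mixed(source, 10).hex().upper())
```

For the 12-byte input this yields the X'ssssSOddddSI' form shown in the table: the third double-byte character is dropped whole and the Shift In takes the final byte.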