Guideline G: Introducing Asian ideographic scripts

G5: Buffer space considerations

When an MBCS string is uploaded from the IBM PC to IBM z/OS and converted to the EBCDIC encoding scheme, the resultant string length may increase: a Shift Out (SO, X'0E') control code is inserted before each run of double-byte characters and a Shift In (SI, X'0F') control code after it. Conversely, when a string is downloaded from z/OS to the IBM PC, the SO and SI codes are removed and the resultant string length may decrease. If the target buffer is fixed in size and the resultant string is longer or shorter than the buffer, the string must be truncated or padded in a safe manner according to Guideline G3 - Manipulating MBCS data stream.
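The length arithmetic can be sketched in Python. This is a hypothetical illustration, not an IBM API: it assumes each character has already been classified by its byte width (1 for single-byte, 2 for double-byte, as in Shift-JIS) and that the host form wraps every run of consecutive double-byte characters in exactly one SO/SI pair. Code point conversion itself is not modelled.

```python
from typing import Sequence

SO_SI_OVERHEAD = 2  # one SO (X'0E') plus one SI (X'0F') per double-byte run


def count_dbcs_runs(widths: Sequence[int]) -> int:
    """Count maximal runs of consecutive double-byte (width 2) characters."""
    runs = 0
    previous = 1
    for w in widths:
        if w == 2 and previous != 2:
            runs += 1  # a new double-byte run starts here
        previous = w
    return runs


def pc_length(widths: Sequence[int]) -> int:
    """Bytes on the PC side (Shift-JIS style): the characters only."""
    return sum(widths)


def host_length(widths: Sequence[int]) -> int:
    """Bytes in host EBCDIC: the characters plus SO/SI around each run."""
    return sum(widths) + SO_SI_OVERHEAD * count_dbcs_runs(widths)


# 'ABC漢字厓123' classified by width: three single, three double, three single.
widths = [1, 1, 1, 2, 2, 2, 1, 1, 1]
print(pc_length(widths), host_length(widths))  # 12 14
```

In this example string the three kanji form a single run, so the host form needs 2 bytes more than the Shift-JIS form. A string that alternates single-byte and double-byte characters pays those 2 bytes once per run.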


Guideline G5


Allocate enough space to accommodate the resultant MBCS string after coded character set conversion.
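One way to apply this guideline is to size host buffers for the worst case. The bound below is derived for this illustration rather than taken from the guideline, and it assumes PC characters of 1 or 2 bytes and one SO/SI pair per double-byte run. k runs need at least 2k bytes of double-byte data plus k - 1 separating single-byte characters, so a PC string of n bytes contains at most (n + 1) // 3 runs, and the pattern dd s dd s ... reaches that limit. The function name is hypothetical.

```python
def worst_case_host_length(pc_bytes: int) -> int:
    """Upper bound on host EBCDIC bytes for pc_bytes bytes of PC mixed data.

    Assumes 1- or 2-byte PC characters and that each double-byte run gains
    exactly one SO and one SI in the host form.
    """
    if pc_bytes < 0:
        raise ValueError("pc_bytes must be non-negative")
    max_runs = (pc_bytes + 1) // 3  # from 3k - 1 <= pc_bytes
    return pc_bytes + 2 * max_runs


for n in (2, 5, 10, 80):
    print(n, worst_case_host_length(n))
# 2 4
# 5 9
# 10 16
# 80 134
```

For example, a 10-byte PC field may need up to 16 bytes on the host, even though the string in the example below needs only 12.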


Example: The table below shows the number of bytes needed for the different encodings of the 9-character data string 'ABC漢字厓123'.

Encoding for Japanese   Length (bytes)   Hex representation
EBCDIC                  14               C1 C2 C3 0E 4F58 48F2 5780 0F F1 F2 F3
PC (Shift-JIS)          12               41 42 43 8ABF 8E9A FA8D 31 32 33
EUC                     13               41 42 43 B4C1 BBFA 8FB4C7 31 32 33
UTF-16                  18               0041 0042 0043 6F22 5B57 5393 0031 0032 0033
UTF-8                   15               41 42 43 E6BCA2 E5AD97 E58E93 31 32 33
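The two Unicode rows of the table can be checked quickly with Python's built-in codecs. The Shift-JIS and EUC rows depend on vendor or supplementary mappings for 厓 (note the 8F prefix in its EUC bytes), and the EBCDIC row depends on an IBM host code page, so this sketch leaves them out. It uses utf-16-be because Python's plain utf-16 codec prepends a 2-byte byte order mark, which would give 20 bytes instead of 18.

```python
TEXT = "ABC漢字厓123"


def encoded_lengths(text: str, codecs: tuple = ("utf-8", "utf-16-be")) -> dict:
    """Return the encoded byte length of text for each codec name."""
    return {codec: len(text.encode(codec)) for codec in codecs}


for codec in ("utf-8", "utf-16-be"):
    encoded = TEXT.encode(codec)
    # bytes.hex(sep) requires Python 3.8 or later.
    print(codec, len(encoded), encoded.hex(" ").upper())
```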

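Example: Each s represents a single-byte character (1 byte), each dd represents a double-byte character (2 bytes), and SO and SI represent the Shift Out and Shift In control codes. The PC string below therefore holds four single-byte and three double-byte characters, and conversion to EBCDIC adds one SO and one SI.

IBM PC                      IBM EBCDIC
X'ssssdddddd' (10 bytes)    X'ssssSOddddddSI' (12 bytes)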
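The transformation in this example can be sketched as follows. The function names are hypothetical, and the sketch assumes the characters have already been mapped to host codes; b"s" and b"dd" are placeholders that mirror the notation above. remove_shift_codes also assumes that X'0E' and X'0F' never occur inside a double-byte code.

```python
from typing import Sequence

SO = b"\x0e"  # Shift Out: switch to double-byte mode
SI = b"\x0f"  # Shift In: switch back to single-byte mode


def add_shift_codes(chars: Sequence[bytes]) -> bytes:
    """Upload direction: wrap each run of 2-byte characters in SO ... SI."""
    out = bytearray()
    in_dbcs = False
    for ch in chars:
        is_double = len(ch) == 2
        if is_double and not in_dbcs:
            out += SO
            in_dbcs = True
        elif not is_double and in_dbcs:
            out += SI
            in_dbcs = False
        out += ch
    if in_dbcs:
        out += SI  # close a run that reaches the end of the string
    return bytes(out)


def remove_shift_codes(host: bytes) -> bytes:
    """Download direction: drop SO/SI, copying 2 bytes per character in DBCS mode."""
    out = bytearray()
    in_dbcs = False
    i = 0
    while i < len(host):
        b = host[i:i + 1]
        if b == SO:
            in_dbcs = True
            i += 1
        elif b == SI:
            in_dbcs = False
            i += 1
        else:
            step = 2 if in_dbcs else 1
            out += host[i:i + step]
            i += step
    return bytes(out)


chars = [b"s"] * 4 + [b"dd"] * 3  # placeholders for already-converted codes
host = add_shift_codes(chars)
print(host, len(host))  # b'ssss\x0edddddd\x0f' 12
```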

If the target string buffer is fixed at 10 bytes, the resultant string is truncated to X'ssssSOddddd': the SI is lost and the last byte is only the first half of the third double-byte character. At best that byte is treated as an invalid character; at worst it is misinterpreted as a single-byte character. As discussed in Guideline G3 - Manipulating MBCS data stream, compensate by replacing the last byte with the Shift In control code, so that the string ends with two complete double-byte characters followed by SI.

IBM PC                      IBM EBCDIC
X'ssssdddddd' (10 bytes)    X'ssssSOddddd' (10 bytes, before compensation)
                            X'ssssSOddddSI' (10 bytes, after compensation)
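The truncation and compensation step can be sketched as below. This is an illustrative generalization with hypothetical names, not the method defined in Guideline G3. While inside a double-byte run, the sketch keeps one byte free for the SI, so it never splits a double-byte character. X'40' (the EBCDIC space) is assumed as the pad byte.

```python
SO, SI = 0x0E, 0x0F  # Shift Out / Shift In control codes


def truncate_mixed(host: bytes, max_len: int) -> bytes:
    """Cut a host mixed string to at most max_len bytes, ending on a whole
    character and closing any open double-byte run with SI."""
    out = bytearray()
    in_dbcs = False
    i = 0
    while i < len(host):
        b = host[i]
        if not in_dbcs:
            if b == SO:
                # Opening a run costs SO + one 2-byte character + SI.
                if len(out) + 4 > max_len:
                    break
                in_dbcs = True
            elif len(out) + 1 > max_len:
                break
            out.append(b)
            i += 1
        elif b == SI:
            in_dbcs = False  # room for this SI was reserved earlier
            out.append(b)
            i += 1
        else:
            # Copy a whole 2-byte character only if the SI still fits after it.
            if i + 1 >= len(host) or len(out) + 3 > max_len:
                break
            out += host[i:i + 2]
            i += 2
    if in_dbcs:
        out.append(SI)  # compensate: close the run with SI
    return bytes(out)


def fit_to_buffer(host: bytes, size: int, pad: bytes = b"\x40") -> bytes:
    """Truncate safely, then pad to a fixed size (X'40' assumed as pad)."""
    return truncate_mixed(host, size).ljust(size, pad)


host = b"ssss\x0edddddd\x0f"  # placeholders for X'ssssSOddddddSI'
print(truncate_mixed(host, 10))  # b'ssss\x0edddd\x0f'
```

With a 10-byte buffer the sketch yields X'ssssSOddddSI', the compensated form shown above. With an 11-byte buffer the raw cut X'ssssSOdddddd' ends on a character boundary, so overwriting its last byte with SI would split the third double-byte character; the sketch instead drops that whole character and pads the result to 11 bytes.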