Translate Multiple Bytes (XLATEMB)


Op Code (Hex) Operand 1


1071 Translation template


Operand 1: Space pointer.

Bound program access

Built-in number for XLATEMB is 390.

XLATEMB ( translation_template : address )

Description

The source data string specified in the operand 1 translation template is converted starting with the left-most input byte using the function byte, control map 1, control map 2, and the verification map. The converted data string is returned in the receiver space specified in the operand 1 translation template.

Terminology

ASCII

Abbreviation for American National Standard Code for Information Interchange.

bi-di (bidirectional language)

The ability to write and read a language in two directions, such as from left to right and from right to left.

Control map

A special layout of bytes used to control data conversion. The different types of control maps are described later in this document.

Code page

A collection of characters assigned to code points.

Code points

A unique bit-pattern assigned to each graphic character, to be used by the computer when entering, storing, viewing, printing, or exchanging information.

Double Byte Character Set (DBCS)

A set of characters in which each character is represented by a 2-byte code.

Endian

The order in which the bytes of a multiple-byte value are stored in memory. On big endian systems, the most significant byte is stored at the lowest address. On little endian systems, the least significant byte is stored at the lowest address. For example, the integer value 13488 (hex 34B0) is stored as hex '34B0' on a big endian system and as hex 'B034' on a little endian system. By default, the machine is big endian.
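The following C sketch shows the two storage orders for the example value 13488 (hex 34B0). The helper names (swap16, store_be16, store_le16) are illustrative only and are not part of XLATEMB.

```c
#include <stdint.h>

/* Reverses the two bytes of a 16-bit value, converting between the
 * big endian and little endian storage orders. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Stores v with the most significant byte at the lowest address
 * (big endian, the machine default). */
static void store_be16(uint8_t buf[2], uint16_t v)
{
    buf[0] = (uint8_t)(v >> 8);
    buf[1] = (uint8_t)(v & 0xFF);
}

/* Stores v with the least significant byte at the lowest address
 * (little endian). */
static void store_le16(uint8_t buf[2], uint16_t v)
{
    buf[0] = (uint8_t)(v & 0xFF);
    buf[1] = (uint8_t)(v >> 8);
}
```

Storing 13488 with store_be16 yields the bytes 34 B0; store_le16 yields B0 34, and swap16(13488) is hex B034.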

EBCDIC

Abbreviation for Extended Binary Coded Decimal Interchange Code.

Graphic

Term used to designate pure DBCS data.

ISO/IEC 10646

The international standard used to represent most of the world's written languages by assigning multiple bytes for each character. This standard was written by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).

Mixed data

Data that contains single-byte and double-byte encoding.

Well formed mixed data

Mixed data where DBCS data is bracketed by shift-out (SO) and shift-in (SI) controls.

Octet

An ordered sequence of eight bits, considered as a unit.

Single Byte Character Set (SBCS)

A set of characters in which each character is represented by a 1-byte code.

String type

Used to determine the orientation and shaping of bi-di characters. The string types are defined in Table 7.

Substitution value

A single-byte or multiple-byte code to be output from a conversion when the input character is not found in a ward control block or ward.

UCS-2 Level 1

Defines the form and level of UCS. UCS-2 is a 16-bit form of UCS. Level 1 is an implementation level that does not support combining of characters. Every UCS-2 Level 1 character must be made up of only 2 bytes.

Unicode case level

The Unicode case level is the version of the Unicode standard used as the base for the casing function.

UTF-16

Defines a form and level of UCS. UTF-16 allows access to 63K characters as single 16-bit UCS units. It can access approximately 1 million additional characters through a mechanism known as surrogate pairs. Two ranges of UCS code values are reserved for the high (first) and low (second) values of these pairs. The high range is from hex D800 to hex DBFF, and the low range is from hex DC00 to hex DFFF. A properly formed surrogate pair requires a high-range code value followed by a low-range code value to form a valid character.
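As an illustration of the surrogate ranges above, the following C sketch checks that a high value is followed by a low value and combines them with the standard UTF-16 decoding formula. The function names are hypothetical and this is not how XLATEMB is implemented internally.

```c
#include <stdint.h>

/* 1 if u is in the high (first) surrogate range, hex D800-DBFF. */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

/* 1 if u is in the low (second) surrogate range, hex DC00-DFFF. */
static int is_low_surrogate(uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

/* Combines a properly formed surrogate pair into the character it encodes.
 * Returns -1 unless hi is a high-range value followed by a low-range lo. */
static int32_t decode_surrogate_pair(uint16_t hi, uint16_t lo)
{
    if (!is_high_surrogate(hi) || !is_low_surrogate(lo))
        return -1;
    /* 10 bits from each half, offset past the first 64K code points. */
    return 0x10000 + (((int32_t)(hi - 0xD800) << 10) | (int32_t)(lo - 0xDC00));
}
```

For example, the pair hex D83D DE00 decodes to U+1F600, while the reversed order DE00 D83D is rejected because it does not start with a high-range value.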

UTF-8

UTF-8 is the Unicode Transformation Format that serializes a Unicode code point as a sequence of one to four bytes. These sequences are mathematically equivalent to the set of UTF-16 characters.
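The one-to-four-byte serialization can be sketched in C using the standard UTF-8 bit layout. This is a general-purpose encoder written for illustration (utf8_encode is a hypothetical name), not the XLATEMB conversion routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes code point cp as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written (1-4), or 0 if cp is a surrogate
 * value or lies above hex 10FFFF. */
static size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                    /* 3 bytes */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: needs a surrogate pair in UTF-16 */
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

A character that needs a surrogate pair in UTF-16, such as U+1F600, takes four bytes in UTF-8 (hex F09F9880). This is one way to see that the two forms cover the same set of characters.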

Universal Multiple-Octet Coded Character Set (UCS)

Character set defined by ISO/IEC standard 10646.

Ward

A set of 256 single-byte or multiple-byte codes where all of the codes share a common first hex input byte when converting from a 2-byte code.

Ward control block

A set of 256 2-byte values within a control map. The 2-byte values provide the offsets from the start of the control map space to the beginning of all of the wards within the control map.

Operand 1

 

Operand 1 is a space pointer to a 128-byte translation template that must be aligned on a 16-byte boundary. If the translation template is not aligned on a 16-byte boundary, a boundary alignment (hex 0602) exception is signaled.

Offset
Dec  Hex  Field Name                                           Data Type and Length
0    0    Function                                             Bin(2)
2    2    Control flags                                        Char(2)
2    2      Control map type                                   Bit 0
              0 = Control map type D not supplied.
              1 = Control map type D supplied.
2    2      Substitution check                                 Bit 1
              0 = Do not check for substitution.
              1 = Check for substitution on conversion from UCS.
2    2      Override default multiple-byte substitution value  Bit 2
              0 = Use the default UCS-2 Level 1 multiple-byte substitution value.
              1 = Use the specified multiple-byte substitution value.
2    2      Ward transparency                                  Bit 3
              0 = Use the multiple-byte substitution value when converting characters in an empty ward.
              1 = Do not convert characters in an empty ward.
2    2      Well formed mixed data output                      Bit 4
              0 = Do not ensure well formed mixed data on output.
              1 = Ensure well formed mixed data on output.
2    2      Caching requested                                  Bit 5
              0 = Do not request caching for function hex 0100 requests.
              1 = Request caching for function hex 0100 requests.
2    2      Endian mode                                        Bit 6
              0 = Handle the input and output of Unicode data as big endian.
              1 = Handle the input and output of Unicode data as little endian.
2    2      UTF-16 casing map                                  Bit 7
              0 = Use UCS-2 Level 1 maps for casing requests.
              1 = Use UTF-16 maps for casing requests.
2    2      Unicode bi-di formatting code                      Bit 8
              0 = Do not process Unicode bi-di formatting codes.
              1 = Process Unicode bi-di formatting codes.
2    2      Reserved (binary 0)                                Bits 9-15   +
4    4    Source length                                        Bin(4)
8    8    Receiver buffer length                               Bin(4)
12   C    Receiver converted data length                       Bin(4)
16   10   Source range                                         Char(4)
16   10     Range 1 lower limit                                Bits 0-7
16   10     Range 1 upper limit                                Bits 8-15
16   10     Range 2 lower limit                                Bits 16-23
16   10     Range 2 upper limit                                Bits 24-31
20   14   Single-byte substitution value                       Char(1)
21   15   Multiple-byte substitution value                     Char(2)
23   17   Reserved (binary 0)                                  Char(35)    +
58   3A   Unicode case level version                           UBin(2)
60   3C   Source string type                                   Bin(2)
62   3E   Receiver string type                                 Bin(2)
64   40   Source                                               Space pointer
80   50   Receiver                                             Space pointer
96   60   Verification pointer                                 Space pointer
112  70   Control map                                          Space pointer
128  80   --- End ---

Note: Fields annotated with a plus sign (+) are reserved fields. A nonzero value in a reserved field results in the signaling of the template value invalid (hex 3801) exception.
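To cross-check the offsets in the template layout, the following hypothetical C model declares the 128-byte template with compile-time offset checks. It is a sketch under stated assumptions: xlatemb_template and its member names are illustrative, space pointers are modeled as opaque 16-byte fields rather than real IBM i space pointers, and multi-byte integers here use the host byte order, whereas the machine stores them big endian.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the XLATEMB translation template (not an IBM header). */
typedef struct {
    int16_t  function;                  /* offset 0:   Bin(2)                      */
    uint8_t  control_flags[2];          /* offset 2:   Char(2), bits 0-8 used      */
    int32_t  source_length;             /* offset 4:   Bin(4)                      */
    int32_t  receiver_buffer_length;    /* offset 8:   Bin(4)                      */
    int32_t  receiver_converted_length; /* offset 12:  Bin(4), set by the machine  */
    uint8_t  source_range[4];           /* offset 16:  Char(4)                     */
    uint8_t  sbcs_substitution;         /* offset 20:  Char(1)                     */
    uint8_t  mb_substitution[2];        /* offset 21:  Char(2)                     */
    uint8_t  reserved[35];              /* offset 23:  Char(35), must be binary 0  */
    uint16_t unicode_case_level;        /* offset 58:  UBin(2)                     */
    int16_t  source_string_type;        /* offset 60:  Bin(2)                      */
    int16_t  receiver_string_type;      /* offset 62:  Bin(2)                      */
    _Alignas(16) uint8_t source[16];    /* offset 64:  space pointer (opaque here) */
    uint8_t  receiver[16];              /* offset 80:  space pointer               */
    uint8_t  verification[16];          /* offset 96:  space pointer               */
    uint8_t  control_map[16];           /* offset 112: space pointer               */
} xlatemb_template;

/* Compile-time checks against the documented offsets and total size. */
_Static_assert(offsetof(xlatemb_template, source_length) == 4, "source length");
_Static_assert(offsetof(xlatemb_template, source_range) == 16, "source range");
_Static_assert(offsetof(xlatemb_template, reserved) == 23, "reserved");
_Static_assert(offsetof(xlatemb_template, unicode_case_level) == 58, "case level");
_Static_assert(offsetof(xlatemb_template, source) == 64, "source pointer");
_Static_assert(sizeof(xlatemb_template) == 128, "template is 128 bytes");
```

The _Alignas(16) on the first pointer field also gives the whole structure 16-byte alignment, matching the boundary requirement for operand 1.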

Translation Template Field Descriptions

Function

The function selected determines the type of conversion to be performed. Table 1 outlines the types of conversions that may be performed and the operands required for each function. The Table 1 columns are defined as follows:

  • Function - The function selected.

  • Control map type - The type of control map required as input for the given function.

  • Verification map allowed - A verification map is allowed for the given function. A verification map is never required.

  • Source data type - The source data type required for the given function.

  • Receiver data type - The receiver data type returned for the given function.

  • Estimated required buffer size - The factor used to determine the required receiver buffer length. Multiply the estimated required buffer size by the source length to get the required receiver buffer length. If the result is less than the minimum buffer size, use the minimum buffer size as the receiver buffer length.

Table 1. XLATEMB supported functions

Function  Control   Verification  Source data   Receiver data             Estimated required
(hex)     map type  map allowed   type          type                      buffer size
0001      A or D    No            SBCS          UCS-2/UTF-16              2
0002      B or D    Yes           UCS-2/UTF-16  SBCS                      .5
0003      C or D    No            Graphic       UCS-2/UTF-16              1
0004      C or D    Yes           UCS-2/UTF-16  Graphic                   1
0005      C or D    No            Mixed EBCDIC  UCS-2/UTF-16              2
0006      C or D    Yes           UCS-2/UTF-16  Mixed EBCDIC              2
0007      C         No            Mixed ASCII   UCS-2/UTF-16              2
0008      C         Yes           UCS-2/UTF-16  Mixed ASCII               1
0009      none      No            UCS-2/UTF-16  Uppercase UCS-2/UTF-16    1
000A      none      No            UCS-2/UTF-16  Lowercase UCS-2/UTF-16    1
000B      D         No            GB 18030      UTF-16                    2
000C      D         Yes           UTF-16        GB 18030                  2
0033      A or D    No            SBCS          UTF-8                     3
0034      B or D    No            UTF-8         SBCS                      1
0035      C or D    No            Graphic       UTF-8                     1.5
0036      C or D    No            UTF-8         Graphic                   2
0037      C or D    No            Mixed EBCDIC  UTF-8                     2
0038      C or D    No            UTF-8         Mixed EBCDIC              2
003B      none      No            UTF-8         Uppercase UTF-8           1
003C      none      No            UTF-8         Lowercase UTF-8           1
00FE      none      No            UTF-8         UCS-2/UTF-16              2
00FF      none      No            UCS-2/UTF-16  UTF-8                     1.5
0100      E         No            UCS-2/UTF-16  Weighted UCS-2/UTF-16     1
Note: Use of a function value not shown in Table 1 results in the signaling of the template value invalid (hex 3801) exception.
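The buffer-size rule can be sketched in C using the estimated required buffer size column of Table 1. This is a hypothetical helper, not part of the instruction. Two assumptions are labeled in the comments: fractional results (from the .5 and 1.5 factors) are rounded up, because the text does not say how to round, and the minimum buffer size, which is not listed in Table 1, is supplied by the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Estimated required buffer size from Table 1, stored in halves so that
 * .5 and 1.5 stay exact: 1 = .5, 2 = 1, 3 = 1.5, 4 = 2, 6 = 3. */
typedef struct { uint16_t function; uint8_t halves; } xlatemb_estimate;

static const xlatemb_estimate ESTIMATES[] = {
    {0x0001, 4}, {0x0002, 1}, {0x0003, 2}, {0x0004, 2}, {0x0005, 4},
    {0x0006, 4}, {0x0007, 4}, {0x0008, 2}, {0x0009, 2}, {0x000A, 2},
    {0x000B, 4}, {0x000C, 4}, {0x0033, 6}, {0x0034, 2}, {0x0035, 3},
    {0x0036, 4}, {0x0037, 4}, {0x0038, 4}, {0x003B, 2}, {0x003C, 2},
    {0x00FE, 4}, {0x00FF, 3}, {0x0100, 2},
};

/* Returns the receiver buffer length for source_length bytes of input,
 * or -1 if the function is not in Table 1.
 * Assumption: fractional lengths are rounded up.
 * Assumption: min_size is the function's minimum buffer size, supplied by
 * the caller because it is not listed in Table 1. */
static int64_t receiver_buffer_length(uint16_t function, int32_t source_length,
                                      int32_t min_size)
{
    for (size_t i = 0; i < sizeof ESTIMATES / sizeof ESTIMATES[0]; i++) {
        if (ESTIMATES[i].function == function) {
            /* ceil(source_length * halves / 2) */
            int64_t len = ((int64_t)source_length * ESTIMATES[i].halves + 1) / 2;
            return len < min_size ? min_size : len;
        }
    }
    return -1;
}
```

For example, converting 10 SBCS bytes with function 0001 (factor 2) calls for a 20-byte receiver buffer. Converting 3 bytes with function 00FF (factor 1.5) gives 4.5, which this sketch rounds up to 5.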

Control flags

  • Control map type: Indicates whether a type D control map is supplied for the specified function. When a type D control map is specified, this field is verified against the function; specifying a type D control map with a function that does not support it results in the signaling of the template value invalid (hex 3801) exception. Refer to Table 1 for the functions that accept a type D control map.

  • Substitution check: Check for substitution on conversion from UCS-2 Level 1, UTF-16, or UTF-8 data. Substitution check is only supported for functions 0002, 0004, 0006, 0008, 000C, 0034, 0036, and 0038.

    When a substitution character is encountered, a substitution character used (hex 0C20) exception is signaled at instruction completion. Complete results are placed in the receiver and in the receiver converted data length field. The number of substitutions in the receiver data is stored in the number of substitutions field of the exception data for the substitution character used (hex 0C20) exception.

  • Override default multiple-byte substitution value: Determines which multiple-byte substitution value is placed into the receiver space when a type C or type D control map is used and substitution is required on conversion to UCS-2 Level 1 data. Override default multiple-byte substitution value is only supported for functions 0003, 0005, and 0007.

    When substitution is required and the function is 0003, 0005, or 0007, the multiple-byte substitution value is one of the following:

    • If the override default multiple-byte substitution value field is 0, the default UCS-2 Level 1 substitution value of hex FFFD is used.

    • If the override default multiple-byte substitution value field is 1, the multiple-byte substitution value specified in the template is used.
    Note: The override default multiple-byte substitution value field must be set to 0 for functions 0001, 0002, 0004, 0006, 0008, 0009, 000A, 000B, 000C, 0033, 0034, 0035, 0036, 0037, 0038, 003B, 003C, 00FE, 00FF, and 0100, or a template value invalid (hex 3801) exception will be signaled.

  • Ward transparency: Determines whether source characters in an empty ward are converted to a multiple-byte substitution value or moved to the receiver space transparently, with no conversion taking place. Ward transparency is only supported for functions 0003, 0004, 0005, and 0007. If the ward transparency field is binary 1, the multiple-byte substitution value is not used and the override default multiple-byte substitution value field is ignored.
    Note: The ward transparency field must be set to binary 0 for functions 0001, 0002, 0006, 0008, 0009, 000A, 000B, 000C, 0033, 0034, 0035, 0036, 0037, 0038, 003B, 003C, 00FE, 00FF, and 0100, or a template value invalid (hex 3801) exception will be signaled. When the ward transparency feature is used with functions 0005 and 0007, the first entry in the control map ward control block must be nonzero, or a template value invalid (hex 3801) exception will be signaled.

  • Well formed mixed data output: Determines whether a DBCS character in the last two bytes of the receiver space (as defined by the receiver buffer length) is replaced by a shift-in control.
    Note: The well formed mixed data output field must be set to 0 for function 0100, or a template value invalid (hex 3801) exception will be signaled.
    Note: This control is applied only where data truncation is necessary. Mixed data output is always well formed if the receiver space is large enough.

  • Caching requested: Requests caching of the internal data structures used for a type E control map. The cache descriptor field in the control map must not be modified.

  • Endian mode: Requests that the input and output of Unicode data be handled as either little endian or big endian.
    Note: The endian mode bit must be set to binary 0 for functions 0033, 0034, 0035, 0036, 0037, 0038, 003B, 003C, and 0100, or a template value invalid (hex 3801) exception will be signaled.

  • UTF-16 casing map: Requests that the UTF-16 casing map be used for casing functions.
    Note: The UTF-16 casing map bit must be set to binary 0 for functions 0001, 0002, 0003, 0004, 0005, 0006, 0007, 0008, 000B, 000C, 0033, 0034, 0035, 0036, 0037, 0038, 00FE, 00FF, and 0100, or a template value invalid (hex 3801) exception will be signaled.

  • Unicode bi-di formatting code: Requests that Unicode bi-di formatting codes be processed. This applies only to type D maps that have string types defined; otherwise, it is ignored. For functions 0001 and 0033, bi-di formatting codes are inserted. For functions 0002 and 0034, bi-di formatting codes are removed.
    Note: The Unicode bi-di formatting code bit must be set to binary 0 for functions 0003, 0004, 0005, 0006, 0007, 0008, 0009, 000B, 000C, 0035, 0036, 0037, 0038, 00FE, 00FF, and 0100, or a template value invalid (hex 3801) exception will be signaled.
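The following C sketch shows one way to set and test the control flag bits. It assumes the usual IBM bit numbering, in which bit 0 is the leftmost (most significant) bit of the first byte of the Char(2) field; the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Bit numbers of the control flags field (names are illustrative). */
enum {
    CF_CONTROL_MAP_TYPE_D = 0,
    CF_SUBSTITUTION_CHECK = 1,
    CF_OVERRIDE_MB_SUBST  = 2,
    CF_WARD_TRANSPARENCY  = 3,
    CF_WELL_FORMED_MIXED  = 4,
    CF_CACHING_REQUESTED  = 5,
    CF_LITTLE_ENDIAN      = 6,
    CF_UTF16_CASING_MAP   = 7,
    CF_UNICODE_BIDI_CODES = 8
};

/* Sets bit n (0-15). Assumption: bit 0 is the leftmost bit of flags[0]. */
static void set_control_flag(uint8_t flags[2], int n)
{
    flags[n / 8] |= (uint8_t)(0x80u >> (n % 8));
}

/* Returns 1 if bit n is set, using the same bit numbering. */
static int test_control_flag(const uint8_t flags[2], int n)
{
    return (flags[n / 8] >> (7 - n % 8)) & 1;
}

/* Returns 1 if reserved bits 9-15 (the low 7 bits of flags[1]) are all 0. */
static int reserved_flags_clear(const uint8_t flags[2])
{
    return (flags[1] & 0x7F) == 0;
}
```

Under this assumption, requesting a type D control map with substitution checking (for example, for function 0002) produces the flag bytes hex C000.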

Source length

The length of the source data contained in the space addressed by the source space pointer. A length value of less than 1 results in the signaling of the template value invalid (hex 3801) exception.

Receiver buffer length

The length of the receiver space addressed by the receiver space pointer. A length value of less than 1 results in the signaling of the template value invalid (hex 3801) exception.

Receiver converted data length

The length of the data placed in the receiver space after conversion. This field is set by the machine and will always be less than or equal to the value specified for receiver buffer length.

Source range

The range of the double-byte content of the mixed ASCII source input data. Source range is only used with function 0007. The source range field is divided into two ranges, range 1 and range 2, each with a 1-byte lower limit and a 1-byte upper limit. Examples of source ranges in actual use are listed below:

Source range   Supported language
Hex 819FE0FC   Japanese
Hex 81BF0000   Korean
Hex 81FC0000   Simplified Chinese
Hex 81FC0000   Traditional Chinese
Hex 8FFE0000   Republic of Korea National Standard
Note: A template value invalid (hex 3801) exception will be signaled if one of the following occurs:

  • The function is 0007 and range 1 is set to nulls.

  • The function is 0007 and the upper limit is less than the lower limit for either range 1 or range 2.
If range 2 is set to nulls, it is not used and no exception is signaled.
Note: The source range field must be set to binary 0 for any function other than 0007, or a template value invalid (hex 3801) exception will be signaled.
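The range rules can be sketched in C as follows. Bytes 0 and 1 of the Char(4) field are the range 1 lower and upper limits, and bytes 2 and 3 are the range 2 limits. The function names are hypothetical. Reading the ranges as the first bytes of double-byte characters is an interpretation, though it matches the Japanese example (hex 81-9F and E0-FC).

```c
#include <stdint.h>

/* Validation rules for function 0007: range 1 must not be nulls, and no
 * range may have an upper limit below its lower limit. A null range 2
 * (hex 0000) passes and is simply unused. Returns 1 if valid. */
static int source_range_valid(const uint8_t r[4])
{
    if (r[0] == 0 && r[1] == 0) return 0;   /* range 1 set to nulls */
    if (r[1] < r[0]) return 0;              /* range 1 upper < lower */
    if (r[3] < r[2]) return 0;              /* range 2 upper < lower */
    return 1;
}

/* Returns 1 if byte b falls in range 1 or in a non-null range 2.
 * Interpretation (not stated explicitly in the text): such a byte
 * starts a double-byte character in the mixed ASCII input. */
static int in_double_byte_range(const uint8_t r[4], uint8_t b)
{
    if (b >= r[0] && b <= r[1]) return 1;
    if (!(r[2] == 0 && r[3] == 0) && b >= r[2] && b <= r[3]) return 1;
    return 0;
}
```

With the Japanese range hex 819FE0FC, the bytes hex 82 and hex E5 fall inside a double-byte range, and hex A0 does not. With the Korean range hex 81BF0000, range 2 is unused.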

Single-byte substitution value

A single-byte value to be output from a conversion when the following occurs:

  • A type C or type D control map ward control block entry is hex zeros and single-byte data is being processed.

  • A type B control map ward control block entry is hex zeros.
Note: The single-byte substitution value must be set to hex 00 for functions 0001, 0003, 0004, 0005, 0007, 0009, 000A, 000B, 000C, 0033, 0035, 0036, 0037, 003B, 003C, 00FE, 00FF, and 0100, or a template value invalid (hex 3801) exception will be signaled.

Multiple-byte substitution value

A 2-byte value to be output from a conversion when a type C or type D control map ward control block entry is hex zeros and multiple-byte data is being processed. The multiple-byte substitution value is ignored if the ward transparency feature is being used.

Note: The multiple-byte substitution value must be set to hex 0000 for functions 0001, 0002, 0009, 000A, 000B, 000C, 0033, 0034, 0035, 0037, 003B, 003C, 00FE, 00FF, and 0100, or a template value invalid (hex 3801) exception will be signaled. If the function is 0003, 0005, or 0007 and the override default multiple-byte substitution value field is 0, the multiple-byte substitution value must be set to hex 0000, or a template value invalid (hex 3801) exception will be signaled.

Unicode case level version

Requests that a specific Unicode case level version be used for the casing function. Allowable values are hex 0000, hex 0200, and hex 0400.

  • Hex 0000 and hex 0200 are based on Unicode standard 2.0.

  • Hex 0400 is based on Unicode standard 4.0.
Note: The Unicode case level version must be set to binary 0 for functions 0001, 0002, 0003, 0004, 0005, 0006, 0007, 0008, 000B, 000C, 0033, 0034, 0035, 0036, 0037, 0038, 00FE, 00FF, and 0100, or a template value invalid (hex 3801) exception will be signaled.

Source string type

Overrides the default string type of the source field. This value applies only to type D maps that have string types defined; otherwise, it is ignored.

Note: The source string type value must be set to hex 0000 for functions 0003, 0004, 0005, 0006, 0007, 0008, 0009, 000A, 000B, 000C, 0035, 0036, 0037, 0038, 003B, 003C, 00FE, 00FF, and 0100, or a template value invalid (hex 3801) exception will be signaled.

Receiver string type

Overrides the default string type of the receiver field. This value applies only to type D maps that have string types defined; otherwise, it is ignored.

Note: The receiver string type value must be set to hex 0000 for functions 0003, 0004, 0005, 0006, 0007, 0008, 0009, 000A, 000B, 000C, 0035, 0036, 0037, 0038, 003B, 003C, 00FE, 00FF, and 0100, or a template value invalid (hex 3801) exception will be signaled.

Source

A space pointer to the source data buffer. The number of bytes available is specified by the source length field.

Receiver

A space pointer to the receiver data buffer. The number of bytes available is specified by the receiver buffer length field. If an error occurs during conversion, this buffer contains the data converted up to the point of the error. The length of the converted data is stored in the receiver converted data length field.

Note: Undefined results can occur if the storage locations specified by source and receiver overlap.

Verification pointer (optional)

A space pointer to a verification map to be used to verify UCS-2 Level 1 source data. The verification map has the following format:

Offset
Dec  Hex  Field Name                Data Type and Length
0    0    Map size                  UBin(2)
2    2    Verification map entry    Char(2)
4    4    --- End ---

The map size indicates the number of 2-byte verification map entries that follow it in the map.

Each verification map entry contains one UCS-2 Level 1 code.

Note: The verification map entry field is repeated once for each entry; the number of entries is specified in the map size field.
If a verification map is specified, it is used to verify that the UCS-2 Level 1 data in the source input is correct. The verification map contains a list of valid UCS-2 Level 1 codes. The map values must be encoded in UCS-2 Level 1 and must be sorted in ascending numerical order; if the map is not sorted, the results are unpredictable. Refer to Table 1 for the functions that support a verification map. If any UCS-2 Level 1 code in the source is not found in the verification map during the conversion, a source verification error (hex 0C21) exception is signaled. If no verification map is used, the verification pointer must be a null pointer value, which ensures that no verification takes place.
Note: The verification pointer field must be set to a null pointer value for functions 0033, 0034, 0035, 0036, 0037, 0038, 003B, 003C, 00FE, 00FF, and 0100, or a template value invalid (hex 3801) exception will be signaled.

The following is an example of a verification map:

0011009A0100010101020103010401050106010701080109010A010B010C010D03B103B2
Table 2 shows the layout of the example verification map shown above with offsets and entry number included for clarity. The first 2-byte value, at offset hex 0000, indicates the number of 2-byte UCS-2 Level 1 codes in the remainder of the map. In this example, the first value in the map is hex 0011 (decimal 17), indicating that there are 17 2-byte codes in the remainder of the map.

Table 2. Verification map layout

Offset  Entry number  Verification map value
0000    -             0011 (number of entries in the verification map)
0002    1             009A
0004    2             0100
0006    3             0101
0008    4             0102
000A    5             0103
000C    6             0104
000E    7             0105
0010    8             0106
0012    9             0107
0014    10            0108
0016    11            0109
0018    12            010A
001A    13            010B
001C    14            010C
001E    15            010D
0020    16            03B1
0022    17            03B2
Note: Offset values are from the start of the verification map.
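Because the entries must be sorted, a lookup can use binary search. The following C sketch checks a code against the example map from Table 2. It assumes a big endian map layout and uses hypothetical names. Binary search is one plausible reason for the sorting requirement; the text does not say how the machine searches the map.

```c
#include <stddef.h>
#include <stdint.h>

/* The example verification map from Table 2: size hex 0011, then 17 codes. */
static const uint8_t EXAMPLE_VERIFICATION_MAP[] = {
    0x00, 0x11,
    0x00, 0x9A, 0x01, 0x00, 0x01, 0x01, 0x01, 0x02, 0x01, 0x03, 0x01, 0x04,
    0x01, 0x05, 0x01, 0x06, 0x01, 0x07, 0x01, 0x08, 0x01, 0x09, 0x01, 0x0A,
    0x01, 0x0B, 0x01, 0x0C, 0x01, 0x0D, 0x03, 0xB1, 0x03, 0xB2
};

/* Reads a big endian 2-byte value. */
static uint16_t be16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

/* Returns 1 if code is listed in the verification map. The map size comes
 * first, followed by the sorted 2-byte entries. */
static int verification_map_contains(const uint8_t *map, uint16_t code)
{
    size_t lo = 0, hi = be16(map);          /* search entries [lo, hi) */
    const uint8_t *entries = map + 2;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        uint16_t v = be16(entries + 2 * mid);
        if (v == code) return 1;
        if (v < code) lo = mid + 1; else hi = mid;
    }
    return 0;
}
```

With this map, hex 009A and hex 03B2 are accepted. A source code such as hex 03B3 is not found, which corresponds to the case in which the instruction signals a source verification error.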

Control map

A space pointer to a control map to be used in the conversion of the source data. Refer to Table 1 for information on types of maps required for the various functions. If unused, the control map pointer must be a null pointer value.

Control map types

The following list explains the different types of control maps:

Type A -- Used to map SBCS data to UCS-2 Level 1 multiple-byte data. The type A control map has the following format:

Offset
Dec  Hex  Field Name            Data Type and Length
0    0    Type A control map    [256] Char(2)
            A type A control map consists of 256 2-byte codes.
0    0    Type A map entries    Char(2)
            The type A control map entries are indexed by the hex input byte. Hex input values range from 00 to FF, for a total of 256 2-byte values. All entries must be populated for the full index range of hex 00 to hex FF. To ensure predictable conversion, set unused entries to a defined value.
512  200  --- End ---

Note: The type A control map entry field is repeated 256 times, for a total of 512 bytes.

The following is a partial example of a type A control map:

0100010101020103010401050106010701080109010A010B010C...01FE01FF
Table 3 shows a partial layout of the example type A control map above, with hex input values and offsets included for clarity. To find the result for the 3-byte input value hex 0C0805, do the following:

  1. Use the first input byte, hex 0C, to index into the map. Because each entry is 2 bytes long, the entry is at offset hex 0018 (hex 0C × 2).

  2. At offset hex 0018, the control map value is hex 010C.

  3. Use the second input byte, hex 08, to index into the map.

  4. At offset hex 0010, the control map value is hex 0108.

  5. Use the third input byte, hex 05, to index into the map.

  6. At offset hex 000A, the control map value is hex 0105.

  7. The result placed in the receiver data buffer is hex 010C01080105.

Table 3. Type A map layout

Hex Input Value Offset Control Map Value
00 0000 0100
01 0002 0101
02 0004 0102
03 0006 0103
04 0008 0104
05 000A 0105
06 000C 0106
07 000E 0107
08 0010 0108
09 0012 0109
0A 0014 010A
0B 0016 010B
0C 0018 010C
· · ·
FE 01FC 01FE
FF 01FE 01FF
Note: Offset values are from the start of the type A control map space. The output value is the actual map value at the specified offset.
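The type A lookup steps can be modeled in C as shown below. This is a sketch with hypothetical names, not the machine's implementation. The example map fills every entry as hex 0100 plus the input byte, which matches all of the entries shown in Table 3; the entries elided there are an assumption. When the receiver is too small, this sketch simply stops; the instruction's own truncation behavior is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical full type A map consistent with Table 3:
 * the entry for input byte b is hex 0100 + b, stored big endian. */
static void build_example_type_a_map(uint8_t map[512])
{
    for (size_t b = 0; b < 256; b++) {
        map[2 * b] = 0x01;
        map[2 * b + 1] = (uint8_t)b;
    }
}

/* SBCS to 2-byte codes: each input byte selects the entry at offset
 * byte * 2, and the 2-byte entry is copied to the receiver unchanged.
 * Returns the number of receiver bytes written. */
static size_t convert_type_a(const uint8_t map[512],
                             const uint8_t *src, size_t src_len,
                             uint8_t *dst, size_t dst_len)
{
    size_t out = 0;
    for (size_t i = 0; i < src_len && out + 2 <= dst_len; i++) {
        const uint8_t *entry = map + 2 * (size_t)src[i];
        dst[out++] = entry[0];
        dst[out++] = entry[1];
    }
    return out;
}
```

Converting the input hex 0C0805 with the example map produces the 6 receiver bytes hex 010C01080105, as in the steps above.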

Type B -- Used to map UCS-2 Level 1 multiple-byte data to SBCS data. The type B control map has the following format:

Offset
Dec Hex
Field Name
Data Type and Length
0 0
Ward control block
[256] UBin(2)



A type B control map ward control block consists of 256 2-byte offsets that are indexed by the first hex input byte. This block defines the offsets into the type B control map for each of the wards defined in the type B control map. The ward control block section of the type B control map begins at offset hex 0000. The first byte of the 2-byte hex input data is used to index into the ward control block section of the type B control map.



0 0
Ward control block entries
UBin(2)



The ward control block entry is a 2-byte offset into the type B control map to the beginning of one of the ward detail maps. All ward control block entries are required to be fully populated for a range of hex 00 to hex FF for a total of 256 2-byte entries. Unused control entries should be set to hex zeros. There is no ward associated with control entries of all zeros. If zeros are encountered, the single byte substitution value will be output, the next hex input byte is skipped, and the conversion will continue with the next input byte.



512 200
Ward details
[256] Char(1)



A type B control map ward consists of 256 1-byte codes. A ward is addressed by the ward control block.



512 200
Ward detail entries
Char(1)



The ward detail entries are indexed by the second hex input byte. Hex input values range from 00 to FF for a total of 256 single-byte values. All ward entries are required to be fully populated for the index range of hex 00 to hex FF. Unused entries should be set to a substitution value.



768 300
--- End ---

Table 4 shows a partial layout of a type B control map, with hex input values and offsets included for clarity. To find the result for the input value hex 03B3, do the following:

  1. Use the first input byte, hex 03, to index into the ward control block section.

  2. The corresponding ward control block entry (at offset hex 0006) has the value hex 0300, which is used as the offset to the start of the ward detail for ward 03.

  3. Using offset hex 0300 as a base, use the second input byte, hex B3, to index into the ward detail for ward 03. Because ward detail entries are 1 byte long, the entry is at offset hex 03B3.

  4. The corresponding entry in the ward 03 detail is the single-byte control map value hex 8C.

  5. The result placed in the receiver data buffer is hex 8C.

Table 4. Type B map layout

Hex Input Value Offset Control Map Value
Ward Control Block
00 0000 0200
01 0002 0000
02 0004 0000
03 0006 0300
04 0008 0000
05 000A 0000
· · ·
FE 01FC 0000
FF 01FE 0000
Ward Detail (for Ward 00)
00 0200 00
01 0201 01
02 0202 02
· · ·
99 0299 39
9A 029A 3A
9B 029B 3B
· · ·
FD 02FD 3F
FE 02FE 3F
FF 02FF 3F
Ward Detail (for Ward 03)
00 0300 00
01 0301 01
02 0302 02
· · ·
B1 03B1 8A
B2 03B2 8B
B3 03B3 8C
B4 03B4 8D
B5 03B5 8E
· · ·
FE 03FE 3F
FF 03FF 3F
Note: Offset values are from the start of the type B control map space. The output value is the actual map value at the specified offset.
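The type B steps, including the zero-entry substitution rule, can be modeled in C as shown below. The names are hypothetical. The example map reproduces only some of the entries shown in Table 4 (ward 00 at hex 0200 and ward 03 at hex 0300); every other detail entry is filled with hex 3F, which is an assumption of this sketch. No bounds checking of the map is done.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t be16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

/* Hypothetical 1024-byte map reproducing part of Table 4. */
static void build_example_type_b_map(uint8_t map[0x400])
{
    for (size_t i = 0; i < 0x400; i++)
        map[i] = (i < 0x200) ? 0x00 : 0x3F;  /* empty wards; unlisted entries hex 3F */
    map[0x000] = 0x02; map[0x001] = 0x00;    /* entry 00 -> ward at hex 0200 */
    map[0x006] = 0x03; map[0x007] = 0x00;    /* entry 03 -> ward at hex 0300 */
    for (size_t b = 0; b <= 2; b++) {        /* entries 00-02 map to themselves */
        map[0x200 + b] = (uint8_t)b;
        map[0x300 + b] = (uint8_t)b;
    }
    for (size_t b = 0xB1; b <= 0xB5; b++)    /* ward 03: B1-B5 -> 8A-8E */
        map[0x300 + b] = (uint8_t)(0x8A + (b - 0xB1));
}

/* 2-byte input to SBCS: the first byte selects a ward control block entry
 * (offset = byte * 2); a nonzero entry is the offset of a 256-byte ward,
 * indexed by the second byte. A zero entry outputs the single-byte
 * substitution value, and the second byte is skipped. */
static size_t convert_type_b(const uint8_t *map, uint8_t sbcs_subst,
                             const uint8_t *src, size_t src_len,
                             uint8_t *dst, size_t dst_len)
{
    size_t out = 0;
    for (size_t i = 0; i + 1 < src_len && out < dst_len; i += 2) {
        uint16_t ward = be16(map + 2 * (size_t)src[i]);
        dst[out++] = ward ? map[ward + src[i + 1]] : sbcs_subst;
    }
    return out;
}
```

With this map, the input hex 03B3 converts to hex 8C as in the steps above. An input such as hex 0123, whose ward control block entry is zero, produces the single-byte substitution value instead.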

Type C -- The type C control map is used for conversion of the following:

  • UCS-2 Level 1 to graphic

  • Graphic to UCS-2 Level 1

  • UCS-2 Level 1 to mixed EBCDIC

  • Mixed EBCDIC to UCS-2 Level 1

  • UCS-2 Level 1 to mixed ASCII

  • Mixed ASCII to UCS-2 Level 1
The type C control map has the following format:
Offset
Dec   Hex  Field Name                   Data Type and Length
0     0    Ward control block           [256] UBin(2)
             A type C control map ward control block consists of 256 2-byte offsets that are indexed by the first hex input byte. This block defines the offsets into the type C control map for each of the wards defined in the map. The ward control block section of the type C control map begins at offset hex 0000. The first byte of the 2-byte hex input data is used to index into the ward control block. When converting mixed data to UCS-2 Level 1 (functions 0005 and 0007) and the input data is single-byte, ward control block entry 00 is used for translation.
0     0    Ward control block entries   UBin(2)
             The ward control block entry is a 2-byte offset into the type C control map to the beginning of one of the ward detail maps. All ward control block entries must be populated for the range hex 00 to hex FF, for a total of 256 2-byte entries. Unused control entries should be set to hex zeros. There is no ward associated with control entries of all zeros.

             If zeros are encountered and single-byte data is being processed, the single-byte substitution value is output and the conversion continues with the next input byte.

             If zeros are encountered and multiple-byte data is being processed, one of the following occurs:

               • If the ward transparency field is binary 0, the appropriate multiple-byte substitution value is output, the next input byte is skipped, and the conversion continues with the next input byte.

               • If the ward transparency field is binary 1, the two source bytes are moved transparently to the receiver buffer, with no conversion taking place. The conversion continues with the next input byte.

             If the control map is larger than 64K bytes, offset values referencing the control map must be specified in units of 512 bytes; that is, each offset value is multiplied by 512. For example, a value of hex 0002 indicates an offset of 1024 bytes.
512   200  Ward details                 [256] Char(2)
             A type C control map ward consists of 256 2-byte codes. A ward is addressed by the ward control block.
512   200  Ward detail entries          Char(2)
             The ward detail entries are indexed by the second hex input byte. Hex input values range from 00 to FF, for a total of 256 2-byte values. All ward entries must be populated for the full index range of hex 00 to hex FF. Unused entries should be set to a substitution value.
1024  400  --- End ---

The following is a partial example of a type C control map with simple offset values since the map is less than 64K bytes in size:

000000000200040000000000...000000000000000100020003000400050006
0007...3F3F3F3F0200020102020203020402050206...3F3F3F3F

The following is a partial example of a type C control map with offset values that are multiples of 512 bytes since the map is greater than 64K bytes in size:

000000000001000200000000...000000000000000100020003000400050006
0007...3F3F3F3F0200020102020203020402050206...3F3F3F3F....
Table 5 overlays these two partial layouts of a type C control map, with hex input values and offsets included for clarity. The larger map is truncated to the size of the smaller map so that both can be represented in the same table. To find the result for the graphic input value hex 0207, do the following:

  1. Use the first byte of the graphic input value, hex 02, to index into the ward control block section of either example type C map.

  2. The corresponding ward control block entry (at offset hex 0004) locates the start of the ward detail for ward 02. In the smaller map, whose size is less than or equal to 64K bytes, the entry value hex 0200 is the offset itself. In the larger map, whose size is greater than 64K bytes, the entry value is hex 0001, indicating an offset of 512 bytes (hex 0200).

  3. Using offset hex 0200 as a base (regardless of which map provided it), use the second byte of the input value, hex 07, to index into the ward detail for ward 02. Because ward detail entries are 2 bytes long, the entry is at offset hex 0200 + (hex 07 × 2) = hex 020E.

  4. The corresponding entry in the ward 02 detail is the double-byte control map value hex 0007.

  5. The result placed in the receiver data buffer is hex 0007.

Table 5. Type C map layout

Hex Input Value Offset Control Map Value
Ward Control Block
00 0000 0000 or 0000
01 0002 0000 or 0000
02 0004 0200 or 0001
03 0006 0400 or 0002
04 0008 0000 or 0000
05 000A 0000 or 0000
· · ·
FE 01FC 0000 or 0000
FF 01FE 0000 or 0000
Ward Detail (for Ward 02)
00 0200 0000
01 0202 0001
02 0204 0002
03 0206 0003
04 0208 0004
05 020A 0005
06 020C 0006
07 020E 0007
· · ·
FE 03FC 3F3F
FF 03FE 3F3F
Ward Detail (for Ward 03)
00 0400 0200
01 0402 0201
02 0404 0202
03 0406 0203
04 0408 0204
05 040A 0205
06 040C 0206
· · ·
FE 05FC 3F3F
FF 05FE 3F3F
Note: Offset values are from the start of the type C control map space. The output value is the actual map value at the specified offset.
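The following C sketch models a single type C lookup, including the rule that maps larger than 64K bytes store ward offsets in units of 512 bytes. The names are hypothetical. The small example map reproduces the entries shown in Table 5, with unlisted detail entries assumed to be hex 3F3F. A zero ward control block entry is reported to the caller rather than resolved, because substitution and ward transparency depend on the template settings.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t be16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

/* Byte offset for a ward control block entry: entries in maps larger than
 * 64K bytes count 512-byte units; otherwise they are byte offsets. */
static size_t ward_offset(uint16_t entry, size_t map_size)
{
    return map_size > 0x10000 ? (size_t)entry * 512 : (size_t)entry;
}

/* Looks up one 2-byte input code. Returns 1 and stores the 2-byte result
 * in out if the ward exists, or 0 if the ward control block entry is zero. */
static int type_c_lookup(const uint8_t *map, size_t map_size,
                         uint8_t first, uint8_t second, uint8_t out[2])
{
    uint16_t entry = be16(map + 2 * (size_t)first);
    if (entry == 0)
        return 0;
    const uint8_t *detail = map + ward_offset(entry, map_size) + 2 * (size_t)second;
    out[0] = detail[0];
    out[1] = detail[1];
    return 1;
}

/* Hypothetical small (under 64K) map reproducing Table 5. */
static void build_example_type_c_map(uint8_t map[0x600])
{
    for (size_t i = 0; i < 0x600; i++)
        map[i] = (i < 0x200) ? 0x00 : 0x3F;   /* empty wards; other entries 3F3F */
    map[0x004] = 0x02; map[0x005] = 0x00;      /* entry 02 -> offset hex 0200 */
    map[0x006] = 0x04; map[0x007] = 0x00;      /* entry 03 -> offset hex 0400 */
    for (size_t b = 0; b <= 7; b++) {          /* ward 02 entries 00-07: hex 0000-0007 */
        map[0x200 + 2 * b] = 0x00;
        map[0x200 + 2 * b + 1] = (uint8_t)b;
    }
    for (size_t b = 0; b <= 6; b++) {          /* ward 03 entries 00-06: hex 0200-0206 */
        map[0x400 + 2 * b] = 0x02;
        map[0x400 + 2 * b + 1] = (uint8_t)b;
    }
}
```

With the small map, the input hex 0207 yields hex 0007, as in the steps above. For a map larger than 64K bytes, the entry hex 0001 resolves to byte offset 512 (hex 0200), so the same ward is found.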

Type D -- A 4-byte selection code that indicates which predefined control map is to be used. The selection code is addressed by the control map space pointer. Type D maps are only supported for functions 0001, 0002, 0003, 0004, 0005, 0006, 000B, 000C, 0033, 0034, 0035, 0036, 0037, and 0038. If zeros are encountered in the ward control block and the function is 0006, the specified single-byte or multiple-byte substitution value is placed into the receiver buffer.

If zeros are encountered and the function is 0005, the value output will be one of the following:

  • If the ward transparency field is binary 0, the appropriate multiple-byte substitution value is output, the next input byte is skipped, and conversion continues with the input byte that follows it.

  • If the ward transparency field is binary 1, the two source characters are moved transparently to the receiver buffer, with no conversion taking place. Conversion continues with the next input byte.
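In both cases the current input byte and the one after it are consumed together, so conversion resumes two bytes later. The sketch below models only that function 0005 rule. The function and parameter names are hypothetical, the substitution value is passed in rather than read from the translation template, and "the two source characters" is read as the two bytes of the current input pair, which is an assumption.

```python
from typing import Tuple


def unmapped_pair_0005(data: bytes, pos: int, ward_transparency: int,
                       mb_substitution: bytes) -> Tuple[bytes, int]:
    """Model the function 0005 rule for a zero ward control block entry.

    Returns (bytes for the receiver buffer, index of the next input byte to convert).
    """
    if ward_transparency == 1:
        # Binary 1: move the two source bytes to the receiver with no conversion.
        return data[pos:pos + 2], pos + 2
    # Binary 0: output the multiple-byte substitution value and skip the next input byte.
    return mb_substitution, pos + 2


print(unmapped_pair_0005(b"ABC", 0, 0, b"\xfe\xfe"))  # (b'\xfe\xfe', 2)
print(unmapped_pair_0005(b"ABC", 0, 1, b"\xfe\xfe"))  # (b'AB', 2)
```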

The type D control map selection code has the following format:

Offset (Dec Hex) Field Name Data Type and Length
0 0 Selection code UBin(4)
4 4 --- End ---

The predefined control maps are defined in Table 6.

Table 6. Supported predefined control maps

Selection Code Language/Country Supported UCS2 CCSID Value Other CCSID Value
Hex 34B00025 USA/Canada 13488 37
Hex 34B00100 Netherlands 13488 256
Hex 34B00111 Germany F.R./Austria 13488 273
Hex 34B00115 Denmark, Norway 13488 277
Hex 34B00116 Finland, Sweden 13488 278
Hex 34B00118 Italy 13488 280
Hex 34B0011C Spain/Latin America 13488 284
Hex 34B0011D United Kingdom 13488 285
Hex 34B00122 Japanese Katakana 13488 290
Hex 34B00129 France 13488 297
Hex 34B0012C Japanese Latin pure 13488 300
Hex 34B001A4 Arabic (all presentation shapes) string type 4 13488 420
Hex 34B001A7 Greece 13488 423
Hex 34B001A8 Hebrew string type 4 13488 424
Hex 34B001F4 Latin 1 13488 500
Hex 34B00341 Korean extended 13488 833
Hex 34B00342 Korean pure 13488 834
Hex 34B00343 Traditional Chinese pure 13488 835
Hex 34B00344 Simplified Chinese extended 13488 836
Hex 34B00345 Simplified Chinese pure 13488 837
Hex 34B00346 Thai extended 13488 838
Hex 34B00366 Latin 2 Multilingual 13488 870
Hex 34B00367 Iceland 13488 871
Hex 34B0036B Greece 13488 875
Hex 34B00370 Cyrillic, Multilingual 13488 880
Hex 34B00389 Turkey Latin 3 Multilingual 13488 905
Hex 34B00396 Urdu 13488 918
Hex 34B0039C Latin 9 13488 924
Hex 34B003A2 Japanese Katakana mixed extended 13488 930
Hex 34B003A5 Korean mixed extended 13488 933
Hex 34B003A7 Simplified Chinese mixed extended 13488 935
Hex 34B003A9 Traditional Chinese mixed extended 13488 937
Hex 34B003AB Japanese Latin mixed extended 13488 939
Hex 34B00401 Cyrillic, Multilingual 13488 1025
Hex 34B00402 Turkey Latin 5 13488 1026
Hex 34B00403 Japanese Latin extended 13488 1027
Hex 34B00449 Farsi 13488 1097
Hex 34B00458 Baltic, Multilingual 13488 1112
Hex 34B00462 Estonian 13488 1122
Hex 34B00463 Cyrillic Ukraine 13488 1123
Hex 34B0046A Vietnamese 13488 1130
Hex 34B0046C Lao 13488 1132
Hex 34B00474 USA/Canada with Euro 13488 1140
Hex 34B00475 Germany F.R./Austria with Euro 13488 1141
Hex 34B00476 Denmark, Norway with Euro 13488 1142
Hex 34B00477 Finland, Sweden with Euro 13488 1143
Hex 34B00478 Italy with Euro 13488 1144
Hex 34B00479 Spain/Latin America with Euro 13488 1145
Hex 34B0047A United Kingdom with Euro 13488 1146
Hex 34B0047B France with Euro 13488 1147
Hex 34B0047C Common Europe with Euro 13488 1148
Hex 34B0047D Iceland with Euro 13488 1149
Hex 34B00481 Latin 2 Multilingual with Euro 13488 1153
Hex 34B00482 Cyrillic, Multilingual with Euro 13488 1154
Hex 34B00483 Turkey Latin 5 with Euro 13488 1155
Hex 34B00484 Baltic, Multilingual with Euro 13488 1156
Hex 34B00485 Estonia with Euro 13488 1157
Hex 34B00486 Cyrillic Ukraine with Euro 13488 1158
Hex 34B00488 Thai with Euro 13488 1160
Hex 34B0048C Vietnamese with Euro 13488 1164
Hex 34B00554 Korean mixed extended with Hangul 13488 1364
Hex 34B0056C Simplified Chinese for GBK mixed 13488 1388
Hex 34B00570 Simplified Chinese for GB18030 13488 1392
Hex 34B00577 Japanese Latin mixed with 4370 UDC and Euro 13488 1399
Hex 34B0112C Japanese pure 13488 4396
Hex 34B01342 Korean pure extended with Hangul 13488 4930
Hex 34B01345 Simplified Chinese for GBK pure 13488 4933
Hex 34B013A2 Japanese Katakana mixed extended 13488 5026
Hex 34B013AB Japanese Latin mixed extended 13488 5035
Hex 34B01403 Japanese Latin with Euro 13488 5123
Hex 34B021A4 Arabic (base shapes only) string type 5 13488 8612
Hex 34B02346 Thai extended 13488 9030
Hex 34B031A4 Arabic string type 7 13488 12708
Hex 34B03341 Korean extended 13488 13121
Hex 34B03344 Simplified Chinese for GBK extended 13488 13124
Hex 34B0412C Japanese Latin pure with 4370 UDC and Euro 13488 16684
Hex 34B07025 Traditional Chinese extended 13488 28709
Hex 34B0F303 Hebrew string type 5 13488 62211
Hex 34B0F310 Arabic (all presentation shapes) string type 6 13488 62224
Hex 34B0F31B Hebrew string type 6 13488 62235
Hex 34B0F325 Hebrew string type 10 13488 62245
Hex 0000012C Japanese Latin pure 61952 300
Hex 00000342 Korean pure 61952 834
Hex 00000343 Traditional Chinese pure 61952 835
Hex 00000345 Simplified Chinese pure 61952 837
Hex 000003A2 Japanese Katakana mixed extended 61952 930
Hex 000003A5 Korean mixed extended 61952 933
Hex 000003A7 Simplified Chinese mixed extended 61952 935
Hex 000003A9 Traditional Chinese mixed extended 61952 937
Hex 000003AB Japanese Latin mixed extended 61952 939
Hex 00000554 Korean mixed extended with Hangul 61952 1364
Hex 0000056C Simplified Chinese for GBK mixed 61952 1388
Hex 00000577 Japanese Latin mixed with 4370 UDC and Euro 61952 1399
Hex 00001342 Korean pure extended with Hangul 61952 4930
Hex 00001345 Simplified Chinese for GBK pure 61952 4933
Hex 000013A2 Japanese Katakana mixed extended 61952 5026
Hex 000013AB Japanese Latin mixed extended 61952 5035
Hex 0000412C Japanese Latin pure with 4370 UDC and Euro 61952 16684
Note: See International Application Development, SC41-4603, for information on CCSIDs.
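Every selection code in the 13488 rows of Table 6 carries the UCS-2 CCSID 13488 (hex 34B0) in its high-order two bytes and the other CCSID in its low-order two bytes; the 61952 rows carry zeros in the high-order half instead. The sketch below composes a code from that observed pattern and stores it as a big-endian UBin(4). The pattern is read off the table, not stated as a rule, and the helper names are hypothetical. A composed code is valid only if it appears in Table 6; any other value produces the template value invalid exception described next.

```python
import struct

# High-order halves observed in Table 6 for each UCS-2 CCSID column value.
HIGH_HALF = {13488: 0x34B0, 61952: 0x0000}


def selection_code(ucs2_ccsid: int, other_ccsid: int) -> int:
    """Compose a type D selection code from the pattern visible in Table 6."""
    if not 0 <= other_ccsid <= 0xFFFF:
        raise ValueError("CCSID does not fit in the low-order two bytes")
    return (HIGH_HALF[ucs2_ccsid] << 16) | other_ccsid


def as_ubin4(code: int) -> bytes:
    """Store the code as a big-endian UBin(4), the machine's default byte order."""
    return struct.pack(">I", code)


print(f"{selection_code(13488, 37):08X}")           # 34B00025
print(f"{selection_code(61952, 939):08X}")          # 000003AB
print(as_ubin4(selection_code(13488, 1140)).hex())  # 34b00474
```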

If the selection code field does not contain one of the supported values, a template value invalid (hex 3801) exception is signaled.

The string types are defined in Table 7.

Table 7. String type definitions

String Type  Text Type  Numeric Shaping  Orientation     Text Shaping        Symmetrical Swapping
4            Visual     Passthrough      LTR             Shaped              Off
5            Implicit   Arabic           LTR             Unshaped            On
6            Implicit   Arabic           RTL             Unshaped            On
7(*)         Visual     Passthrough      Contextual *    Unshaped-Ligatures  Off
8            Visual     Passthrough      RTL             Shaped              Off
9            Visual     Passthrough      RTL             Shaped              On
10           Implicit                    Contextual LTR                      On
11           Implicit                    Contextual RTL                      On
12           Implicit   Arabic           RTL             Shaped              Off
Note: (*) Field orientation is left to right (LTR) when the first alphabetic character is Latin, and right to left (RTL) when it is a bi-di (RTL) character. Characters are unshaped, but LamAlef ligatures are kept rather than broken into their constituents.

Type E -- Used to map UCS-2 Level 1 multiple-byte data to its weights. The type E control map has two formats.

The first format, type E-1, has the following layout. Type E-1 maps are supported only for function 0100.

Offset (Dec Hex) Field Name Data Type and Length
0 0 Cache descriptor Char(128)
128 80 Number of entries UBin(4)
132 84 Sort details [256] Char(4)
132 84   UCS code point UBin(2)
134 86   UCS code point weight UBin(2)
1156 484 --- End ---

The type E-1 control map cache descriptor field is reserved for use by XLATEMB. Do not modify these bytes; they are controlled by XLATEMB.

The type E-1 control map number of entries field contains the number of entries in the sort details array. This field must have a value between 1 and 65536, which indicates that this is a type E-1 control map. A value of 0 indicates that this is a type E-2 control map.

The type E-1 control map sort details array consists of a series of 4-byte entries. Each entry is made up of a 2-byte UCS code point and a 2-byte weight for that code point. The MI user must ensure that the code points are in ascending hex order and that the weights start at 0000. Code points that are not included in the control map are weighted after the last specified code point.
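The layout above can be built byte by byte. The sketch below is a hypothetical helper, not an IBM API, and makes three labeled assumptions: "the weights start at 0000" is read as "the lowest weight is 0000" (in Table 8 the first entry has weight 0001 while code point 0041 has 0000); the cache descriptor is zero-filled, as Table 8 shows; and only as many sort detail entries are written as the number of entries field states, rather than padding to the 256-entry array shown in the layout.

```python
import struct
from typing import List, Tuple

CACHE_DESCRIPTOR_LEN = 128  # reserved for XLATEMB; zero-filled here as in Table 8


def build_e1_map(entries: List[Tuple[int, int]]) -> bytes:
    """Lay out a type E-1 control map from (UCS code point, weight) pairs."""
    if not 1 <= len(entries) <= 65536:
        raise ValueError("number of entries must be between 1 and 65536")
    points = [cp for cp, _ in entries]
    if any(a >= b for a, b in zip(points, points[1:])):
        raise ValueError("code points must be in ascending hex order")
    if min(weight for _, weight in entries) != 0:
        raise ValueError("weights must start at 0000")
    out = bytearray(CACHE_DESCRIPTOR_LEN)   # offset 0: cache descriptor
    out += struct.pack(">I", len(entries))  # offset hex 80: number of entries
    for cp, weight in entries:              # offset hex 84: sort details
        out += struct.pack(">HH", cp, weight)
    return bytes(out)


TABLE8 = [(0x0024, 0x0001), (0x0041, 0x0000), (0x0042, 0x0003), (0x0043, 0x0002),
          (0x0044, 0x0002), (0xFFE1, 0x0004), (0xFFE2, 0x0004)]
e1 = build_e1_map(TABLE8)
print(e1[0x80:0x84].hex(), e1[0x84:0x88].hex(), e1[0x9C:0xA0].hex())
# 00000007 00240001 ffe20004
```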

Table 8 shows a partial layout of a type E-1 control map with offsets and output weights included for clarity.

Table 8. Type E-1 map layout

Offset (Hex) Field Value (Hex)
Cache descriptor for the 1st 128 bytes
0000 000...000
Number of entries
0080 00000007
Sort details
Offset (Hex) UCS Code Point Output Weight
0084 0024 0001
0088 0041 0000
008C 0042 0003
0090 0043 0002
0094 0044 0002
0098 FFE1 0004
009C FFE2 0004
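Table 8 shows that weights need not follow code point order (0041 sorts before 0024) and that code points can share a weight (0043 and 0044 both have weight 0002). The sketch below uses the Table 8 sort details as a Python sort key. It does not reproduce the receiver format of function 0100; the helper name is hypothetical, and placing unlisted code points after every listed weight, ordered among themselves by code point, is an assumed reading of "weighted after the last specified code point".

```python
from bisect import bisect_left
from typing import Callable, List, Sequence, Tuple

# Sort details from Table 8 as (UCS code point, weight), in ascending code point order.
TABLE8 = [(0x0024, 0x0001), (0x0041, 0x0000), (0x0042, 0x0003), (0x0043, 0x0002),
          (0x0044, 0x0002), (0xFFE1, 0x0004), (0xFFE2, 0x0004)]


def e1_sort_key(details: Sequence[Tuple[int, int]]) -> Callable[[str], List[Tuple[int, int]]]:
    """Build a key function that replaces each character by its E-1 weight."""
    points = [cp for cp, _ in details]
    weights = [w for _, w in details]

    def key(text: str) -> List[Tuple[int, int]]:
        out = []
        for ch in text:
            cp = ord(ch)
            i = bisect_left(points, cp)  # binary search works because code points ascend
            if i < len(points) and points[i] == cp:
                out.append((0, weights[i]))
            else:
                # Unlisted: after every listed weight (ordering by code point is an assumption).
                out.append((1, cp))
        return out

    return key


print(sorted(["B", "$", "A", "D", "C"], key=e1_sort_key(TABLE8)))
# ['A', '$', 'D', 'C', 'B']
```

Because Python's sort is stable, "D" and "C" keep their input order: they share weight 0002.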

The second format, type E-2, contains a 4-byte selection code that indicates which predefined control map is to be used. The control map space pointer addresses the map, and the selection code is at offset 132 (hex 84) within it. Type E-2 maps are supported only for function 0100. Type E-2 has the following format:


Offset (Dec Hex) Field Name Data Type and Length
0 0 Cache descriptor Char(128)
128 80 E-2 map indicator UBin(4)
132 84 Selection code Char(4)
136 88 --- End ---

The type E-2 control map cache descriptor field is reserved for use by XLATEMB. Do not modify these bytes; they are controlled by XLATEMB.

The type E-2 control map E-2 map indicator field must contain zero to indicate that this is a type E-2 control map. A nonzero value indicates a type E-1 control map.

The type E-2 control map selection code field specifies the 4-byte selection code of a predefined control map.

Table 9 shows an example type E-2 control map, with offsets included for clarity.

Table 9. Type E-2 map layout

Offset (Hex) Field value (Hex)
Cache descriptor for the 1st 128 bytes
0000 000...000
E-2 map indicator
0080 00000000
Selection code
0084 03A50000
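Since the UBin(4) at offset 128 (hex 80) is the number of entries in an E-1 map and the E-2 map indicator in an E-2 map, that one field tells the two formats apart. The sketch below builds the Table 9 map and classifies it. The helper names are hypothetical, the cache descriptor is zero-filled as Table 9 shows, and raising `ValueError` for an out-of-range count stands in for whatever exception XLATEMB would actually signal.

```python
import struct

CACHE_DESCRIPTOR_LEN = 128  # reserved for XLATEMB; zero-filled here as in Table 9


def build_e2_map(selection_code: bytes) -> bytes:
    """Lay out a 136-byte type E-2 control map around a 4-byte selection code."""
    if len(selection_code) != 4:
        raise ValueError("selection code must be exactly 4 bytes")
    return bytes(CACHE_DESCRIPTOR_LEN) + struct.pack(">I", 0) + selection_code


def type_e_format(cmap: bytes) -> str:
    """Tell E-1 from E-2 by the UBin(4) at offset 128 (hex 80)."""
    (value,) = struct.unpack_from(">I", cmap, CACHE_DESCRIPTOR_LEN)
    if value == 0:
        return "E-2"
    if value <= 65536:
        return "E-1"
    raise ValueError("number of entries must be between 1 and 65536")


e2 = build_e2_map(bytes.fromhex("03A50000"))
print(len(e2), e2[0x84:0x88].hex(), type_e_format(e2))  # 136 03a50000 E-2
```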

The predefined control maps are defined in Table 10.

Table 10. Supported predefined control maps

Selection Code Language Supported
Hex 03A50000 Korean Unique Sequence
Hex 03A50001 Korean Shared Sequence
Hex 03A70000 Simplified Chinese Unique Sequence
Hex 03A70001 Simplified Chinese Shared Sequence
Hex 03A90000 Traditional Chinese Unique Sequence
Hex 03A90001 Traditional Chinese Shared Sequence
Hex 13A20000 Japanese Unique Sequence
Hex 13A20001 Japanese Shared Sequence

If the selection code field does not contain one of the supported values, a template value invalid (hex 3801) exception is signaled.

Template Value Invalid exception reason codes

This instruction supports setting the optional reason code field in the exception data, which can be retrieved when the template value invalid (hex 3801) exception is signaled. The template value invalid reason codes are defined as follows:


Reason Code Description
Hex 0001
Template value is not valid. To locate the template field in error, take the offset stored in the template offset information of the exception data for the template value invalid (hex 3801) exception and apply it from the start of the operand 1 translation template; the result is the start of the field in error.
Hex 0002
Unsupported function selected. No conversion will occur.
Hex 0004
The specified type D control map is not supported. No conversion will occur.
Hex 0005
The source range field was specified incorrectly for one of the following conditions:

  • Range 1 is set to nulls.

  • The upper limit is less than the lower limit for either range 1 or range 2.
Note: Reason code hex 0005 can only occur when function 0007 is selected.
Hex 0006
The specified type E control map is invalid. No conversion will occur.

XLATEMB Examples

Example 1: Convert hex 0B05 using XLATEMB function 0001 (convert from SBCS to UCS-2 Level 1). This example uses the type A control map in Table 3. To find the result:

  1. Use the first byte of the input data, hex 0B, to index into the type A map. Each map entry is 2 bytes, so the entry is at offset hex 0B * 2 = hex 0016.

  2. At offset hex 0016, the corresponding UCS-2 Level 1 value is hex 010B.

  3. Use the second byte of the input data, hex 05, to index into the type A map in the same way (offset hex 05 * 2 = hex 000A).

  4. At offset hex 000A, the corresponding UCS-2 Level 1 value is hex 0105.

  5. The instruction completes with hex 010B0105 placed in the receiver and hex 0004 placed in the receiver converted data length.
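Example 1 can be reproduced with a short sketch. The helper name is hypothetical, and since Table 3 is not repeated here, the demo map is a hypothetical 512-byte buffer that holds only the two entries the example uses; every other entry is zero.

```python
from typing import Tuple


def convert_type_a(cmap: bytes, data: bytes) -> Tuple[bytes, int]:
    """Convert SBCS bytes with a type A map; returns (receiver bytes, converted length)."""
    out = bytearray()
    for b in data:
        out += cmap[b * 2:b * 2 + 2]  # each input byte selects a 2-byte entry
    return bytes(out), len(out)


# Hypothetical 512-byte map holding only the two Table 3 entries used in Example 1.
demo = bytearray(512)
demo[0x000A:0x000C] = b"\x01\x05"  # input hex 05 -> hex 0105
demo[0x0016:0x0018] = b"\x01\x0B"  # input hex 0B -> hex 010B

receiver, length = convert_type_a(bytes(demo), bytes.fromhex("0B05"))
print(receiver.hex().upper(), f"{length:04X}")  # 010B0105 0004
```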

Example 2: Convert hex 03B1009A using function 0002 (convert from UCS-2 Level 1 to SBCS). This example uses the verification map in Table 2 and type B map in Table 4. To find the result:

  1. The first UCS-2 Level 1 input value, hex 03B1, is compared against the verification map. Since the value is found at offset hex 0020 in the verification map, processing will continue.

  2. Use the first byte of the input value, hex 03, to index into the type B control map ward control block, which starts at offset hex 0000. Each ward control block entry is 2 bytes.

  3. At offset hex 0006, the ward control block entry value is hex 0300.

  4. Use hex 0300 to offset from the start of the control map to the start of the ward detail for ward 03.

  5. Use the second input byte, hex B1, to index into the ward detail for ward 03.

  6. At offset hex 03B1, the corresponding SBCS value hex 8A is output.

  7. The second UCS-2 Level 1 input value, hex 009A, is compared against the verification map. Since the value is found at offset hex 0002 in the verification map, processing continues.

  8. Use the first byte of the second input value, hex 00, to index into the type B control map ward control block, which starts at offset hex 0000.

  9. At offset hex 0000, the ward control block entry value is hex 0200.

  10. Use hex 0200 to offset from the start of the control map to the start of the ward detail for ward 00.

  11. Use the second input byte, hex 9A, to index into the ward detail for ward 00.

  12. At offset hex 029A, the corresponding SBCS value hex 3A is output.

  13. The final output of hex 8A3A is placed in the receiver and a hex 0002 will be placed in the receiver converted data length.
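Example 2 can be sketched the same way. Two assumptions are labeled here and in the code: the verification map (Table 2, not repeated here) is modeled as a plain set of valid code points, and a value missing from it simply raises an error, because this section does not describe what XLATEMB does in that case. The demo map is a hypothetical buffer holding only the Table 4 entries the example uses. Note that type B ward detail entries are 1 byte, so the second input byte is added to the ward offset without scaling.

```python
import struct
from typing import AbstractSet, Tuple


def convert_type_b(cmap: bytes, valid: AbstractSet[int], data: bytes) -> Tuple[bytes, int]:
    """Convert UCS-2 Level 1 values to SBCS with a verification check and a type B map."""
    out = bytearray()
    for pos in range(0, len(data), 2):
        (value,) = struct.unpack_from(">H", data, pos)
        if value not in valid:
            # Assumption: XLATEMB's handling is not modeled; this sketch just stops.
            raise ValueError(f"hex {value:04X} is not in the verification map")
        first, second = value >> 8, value & 0xFF
        (ward_offset,) = struct.unpack_from(">H", cmap, first * 2)  # ward control block
        out.append(cmap[ward_offset + second])  # ward detail entries are 1 byte
    return bytes(out), len(out)


# Hypothetical map holding only the entries used in Example 2 (from Table 4).
demo = bytearray(0x400)
struct.pack_into(">H", demo, 0x0000, 0x0200)  # ward 00 detail starts at hex 0200
struct.pack_into(">H", demo, 0x0006, 0x0300)  # ward 03 detail starts at hex 0300
demo[0x029A] = 0x3A
demo[0x03B1] = 0x8A

receiver, length = convert_type_b(bytes(demo), {0x03B1, 0x009A}, bytes.fromhex("03B1009A"))
print(receiver.hex().upper(), f"{length:04X}")  # 8A3A 0002
```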

Authorization Required

Lock Enforcement

Exceptions

06 Addressing

0A Authorization

0C Computation

10 Damage Encountered

1C Machine-Dependent

20 Machine Support

22 Object Access

24 Pointer Specification

32 Scalar Specification

38 Template Specification

44 Protection Violation