Support for UTF-16 in application data

CICS® web services support conversion of UTF-16 encoded application data into XML or JSON and also XML or JSON into UTF-16 encoded application data. Use UTF-16 when you need to store and process data in multiple languages.

Unicode is a character encoding standard that gives systems a consistent way to represent text in multiple languages.

UTF-16 is a variable-width encoding for Unicode, where each character is represented by one or two 16-bit code units, that is, by 2 or 4 bytes. CICS web services support CCSID 1200 for application data, which is UTF-16 BE (big endian) with IBM® Private Use Area. This behavior is consistent with UTF-16 support in all supported programming languages.
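The 2-byte and 4-byte cases can be seen by encoding sample characters as big-endian UTF-16. The following Python sketch is illustrative only and is not part of CICS; the helper name utf16be_bytes is hypothetical, and Python's utf-16-be codec is used as a stand-in for the UTF-16 BE byte layout (it does not model the IBM Private Use Area assignments of CCSID 1200).

```python
# Illustrative only: inspect how UTF-16 big endian represents characters.

def utf16be_bytes(text: str) -> bytes:
    """Encode text as UTF-16 big endian, without a byte order mark."""
    return text.encode("utf-16-be")

bmp_char = "A"                 # U+0041, inside the Basic Multilingual Plane
supplementary = "\U0001F600"   # U+1F600, outside the Basic Multilingual Plane

# A BMP character needs one 16-bit code unit (2 bytes).
print(utf16be_bytes(bmp_char).hex())        # 0041

# A supplementary character needs a surrogate pair (4 bytes).
print(utf16be_bytes(supplementary).hex())   # d83dde00
```

In big-endian order the most significant byte of each code unit comes first, which is why "A" appears as 00 41 rather than 41 00.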

UTF-16 is supported at mapping level 4.0 and upwards. You can customize how application data is converted by using mapping settings in the assistants. For more information about XML mapping levels, see Mapping levels for the CICS assistants. For more information about JSON mapping levels, see Mapping levels for the CICS JSON assistants.

Note: UTF-16 requires more processing time and is less storage efficient than EBCDIC encodings. Furthermore, mixing encoding types incurs extra runtime processing.
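As a rough illustration of the storage difference, the following Python sketch compares the encoded size of the same text in an EBCDIC code page and in UTF-16. It is not CICS code. The choice of code page 037 (Python's cp037 codec) is an assumption made for the example, and the helper name encoded_sizes is hypothetical.

```python
# Illustrative only: compare byte counts for EBCDIC and UTF-16 encodings.

def encoded_sizes(text: str, ebcdic_codec: str = "cp037") -> dict:
    """Return the number of bytes the text needs in each encoding."""
    return {
        "ebcdic": len(text.encode(ebcdic_codec)),   # single-byte EBCDIC
        "utf16": len(text.encode("utf-16-be")),     # 2 or 4 bytes per character
    }

sizes = encoded_sizes("CICS web services")
print(sizes)
```

For text that a single-byte EBCDIC code page can represent, the UTF-16 form takes twice as many bytes.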

Mapping UTF-16 from XML or JSON schema to language structures

Support for UTF-16 depends on how you create the web service. Mapping XML or JSON schema to language structures, also known as top-down mapping, has the following characteristics. If UTF-16 is enabled, all text fields are mapped to UTF-16 fields, whereas numeric display data types in COBOL are mapped as EBCDIC. To use UTF-16, set the CCSID parameter of DFHJS2LS, DFHSC2LS, or DFHWS2LS to 1200.

For example, if the following XML schema fragment were present in the WSDL:

<xsd:element name="myString" nillable="false">
   <xsd:simpleType>
      <xsd:restriction base="xsd:string">
         <xsd:maxLength value="20"/>
      </xsd:restriction>
   </xsd:simpleType>
</xsd:element>

The DFHWS2LS assistant generates the following field in a COBOL language structure:

myString PIC N(20) USAGE NATIONAL

The CHAR-MULTIPLIER parameter of the web services assistants can be used to specify the length of a field the assistants generate.

CHAR-MULTIPLIER

When you use UTF-16, the only valid values for the CHAR-MULTIPLIER parameter are 2 or 4, where 2 is the default value.

CHAR-MULTIPLIER=2, where the schema describes a string of maxLength x, generates PIC N(x). Setting CHAR-MULTIPLIER=2 does not preclude the use of surrogate pairs in a UTF-16 string, but each surrogate pair occupies two positions in the field, which reduces the number of characters that fit.

CHAR-MULTIPLIER=4, for the same schema, generates PIC N(2x). With CHAR-MULTIPLIER=4, the value is padded at run time when the string includes characters that can be expressed in a single encoding unit.
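The sizing rule above can be sketched in Python to show why surrogate pairs matter. This is a minimal model, not the assistants' implementation: it assumes that each position of a PIC N field holds one 16-bit code unit, and the function names are hypothetical. It does not model what CICS does at run time when a value does not fit.

```python
# Illustrative only: model the PIC N(n) length produced for a schema string
# of maxLength x, and check whether a value fits in that field.

def national_field_length(max_length: int, char_multiplier: int = 2) -> int:
    """Return n for PIC N(n): x for CHAR-MULTIPLIER=2, 2x for CHAR-MULTIPLIER=4."""
    if char_multiplier not in (2, 4):
        raise ValueError("CHAR-MULTIPLIER must be 2 or 4 when UTF-16 is used")
    return max_length * char_multiplier // 2

def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to hold the text in UTF-16."""
    return len(text.encode("utf-16-be")) // 2

def fits(text: str, max_length: int, char_multiplier: int = 2) -> bool:
    """True if the text fits in the field generated for max_length."""
    return utf16_code_units(text) <= national_field_length(max_length, char_multiplier)

# Three characters, two of which need surrogate pairs: five code units in total.
value = "a\U0001F600\U0001F600"

print(fits(value, max_length=3, char_multiplier=2))   # False: PIC N(3) is too short
print(fits(value, max_length=3, char_multiplier=4))   # True: PIC N(6), one position unused
```

The second case also shows the padding trade-off: a string of three single-unit characters would leave three of the six positions unused.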

Mapping UTF-16 from language structures to XML or JSON schema

Mapping from a language structure to XML or JSON schema, also known as bottom-up mapping, is managed differently from top-down mapping. If a UTF-16 string is declared in the language structure, CICS interprets the data as UTF-16 encoded; otherwise, CICS assumes that the data is in an EBCDIC encoding. The CCSID parameter for DFHLS2JS, DFHLS2SC, or DFHLS2WS indicates the encoding of any EBCDIC text within the application data; it must not be set to indicate UTF-16.

The data types that are interpreted as UTF-16 characters are as follows: PIC N(n) in COBOL, WIDECHAR(n) in PL/I, and char16_t[n] in C and C++.

The CHAR-USAGE parameter of the web services assistants specifies how COBOL national data types are interpreted.

CHAR-USAGE

In COBOL, the national data type, PIC N, can be used for UTF-16 or DBCS data. Which of the two applies is controlled by the NSYMBOL compiler option. You must set the CHAR-USAGE parameter on the assistant to the same value as the NSYMBOL compiler option to ensure that the data is handled appropriately. When you use UTF-16, this value is typically CHAR-USAGE=NATIONAL.

If you want to mix national data types that contain UTF-16 and DBCS data in the same copybook, you can use the USAGE NATIONAL or USAGE DISPLAY-1 qualifiers on individual fields.

Note: DFHLS2WS, DFHLS2SC, and DFHLS2JS do not support the COBOL GROUP-USAGE NATIONAL clause.