IT17612: MQ-JMS: An unexpected byte-order-mark character is visible in messages decoded from CCSID 17584
Closed as program error.
A WebSphere MQ classes for JMS V220.127.116.11 application consumes a message from a queue. The message was generated and put to the queue by a Siebel application, with the following declared character encoding configuration:

MQMD Format: MQSTR (MQFMT_STRING)
MQMD CodedCharSetId: 17584
MQMD Encoding: 546 (0x222)

The body of the message consists of XML character data. When the receiving application consumes the message and passes its character content to an XML parser, the XML parser throws a parsing error. The application had previously been using the WebSphere MQ classes for JMS V18.104.22.168, where the problem was not seen and the XML parser processed the message successfully.

Examining the start of the message body on the queue, before it was consumed by the JMS application, showed bytes of the form:

0 1 2 3  4 5 6 7  8 9 A B  C D E F
fffe3c00 3f007800 6d006c00 20007600 : ..<.?.x.m.l. .v.
65007200 73006900 6f006e00 3d002200 : e.r.s.i.o.n.=.".
31002e00 30002200 20006500 6e006300 : 1...0.". .e.n.c.
6f006400 69006e00 67003d00 22005500 : o.d.i.n.g.=.".U.
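The dump can be checked programmatically. The following is a minimal Java sketch (the class and method names are hypothetical, and it uses only the JDK, not the IBM MQ classes) that looks for a UTF-16 byte-order-mark in the first row of the dump and confirms that the MQMD Encoding value 0x222 (decimal 546) has the reversed, little-endian integer setting. The mask values 0x0F and 0x02 correspond to the MQ constants MQENC_INTEGER_MASK and MQENC_INTEGER_REVERSED; they are hard-coded here as an assumption rather than taken from the MQ classes.

```java
public class BomCheck {
    // Values of MQENC_INTEGER_MASK and MQENC_INTEGER_REVERSED, hard-coded for this sketch.
    static final int MQENC_INTEGER_MASK = 0x0F;
    static final int MQENC_INTEGER_REVERSED = 0x02;

    /** Converts a hex string such as "fffe3c00" into raw bytes. */
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Describes any UTF-16 byte-order-mark found in the first two bytes of the body. */
    static String describeBom(byte[] body) {
        if (body.length >= 2) {
            int b0 = body[0] & 0xFF;
            int b1 = body[1] & 0xFF;
            if (b0 == 0xFF && b1 == 0xFE) return "UTF-16 little-endian BOM";
            if (b0 == 0xFE && b1 == 0xFF) return "UTF-16 big-endian BOM";
        }
        return "no BOM";
    }

    /** True when the MQMD Encoding declares reversed (little-endian) integers. */
    static boolean integersReversed(int mqmdEncoding) {
        return (mqmdEncoding & MQENC_INTEGER_MASK) == MQENC_INTEGER_REVERSED;
    }

    public static void main(String[] args) {
        // First row of the dump in this APAR, with the spaces removed.
        byte[] body = hexToBytes("fffe3c003f0078006d006c0020007600");
        System.out.println(describeBom(body));        // UTF-16 little-endian BOM
        System.out.println(0x222);                    // 546
        System.out.println(integersReversed(0x222));  // true
    }
}
```

The byte pair 0xFF 0xFE followed by "3c00" ('<' with the low-order byte first) shows that the data on the queue is little-endian UTF-16 with a byte-order-mark, even though CCSID 17584 is defined as big-endian.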
Configure the message-producing application to generate a message body encoded in an alternative character encoding scheme, such as UTF-8 (CCSID 1208).
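The local fix depends on the UTF-8 body decoding without a stray leading character. A minimal Java sketch (class and method names are hypothetical; it does not model the Siebel producer or any MQ API) shows that a UTF-8 body written without a byte-order-mark decodes cleanly, while a body starting with the UTF-8 byte-order-mark (EF BB BF) still yields U+FEFF, because the JDK's UTF-8 decoder does not remove it. This assumes the IBM MQ classes for JMS decode CCSID 1208 with the JDK's standard UTF-8 charset, in which case the producer should also be configured not to write a UTF-8 byte-order-mark.

```java
import java.nio.charset.StandardCharsets;

public class Utf8BodyCheck {
    /** Encodes text as UTF-8 (CCSID 1208); String.getBytes never prepends a BOM. */
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Prepends the UTF-8 byte-order-mark EF BB BF, as some producers do. */
    static byte[] withUtf8Bom(byte[] body) {
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] plain = encodeUtf8("<?xml version=\"1.0\"?>");
        String decoded = new String(plain, StandardCharsets.UTF_8);
        // The JDK UTF-8 decoder keeps a leading BOM as the character U+FEFF.
        String decodedWithBom = new String(withUtf8Bom(plain), StandardCharsets.UTF_8);
        System.out.printf("no BOM:   first=U+%04X%n", (int) decoded.charAt(0));        // U+003C
        System.out.printf("with BOM: first=U+%04X%n", (int) decodedWithBom.charAt(0)); // U+FEFF
    }
}
```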
****************************************************************
USERS AFFECTED:
This issue affects users of the IBM MQ classes for JMS whose applications consume messages where the message body is declared to be of type MQSTR, with the character encoding declaration:

CCSID: 1200, 13488 or 17584
Encoding: 546 (0x222)

and where the message body contains a byte-order-mark at the start of the data.

Platforms affected:
MultiPlatform
****************************************************************
PROBLEM DESCRIPTION:
With the code change associated with MQ APAR IV40180:

http://www.ibm.com/support/docview.wss?uid=swg1IV40180

when an IBM MQ classes for JMS application consumes a message that is declared to be character encoded using CCSID 1200, 13488 or 17584, and whose Encoding field declares little-endian integer encoding (0x222), the IBM MQ classes for JMS map this character encoding scheme to the Java Charset named:

UTF-16LE

This differs from the currently defined IBM global standards (external to IBM MQ), where these CCSID values are all declared to be big-endian encoded, as per the IBM Globalization documentation:

1200: https://www.ibm.com/software/globalization/ccsid/ccsid13488.html
Name: "UTF-16 BE with IBM PUA"
"Data is big endian order"

13488: https://www.ibm.com/software/globalization/ccsid/ccsid13488.html
Name: "Unicode 2.0, UTF-16 BE with IBM PUA"
"Data is big endian order"

17584: https://www.ibm.com/software/globalization/ccsid/ccsid17584.html
Name: "Unicode 3.0, UTF-16 BE with IBM PUA"
"Data is big endian order"

When the message was viewed on the queue, prior to consumption by the JMS application, the bytes of this particular message's body were also seen to start with a byte-order-mark:

'0xFF 0xFE'

The Java Charset 'UTF-16LE' does not treat a leading byte-order-mark as a byte-order-mark: it decodes the bytes 0xFF 0xFE as the character U+FEFF (ZERO WIDTH NO-BREAK SPACE). That character was added to the start of the "java.lang.String" object returned to the application by the JMS method call:

javax.jms.TextMessage.getText()

This in turn caused the application's XML parser to fail to parse the XML document.

Prior to MQ APAR IV40180, a message's character data declared to be encoded using CCSID 17584 (with message Encoding value 0x222) was decoded using the Java Charset named:

'UnicodeLittle'

which accepts a byte-order-mark at the start of the message body. The problem with this Java Charset was that the IBM MQ classes for Java/JMS had no mapping from it back to an IBM CCSID. As a result, messages could be received and decoded using this Java Charset, but those same messages could not then be sent back to the queue manager using the IBM MQ classes for JMS API.

The code change associated with APAR IV40180 was included in the MQ versions:

22.214.171.124
126.96.36.199
188.8.131.52

which explains the change in behaviour observed when moving from an earlier version of the IBM MQ classes for JMS to one of these fix pack levels.

Because APAR IV40180 mapped CCSID 17584 to "UTF-16LE", a byte-order-mark present in the message on the queue is decoded as a character in the Java String object, which is incorrect. Note, however, that CCSID 17584 is currently officially defined as always big-endian ordered, without a byte-order-mark.

In the same scenario, when the JVM system property:

-Dcom.ibm.mq.cfg.CCSID.MapUtf16ByteOrderByCCSID=YES

was defined, all the byte ordering was reversed, resulting in corrupted character data: the IBM MQ classes for JMS mapped CCSID 17584 to CCSID 1200, and so used the big-endian Java Charset "UTF-16".
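The difference between the two Java Charsets can be reproduced with the JDK alone. The following Java sketch (the class name is hypothetical; it does not call the IBM MQ classes or reproduce their CCSID mapping) decodes the first twelve bytes of the message body from the dump, that is the byte-order-mark 0xFF 0xFE followed by "<?xml" in little-endian UTF-16. 'UTF-16LE' keeps the mark as U+FEFF, while 'UnicodeLittle', the JDK alias of the charset x-UTF-16LE-BOM, consumes it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16BomDecode {
    // Start of the message body from the APAR dump: BOM (FF FE), then "<?xml" in UTF-16LE.
    static final byte[] BODY = {
        (byte) 0xFF, (byte) 0xFE,
        0x3C, 0x00, 0x3F, 0x00, 0x78, 0x00, 0x6D, 0x00, 0x6C, 0x00
    };

    static String decode(byte[] body, Charset cs) {
        return new String(body, cs);
    }

    public static void main(String[] args) {
        // UTF-16LE decodes FF FE as U+FEFF and keeps it at the start of the String.
        String le = decode(BODY, StandardCharsets.UTF_16LE);
        // UnicodeLittle reads FF FE as a byte-order-mark and drops it.
        String leBom = decode(BODY, Charset.forName("UnicodeLittle"));
        System.out.printf("UTF-16LE:      length=%d, first=U+%04X%n", le.length(), (int) le.charAt(0));
        System.out.printf("UnicodeLittle: length=%d, first=U+%04X%n", leBom.length(), (int) leBom.charAt(0));
        // Prints length=6, first=U+FEFF, then length=5, first=U+003C.
    }
}
```

An XML parser that receives the first String sees a character before "<?xml", which matches the parsing failure described in this APAR.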
The default encoding mapping for CCSIDs:

1200
13488
17584

where the message's integer "Encoding" value specifies little-endian encoding (0x222), is now the Java Charset named:

x-UTF16LE-BOM

In addition, because of the complexity of using CCSID 1200/13488/17584 with IBM MQ, a new property has been defined that controls which Java Charset is used to decode the bytes of the message on the queue, irrespective of the integer encoding value specified on the message. This property has the name:

com.ibm.mq.cfg.CCSID.MapCcsid1200ToSpecificCharset

and can be set as a JVM argument. For example, to configure the IBM MQ classes for JMS to interpret a CCSID 1200 message's bytes using the Java 'UnicodeLittle' encoding, use the command line JVM argument:

-Dcom.ibm.mq.cfg.CCSID.MapCcsid1200ToSpecificCharset=UnicodeLittle

Note that this property has no effect when a message is sent from the IBM MQ classes for JMS back to MQ, so care is needed when using it. If you use this property and specify a Java Charset name that your running JVM recognises but that the IBM MQ classes for JMS do not, your application will be able to receive the message but not send it back to MQ, because the IBM MQ classes for JMS will not be able to map the message's declared "JMS_IBM_Character_Set" property back to a CCSID value.
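Because a charset name that the JVM does not recognise cannot work at all, it can help to check the configured value before starting the application. The following Java sketch (class and method names are hypothetical) reads the property named in this APAR and reports the JVM's canonical name for it. It only checks what the JVM recognises; it cannot tell whether the IBM MQ classes for JMS can map that charset back to a CCSID.

```java
import java.nio.charset.Charset;

public class CharsetOverrideCheck {
    // Property name as given in this APAR.
    static final String PROPERTY = "com.ibm.mq.cfg.CCSID.MapCcsid1200ToSpecificCharset";

    /**
     * Returns the JVM's canonical name for the configured charset, or null if the value
     * is unset, empty, syntactically illegal, or not supported by this JVM.
     */
    static String jvmCanonicalName(String configured) {
        if (configured == null || configured.isEmpty()) {
            return null;
        }
        try {
            return Charset.isSupported(configured) ? Charset.forName(configured).name() : null;
        } catch (IllegalArgumentException e) { // includes IllegalCharsetNameException
            return null;
        }
    }

    public static void main(String[] args) {
        String configured = System.getProperty(PROPERTY);
        String canonical = jvmCanonicalName(configured);
        if (configured == null) {
            System.out.println(PROPERTY + " is not set; the default mapping applies");
        } else if (canonical == null) {
            System.out.println("This JVM does not recognise the charset '" + configured + "'");
        } else {
            System.out.println(configured + " is recognised by this JVM as " + canonical);
        }
    }
}
```

Running it with -Dcom.ibm.mq.cfg.CCSID.MapCcsid1200ToSpecificCharset=UnicodeLittle should report the JDK's canonical name for that alias, x-UTF-16LE-BOM.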
---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version     Maintenance Level
v7.5        184.108.40.206
v8.0        220.127.116.11
v9.0 CD     9.0.5
v9.0 LTS    18.104.22.168

The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes':
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates':
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
Reported component name
WMQ BASE MULTIP
Reported component ID
NoSpecatt / Xsystem
Fixed component name
WMQ BASE MULTIP