DB2 Version 9.7 for Linux, UNIX, and Windows

Thai and Unicode collation algorithm differences

The collation algorithm used in a Thai Industrial Standard (TIS) TIS620-1 (code page 874) Thai database with the NLSCHAR collation option is similar, but not identical, to the collation algorithm used in a Unicode database with the UCA500R1_LTH collation option.

The differences are as follows:

When sorting TIS620-1 data, each character has only one weight, and that weight is compared with the weight of another character during collation. When sorting Unicode data, each character has several weights, and all of the weights of that character can be used during collation.
When sorting TIS620-1 data, the space character X'20', hyphen character X'2D', and full stop character X'2E' all have smaller weights than all the Thai characters. When sorting Unicode data, however, those three characters are treated as punctuation marks and are used for comparison only when all other characters in the two strings being compared are equal.
The Paiyannoi character X'CF' and the Maiyamok character X'E6' in a TIS620-1 database are treated as punctuation marks when they follow other Thai characters, and as normal characters, with their own weights, when they appear at the beginning of a string. The same two characters in a Unicode database (U+0E2F and U+0E46 respectively) are always treated as punctuation marks, and are used for comparison only when all other characters in the two strings being compared are equal.
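
The first two differences can be illustrated with a small Python sketch. It is a toy model only: the weights are plain code point values rather than the real NLSCHAR or UCA500R1_LTH tables, the function names single_weight_key and multi_level_key are hypothetical, and the special handling of Paiyannoi and Maiyamok is not modelled. It shows only how a hyphen changes the sort order depending on whether it has a low weight of its own or is used only as a tie-breaker.

```python
# Toy illustration only; these are NOT the DB2 collation tables.
PUNCTUATION = {" ", "-", "."}  # space X'20', hyphen X'2D', full stop X'2E'


def single_weight_key(text):
    """One weight per character; punctuation sorts below every other character."""
    return [(0, ord(ch)) if ch in PUNCTUATION else (1, ord(ch)) for ch in text]


def multi_level_key(text):
    """Punctuation is skipped at the first level and used only to break ties."""
    primary = [ord(ch) for ch in text if ch not in PUNCTUATION]
    tie_break = [ord(ch) for ch in text]  # arbitrary tie-breaker for this sketch
    return (primary, tie_break)


# Three strings built from Thai consonants; the second contains a hyphen.
words = ["กข", "ก-ค", "กก"]

# Hyphen has the smallest weight, so the hyphenated string sorts first.
print(sorted(words, key=single_weight_key))  # ['ก-ค', 'กก', 'กข']

# Hyphen is ignored at the first level, so the order follows the consonants.
print(sorted(words, key=multi_level_key))    # ['กก', 'กข', 'ก-ค']
```

In the single-weight ordering the hyphenated string moves to the front, whereas in the multi-level ordering it sorts by its consonants alone. This is the kind of difference to expect when the same data is sorted in both database types.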

More information about Thai characters can be found in the Southeast Asian Scripts chapter of The Unicode Standard book.