The collation algorithm used in a Thai Industrial Standard (TIS) TIS620-1 (code page 874) Thai database with the NLSCHAR collation option is similar to, but not identical to, the collation algorithm used in a Unicode database with the UCA500R1_LTH collation option. The differences are as follows:
- When sorting TIS620-1 data, each character has only one weight, and that weight is compared with the other character's weight during collation. When sorting Unicode data, each character has several weights, and all of those weights can be used during collation.
- When sorting TIS620-1 data, the space character X'20', the hyphen character X'2D', and the full stop character X'2E' all have smaller weights than any Thai character. When sorting Unicode data, however, those three characters are treated as punctuation marks and are used for comparison only when all other characters in the two strings being compared are equal.
- In a TIS620-1 database, the Paiyannoi character X'CF' and the Maiyamok character X'E6' are treated as punctuation marks when they follow other Thai characters, and as normal characters, with their own weights, when they appear at the beginning of a string. In a Unicode database, the same two characters (U+0E2F and U+0E46 respectively) are always treated as punctuation marks and are used for comparison only when all other characters in the two strings being compared are equal.
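
The punctuation difference described above can be sketched with a toy example. This is only an illustration under stated assumptions: the function names `single_weight_key` and `multi_level_key` are hypothetical, the weights are simply Unicode code points, and neither function reproduces the actual NLSCHAR or UCA500R1_LTH weight tables.

```python
# Toy illustration only: weights are plain code points, not the real
# NLSCHAR or UCA500R1_LTH collation tables.

PUNCTUATION = {" ", "-", "."}  # space X'20', hyphen X'2D', full stop X'2E'


def single_weight_key(s):
    """One weight per character; punctuation sorts below every letter."""
    return [(0, ord(ch)) if ch in PUNCTUATION else (1, ord(ch)) for ch in s]


def multi_level_key(s):
    """Punctuation is skipped first and consulted only to break ties."""
    primary = [ord(ch) for ch in s if ch not in PUNCTUATION]
    tie_break = single_weight_key(s)
    return (primary, tie_break)


words = ["กข", "ก-ค", "กค"]

tis_style = sorted(words, key=single_weight_key)
unicode_style = sorted(words, key=multi_level_key)

print(tis_style)      # ['ก-ค', 'กข', 'กค']
print(unicode_style)  # ['กข', 'ก-ค', 'กค']
```

In the single-weight ordering, the hyphen pulls "ก-ค" ahead of "กข" because the hyphen weighs less than any Thai letter. In the multi-level ordering, the hyphen is ignored until "ก-ค" and "กค" tie on their letters, so it affects only the relative order of those two strings.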
More information about Thai characters can be found in the Southeast Asian Scripts chapter of The Unicode Standard.