An introduction to the Thai language

Collation

In Thai, sorting by a character's encoded value produces incorrect results. The standard Thai collation rules follow the Thai Royal Institute Dictionary, 2525 B.E. edition, the official standard Thai dictionary.

Thai collation results
Table 3: Collation results from the code point value approach compared with the Thai Royal Institute Dictionary approach.

Collation rules
Words are ordered alphabetically, not phonetically. The consonant weights are:

Thai alphabetical collation

Vowels are also ordered by written form, not by sound. The vowel weights are:

Thai vowel weights

Tonal marks and diacritics are ignored at the primary level. If two words are identical at the primary level, their tonal marks and diacritics are compared at the secondary level. The tonal mark and diacritic weights are:

Thai tonal marks

Thai punctuation marks are less significant than tonal marks and diacritics. They must be ignored at the primary and secondary levels.
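The level structure described above can be sketched in Python as a tuple-valued sort key. This is a minimal illustration, not the dictionary's algorithm: the function name `collation_key` is hypothetical, the secondary set is assumed to be U+0E47 through U+0E4C, the punctuation set is an assumed sample, and code point order stands in for the weight tables, which are not reproduced here. Placing punctuation at a third level is also an assumption, and the sketch does not handle leading vowels.

```python
# Assumed secondary-level characters: maitaikhu, the four tonal marks,
# and thanthakhat (U+0E47..U+0E4C). Code point order stands in for
# the real weights.
SECONDARY = {chr(cp) for cp in range(0x0E47, 0x0E4D)}

# Assumed sample of Thai punctuation: paiyannoi, maiyamok, fongman,
# angkhankhu, khomut.
PUNCTUATION = {"\u0E2F", "\u0E46", "\u0E4F", "\u0E5A", "\u0E5B"}


def collation_key(word):
    """Return a (primary, secondary, tertiary) key for sorting.

    Primary keeps consonants and vowels only, secondary keeps tonal
    marks and diacritics, tertiary keeps punctuation.
    """
    primary, secondary, tertiary = [], [], []
    for ch in word:
        if ch in PUNCTUATION:
            tertiary.append(ch)
        elif ch in SECONDARY:
            secondary.append(ch)
        else:
            primary.append(ch)
    return ("".join(primary), "".join(secondary), "".join(tertiary))


words = [
    "\u0E1B\u0E35",        # ปี  (po pla + sara ii)
    "\u0E1B\u0E49\u0E32",  # ป้า (po pla + mai tho + sara aa)
    "\u0E1B\u0E48\u0E32",  # ป่า (po pla + mai ek + sara aa)
    "\u0E1B\u0E32",        # ปา  (po pla + sara aa)
]

by_code_point = sorted(words)
by_levels = sorted(words, key=collation_key)
```

Sorting by code point places ปี ahead of ป่า and ป้า, because the tonal marks have higher code points than the vowel. With the level-based key, the three words that share the primary string ปา stay together and are separated only by their tonal marks.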

Rearrangement
Leading vowels (U+0E40 through U+0E44) are normally written before the initial consonant. A Thai collation implementation must treat a leading vowel as if it came after the initial consonant, by swapping the leading vowel and the consonant before comparing strings.

For example:

Thai tonal marks
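The rearrangement step can also be sketched in code. The following is a minimal Python illustration under stated assumptions: the helper names `is_thai_consonant` and `rearrange` are hypothetical, consonants are taken to be the range U+0E01 through U+0E2E, and plain code point comparison stands in for the consonant and vowel weight tables.

```python
# Leading vowels U+0E40..U+0E44, as given in the text.
LEADING_VOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)}


def is_thai_consonant(ch):
    # Assumption: treat U+0E01 (ko kai) through U+0E2E (ho nokhuk)
    # as the consonant range.
    return "\u0E01" <= ch <= "\u0E2E"


def rearrange(text):
    """Swap each leading vowel with the consonant that follows it."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in LEADING_VOWELS and is_thai_consonant(chars[i + 1]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return "".join(chars)


words = [
    "\u0E02",        # ข  (kho khai)
    "\u0E40\u0E01",  # เก (sara e + ko kai)
]

by_code_point = sorted(words)
by_rearranged = sorted(words, key=rearrange)
```

By code point, เก sorts after ข because the leading vowel U+0E40 is greater than every consonant. After rearrangement the comparison starts with ก, so เก sorts before ข, as the initial consonant order requires.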