An introduction to the Thai language

Word break boundary

Traditionally, Thai text is written without separators such as spaces between words. In general, a separator such as a space appears only at the end of a sentence.

Figure 6: Example of Thai text

Thai word break boundaries cannot be found by rules alone. Implementations do use linguistic rules, but those rules are complicated and have many exceptions, so a purely rules-based approach cannot break Thai words accurately. To improve accuracy, a dictionary-based approach is combined with the rules. The dictionary-based approach matches sequences of characters against entries in a dictionary. The accuracy of the resulting word breaks therefore depends on both the rules-based algorithm and the number of Thai words in the dictionary. Even so, the combined approach cannot guarantee 100% accuracy.
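To make the dictionary-based part of this approach concrete, the following Java sketch segments text by greedy longest matching. At each position it takes the longest dictionary entry that matches. When nothing matches, it falls back to a single character plus any following combining marks, a crude stand-in for real rules. This is a simple textbook technique, not the combined rules-and-dictionary algorithm described above. The class name LongestMatchSegmenter and the three-word dictionary are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: greedy longest-match segmentation against a dictionary.
public class LongestMatchSegmenter {

    private final Set<String> dictionary;
    private final int maxWordLength;

    public LongestMatchSegmenter(Set<String> dictionary) {
        this.dictionary = dictionary;
        int max = 1;
        for (String w : dictionary) {
            max = Math.max(max, w.length());
        }
        this.maxWordLength = max;
    }

    /** Splits text left to right, always taking the longest dictionary match. */
    public List<String> segment(String text) {
        List<String> words = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = -1;
            // Try the longest candidate first and shrink until a dictionary hit.
            int limit = Math.min(text.length(), pos + maxWordLength);
            for (int e = limit; e > pos; e--) {
                if (dictionary.contains(text.substring(pos, e))) {
                    end = e;
                    break;
                }
            }
            if (end == -1) {
                // No dictionary match: emit one character plus any following
                // non-spacing marks (Thai tone marks and some vowels are Mn).
                end = pos + 1;
                while (end < text.length()
                        && Character.getType(text.charAt(end)) == Character.NON_SPACING_MARK) {
                    end++;
                }
            }
            words.add(text.substring(pos, end));
            pos = end;
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary: "I", "eat", "rice".
        Set<String> dict = Set.of("ฉัน", "กิน", "ข้าว");
        LongestMatchSegmenter seg = new LongestMatchSegmenter(dict);
        System.out.println(seg.segment("ฉันกินข้าว")); // [ฉัน, กิน, ข้าว]
    }
}
```

Greedy longest matching is easy to implement, but it can choose a wrong split when a long dictionary entry swallows the start of the next word, and any word missing from the dictionary falls through to the character-level fallback. Both weaknesses illustrate why accuracy depends on the rules and on dictionary coverage.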

For application developers, APIs for locating Thai word break boundaries are available in the International Components for Unicode (ICU) and in the Java Development Kit (JDK), for example through the BreakIterator class.
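As a rough sketch of how such an API is used, the following Java example asks java.text.BreakIterator for a word instance in the Thai locale and collects the text between consecutive boundaries. The helper name segments and the sample sentence are assumptions for illustration. The exact Thai splits depend on the dictionary and locale data available in the Java runtime, so no particular output is claimed here. ICU4J provides a com.ibm.icu.text.BreakIterator class that is used in much the same way.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ThaiWordBreak {

    /** Returns the pieces of text between consecutive word boundaries. */
    public static List<String> segments(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        // Walk boundary to boundary; spaces and punctuation come back as
        // their own segments alongside the words.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            result.add(text.substring(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        Locale thai = Locale.forLanguageTag("th-TH");
        // Actual splits depend on the Thai dictionary shipped with the runtime.
        for (String s : segments("ฉันกินข้าว", thai)) {
            System.out.println(s);
        }
    }
}
```

Because word boundaries also surround spaces and punctuation, callers that want only words typically filter out segments that contain no letters.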