Customizing line-breaking algorithms

When moving characters and words to the next line, be aware that each language has its own set of punctuation marks and characters that cannot be placed at the beginning or end of a line.

When formatting text for scripts such as Chinese and Japanese, which do not put spaces between successive words and sentences, a line-breaking process can easily move a character or punctuation mark to the next line arbitrarily, leaving a forbidden character at the beginning or end of a line. Text formatting algorithms designed for Latin scripts are not suitable for these scripts.

For example, the Kinsoku rules govern how lines break in Japanese text. Certain characters are not allowed to begin a line, and others are not allowed to end one. A line cannot begin with a closing parenthesis or a period, and a line cannot end with an opening parenthesis. These line-breaking restrictions are specified in Japanese Industrial Standard (JIS) X 4051.
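
The restrictions above can be sketched as a simple check on the two characters around a candidate break. This is only an illustration: the class name KinsokuCheck, the method canBreakBetween, and the two character sets are hypothetical, and the sets hold a handful of sample characters rather than the full character classes defined by JIS X 4051.

```java
import java.util.Set;

// Hypothetical helper, not part of any library.
class KinsokuCheck {
    // Sample characters that must not begin a line (closing brackets, periods, commas).
    // Assumption: an illustrative subset only, not the JIS X 4051 classes.
    static final Set<Character> NO_LINE_START =
            Set.of(')', '.', '）', '。', '、', '」');

    // Sample characters that must not end a line (opening brackets).
    static final Set<Character> NO_LINE_END =
            Set.of('(', '（', '「');

    // Returns true if a line may end with 'before' and the next line may start with 'after'.
    static boolean canBreakBetween(char before, char after) {
        return !NO_LINE_END.contains(before) && !NO_LINE_START.contains(after);
    }
}
```

A formatter built on this sketch would call canBreakBetween at each candidate position and, when it returns false, try the neighboring position instead.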


Guideline A11

When formatting text, be aware of punctuation marks or characters that cannot be placed at the beginning or end of a line during the line-breaking process.

Tools are available that can detect and analyze characters placed at the beginning or end of a line. For instance, Java provides classes and methods for correct text formatting. The BreakIterator class can find character, word, sentence, and potential line-break boundaries, and it has mechanisms to handle punctuation and hyphenated words correctly. Each instance of the BreakIterator class detects and points to the current boundary in a string of text. Using this class helps determine where a text string can be broken when wrapping lines.
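
As a minimal sketch of the BreakIterator usage described above, the following class collects the offsets at which a string may be broken for a given locale. The class name LineBreakDemo and the method lineBreakOffsets are hypothetical; BreakIterator.getLineInstance, setText, first, next, and DONE are part of the standard java.text.BreakIterator API. The exact offsets returned depend on the text, the locale, and the Java runtime's break rules.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, shown only to illustrate the iteration pattern.
class LineBreakDemo {
    // Returns every boundary reported by a line-break iterator,
    // including the start (0) and the end (text.length()) of the text.
    static List<Integer> lineBreakOffsets(String text, Locale locale) {
        BreakIterator it = BreakIterator.getLineInstance(locale);
        it.setText(text);

        List<Integer> offsets = new ArrayList<>();
        // first() moves to the first boundary; next() advances until DONE.
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            offsets.add(pos);
        }
        return offsets;
    }
}
```

A line-wrapping routine could, for example, call LineBreakDemo.lineBreakOffsets(text, Locale.JAPANESE) and choose the largest offset that still fits within the available line width.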