Introduction to bidirectional languages

Bidi characteristics: bidirectionality

Bidi scripts are characterized by bidirectionality and shaping.

The following are key aspects of bidirectional languages:

Directional segments
Bidirectional text can consist of text that has one directionality, such as Arabic text written from right to left, and text that has the opposite directionality, such as numbers. A portion of text with a distinct directionality is called a segment.

A segment can have an additional segment with the opposite directionality embedded, or "nested", within it. It is theoretically possible to have many levels of nesting, but generally there are no more than two. One level of nesting is necessary for the entry of numbers within Arabic or Hebrew text. It is customary in Hebrew, for example, to write the name of the street before the number of the house, as in this example (uppercase English letters represent Arabic or Hebrew letters):

B ECNARTNE 25 TEERTS ELPAM
<--------- -> <-----------

The street name is entered from right to left. Text flow then has to be reversed to allow correct entry of the number from left to right. This is the nested left-to-right segment. Then the flow must be reversed again to allow entry of the entrance information from right to left.

Imagine someone composing a letter in English to a person who can also read Hebrew, and writing his or her address in Hebrew. In this case, the address in Hebrew will actually be a nested segment of the English text.

address is B ECNARTNE 25 TEERTS ELPAM today
---------> <--------- -> <----------- ---->
    0          1      2       1         0

Since the nested segment of the address contains a nested segment, the street number, we end up with two levels of nesting.
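
The nesting levels in this example can be reproduced with a small toy sketch. It is not the Unicode Bidirectional Algorithm, only an illustration under stated assumptions: the text is given in logical (typed) order, the global orientation is left-to-right, and uppercase letters stand for Arabic or Hebrew letters as in the examples above. The function names `nesting_levels` and `level_runs` are hypothetical.

```python
from itertools import groupby


def nesting_levels(logical: str) -> list[int]:
    """Toy nesting levels for a left-to-right paragraph in logical order.

    Assumption (this document's convention): uppercase letters stand for
    right-to-left letters, lowercase letters are left-to-right, digits are
    numbers, and everything else is neutral.
    """
    # Pass 1: classify each character. A number becomes a nested
    # left-to-right segment ("N") only when the last letter before it was
    # right-to-left; otherwise it continues the left-to-right text.
    kinds = []
    last_letter = "L"  # the paragraph itself is left-to-right
    for ch in logical:
        if ch.isupper():
            last_letter = "R"
            kinds.append("R")
        elif ch.islower():
            last_letter = "L"
            kinds.append("L")
        elif ch.isdigit():
            kinds.append("N" if last_letter == "R" else "L")
        else:
            kinds.append(None)  # neutral: space, punctuation

    # Pass 2: a neutral joins a right-to-left segment only when the nearest
    # non-neutral characters on both sides belong to one.
    def rtl_side(indices) -> bool:
        for i in indices:
            if kinds[i] is not None:
                return kinds[i] in ("R", "N")
        return False  # the paragraph edge counts as left-to-right

    level_of = {"L": 0, "R": 1, "N": 2}
    levels = []
    for i, kind in enumerate(kinds):
        if kind is None:
            before = rtl_side(range(i - 1, -1, -1))
            after = rtl_side(range(i + 1, len(kinds)))
            levels.append(1 if before and after else 0)
        else:
            levels.append(level_of[kind])
    return levels


def level_runs(logical: str) -> list[tuple[int, str]]:
    """Group consecutive characters that share a nesting level."""
    pairs = zip(nesting_levels(logical), logical)
    return [
        (level, "".join(ch for _, ch in group))
        for level, group in groupby(pairs, key=lambda pair: pair[0])
    ]


runs = level_runs("address is MAPLE STREET 25 ENTRANCE B today")
```

For this input the runs carry the levels 0, 1, 2, 1, 0: the English text is at level 0, the address at level 1, and the house number at level 2.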

Global orientation
Global orientation, also known as writing order, reading order, or paragraph direction, is the side of the screen, window, or page from which the text begins. Global orientation does not necessarily imply that the reader will read the text starting from the same side from which it was written. In other words, global orientation is not obvious from the text itself, as the following example illustrates. In this example, uppercase English letters represent Arabic or Hebrew letters:

FRED DOES NOT BELIEVE taht yas syawla i

This sentence has one meaning when read from left to right: ‘Fred does not believe I always say that,’ and another when read from right to left: ‘I always say that Fred does not believe.’ Both meanings are valid.

Text that is predominantly oriented right to left will be read by readers of Arabic or Hebrew from right to left. It is preferable for the global orientation of such text to be right to left as well, but if nothing is specified in the rendering program, the global orientation will probably default to the natural writing and reading order for English text, which is left to right.

Since global orientation is not always obvious from its context, an application processing bidirectional data must understand and specify it.
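
One way an application might meet this requirement is sketched below. It prefers an explicitly declared orientation and falls back to a heuristic only when none is given: the direction of the first character with a strong direction, similar in spirit to the first-strong rule of the Unicode Bidirectional Algorithm. The function name `global_orientation` and its parameters are hypothetical; `unicodedata.bidirectional` is part of the Python standard library.

```python
import unicodedata
from typing import Optional


def global_orientation(text: str, declared: Optional[str] = None,
                       default: str = "ltr") -> str:
    """Return 'ltr' or 'rtl' for a paragraph.

    An explicitly declared orientation always wins. Otherwise the first
    character with a strong direction decides, and `default` is used when
    the text contains no such character (for example, digits only).
    """
    if declared in ("ltr", "rtl"):
        return declared
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right letter
            return "ltr"
        if bidi_class in ("R", "AL"):    # Hebrew (R) or Arabic-script (AL) letter
            return "rtl"
    return default


# The Hebrew word below is written with escapes: shin, lamed, vav, final mem.
print(global_orientation("\u05e9\u05dc\u05d5\u05dd world"))
print(global_orientation("12345", declared="rtl"))
```

The heuristic can guess wrong, as the ambiguous sentence above shows, which is why the declared value takes precedence in this sketch.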

The order of text and text-types
In handling bidirectional text, we have to distinguish between the physical order in which text is presented and the logical order in which segments are typed (or pronounced, if read aloud). Some segments may need to be reordered into a logical or physical order.

There are different approaches to reordering bidirectional text. The term 'text-type' is used to identify which method applies to a specific text.

Logical order versus physical order
Consider the following example (uppercase English letters represent Arabic or Hebrew letters):

my wife's name is ILIN

The global orientation as well as the order in which the reader reads the text is left-to-right. In the physical order, after the letters i and s comes the letter I of the segment containing my wife's name in Hebrew. Note, however, that my wife's name is pronounced "NILI". In the logical order the first letter of the name segment is thus the letter N.

Bidirectional text may be stored in either its logical order or in its physical order. Each approach has advantages and drawbacks.

Logical order is currently the preferred method for entering and processing text and it is widely supported in workstation environments. On mainframes, bidirectional text is almost always stored in physical order. Integration of bidirectional text from mainframes and workstation environments requires transformation of the bidirectional text into a layout where all text has the same order.
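
For the simple case shown in the example above, the transformation between the two orders can be sketched as follows. This is a deliberately minimal illustration, not a general reordering algorithm: it assumes a left-to-right global orientation, uses this document's convention that uppercase letters stand for Arabic or Hebrew letters, and does not handle numbers nested inside right-to-left segments. The name `swap_order` is hypothetical.

```python
import re

# Assumption: a right-to-left segment is a run of uppercase words
# separated by single spaces.
_RTL_SEGMENT = re.compile(r"[A-Z]+(?: [A-Z]+)*")


def swap_order(text: str) -> str:
    """Convert between logical and physical order by reversing each
    right-to-left segment in place. Applying it twice restores the input."""
    return _RTL_SEGMENT.sub(lambda match: match.group(0)[::-1], text)


logical = "my wife's name is NILI"
physical = swap_order(logical)
print(physical)
```

Because reversing a segment twice restores it, the same function converts in both directions in this simplified setting.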

Text-types and reordering techniques
Different text-types require different approaches to reordering:

Visual text-type
This is the oldest approach, dating from a time when there was no processing capability at the workstation. The entire screen is simply copied to storage and back to the screen, possibly inverting every row (depending on the physical orientation of the screen). Each application programmer has to know where the embedded segments are located and process them accordingly. This text-type is called visual because it is a replication of the presented form. Many legacy applications use this type of text.

Implicit text-type
The implicit text-type, also called logical text-type, assumes that the letters of the Latin alphabet have an inherent left-to-right directionality, and those of the Arabic, Persian, Urdu, and Hebrew alphabets have an inherent right-to-left directionality. Text is stored in logical order. An algorithm recognizes the directional characteristics of the characters and performs segment inversion automatically. Implicit algorithms are conceptually simple, but they cannot handle some strings, such as a part number that intermixes numbers, left-to-right characters, and right-to-left characters.
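
The inherent directionality that implicit algorithms rely on is recorded in the Unicode Character Database and can be inspected with Python's standard `unicodedata` module. The sketch below is only an inspection aid; the helper names `bidi_classes` and `inherent_direction` and the sample part number are made up for illustration.

```python
import unicodedata


def bidi_classes(text: str) -> list[str]:
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]


def inherent_direction(ch: str):
    """Map a character to 'ltr', 'rtl', or None if it has no strong direction."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "ltr"
    if bidi_class in ("R", "AL"):  # Hebrew letters are R; Arabic-script letters are AL
        return "rtl"
    return None  # digits, spaces, punctuation, and so on


# A made-up part number mixing a Latin letter, a digit, a hyphen,
# and the Hebrew letter alef (U+05D0).
part_number = "A1-\u05d0"
print(bidi_classes(part_number))
print([inherent_direction(ch) for ch in part_number])
```

The digit and the hyphen have no strong direction of their own, which hints at why strings such as part numbers are difficult for purely implicit techniques.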

Explicit text-type
The explicit text-type depends on additional control characters embedded in the text that instruct an algorithm to perform segment inversions, shaping, or numeral selections, as well as other transformations. Text with an explicit text-type is usually stored in logical order, but the controls embedded in the text may complicate automatic text processing.
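
As an illustration of how embedded controls complicate automatic processing, the sketch below wraps a string in two of the Unicode directional formatting characters, RIGHT-TO-LEFT EMBEDDING (U+202B) and POP DIRECTIONAL FORMATTING (U+202C), and then strips directional controls again before comparing strings. The helper names `embed_rtl` and `strip_controls` are hypothetical, and the list of controls covers only the Unicode directional marks, embeddings, overrides, and isolates; other explicit schemes may use different controls.

```python
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# Unicode directional controls: LRM, RLM, LRE, RLE, PDF, LRO, RLO,
# LRI, RLI, FSI, PDI. Mapping each code point to None deletes it.
_DIRECTIONAL_CONTROLS = dict.fromkeys(
    map(ord, "\u200e\u200f\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")
)


def embed_rtl(text: str) -> str:
    """Mark `text` as an explicit right-to-left embedded segment."""
    return RLE + text + PDF


def strip_controls(text: str) -> str:
    """Remove directional controls, for example before searching or comparing."""
    return text.translate(_DIRECTIONAL_CONTROLS)


stored = embed_rtl("PART 25")
print(stored == "PART 25")                  # controls make the strings differ
print(strip_controls(stored) == "PART 25")  # equal once the controls are removed
```

A search, sort, or length check that ignores the controls can give surprising results, so an application has to decide at which points they are kept and where they are removed.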

No single text-type can be used in all cases. Implicit techniques are usually heuristic and have the shortcomings discussed above. Explicit techniques, while alleviating the limitations of implicit techniques, introduce other challenges, such as the need for automatic processes to handle embedded controls.

The basic display algorithm of the Unicode Bidirectional Algorithm, defined in the Unicode Standard, bridges implicit and explicit techniques. In principle, it is an implicit reordering algorithm, but it also handles a few specific directional controls embedded in the text. There are applications and related databases for all three text-types, and it is possible for the same bidirectional text to be stored in different layouts on different systems.

Symmetrical swapping
Some characters, such as the greater-than sign or a left parenthesis, have an implied directional meaning and a complementary symmetric character with the opposite directional meaning. When used within a segment that is presented from right to left, each of these characters must be replaced by its symmetric sibling to ensure that the correct meaning is preserved. This replacement is called symmetrical swapping.

In the following example, uppercase English characters represent Arabic or Hebrew letters:

Assume that the global orientation is right-to-left and you have in storage:


If rendered, as normally expected, from right to left this text will look on the screen as:

This is, of course, incorrect! The less-than sign must be exchanged in storage with a greater-than sign to preserve the correct meaning of the expression.

Other characters that require symmetrical swapping include parentheses, square brackets, and braces. Although symmetrical swapping is a characteristic of bidirectional languages, it is not always mandatory for the software functions that transform different bidirectional-language text layouts. Sometimes this function is performed automatically by the workstation hardware or microcode.
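
Symmetrical swapping can be sketched in a few lines. The example below is an illustration with a hypothetical expression, assuming a purely right-to-left segment stored in logical order that is presented by reversing its characters; the function name `render_right_to_left` and the small swap table are made up, and the table covers only the ASCII pairs mentioned in this section. `unicodedata.mirrored` is a standard-library call that reports whether Unicode marks a character as mirrored in bidirectional text.

```python
import unicodedata

# Each character maps to its symmetric sibling.
_SWAP = str.maketrans("<>()[]{}", "><)(][}{")


def render_right_to_left(segment: str, swap: bool = True) -> str:
    """Present a logically ordered right-to-left segment.

    The characters are laid out in reverse (right to left); with
    swap=True each symmetric character is replaced by its sibling.
    """
    visual = segment[::-1]
    return visual.translate(_SWAP) if swap else visual


# Logical order: A, then "less than", then B.
print(render_right_to_left("A<B", swap=False))  # sign now points the wrong way
print(render_right_to_left("A<B"))              # meaning preserved
print(unicodedata.mirrored("<"), unicodedata.mirrored("A"))
```

Without swapping, a reader starting from the right sees the open side of the sign facing A, which reverses the meaning; with swapping, the relation between A and B is preserved.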

Widget mirroring of translated graphical user interfaces (GUIs)
Since Arabic or Hebrew text is read from right to left, it is natural for a reader of one of these languages to view right-aligned text, to read books with the binding on the right, and, when using a computer application with a GUI translated to Arabic or Hebrew, to expect it to be mirrored. This includes viewing menu buttons on the right, the navigation tree on the right, and indentation to the left.
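
In layout terms, mirroring a widget means reflecting its horizontal position within its container while leaving its size unchanged. The sketch below shows only that arithmetic; the `Box` type and `mirror` function are hypothetical and are not taken from any particular GUI toolkit, most of which provide their own right-to-left layout support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: int      # distance of the left edge from the container's left edge
    width: int  # horizontal extent of the widget


def mirror(box: Box, container_width: int) -> Box:
    """Reflect a widget horizontally inside a container of the given width.

    The distance the widget had from the left edge becomes its distance
    from the right edge; its width does not change.
    """
    return Box(x=container_width - box.x - box.width, width=box.width)


# A navigation tree docked at the left of an 800-unit-wide window
# ends up docked at the right after mirroring.
tree = Box(x=0, width=200)
print(mirror(tree, 800))
```

Applying `mirror` twice with the same container width returns the original position, so the same function can convert a layout in either direction.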

Here is an example of an application GUI translated to Hebrew:


Below is another example of a Hebrew-translated GUI window. The vertical scroll bar is on the left, as expected, the labels are on the right of the entry boxes, and the position of the buttons at the bottom of the screen has been horizontally mirrored.

Translating a GUI to Arabic or Hebrew does not mean that the entry boxes cannot contain English text, which may be aligned to the left, as in the menu box containing the text ALAMO (FED). Generally, product names are left untranslated. A GUI window may contain distinct frames, areas, or windows with content that is not translated, such as the title portion, which contains the IBM logo and the name of the Content Manager eClient® product. These frames or windows do not need to be mirrored, and the text in them can remain left-aligned.

IBM Content Manager eClient