Introduction to Indic languages


Indic languages challenge developers not only because of the number of languages and scripts involved, but also because of the complexity of their orthographies.

Linguistically, India is a unique country. No other region has a comparable variety of distinct languages and scripts. Although these languages and scripts share some general characteristics, they differ enough that developers should understand the individual characteristics of each.

Language     Script
Bengali      Bengali
Assamese     Nearly identical to the Bengali script
Gujarati     Gujarati
Kannada      Kannada
Kashmiri     Sharada/Urdu/Devanagari
Malayalam    Malayalam
Punjabi      Gurmukhi/Urdu
Oriya        Oriya
Sindhi       Devanagari/Urdu
Tamil        Tamil
Telugu       Telugu
Urdu         Devanagari/Urdu

The written forms of Indic languages behave differently from writing systems such as the Latin script used for English. For example, as you read these lines in English, you pronounce the syllables in a strict left-to-right sequence of consonants and vowels. In Indic scripts, however, the visual marks that indicate pronunciation within a syllable do not always appear in left-to-right order. This behavior creates specific problems when building computing solutions for these languages.
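The mismatch between reading order and visual order can be seen in a small sketch. It assumes the text is encoded in Unicode (an assumption; the text above does not name an encoding) and uses the Devanagari syllable "ki" as an illustrative case. The helper name `describe` is hypothetical. In Unicode, the consonant KA is stored first and the vowel sign I second, following pronunciation order, yet a conforming renderer draws the vowel sign to the left of the consonant.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs in stored (logical) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# The Devanagari syllable "ki": consonant KA followed by vowel sign I.
# Stored order follows pronunciation (k, then i), but the vowel sign is
# displayed to the LEFT of the consonant when the text is rendered.
syllable = "\u0915\u093F"

for code_point, name in describe(syllable):
    print(code_point, name)
# U+0915 DEVANAGARI LETTER KA
# U+093F DEVANAGARI VOWEL SIGN I
```

Code that assumes the first stored character is also the leftmost glyph, for example when placing a cursor or measuring text, will therefore give wrong results for such syllables.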

Another difficulty is the lack of a standard definition for the behavior of Indic languages. We have made some progress in achieving consensus on a single ‘definition’ of each language and making it available for linguists to perfect and developers to use. This consensus may later function as a standard.

The following pages give a brief overview of this vast subject and address the general behavioral aspects of Indic languages.