Introduction to Indic languages

Basic components of Indic languages

The basic phonetic components of Indic languages are vowels (called 'swar' in Hindi) and consonants (called 'vyanjan' in Hindi). These form the basis of Indic alphabets. There are also dependent characters, which are used in conjunction with the letters of the alphabet to modify the sounds those letters represent. We call these 'special modifiers'.

• The number of characters in the alphabet varies from script to script; no fixed rule applies.

• Vowels and consonants are the independent characters representing sounds that originate from different parts of the mouth.

• To a linguist, consonants in Indic scripts contain an inherent vowel 'a'. For example, the first consonant in most Indic scripts is 'ka' (in Devanagari it is क), which combines the 'pure' consonant sound 'k' with the vowel 'a'. For an English analogy, think of the name of the letter 'B', which can be regarded as the pure consonant 'b' followed by the vowel 'e'.

• The pure consonant in Indic is also known as the 'dead' consonant. It is depicted by a special character called 'halant' (् in Hindi) placed below a character (for example, क्) to suppress the inherent vowel in the consonant. This applies to all consonants in Indic alphabets. The name for 'halant' varies among the languages, and it is known as the VIRAMA in the Unicode Standard (http://www.unicode.org/).
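The relationship between a consonant, its inherent vowel, and the halant can be illustrated with a minimal Python sketch. It assumes Unicode text and uses Devanagari as the example script; the variable names are illustrative only.

```python
import unicodedata

KA = "\u0915"      # DEVANAGARI LETTER KA: the consonant with its inherent vowel 'a'
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA: the 'halant' in Hindi

# Appending the virama suppresses the inherent vowel, leaving the dead consonant 'k'.
dead_ka = KA + VIRAMA

for ch in dead_ka:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The dead consonant is stored as two code points, although it is drawn as one shape.
print(len(KA), len(dead_ka))
```

In this sketch the live consonant 'ka' is a single code point, while the dead consonant is the same code point followed by the virama.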


According to the linguistic view, an Indic alphabet contains vowels, pure consonants and some special modifiers. You will see later how this simplifies some complexities in understanding Indic languages.
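As a rough illustration of this view, the sketch below splits a Devanagari word into its code points and labels each one. The code point ranges come from the Unicode Devanagari code chart, but the grouping into five labels and the function name classify are assumptions made for this example, not part of any standard. Only the core letters are covered.

```python
# Core Devanagari ranges, taken from the Unicode code chart (core letters only).
INDEPENDENT_VOWELS = range(0x0905, 0x0915)  # A .. AU
CONSONANTS = range(0x0915, 0x093A)          # KA .. HA
VOWEL_SIGNS = range(0x093E, 0x094D)         # dependent vowel signs AA .. AU
VIRAMA = 0x094D                             # the halant
MODIFIER_SIGNS = {0x0901, 0x0902, 0x0903}   # candrabindu, anusvara, visarga


def classify(ch: str) -> str:
    """Return a rough label for one Devanagari character (hypothetical helper)."""
    cp = ord(ch)
    if cp in INDEPENDENT_VOWELS:
        return "independent vowel"
    if cp in CONSONANTS:
        return "consonant"
    if cp in VOWEL_SIGNS:
        return "vowel sign"
    if cp == VIRAMA:
        return "halant"
    if cp in MODIFIER_SIGNS:
        return "modifier sign"
    return "other"


# The word 'Hindi' in Devanagari: HA, vowel sign I, NA, virama, DA, vowel sign II.
word = "\u0939\u093F\u0928\u094D\u0926\u0940"

for ch in word:
    print(f"U+{ord(ch):04X}", classify(ch))
```

The NA followed by the halant in the middle of the word is a dead consonant: its inherent vowel is suppressed, so it joins the following DA.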

Encoding Indic scripts
Large-scale, multilingual applications have existed in India since the early days of computing. The stand-alone nature of these applications, however, meant that each scheme used to encode Indic languages remained confined to the environment in which it was deployed and was therefore not interoperable. A number of these encodings exist, many of them proprietary.
The most established encoding is ISCII (Indian Script Code for Information Interchange), which was created in 1988. It is an ingenious encoding scheme that takes advantage of the common linguistic threads of Indic scripts. The Unicode Standard and its ISO counterpart (ISO/IEC 10646) were created as a global standard to address the needs of most major languages. When Unicode was extended to Indic languages, it drew from the ISCII standard.
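One visible trace of this shared design is that the Unicode blocks for the major Indic scripts follow a parallel layout, so a given letter tends to sit at the same offset in each block. The sketch below checks this for the consonant 'ka'. The block start values are taken from the Unicode code charts; the names BLOCK_STARTS and KA_OFFSET are illustrative only.

```python
import unicodedata

# Start of each script's Unicode block (from the Unicode code charts).
BLOCK_STARTS = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Gurmukhi": 0x0A00,
    "Gujarati": 0x0A80,
    "Oriya": 0x0B00,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
    "Kannada": 0x0C80,
    "Malayalam": 0x0D00,
}

KA_OFFSET = 0x15  # position of the letter KA within each block

for script, start in BLOCK_STARTS.items():
    ch = chr(start + KA_OFFSET)
    print(f"{script:<11} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Not every position is filled in every script, since each script has letters the others lack, so the offset trick should be treated as a property of the layout rather than a reliable transliteration method.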
Most modern software providers use Unicode for Web-based applications, making it the ideal choice for e-business in Indic languages. As experience with Unicode for Indic languages accumulates, refinements are being incorporated into future versions of the standard.