Executive Overview

Globalization involves handling the many languages used around the world. Each language has its own alphabet, punctuation marks, digits, and other symbols, which together form a set of characters. Computers represent these characters as numbers by means of coded character sets, also known in the industry as code pages, code sets, and charsets. In a coded character set, each character is assigned a number, which is usually written in hexadecimal (base 16).
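As a small illustration of a character being assigned a number, the Python sketch below prints the Unicode number for a few sample characters in hexadecimal. The helper name `code_point_hex` and the sample characters are chosen here for illustration only; they are not part of any standard.

```python
# A minimal sketch: look up the number assigned to each character
# in the Unicode coded character set and show it in hexadecimal.

def code_point_hex(ch: str) -> str:
    """Return the character's number in the conventional U+XXXX notation."""
    # ord() returns the integer assigned to a single character.
    return f"U+{ord(ch):04X}"

# Illustrative sample: Latin letter, accented letter, currency sign, CJK ideograph.
samples = ["A", "é", "€", "日"]
table = {ch: code_point_hex(ch) for ch in samples}

for ch, cp in table.items():
    print(ch, cp)
# A U+0041
# é U+00E9
# € U+20AC
# 日 U+65E5
```

The same character can have a different number in a different coded character set, which is why text must always be read with the character set it was written in.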

Because coded character sets evolved alongside advances in computer hardware, software, and communications, there are many of them. The most recent and most widely used coded character set is Unicode, which contains the characters needed to write all of the languages in current use, and many other languages as well. Although Unicode is recommended for all text representation, especially in today's on-demand world, several older character sets are still used to represent data in databases, Web page collections, and user interfaces.
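To see why the older character sets still matter, consider how one stored byte reads under several of them. The sketch below uses Python's built-in codecs; the three legacy code pages (`latin-1`, `cp1251`, `cp437`) are picked purely as examples, and the variable names are illustrative.

```python
# A minimal sketch: the same byte means different characters
# depending on which coded character set is used to interpret it.

raw = b"\xe9"  # one byte, value 0xE9

# Decode the byte under three legacy code pages (example choices).
legacy_code_pages = ("latin-1", "cp1251", "cp437")
decoded = {name: raw.decode(name) for name in legacy_code_pages}

for name, ch in decoded.items():
    print(f"{name:8} 0xE9 -> {ch} (U+{ord(ch):04X})")
# latin-1  0xE9 -> é (U+00E9)
# cp1251   0xE9 -> й (U+0439)
# cp437    0xE9 -> Θ (U+0398)

# Going the other way: the letter é is stored as one byte in latin-1,
# but as two bytes in the UTF-8 encoding of Unicode.
latin1_bytes = "é".encode("latin-1")
utf8_bytes = "é".encode("utf-8")
print(latin1_bytes.hex(), utf8_bytes.hex())
# e9 c3a9
```

Data that was written in one of these code pages and read back under another is silently corrupted, so the character set of stored text has to be known before the text is processed or converted to Unicode.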

Understanding coded character sets helps you handle text that is represented in them correctly. This article gives you a basic understanding of the coded character sets typically encountered in today's computer systems, and of the characteristics you should be aware of when dealing with textual data.

Further reading

The Character Encoding Model: Unicode Technical Report UTR#17