Character encoding

Understanding how graphic character data is encoded is essential when developing multilingual software. Each language has its own alphabet, punctuation marks, digits and other symbols, all of which computers represent as numbers. Various encoding schemes are used to assign numbers to characters and so create coded character sets. These schemes include commonly used standards such as ASCII and Unicode as well as IBM’s own EBCDIC. Encodings fall into three broad types: fixed single-byte, where each character is represented by one 8-bit byte; fixed multi-byte, where every character is represented by the same number (two or more) of 8-bit bytes; and mixed, where characters are encoded with a variable number of bytes. Global applications must be able to identify how data is encoded and handle it correctly whatever the encoding.
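To make the three encoding types concrete, the following minimal sketch uses Python's built-in codecs to encode the same characters under several schemes and prints the resulting bytes. The sample characters and the choice of codecs are illustrative assumptions: `cp037` is just one of several EBCDIC code pages, `utf-32-be` stands in for a fixed multi-byte encoding, and `shift_jis` stands in for a mixed single-/double-byte encoding. IBM's mixed EBCDIC code pages switch between single- and double-byte modes with shift-out/shift-in control bytes instead, which this sketch does not show. The helper names `encode_or_none` and `print_comparison` are hypothetical.

```python
from typing import Optional

# Sample characters: a basic Latin letter, an accented Latin letter,
# and a CJK ideograph.
SAMPLES = ["A", "é", "日"]

# Python codec names mapped to the encoding type each one illustrates.
# cp037 is one EBCDIC code page (US/Canada); shift_jis is used here only
# as an example of a mixed single-/double-byte encoding.
ENCODINGS = {
    "ascii": "fixed single-byte",
    "cp037": "fixed single-byte (EBCDIC)",
    "utf-32-be": "fixed multi-byte (4 bytes)",
    "shift_jis": "mixed (1 or 2 bytes)",
    "utf-8": "variable-length (1 to 4 bytes)",
}


def encode_or_none(text: str, encoding: str) -> Optional[bytes]:
    """Return text encoded with the given codec, or None if the codec
    has no mapping for one of its characters."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


def print_comparison() -> None:
    """Print the byte sequence of each sample under each encoding."""
    for ch in SAMPLES:
        print(f"{ch} (U+{ord(ch):04X})")
        for enc, kind in ENCODINGS.items():
            data = encode_or_none(ch, enc)
            shown = data.hex(" ").upper() if data is not None else "(not representable)"
            print(f"  {enc:<10} {shown:<20} {kind}")


if __name__ == "__main__":
    print_comparison()
    # The same byte stands for different characters in different code
    # pages, so data must be decoded with the encoding it was written in.
    raw = b"\xc1"
    print(raw.decode("cp037"), raw.decode("latin-1"))  # EBCDIC 'A' vs Latin-1 'Á'
```

In the output, "A" takes one byte in ASCII, EBCDIC and UTF-8 (0x41 in ASCII and UTF-8, but 0xC1 in EBCDIC), while UTF-32 always uses four bytes. "日" needs two bytes in Shift_JIS and three in UTF-8, and it cannot be represented in ASCII or code page 037 at all. The final line shows why an application must identify the encoding before interpreting data: the byte 0xC1 reads as "A" in EBCDIC but as "Á" in Latin-1.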

Coded Character Sets: An Overview - This article provides an introduction to the concepts involved in character data handling.

Character Data Representation Architecture - This reference publication provides detailed information on the IBM-defined identifiers, their use, and their relationships.