Unicode

Unicode is an encoding scheme that currently provides a unique code point for over 100,000 characters. This standard enables systems to more easily handle global data, regardless of the platform, program, or language.

Before Unicode was defined, no single encoding was adequate for all available letters and symbols. For example, consider the following restrictions for EBCDIC and ASCII:
  • These encoding schemes have one code page per character set. For example, they have one code page for Japanese characters and another code page for German characters.
  • These encoding schemes often encode data in different positions. For example, the letter A is encoded as X'C1' in most EBCDIC code pages, but it is encoded as X'41' in most ASCII code pages.
  • Even within encoding schemes, characters might be mapped differently. For example, the letter ä is encoded as X'C0' in EBCDIC code page 273, but it is encoded as X'43' in EBCDIC code page 37. (Code page 37 has the left brace character ( { ) at position X'C0'.) This same letter ä is encoded as X'E4' in ASCII code page 819 and as X'7B' in ASCII code page 1011.
Thus, handling data from more than one character set, such as German characters and Arabic characters, was difficult when ASCII or EBCDIC was used.
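
The restrictions listed above can be observed directly. The following is a minimal sketch in Python, assuming that Python's built-in "cp037", "cp273", and "latin-1" codecs stand in for EBCDIC code pages 37 and 273 and ASCII code page 819. Code page 1011 is left out of the sketch, and the helper name encode_hex is hypothetical.

```python
# Encode the same characters under several legacy code pages and compare bytes.
# Assumption: Python's "cp037", "cp273", and "latin-1" codecs stand in for
# EBCDIC code pages 37 and 273 and ASCII code page 819.

def encode_hex(char: str, codec: str) -> str:
    """Return the bytes that encode char in the given codec, as uppercase hex."""
    return char.encode(codec).hex().upper()

samples = [
    ("A", "cp037"),
    ("A", "latin-1"),
    ("ä", "cp273"),
    ("ä", "cp037"),
    ("ä", "latin-1"),
]

for char, codec in samples:
    print(f"{char!r} in {codec}: X'{encode_hex(char, codec)}'")

# The same byte value maps to different characters in different code pages.
byte_c0 = bytes([0xC0])
print("X'C0' in cp273:", byte_c0.decode("cp273"))
print("X'C0' in cp037:", byte_c0.decode("cp037"))
```

Because a byte such as X'C0' carries no record of its code page, a program that receives it must be told which code page applies before it can interpret the data.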

Unicode avoids these problems by having a single standard that can provide a unique code point for over a million characters. Currently, the standard has defined code points for just over 100,000 characters. You can view the Unicode code points by looking at the Unicode character code charts on the Unicode Consortium web site. For example, if you look up Unicode code point U+0041, you can see that it corresponds to the character 'A'.
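
A code point lookup can also be done programmatically. The following is a minimal sketch that uses Python's standard unicodedata module; the helper name describe is hypothetical, and the character names that it prints come from the Unicode database bundled with the Python installation.

```python
import sys
import unicodedata

def describe(code_point: int) -> str:
    """Format a code point as 'U+XXXX <character> <name>'."""
    char = chr(code_point)
    # Unnamed code points (for example, control characters) get a placeholder.
    name = unicodedata.name(char, "<no name>")
    return f"U+{code_point:04X} {char} {name}"

print(describe(0x41))
print(describe(ord("ä")))

# Code points range from U+0000 to U+10FFFF, which is why the standard
# has room for over a million characters.
print("Size of the code space:", sys.maxunicode + 1)
```

Unlike the code page examples, the lookup needs no code page argument: each character has exactly one code point.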

The following table shows the first 128 Unicode code points, from U+0000 to U+007F. These code points are the same as those in ASCII CCSID 367.

Table 1. The first 128 code points for Unicode and ASCII CCSID 367
2nd↓ 1st→  0-   1-   2-    3-  4-  5-  6-  7-
-0         NUL  DLE  (sp)  0   @   P   `   p
-1         SOH  DC1  !     1   A   Q   a   q
-2         STX  DC2  "     2   B   R   b   r
-3         ETX  DC3  #     3   C   S   c   s
-4         EOT  DC4  $     4   D   T   d   t
-5         ENQ  NAK  %     5   E   U   e   u
-6         ACK  SYN  &     6   F   V   f   v
-7         BEL  ETB  '     7   G   W   g   w
-8         BS   CAN  (     8   H   X   h   x
-9         HT   EM   )     9   I   Y   i   y
-A         LF   SUB  *     :   J   Z   j   z
-B         VT   ESC  +     ;   K   [   k   {
-C         FF   FS   ,     <   L   \   l   |
-D         CR   GS   -     =   M   ]   m   }
-E         SO   RS   .     >   N   ^   n   ~
-F         SI   US   /     ?   O   _   o   DEL
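
The correspondence shown in Table 1 can be checked with a short script. The following is a minimal sketch, assuming that Python's "ascii" codec stands in for ASCII CCSID 367; the helper name table_cell is hypothetical.

```python
# Decode every 7-bit ASCII byte and confirm that the resulting Unicode
# code point has the same numeric value as the byte.
# Assumption: Python's "ascii" codec stands in for ASCII CCSID 367.
ascii_bytes = bytes(range(128))
decoded = ascii_bytes.decode("ascii")
same_values = all(ord(char) == byte for char, byte in zip(decoded, ascii_bytes))
print("First 128 code points match ASCII:", same_values)

def table_cell(first_digit: int, second_digit: int) -> str:
    """Return the character whose code point has the given two hex digits."""
    return chr(first_digit * 16 + second_digit)

# Column 4-, row -1 of the table is the letter A (U+0041).
print(table_cell(0x4, 0x1))
# Column 7-, row -B of the table is the left brace (U+007B).
print(table_cell(0x7, 0xB))
```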