What is the Unicode Standard?

The Unicode Standard precisely defines a character set and a small number of encodings for it (UTF-8, UTF-16, and UTF-32). It lets you handle text in any language efficiently, so a single application can serve a global audience.

Before the Unicode Standard, no single encoding system covered all the numbers, characters, and symbols in use. Different encoding systems could assign the same number to different characters, so text read with the wrong encoding system came out as the wrong characters.
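As a small illustration of that conflict, the sketch below decodes one byte value under two legacy encodings. The helper name decode_with and the choice of Latin-1 and Windows-1251 are assumptions made for the example, not something the text above prescribes.

```python
# One byte value, two legacy encodings, two different characters.
RAW = bytes([0xE9])


def decode_with(raw: bytes, encoding: str) -> str:
    """Interpret raw bytes under the named encoding (hypothetical helper)."""
    return raw.decode(encoding)


latin1_char = decode_with(RAW, "latin-1")  # U+00E9, e with acute accent
cp1251_char = decode_with(RAW, "cp1251")   # U+0439, Cyrillic short i

# Reading UTF-8 bytes as if they were Latin-1 yields two wrong characters
# (U+00C3 and U+00A9) instead of the single intended U+00E9.
garbled = "\u00e9".encode("utf-8").decode("latin-1")
```

The byte never changes; only the table used to interpret it does, which is why the encoding had to travel with the data.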

The Unicode Standard provides a unique number (a code point) for every character, regardless of platform, language, or program. Using the Unicode Standard, you can develop a software product that works across platforms, languages, and countries. The Unicode Standard also allows data to be transported through many different systems without corruption. Modern systems provide internationalization solutions based on the Unicode Standard.
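A minimal sketch of the "unique number" idea, using Python's built-in ord and chr. The function name code_point_label is hypothetical, and the U+XXXX notation is the conventional way to write these numbers.

```python
def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    return f"U+{ord(ch):04X}"


# The number is the same on every platform and in every program.
latin_a = code_point_label("A")            # "U+0041"
euro = code_point_label("\u20ac")          # "U+20AC" (euro sign)
emoji = code_point_label("\U0001F600")     # "U+1F600" (grinning face)

# chr() maps the number back to the character.
round_trip = chr(0x20AC)
```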

The original Unicode Standard repertoire covered all major languages commonly used in computing. The Unicode Standard continues to grow and to include more scripts.

The design of the Unicode Standard differs in several ways from traditional character sets and encoding schemes:

Its repertoire enables users to include text efficiently in almost all languages within a single document.
It can be encoded in a byte-based way with one to four bytes per character (UTF-8), or in 16-bit units (UTF-16), where every common character fits in a single unit and only supplementary characters need two, which allows much simpler processing for all common characters.
Many characters, such as letters with accents and umlauts, can be composed from a base character followed by combining accent or umlaut marks. This combining reduces the number of different characters that need to be encoded separately. Pre-composed variants of characters that existed in common character sets at the time were included for compatibility.
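The two points about encodings and combining marks can be sketched as follows. This assumes Python's standard codecs and the unicodedata module; the helper names code_units and same_text are hypothetical.

```python
import unicodedata


def code_units(text: str) -> dict:
    """Count how many code units each encoding needs for the text."""
    return {
        "utf-8": len(text.encode("utf-8")),            # 8-bit units (bytes)
        "utf-16": len(text.encode("utf-16-le")) // 2,  # 16-bit units
        "utf-32": len(text.encode("utf-32-le")) // 4,  # 32-bit units
    }


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to composed form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Common characters take one 16-bit unit; a supplementary character takes two.
euro_units = code_units("\u20ac")        # {'utf-8': 3, 'utf-16': 1, 'utf-32': 1}
emoji_units = code_units("\U0001F600")   # {'utf-8': 4, 'utf-16': 2, 'utf-32': 1}

# "e" + COMBINING ACUTE ACCENT versus the pre-composed U+00E9.
combined = "e\u0301"
precomposed = "\u00e9"
```

The combined and pre-composed spellings differ as raw strings (two code points versus one), so comparisons usually normalize first, as same_text does.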

Characters and their usage are well-defined and described. Traditional character sets typically provide only the name or a picture of a character together with its number and byte encoding; the Unicode Standard adds a comprehensive database of character properties. It also defines a number of processes and algorithms for many aspects of text processing, so that text handling is more interoperable across implementations.
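Python's unicodedata module exposes part of that property database, so a quick lookup might look like the sketch below. The describe helper and its dictionary keys are hypothetical; only the unicodedata calls come from the standard library.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few Unicode Character Database properties for one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),          # e.g. Ll = lowercase letter
        "bidi_class": unicodedata.bidirectional(ch),   # e.g. L, R, NSM
        "combining_class": unicodedata.combining(ch),  # 0 for non-combining
        "decomposition": unicodedata.decomposition(ch),
    }


e_acute = describe("\u00e9")   # pre-composed letter
accent = describe("\u0301")    # combining mark
alef = describe("\u05d0")      # right-to-left letter
```

Here e_acute["decomposition"] is "0065 0301", which records exactly the base-plus-accent sequence the pre-composed letter stands for.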

The early inclusion of all characters of commonly used character sets makes the Unicode Standard a useful mechanism for converting between traditional character sets. It also makes it feasible to process non-Unicode text by first converting the text into Unicode, processing the text, and then converting it back to the original encoding without loss of data.
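The convert, process, convert-back pattern can be sketched as below, assuming the legacy data is Windows-1251 text and the processing step is a simple uppercase operation. The function names process_legacy and transcode are hypothetical.

```python
from typing import Callable


def process_legacy(raw: bytes, encoding: str,
                   operation: Callable[[str], str]) -> bytes:
    """Decode legacy bytes to Unicode, process them, and encode them back."""
    text = raw.decode(encoding)      # legacy encoding -> Unicode
    result = operation(text)         # all processing happens on Unicode text
    return result.encode(encoding)   # Unicode -> original legacy encoding


def transcode(raw: bytes, source: str, target: str) -> bytes:
    """Convert between two legacy encodings, using Unicode as the pivot."""
    return raw.decode(source).encode(target)


# A Russian greeting stored as Windows-1251 bytes.
legacy = "\u043f\u0440\u0438\u0432\u0435\u0442".encode("cp1251")

shouted = process_legacy(legacy, "cp1251", str.upper)
as_koi8 = transcode(legacy, "cp1251", "koi8_r")
```

The final encode step raises UnicodeEncodeError if processing introduces a character the legacy encoding cannot represent, so the lossless round trip holds only while the result stays within that encoding's repertoire.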