F8: Avoiding unassigned code points
The content of a coded character set must be standardized, because arbitrary changes to its graphic character content would result in chaos. Manufacturers of hardware and software, as well as standards bodies such as ISO/IEC and ANSI, register and regulate coded character sets. Sometimes developers try to stretch the rules by quietly using the holes (unassigned code points) in coded character sets that are only partially full.
This guideline is intended to prevent a design practice that will limit the usability of your product. For graphic characters, consistency is much more important than flexibility. If you need new characters, either ask your firm to create and register a new coded character set, or devise an alternative method. Some coded character sets include a user-defined area where you can add your own private characters, but at the expense of not being able to interchange those characters with other products. Never claim to be using an established coded character set (such as ISO/IEC 8859-1) if you are actually using a customized version of it.
The Private Use Area (PUA) of the Unicode Standard offers some flexibility here. Characters that have limited use can be encoded in the PUA, but both sender and receiver have to agree on how to interpret the codes for successful interchange.
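Because private-use characters only interchange correctly when both sides share the same agreement, it can help to flag them before data leaves your product. The following is a minimal sketch, assuming Python and its standard `unicodedata` module, which reports the general category `Co` for private-use code points. The function names `is_private_use` and `find_private_use` are hypothetical and chosen only for illustration.

```python
import unicodedata


def is_private_use(ch):
    """Return True if the single character ch is a private-use code point.

    Relies on the Unicode general category "Co" (Other, private use).
    """
    return unicodedata.category(ch) == "Co"


def find_private_use(text):
    """Return (index, 'U+XXXX') pairs for every private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if is_private_use(ch)
    ]
```

A caller might run `find_private_use` on outgoing text and warn, or require an explicit interchange agreement, whenever the result is not empty.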
Example: To enhance the visual difference between the uppercase letter 'O' and the digit '0' (zero), your product creates a new character 'Ø' and assigns it to a currently unused code point. Now whenever it needs to display the digit zero, your product uses that code point and displays Ø instead. Later, that unused code point is assigned a new character in the standard, and your product now displays the new character instead of Ø.
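One way to guard against this situation is to check text for code points that have no assigned character. The sketch below assumes Python's standard `unicodedata` module, where the general category `Cn` covers unassigned code points (and also noncharacters). The result depends on the version of the Unicode Character Database bundled with the interpreter, so a code point reported as unassigned today may be assigned in a later version, which is exactly the risk this guideline describes. The function name `unassigned_code_points` is hypothetical.

```python
import unicodedata


def unassigned_code_points(text):
    """Return 'U+XXXX' labels for characters whose general category is "Cn".

    "Cn" means the code point has no assigned character in the Unicode
    database that ships with this Python interpreter.
    """
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if unicodedata.category(ch) == "Cn"
    ]
```

For example, U+03A2 is a long-standing hole in the Greek block, so `unassigned_code_points("0\u03A2")` flags it, while ordinary text such as `"0O"` produces an empty list.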