Guideline F: Coded character sets

F3: Supporting graphic character sets




Products may be sensitive to the content of a character set; that is, a product may operate correctly only when the characters it handles are limited to a subset of the active coded character set. For products that must work with a restricted set of characters, the set must be redefinable so that it covers the most common characters in a particular region.


Guideline F3


Support more than one character encoding and allow the user to select the character encoding; support Unicode at a minimum.
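As a minimal sketch of this guideline, assuming a Python product: the user may name an encoding, and the product falls back to UTF-8 when none is given. The names `DEFAULT_ENCODING`, `resolve_encoding`, and `decode_text` are hypothetical and chosen only for illustration.

```python
import codecs

# Hypothetical default; the guideline only requires that Unicode be supported.
DEFAULT_ENCODING = "utf-8"


def resolve_encoding(user_choice=None):
    """Return the canonical codec name for the user's choice, or the default."""
    name = user_choice or DEFAULT_ENCODING
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Report an unknown encoding instead of silently guessing one.
        raise ValueError(f"unsupported character encoding: {name!r}") from None


def decode_text(raw, user_choice=None):
    """Decode bytes into a Unicode string using the selected encoding."""
    return raw.decode(resolve_encoding(user_choice))
```

For instance, `decode_text(b"REN\xc9", "latin-1")` and `decode_text("RENÉ".encode("utf-8"))` both yield the same Unicode string, so the rest of the product can work on Unicode text regardless of the encoding the user selected.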

A product can have legitimate reasons for limiting the content of a character set.

Example: File names on a file system can contain only a subset of the characters supported by the platform. Certain characters, such as the wildcard and path separator characters, are not allowed because they are needed for other purposes. The set of characters supported by the platform differs from country to country, so the set of permitted file name characters must differ as well. Products that operate in France, for example, need to support file names such as RENÉ.
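The file name example can be sketched as a restricted character set that is redefinable per region. Everything named here is an assumption made for illustration: the reserved characters, the base set, the region codes, and the French letter list do not come from any particular platform.

```python
import unicodedata

# Characters a hypothetical platform reserves for wildcards and path separators.
RESERVED = frozenset("*?/\\")

# Hypothetical base set shared by every region.
BASE = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789._-"
)

# Hypothetical per-region additions; redefine these to suit each region.
REGION_EXTRAS = {
    "US": frozenset(),
    "FR": frozenset("ÀÂÇÉÈÊËÎÏÔÙÛÜàâçéèêëîïôùûü"),
}


def allowed_characters(region):
    """Return the permitted file name characters for a region."""
    extras = REGION_EXTRAS.get(region, frozenset())
    return (BASE | extras) - RESERVED


def is_valid_file_name(name, region):
    """Check that a non-empty name uses only characters allowed in the region."""
    # Normalize so that a letter plus a combining accent matches the precomposed form.
    name = unicodedata.normalize("NFC", name)
    allowed = allowed_characters(region)
    return bool(name) and all(ch in allowed for ch in name)
```

Under these assumed tables, `is_valid_file_name("RENÉ", "FR")` is accepted while the same name is rejected for `"US"`, and a name containing a wildcard such as `"a*b"` is rejected in every region.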


Guideline F3-1


Configure data repositories for Unicode data; convert all data to be stored in the repository into Unicode.

When a product or software application writes Unicode data, it must be able to write every character in the complete Unicode range (U+0000 to U+10FFFF), including the supplementary characters that UTF-16 represents with surrogate pairs, through software interfaces such as network interfaces, database connections, and APIs. The Unicode data produced for files or data streams should use the required Unicode Transformation Format (UTF). For example, UTF-8 files should be produced when UTF-8 is required as the XML encoding.
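A rough way to check this requirement, assuming Python 3.8 or later, is to round-trip sample characters from across the range through UTF-8 and to produce an XML document whose declared encoding is UTF-8. The helper names and the sample list are hypothetical.

```python
import xml.etree.ElementTree as ET

# Samples from the low end, the Basic Multilingual Plane, and the supplementary planes.
SAMPLES = ["\u0000", "REN\u00c9", "\uFFFD", "\U0001F600", "\U0010FFFF"]


def to_utf8(text):
    """Encode a Unicode string as UTF-8 (strict mode rejects lone surrogates)."""
    return text.encode("utf-8")


def round_trips(text):
    """Return True if the text survives encoding to UTF-8 and decoding back."""
    return to_utf8(text).decode("utf-8") == text


def build_xml(text):
    """Serialize a one-element XML document as UTF-8 bytes with a declaration."""
    root = ET.Element("name")
    root.text = text
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), round_trips(sample))
    print(build_xml("REN\u00c9 \U0001F600"))
```

A lone surrogate code point such as U+D800 is not a character and cannot be encoded as well-formed UTF-8, so `to_utf8("\ud800")` raises `UnicodeEncodeError`; a supplementary character such as U+1F600 is written as a four-byte UTF-8 sequence.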