Guideline F: Coded character sets


F9: Identify encoding


Identify the character encoding of the information.

Character data is processed, transmitted and stored in different ways by different systems. To remove any guesswork about what a given sequence of bits actually represents, the general format, the character encoding and other relevant information about the data must be specified. Once this information is identified, systems can determine how a character in one encoding corresponds ("maps") to the same character in another.
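
To make the guesswork concrete, the short Python sketch below decodes the same two bytes under two different encoding assumptions, then shows that once the source encoding is known, text can be mapped from one encoding to another. The function names decode_as and transcode are illustrative only; the sketch relies solely on Python's built-in bytes.decode and str.encode.

```python
def decode_as(raw: bytes, encoding: str) -> str:
    """Interpret raw bytes as text in the given encoding."""
    return raw.decode(encoding)


def transcode(raw: bytes, source: str, target: str) -> bytes:
    """Map text from a known source encoding to a target encoding."""
    # Decoding needs the correct source encoding; re-encoding then maps
    # each character to its byte sequence in the target encoding.
    return raw.decode(source).encode(target)


data = bytes([0xC3, 0xA9])
as_utf8 = decode_as(data, "utf-8")      # one character: U+00E9
as_latin1 = decode_as(data, "latin-1")  # two characters: U+00C3 U+00A9

# U+00E9 is the single byte 0xE9 in windows-1252 but 0xC3 0xA9 in UTF-8.
converted = transcode(b"\xe9", "cp1252", "utf-8")
```

Without knowing which encoding produced the bytes, neither reading is more "correct" than the other; only the declared encoding settles it.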

For example, HTML files use the charset parameter of the <meta http-equiv="..." ...> tag to specify the encoding of data sent over the network. The charset parameter names the actual code page of the HTML content, using the name registered with IANA.
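
The following Python sketch illustrates how a receiving program might read such a charset declaration and use it to decode the page. It is a simplification under stated assumptions: the helper names declared_charset and decode_html are hypothetical, the regular expression handles only simple tag forms, only the first 1024 bytes are scanned (similar to the limit used by the HTML standard's prescan), and the declared IANA name is assumed to also be a codec name Python recognizes, as ISO-8859-1 and UTF-8 are. Real browsers give a byte order mark and the HTTP Content-Type header precedence over the META tag.

```python
import re

# Matches charset=NAME inside a simple <meta ...> tag, for example
# <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
# (simplified pattern; an assumption, not a full HTML parser).
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE
)


def declared_charset(html: bytes, default: str = "utf-8") -> str:
    """Return the charset named in the first <meta> declaration, or a default."""
    # The tag itself is ASCII, so it can be found before the page is decoded.
    match = META_CHARSET.search(html[:1024])
    return match.group(1).decode("ascii").lower() if match else default


def decode_html(html: bytes) -> str:
    """Decode the page using the encoding it declares about itself."""
    return html.decode(declared_charset(html))
```

A page declaring charset=ISO-8859-1 with the byte 0xE9 in its body decodes to text containing "é"; a page with no declaration falls back to the assumed default of UTF-8.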

Note that if the HTML page lets the user enter data through HTML forms, serving the page in a Unicode encoding (such as UTF-8) means that the form entries are also sent to the server in that Unicode encoding. This matters because the server usually does not know the original charset of the HTML page.
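
A minimal Python sketch of this point, assuming a simple application/x-www-form-urlencoded submission: browser_submit (a hypothetical helper) percent-encodes form fields the way a browser would for a page served in a given charset, and server_parse (also hypothetical) decodes the body assuming one known charset. The sketch ignores multipart forms and the accept-charset attribute.

```python
from urllib.parse import parse_qs, urlencode


def browser_submit(fields: dict[str, str], page_charset: str) -> str:
    """Simulate a browser percent-encoding form fields in the page's charset."""
    return urlencode(fields, encoding=page_charset)


def server_parse(body: str, assumed_charset: str = "utf-8") -> dict[str, list[str]]:
    """Decode a form body the way a server might, assuming one known charset."""
    # Undecodable bytes become U+FFFD (parse_qs defaults to errors="replace").
    return parse_qs(body, encoding=assumed_charset)
```

If the page is UTF-8 and the server assumes UTF-8, a value such as "José" arrives intact. The same field submitted from an ISO-8859-1 page arrives as Jos%E9, which a UTF-8 decoder turns into "Jos" followed by the U+FFFD replacement character.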