Globalize your On Demand Business

Identify Encoding

Guideline F9

Identify the character encoding of the information.

Different systems process, transmit, and store character data in different ways. The general format, encoding, and other relevant information about character data must therefore be specified, to eliminate guesswork about what a sequence of bits actually represents. Once this information about the data is identified, systems can determine how one character corresponds ("maps") to another.
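As a small illustration of the guesswork involved, the Python sketch below decodes one byte sequence under two different encoding assumptions. The sample string and the variable names are illustrative choices, not part of the guideline.

```python
# The same bytes produce different text depending on the encoding assumed.
raw = "中文".encode("utf-8")  # two characters stored as six bytes

# Correct assumption: the bytes are interpreted as UTF-8.
as_utf8 = raw.decode("utf-8")

# Wrong assumption: ISO 8859-1 (latin-1) treats every byte as one character,
# so the decode "succeeds" but yields six unrelated characters.
as_latin1 = raw.decode("latin-1")

print(len(raw), len(as_utf8), len(as_latin1))

# Once the source encoding is known, the text can be mapped to another
# encoding (here GB18030) and back without loss.
as_gb18030 = as_utf8.encode("gb18030")
round_trip = as_gb18030.decode("gb18030")
```

Nothing in the bytes themselves says which interpretation is right, which is why the encoding has to be identified alongside the data.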

For example, HTML files use the charset parameter in a META tag (<meta http-equiv="..." ...>) to specify the encoding of data that is sent over the Internet. The charset parameter indicates the actual encoding (code page) of the HTML content, using a name registered with IANA, as shown in the sample code below:

  • <meta http-equiv="Content-Type" content="text/html; charset=GB18030" />
  • <meta http-equiv="Content-Type" content="text/html; charset=ISO_8859-6" />
  • <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" />
  • <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
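A receiving program can read this declaration back out of the markup. The following is a minimal sketch using Python's standard html.parser module; the names MetaCharsetParser, charset_from_content, and declared_charset are hypothetical helpers introduced here, and the sketch only handles the http-equiv form of the tag shown above.

```python
from html.parser import HTMLParser


def charset_from_content(content):
    """Return the charset parameter of a Content-Type value, or None."""
    for part in content.split(";")[1:]:
        name, _, value = part.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip("\"'") or None
    return None


class MetaCharsetParser(HTMLParser):
    """Remembers the first charset declared in a Content-Type META tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "content-type":
            self.charset = charset_from_content(attributes.get("content") or "")


def declared_charset(html_text):
    parser = MetaCharsetParser()
    parser.feed(html_text)
    return parser.charset


page = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=Shift_JIS" /></head></html>')
print(declared_charset(page))
```

The sketch assumes the markup has already been decoded far enough to be parsed as text, which works in practice because the tag itself uses only ASCII characters.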

Note that if the HTML page lets the user enter data through HTML forms, having the HTML in a Unicode encoding (such as UTF-8) means that the form entries are also sent to the server in that Unicode encoding. The server usually does not know the original charset of the HTML from which the form was submitted.
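The effect on the server side can be sketched with Python's urllib.parse module. The field name "name", the sample value, and the assumption that the server decodes with the encoding it used to serve the page are all illustrative, not taken from the guideline.

```python
from urllib.parse import parse_qs, urlencode

# What a browser would submit from a form on a UTF-8 page:
# the field value is percent-encoded from its UTF-8 bytes.
body = urlencode({"name": "中文"}, encoding="utf-8")
print(body)

# The request body carries no charset label, so the server must already
# know which encoding the page used. Decoding as UTF-8 recovers the text.
correct = parse_qs(body, encoding="utf-8")["name"][0]

# Guessing a different encoding (here Shift_JIS) silently produces
# different, unintended characters instead of an error.
wrong = parse_qs(body, encoding="shift_jis", errors="replace")["name"][0]

print(correct == "中文", wrong == "中文")
```

Serving every page in one known Unicode encoding removes the need for the server to guess.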

