F5: Validating graphic characters
Many products validate the user's input to ensure that only acceptable characters are entered. Validation can check whether an input character is alphabetic, numeric, alphanumeric, or of some other type. Because each script defines the type of a particular character in its own way, any form of validity checking must be based on that script's definition of validity.
Example: The following Java code checks whether the input character is alphabetic:
// This is an example of a BAD coding technique to validate characters:
// it accepts only the unaccented ASCII letters a to z and A to Z
if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z'))
… // ch is a letter
With this code, a product used in Germany rejects ü, even though it is a valid German letter.
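To make the failure concrete, the following minimal sketch wraps the ASCII range check in a hypothetical helper, isAsciiLetter(), and compares it with the Unicode-aware Character.isLetter() method for a few German and French letters. The class and helper names are illustrative only.

```java
public class AsciiLetterCheck {

    // Hypothetical helper wrapping the BAD range check: it accepts
    // only the 52 unaccented ASCII letters.
    static boolean isAsciiLetter(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file pure ASCII.
        char[] samples = {'a', 'Z', '\u00FC', '\u00DF', '\u00E9'};
        for (char ch : samples) {
            // Character.getName() prints the Unicode character name,
            // which avoids console encoding problems.
            System.out.println(Character.getName(ch)
                    + ": range check=" + isAsciiLetter(ch)
                    + ", Character.isLetter=" + Character.isLetter(ch));
        }
    }
}
```

The last three samples (ü, ß, and é) fail the range check, although Character.isLetter() accepts all five.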
A possible solution is to use the Java Character class, whose methods classify characters according to the Unicode Standard. In the following Java code, the Character.isLetter() method returns true for ü, as it does for every character that the Unicode Standard classifies as a letter, whichever language the character belongs to. This includes national use characters, that is, alphabetic characters from languages other than English.
if (Character.isLetter(ch))
… // ch is a letter
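A single char cannot hold every Unicode character: characters outside the Basic Multilingual Plane are stored as a surrogate pair of two char values, and the char overload of Character.isLetter() returns false for each half. The following sketch therefore validates a whole input string code point by code point, using String.codePoints() and the int overload Character.isLetter(int codePoint). The helper name isAllLetters() and the rule that empty input is rejected are assumptions made for illustration.

```java
public class LetterValidator {

    // Hypothetical helper: true if the input is non-empty and every code point
    // is a Unicode letter in any script. The int overload of Character.isLetter()
    // handles supplementary characters that a single char cannot hold.
    static boolean isAllLetters(String input) {
        return !input.isEmpty()
                && input.codePoints().allMatch(Character::isLetter);
    }

    public static void main(String[] args) {
        System.out.println(isAllLetters("M\u00FCnchen"));                   // true: Latin with u-umlaut
        System.out.println(isAllLetters("\u0391\u03B8\u03AE\u03BD\u03B1")); // true: "Athens" in Greek
        System.out.println(isAllLetters("\uD835\uDC00"));                 // true: U+1D400 as a surrogate pair
        System.out.println(isAllLetters("R2D2"));                         // false: contains digits
        System.out.println(isAllLetters(""));                             // false: empty input
        // The char overload sees only one half of the surrogate pair.
        System.out.println(Character.isLetter('\uD835'));                 // false
    }
}
```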
Other frequently used methods of the Character class are isDigit(), isLetterOrDigit(), isLowerCase(), isUpperCase(), isSpaceChar(), and isDefined(). The Character.getType() method returns the Unicode general category of a given character, such as LOWERCASE_LETTER or CURRENCY_SYMBOL. The following Java code uses Character.getType() to determine the category of several character constants.
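The following sketch, in a hypothetical demo class, applies these methods to characters outside the English alphabet. It also calls two methods not in the list above: Character.digit(), which returns the numeric value of a digit in the given radix, and Character.isWhitespace(), which, unlike isSpaceChar(), returns false for the no-break space.

```java
public class CharacterMethodsDemo {
    public static void main(String[] args) {
        char arabicIndicThree = '\u0663'; // ARABIC-INDIC DIGIT THREE
        char sharpS = '\u00DF';           // LATIN SMALL LETTER SHARP S
        char capitalUUmlaut = '\u00DC';   // LATIN CAPITAL LETTER U WITH DIAERESIS
        char noBreakSpace = '\u00A0';     // NO-BREAK SPACE
        char unassigned = '\uFFFF';       // a noncharacter with no entry in Unicode

        System.out.println(Character.isDigit(arabicIndicThree));         // true
        System.out.println(Character.digit(arabicIndicThree, 10));       // 3
        System.out.println(Character.isLetterOrDigit(arabicIndicThree)); // true
        System.out.println(Character.isLowerCase(sharpS));               // true
        System.out.println(Character.isUpperCase(capitalUUmlaut));       // true
        System.out.println(Character.isSpaceChar(noBreakSpace));         // true
        System.out.println(Character.isWhitespace(noBreakSpace));        // false
        System.out.println(Character.isDefined(unassigned));             // false
    }
}
```

A digit check written as ch >= '0' && ch <= '9' would reject the Arabic-Indic digit, just as the letter range check rejects ü.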
// Each of the following conditions is true
if (Character.getType('r') == Character.LOWERCASE_LETTER)      // Unicode category Ll
if (Character.getType('R') == Character.UPPERCASE_LETTER)      // Lu
if (Character.getType('5') == Character.DECIMAL_DIGIT_NUMBER)  // Nd
if (Character.getType('$') == Character.CURRENCY_SYMBOL)       // Sc
if (Character.getType('>') == Character.MATH_SYMBOL)           // Sm
if (Character.getType('_') == Character.CONNECTOR_PUNCTUATION) // Pc
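Because getType() returns a Unicode general category, a validator can accept or reject characters by category instead of by enumerating character ranges. The sketch below assumes a hypothetical rule for a user name field: letters of any case or script, decimal digits, and connector punctuation such as the underscore are allowed, and everything else is rejected. The class name, method name, and the rule itself are illustrative; Set.of() requires Java 9 or later.

```java
import java.util.Set;

public class CategoryValidator {

    // Hypothetical rule: the Unicode general categories a user name field accepts.
    // getType() returns an int, so the byte constants are widened and boxed as Integers.
    private static final Set<Integer> ALLOWED = Set.of(
            (int) Character.UPPERCASE_LETTER,
            (int) Character.LOWERCASE_LETTER,
            (int) Character.TITLECASE_LETTER,
            (int) Character.MODIFIER_LETTER,
            (int) Character.OTHER_LETTER,
            (int) Character.DECIMAL_DIGIT_NUMBER,
            (int) Character.CONNECTOR_PUNCTUATION);

    // True if the input is non-empty and every code point is in an allowed category.
    static boolean isValidUserName(String input) {
        return !input.isEmpty()
                && input.codePoints().allMatch(cp -> ALLOWED.contains(Character.getType(cp)));
    }

    public static void main(String[] args) {
        System.out.println(isValidUserName("j\u00FCrgen_42")); // true
        System.out.println(isValidUserName("\u5C71\u7530"));   // true: CJK ideographs are OTHER_LETTER
        System.out.println(isValidUserName("cost$"));          // false: '$' is CURRENCY_SYMBOL
        System.out.println(isValidUserName("a>b"));            // false: '>' is MATH_SYMBOL
    }
}
```

Changing the accepted input then means editing the set of categories, not adding more range comparisons for each new script.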