Guideline F: Coded character sets


F5: Validating graphic characters




Many products validate the user's input to ensure that only acceptable characters are entered. Validation can check whether an input character is alphabetic, numeric, alphanumeric, or of some other type. Because each script defines the types of its characters in its own way, any validity check must be based on that script's definition of validity.

Guideline F5


Validate characters based on the data type, which can be redefined by the user.
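One way to read this guideline is to express the accepted "data type" as a set of Unicode general categories rather than as a hard-coded character range, so that a user or a configuration file can redefine it. The sketch below assumes that interpretation; the class name CharValidator, the LETTERS set, and the isValid() method are hypothetical and not part of any library. Only Character.getType() and String.codePoints() are standard Java APIs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: the accepted "data type" is a set of Unicode general
// categories (Character.getType() values) that a user or a configuration
// file could redefine, instead of a hard-coded character range.
public class CharValidator {
    // Default definition of "alphabetic": every Unicode letter category.
    public static final Set<Integer> LETTERS = Set.of(
            (int) Character.UPPERCASE_LETTER,
            (int) Character.LOWERCASE_LETTER,
            (int) Character.TITLECASE_LETTER,
            (int) Character.MODIFIER_LETTER,
            (int) Character.OTHER_LETTER);

    private final Set<Integer> allowedCategories;

    public CharValidator(Set<Integer> allowedCategories) {
        this.allowedCategories = Set.copyOf(allowedCategories);
    }

    // True if every code point in the input falls in an allowed category.
    // Note: an empty string is accepted, because allMatch on an empty stream is true.
    public boolean isValid(String input) {
        return input.codePoints()
                .allMatch(cp -> allowedCategories.contains(Character.getType(cp)));
    }

    public static void main(String[] args) {
        CharValidator alphabetic = new CharValidator(LETTERS);
        System.out.println(alphabetic.isValid("M\u00FCller"));  // true
        System.out.println(alphabetic.isValid("M\u00FCller2")); // false

        // The user redefines the data type to accept decimal digits as well.
        Set<Integer> lettersAndDigits = new HashSet<>(LETTERS);
        lettersAndDigits.add((int) Character.DECIMAL_DIGIT_NUMBER);
        CharValidator alphanumeric = new CharValidator(lettersAndDigits);
        System.out.println(alphanumeric.isValid("M\u00FCller2")); // true
    }
}
```

Keeping the definition of "valid" in data rather than in code means a new script or a customer-specific rule needs only a configuration change, not a code change.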

Example: The following Java code validates the input character as being alphabetic only:

// This is an example of a BAD coding technique to validate characters

char ch;
:
if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z'))
… // ch is a letter
:

Because this test accepts only the unaccented letters a to z and A to Z, a product that uses it in Germany rejects the valid German letter ü.
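The failure can be reproduced with a small self-contained program. It is a sketch that wraps the range test from the example (with both ranges written in ascending order) in a hypothetical helper, isAsciiLetter(), and applies it to the plain letter u and to ü (U+00FC).

```java
// Demonstrates the failure described above: a hard-coded ASCII range test
// rejects a valid German letter.
public class AsciiLetterCheck {
    // The range test from the example: only the 52 unaccented Latin letters pass.
    public static boolean isAsciiLetter(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    }

    public static void main(String[] args) {
        char uUmlaut = '\u00FC'; // U+00FC LATIN SMALL LETTER U WITH DIAERESIS
        System.out.println(isAsciiLetter('u'));      // true
        System.out.println(isAsciiLetter(uUmlaut));  // false: valid German letter rejected
    }
}
```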

One solution is to use Java's Character class, whose methods classify characters according to the Unicode Standard. In the following Java code, the Character.isLetter() method returns true for ü because Unicode classifies it as a letter; the result does not depend on whether the input comes from German or from any other language. National use characters, that is, alphabetic characters from languages other than English, are therefore accepted as well.

char ch;
:
if (Character.isLetter(ch))
… // ch is a letter
:
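In practice a whole input field is usually validated, not a single char. The sketch below assumes a "letters only" field and uses the int (code point) overload of Character.isLetter() over String.codePoints(), so that supplementary characters, which a Java String stores as two chars (a surrogate pair), are tested as one character. The class and method names are hypothetical, and treating an empty field as invalid is a design choice of this sketch.

```java
// Sketch: validate a whole input field as "letters only" using the
// Unicode-aware code point overload of Character.isLetter.
public class LetterFieldCheck {
    // Iterating by code point (not by char) also handles supplementary
    // characters, which occupy two chars in a Java String.
    // An empty field is treated as invalid here.
    public static boolean isAllLetters(String input) {
        return !input.isEmpty() && input.codePoints().allMatch(Character::isLetter);
    }

    public static void main(String[] args) {
        System.out.println(isAllLetters("M\u00FCller"));                      // true  (German)
        System.out.println(isAllLetters("\u0391\u03B8\u03AE\u03BD\u03B1"));   // true  (Greek)
        System.out.println(isAllLetters("\uD840\uDC00"));                     // true  (U+20000, CJK ideograph)
        System.out.println(isAllLetters("abc1"));                             // false
    }
}
```

Testing each char separately with isLetter(char) would instead see two surrogate chars for U+20000, neither of which is a letter on its own, and would wrongly reject the input.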

Other Character methods that are frequently used for validation are isDigit(), isLetterOrDigit(), isLowerCase(), isUpperCase(), isSpaceChar(), and isDefined(). The Character.getType() method returns the type (the Unicode general category) of a given character as defined by the Unicode Standard. The following Java code uses Character.getType() to determine the type of several character constants:

if (Character.getType('r') == Character.LOWERCASE_LETTER)
:
if (Character.getType('R') == Character.UPPERCASE_LETTER)
:
if (Character.getType('5') == Character.DECIMAL_DIGIT_NUMBER)
:
if (Character.getType('$') == Character.CURRENCY_SYMBOL)
:
if (Character.getType('>') == Character.MATH_SYMBOL)
:
if (Character.getType('_') == Character.CONNECTOR_PUNCTUATION)
:
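The checks above can be combined into one runnable sketch. The hypothetical helper categoryName() maps a few Character.getType() results to readable names (the selection is illustrative, not exhaustive), and the loop prints them next to isDigit() and isLetterOrDigit(). Two non-ASCII samples are added to show that these methods are not limited to ASCII: the euro sign (U+20AC) is a CURRENCY_SYMBOL, and ARABIC-INDIC DIGIT THREE (U+0663) is a DECIMAL_DIGIT_NUMBER for which isDigit() returns true.

```java
// Sketch: the Character methods named above applied to a few sample characters.
public class CharTypeDemo {
    // Maps selected Character.getType() results to readable names; the list of
    // categories is illustrative, not exhaustive.
    public static String categoryName(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.UPPERCASE_LETTER:      return "UPPERCASE_LETTER";
            case Character.LOWERCASE_LETTER:      return "LOWERCASE_LETTER";
            case Character.DECIMAL_DIGIT_NUMBER:  return "DECIMAL_DIGIT_NUMBER";
            case Character.CURRENCY_SYMBOL:       return "CURRENCY_SYMBOL";
            case Character.MATH_SYMBOL:           return "MATH_SYMBOL";
            case Character.CONNECTOR_PUNCTUATION: return "CONNECTOR_PUNCTUATION";
            default:                              return "OTHER";
        }
    }

    public static void main(String[] args) {
        // The constants from the example, plus the euro sign (U+20AC)
        // and ARABIC-INDIC DIGIT THREE (U+0663) as non-ASCII input.
        char[] samples = {'r', 'R', '5', '$', '>', '_', '\u20AC', '\u0663'};
        for (char ch : samples) {
            System.out.printf("U+%04X  %-22s isDigit=%-5b isLetterOrDigit=%b%n",
                    (int) ch, categoryName(ch), Character.isDigit(ch),
                    Character.isLetterOrDigit(ch));
        }
    }
}
```

Because isDigit() accepts any Unicode decimal digit, a field whose value is later parsed by code that expects only the ASCII digits 0 to 9 may need an additional check.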