LC_CTYPE category

This category defines character classification, case conversion, and other character attributes. In this category, you can represent a series of characters by using three adjacent periods as an ellipsis symbol (...). An ellipsis is interpreted as including all characters with an encoded value higher than the encoded value of the character preceding the ellipsis and lower than the encoded value of the character following the ellipsis.

An ellipsis is valid within a single encoded character set. For example, \x30;...;\x39; includes in the character class all characters with encoded values from X'30' to X'39'.
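As a rough illustration of how an ellipsis entry such as \x30;...;\x39; expands, the following C sketch marks every encoded value from the first endpoint to the last one as a member of a class table. Because both endpoints are listed explicitly on either side of the ellipsis, the entry as a whole covers X'30' through X'39' inclusive. The function name add_ellipsis_range is hypothetical, the 256-entry table assumes a single-byte encoded character set, and rejecting a reversed range is a choice made by this sketch, not documented localedef behavior.

```c
#include <stdbool.h>

/* Hypothetical helper: mark every encoded value from `first` through
 * `last`, inclusive, as a member of a single-byte character class table.
 * An entry like \x30;...;\x39; lists both endpoints explicitly, and the
 * ellipsis supplies every value strictly between them. */
static bool add_ellipsis_range(bool members[256], unsigned first, unsigned last)
{
    /* This sketch rejects values outside one byte and reversed ranges. */
    if (first > 0xFFu || last > 0xFFu || first > last)
        return false;
    for (unsigned v = first; v <= last; v++)
        members[v] = true;
    return true;
}
```

Calling add_ellipsis_range(table, 0x30, 0x39) on a zero-initialized table sets exactly the ten entries X'30' through X'39'.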

The keywords recognized in the LC_CTYPE category are listed below. In the descriptions, the term "automatically included" means that it is not an error either to include or omit any of the referenced characters; they are assumed by default even if the entire keyword is missing and accepted if present. If a keyword is specified without any arguments, the default characters are assumed.

When a character is automatically included, its encoded value depends on the charmap file in effect. If no charmap file is specified, the encoding of the IBM-1047 encoded character set is assumed.
copy
Specifies the name of an existing locale to be used as the source for the definition of this category. If this keyword is specified, no other keywords can be specified in this category. If the locale is not found, an error is reported and no locale output is created. The copy keyword cannot specify a locale that itself specifies the copy keyword for the same category.
charclass
Defines one or more locale-specific character class names as strings separated by semicolons. Each named character class can then be defined subsequently in the LC_CTYPE definition. A character class name consists of at least one and at most {CHARCLASS_NAME_MAX} bytes of alphanumeric characters from the portable filename character set. The first character of a character class name cannot be a digit. The name cannot match any of the LC_CTYPE keywords defined in this information.
upper
Defines characters to be classified as uppercase letters. No character defined for the keywords cntrl, digit, punct, or space can be specified. The uppercase letters A through Z are automatically included in this class. The isupper() and iswupper() functions test for any character and wide character, respectively, included in this class.
lower
Defines characters to be classified as lowercase letters. No character defined for the keywords cntrl, digit, punct, or space can be specified. The lowercase letters a through z are automatically included in this class. The islower() and iswlower() functions test for any character and wide character, respectively, included in this class.
alpha
Defines characters to be classified as letters. No character defined for the keywords cntrl, digit, punct, or space can be specified. Characters classified as either upper or lower are automatically included in this class. The isalpha() and iswalpha() functions test for any character or wide character, respectively, included in this class.
digit
Defines characters to be classified as numeric digits. Only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 can be specified. If they are, they must be in contiguous ascending sequence by numerical value. The digits 0 through 9 are automatically included in this class. The isdigit() and iswdigit() functions test for any character or wide character, respectively, included in this class.
space
Defines characters to be classified as whitespace characters. No character defined for the keywords upper, lower, alpha, digit, or xdigit can be specified for space. The characters <space>, <form-feed>, <newline>, <carriage-return>, <horizontal-tab>, and <vertical-tab>, as well as any characters defined in the class blank, are automatically included in this class. The functions isspace() and iswspace() test for any character or wide character, respectively, included in this class.
cntrl
Defines characters to be classified as control characters. No character defined for the keywords upper, lower, alpha, digit, punct, graph, print, or xdigit can be specified for cntrl. The functions iscntrl() and iswcntrl() test for any character or wide character, respectively, included in this class.
punct
Defines characters to be classified as punctuation characters. No character defined for the keywords upper, lower, alpha, digit, cntrl, or xdigit, or as the <space> character, can be specified. The functions ispunct() and iswpunct() test for any character or wide character, respectively, included in this class.
graph
Defines characters to be classified as printing characters, not including the <space> character. Characters specified for the keywords upper, lower, alpha, digit, xdigit, and punct are automatically included. No character specified in the keyword cntrl can be specified for graph. The functions isgraph() and iswgraph() test for any character or wide character, respectively, included in this class.
print
Defines characters to be classified as printing characters, including the <space> character. Characters specified for the keywords upper, lower, alpha, digit, xdigit, punct, and the <space> character are automatically included. No character specified in the keyword cntrl can be specified for print. The functions isprint() and iswprint() test for any character or wide character, respectively, included in this class.
xdigit
Defines characters to be classified as hexadecimal digits. Only the characters defined for the class digit can be specified, in contiguous ascending sequence by numerical value, followed by one or more sets of six characters representing the hexadecimal digits 10 through 15, with each set in ascending order (for example, A, B, C, D, E, F, a, b, c, d, e, f). The digits 0 through 9, the uppercase letters A through F, and the lowercase letters a through f are automatically included in this class. The functions isxdigit() and iswxdigit() test for any character or wide character, respectively, included in this class.
blank
Defines characters to be classified as blank characters. The characters <space> and <tab> are automatically included in this class. The functions isblank() and iswblank() test for any character or wide character, respectively, included in this class.
toupper
Defines the mapping of lowercase letters to uppercase letters. The operand consists of character pairs, separated by semicolons. The characters in each character pair are separated by a comma, and the pair is enclosed in parentheses. The first character in each pair is the lowercase letter, and the second is the corresponding uppercase letter. Only characters specified for the keywords lower and upper can be specified for toupper. The lowercase letters a through z and their corresponding uppercase letters A through Z are automatically included in this mapping, but only when the toupper keyword is omitted from the locale definition. The toupper keyword affects the behavior of the toupper() and towupper() functions for mapping characters and wide characters, respectively.
tolower
Defines the mapping of uppercase letters to lowercase letters. The operand consists of character pairs, separated by semicolons. The characters in each character pair are separated by a comma, and the pair is enclosed in parentheses. The first character in each pair is the uppercase letter, and the second is its corresponding lowercase letter. Only characters specified for the keywords lower and upper can be specified. If the tolower keyword is omitted from the locale definition, the mapping is the reverse of the one specified for toupper. The tolower keyword affects the behavior of the tolower() and towlower() functions for mapping characters and wide characters, respectively.
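The naming rules for the charclass keyword can be checked mechanically. The following C sketch is a hypothetical validator, not part of any IBM or POSIX API. It lists the alphanumeric members of the portable filename character set explicitly instead of relying on character ranges, because in IBM-1047 the letters are not contiguous. If <limits.h> does not define CHARCLASS_NAME_MAX, the sketch falls back to 14, the POSIX minimum value (_POSIX2_CHARCLASS_NAME_MAX); the keyword table is taken from the keyword list in this section.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Assumption: fall back to the POSIX minimum if the limit is not defined. */
#ifndef CHARCLASS_NAME_MAX
#define CHARCLASS_NAME_MAX 14
#endif

/* Alphanumeric members of the portable filename character set, listed
 * explicitly so the check does not depend on the encoding. */
static const char portable_alnum[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789";

/* LC_CTYPE keywords that a class name must not match. */
static const char *const ctype_keywords[] = {
    "copy", "charclass", "upper", "lower", "alpha", "digit", "space",
    "cntrl", "punct", "graph", "print", "xdigit", "blank",
    "toupper", "tolower"
};

/* Hypothetical helper: true if `name` is an acceptable class name. */
static bool valid_charclass_name(const char *name)
{
    size_t len = strlen(name);

    if (len == 0 || len > (size_t)CHARCLASS_NAME_MAX)
        return false;
    if (strchr("0123456789", name[0]) != NULL)
        return false;                 /* first character cannot be a digit */
    for (size_t i = 0; i < len; i++)
        if (strchr(portable_alnum, name[i]) == NULL)
            return false;             /* only portable alphanumerics */
    for (size_t k = 0; k < sizeof ctype_keywords / sizeof ctype_keywords[0]; k++)
        if (strcmp(name, ctype_keywords[k]) == 0)
            return false;             /* cannot reuse an LC_CTYPE keyword */
    return true;
}
```

Under these rules a name such as myclass is accepted, while 9class, my_class, and upper are rejected.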
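The classification functions named in the keyword descriptions make the class relationships observable at run time. The following C sketch, with the hypothetical name ctype_classes_consistent, walks every single-byte value in the current locale and checks several of the rules described above: upper and lower are part of alpha; letters are never cntrl, digit, punct, or space; graph is print without <space>; punct is part of graph; cntrl never overlaps print; blank is part of space; and the six default space characters are classified as space. It exercises only properties that the C standard also guarantees for these functions, so it is expected to return true in the default "C" locale. It is a diagnostic sketch, not a substitute for localedef validation.

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the ctype functions in the current
 * locale respect the LC_CTYPE class relationships checked below. */
static bool ctype_classes_consistent(void)
{
    /* The six characters always included in the space class. */
    static const char default_spaces[] = { ' ', '\f', '\n', '\r', '\t', '\v' };

    for (size_t i = 0; i < sizeof default_spaces; i++)
        if (!isspace((unsigned char)default_spaces[i]))
            return false;

    for (int c = 0; c <= UCHAR_MAX; c++) {
        bool upper = isupper(c) != 0, lower = islower(c) != 0;
        bool alpha = isalpha(c) != 0;

        /* upper and lower are automatically part of alpha */
        if ((upper || lower) && !alpha)
            return false;
        /* letters may not be cntrl, digit, punct, or space */
        if (alpha && (iscntrl(c) || isdigit(c) || ispunct(c) || isspace(c)))
            return false;
        /* graph is print without <space> */
        if ((isgraph(c) != 0) != (isprint(c) != 0 && c != ' '))
            return false;
        /* punct is automatically part of graph */
        if (ispunct(c) && !isgraph(c))
            return false;
        /* cntrl characters may not appear in print (and thus graph) */
        if (iscntrl(c) && isprint(c))
            return false;
        /* blank characters are automatically part of space */
        if (isblank(c) && !isspace(c))
            return false;
    }
    return true;
}
```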
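The ordering rules for the digit and xdigit keywords can be sketched in C as follows. The hypothetical function digit_operand_valid accepts only 0 through 9, each exactly one greater in value than the one before; it does not require all ten digits, on the assumption that omitted digits are simply included automatically. The hypothetical function xdigit_value maps a hexadecimal digit to its value by looking it up in two strings, each holding the ten digits followed by one set of six characters for 10 through 15 in ascending order. The digit check relies on the C guarantee that the encodings of 0 through 9 are contiguous and ascending, so c - '0' is the digit's value.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical check for a digit operand: only '0'-'9', each one exactly
 * one greater in value than the previous one. */
static bool digit_operand_valid(const char *digits)
{
    int prev = -1;
    for (const char *p = digits; *p != '\0'; p++) {
        if (*p < '0' || *p > '9')
            return false;            /* only decimal digits are allowed */
        if (prev >= 0 && *p - '0' != prev + 1)
            return false;            /* not contiguous and ascending */
        prev = *p - '0';
    }
    return true;
}

/* Value 0-15 of a hexadecimal digit, or -1. Each table lists the digits
 * followed by one set of six characters for 10 through 15. */
static int xdigit_value(char c)
{
    static const char *const sets[] = { "0123456789ABCDEF", "0123456789abcdef" };

    if (c == '\0')
        return -1;                   /* strchr would match the terminator */
    for (size_t i = 0; i < 2; i++) {
        const char *hit = strchr(sets[i], c);
        if (hit != NULL)
            return (int)(hit - sets[i]);
    }
    return -1;
}
```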
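A minimal model of the toupper and tolower keywords is a table of (lowercase, uppercase) pairs. In the following C sketch, which uses hypothetical names, map_toupper matches the first element of each pair, and map_tolower matches the second, which corresponds to the default reverse mapping used when tolower is omitted. Characters without a pair map to themselves, as with the standard toupper() and tolower() functions. As with the toupper entry in Figure 1, once toupper is specified, a through z are not mapped unless they are listed explicitly.

```c
#include <stddef.h>

/* Hypothetical single-byte mapping entry: (lowercase, uppercase), in the
 * same order as a toupper operand pair. */
struct case_pair {
    unsigned char lower;
    unsigned char upper;
};

/* Map a character to uppercase using only the listed pairs. */
static int map_toupper(const struct case_pair *pairs, size_t n, int c)
{
    for (size_t i = 0; i < n; i++)
        if (pairs[i].lower == c)
            return pairs[i].upper;
    return c;   /* characters without a pair map to themselves */
}

/* With no tolower keyword, tolower is the reverse of the toupper pairs. */
static int map_tolower(const struct case_pair *pairs, size_t n, int c)
{
    for (size_t i = 0; i < n; i++)
        if (pairs[i].upper == c)
            return pairs[i].lower;
    return c;
}
```

With a three-pair table such as { {0xE1, 0xC1}, {0xE0, 0xC0}, {0xE6, 0xC6} }, map_toupper(t, 3, 0xE1) returns 0xC1, while map_toupper(t, 3, 'a') returns 'a' unchanged. The code values are arbitrary illustrations, not taken from any particular charmap.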

You can define additional character classes by using your own keywords. A maximum of 31 classes is supported in total: the 12 standard classes and up to 19 user-defined classes. The defined classes affect the behavior of the wctype() and iswctype() functions.
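User-defined classes are reached through wctype() and iswctype() rather than through a dedicated classification function. The following C sketch, using the hypothetical helper name in_named_class, looks up a class by name in the current LC_CTYPE locale. wctype() returns 0 when the locale does not define a class with that name, so the helper returns false in that case.

```c
#include <stdbool.h>
#include <wctype.h>

/* Hypothetical helper: true if wc belongs to the class called `name` in
 * the current LC_CTYPE locale. wctype() returns 0 for an unknown name. */
static bool in_named_class(const char *name, wint_t wc)
{
    wctype_t desc = wctype(name);

    if (desc == 0)
        return false;   /* class not defined by the current locale */
    return iswctype(wc, desc) != 0;
}
```

In the default "C" locale, in_named_class("myclass", L'A') is false because no class named myclass exists there. After setlocale(LC_CTYPE, name) selects a locale generated from a definition like Figure 1 (the locale name is a hypothetical placeholder), the helper would be expected to report <e-ogonek> and <E-ogonek> as members of myclass.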

Figure 1 is an example of the definition of the LC_CTYPE category.

Figure 1. Example LC_CTYPE definition
escape_char           /
comment_char          %

%%%%%%%%%%%%%
LC_CTYPE
%%%%%%%%%%%%%
% upper letters are A-Z by default plus the three defined below
upper   <A-acute>;<A-grave>;<C-acute>

% lower case letters are a-z by default plus the three defined below
lower   <a-acute>;<a-grave>;<c-acute>

% space characters are the 6 default characters plus the one defined below
space   <hyphen-minus>
cntrl   <alert>;<backspace>;<tab>;<newline>;<vertical-tab>;/
        <form-feed>;<carriage-return>;<NUL>;/
        <SO>;<SI>

% default graph, print, punct, digit, xdigit, blank classes

% toupper mapping defined only for the following three pairs
toupper (<a-acute>,<A-acute>);/
        (<a-grave>,<A-grave>);/
        (<c-acute>,<C-acute>);

% default upper to lower case mapping

% user defined class
myclass  <e-ogonek>;<E-ogonek>

END LC_CTYPE