This category defines character classification, case conversion,
and other character attributes. In this category, you can represent
a series of characters by using three adjacent periods as an ellipsis
symbol (...). An ellipsis is interpreted as including
all characters with an encoded value higher than the encoded value
of the character preceding the ellipsis and lower than the encoded
value of the character following the ellipsis.
An ellipsis is valid within a single encoded character set. For
example, \x30;...;\x39; includes in the character class
all characters with encoded values from X'30' to X'39'. The two
endpoints are included because they are listed explicitly.
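The ellipsis rule can be sketched in C as follows, assuming encoded values are handled as plain unsigned integers. The function name expand_ellipsis and its signature are hypothetical and are not part of localedef or the C library. The ellipsis itself covers only the values strictly between the two endpoints; because the endpoints are written out explicitly in the list, the result covers the whole inclusive range.

```c
#include <stddef.h>

/* Hypothetical helper: expands "first;...;last" into the encoded values
   it covers. The ellipsis contributes every value strictly between the
   endpoints; the endpoints are listed explicitly, so together they cover
   first..last inclusive. Writes at most out_cap values to out and
   returns the number written (0 for a descending range). */
size_t expand_ellipsis(unsigned long first, unsigned long last,
                       unsigned long *out, size_t out_cap)
{
    size_t n = 0;

    if (first > last)
        return 0;                 /* not a valid ascending range */
    for (unsigned long v = first; n < out_cap; v++) {
        out[n++] = v;             /* endpoints plus everything between */
        if (v == last)
            break;                /* stop exactly at the upper endpoint */
    }
    return n;
}
```

For \x30;...;\x39;, expand_ellipsis(0x30, 0x39, buf, 16) returns 10 values, X'30' through X'39'.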
The keywords recognized in the LC_CTYPE category are
listed below. In the descriptions, the term "automatically included"
means that it is not an error either to include or omit any of the
referenced characters; they are assumed by default even if the entire
keyword is missing and accepted if present. If a keyword is specified
without any arguments, the default characters are assumed.
When a character is automatically included, it has an encoded value
dependent on the charmap file in effect. If no charmap file
is specified, the encoding of the encoded character set IBM-1047 is
assumed.
- copy
- Specifies the name of an existing locale to be used as the source
for the definition of this category. If this keyword is specified,
no other keywords can be present in this category. If the locale is not
found, an error is reported and no locale output is created. The copy keyword
cannot specify a locale that also specifies the copy keyword
for the same category.
- charclass
- Defines one or more locale-specific character class names as
strings separated by semicolons. Each named character class can then
be defined subsequently in the LC_CTYPE definition. A
character class name consists of at least one and at most {CHARCLASS_NAME_MAX}
bytes of alphanumeric characters from the portable filename character
set. The first character of a character class name cannot be a digit.
The name cannot match any of the LC_CTYPE keywords described
in this section.
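The naming rules for charclass can be checked with a short C sketch. The function valid_charclass_name is hypothetical. CHARCLASS_NAME_MAX is normally supplied by <limits.h>; the fallback of 14 used here is an assumption (it is the POSIX minimum, {_POSIX2_CHARCLASS_NAME_MAX}). Letters and digits are matched against literal strings rather than by range arithmetic, so the check does not depend on the encoded values of letters.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#ifndef CHARCLASS_NAME_MAX
#define CHARCLASS_NAME_MAX 14   /* assumed fallback: POSIX minimum */
#endif

/* The LC_CTYPE keywords described in this section. */
static const char *const ctype_keywords[] = {
    "copy", "charclass", "upper", "lower", "alpha", "digit", "space",
    "cntrl", "punct", "graph", "print", "xdigit", "blank",
    "toupper", "tolower"
};

/* Hypothetical helper: true if name follows the charclass naming rules. */
bool valid_charclass_name(const char *name)
{
    static const char alnum[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    size_t len = strlen(name);

    if (len < 1 || len > (size_t)CHARCLASS_NAME_MAX)
        return false;                       /* 1..CHARCLASS_NAME_MAX bytes */
    if (strchr("0123456789", name[0]) != NULL)
        return false;                       /* first character not a digit */
    for (size_t i = 0; i < len; i++)
        if (strchr(alnum, name[i]) == NULL)
            return false;                   /* alphanumeric only */
    for (size_t i = 0; i < sizeof ctype_keywords / sizeof ctype_keywords[0]; i++)
        if (strcmp(name, ctype_keywords[i]) == 0)
            return false;                   /* must not match a keyword */
    return true;
}
```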
- upper
- Defines characters to be classified as uppercase letters. No
character defined for the keywords cntrl, digit, punct,
or space can be specified. The uppercase letters A through Z are
automatically included in this class. The isupper() and iswupper() functions
test for any character and wide character, respectively, included
in this class.
- lower
- Defines characters to be classified as lowercase letters. No
character defined for the keywords cntrl, digit, punct,
or space can be specified. The lowercase letters a through z are
automatically included in this class. The islower() and iswlower() functions
test for any character and wide character, respectively, included
in this class.
- alpha
- Defines characters to be classified as letters. No character
defined for the keywords cntrl, digit, punct,
or space can be specified. Characters classified as either upper or lower are
automatically included in this class. The isalpha() and iswalpha() functions
test for any character or wide character, respectively, included in
this class.
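The rule that upper and lower characters are automatically included in alpha can be observed from a C program. This sketch walks every single-byte value in the current LC_CTYPE locale (the "C" locale unless setlocale() has been called) and checks the relationship with isupper(), islower(), and isalpha(). The function name is illustrative only.

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>

/* Returns true if every character classified as upper or lower in the
   current LC_CTYPE locale is also classified as alpha. */
bool upper_lower_imply_alpha(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++)
        if ((isupper(c) || islower(c)) && !isalpha(c))
            return false;
    return true;
}
```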
- digit
- Defines characters to be classified as numeric digits. Only
the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 can be specified.
If they are, they must be in contiguous ascending sequence by numerical
value. The digits 0 through 9 are automatically
included in this class. The isdigit() and iswdigit() functions test
for any character or wide character, respectively, included in this
class.
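Because the digits must be in contiguous ascending sequence, a program can compute a digit's numeric value by subtracting '0', whatever the code page. In IBM-1047, for example, the digits occupy X'F0' through X'F9'. A minimal sketch (the function name is hypothetical):

```c
#include <ctype.h>

/* Returns the numeric value of a decimal digit character, or -1.
   Relies only on the digits being contiguous and ascending. */
int digit_value(int c)
{
    return isdigit(c) ? c - '0' : -1;
}
```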
- space
- Defines characters to be classified as whitespace characters.
No character defined for the keywords upper, lower, alpha, digit,
or xdigit can be specified for space. The characters <space>, <form-feed>, <newline>, <carriage-return>, <horizontal-tab>,
and <vertical-tab>, together with any characters defined in the
class blank, are automatically included in this class. The
functions isspace() and iswspace() test for any character or wide
character, respectively, included in this class.
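In the default "C" locale, isspace() accepts exactly the six characters listed above. The following sketch counts them by scanning all single-byte values; the function name is illustrative, and a locale that adds characters to space (as the example figure does) would produce a larger count.

```c
#include <ctype.h>
#include <limits.h>

/* Counts the single-byte characters classified as space in the
   current LC_CTYPE locale. */
int count_space_chars(void)
{
    int n = 0;

    for (int c = 0; c <= UCHAR_MAX; c++)
        if (isspace(c))
            n++;
    return n;
}
```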
- cntrl
- Defines characters to be classified as control characters. No
character defined for the keywords upper, lower, alpha, digit, punct, graph, print,
or xdigit can be specified for cntrl. The functions iscntrl() and iswcntrl() test
for any character or wide character, respectively, included in this
class.
- punct
- Defines characters to be classified as punctuation characters.
No character defined for the keywords upper, lower, alpha, digit, cntrl,
or xdigit, or as the <space> character, can
be specified. The functions ispunct() and iswpunct() test for any
character or wide character, respectively, included in this class.
- graph
- Defines characters to be classified as printing characters,
not including the <space> character. Characters specified
for the keywords upper, lower, alpha, digit, xdigit,
and punct are automatically included. No character specified
in the keyword cntrl can be specified for graph. The
functions isgraph() and iswgraph() test for any character or wide
character, respectively, included in this class.
- print
- Defines characters to be classified as printing characters,
including the <space> character. Characters specified
for the keywords upper, lower, alpha, digit, xdigit, punct,
and the <space> character are automatically included.
No character specified in the keyword cntrl can be specified
for print. The functions isprint() and iswprint() test
for any character or wide character, respectively, included in this
class.
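The relationships among the punct, graph, print, and cntrl classes can be verified from C. This sketch checks, for every single-byte value in the current LC_CTYPE locale, that the alpha, digit, xdigit, and punct classes are contained in graph, that print is graph plus the <space> character, and that no cntrl character is in graph or print. It assumes the default "C" locale; the function name is illustrative only.

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>

/* Returns true if the class relationships described for graph, print,
   and cntrl hold for every single-byte value. */
bool class_relations_hold(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++) {
        bool graph_source = isalpha(c) || isdigit(c) || isxdigit(c) || ispunct(c);
        bool graph = isgraph(c) != 0;
        bool print = isprint(c) != 0;

        if (graph_source && !graph)
            return false;       /* graph includes these classes */
        if (print != (graph || c == ' '))
            return false;       /* print is graph plus <space> */
        if (iscntrl(c) && (graph || print))
            return false;       /* cntrl is disjoint from graph and print */
    }
    return true;
}
```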
- xdigit
- Defines characters to be classified as hexadecimal digits. Only
the characters defined for the class digit can be specified,
in contiguous ascending sequence by numerical value, followed by one
or more sets of six characters representing the hexadecimal digits 10 through 15,
with each set in ascending order (for example, A, B, C, D, E,
F, a, b, c, d, e, f). The digits 0 through 9,
the uppercase letters A through F, and the lowercase
letters a through f are automatically included
in this class. The functions isxdigit() and iswxdigit() test for any
character or wide character, respectively, included in this class.
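A program that needs the value of a hexadecimal digit, rather than only a test with isxdigit(), can use lookup strings. This sketch avoids arithmetic on the letters A through F, so it does not rely on their encoded values; the function name is hypothetical.

```c
#include <string.h>

/* Maps a hexadecimal digit character to its value 0..15, or returns -1. */
int hex_value(int c)
{
    static const char lower[] = "0123456789abcdef";
    static const char upper[] = "0123456789ABCDEF";
    const char *p;

    if (c == '\0')
        return -1;                  /* strchr would match the terminator */
    if ((p = strchr(lower, c)) != NULL)
        return (int)(p - lower);
    if ((p = strchr(upper, c)) != NULL)
        return (int)(p - upper);
    return -1;
}
```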
- blank
- Defines characters to be classified as blank characters. The
characters <space> and <tab> are automatically
included in this class. The functions isblank() and iswblank() test
for any character or wide character, respectively, included in this
class.
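In the default "C" locale, isblank() accepts exactly <space> and <tab>, and every blank character is also a space character, as described for the space class. This sketch checks both properties; the function names are illustrative only.

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>

/* Counts the single-byte characters classified as blank. */
int count_blank_chars(void)
{
    int n = 0;

    for (int c = 0; c <= UCHAR_MAX; c++)
        if (isblank(c))
            n++;
    return n;
}

/* Returns true if every blank character is also a space character. */
bool blank_subset_of_space(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++)
        if (isblank(c) && !isspace(c))
            return false;
    return true;
}
```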
- toupper
- Defines the mapping of lowercase letters to uppercase letters.
The operand consists of character pairs, separated by semicolons.
The characters in each character pair are separated by a comma; the
pair is enclosed in parentheses. The first character in each pair
is the lowercase letter, and the second is the corresponding uppercase
letter. Only characters specified for the keywords lower and upper can
be specified for toupper. The lowercase letters a through z
and their corresponding uppercase letters A through Z
are automatically included in this mapping, but only when the toupper keyword
is omitted from the locale definition. The toupper keyword affects the behavior of
the toupper() and towupper() functions for mapping characters and wide
characters, respectively.
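The toupper mapping is what the toupper() function applies at run time. In this sketch, characters that have no mapping in the current locale, such as digits and punctuation, are returned unchanged. The cast to unsigned char avoids undefined behavior for negative char values. The function name is hypothetical.

```c
#include <ctype.h>

/* Uppercases a NUL-terminated string in place using the current
   LC_CTYPE toupper mapping. */
void str_toupper(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```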
- tolower
- Defines the mapping of uppercase letters to lowercase letters.
The operand consists of character pairs, separated by semicolons.
The characters in each character pair are separated by a comma; the
pair is enclosed by parentheses. The first character in each pair
is the uppercase letter, and the second is its corresponding lowercase
letter. Only characters specified for the keywords lower and upper can
be specified for tolower. If the tolower keyword is omitted from the
locale definition, the mapping is the reverse mapping of the one specified
for toupper. The tolower keyword affects
the behavior of the tolower() and towlower() functions for mapping
characters and wide characters, respectively.
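The default tolower behavior, the reverse of the toupper pairs, can be modeled with a single table read in both directions. The struct case_pair, the pairs table, and the map_toupper and map_tolower functions are hypothetical; they illustrate the rule and are not how localedef or the C library stores mappings.

```c
#include <stddef.h>

/* Hypothetical in-memory form of toupper pairs: (lowercase, uppercase). */
struct case_pair {
    int lower;
    int upper;
};

static const struct case_pair pairs[] = {
    { 'a', 'A' }, { 'b', 'B' }, { 'c', 'C' }
};
static const size_t npairs = sizeof pairs / sizeof pairs[0];

/* toupper direction: look the character up in the first column. */
int map_toupper(int c)
{
    for (size_t i = 0; i < npairs; i++)
        if (pairs[i].lower == c)
            return pairs[i].upper;
    return c;               /* unmapped characters are unchanged */
}

/* Default tolower direction when no tolower keyword is given:
   the same pairs read in reverse. */
int map_tolower(int c)
{
    for (size_t i = 0; i < npairs; i++)
        if (pairs[i].upper == c)
            return pairs[i].lower;
    return c;
}
```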
You may define additional character classes using your own keywords.
A maximum of 31 classes is supported in total: the 12 standard classes
and up to 19 user-defined classes. The defined classes affect the
behavior of the wctype() and iswctype() functions.
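A program looks up a class by name with wctype() and tests a wide character against it with iswctype(). wctype() returns 0 when the current LC_CTYPE locale does not define the class, so a user-defined name such as myclass is recognized only after a locale that defines it is in effect. This is a minimal sketch; the helper name in_named_class is hypothetical.

```c
#include <stdbool.h>
#include <wctype.h>

/* Returns true if wc belongs to the class named class_name in the
   current LC_CTYPE locale; false if it does not, or if the locale does
   not define that class. */
bool in_named_class(wint_t wc, const char *class_name)
{
    wctype_t t = wctype(class_name);

    if (t == 0)
        return false;       /* class not defined in the current locale */
    return iswctype(wc, t) != 0;
}
```

In the default "C" locale, in_named_class(L'a', "alpha") is true, while in_named_class(L'a', "myclass") is false because that locale defines no class named myclass.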
Figure 1 is an example of the definition of the LC_CTYPE category.
Figure 1. Example LC_CTYPE definition
escape_char /
comment_char %
%%%%%%%%%%%%%
LC_CTYPE
%%%%%%%%%%%%%
% uppercase letters are A-Z by default plus the three defined below
upper <A-acute>;<A-grave>;<C-acute>
% lowercase letters are a-z by default plus the three defined below
lower <a-acute>;<a-grave>;<c-acute>
% space characters are default 6 characters plus the one defined below
space <hyphen-minus>
cntrl <alert>;<backspace>;<tab>;<newline>;<vertical-tab>;/
<form-feed>;<carriage-return>;<NUL>;/
<SO>;<SI>
% default graph, print, punct, digit, xdigit, blank classes
% toupper mapping defined only for the following three pairs
toupper (<a-acute>,<A-acute>);/
(<a-grave>,<A-grave>);/
(<c-acute>,<C-acute>)
% default upper to lower case mapping
% user-defined class, declared with charclass before it is defined
charclass myclass
myclass <e-ogonek>;<E-ogonek>
END LC_CTYPE