The following outlines the differences between the classic language
level and all other standard-based language levels:
Tokenization
Tokens introduced by macro expansion may be combined with adjacent tokens
in some cases. Historically, this behavior arose because older preprocessors
were text-based and, in older implementations, ran as a separate program
whose output was passed on to the compiler. For similar reasons, tokens
separated only by a comment may also be combined to form a single token.
Here is a summary of how tokenization of a program compiled in classic mode
is performed:
1. At a given point in the source file, the next token is the longest
sequence of characters that can possibly form a token. For example, i+++++j is
tokenized as i ++ ++ + j even though i ++ + ++ j would
have resulted in a correct program.
2. If the token formed is an identifier and a macro name, the macro
is replaced by the text of the tokens specified on its #define directive.
Each parameter is replaced by the text of the corresponding argument.
Comments are removed from both the arguments and the macro text.
3. Scanning resumes at the first step from the point at which
the macro was replaced, as if the replacement text were part of the original program.
4. When the entire program has been preprocessed, the result is scanned
again by the compiler as in the first step. The second and third steps
do not apply here because no macros remain to be replaced. Constructs
generated by the first three steps that resemble preprocessing directives
are not processed as such.
It is in the third and fourth steps that the text of adjacent
but previously separate tokens may be combined to form new tokens.
The \ character for line continuation is accepted only in string and
character literals and on preprocessing directives.
Constructs such as:
#if 0
"unterminated
#endif
#define US "Unterminating string
char *s = US terminated now"
will not generate diagnostic messages, because the first is an unterminated
literal in a FALSE block, and the second is completed after macro expansion.
However:
char *s = US;
will generate a diagnostic message because the string literal in US is
not completed before the end of the line.
Empty character literals ('') are allowed. The value of such a literal
is zero.
Preprocessing directives
The # token must appear in the first column of the line. The token
immediately following # is available for macro expansion. A line can be
continued with \ only if the name of the directive and, in the following
example, the ( have been seen:
#define f(a,b) a+b
f\
(1,2) /* accepted */
#define f(a,b) a+b
f(\
1,2) /* not accepted */
The rules concerning \ apply
whether or not the directive is valid. For example,
#\
define M 1 /* not allowed */
#def\
ine M 1 /* not allowed */
#define\
M 1 /* allowed */
#dfine\
M 1 /* equivalent to #dfine M 1, even
though #dfine is not valid */
The following are the differences for individual preprocessing directives:
- #ifdef/#ifndef
  - When the first token is not an identifier, no diagnostic message
    is generated, and the condition is FALSE.
- #else
  - When there are extra tokens, no diagnostic message is generated.
- #endif
  - When there are extra tokens, no diagnostic message is generated.
- #include
  - The < and > are separate tokens. The header name is formed by
    combining the spelling of the < and > with the tokens between them.
    Therefore /* and // are recognized as comments (and are always
    stripped), and the " and ' do begin literals within the < and >.
    (Remember that in C programs, C++-style comments // are
    recognized when -qcpluscmt is specified.)
- #line
  - The spellings of all tokens that are not part of the line number
    form the new file name. These tokens need not be string literals.
- #error
  - Not recognized.
- #define
  - A valid macro parameter list consists of zero or more identifiers,
    each separated by commas. The commas are ignored and the parameter
    list is constructed as if they were not specified. The parameter names
    need not be unique. If there is a conflict, the last name specified
    is recognized.
    For an invalid parameter list, a warning is issued.
    If a macro name is redefined with a new definition, a warning is
    issued and the new definition is used.
- #undef
  - When there are extra tokens, no diagnostic message is generated.
Macro expansion
- When the number of arguments on a macro invocation does not match
the number of parameters, a warning is issued.
- If the ( token is not present after the macro name
of a function-like macro, it is treated as too few arguments (as above)
and a warning is issued.
- Parameters are replaced in string literals and character literals.
- Examples:
#define M() 1
#define N(a) (a)
#define O(a,b) ((a) + (b))
M(); /* no error */
N(); /* empty argument */
O(); /* empty first argument
and too few arguments */
Text output
No text is generated to replace comments.