Differences between the classic language level and all other standard-based language levels

The following outlines the differences between the classic language level and all other standard-based language levels:

Tokenization

Tokens introduced by macro expansion may be combined with adjacent tokens in some cases. Historically, this behavior is an artifact of older preprocessors, which were text-based and ran as separate programs whose output was passed on to the compiler.

For similar reasons, tokens separated only by a comment may also be combined to form a single token. The following summarizes how a program compiled in classic mode is tokenized:

  1. At a given point in the source file, the next token is the longest sequence of characters that can possibly form a token. For example, i+++++j is tokenized as i ++ ++ + j, even though i ++ + ++ j would have resulted in a correct program.
  2. If the token formed is an identifier and a macro name, the macro is replaced by the text of the tokens specified on its #define directive. Each parameter is replaced by the text of the corresponding argument. Comments are removed from both the arguments and the macro text.
  3. Scanning is resumed at the first step from the point at which the macro was replaced, as if it were part of the original program.
  4. When the entire program has been preprocessed, the result is scanned again by the compiler as in the first step. The second and third steps do not apply here since there will be no macros to replace. Constructs generated by the first three steps that resemble preprocessing directives are not processed as such.

It is in the third and fourth steps that the text of adjacent but previously separate tokens may be combined to form new tokens.

The \ character for line continuation is accepted only in string and character literals and on preprocessing directives.

Constructs such as:

#if 0
  "unterminated
#endif
#define US "Unterminating string
char *s = US terminated now"

will not generate diagnostic messages, since the first is an unterminated literal in a FALSE block, and the second is completed after macro expansion. However:

char *s = US;

will generate a diagnostic message since the string literal in US is not completed before the end of the line.

Empty character literals are allowed. The value of the literal is zero.

Preprocessing directives

The # token must appear in the first column of the line. The token immediately following # is available for macro expansion. The line can be continued with \ only if the name of the directive and, in the following example, the ( have been seen:
#define f(a,b) a+b
f\
(1,2)      /* accepted */
#define f(a,b) a+b
f(\
1,2)       /* not accepted */

The rules concerning \ apply whether or not the directive is valid. For example,

#\
define M 1   /* not allowed */
#def\
ine M 1      /* not allowed */
#define\
M 1          /* allowed */
#dfine\
M 1          /* equivalent to #dfine M 1, even
                   though #dfine is not valid  */

The following are the differences for individual preprocessing directives:

#ifdef/#ifndef
When the first token is not an identifier, no diagnostic message is generated, and the condition is FALSE.
#else
When there are extra tokens, no diagnostic message is generated.
#endif
When there are extra tokens, no diagnostic message is generated.
#include
The < and > are separate tokens. The header name is formed by combining the spellings of the < and > with the tokens between them. Therefore, /* and // are recognized as comments (and are always stripped), and the " and ' characters do begin literals within the < and >. (Remember that in C programs, C++-style // comments are recognized when -qcpluscmt is specified.)
#line
The spellings of all tokens that are not part of the line number form the new file name. These tokens need not be string literals.
#error
Not recognized.
#define
A valid macro parameter list consists of zero or more identifiers, each separated by commas. The commas are ignored, and the parameter list is constructed as if they were not specified. The parameter names need not be unique. If there is a conflict, the last name specified is recognized.

For an invalid parameter list, a warning is issued. If a macro name is redefined with a new definition, a warning is issued and the new definition is used.

#undef
When there are extra tokens, no diagnostic message is generated.

Macro expansion

Text output

No text is generated to replace comments.