magic — Format of the /etc/magic file

Description

The file command uses the /etc/magic file in its attempt to identify the type of a binary file. Essentially, /etc/magic contains templates showing what different types of files look like.

The magic file contains lines describing magic numbers, which identify particular types of files. Lines beginning with a > or & character represent continuation lines to a preceding main entry:
>
If the file command finds a match on the main entry line, these additional patterns are checked. Any pattern that matches is used. This may generate additional output; a single blank separates each matching line's output if any output exists for that line.
&
If the file command finds a match on the main entry line, and a following continuation line begins with this character, that continuation line's pattern must also match, or neither line is used. Output text associated with any line beginning with the & character is ignored.
Each line consists of four fields, separated by one or more tabs:
(a)
The first field is a byte offset in the file, consisting of an optional offset operator and a value. In continuation lines, the offset immediately follows a continuation character.

If no offset operator is specified, then the offset value indicates an offset from the beginning of the file.

The * offset operator specifies that the value stored at the location following the operator is used as the offset. Thus, *0x3C indicates that the value stored at offset 0x3C is used as the offset.

The + offset operator specifies an incremental offset, based on the value of the last offset. Thus, +15 indicates that the offset value is 15 bytes from the last specified offset.

If the byte offset has passed the file length limit, the test will not match.
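
The three offset forms can be sketched in Python as follows. This is a minimal illustration, not the file command's implementation. The function name resolve_offset is hypothetical, and the width and byte order of the value read for the * operator (4 bytes, little-endian) are assumptions, because the text above does not state them.

```python
import struct

def parse_offset_number(text):
    """Parse an offset value: hexadecimal with a 0x/0X prefix, else decimal."""
    if text[:2] in ("0x", "0X"):
        return int(text[2:], 16)
    return int(text, 10)

def resolve_offset(field, data, last_offset=0):
    """Return the absolute byte offset named by an offset field, or None.

    ASSUMPTION: an indirect (*) value is read as a 4-byte little-endian
    unsigned integer; the format description does not specify this.
    """
    field = field.lstrip(">&")            # drop any continuation character
    if field.startswith("*"):             # indirect: the offset is stored in the file
        where = parse_offset_number(field[1:])
        if where + 4 > len(data):
            return None
        offset = struct.unpack_from("<I", data, where)[0]
    elif field.startswith("+"):           # incremental: relative to the last offset
        offset = last_offset + parse_offset_number(field[1:])
    else:                                 # plain: from the beginning of the file
        offset = parse_offset_number(field)
    # An offset outside the data never matches.
    return offset if offset < len(data) else None
```

For example, with a 64-byte buffer whose last four bytes hold the value 16, resolve_offset("*0x3C", data) yields 16, and resolve_offset("+15", data, last_offset=16) yields 31.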

(b)
The second field is the type of the value.
The valid specifiers are listed below:
d
Signed decimal
u
Unsigned decimal
s
String
u and d can be followed by an optional unsigned decimal integer that specifies the number of bytes represented by the type. The supported numbers of bytes are restricted to the byte lengths of the C-language types char, short, int, and long. u and d can also be followed by one of the optional size specifiers listed below:
C
char
S
short
I
int
L
long
The C, S, I, and L specifiers correspond to the number of bytes in the C-language types char, short, int, and long, respectively.

All type specifiers, except for s, can be followed by a mask specifier of the form &number. The mask value is bitwise ANDed with the value from the input file before the comparison with the value field of the line is made. By default, the mask is interpreted as an unsigned decimal number. With a leading 0x or 0X, the mask is interpreted as an unsigned hexadecimal number; otherwise, with a leading 0, the mask is interpreted as an unsigned octal number.

The long format of type specifiers is supported. The valid specifiers, and their interpretation, are listed below:
Specifier    _UNIX03=YES    _UNIX03 is not YES
byte         dC             uC
short        dS             uS
long         dL             uL
string       s              s
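
The mask rules for the type field can be illustrated with a short Python sketch. The helper names parse_unsigned, split_type, and apply_mask are hypothetical and chosen only for this example; the sketch does not check that the specifier is something other than s.

```python
def parse_unsigned(text):
    """Parse a mask: 0x/0X prefix is hexadecimal, another leading 0 is octal,
    anything else is decimal."""
    if text[:2] in ("0x", "0X"):
        return int(text[2:], 16)
    if len(text) > 1 and text[0] == "0":
        return int(text, 8)
    return int(text, 10)

def split_type(type_field):
    """Split a type field such as 'uS&0xff00' into ('uS', 0xff00).

    The mask is None when the field carries no &number part.
    """
    spec, sep, mask = type_field.partition("&")
    return spec, (parse_unsigned(mask) if sep else None)

def apply_mask(value, mask):
    """AND the value read from the file with the mask before comparing."""
    return value if mask is None else value & mask
```

With these helpers, split_type("uS&0xff00") gives ("uS", 0xff00), and apply_mask(0x1234, 0xff00) gives 0x1200, which is the number that would then be compared with the value field.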
(c)
The next field is a value, preceded by an optional operator.

If the specifier from the type field is s or string, then interpret the value as a string. Otherwise, interpret it as a number. If the value is a string, then the test will succeed only when a string value exactly matches the bytes from the file. The string value field can contain at most 127 characters per magic line.

If the value is a string, it can contain the following sequences:
  • \character

    The backslash-escape sequences as specified in the Base Definitions volume of IEEE Std 1003.1-2001, Table 5-1, Escape Sequences and Associated Actions (\\, \a, \b, \f, \n, \r, \t, \v ). In addition, the escape sequence \ (the <backslash> character followed by a <space> character) will be recognized to represent a <space> character.

  • \octal

    Octal sequences that can be used to represent characters with specific coded values. An octal sequence consists of a backslash followed by the longest sequence of one, two, or three octal-digit characters (01234567).
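
As an illustration of the two kinds of sequences, the following Python sketch decodes a string value field. The function name decode_value is hypothetical. Two assumptions are made: the result is a Python str rather than bytes in a particular code page, and escapes not listed above are left unchanged, because the text does not say how they are treated.

```python
import re

# Escapes named in the text: \\ \a \b \f \n \r \t \v and backslash-space.
SIMPLE_ESCAPES = {
    "\\": "\\", "a": "\a", "b": "\b", "f": "\f",
    "n": "\n", "r": "\r", "t": "\t", "v": "\v", " ": " ",
}

# A backslash followed by the longest run of one to three octal digits,
# or by any single character.
ESCAPE = re.compile(r"\\([0-7]{1,3}|.)", re.DOTALL)

def decode_value(text):
    """Expand backslash escapes in a string value field."""
    def replace(match):
        body = match.group(1)
        if body[0] in "01234567":
            return chr(int(body, 8))      # octal sequence
        # ASSUMPTION: unknown escapes are kept verbatim.
        return SIMPLE_ESCAPES.get(body, match.group(0))
    return ESCAPE.sub(replace, text)
```

For instance, the value \101H decodes to the two characters AH, and a\ b decodes to a, a space, and b.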

By default, any value that is not a string will be interpreted as a signed decimal number. Any such value, with a leading 0x or 0X, will be interpreted as an unsigned hexadecimal number; otherwise, with a leading zero, the value will be interpreted as an unsigned octal number. To maintain compatibility with other systems, numeric values are not subject to bounds checking. Use numeric values that match the specified type.

Operators only apply to nonstring types: byte, short and long. The default operator is = (exact match). The operators are:
=
Equal.
!
Not equal.
>
Greater than.
<
Less than.
&
All bits in pattern must match.
^
At least one bit in pattern must not match.
x or ?
Any value matches (must be the only character in the field). ? is an extension to traditional implementations of magic.
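
The numeric comparison can be sketched in Python as shown below. The names parse_number and value_matches are hypothetical. The sketch assumes that "all bits in pattern must match" means every bit set in the pattern is also set in the value from the file, and it uses ^ for the "at least one bit must not match" operator, as traditional magic implementations do.

```python
def parse_number(text):
    """Parse a numeric value: 0x/0X is hexadecimal, another leading 0 is
    octal, anything else is signed decimal."""
    if text[:2] in ("0x", "0X"):
        return int(text[2:], 16)
    if len(text) > 1 and text[0] == "0":
        return int(text, 8)
    return int(text, 10)

def value_matches(value_field, file_value):
    """Apply the optional operator in value_field to file_value."""
    if value_field in ("x", "?"):         # any value matches
        return True
    op, rest = "=", value_field           # default operator is exact match
    if value_field[:1] in ("=", "!", ">", "<", "&", "^"):
        op, rest = value_field[0], value_field[1:]
    pattern = parse_number(rest)
    if op == "=":
        return file_value == pattern
    if op == "!":
        return file_value != pattern
    if op == ">":
        return file_value > pattern
    if op == "<":
        return file_value < pattern
    if op == "&":                         # ASSUMPTION: all pattern bits are set
        return (file_value & pattern) == pattern
    # "^": at least one pattern bit is not set (assumed operator spelling)
    return (file_value & pattern) != pattern
```

For example, value_matches(">0", 5) is true, value_matches("&0x0F", 0xF0) is false, and value_matches("x", 123) is true regardless of the value read.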
(d)
The rest of the line is the message string to be printed if the file matches the template. Note that the contents of this field are ignored if the line begins with the & continuation character. The fourth field can contain a printf()-type format indicator to output the magic number (see printf for more details on format indicators). If the field contains a printf()-type format indicator, the value read from the file is passed as the argument to printf.
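
A rough Python analogue of the message substitution is shown below. The function name render_message is hypothetical, and Python's % operator is used only as a stand-in for printf; the two accept similar but not identical format indicators.

```python
def render_message(message, value):
    """Substitute the value read from the file into a printf-style message.

    A message without a format indicator is returned as-is, apart from
    the usual %% to % conversion.
    """
    # Ignore literal %% pairs when looking for a real format indicator.
    has_indicator = "%" in message.replace("%%", "")
    return message % ((value,) if has_indicator else ())
```

For example, render_message("version %d", 3) produces "version 3", while render_message("DOS executable", 0x5AD4) returns the message unchanged.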

Usage notes

  1. Characters from a code page other than IBM-1047 should not be added to the /etc/magic file (the default magic file).
  2. Characters from a code page other than IBM-1047 can be used in alternate magic files that are specified by the -m or -M option on the file command. These characters should be used only in the third field of the magic file template when the field type is string. They match only files that contain these characters, and only when the file command is invoked in a non-IBM-1047 locale.

Examples

Here are some sample entries:

0        short    0x5AD4    DOS executable
*0x18    short    0x40
>*0x3c   short    0x6584C   OS/2 linear executable
>*0x3C   short    0x454e
>+54     byte     1         OS/2 format
>+54     byte     2         Windows format
0        short    0xFDF0    DOS library
0        string   AH        Halo bitmapped font file
0        short    0x601A    Atari ST contiguous executable
>14      long     >0        - not stripped
0        byte     0X1F
>1       byte     0x1E      Packed file
>1       byte     0x9D      Compressed file
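
To show how such entries break down into the four fields, here is a small Python sketch that splits one line on tabs and separates the continuation character from the offset. The function name parse_entry is hypothetical, and the sketch assumes each line holds a single entry with tab-separated fields, as the Description requires.

```python
import re

def parse_entry(line):
    """Split one magic line into
    (continuation, offset, type, value, message)."""
    # Fields are separated by one or more tabs; the message is the rest.
    fields = re.split(r"\t+", line.rstrip("\n"), maxsplit=3)
    fields += [""] * (4 - len(fields))    # the message may be absent
    offset, type_field, value, message = fields
    continuation = ""
    if offset[:1] in (">", "&"):          # continuation line marker
        continuation, offset = offset[0], offset[1:]
    return continuation, offset, type_field, value, message
```

Given the line ">14", "long", ">0", "- not stripped" joined by tabs, parse_entry returns (">", "14", "long", ">0", "- not stripped"); an entry with no message, such as the 0X1F byte test, yields an empty string in the last position.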

Related information

The file command.