Purpose
Specifies the cache configuration for a specific execution machine.
The compiler uses this information to tune program performance, especially
for loop operations that can be structured (or blocked)
to process only the amount of data that can fit into the data cache.
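To illustrate what "blocked" means here, the following is a minimal hand-written sketch of a blocked (tiled) matrix multiply in C. It is not output from the compiler and does not show how the compiler transforms loops internally; the function name matmul_blocked and the tile edge TILE are hypothetical names and values chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical tile edge, in elements. A real value would be chosen so
   that the tiles being worked on fit into the data cache. */
#define TILE 32

/* Computes C += A * B for n x n row-major matrices, one tile at a time,
   so that each inner loop nest touches only a small working set. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < n; ii += TILE) {
        for (size_t kk = 0; kk < n; kk += TILE) {
            for (size_t jj = 0; jj < n; jj += TILE) {
                /* Clamp tile bounds when n is not a multiple of TILE. */
                size_t i_end = (ii + TILE < n) ? ii + TILE : n;
                size_t k_end = (kk + TILE < n) ? kk + TILE : n;
                size_t j_end = (jj + TILE < n) ? jj + TILE : n;

                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; j++) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The result is the same as for an unblocked triple loop; only the order in which elements are visited changes.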
Syntax
                  .-:------------------------.
                  V                          |
>>- -q--cache--=----+-assoc--=--+-0------+-+-+-----------------><
                    |           +-1------+ |
                    |           '-number-' |
                    +-auto-----------------+
                    +-cost--=--cycles------+
                    +-level--=--+-1-+------+
                    |           +-2-+      |
                    |           '-3-'      |
                    +-line--=--bytes-------+
                    +-size--=--Kbytes------+
                    '-type--=--+-C-+-------'
                               +-c-+
                               +-D-+
                               +-d-+
                               +-I-+
                               '-i-'
Parameters
- assoc=number
- Specifies the set associativity of the cache:
- 0
- Direct-mapped cache
- 1
- Fully associative cache
- n > 1
- n-way set-associative cache
- auto
- Automatically detects the specific cache configuration of the
compiling machine. It assumes that the execution environment will
be the same as the compilation environment.
- cost=cycles
- Specifies the performance penalty that results from a cache miss
so that the compiler can decide whether to perform an optimization
that might result in extra cache misses.
- level=level
- Specifies which level of cache is affected:
- 1
- Basic cache
- 2
- Level-2 cache or the table lookaside buffer (TLB) if the machine
has no level-2 cache
- 3
- TLB in a machine that does have a level-2 cache
Other levels are possible but are currently undefined.
If a system has more than one level of cache, use a separate -qcache option
to describe each level.
- line=bytes
- Specifies the line size of the cache, in bytes.
- size=Kbytes
- Specifies the total size of this cache, in kilobytes.
- type={C|c|D|d|I|i}
- Specifies the type of cache that the settings apply to, as follows:
- C or c for a combined data and instruction cache
- D or d for the data cache
- I or i for the instruction cache
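The assoc, line, and size suboptions together describe the geometry of a cache. As a rough sketch of how the values relate, the following C helpers compute the number of cache lines and sets using standard cache arithmetic and the assoc encoding listed above. The function names are hypothetical, the code assumes that 1 Kbyte is 1024 bytes, and it does not represent the compiler's own calculations.

```c
#include <assert.h>

/* Number of cache lines in a cache of size_kb kilobytes
   (assumed to be 1024 bytes each) with line_bytes-byte lines. */
unsigned long cache_lines(unsigned long size_kb, unsigned long line_bytes)
{
    assert(line_bytes > 0);
    return size_kb * 1024UL / line_bytes;
}

/* Number of sets, following the assoc encoding of -qcache:
     0     direct-mapped: every line is its own set
     1     fully associative: a single set holds all lines
     n > 1 n-way set-associative: each set holds n lines */
unsigned long cache_sets(unsigned long size_kb, unsigned long line_bytes,
                         unsigned long assoc)
{
    unsigned long lines = cache_lines(size_kb, line_bytes);

    if (assoc == 0) {
        return lines;
    }
    if (assoc == 1) {
        return 1;
    }
    return lines / assoc;
}
```

For example, size=8, line=64, and assoc=2 describe a cache with 128 lines arranged in 64 sets of two lines each.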
Usage
If you know exactly which type of system a program will run on,
and that system has its instruction or data cache configured differently
from the default case (as governed by the -qtune setting),
you can specify the exact characteristics of the cache. The compiler can
then compute the benefits of particular cache-related optimizations
more precisely.
For the -qcache option
to have any effect, you must include the level and type suboptions
and specify the -qhot option or an option
that implies -qhot.
- If you know some but not all of the values, specify the ones you
do know.
- If a system has more than one level of cache, use a separate -qcache option
to describe each level. If you have limited time to spend experimenting
with this option, it is more important to specify the characteristics
of the data cache than of the instruction cache.
- If you are not sure of the exact cache sizes of the target systems,
use relatively small estimated values. It is better to have some cache
memory that is not used than to have cache misses or page faults from
specifying a cache that is larger than the target system has.
If you specify the wrong values for the cache configuration,
or run the program on a machine with a different configuration, the
program still works correctly but might not run as fast as possible.
Remember, if you are not sure of the exact values for cache sizes,
use a conservative estimate.
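The following C sketch suggests why a small estimate is the safer choice. It assumes a simplified model in which a blocked loop keeps three square tiles of elements in the cache at once; both the model and the name tile_edge are assumptions made for illustration and are not taken from the compiler.

```c
/* Largest tile edge t (in elements) such that three t x t tiles of
   elem_bytes-byte elements fit into a cache of size_kb kilobytes
   (assumed to be 1024 bytes each). Returns 0 if no tile fits. */
unsigned long tile_edge(unsigned long size_kb, unsigned long elem_bytes)
{
    unsigned long bytes = size_kb * 1024UL;
    unsigned long t = 0;

    /* Grow the tile while the next larger size still fits. */
    while (3UL * (t + 1) * (t + 1) * elem_bytes <= bytes) {
        t++;
    }
    return t;
}
```

Under this model, a tile sized for a smaller cache than the target has always fits and leaves some cache unused, whereas a tile sized for a larger cache overflows it and causes extra misses.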
Examples
To tune performance for a system
with a combined instruction and data level-1 cache where the cache
is two-way associative, 8 KB in size, and has 64-byte cache lines:
xlf95 -O3 -qhot -qcache=type=c:level=1:size=8:line=64:assoc=2 file.f
To tune performance for a system with two levels of data cache, use two
-qcache options:
xlf95 -O3 -qhot -qcache=type=D:level=1:size=256:line=256:assoc=4 \
-qcache=type=D:level=2:size=512:line=256:assoc=2 file.f
To tune performance for a system with two types of cache, again use two
-qcache options:
xlf95 -O3 -qhot -qcache=type=D:level=1:size=256:line=256:assoc=4 \
-qcache=type=I:level=1:size=512:line=256:assoc=2 file.f