-qprefetch

Category

Optimization and tuning

@PROCESS

None.

Purpose

Inserts prefetch instructions automatically where there are opportunities to improve code performance.

When -qprefetch is in effect, the compiler may insert prefetch instructions in compiled code. When -qnoprefetch is in effect, prefetch instructions are not inserted in compiled code.

Syntax

                    .-:-----------------------------------.     
                    V                                     |     
        .-prefetch----+---------------------------------+-+-.   
        |             |    .-noassistthread-----------. |   |   
        |             +-=--+-assistthread--=--+-SMT-+-+-+   |   
        |             |                       '-CMP-'   |   |   
        |             |    .-noaggressive-.             |   |   
        |             +-=--+-aggressive---+-------------+   |   
        |             '-=--dscr--=--value---------------'   |   
>>- -q--+-noprefetch----------------------------------------+--><

Defaults

-qprefetch=noassistthread:noaggressive:dscr=0

Parameters

assistthread | noassistthread
When you work with applications that generate a high cache-miss rate, you can use -qprefetch=assistthread to exploit assist threads for data prefetching. This suboption guides the compiler to exploit assist threads at optimization level -O3 -qhot or higher. If you do not specify -qprefetch=assistthread, -qprefetch=noassistthread is implied.
CMP
For systems based on the chip multi-processor architecture (CMP), you can use -qprefetch=assistthread=cmp.
SMT
For systems based on the simultaneous multi-threading architecture (SMT), you can use -qprefetch=assistthread=smt.
Note: If you do not specify either CMP or SMT, the compiler uses the default setting based on your system architecture.
aggressive | noaggressive
This suboption guides the compiler to generate aggressive data prefetching at optimization level -O3 or higher. If you do not specify aggressive, -qprefetch=noaggressive is implied.
dscr
You can specify a value for the dscr suboption to improve the runtime performance of your applications. The compiler sets the Data Stream Control Register (DSCR) to the specified value to control the hardware prefetch engine. For POWER8™ processors, the value is valid only when the optimization level is -O2 or greater; for POWER5, POWER6®, and POWER7® processors, the value is valid only when the optimization level is -O3 or greater and the high-order transformation (HOT) is in effect. The default value of dscr is 0.
value

The value that you specify for dscr must be 0 or greater, and representable as a 64-bit unsigned integer. Otherwise, the compiler issues a warning message and sets dscr to 0. The compiler accepts both decimal and hexadecimal numbers, and a hexadecimal number requires the prefix of 0x. The value range depends on your system architecture. See the product information about the POWER® Architecture for details. If you specify multiple values, the last one takes effect.
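
As an illustration of how the suboptions combine, the following shell sketch builds one -qprefetch option with colon-separated suboptions and prints the resulting command line instead of running it. The compiler invocation name xlf, the source file name app.f, and the dscr value 0x1 are placeholders assumed for illustration, not recommended settings; see the POWER Architecture documentation for meaningful DSCR values.

```shell
#!/bin/sh
# Dry-run sketch: build a -qprefetch option string and print the command.
# COMPILER, SRC, and the dscr value are illustrative assumptions.
COMPILER="xlf"
SRC="app.f"

# Suboptions follow -qprefetch= and are separated by colons.
PREFETCH_OPTS="-qprefetch=assistthread=smt:aggressive:dscr=0x1"

# assistthread requires -O3 -qhot or higher; aggressive requires -O3 or higher.
CMD="$COMPILER -O3 -qhot $PREFETCH_OPTS $SRC"

# Print the command rather than executing it.
echo "$CMD"
```

Because -O3 -qhot satisfies the strictest requirement listed above, the same optimization level covers all three suboptions in this sketch.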

Usage

The -qnoprefetch option does not prevent built-in functions such as __prefetch_by_stream from generating prefetch instructions.

When you compile with -qprefetch=assistthread, the compiler analyzes delinquent load information (information about loads that frequently miss in the cache) and generates prefetching assist threads. You can provide the delinquent load information through the built-in function __mem_delay(const void *delinquent_load_address, const unsigned int delay_cycles), or the compiler can gather it from dynamic profiling with -qpdf1=level=2.

When you combine -qpdf with -qprefetch=assistthread, you must use the traditional two-step PDF invocation:
  1. Compile with -qpdf1=level=2, then run the program to collect profile data
  2. Recompile with -qpdf2 -qprefetch=assistthread
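
The two-step invocation can be sketched as the following dry-run shell script, which prints the commands rather than executing them. The compiler invocation name xlf, the file names app.f and app, and the training run between the two compilations are assumptions made for illustration; substitute the names and representative input data of your own build.

```shell
#!/bin/sh
# Dry-run sketch of the two-step PDF workflow with assist-thread prefetching.
# COMPILER, SRC, and EXE are illustrative assumptions.
COMPILER="xlf"
SRC="app.f"
EXE="app"

# Step 1: compile with profiling instrumentation at level 2.
STEP1="$COMPILER -O3 -qhot -qpdf1=level=2 $SRC -o $EXE"

# Training run: execute the instrumented program to collect profile data.
RUN="./$EXE"

# Step 2: recompile using the profile data and enable assist threads.
STEP2="$COMPILER -O3 -qhot -qpdf2 -qprefetch=assistthread $SRC -o $EXE"

# Print the commands in order rather than executing them.
printf '%s\n' "$STEP1" "$RUN" "$STEP2"
```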

Example

  DO i = 1, 1000
    !IBM* MEM_DELAY(x(i), 10)
    x(i) = x(i) + 1
  END DO

Related information