None.
Inserts prefetch instructions automatically where there are opportunities to improve code performance.
When -qprefetch is in effect, the compiler may insert prefetch instructions in compiled code. When -qnoprefetch is in effect, prefetch instructions are not inserted in compiled code.
                    .-:-----------------------------------.
                    V                                     |
        .-prefetch----+---------------------------------+-+-.
        |             |    .-noassistthread-----------. |   |
        |             +-=--+-assistthread--=--+-SMT-+-+-+   |
        |             |                       '-CMP-'   |   |
        |             |    .-noaggressive-.             |   |
        |             +-=--+-aggressive---+-------------+   |
        |             '-=--dscr--=--value---------------'   |
>>- -q--+-noprefetch----------------------------------------+--><
-qprefetch=noassistthread:noaggressive:dscr=0
The value that you specify for dscr must be 0 or greater and representable as a 64-bit unsigned integer; otherwise, the compiler issues a warning message and sets dscr to 0. The compiler accepts both decimal and hexadecimal numbers; a hexadecimal number requires the 0x prefix. The valid range of values depends on your system architecture. See the product information about the POWER® Architecture for details. If you specify multiple values, the last one takes effect.
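The acceptance rule for a dscr value can be modeled in a few lines of C. This is only a sketch of the rule as stated above, not the compiler's actual parser: parse_dscr is a hypothetical helper name, and the sketch assumes that unsigned long long is 64 bits wide and that signs, whitespace, and trailing characters are rejected.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper that models the documented rule for a dscr value.
 * Returns 1 and stores the value when the text is accepted.
 * Returns 0 and stores 0 otherwise, mirroring the documented fallback. */
static int parse_dscr(const char *text, uint64_t *out)
{
    assert(text != NULL && out != NULL);
    *out = 0;

    /* Assumption: a leading digit is required, which rejects signs,
     * whitespace, and empty input. */
    if (!isdigit((unsigned char)text[0]))
        return 0;

    /* Hexadecimal is recognized only with the 0x prefix; all else is decimal. */
    int base = (text[0] == '0' && (text[1] == 'x' || text[1] == 'X')) ? 16 : 10;

    errno = 0;
    char *end = NULL;
    unsigned long long value = strtoull(text, &end, base);

    /* Reject overflow (assumes a 64-bit unsigned long long) and trailing junk. */
    if (errno == ERANGE || end == text || *end != '\0')
        return 0;

    *out = (uint64_t)value;
    return 1;
}
```

Under these assumptions, parse_dscr("0x1F", &v) stores 31, while a value one past the 64-bit unsigned maximum is rejected and leaves 0 in v.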
The -qnoprefetch option does not prevent built-in functions such as __prefetch_by_stream from generating prefetch instructions.
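As an illustration of an explicit prefetch request that does not depend on -qprefetch, the following sketch issues one hint before a summation loop. The argument list used for __prefetch_by_stream (a direction, with 1 assumed to mean forward, followed by an address) is an assumption to check against the built-in functions reference; PREFETCH_HINT and sum_with_hint are hypothetical names. On compilers that do not define __IBMC__ or __IBMCPP__, the sketch falls back to the GCC and Clang built-in __builtin_prefetch so that it still compiles.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: IBM XL compilers define __IBMC__ or __IBMCPP__ and provide
 * __prefetch_by_stream(direction, address). Elsewhere, use the GCC/Clang
 * built-in, or do nothing at all. */
#if defined(__IBMC__) || defined(__IBMCPP__)
#define PREFETCH_HINT(p) __prefetch_by_stream(1, (p))
#elif defined(__GNUC__)
#define PREFETCH_HINT(p) __builtin_prefetch((p))
#else
#define PREFETCH_HINT(p) ((void)(p))
#endif

/* Sums n integers after explicitly hinting that the array is about to be read.
 * The hint affects only performance, never the result. */
long sum_with_hint(const int *data, size_t n)
{
    assert(data != NULL || n == 0);

    if (n > 0)
        PREFETCH_HINT(data);

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

Because the hint is written in the source, it is the programmer's request rather than an automatic insertion, which is why -qnoprefetch leaves it in place.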
When you compile with -qprefetch=assistthread, the compiler analyzes delinquent load information and generates prefetching assist threads. You can provide the delinquent load information through the built-in function __mem_delay(const void *delinquent_load_address, const unsigned int delay_cycles), or the compiler can gather it from dynamic profiling with -qpdf1=level=2.
The following example shows how to generate code that uses assist threads with __mem_delay:
int y[64], x[1089], w[1024];
void foo(void){
  int i, j;
  for (i = 0; i < 64; i++) {
    for (j = 0; j < 1024; j++) {
      /* what to prefetch? y[i]; inserted by the user */
      __mem_delay(&y[i], 10);
      y[i] = y[i] + x[i + j] * w[j];
      x[i + j + 1] = y[i] * 2;
    }
  }
}
The compiler-generated assist thread code for this function looks similar to the following:
void foo@clone(unsigned thread_id, unsigned version)
{
  if (!1) goto lab_1;
  /* version control to synchronize assist and main thread */
  if (version == @2version0) goto lab_5;
  goto lab_1;
lab_5:
  @CIV1 = 0;
  do { /* id=1 guarded */ /* ~2 */
    if (!1) goto lab_3;
    @CIV0 = 0;
    do { /* id=2 guarded */ /* ~4 */
      /* region = 0 */
      /* __dcbt call generated to prefetch y[i] access */
      __dcbt(((char *)&y + (4)*(@CIV1)));
      @CIV0 = @CIV0 + 1;
    } while ((unsigned) @CIV0 < 1024u); /* ~4 */
lab_3:
    @CIV1 = @CIV1 + 1;
  } while ((unsigned) @CIV1 < 64u); /* ~2 */
lab_1:
  return;
}