__mem_delay
Purpose
The __mem_delay built-in function specifies how many delay cycles there are for specific loads. These specific loads are delinquent loads with a long memory access latency because of cache misses.
When you specify which load is delinquent the compiler takes that information and carries out optimizations such as data prefetching. In addition, when you run -qprefetch=assistthread, the compiler uses the delinquent load information to perform analysis and generate prefetching assist threads. For more information, see -qprefetch.
Prototype
void* __mem_delay (const void *address, const unsigned int cycles);
Parameters
- address
- The address of the data to be loaded or stored.
- cycles
- A compile time constant, typically either L1 miss latency or L2 miss latency.
Usage
The __mem_delay built-in function is placed immediately before a statement that contains a specified memory reference.
Examples
Here is how you generate code using assist threads with __mem_delay:
int y[64], x[1089], w[1024];
void foo(void){
int i, j;
for (i = 0; i &l; 64; i++) {
for (j = 0; j < 1024; j++) {
/* what to prefetch? y[i]; inserted by the user */
__mem_delay(&y[i], 10);
y[i] = y[i] + x[i + j] * w[j];
x[i + j + 1] = y[i] * 2;
}
}
}
void foo@clone(unsigned thread_id, unsigned version)
{ if (!1) goto lab_1;
/* version control to synchronize assist and main thread */
if (version == @2version0) goto lab_5;
goto lab_1;
lab_5:
@CIV1 = 0;
do { /* id=1 guarded */ /* ~2 */
if (!1) goto lab_3;
@CIV0 = 0;
do { /* id=2 guarded */ /* ~4 */
/* region = 0 */
/* __dcbt call generated to prefetch y[i] access */
__dcbt(((char *)&y + (4)*(@CIV1)))
@CIV0 = @CIV0 + 1;
} while ((unsigned) @CIV0 < 1024u); /* ~4 */
lab_3:
@CIV1 = @CIV1 + 1;
} while ((unsigned) @CIV1 < 64u); /* ~2 */
lab_1:
return;
}