#pragma unrollandfuse

Category

Optimization and tuning

Purpose

Instructs the compiler to attempt an unroll and fuse operation on nested for loops.

Syntax

>>-#--pragma--+-nounrollandfuse-----------------+--------------><
              '-unrollandfuse--+--------------+-'   
                               '-(--number--)-'     

Parameters

number
A loop unrolling factor. The value of number is a positive integral constant expression.

If number is not specified, the optimizer determines an appropriate unrolling factor for each nested loop.

Usage

The #pragma unrollandfuse directive applies only to the outer loops of nested for loops that meet the following conditions:
  • There must be only one loop counter variable, one increment point for that variable, and one termination variable. These cannot be altered at any point in the loop nest.
  • Loops cannot have multiple entry and exit points. The loop termination must be the only means to exit the loop.
  • Dependencies in the loop must not be "backwards-looking". For example, a statement such as A[i][j] = A[i - 1][j + 1] + 4 must not appear within the loop.
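
The conditions above can be illustrated with a small sketch. The names scale_rows, shift_diagonal, and check_eligibility, and the size N, are hypothetical and chosen only for illustration. The first nest has one counter per loop, one exit per loop, and no backwards-looking dependence, so it is a candidate for the directive. The second contains the dependence pattern from the last condition and is left without the pragma. A compiler that does not recognize the pragma ignores it, as the C standard specifies for unrecognized pragmas.

```c
#include <assert.h>

#define N 8  /* hypothetical size, small enough to check by hand */

/* Candidate nest: one counter per loop, one exit per loop,
   and each element depends only on its own previous value. */
void scale_rows(int a[N][N], int factor)
{
    int i, j;
#pragma unrollandfuse(2)
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            a[i][j] = a[i][j] * factor;
        }
    }
}

/* Not a candidate: iteration (i, j) reads a[i-1][j+1], which an
   earlier outer iteration wrote. Fusing outer iterations could
   reorder that write and read, so no directive is used here. */
void shift_diagonal(int a[N][N])
{
    int i, j;
    for (i = 1; i < N; i++) {
        for (j = 0; j < N - 1; j++) {
            a[i][j] = a[i - 1][j + 1] + 4;
        }
    }
}

/* Runs both functions on a[i][j] = i + j and asserts a few values. */
int check_eligibility(void)
{
    int a[N][N];
    int i, j;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            a[i][j] = i + j;

    scale_rows(a, 3);
    assert(a[2][5] == 21);   /* 3 * (2 + 5) */

    shift_diagonal(a);
    assert(a[1][0] == 7);    /* a[0][1] + 4 = 3 + 4 */
    assert(a[2][0] == 14);   /* a[1][1] + 4 = 10 + 4 */
    return 1;
}
```

The results of scale_rows do not depend on the order in which outer iterations run, which is why it can be unrolled and fused safely. The results of shift_diagonal do depend on that order.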

For loop unrolling to occur, the #pragma unrollandfuse directive must precede a for loop. You must not specify #pragma unrollandfuse for the innermost for loop.

You must not specify #pragma unrollandfuse more than once, or combine the directive with #pragma nounrollandfuse, #pragma nounroll, #pragma unroll, or #pragma stream_unroll directives for the same for loop.
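
As a sketch of these placement rules, the following hypothetical functions show the three forms of the directive, each written once and immediately before an outer for loop. The names add_auto, add_by4, add_plain, and check_placement, and the size M, are illustrative assumptions and not part of the compiler.

```c
#include <assert.h>

#define M 16  /* hypothetical size, for illustration only */

/* No factor given: the optimizer picks the unrolling factor. */
void add_auto(int dst[M][M], int src[M][M])
{
    int i, j;
#pragma unrollandfuse
    for (i = 0; i < M; i++) {
        for (j = 0; j < M; j++) {   /* innermost loop: no directive */
            dst[j][i] += src[i][j];
        }
    }
}

/* Explicit factor of 4, specified once for this loop. */
void add_by4(int dst[M][M], int src[M][M])
{
    int i, j;
#pragma unrollandfuse(4)
    for (i = 0; i < M; i++) {
        for (j = 0; j < M; j++) {
            dst[j][i] += src[i][j];
        }
    }
}

/* Opts this nest out of unroll and fuse. */
void add_plain(int dst[M][M], int src[M][M])
{
    int i, j;
#pragma nounrollandfuse
    for (i = 0; i < M; i++) {
        for (j = 0; j < M; j++) {
            dst[j][i] += src[i][j];
        }
    }
}

/* All three forms must compute the same sums, whatever the compiler does. */
int check_placement(void)
{
    static int dst[M][M], src[M][M];
    int i, j;
    for (i = 0; i < M; i++)
        for (j = 0; j < M; j++) {
            src[i][j] = i * M + j;
            dst[i][j] = 0;
        }
    add_auto(dst, src);
    add_by4(dst, src);
    add_plain(dst, src);
    for (i = 0; i < M; i++)
        for (j = 0; j < M; j++)
            assert(dst[j][i] == 3 * src[i][j]);
    return 1;
}
```

Each outer loop carries exactly one directive, and none of the inner loops carries one, which matches the restrictions described above.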

Predefined macros

None.

Examples

In the following example, a #pragma unrollandfuse directive replicates and fuses the body of the loop. This can reduce the number of cache misses, because each fused iteration accesses neighboring elements such as a[j][i] and a[j][i+1] together.
int i, j;
int a[1000][1000];
int b[1000][1000];
int c[1000][1000];


....

#pragma unrollandfuse(2)
for (i=1; i<1000; i++) {
    for (j=1; j<1000; j++) {
        a[j][i] = b[i][j] * c[j][i];
    }
}
The for loop below shows a possible result of applying the #pragma unrollandfuse(2) directive to the loop shown above:
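for (i=1; i<999; i=i+2) {
    for (j=1; j<1000; j++) {
        a[j][i] = b[i][j] * c[j][i];
        a[j][i+1] = b[i+1][j] * c[j][i+1];
    }
}
/* Residual iteration: the 999 outer iterations do not divide evenly by 2. */
for (j=1; j<1000; j++) {
    a[j][999] = b[999][j] * c[j][999];
}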
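
One way to sanity-check a hand-unrolled form of this loop is to compare it against the original. The sketch below is an assumption-laden illustration, not the compiler's actual output: the names original, unrolled_fused, check_equivalence, a_ref, and a_fused are hypothetical, and the test data is arbitrary. Because the outer loop runs 999 times, the sketch handles the leftover iteration in a separate residual loop so that the index i+1 never reaches 1000.

```c
#include <assert.h>

#define N 1000  /* matches the array bounds in the example */

static int b[N][N], c[N][N];
static int a_ref[N][N], a_fused[N][N];

/* Original loop nest, without any directive. */
static void original(void)
{
    int i, j;
    for (i = 1; i < N; i++)
        for (j = 1; j < N; j++)
            a_ref[j][i] = b[i][j] * c[j][i];
}

/* Hand-written unroll and fuse by 2, with a residual loop. */
static void unrolled_fused(void)
{
    int i, j;
    for (i = 1; i + 1 < N; i += 2) {
        for (j = 1; j < N; j++) {
            a_fused[j][i]     = b[i][j]     * c[j][i];
            a_fused[j][i + 1] = b[i + 1][j] * c[j][i + 1];
        }
    }
    /* Residual: runs only when the outer trip count is odd. */
    for (; i < N; i++)
        for (j = 1; j < N; j++)
            a_fused[j][i] = b[i][j] * c[j][i];
}

/* Fills the inputs, runs both versions, and asserts that they agree. */
int check_equivalence(void)
{
    int i, j;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            b[i][j] = (i + 2 * j) % 7;   /* arbitrary small test data */
            c[i][j] = (3 * i + j) % 5;
        }
    original();
    unrolled_fused();
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            assert(a_ref[i][j] == a_fused[i][j]);
    return 1;
}
```

The same check can be reused for other unrolling factors by changing the step of the main loop and the number of replicated statements; the residual loop then covers however many iterations are left over.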
You can also specify multiple #pragma unrollandfuse directives in a nested loop structure.
int i, j, k;
int a[1000][1000];
int b[1000][1000];
int c[1000][1000];
int d[1000][1000];
int e[1000][1000];


....

#pragma unrollandfuse(4)
for (i=1; i<1000; i++) {
#pragma unrollandfuse(2)
    for (j=1; j<1000; j++) {
        for (k=1; k<1000; k++) {
            a[j][i] = b[i][j] * c[j][i] + d[j][k] * e[i][k];
        }
    }
}
