When optimization is enabled, breaks a stream contained in a for loop into multiple streams.
An unroll factor of 1 disables unrolling.
If number is not specified, the optimizer determines an appropriate unrolling factor for each nested loop.
To enable stream unrolling, you must specify -qhot and -qnostrict, or -qsmp, or use optimization level -O4 or higher. If -qstrict is in effect, no stream unrolling takes place.
For stream unrolling to occur, the #pragma stream_unroll directive must be the last pragma specified preceding a for loop. Specifying #pragma stream_unroll more than once for the same for loop or combining it with other loop unrolling pragmas (#pragma unroll, #pragma nounroll, #pragma unrollandfuse, #pragma nounrollandfuse) results in a warning.
int i, m, n;
int a[1000];
int b[1000];
int c[1000];
....
#pragma stream_unroll(4)
for (i=0; i<n; i++) {
    a[i] = b[i] * c[i];
}
The unroll factor of 4 reduces the number of iterations from n to n/4. The transformed loop, shown here for the case where n is a multiple of 4, is as follows:
m = n/4;
for (i=0; i<m; i++) {
    a[i] = b[i] * c[i];
    a[i+m] = b[i+m] * c[i+m];
    a[i+2*m] = b[i+2*m] * c[i+2*m];
    a[i+3*m] = b[i+3*m] * c[i+3*m];
}
The increased number of read and store operations in each iteration is
distributed among a number of streams determined by the compiler,
which reduces computation time and increases performance.
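The transformation can also be written out by hand to check that it computes the same result as the original loop. The following is a minimal sketch, not the code the compiler actually generates. The function names multiply_simple, multiply_four_streams, and streams_match are hypothetical, and the remainder loop for values of n that are not a multiple of 4 is an assumption added for illustration.

```c
/* Original loop: a single stream that processes one element per iteration. */
void multiply_simple(int *a, const int *b, const int *c, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        a[i] = b[i] * c[i];
    }
}

/* Hand-written equivalent of a stream unroll by 4 (illustrative only):
   four streams of length m = n/4 are advanced together in one loop. */
void multiply_four_streams(int *a, const int *b, const int *c, int n)
{
    int i;
    int m = n / 4;
    for (i = 0; i < m; i++) {
        a[i]         = b[i]         * c[i];
        a[i + m]     = b[i + m]     * c[i + m];
        a[i + 2 * m] = b[i + 2 * m] * c[i + 2 * m];
        a[i + 3 * m] = b[i + 3 * m] * c[i + 3 * m];
    }
    /* Assumption for this sketch: handle the elements left over
       when n is not a multiple of 4. */
    for (i = 4 * m; i < n; i++) {
        a[i] = b[i] * c[i];
    }
}

/* Returns 1 if both versions produce identical results for n elements
   (0 <= n <= 1000), and 0 otherwise. */
int streams_match(int n)
{
    static int b[1000], c[1000], a1[1000], a2[1000];
    int i;
    if (n < 0 || n > 1000) {
        return 0;
    }
    for (i = 0; i < n; i++) {
        b[i] = i + 1;
        c[i] = 2 * i - 7;
        a1[i] = 0;
        a2[i] = 0;
    }
    multiply_simple(a1, b, c, n);
    multiply_four_streams(a2, b, c, n);
    for (i = 0; i < n; i++) {
        if (a1[i] != a2[i]) {
            return 0;
        }
    }
    return 1;
}
```

Calling streams_match with any n from 0 to 1000 compares the two versions element by element. Because the four streams write to disjoint parts of a, the order in which the elements are computed does not affect the result.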