STREAM_UNROLL

Purpose

The STREAM_UNROLL directive instructs the compiler to apply the combined functionality of software prefetch and loop unrolling to DO loops with a large iteration count. Stream unrolling optimizes DO loops to use multiple streams. You can specify the STREAM_UNROLL directive for both inner and outer DO loops, and the compiler will use an optimal number of streams to perform stream unrolling where applicable. Applying the STREAM_UNROLL directive to a loop with dependencies, that is, a loop in which one iteration relies on a value computed by another iteration, will produce unexpected results.
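
The dependency hazard can be illustrated with a minimal sketch. The module below hand-writes a two-stream version of a running-sum loop, following the same pattern as the transformation shown under Examples. The module and procedure names are hypothetical, and the hand-written loop is only an assumption about how iterations may be reordered, not the code the compiler actually generates.

```fortran
! Illustrative sketch only: names are hypothetical, and the two-stream loop
! is a hand-written approximation of stream unrolling, not compiler output.
module stream_dependency_demo
  implicit none
contains

  ! Reference loop: a(i) depends on a(i-1), so iterations are not independent.
  subroutine running_sum_serial(a, b, n)
    integer, intent(in)    :: n
    integer, intent(inout) :: a(0:n)
    integer, intent(in)    :: b(n)
    integer :: i
    do i = 1, n
      a(i) = a(i-1) + b(i)
    end do
  end subroutine running_sum_serial

  ! Same loop body split by hand into two streams (assumes n is even).
  ! The second stream reads a(i+m-1) before the first stream has computed
  ! every element up to that index, so the results differ from the serial loop.
  subroutine running_sum_two_streams(a, b, n)
    integer, intent(in)    :: n
    integer, intent(inout) :: a(0:n)
    integer, intent(in)    :: b(n)
    integer :: i, m
    m = n / 2
    do i = 1, m
      a(i)   = a(i-1)   + b(i)
      a(i+m) = a(i+m-1) + b(i+m)
    end do
  end subroutine running_sum_two_streams

end module stream_dependency_demo
```

With n = 4, a initialized to 0, and every element of b set to 1, the serial routine leaves a(0:4) as 0, 1, 2, 3, 4, whereas the two-stream routine leaves 0, 1, 2, 1, 2.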

Syntax

>>---STREAM_UNROLL--+---------------------+--------------------><
                    '-(--unroll_factor--)-'     

unroll_factor
The unroll_factor must be a positive scalar integer constant expression. An unroll_factor of 1 disables loop unrolling. If you do not specify an unroll_factor, the compiler determines the optimal factor to use for stream unrolling.
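
As a rough sketch of the three cases just described, the module below applies the directive with an explicit factor, with a factor of 1, and with no factor. The module and subroutine names are hypothetical. The directive lines are written starting in column 1 on the assumption that this placement is accepted by the compiler; compilers that do not recognize the !IBM* trigger treat these lines as ordinary comments, so all three routines compute the same result.

```fortran
! Sketch only: module and subroutine names are hypothetical.
module stream_unroll_factor_demo
  implicit none
contains

  ! Explicit factor: request stream unrolling with a factor of 4.
  subroutine add_factor_4(a, b, c, n)
    integer, intent(in)  :: n
    integer, intent(in)  :: b(n), c(n)
    integer, intent(out) :: a(n)
    integer :: i
!IBM* stream_unroll(4)
    do i = 1, n
      a(i) = b(i) + c(i)
    end do
  end subroutine add_factor_4

  ! Factor of 1: per the description above, this disables unrolling.
  subroutine add_factor_1(a, b, c, n)
    integer, intent(in)  :: n
    integer, intent(in)  :: b(n), c(n)
    integer, intent(out) :: a(n)
    integer :: i
!IBM* stream_unroll(1)
    do i = 1, n
      a(i) = b(i) + c(i)
    end do
  end subroutine add_factor_1

  ! No factor: the compiler chooses the factor itself.
  subroutine add_factor_auto(a, b, c, n)
    integer, intent(in)  :: n
    integer, intent(in)  :: b(n), c(n)
    integer, intent(out) :: a(n)
    integer :: i
!IBM* stream_unroll
    do i = 1, n
      a(i) = b(i) + c(i)
    end do
  end subroutine add_factor_auto

end module stream_unroll_factor_demo
```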

Rules

You must specify one of the following compiler options to enable loop unrolling:
  • -O3 or higher optimization level
  • -qhot compiler option
  • -qsmp compiler option
Note that if the -qstrict option is in effect, no stream unrolling will occur. If you want to enable stream unrolling with the -qhot option alone, you must also specify -qstrict=none.

The STREAM_UNROLL directive must immediately precede a DO loop.

You must not specify the STREAM_UNROLL directive more than once, or combine the directive with UNROLL, NOUNROLL, UNROLL_AND_FUSE, or NOUNROLL_AND_FUSE directives for the same DO construct.

You must not specify the STREAM_UNROLL directive for a DO WHILE loop or an infinite DO loop.

Examples

The following is an example of how STREAM_UNROLL can increase performance.

      integer, dimension(1000) :: a, b, c
      integer i, m, n

      n = size(a)   ! iterate over the whole array
      b = 1         ! sample input values
      c = 2

!IBM* stream_unroll(4)
      do i = 1, n
        a(i) = b(i) + c(i)
      enddo
      end

An unroll factor of 4 reduces the number of iterations from n to n/4. Assuming that n is a multiple of 4, the loop is transformed as follows:

      m = n/4
      do i = 1, m
        a(i) = b(i) + c(i)
        a(i+m) = b(i+m) + c(i+m)
        a(i+2*m) = b(i+2*m) + c(i+2*m)
        a(i+3*m) = b(i+3*m) + c(i+3*m)
      enddo

The increased number of read and store operations in each iteration is distributed among a number of streams determined by the compiler, which reduces computation time and improves performance.
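
The transformed loop above covers every element only when n is a multiple of 4. The following sketch writes the same four-stream pattern by hand and adds a remainder loop for the leftover elements. The module and subroutine names are hypothetical, and the remainder handling is an assumption made for illustration; it is not a description of the code the compiler generates.

```fortran
! Hand-written sketch of a four-stream loop; names are hypothetical and the
! remainder loop is an illustrative assumption, not compiler output.
module stream_unroll_by_hand
  implicit none
contains

  subroutine add_four_streams(a, b, c, n)
    integer, intent(in)  :: n
    integer, intent(in)  :: b(n), c(n)
    integer, intent(out) :: a(n)
    integer :: i, m

    m = n / 4                 ! length of each of the four streams

    ! Each iteration touches one element from each stream.
    do i = 1, m
      a(i)       = b(i)       + c(i)
      a(i + m)   = b(i + m)   + c(i + m)
      a(i + 2*m) = b(i + 2*m) + c(i + 2*m)
      a(i + 3*m) = b(i + 3*m) + c(i + 3*m)
    end do

    ! Remainder: the last mod(n, 4) elements are not covered by the streams.
    do i = 4*m + 1, n
      a(i) = b(i) + c(i)
    end do
  end subroutine add_four_streams

end module stream_unroll_by_hand
```

When n is smaller than 4, m is 0, the first loop does not execute, and the remainder loop processes all elements.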

Related information