The STREAM_UNROLL directive instructs the compiler to apply the combined functionality of software prefetch and loop unrolling to DO loops with a large iteration count. Stream unrolling functionality optimizes DO loops to use multiple streams. You can specify the STREAM_UNROLL directive for both inner and outer DO loops, and the compiler will use an optimal number of streams to perform stream unrolling where applicable. Applying the STREAM_UNROLL directive to a loop with dependencies will produce unexpected results.
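The dependency warning can be illustrated with a minimal sketch. The program name, array names, and sizes below are assumptions chosen for illustration. The first loop has independent iterations and is a reasonable candidate for the directive. The second is a recurrence in which each iteration reads a value written by the previous one, so the directive is deliberately left off.

```fortran
program stream_unroll_dependency
  implicit none
  integer, parameter :: n = 1000   ! illustrative size (assumption)
  integer :: a(n), b(n), c(n)
  integer :: i

  b = 1
  c = 2

  ! Independent iterations: a(i) depends only on b(i) and c(i),
  ! so splitting the index range into several streams is safe.
  !IBM* stream_unroll(4)
  do i = 1, n
    a(i) = b(i) + c(i)
  end do

  ! Loop-carried dependency: a(i) reads a(i-1), which the previous
  ! iteration wrote. STREAM_UNROLL is intentionally NOT applied here.
  do i = 2, n
    a(i) = a(i-1) + b(i)
  end do

  print *, a(1), a(n)
end program stream_unroll_dependency
```

A compiler that does not recognize the `!IBM*` prefix treats that line as an ordinary comment, so the sketch should still compile and run elsewhere, only without the optimization.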
The STREAM_UNROLL directive must immediately precede a DO loop.
You must not specify the STREAM_UNROLL directive more than once, or combine the directive with UNROLL, NOUNROLL, UNROLL_AND_FUSE, or NOUNROLL_AND_FUSE directives for the same DO construct.
You must not specify the STREAM_UNROLL directive for a DO WHILE loop or an infinite DO loop.
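As a rough sketch of the placement rules above, the following program marks one loop that satisfies them and two loop forms that the rules exclude. All names and sizes are hypothetical. The excluded loops are simply written without the directive.

```fortran
program stream_unroll_placement
  implicit none
  integer, parameter :: n = 1000   ! illustrative size (assumption)
  real :: x(n), y(n)
  integer :: i

  x = 1.0
  y = 0.0

  ! Allowed: a counted DO loop, with the directive on the line
  ! directly before it and specified only once.
  !IBM* stream_unroll(2)
  do i = 1, n
    y(i) = 2.0 * x(i)
  end do

  ! Excluded: a DO WHILE loop, so no directive is given.
  i = 1
  do while (i <= n)
    y(i) = y(i) + x(i)
    i = i + 1
  end do

  ! Excluded: a DO loop without loop control (an infinite DO loop
  ! that relies on EXIT), so no directive is given.
  i = 0
  do
    i = i + 1
    if (i > n) exit
    y(i) = y(i) + 1.0
  end do

  print *, y(1), y(n)
end program stream_unroll_placement
```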
The following is an example of how STREAM_UNROLL can increase performance.
integer, dimension(1000) :: a, b, c
integer i, m, n
!IBM* stream_unroll(4)
do i = 1, n
   a(i) = b(i) + c(i)
enddo
end
An unroll factor of 4 reduces the number of iterations from n to n/4, as follows:
m = n/4
do i = 1, n/4
   a(i) = b(i) + c(i)
   a(i+m) = b(i+m) + c(i+m)
   a(i+2*m) = b(i+2*m) + c(i+2*m)
   a(i+3*m) = b(i+3*m) + c(i+3*m)
enddo
The additional read and store operations in each iteration are distributed among a number of streams determined by the compiler, which reduces computation time.
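The four-stream form can be checked against the original loop with a small self-contained program. This is a sketch under stated assumptions: the program and array names are hypothetical, n is deliberately not a multiple of 4, and the trailing remainder loop is one possible way to handle leftover iterations, not a description of the code the compiler actually generates.

```fortran
program stream_unroll_check
  implicit none
  integer, parameter :: n = 1003   ! deliberately not a multiple of 4 (assumption)
  integer :: a_ref(n), a_str(n), b(n), c(n)
  integer :: i, m

  do i = 1, n
    b(i) = i
    c(i) = 2 * i
  end do

  ! Reference result: the original single-stream loop.
  do i = 1, n
    a_ref(i) = b(i) + c(i)
  end do

  ! Hand-written four-stream version: each iteration touches four
  ! positions that are m elements apart.
  a_str = 0
  m = n / 4
  do i = 1, m
    a_str(i)       = b(i)       + c(i)
    a_str(i + m)   = b(i + m)   + c(i + m)
    a_str(i + 2*m) = b(i + 2*m) + c(i + 2*m)
    a_str(i + 3*m) = b(i + 3*m) + c(i + 3*m)
  end do

  ! Remainder loop (assumption): covers the elements past 4*m
  ! when n is not evenly divisible by 4.
  do i = 4*m + 1, n
    a_str(i) = b(i) + c(i)
  end do

  if (all(a_ref == a_str)) then
    print *, 'results match'
  else
    print *, 'results differ'
  end if
end program stream_unroll_check
```

Because the iterations are independent, the two versions compute the same array. Only the order of memory accesses changes.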