The ASSERT directive provides the compiler with the characteristics of DO loops that can assist in optimizing source code.
The ASSERT directive takes effect when you specify the -qhot or -qsmp compiler options.
The first noncomment line (not including other directives) following the ASSERT directive must be a DO loop. This line cannot be an infinite DO or DO WHILE loop. The ASSERT directive applies only to the DO loop immediately following the directive, and not to any nested DO loops.
ITERCNT provides an estimate to the optimizer about roughly how many iterations the DO loop will typically run. There is no requirement that the value be accurate; ITERCNT will only affect performance, never correctness.
When NODEPS is specified, the user is explicitly declaring to the compiler that no loop-carried dependencies exist within the DO loop or any procedures invoked from within the DO loop. A loop-carried dependency involves two iterations within a DO loop interfering with one another. Interference occurs in the following situations:
It is possible for two complementary ASSERT directives to apply to any given DO loop. However, an ASSERT directive cannot be followed by a contradicting ASSERT directive for a given DO loop:
!IBM* ASSERT (ITERCNT(10)) !IBM* INDEPENDENT, REDUCTION (A) !IBM* ASSERT (ITERCNT(20)) ! invalid DO I = 1, N A(I) = A(I) * I END DO
In the example above, the ASSERT(ITERCNT(20)) directive contradicts the ASSERT(ITERCNT(10)) directive and is invalid.
The ASSERT directive overrides the -qassert compiler option for the DO loop on which the ASSERT directive is specified.
Example 1:
! An example of the ASSERT directive with NODEPS. PROGRAM EX1 INTEGER A(100) !IBM* ASSERT (NODEPS) DO I = 1, 100 A(I) = A(I) * FNC1(I) END DO END PROGRAM EX1 FUNCTION FNC1(I) FNC1 = I * I END FUNCTION FNC1
Example 2:
! An example of the ASSERT directive with NODEPS and ITERCNT. SUBROUTINE SUB2 (N) INTEGER A(N) !IBM* ASSERT (NODEPS,ITERCNT(100)) DO I = 1, N A(I) = A(I) * FNC2(I) END DO END SUBROUTINE SUB2 FUNCTION FNC2 (I) FNC2 = I * I END FUNCTION FNC2
The BLOCK_LOOP directive allows you to exert greater control over optimizations on a specific DO loop inside a loop nest. Using a technique called blocking, the BLOCK_LOOP directive separates large iteration count DO loops into smaller iteration groups. Execution of these smaller groups can increase the efficiency of cache space use and augment performance.
Applying BLOCK_LOOP to a loop with dependencies, or a loop with alternate entry or exit points will produce unexpected results.
The BLOCK_LOOP directive takes effect only when the -qhot, -qipa, or -qsmp compiler option is specified.
If you do not specify name, blocking occurs on the first DO loop immediately following the BLOCK_LOOP directive.
For loop blocking to occur, a BLOCK_LOOP directive must immediately precede a DO loop.
You must not specify the BLOCK_LOOP directive more than once.
You must not specify the BLOCK_LOOP directive for a DO WHILE loop or an infinite DO loop.
! Loop Tiling for Multi-level Memory Heirarchy INTEGER :: M, N, i, j, k M = 1000 N = 1000 !IBM* BLOCK_LOOP(L3_cache_size, L3_cache_block) do i = 1, N !IBM* LOOPID(L3_cache_block) !IBM* BLOCK_LOOP(L2_cache_size, L2_cache_block) do j = 1, N !IBM* LOOPID(L2_cache_block) do k = 1, M do l = 1, M . . . end do end do end do end do end ! The compiler generated code would be equivalent to: do index1 = 1, M, L3_cache_size do i = 1, N do index2 = index1, min(index1 + L3_cache_size, M), L2_cache_size do j = 1, N do k = index2, min(index2 + L2_cache_size, M) do l = 1, M . . . end do end do end do end do end do end do
When the CNCALL directive is placed before a DO loop, you are explicitly declaring to the compiler that no loop-carried dependencies exist within any procedure called from the DO loop.
This directive only takes effect if you specify either the -qsmp or -qhot compiler option.
The first noncomment line (not including other directives) that is following the CNCALL directive must be a DO loop. This line cannot be an infinite DO or DO WHILE loop. The CNCALL directive applies only to the DO loop that is immediately following the directive and not to any nested DO loops.
When specifying the CNCALL directive, you are explicitly declaring to the compiler that no procedures invoked within the DO loop have any loop-carried dependencies. If the DO loop invokes a procedure, separate iterations of the loop must be able to concurrently call that procedure. The CNCALL directive does not assert that other operations in the loop do not have dependencies, it is only an assertion about procedure references.
A loop-carried dependency occurs when two iterations within a DO loop interfere with one another. See the ASSERT directive for the definition of interference.
! An example of CNCALL where the procedure invoked has ! no loop-carried dependency but the code within the ! DO loop itself has a loop-carried dependency. PROGRAM EX3 INTEGER A(100) !IBM* CNCALL DO I = 1, N A(I) = A(I) * FNC3(I) A(I) = A(I) + A(I-1) ! This has loop-carried dependency END DO END PROGRAM EX3 FUNCTION FNC3 (I) FNC3 = I * I END FUNCTION FNC3
The COLLAPSE directive reduces an entire array dimension to a single element by specifying that only the element in the lower bound of an array dimension is accessible. If you do not specify a lower bound, the default lower bound is one.
Used with discretion, the COLLAPSE directive can facilitate an increase in performance by reducing repetitive memory access associated with multiple-dimension arrays.
where collapse_array is:
where expression_list is a comma separated list of expression.
The COLLAPSE directive must contain at least one array.
The COLLAPSE directive applies only to the scoping unit in which it is specified. The declarations of arrays contained in a COLLAPSE directive must appear in the same scoping unit as the directive. An array that is accessible in a scoping unit by use or host association must not specified in a COLLAPSE directive in that scoping unit.
The lowest value you can specify in expression_list is one. The highest value must not be greater than the number of dimensions in the corresponding array.
A single scoping unit can contain multiple COLLAPSE declarations, though you can only specify an array once for a particular scoping unit.
You can not specify an array in both a COLLAPSE directive and an EQUIVALENCE statement.
You can not use the COLLAPSE directive with arrays that are components of derived types.
If you apply both the COLLAPSE and SUBSCRIPTORDER directives to an array, you must specify the SUBSCRIPTORDER directive first.
The COLLAPSE directive applies to:
Example 1: In the following example, the COLLAPSE directive is applied to the explicit-shape arrays A and B. Referencing A(m,2:100,2:100) and B(m,2:100,2:100) in the inner loops, become A(m,1,1) and B(m,1,1).
!IBM* COLLAPSE(A(2,3),B(2,3)) REAL*8 A(5,100,100), B(5,100,100), c(5,100,100) DO I=1,100 DO J=1,100 DO M=1,5 A(M,J,I) = SIN(C(M,J,I)) B(M,J,I) = COS(C(M,J,I)) END DO DO M=1,5 DO N=1,M C(M,J,I) = C(M,J,I) + A(N,J,I)*B(6-N,J,I) END DO END DO END DO END DO END
For more information on the SUBSCRIPTORDER directive, see SUBSCRIPTORDER
EJECT directs the compiler to start a new full page of the source listing. If there has been no source listing requested, the compiler will ignore this directive.
The EJECT compiler directive can have an inline comment and a label. However, if you specify a statement label, the compiler discards it. Therefore, you must not reference any label on an EJECT directive. An example of using the directive would be placing it before a DO loop that you do not want split across pages in the listing. If you send the source listing to a printer, the EJECT directive provides a page break.
The INCLUDE compiler directive inserts a specified statement or a group of statements into a program unit.
You are not required to specify the full path of the desired file, but must specify the file extension if one exists.
name must contain only characters allowable in the XL Fortran character set. See Characters for the character set supported by XL Fortran.
char_literal_constant is a character literal constant.
Conditional include allows you to selectively activate INCLUDE directives within Fortran source during compilation. Specify the files to include using the -qci compiler option.
In fixed source form, the INCLUDE compiler directive must start after column 6, and can have a label.
You can add an inline comment to the INCLUDE line.
An included file can contain any complete Fortran source statements and compiler directives, including other INCLUDE compiler directives. Recursive INCLUDE compiler directives are not allowed. An END statement can be part of the included group. The first and last included lines must not be continuation lines. The statements in the include file are processed with the source form of the including file.
If the SOURCEFORM directive appears in an include file, the source form reverts to that of the including file once processing of the include file is complete. After the inclusion of all groups, the resulting Fortran program must follow all of the Fortran rules for statement order.
For an INCLUDE compiler directive with the left and right parentheses syntax, XL Fortran translates the file name to lowercase unless the -qmixed compiler option is on.
The file system locates the specified filename as follows:
INCLUDE '/u/userid/dc101' ! full absolute file name specified INCLUDE '/u/userid/dc102.inc' ! INCLUDE file name has an extension INCLUDE 'userid/dc103' ! relative path name specified
INCLUDE (ABCdef) ! includes file abcdef
INCLUDE '../Abc' ! includes file Abc from parent directory ! of directory being searched
The INDEPENDENT directive, if used, must precede a DO loop, FORALL statement, or FORALL construct. It specifies that each operation in the FORALL statement or FORALL construct, can be executed in any order without affecting the semantics of the program. It also specifies each iteration of the DO loop, can be executed without affecting the semantics of the program.
This directive only takes effect if you specify either the -qsmp or -qhot compiler option.
The first noncomment line (not including other directives) following the INDEPENDENT directive must be a DO loop, FORALL statement, or the first statement of a FORALL construct. This line cannot be an infinite DO or DO WHILE loop. The INDEPENDENT directive applies only to the DO loop that is immediately following the directive and not to any nested DO loops.
An INDEPENDENT directive can have at most one NEW clause and at most one REDUCTION clause.
If the directive applies to a DO loop, no iteration of the loop can interfere with any other iteration. Interference occurs in the following situations:
If the NEW clause is specified, the directive must apply to a DO loop. The NEW clause modifies the directive and any surrounding INDEPENDENT directives by accepting any assertions made by such directive(s) as true. It does this even if the variables specified in the NEW clause are modified by each iteration of the loop. Variables specified in the NEW clause behave as if they are private to the body of the DO loop. That is, the program is unaffected if these variables (and any variables associated with them) were to become undefined both before and after each iteration of the loop.
Any variable you specify in the NEW clause or REDUCTION clause must not:
For FORALL, no combination of index values affected by the INDEPENDENT directive assigns to an atomic storage unit that is required by another combination. If a DO loop, FORALL statement, or FORALL construct all have the same body and each is preceded by an INDEPENDENT directive, they behave the same way.
The REDUCTION clause asserts that updates to named variables will occur within REDUCTION statements in the INDEPENDENT loop. Furthermore, the intermediate values of the REDUCTION variables are not used within the parallel section, other than in the updates themselves. Thus, the value of the REDUCTION variable after the construct is the result of a reduction tree.
If you specify the REDUCTION clause, the directive must apply to a DO loop. The only reference to a REDUCTION variable in an INDEPENDENT DO loop must be within a reduction statement.
A REDUCTION variable must be of intrinsic type, but must not be of type character. A REDUCTION variable must not be an allocatable array.
A REDUCTION variable must not occur in:
A REDUCTION statement can have one of the following forms:
>>-reduction_var_ref--=--expr--reduction_op--reduction_var_ref-><
>>-reduction_var_ref--=--reduction_var_ref--reduction_op--expr-><
>>-reduction_var_ref =--reduction_function--(expr,--reduction_var_ref)-><
>>-reduction_var_ref =--reduction_function--(reduction_var_ref,--expr)-><
where:
The following rules apply to REDUCTION statements:
Example 1:
INTEGER A(10),B(10,12),F !IBM* INDEPENDENT ! The NEW clause cannot be FORALL (I=1:9:2) A(I)=A(I+1) ! specified before a FORALL !IBM* INDEPENDENT, NEW(J) DO M=1,10 J=F(M) ! 'J' is used as a scratch A(M)=J*J ! variable in the loop !IBM* INDEPENDENT, NEW(N) DO N=1,12 ! The first executable statement B(M,N)=M+N*N ! following the INDEPENDENT must END DO ! be either a DO or FORALL END DO END
Example 2:
X=0 !IBM* INDEPENDENT, REDUCTION(X) DO J = 1, M X = X + J**2 END DO
Example 3:
INTEGER A(100), B(100, 100) !IBM* INDEPENDENT, REDUCTION(A), NEW(J) ! Example showing an array used DO I=1,100 ! for a reduction variable DO J=1, 100 A(I)=A(I)+B(J, I) END DO END DO
The #line directive associates code that is created by cpp or any other Fortran source code generator with input code created by the programmer. Because the preprocessor may cause lines of code to be inserted or deleted, the #line directive can be useful in error reporting and debugging because it identifies which lines in the original source caused the preprocessor to generate the corresponding lines in the intermediate file.
The #line directive is a noncomment directive and follows the syntax rules for this type of directive.
The #line directive follows the same rules as other noncomment directives, with the following exceptions:
The #line directive indicates the origin of all code following the directive in the current file. Another #line directive will override a previous one.
If you supply a filename, the subsequent code in the current file will be as if it originated from that filename. If you omit the filename, and no previous #line directive with a specified filename exists in the current file, the code in the current file is treated as if it originated from the current file at the line number specified. If a previous #line directive with a specified filename does exist in the current file, the filename from the previous directive is used.
line_number indicates the position, in the appropriate file, of the line of code following the directive. Subsequent lines in that file are assumed to have a one to one correspondence with subsequent lines in the source file until another #line directive is specified or the file ends.
When XL Fortran invokes cpp for a file, the preprocessor will emit #line directives unless you also specify the -d option.
The file test.F contains:
! File test.F, Line 1 #include "test.h" PRINT*, "test.F Line 3" ... PRINT*, "test.F Line 6" #include "test.h" PRINT*, "test.F Line 8" END
The file test.h contains:
! File test.h line 1 RRINT*,1 ! Syntax Error PRINT*,2
After the C preprocessor processes the file test.F with the default options:
#line 1 "test.F" ! File test.F, Line 1 #line 1 "test.h" ! File test.h Line 1 RRINT*,1 ! Syntax Error PRINT*,2 #line 3 "test.F" PRINT*, "test.F Line 3" ... #line 6 PRINT*, "test.F Line 6" #line 1 "test.h" ! File test.h Line 1 RRINT*,1 ! Syntax Error PRINT*,2 #line 8 "test.F" PRINT*, "test.F Line 8" END
The compiler displays the following messages after it processes the file that is created by the C preprocessor:
2 2 |RRINT*,1 !Syntax error ......a................ a - "test.h", line 2.6: 1515-019 (S) Syntax is incorrect. 4 2 |RRINT*,1 !Syntax error ......a................ a - "test.h", line 2.6: 1515-019 (S) Syntax is incorrect.
The LOOPID directive allows you to assign a unique identifier to loop within a scoping unit. You can use the identifier to direct loop transformations. The –qreport compiler option can use the identifier you create to provide reports on loop transformations.
The LOOPID directive must immediately precede a BLOCK_LOOP directive or DO construct.
You must not specify a LOOPID directive more than once for a given loop.
You must not specify a LOOPID directive for DO constructs without control statements, DO WHILE constructs, or an infinite DO.
The MEM_DELAY directive specifies how many delay cycles there will be for specific loads, these specific loads are a delinquent loads.
A delinquent load has a long memory access latency due to cache misses.
When you specify which load is a delinquent load with MEM_DELAY the compiler may take that information and carry out optimizations such as data prefetch.
The MEM_DELAY directive is placed immediately before a statement which contains a specified memory reference.
cycles must be a compile time constant, typically either L1 miss latency or L2 miss latency.
program mem1 integer::i,n integer::a(20),b(400) n=20 do i=1,n !IBM* mem_delay(b(n*i),10) a(i)=b(n*i) end do; end
Use the NEW directive to specify which variables should be local in a PARALLEL DO loop or a PARALLEL SECTIONS construct. This directive performs the same function as the PRIVATE clause of the PARALLEL DO directive and PARALLEL SECTIONS directive.
The NEW directive only takes effect if you specify the -qsmp compiler option.
The NEW directive must immediately follow either a PARALLEL DO directive or a PARALLEL SECTIONS directive.
If you specify the NEW directive, you must specify the corresponding PARALLEL DO or PARALLEL SECTIONS directive with no clauses.
If the NEW directive follows the PARALLEL DO directive, the first noncomment line (not including other directives) following the NEW directive must be a DO loop. This line cannot be an infinite DO or DO WHILE loop.
A variable name in the named_variable_list of the NEW directive has the same restrictions as a variable name appearing in the PRIVATE clause of the PARALLEL DO directive or a PRIVATE clause of the PARALLEL SECTIONS directive. See the sections on the PARALLEL DO directive and the PARALLEL SECTIONS constructin the XL Fortran Optimization and Programming Guide.
INTEGER A(10), C(10) REAL B(10) INTEGER FUNC(100) !SMP$ PARALLEL DO !SMP$ NEW I, TMP DO I = 1, 10 TMP = A(I) + COS(B(I)) C(I) = TMP + FUNC(I) END DO
The NOSIMD directive prohibits the compiler from automatically generating Vector Multimedia eXtension (VMX) instructions in the loop immediately following the directive, or in the FORALL construct.
The first noncomment line (not including other directives) following the NOSIMD directive must be a DO loop, FORALL statement, or the first statement of a FORALL construct. This line cannot be an infinite DO or DO WHILE loop. The NOSIMD directive applies only to the DO loop that is immediately following the directive and does not apply to any nested DO loops.
You can use the NOSIMD directive together with loop optimization and SMP directives.
SUBROUTINE VEC (A, B) REAL*8 A(200), B(200) !IBM* NOSIMD FORALL (N = 1:200), B(N) = B(N) / A(N) END SUBROUTINE
Refer to the -qhot=simd compiler option for information on controlling VMX support for an entire application.
The NOVECTOR directive prohibits the compiler from auto-vectorizing the loop immediately following the directive. Auto-vectorization involves converting certain operations performed in a loop and on successive array elements into a call to a routine that computes several results simultaneously.
The first noncomment line (not including other directives) following the NOVECTOR directive must be a DO loop, FORALL statement, or a FORALL construct. This line cannot be an infinite DO or DO WHILE loop. The NOVECTOR directive applies only to the DO loop, FORALL statement or the FORALL construct that is immediately following the directive and does not apply to any nested DO loops, or nested FORALL construct or statement.
You can use the NOVECTOR directive together with loop optimization and SMP directives.
SUBROUTINE VEC (A, B) REAL*8 A(200), B(200) !IBM* NOVECTOR DO N = 1, 200 B(N) = B(N) / A(N) END DO END SUBROUTINE
Refer to the -qhot=vector compiler option for information on controlling auto-vectorization for an entire application.
The PERMUTATION directive specifies that the elements of each array that is listed in the integer_array_name_list have no repeated values. This directive is useful when you use array elements as subscripts for other array references.
The PERMUTATION directive only takes effect if you specify either the -qsmp or -qhot compiler option.
The first noncomment line (not including other directives) that is following the PERMUTATION directive must be a DO loop. This line cannot be an infinite DO or DO WHILE loop. The PERMUTATION directive applies only to the DO loop that is immediately following the directive, and not to any nested DO loops.
PROGRAM EX3 INTEGER A(100), B(100) !IBM* PERMUTATION (A) DO I = 1, 100 A(I) = I B(A(I)) = B(A(I)) + A(I) END DO END PROGRAM EX3
The @PROCESS directive allows you to specify at the source level that a compiler option affects only an individual compilation unit. The directive can override options that you include in the configuration file, in the default settings, or on the command line. Refer to the XL Fortran Compiler Reference for information on limitations or restrictions for specifying a particular compiler option at the source level.
.-+---+----------------------------. | '-,-' | V | >>-@PROCESS----option--+----------------------+-+-------------->< '-(--suboption_list--)-'
In fixed source form, the @PROCESS directive can start in column 1 or after column 6. In free source form, the @PROCESS directive can start in any column.
You cannot place a statement label or inline comment on the same line as an @PROCESS compiler directive.
By default, any option settings you designate with the @PROCESS compiler directive are effective only for the compilation unit in which the statement appears. If the file has more than one compilation unit, the option returns to the original setting before compilation of the text unit. Trigger constants you specify using the DIRECTIVE option are in effect until the end of the file, or until processing NODIRECTIVE.
The @PROCESS compiler directive must appear before the first statement of a compilation unit. The only exceptions are for SOURCE and NOSOURCE compiler options, which you can specify in @PROCESS directives anywhere within the compilation unit.
See Compiler Option Details in the XL Fortran Compiler Reference for details on compiler options.
You can use the SNAPSHOT directive to specify a safe location where a breakpoint can be set with a debug program, and provide a set of variables that must remain visible to the debug program. The SNAPSHOT directive provides support for the -qsmp compiler option, though you can use it in a non-multi-threaded program.
There can be a small reduction in performance at the point where the SNAPSHOT directive is set, because the variables must be kept in memory for the debug program to access. Variables made visible by the SNAPSHOT directive are read-only. Undefined behavior will occur if these variables are modified through the debugger. Use with discretion.
At high optimization levels, the SNAPSHOT directive does not consistently preserve the contents of variables with a static storage class.
To use the SNAPSHOT directive, you must specify the -qdbg compiler option at compilation.
Example 1: In the following example, the SNAPSHOT directive is used to monitor the value of private variables.
INTEGER :: IDX INTEGER :: OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM INTEGER, ALLOCATABLE :: ARR(:) ! ... !$OMP PARALLEL, PRIVATE(IDX) !$OMP MASTER ALLOCATE(ARR(OMP_GET_NUM_THREADS())) !$OMP END MASTER !$OMP BARRIER IDX = OMP_GET_THREAD_NUM() + 1 !IBM* SNAPSHOT(IDX) ! The PRIVATE variable IDX is made visible ! to the debugger. ARR(IDX) = 2*IDX + 1 !$OMP END PARALLEL
Example 2: In the following example, the SNAPSHOT directive is used to monitor the intermediate values in debugging the program.
SUBROUTINE SHUFFLE(NTH, XDAT) INTEGER, INTENT(IN) :: NTH REAL, INTENT(INOUT) :: XDAT(:) INTEGER :: I_TH, IDX, PART(1), I, J, LB, UB INTEGER :: OMP_GET_THREAD_NUM INTEGER(8) :: Y=1 REAL :: TEMP CALL OMP_SET_NUM_THREADS(NTH) PART = UBOUND(XDAT)/NTH !$OMP PARALLEL, PRIVATE(NUM_TH, I, J, LB, UB, IDX, TEMP), SHARED(XDAT) NUM_TH = OMP_GET_THREAD_NUM() + 1 LB = (NUM_TH - 1)*PART(1) + 1 UB = NUM_TH*PART(1) DO I=LB, UB !$OMP CRITICAL Y = MOD(65539_8*y, 2_8**31) IDX = INT(REAL(Y)/REAL(2_8**31)*(UB - LB) + LB) !SMP$ SNAPSHOT(i, y, idx, num_th, lb, ub) !$OMP END CRITICAL TEMP = XDAT(I) XDAT(I) = XDAT(IDX) XDAT(IDX) = TEMP ENDDO !SMP$ SNAPSHOT(TEMP) ! The user can examine the value ! of the TEMP variable !$OMP END PARALLEL END
See the XL Fortran Compiler Reference for details on the -qdbg compiler option.
The SOURCEFORM compiler directive indicates that all subsequent lines are to be processed in the specified source form until the end of the file is reached or until an @PROCESS directive or another SOURCEFORM directive specifies a different source form.
The SOURCEFORM directive can appear anywhere within a file. An include file is compiled with the source form of the including file. If the SOURCEFORM directive appears in an include file, the source form reverts to that of the including file once processing of the include file is complete.
The SOURCEFORM directive cannot specify a label.
To modify your existing files to Fortran 90 free source form where include files exist:
!CONVERT*SOURCEFORM (FREE(F90))Define your own trigger_constant for this conversion process.
@PROCESS DIRECTIVE(CONVERT*) PROGRAM MAIN ! Main program not yet converted A=1; B=2 INCLUDE 'freeform.f' PRINT *, RESULT ! Reverts to fixed form END
where file freeform.f contains:
!CONVERT* SOURCEFORM(FREE(F90)) RESULT = A + B
The STREAM_UNROLL directive instructs the compiler to apply the combined functionality of software prefetch and loop unrolling to DO loops with a large iteration count. Stream unrolling functionality is available only on POWER4™ platforms or higher, and optimizes DO loops to use multiple streams. You can specify the STREAM_UNROLL directive for both inner and outer DO loops, and the compiler will use an optimal number of streams to perform stream unrolling where applicable. Applying the STREAM_UNROLL directive to a loop with dependencies will produce unexpected results.
You must specify one of the following compiler options to enable loop unrolling:
Note that if the -qstrict option is in effect, no stream unrolling will occur. If you want to enable stream unrolling with the -qhot option alone, you must also specify -qnostrict.
The STREAM_UNROLL directive must immediately precede a DO loop.
You must not specify the STREAM_UNROLL directive more than once, or combine the directive with UNROLL, NOUNROLL, UNROLL_AND_FUSE, or NOUNROLL_AND_FUSE directives for the same DO construct.
You must not specify the STREAM_UNROLL directive for a DO WHILE loop or an infinite DO loop.
The following is an example of how STREAM_UNROLL can increase performance.
integer, dimension(1000) :: a, b, c integer i, m, n !IBM* stream_unroll(4) do i =1, n a(i) = b(i) + c(i) enddo end
An unroll factor reduces the number of iterations from n to n/4, as follows:
m = n/4 do i =1, n/4 a(i) = b(i) + c(i) a(i+m) = b(i+m) + c(i+m) a(i+2*m) = b(i+2*m) + c(i+2*m) a(i+3*m) = b(i+3*m) + c(i+3*m) enddo
The increased number of read and store operations are distributed among a number of streams determined by the compiler, reducing computation time and boosting performance.
The SUBSCRIPTORDER directive rearranges the subscripts of an array. This results in a new array shape, since the directive changes the order of array dimensions in the declaration. All references to the array are correspondingly rearranged to match the new array shape.
Used with discretion, the SUBSCRIPTORDER directive may improve performance by increasing the number of cache hits and the amount of data prefetching. You may have to experiment with this directive until you find the arrangement that yields the most performance benefits. You may find SUBSCRIPTORDER especially useful when porting code originally intended for a non-cached hardware architecture.
In a cached hardware architecture, such as the PowerPC®, an entire cache line of data is often loaded into the processor in order to access each data element. Changing the storage arrangement can be used to ensure that consecutively accessed elements are stored contiguously. This may result in a performance improvement, as there are more element accesses for each cache line referenced. Additionally, contiguous arrays which are consecutively accessed may help to better exploit the processor's prefetching facility.
where subscriptorder_array is:
The SUBSCRIPTORDER directive must appear in a scoping unit preceding all declarations and references to the arrays in the subscriptorder_array list. The directive only applies to that scoping unit and must contain at least one array. If multiple scoping units share an array, then you must apply the SUBSCRIPTORDER directive to each of the applicable scoping units with identical subscript arrangements. Examples of methods of array sharing between scoping units include COMMON statements, USE statements, and subroutine arguments.
The lowest subscript number in a subscriptorder_number list must be 1. The highest number must be equal to the number of dimensions in the corresponding array. Every integer number between these two limits, including the limits, signifies a subscript number prior to rearrangement and must be included exactly once in the list.
You must not apply a SUBSCRIPTORDER directive multiple times to a particular array in a scoping unit.
You must maintain array shape conformance in passing arrays as actual arguments to elemental procedures, if one of the arrays appears in a SUBSCRIPTORDER directive. You must also adjust the actual arguments of the SHAPE, SIZE, LBOUND, and UBOUND inquiry intrinsic procedures and of most transformational intrinsic procedures.
You must manually modify data in input data files and in explicit initializations for arrays that appear in the SUBSCRIPTORDER directive.
On arrays to which the COLLAPSE directive is also applied, the COLLAPSE directive always refers to the pre-subscriptorder dimension numbers.
You must not rearrange the last dimension of an assumed-size array.
Example 1: In the following example, the SUBSCRIPTORDER directive is applied to an explicit-shape array and swaps the subscripts in every reference to the array, without affecting the program output.
!IBM* SUBSCRIPTORDER(A(2,1)) INTEGER COUNT/1/, A(3,2) DO J = 1, 3 DO K = 1, 2 ! Inefficient coding: innermost index is accessing rightmost ! dimension. The subscriptorder directive compensates by ! swapping the subscripts in the array's declaration and ! access statements. ! A(J,K) = COUNT PRINT*, J, K, A(J,K) COUNT = COUNT + 1 END DO END DO
Without the directive above, the array shape is (3,2) and the array elements would be stored in the following order:
A(1,1) A(2,1) A(3,1) A(1,2) A(2,2) A(3,2)
With the directive, the array shape is (2,3) and the array elements are stored in the following order:
A(1,1) A(2,1) A(1,2) A(2,2) A(1,3) A(2,3)
For more information on the COLLAPSE directive, see COLLAPSE
The UNROLL directive instructs the compiler to attempt loop unrolling where applicable. Loop unrolling replicates the body of the DO loop to reduce the number of iterations required to complete the loop.
You can control loop unrolling for an entire file using the -qunroll compiler option. Specifying the directive for a particular DO loop always overrides the compiler option.
>>-+-UNROLL--+---------------------+-+------------------------->< | '-(--unroll_factor--)-' | '-NOUNROLL------------------------'
You must specify one of the following compiler options to enable loop unrolling:
Note that if the -qstrict option is in effect, no loop unrolling will occur. If you want to enable loop unrolling with the -qhot option alone, you must also specify -qnostrict.
The UNROLL directive must immediately precede a DO loop.
You must not specify the UNROLL directive more than once, or combine the directive with NOUNROLL, STREAM_UNROLL, UNROLL_AND_FUSE, or NOUNROLL_AND_FUSE directives for the same DO construct.
You must not specify the UNROLL directive for a DO WHILE loop or an infinite DO loop.
Example 1: In this example, the UNROLL(2) directive is used to tell the compiler that the body of the loop can be replicated so that the work of two iterations is performed in a single iteration. Instead of performing 1000 iterations, if the compiler unrolls the loop, it will only perform 500 iterations.
!IBM* UNROLL(2) DO I = 1, 1000 A(I) = I END DO
If the compiler chooses to unroll the previous loop, the compiler translates the loop so that it is essentially equivalent to the following:
DO I = 1, 1000, 2 A(I) = I A(I+1) = I + 1 END DO
Example 2: In the first DO loop, UNROLL(3) is used. If unrolling is performed, the compiler will unroll the loop so that the work of three iterations is done in a single iteration. In the second DO loop, the compiler determines how to unroll the loop for maximum performance.
PROGRAM GOODUNROLL INTEGER I, X(1000) REAL A, B, C, TEMP, Y(1000) !IBM* UNROLL(3) DO I = 1, 1000 X(I) = X(I) + 1 END DO !IBM* UNROLL DO I = 1, 1000 A = -I B = I + 1 C = I + 2 TEMP = SQRT(B*B - 4*A*C) Y(I) = (-B + TEMP) / (2*A) END DO END PROGRAM GOODUNROLL
The UNROLL_AND_FUSE directive instructs the compiler to attempt a loop unroll and fuse where applicable. Loop unrolling replicates the body of multiple DO loops and combines the necessary iterations into a single unrolled loop. Using a fused loop can minimize the required number of loop iterations, while reducing the frequency of cache misses. Applying the UNROLL_AND_FUSE directive to a loop with dependencies will produce unexpected results.
>>-+-UNROLL_AND_FUSE--+---------------------+-+---------------->< | '-(--unroll_factor--)-' | '-NOUNROLL_AND_FUSE------------------------'
You must specify one of the following compiler options to enable loop unrolling:
Note that if the -qstrict option is in effect, no loop unrolling will occur. If you want to enable loop unrolling with the -qhot option alone, you must also specify -qnostrict.
The UNROLL_AND_FUSE directive must immediately precede a DO loop.
You must not specify the UNROLL_AND_FUSE directive for the innermost DO loop.
You must not specify the UNROLL_AND_FUSE directive more than once, or combine the directive with NOUNROLL_AND_FUSE, NOUNROLL, UNROLL, or STREAM_UNROLL directives for the same DO construct.
You must not specify the UNROLL_AND_FUSE directive for a DO WHILE loop or an infinite DO loop.
Example 1: In the following example, the UNROLL_AND_FUSE directive replicates and fuses the body of the loop. This reduces the number of cache misses for Array B.
INTEGER, DIMENSION(1000, 1000) :: A, B, C !IBM* UNROLL_AND_FUSE(2) DO I = 1, 1000 DO J = 1, 1000 A(J,I) = B(I,J) * C(J,I) END DO END DO END
The DO loop below shows a possible result of applying the UNROLL_AND_FUSE directive.
DO I = 1, 1000, 2 DO J = 1, 1000 A(J,I) = B(I,J) * C(J,I) A(J,I+1) = B(I+1, J) * C(J, I+1) END DO END DO
Example 2: The following example uses multiple UNROLL_AND_FUSE directives:
INTEGER, DIMENSION(1000, 1000) :: A, B, C, D, H !IBM* UNROLL_AND_FUSE(4) DO I = 1, 1000 !IBM* UNROLL_AND_FUSE(2) DO J = 1, 1000 DO k = 1, 1000 A(J,I) = B(I,J) * C(J,I) + D(J,K)*H(I,K) END DO END DO END DO END