Purpose
Enables parallelization of program code.
Syntax
.-nosmp-------------------------------------------------------.
>>- -q--+-smp--+----------------------------------------------------+-+-><
| .-:-------------------------------------------. |
| | .-nostackcheck----------------------------. | |
| | +-ostls-----------------------------------+ | |
| | +-opt-------------------------------------+ | |
| | +-norec_locks-----------------------------+ | |
| | +-noomp-----------------------------------+ | |
| | +-nonested_par----------------------------+ | |
| V +-auto------------------------------------+ | |
'-=----+-omp-------------------------------------+-+-'
+-noostls---------------------------------+
+-nested_par------------------------------+
+-noauto----------------------------------+
+-noopt-----------------------------------+
+-rec_locks-------------------------------+
| .-auto-------------------. |
+-schedule--=--+-runtime----------------+-+
| '-+-affinity-+--+------+-' |
| +-dynamic--+ '-=--n-' |
| +-guided---+ |
| '-static---' |
+-stackcheck------------------------------+
'-threshold--+------+---------------------'
'-=--n-'
Defaults
-qnosmp. Code is produced for a uniprocessor machine.
Parameters
- auto | noauto
- Enables or disables automatic parallelization and optimization of program code. By default (auto), the compiler tries to parallelize explicitly coded DO loops as well as the loops that it generates to process array operations. When noauto is in effect, only program code explicitly parallelized with OpenMP directives is optimized. noauto is implied if you specify -qsmp=omp or -qsmp=noopt.
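As an illustration only (the module and routine names are hypothetical), the first loop below has no directives. It is therefore only a candidate for automatic parallelization when auto is in effect, and whether the compiler actually parallelizes it depends on its own analysis. The second loop is explicitly parallelized with an OpenMP directive, so it is also parallelized under -qsmp=omp, which implies noauto. When OpenMP parallelization is not enabled, the directive lines are treated as comments and both loops run serially.

```fortran
! Hypothetical module contrasting a loop left to automatic
! parallelization with one parallelized by an explicit OpenMP directive.
module scale_demo
  implicit none
contains
  ! No directives: only automatic parallelization (auto) could
  ! parallelize this loop; under noauto it runs serially.
  subroutine scale_auto(x, factor)
    real, intent(inout) :: x(:)
    real, intent(in)    :: factor
    integer :: i
    do i = 1, size(x)
      x(i) = x(i) * factor
    end do
  end subroutine scale_auto

  ! Explicit OpenMP directive: parallelized under -qsmp=omp even though
  ! omp implies noauto. Without OpenMP support the directive is a comment.
  subroutine scale_omp(x, factor)
    real, intent(inout) :: x(:)
    real, intent(in)    :: factor
    integer :: i
    !$omp parallel do
    do i = 1, size(x)
      x(i) = x(i) * factor
    end do
    !$omp end parallel do
  end subroutine scale_omp
end module scale_demo
```

Both routines compute the same result. Only the way the iterations are distributed among threads can differ.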
- nested_par | nonested_par
- By default, the compiler serializes a nested parallel construct.
When nested_par is in effect, the compiler parallelizes prescriptive
nested parallel constructs. This includes not only the loop constructs
that are nested within a scoping unit but also parallel constructs
in subprograms that are referenced (directly or indirectly) from within
other parallel constructs. This suboption has no effect on loops that are automatically parallelized; for those, at most one loop in a loop nest (within a scoping unit) is parallelized.
The setting of the omp_set_nested routine or of the OMP_NESTED environment variable overrides the setting of the -qsmp=nested_par | nonested_par option.
Use this suboption with caution. Depending on the number of threads available and the amount of work in an outer loop, inner loops might be executed sequentially even if this option is in effect, and the parallelization overhead might not be offset by performance gains.
Note: The -qsmp=nested_par | nonested_par option has
been deprecated and might be removed in a future release. Use the OMP_NESTED environment
variable or the omp_set_nested routine instead.
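The sketch below uses hypothetical module and routine names. It shows the kind of nesting that this suboption and its OMP_NESTED and omp_set_nested replacements control. The outer parallel loop calls fill_row, which contains its own parallel loop, so the inner construct runs inside the dynamic extent of the outer one. With nesting disabled (the default), the inner construct is executed by a team of one thread. With nesting enabled, the runtime may create an inner team. The computed values are the same either way.

```fortran
! Hypothetical module: a parallel construct nested dynamically inside
! another one through a subroutine call.
module nested_demo
  implicit none
contains
  ! Inner parallel construct. When nested parallelism is disabled, it is
  ! executed by a team consisting of the encountering thread only.
  subroutine fill_row(row, base)
    integer, intent(out) :: row(:)
    integer, intent(in)  :: base
    integer :: j
    !$omp parallel do
    do j = 1, size(row)
      row(j) = base + j
    end do
    !$omp end parallel do
  end subroutine fill_row

  ! Outer parallel construct: each iteration calls fill_row, so the inner
  ! construct is nested within the dynamic extent of this one.
  subroutine fill_matrix(m)
    integer, intent(out) :: m(:, :)
    integer :: i
    !$omp parallel do
    do i = 1, size(m, 1)
      call fill_row(m(i, :), 100 * i)
    end do
    !$omp end parallel do
  end subroutine fill_matrix
end module nested_demo
```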
- omp | noomp
- Enforces or relaxes strict compliance with the
OpenMP standard. When noomp is in effect, auto is
implied. When omp is in effect, noauto is implied
and only OpenMP parallelization directives are recognized. The compiler
issues warning messages if your code contains any language constructs
that do not conform to the OpenMP API.
Note: The -qsmp=omp option
must be used to enable OpenMP parallelization.
Specifying omp also has the following effects:
- Automatic parallelization is disabled.
- All previously recognized directive triggers are ignored. The
only recognized directive trigger is $OMP. However, you can specify
additional triggers on subsequent -qdirective options.
- The -qcclines compiler option is enabled.
- When the C preprocessor is invoked, the _OPENMP C preprocessor macro is defined with a value that identifies the latest OpenMP API specification that XL Fortran supports. This macro is useful for conditional compilation. See Conditional Compilation for more information.
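The following is a minimal sketch of OpenMP conditional compilation; the module and function names are hypothetical. Lines that begin with the !$ sentinel are compiled only when OpenMP is enabled. With XL Fortran, that is the case under -qsmp=omp, because omp enables -qcclines. Otherwise these lines are comments and the serial fallback value is kept.

```fortran
! Hypothetical module using OpenMP conditional compilation lines.
! Lines starting with the !$ sentinel are compiled only when OpenMP
! is enabled; otherwise they are ordinary comments.
module thread_info
  implicit none
contains
  integer function max_threads()
    !$ use omp_lib, only: omp_get_max_threads
    max_threads = 1                          ! serial fallback
    !$ max_threads = omp_get_max_threads()   ! OpenMP build only
  end function max_threads
end module thread_info
```

In a source file that is passed through the C preprocessor, the same choice could instead be made with an #ifdef _OPENMP block.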
- opt | noopt
- Enables or disables optimization of parallelized program code.
When noopt is in effect, the compiler performs only the minimum optimization that is required to parallelize the code. This is useful for debugging: -qsmp enables the -O2 and -qhot options by default, and these can move some variables into registers that the debugger cannot access. If both -qsmp=noopt and -g are specified, these variables remain visible to the debugger.
- ostls | noostls
- Enables the use of thread-local storage (TLS) provided by the operating system to implement threadprivate data. Use the noostls suboption to select the non-TLS implementation of threadprivate data instead. The noostls suboption is provided for compatibility with earlier versions.
Note: To use the ostls suboption, your operating system must support TLS to implement OpenMP threadprivate data. Use noostls to disable OS-level TLS if your operating system does not support it.
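The following sketch uses hypothetical names. It declares a module variable THREADPRIVATE so that each thread accumulates into its own copy. The copies are then combined in a critical section. Whether those copies are implemented with OS-provided TLS (ostls) or with the older non-TLS mechanism (noostls) is a compile-time choice, and the source does not change.

```fortran
! Hypothetical module: partial is THREADPRIVATE, so each thread has its
! own copy (backed by OS-level TLS when ostls is in effect).
module tp_demo
  implicit none
  integer :: partial = 0
  !$omp threadprivate(partial)
contains
  ! Sums 1..n: each thread accumulates into its private copy of partial,
  ! then adds that copy to the shared accumulator in a critical section.
  integer function sum_to(n) result(total)
    integer, intent(in) :: n
    integer :: i, acc
    acc = 0
    !$omp parallel
    partial = 0
    !$omp do
    do i = 1, n
      partial = partial + i
    end do
    !$omp end do
    !$omp critical
    acc = acc + partial
    !$omp end critical
    !$omp end parallel
    total = acc
  end function sum_to
end module tp_demo
```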
- rec_locks | norec_locks
- Determines whether recursive locks are used to
avoid problems associated with CRITICAL constructs. When rec_locks is
in effect, nested critical sections will not cause a deadlock; a thread can enter a CRITICAL construct from within
the dynamic extent of another CRITICAL construct that has the same
name. Note that the rec_locks suboption specifies behavior
for critical constructs that is inconsistent with the OpenMP API.
- schedule
- Specifies the scheduling algorithm and, except for auto, the chunk size (n) used for loops to which no scheduling algorithm has been explicitly assigned in the source code. The suboptions of schedule are as follows:
- affinity[=n]
- The iterations of a loop are initially divided into number_of_threads partitions, each containing ceiling(number_of_iterations/number_of_threads) iterations. Each partition is initially assigned to a thread and is then further subdivided into chunks that each contain n iterations.
If n is not specified, then the chunks consist of ceiling(number_of_iterations_left_in_partition /
2) loop iterations.
When a thread becomes free, it takes the next
chunk from its initially assigned partition. If there are no more
chunks in that partition, then the thread takes the next available
chunk from a partition initially assigned to another thread.
The work in a partition initially assigned to a sleeping thread will be completed by threads that are active.
The affinity scheduling
type is not part of the OpenMP API specification.
Note: This suboption has been deprecated. For similar functionality, you can use the OMP_SCHEDULE environment variable with the dynamic schedule type.
- auto
- Scheduling of the loop iterations is delegated to the compiler and runtime system. They can choose any valid mapping of iterations to threads (including any valid schedule type), and the mapping can differ from loop to loop. Do not specify a chunk size (n).
- dynamic[=n]
- The iterations of a loop are divided into chunks that contain n iterations each. If n is not specified, each chunk contains one iteration.
Active threads are assigned these chunks on
a "first-come, first-do" basis. Chunks of the remaining work are
assigned to available threads until all work has been assigned.
- guided[=n]
- The iterations of a loop are divided into progressively smaller
chunks until a minimum chunk size of n loop iterations is reached.
If n is not specified, the default value for n is
1 iteration.
Active threads are assigned chunks on a "first-come,
first-do" basis. The first chunk contains ceiling(number_of_iterations/number_of_threads)
iterations. Subsequent chunks consist of ceiling(number_of_iterations_left
/ number_of_threads) iterations.
- runtime
- Specifies that the chunking algorithm will be determined at run
time.
- static[=n]
- The iterations of a loop are divided into chunks containing n iterations
each. Each thread is assigned chunks in a "round-robin" fashion.
This is known as block cyclic scheduling. If the value of n is
1, then the scheduling type is specifically referred to as cyclic
scheduling.
If n is not specified, the chunks contain floor(number_of_iterations/number_of_threads) iterations, and the first mod(number_of_iterations, number_of_threads) chunks contain one additional iteration. Each thread is assigned a separate chunk. This is known as block scheduling.
If a thread is asleep and it has been assigned work, it is awakened so that it can complete its work.
- n
- Must be an integer of value 1 or greater.
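To make the chunking rules above concrete, the following module (its name and routines are hypothetical) reproduces three of the documented formulas: static scheduling without n, guided scheduling, and affinity scheduling without n within a single partition. It models the descriptions only, assuming they are applied literally. It does not call the XL Fortran SMP runtime, and the runtime's actual chunk boundaries are not guaranteed to match. Callers must pass a chunks array that is large enough for the result.

```fortran
! Hypothetical module modelling the documented chunk sizes. It only
! evaluates the formulas; it does not use the XL SMP runtime.
module chunk_models
  implicit none
contains
  ! static with no n: floor(niter/nthreads) iterations per chunk, and the
  ! first mod(niter, nthreads) chunks get one additional iteration.
  subroutine static_block(niter, nthreads, sizes)
    integer, intent(in)  :: niter, nthreads
    integer, intent(out) :: sizes(nthreads)
    integer :: t
    do t = 1, nthreads
      sizes(t) = niter / nthreads
      if (t <= mod(niter, nthreads)) sizes(t) = sizes(t) + 1
    end do
  end subroutine static_block

  ! guided[=n]: each chunk is ceiling(left/nthreads) iterations, but at
  ! least nmin, and never more than the iterations that are left.
  subroutine guided_chunks(niter, nthreads, nmin, chunks, nchunks)
    integer, intent(in)  :: niter, nthreads, nmin
    integer, intent(out) :: chunks(:), nchunks
    integer :: left, c
    left = niter
    nchunks = 0
    do while (left > 0)
      c = max((left + nthreads - 1) / nthreads, nmin)
      c = min(c, left)
      nchunks = nchunks + 1
      chunks(nchunks) = c
      left = left - c
    end do
  end subroutine guided_chunks

  ! affinity with no n: within one partition, each chunk is
  ! ceiling(iterations_left_in_partition / 2) iterations.
  subroutine affinity_partition(psize, chunks, nchunks)
    integer, intent(in)  :: psize
    integer, intent(out) :: chunks(:), nchunks
    integer :: left, c
    left = psize
    nchunks = 0
    do while (left > 0)
      c = (left + 1) / 2
      nchunks = nchunks + 1
      chunks(nchunks) = c
      left = left - c
    end do
  end subroutine affinity_partition
end module chunk_models
```

With these rules, static scheduling splits 10 iterations on 4 threads into chunks of 3, 3, 2, and 2 iterations. Guided scheduling with n = 1 splits 100 iterations on 4 threads into chunks of 25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, and 1. An affinity partition of 25 iterations is consumed in chunks of 13, 6, 3, 2, and 1.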
Specifying schedule with
no suboption is equivalent to schedule=auto.
For more information on chunking algorithms and
SCHEDULE, refer to Directives.
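The following illustration uses hypothetical names and assumes the description of this suboption is applied as written. The first loop below has no SCHEDULE clause, so the algorithm selected with -qsmp=schedule applies to it (schedule=auto unless you specify otherwise). The second loop assigns dynamic scheduling with a chunk size of 4 in the source, so -qsmp=schedule does not change it.

```fortran
! Hypothetical module: one loop without a SCHEDULE clause and one with
! an explicitly assigned schedule.
module schedule_demo
  implicit none
contains
  subroutine squares(x)
    integer, intent(out) :: x(:)
    integer :: i
    ! No SCHEDULE clause: the -qsmp=schedule setting applies.
    !$omp parallel do
    do i = 1, size(x)
      x(i) = i * i
    end do
    !$omp end parallel do
  end subroutine squares

  subroutine cubes(x)
    integer, intent(out) :: x(:)
    integer :: i
    ! Explicit schedule: dynamic chunks of 4 iterations, regardless of
    ! the -qsmp=schedule setting.
    !$omp parallel do schedule(dynamic, 4)
    do i = 1, size(x)
      x(i) = i * i * i
    end do
    !$omp end parallel do
  end subroutine cubes
end module schedule_demo
```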
- stackcheck | nostackcheck
- Causes the compiler to check for stack overflow by slave threads
at run time, and issue a warning if the remaining stack size is less
than the number of bytes specified by the stackcheck option
of the XLSMPOPTS environment variable. This suboption is intended
for debugging purposes, and only takes effect when XLSMPOPTS=stackcheck is
also set; see XLSMPOPTS for
more information.
- threshold[=n]
- When -qsmp=auto is in effect, controls the amount of automatic
loop parallelization that occurs. The value of n represents
the minimum amount of work required in a loop in order for it to be
parallelized. Currently, the calculation of "work" is weighted heavily
by the number of iterations in the loop. In general, the higher the
value specified for n, the fewer loops are parallelized. Specifying
a value of 0 instructs the compiler to parallelize all auto-parallelizable
loops, whether or not it is profitable to do so. Specifying a value
of 100 instructs the compiler to parallelize only those auto-parallelizable
loops that it deems profitable. Specifying a value greater than 100 causes more loops to be serialized.
- n
- Must be an integer of 0 or greater.
If you specify threshold with no suboption,
the program uses a default value of 100.
Specifying -qsmp without suboptions is equivalent to:
-qsmp=auto:opt:noomp:norec_locks:nonested_par:schedule=auto:nostackcheck:threshold=100:ostls
Usage
- Specifying the omp suboption implies noauto unless you also specify auto. To apply automatic parallelization to OpenMP-compliant applications as well, specify -qsmp=omp:auto.
- When -qsmp is in effect, the compiler
recognizes all directives with the trigger constants SMP$, $OMP, and
IBMP, unless you specify the omp suboption. If you specify omp and
want the compiler to recognize directives specified with the other
triggers, you can use the -qdirective option to do so.
- Use -qsmp with the _r-suffixed invocation commands so that all of the threadsafe components are linked in automatically. You can use -qsmp with the non-_r-suffixed invocation commands, but you are then responsible for linking in the appropriate components. If you use -qsmp to compile any source file in a program, you must also specify -qsmp at link time, unless you link by using the ld command.
- If you use the f77 or fort77 command
with the -qsmp option to compile programs, specify -qnosave to
make the default storage class automatic, and specify -qthreaded to
tell the compiler to generate threadsafe code.
- Object files generated with the -qsmp=opt option can be
linked with object files generated with -qsmp=noopt. The visibility
within the debugger of the variables in each object file will not
be affected by linking.
- Specifying -qsmp implicitly sets -O2. The -qsmp option overrides -qnooptimize, but does not override -O3, -O4, or -O5. When debugging parallelized code, you can disable its optimization by specifying -qsmp=noopt.
- The -qsmp=noopt suboption overrides performance optimization
options anywhere on the command line unless -qsmp appears after -qsmp=noopt.
For example, -qsmp=noopt -O3 is equivalent to -qsmp=noopt,
while -qsmp=noopt -O3 -qsmp is equivalent to -qsmp -O3.
Examples
In the following example, you should specify -qsmp=rec_locks to avoid a deadlock caused by nested critical constructs.
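program t
  integer i, a, b
  a = 0
  b = 0
!smp$ parallel do
  do i = 1, 10
!smp$ critical
    a = a + 1
    ! The next unnamed CRITICAL construct is nested inside another one
    ! with the same (unnamed) name; it deadlocks without -qsmp=rec_locks.
!smp$ critical
    b = b + 1
!smp$ end critical
!smp$ end critical
  enddo
  ! Both counters end up as 10 when no deadlock occurs.
  print *, a, b
end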
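You may prefer strict OpenMP conformance to keeping both constructs unnamed. One possible alternative, sketched here on the assumption that a and b only need separate protection, is to give the nested constructs different names. OpenMP forbids nesting a critical region inside another critical region with the same name, but differently named critical regions can be nested, so -qsmp=rec_locks is not needed. The module, routine, and construct names are hypothetical, and the sketch uses the $OMP trigger rather than SMP$.

```fortran
! Hypothetical OpenMP-conforming variant: the nested critical constructs
! have different names, so no recursive lock (-qsmp=rec_locks) is needed.
module counters
  implicit none
contains
  subroutine count_both(n, a, b)
    integer, intent(in)  :: n
    integer, intent(out) :: a, b
    integer :: i
    a = 0
    b = 0
    !$omp parallel do
    do i = 1, n
      !$omp critical (update_a)
      a = a + 1
      !$omp critical (update_b)
      b = b + 1
      !$omp end critical (update_b)
      !$omp end critical (update_a)
    end do
    !$omp end parallel do
  end subroutine count_both
end module counters
```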