LI77084: -QHOT AT LEVEL=2 MAY BE SLOWER THAN AT LEVEL=1

Fixes are available

APAR status

Closed as suggestion for future release.

Error description

In general, -qhot=level=2 is available to the user to perform
more aggressive loop transformations through the polyhedral
framework. However, the resulting performance is not guaranteed
to be as good as or better than that of -qhot=level=1.

The following example shows a situation in which code
compiled with -qhot=level=2 may run slower than the same code
compiled with -qhot=level=1:

      ...
      ...
      do j=1,n
      do i=1,n
      dr(i,j)=ar(i,j)+br(i,j)*cr(i,j)-bi(i,j)*ci(i,j)
      di(i,j)=ai(i,j)+br(i,j)*ci(i,j)+bi(i,j)*cr(i,j)
      end do
      end do

Because the accesses to all of these arrays are stride-1 and
there are no loop-carried dependencies, the XL compiler can
easily vectorize the inner loop.
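
To compare the two levels, the loop nest above can be wrapped in a small stand-alone driver. The following is only a sketch under stated assumptions: the array names come from the example, but the program name, the problem size n, the input values, and the timing code are hypothetical and are not part of the client's code. The arithmetic is a complex multiply-add, d = a + b*c, split into real and imaginary parts.

```fortran
! Hypothetical driver for the loop nest in this APAR.
! Assumptions (not from the original report): program name,
! problem size n, constant input values, and the timing code.
! Possible builds, using the options named in this APAR
! (the invocation command depends on the installation):
!   <xlf invocation> -O3 -qhot=level=1 kernel.f90
!   <xlf invocation> -O3 -qhot=level=2 kernel.f90
!   <xlf invocation> -O3 -qhot=level=2 -qsimd=noauto kernel.f90
program qhot_kernel
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1024            ! assumed problem size
  real(dp), allocatable :: ar(:,:), ai(:,:), br(:,:), bi(:,:)
  real(dp), allocatable :: cr(:,:), ci(:,:), dr(:,:), di(:,:)
  integer :: i, j
  integer :: t0, t1, rate

  allocate(ar(n,n), ai(n,n), br(n,n), bi(n,n))
  allocate(cr(n,n), ci(n,n), dr(n,n), di(n,n))

  ! Constant inputs so the result can be checked by hand:
  ! dr = 1 + 3*5 - 4*6 = -8, di = 2 + 3*6 + 4*5 = 40.
  ar = 1.0_dp
  ai = 2.0_dp
  br = 3.0_dp
  bi = 4.0_dp
  cr = 5.0_dp
  ci = 6.0_dp

  call system_clock(t0, rate)
  ! Stride-1 accesses in the inner loop, no loop-carried dependencies.
  do j = 1, n
     do i = 1, n
        dr(i,j) = ar(i,j) + br(i,j)*cr(i,j) - bi(i,j)*ci(i,j)
        di(i,j) = ai(i,j) + br(i,j)*ci(i,j) + bi(i,j)*cr(i,j)
     end do
  end do
  call system_clock(t1)

  ! Printing the results keeps the loop nest from being removed.
  print *, 'dr(1,1) =', dr(1,1), ' di(n,n) =', di(n,n)
  print *, 'seconds =', real(t1 - t0, dp) / real(rate, dp)
end program qhot_kernel
```

A single pass over arrays of this size runs very briefly, so a real measurement would likely repeat the loop nest many times or use a larger n before comparing the builds.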

Under -qhot=level=1, the compiler iterates through a series of
loop transformations and applies those that are profitable,
while taking care not to block other transformations that are
deemed to be more profitable. In the case above, it identifies
the SIMD opportunity and focuses on vectorizing the inner
loop.

Under -qhot=level=2, the XL compiler's polyhedral framework is
much more aggressive. It focuses solely on maximizing the
performance of the entire loop nest, without taking into
consideration further opportunities that can be exploited.

In the case above, -qhot=level=2 performs several
transformations that are profitable in themselves but that also
prevent the XL compiler's auto-SIMD transformations from
occurring.

If the source is built with the -qsimd=noauto and -qhot=level=2
option combination, the resulting binary may perform better
than one compiled with -qhot=level=1.
Currently, however, code compiled with -qhot=level=2 fails to
recognize the SIMD opportunity that the compiler exploits under
-qhot=level=1, which leads to the performance regression at
-qhot=level=2.

Local fix

For loop-intensive High-Performance Computing workloads, it is
recommended to use -O3 or -O3 -qhot at compile and link time.
By default, -qhot implies -qhot=level=1.

Problem summary

Problem conclusion

Temporary fix

Comments

According to the development team, in a future version of the XL
compilers the runtime performance of the client's code compiled
with -qhot=level=2 will be on par with or better than that of
the same code compiled with -qhot=level=1.

APAR Information

APAR number
LI77084
Reported component name
XL FORTRAN FOR
Reported component ID
5799AH100
Reported release
E10
Status
CLOSED SUG
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2012-11-06
Closed date
2012-11-06
Last modified date
2012-11-06

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels

Business Unit: IBM Infrastructure w/TPS (BU058)
Product: XL Fortran for Blue Gene/Q (SS2MB5)
Platform: Platform Independent (PF025)
Version: ALL VERSIONS
Line of Business: Power (LOB57)

Document Information

Modified date:
06 November 2012
