LI77084: -QHOT AT LEVEL=2 MAY BE SLOWER THAN AT LEVEL=1
Fixes are available
February 2013 Update for XL C/C++ for Blue Gene/Q, V12.1
May 2013 Update for XL C/C++ for Blue Gene/Q, V12.1
XL C/C++ for Blue Gene/Q Fix Pack 5 (August 2013 Update) for 12.1
XL C/C++ for Blue Gene/Q Fix Pack 8 (May 2014 Update) for 12.1
XL C/C++ for Blue Gene/Q Fix Pack 9 (August 2014 Update) for 12.1
XL C/C++ for Blue Gene/Q Fix Pack 11 (February 2015 Update) for 12.1
XL C/C++ for Blue Gene/Q Fix Pack 12 (May 2015 Update) for 12.1
XL C/C++ for Blue Gene/Q Fix Pack 13 (August 2015 Update) for 12.1
XL C/C++ for Blue Gene/Q Fix Pack 14 (May 2016 Update) for 12.1
Closed as suggestion for future release.
In general, -qhot=level=2 is available to perform more aggressive loop transformations through the polyhedral framework; however, the resulting performance is not guaranteed to match or exceed that of -qhot=level=1. The following example shows a case in which code compiled with -qhot=level=2 may run slower than code compiled with -qhot=level=1:

    ...
    ...
    do j=1,n
      do i=1,n
        dr(i,j)=ar(i,j)+br(i,j)*cr(i,j)-bi(i,j)*ci(i,j)
        di(i,j)=ai(i,j)+br(i,j)*ci(i,j)+bi(i,j)*cr(i,j)
      end do
    end do

Because the accesses to all of these arrays are stride-1 and there are no loop-carried dependencies, the XL compiler can easily vectorize the inner loop.

Under -qhot=level=1, the compiler iterates through a series of loop transformations, applies those that are profitable, and takes care not to block other transformations that are deemed more profitable. In the case above, it identifies the SIMD opportunity and focuses on vectorizing the inner loop.

Under -qhot=level=2, the XL compiler's polyhedral framework is much more aggressive and focuses solely on maximizing the performance of the entire loop nest, without considering further opportunities that other transformations could exploit. In the case above, -qhot=level=2 performs several transformations that are profitable in themselves but that prevent the XL compiler's auto-SIMD transformation from occurring.

If the source is built with the -qsimd=noauto and -qhot=level=2 option combination, the performance of the resulting binary may be better than that of the binary compiled with -qhot=level=1. Currently, a -qhot=level=2 compilation fails to recognize the SIMD opportunity that the compiler exploits in a -qhot=level=1 compilation, which leads to the performance regression at -qhot=level=2.
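To compare the two levels on this kind of loop nest, it can help to wrap it in a small standalone driver. The following is only a sketch under stated assumptions: the program name, the problem size n, the repeat count nrep, the fill values, and the timing and checking code are all hypothetical and do not come from the reported test case. Only the inner loop nest is taken from the example above.

```fortran
! Hypothetical driver for the loop nest shown in this APAR.
! Assumptions (not from the APAR): program name, n, nrep, fill values,
! and the timing and checking code around the loop nest.
program qhot_level_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: n = 1024     ! assumed problem size
  integer, parameter :: nrep = 20    ! assumed repeat count
  real(dp), allocatable :: ar(:,:), ai(:,:), br(:,:), bi(:,:)
  real(dp), allocatable :: cr(:,:), ci(:,:), dr(:,:), di(:,:)
  complex(dp) :: ref
  real(dp) :: err
  integer(i8) :: t0, t1, rate
  integer :: i, j, rep

  allocate(ar(n,n), ai(n,n), br(n,n), bi(n,n))
  allocate(cr(n,n), ci(n,n), dr(n,n), di(n,n))

  ! Fill the inputs with arbitrary, position-dependent values.
  do j = 1, n
    do i = 1, n
      ar(i,j) = 1.0_dp  + 1.0e-3_dp * real(i, dp)
      ai(i,j) = 2.0_dp  + 1.0e-3_dp * real(j, dp)
      br(i,j) = 0.5_dp  + 1.0e-4_dp * real(i + j, dp)
      bi(i,j) = 0.25_dp - 1.0e-4_dp * real(i, dp)
      cr(i,j) = 1.5_dp  - 1.0e-4_dp * real(j, dp)
      ci(i,j) = 0.75_dp + 1.0e-4_dp * real(i - j, dp)
    end do
  end do

  call system_clock(t0, rate)
  do rep = 1, nrep
    ! Perturb one input so the repeats are not trivially identical.
    ar(1,1) = real(rep, dp)
    ! Loop nest from the APAR: d = a + b*c on split real/imaginary parts.
    do j = 1, n
      do i = 1, n
        dr(i,j) = ar(i,j) + br(i,j)*cr(i,j) - bi(i,j)*ci(i,j)
        di(i,j) = ai(i,j) + br(i,j)*ci(i,j) + bi(i,j)*cr(i,j)
      end do
    end do
  end do
  call system_clock(t1)

  ! Spot-check one element against native complex arithmetic.
  ref = cmplx(ar(n,n), ai(n,n), dp) &
      + cmplx(br(n,n), bi(n,n), dp) * cmplx(cr(n,n), ci(n,n), dp)
  err = abs(cmplx(dr(n,n), di(n,n), dp) - ref)
  if (err > 1.0e-12_dp * abs(ref)) then
    print *, 'FAILED: error = ', err
    stop 1
  end if

  print *, 'elapsed seconds: ', real(t1 - t0, dp) / real(rate, dp)
  print *, 'checksum: ', sum(dr) + sum(di)
end program qhot_level_demo
```

One way to use such a driver is to build it three times, with -O3 -qhot=level=1, with -O3 -qhot=level=2, and with -O3 -qhot=level=2 -qsimd=noauto, and compare the reported elapsed times. The compiler invocation to use (for example bgxlf90_r on Blue Gene/Q) is an assumption here and depends on the installation. The timings are only indicative, since a small harness like this is sensitive to problem size and to how the optimizer treats the repeat loop.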
For loop-intensive High-Performance Computing workloads, it is recommended to use -O3 or -O3 -qhot at both compile and link time. By default, -qhot implies -qhot=level=1.
According to the development team, in a future version of the XL compilers the runtime performance of the client-provided code compiled with -qhot=level=2 will be on par with or better than that of the same code compiled with -qhot=level=1.
Reported component name
XL FORTRAN FOR
Reported component ID
Last modified date
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following: