-qpdf1, -qpdf2
Category
Pragma equivalent
None.
Purpose
Tunes optimizations through profile-directed feedback (PDF), where results from sample program execution are used to improve optimization near conditional branches and in frequently executed code sections.
Optimizes an application for a typical usage scenario based on an analysis of how often branches are taken and blocks of code are run.
Syntax
        .-nopdf2-----------------------------.
        +-nopdf1-----------------------------+
>>- -q--+-pdf1--+--------------------------+-+-----------------><
        |       +-=--pdfname--=--file_path-+ |
        |       +-=--unique----------------+ |
        |       +-=--nounique--------------+ |
        |       +-=--exename---------------+ |
        |       +-=--defname---------------+ |
        |       '-=--level--=--+-0-+-------' |
        |                      +-1-+         |
        |                      '-2-'         |
        '-pdf2--+--------------------------+-'
                +-=--pdfname--=--file_path-+
                +-=--exename---------------+
                '-=--defname---------------'
Defaults
-qnopdf1, -qnopdf2
Parameters
- defname
- Reverts a PDF file to its default file name.
- exename
- Specifies the name of the generated PDF file according to the output file name specified by the -o option. For example, you can use -qpdf1=exename -o func func.c to generate a PDF file called .func_pdf.
- level=0 | 1 | 2
- Specifies the level of profiling information that the resulting application generates. The following table shows the profiling types supported at each level. A plus sign (+) indicates that the profiling type is supported; a hyphen (-) indicates that it is not.
Table 1. Profiling type supported on each -qpdf1 level
Profiling type            Level 0   Level 1   Level 2
Block-counter profiling   +         +         +
Call-counter profiling    +         +         +
Single-pass profiling     +         +         -
Value profiling           -         +         +
Multiple-pass profiling   -         -         +
Cache-miss profiling      -         -         +
-qpdf1=level=1 is the default level and is equivalent to -qpdf1. Higher PDF levels profile more optimization opportunities but have a larger overhead.
Notes:
- Only one application compiled with the -qpdf1=level=2 option can be run at a time on a particular computer.
- Cache-miss profiling information has several levels. If you want to gather different levels of cache-miss profiling information, set the PDF_PM_EVENT environment variable to L1MISS, L2MISS, or L3MISS (if applicable) accordingly. Only one level of cache-miss profiling information can be instrumented at a time. L2 cache-miss is the default level.
- If you want to bind your application to the specified processor for cache-miss profiling, set the PDF_BIND_PROCESSOR environment variable. Processor 0 is set by default.
- pdfname= file_path
- Specifies the directories and names for the PDF files and any existing PDF map files. By default, if the PDFDIR environment variable is set, the compiler places the PDF and PDF map files in the directory specified by PDFDIR; if PDFDIR is not set, the compiler places these files in the current working directory. If PDFDIR is set but the specified directory does not exist, the compiler issues a warning message. If the -qpdf1=unique option is not specified, the name of the PDF map file is derived from the name of the PDF file. For example, if you specify the -qpdf1=pdfname=/home/joe/func option, the generated PDF file is called func, and the PDF map file is called func_map. Both files are placed in the /home/joe directory. You can use the pdfname suboption to run multiple executable applications simultaneously in the same directory. This is especially useful when you tune dynamic libraries with the PDF process.
- unique | nounique
- Specifies whether a unique PDF file is created for each process at run time. You can use the -qpdf1=unique option to avoid locking a single PDF file when multiple processes write to the same PDF file in the PDF training step. The PDF file name is <pdf_file_name>.<pid>, where <pdf_file_name> is ._pdf by default or the name set by another -qpdf1 suboption (pdfname, exename, or defname), and <pid> is the ID of the running process in the PDF training step. For example, if you specify the -qpdf1=unique:pdfname=abc option and two processes with the IDs 12345678 and 87654321 run during PDF training, two PDF files, abc.12345678 and abc.87654321, are generated. Note:
- When -qpdf1=unique is specified, only one PDF map file is generated. The default name of the PDF map file is ._pdf_map.
- When -qpdf1=unique is specified, multiple PDF files with process IDs as suffixes are generated. You must use the mergepdf program to merge all these PDF files into one after the PDF training step.
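The naming rules described for the exename, pdfname, and unique suboptions can be summarized in a small shell sketch. The function names pdf_names and pdf_unique_name are hypothetical helpers, not part of the compiler, and the map file name shown for exename assumes that the same "_map" suffix rule applies as for the default and pdfname cases.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with the compiler) that mirror the
# file-naming rules described in the Parameters section.

# pdf_names MODE [NAME]
#   MODE is "default", "exename" (NAME = the -o output name) or
#   "pdfname" (NAME = file_path). Prints "<PDF file> <PDF map file>".
pdf_names() {
    case "$1" in
        default) pdf="._pdf" ;;
        exename) pdf=".${2}_pdf" ;;   # -qpdf1=exename -o func  ->  .func_pdf
        pdfname) pdf="$2" ;;          # -qpdf1=pdfname=/home/joe/func
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
    # Assumption: the map file is always the PDF file name plus "_map".
    echo "$pdf ${pdf}_map"
}

# pdf_unique_name BASE PID
#   With -qpdf1=unique, each training process writes BASE.<pid>.
pdf_unique_name() {
    echo "$1.$2"
}

pdf_names exename func              # .func_pdf .func_pdf_map
pdf_names pdfname /home/joe/func    # /home/joe/func /home/joe/func_map
pdf_unique_name abc 12345678        # abc.12345678
```

A helper like this is only useful in build scripts that need to locate or archive the files after a training run; the compiler itself chooses the names.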
Usage
- Compile your program with the -qpdf1 option and a minimum optimization level of -O2. The compiler generates an instrumented application and a PDF map file, which is named ._pdf_map by default.
- Run the resulting application with a typical data set. Profiling information is written to a PDF file, which is named ._pdf by default. This step is called the PDF training step.
- Recompile and link, or relink, the program with the -qpdf2 option and the same optimization level that you used with the -qpdf1 option. The -qpdf2 process fine-tunes the optimizations according to the profiling information collected during the PDF training step.
- The showpdf utility uses the PDF map file to display part of the profiling information in text or XML format. For details, see Viewing profiling information with showpdf. If you do not need to view the profiling information, specify the -qnoshowpdf option during the -qpdf1 phase so that the PDF map file is not generated. For details of -qnoshowpdf, see -qshowpdf.
- When option -O4, -O5, or any level of option -qipa is in effect, and you specify the -qpdf1 or -qpdf2 option at the link step but not at the compile step, the compiler issues a warning message. The message indicates that you must recompile your program to get all the profiling information.
- When the -qpdf1=pdfname option is used during the -qpdf1 phase, you must use the -qpdf2=pdfname option during the -qpdf2 phase for the compiler to recognize the correct PDF file. This rule also applies to the -qpdf[1|2]=exename option.
The compiler issues an information message with a number in the range of 0 - 100 during the -qpdf2 phase. If you have not changed your program between the -qpdf1 and -qpdf2 phases, the number is 100, which means that all the profiling information can be used to optimize the program. If the number is 0, it means that the profiling information is completely outdated, and the compiler cannot take advantage of any information. When the number is less than 100, you can choose to recompile your program with the -qpdf1 option and regenerate the profiling information.
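The three usage steps can be wrapped in a short script so that both compiles always use the same optimization level. The following is a minimal sketch: PDF_DRY_RUN, pdf_step, and pdf_workflow are hypothetical names introduced here, and by default the script only prints the commands, because it assumes that xlc might not be installed where the script is tried.

```shell
#!/bin/sh
# Sketch of the -qpdf1 / training / -qpdf2 sequence.
# PDF_DRY_RUN, pdf_step and pdf_workflow are hypothetical names.
PDF_DRY_RUN=${PDF_DRY_RUN:-1}    # 1 = print commands, anything else = run them

# Print or execute one command, depending on PDF_DRY_RUN.
pdf_step() {
    if [ "$PDF_DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pdf_workflow OPT_LEVEL TRAINING_INPUT SOURCE...
# Passing OPT_LEVEL once guarantees that both compiles use the same level.
pdf_workflow() {
    opt=$1
    data=$2
    shift 2
    pdf_step xlc -qpdf1 "$opt" "$@" || return 1    # instrumented build
    if [ "$PDF_DRY_RUN" = 1 ]; then
        echo "./a.out < $data"                      # PDF training step
    else
        ./a.out < "$data" || return 1
    fi
    pdf_step xlc -qpdf2 "$opt" "$@"                 # optimized rebuild
}

pdf_workflow -O3 sample.data file1.c file2.c file3.c
```

Set PDF_DRY_RUN=0 to execute the commands instead of printing them.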
Single-pass profiling
Single-pass profiling is supported at levels 0 and 1 of the -qpdf1 phase. If you recompile your program with either the -qpdf1=level=0 or the -qpdf1=level=1 option, the compiler removes the existing PDF file and any existing PDF map file before generating a new application.
Multiple-pass profiling
- If you have not specified the -qnoshowpdf option, PDF map files that correspond to the PDF files are also generated, with the default names ._pdf_map, ._pdf.1_map, and so on up to ._pdf.5_map.
- If you use the -qpdf2=pdfname option to specify a PDF file, specify a file name that does not end with a numeric suffix from .1 to .5. Otherwise, the compiler looks for the wrong files. For example, if you specify the -qpdf2=pdfname=func.2 option during the -qpdf2 phase, the compiler looks for PDF files named (func.2, func.2.1, func.2.2, func.2.3), which might not exist. If you specify the -qpdf2=pdfname=func option without the numeric suffix, the compiler looks for (func, func.1, func.2, func.3).
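A build script can guard against the suffix problem described above before it invokes the compiler. The sketch below uses two hypothetical helpers, has_pass_suffix and pdf_pass_files. It assumes that PDF files use the same .1 to .5 suffix range that this page gives for the PDF map files.

```shell
#!/bin/sh
# Hypothetical helpers; the .1 to .5 range is taken from this page.

# Succeeds when NAME ends in a reserved multiple-pass suffix (.1 to .5).
has_pass_suffix() {
    case "$1" in
        *.[1-5]) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints BASE followed by the suffixed names that multiple-pass
# profiling can produce for it.
pdf_pass_files() {
    echo "$1"
    for n in 1 2 3 4 5; do
        echo "$1.$n"
    done
}

if has_pass_suffix func.2; then
    echo "func.2: choose a pdfname without a .1 to .5 suffix"
fi
pdf_pass_files func
```

The check is purely lexical; it does not test whether any of the listed files exist.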
Other related options
- -qprefetch
- When you specify the -qprefetch=assistthread option to generate data prefetching assist threads, the compiler uses delinquent load information to perform its analysis and generate the threads. The delinquent load information can be gathered from dynamic profiling with the -qpdf1=level=2 option. For more information, see -qprefetch.
- -qshowpdf
- Provides additional information to the profile file. See -qshowpdf for more information.
For recommended procedures of using PDF, see Using profile-directed feedback.
- cleanpdf
-
>>-cleanpdf--+--------+--+-----+--+--------------+-------------><
             '-pdfdir-'  '- -u-'  '- -f--pdfname-'
Removes all PDF files or the specified PDF files, including PDF files with process ID suffixes. Removing profiling information reduces runtime overhead if you change the program and then go through the PDF process again.
- pdfdir
- Specifies the directory that contains the PDF files to be removed. If pdfdir is not specified, the directory is set by the PDFDIR environment variable; if PDFDIR is not set, the directory is the current directory.
- -f pdfname
- Specifies the name of the PDF file to be removed. When specified,
files with the naming convention pdfname.<multiple_pass_profiling_times>,
if applicable, are also removed. <multiple_pass_profiling_times> is
a numeric suffix from 1 to 5.
If -f pdfname is not specified, ._pdf and files with the naming convention ._pdf.<multiple_pass_profiling_times>, if applicable, are removed.
- -u
- Removes the PDF file that is specified by pdfname and, when applicable, files with the following naming conventions:
- pdfname.<pid>, where <pid> is the ID of the running process in the PDF training step
- pdfname.<multiple_pass_profiling_times>.<pid>
- ._pdf.<pid>
- ._pdf.<multiple_pass_profiling_times>.<pid>
Run cleanpdf only when you finish the PDF process for a particular application. Otherwise, to resume the PDF process with that application, you must compile all of the files again with -qpdf1.
- mergepdf
-
             .-------------------------.
             V                         |
>>-mergepdf----+--------------+--input-+-- -o--output--+-----+--+-----+-><
               '- -r--scaling-'                        '- -n-'  '- -v-'
Merges two or more PDF files into a single PDF file.
- -r scaling
- Specifies the scaling ratio for the PDF file. This value must be greater than zero and can be either an integer or a floating-point value. If not specified, a ratio of 1.0 is assumed.
- input
- Specifies the name of a PDF input file, or a directory that contains PDF files.
- -o output
- Specifies the name of the PDF output file, or a directory to which the merged output is written.
- -n
- If specified, PDF files are not normalized. If not specified, mergepdf normalizes files based on an internally calculated ratio before applying any user-defined scaling factor.
- -v
- Specifies verbose mode, and causes internal and user-specified scaling ratios to be displayed to standard output.
- resetpdf
-
>>-resetpdf--+--------+--+-----+--+--------------+-------------><
             '-pdfdir-'  '- -u-'  '- -f--pdfname-'
Same as cleanpdf.
- showpdf
-
Displays part of the profiling information written to PDF and PDF map files. To use this command, you must first compile your program with the -qpdf1 option. See Viewing profiling information with showpdf for more information.
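After a training step that used -qpdf1=unique, the per-process files must be merged with mergepdf. The following sketch shows one way to assemble that command line. The helper merge_unique_cmd is hypothetical and only prints the command; the mergepdf argument order follows the syntax diagram on this page.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a mergepdf command that combines
# the per-process files BASE.<pid> written after -qpdf1=unique into BASE.
# Assumptions: file names contain no spaces, and any purely numeric suffix
# is a process ID (multiple-pass files such as BASE.1 would also match).
merge_unique_cmd() {
    base=$1
    inputs=""
    for f in "$base".*; do
        [ -e "$f" ] || continue            # the pattern matched nothing
        case "${f#"$base".}" in
            ""|*[!0-9]*) continue ;;       # keep numeric suffixes only
        esac
        inputs="$inputs $f"
    done
    if [ -z "$inputs" ]; then
        echo "no $base.<pid> files found" >&2
        return 1
    fi
    echo "mergepdf$inputs -o $base"
}

# Example, in a directory that holds abc.12345678 and abc.87654321:
#   merge_unique_cmd abc
#   mergepdf abc.12345678 abc.87654321 -o abc
```

Printing the command first makes it easy to review the input list before running the merge, and before running cleanpdf -u to remove the per-process files.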
Predefined macros
None.
Examples
#Compile all the files with -qpdf1=level=0
xlc -qpdf1=level=0 -O3 file1.c file2.c file3.c
#Run with one set of input data
./a.out < sample.data
#Recompile all the files with -qpdf2
xlc -qpdf2 -O3 file1.c file2.c file3.c
#If the sample data is typical, the program
#can now run faster than without the PDF process
#Compile all the files with -qpdf1
xlc -qpdf1 -O3 file1.c file2.c file3.c
#Run with one set of input data
./a.out < sample.data
#Recompile all the files with -qpdf2
xlc -qpdf2 -O3 file1.c file2.c file3.c
#If the sample data is typical, the program
#can now run faster than without the PDF process
#Compile all the files with -qpdf1=level=2
xlc -qpdf1=level=2 -O3 file1.c file2.c file3.c
#Set PDF_PM_EVENT=L2MISS to gather L2 cache-miss profiling
#information
export PDF_PM_EVENT=L2MISS
#Run with one set of input data
./a.out < sample.data
#Recompile all the files with -qpdf2
xlc -qpdf2 -O3 file1.c file2.c file3.c
#If the sample data is typical, the program
#can now run faster than without the PDF process
#Compile all the files with -qpdf1=level=2
xlc -qpdf1=level=2 -O3 file1.c file2.c file3.c
#Set PDF_PM_EVENT=L1MISS to gather L1 cache-miss profiling
#information
export PDF_PM_EVENT=L1MISS
#Run with one set of input data
./a.out < sample.data
#Set PDF_PM_EVENT=L2MISS to gather L2 cache-miss profiling
#information
export PDF_PM_EVENT=L2MISS
#Run with one set of input data
./a.out < sample.data
#Recompile all the files with -qpdf2
xlc -qpdf2 -O3 file1.c file2.c file3.c
#If the sample data is typical, the program
#can now run faster than without the PDF process
#Compile all the files with -qpdf1=level=2. The static profiling
#information is recorded in a file named ._pdf_map by default
xlc -qpdf1=level=2 -O3 file1.c file2.c file3.c
#Run with one set of input data, the profiling information
#is recorded in a file named ._pdf by default
./a.out < sample.data
#Recompile all the files with -qpdf1=level=2 again
#The compiler reads the previous profiling information, refines
#instrumentation, and generates a new instrumented
#executable. The static profiling information
#is recorded in ._pdf.1_map
xlc -qpdf1=level=2 -O3 file1.c file2.c file3.c
#Run it again, the profiling information is recorded in
#._pdf.1
./a.out < sample.data
#Recompile all the files with -qpdf2
xlc -qpdf2 -O3 file1.c file2.c file3.c
#If the sample data is typical, the program
#can now run faster than without the PDF process
#Compile all the files with -qpdf1=level=1
xlc -qpdf1=level=1 -O3 file1.c file2.c file3.c
#Set PDF_BIND_PROCESSOR environment variable so that
#all processes for this executable are run on Processor 1
export PDF_BIND_PROCESSOR=1
#Run executable with sample input data
./a.out < sample.data
#Recompile all the files with -qpdf2
xlc -qpdf2 -O3 file1.c file2.c file3.c
#If the sample data is typical, the program
#can now run faster than without the PDF process
#Compile all the files with -qpdf1=exename
xlc -qpdf1=exename -O3 -o final file1.c file2.c file3.c
#Run executable with sample input data
./final < typical.data
#List the content of the directory
>ls -lrta
-rw-r--r-- 1 user staff 50 Dec 05 13:18 file1.c
-rw-r--r-- 1 user staff 50 Dec 05 13:18 file2.c
-rw-r--r-- 1 user staff 50 Dec 05 13:18 file3.c
-rwxr-xr-x 1 user staff 12243 Dec 05 17:00 final
-rwxr-Sr-- 1 user staff 762 Dec 05 17:03 .final_pdf
#Recompile all the files with -qpdf2=exename
xlc -qpdf2=exename -O3 -o final file1.c file2.c file3.c
#The program is now optimized using PDF information
#Compile all the files with -qpdf1=pdfname. The static profiling
#information is recorded in a file named final_map
xlc -qpdf1=pdfname=final -O3 file1.c file2.c file3.c
#Run executable with sample input data. The profiling
#information is recorded in a file named final
./a.out < typical.data
#List the content of the directory
>ls -lrta
-rw-r--r-- 1 user staff 50 Dec 05 13:18 file1.c
-rw-r--r-- 1 user staff 50 Dec 05 13:18 file2.c
-rw-r--r-- 1 user staff 50 Dec 05 13:18 file3.c
-rwxr-xr-x 1 user staff 12243 Dec 05 18:30 a.out
-rwxr-Sr-- 1 user staff 762 Dec 05 18:32 final
#Recompile all the files with -qpdf2=pdfname
xlc -qpdf2=pdfname=final -O3 file1.c file2.c file3.c
#The program is now optimized using PDF information
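The cache-miss examples above export PDF_PM_EVENT by hand. A training script could check the value first, as in the following sketch. The function check_pm_event is a hypothetical guard, not a compiler feature; it accepts only the three event names listed in the level parameter description and falls back to the documented L2MISS default.

```shell
#!/bin/sh
# Hypothetical guard for training scripts: accept only the event names
# that this page lists for PDF_PM_EVENT, defaulting to L2MISS when the
# variable is unset or empty.
check_pm_event() {
    case "${PDF_PM_EVENT:-L2MISS}" in
        L1MISS|L2MISS|L3MISS)
            echo "${PDF_PM_EVENT:-L2MISS}"
            ;;
        *)
            echo "unsupported PDF_PM_EVENT: $PDF_PM_EVENT" >&2
            return 1
            ;;
    esac
}

# Typical use before the training run:
#   check_pm_event >/dev/null && ./a.out < sample.data
```

Whether L3MISS is available depends on the processor, as the parameter description notes, so this check cannot confirm that an accepted value is usable on a given system.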