fdpr Command

Item Description
-analyse_asm_csects Analyze csects written in assembly (when used, must be specified at both the -1 and -3 phases).
-extra_safe_analysis Do not attempt to analyze unconventional csects containing hand-written assembly code (when used, must be specified at both the -1 and -3 phases).
-ignore_info Ignore .info sections produced with the -qfdpr option during compile time (when used, must be specified at both -1 and -3 phases).
-align bytes Align frequently executed code on the given number of bytes to improve the code prefetch buffer ratio. If this option is omitted, the fdpr command aligns the code with a variable default number of bytes.
-lr_opt Eliminate stores and restores of the link register in frequently executed procedures.
-bt_csect_anchor_removal Eliminate load instructions related to the usage of branch tables in the code.
-dead_code_removal Remove unreachable code.
-selective_inline Perform selective inlining for functions that are frequently called from a single dominant call site.
-sid_fac percent Set a dominant factor percentage for selective inline optimization. The allowed range is 50 - 100 (applicable only with the -selective_inline flag).
-inline_small_funcs size Inline all functions that are smaller than or equal to the given size in bytes.
-inline_hot_funcs percent Inline all functions with an execution frequency equal to or greater than the given percentage. The allowed range is 0 - 100.
-inline Perform -inline_small_funcs 12 with -selective_inline.
-hco_resched Relocate instructions from frequently executed code to rarely executed code area, when possible.
-dcbt_opt Insert dcbt instructions to improve data-cache performance.
-killed_regs Eliminate stores and restores of registers that are killed (overwritten) after frequently executed function calls.
-tb Force the restructuring of traceback tables in reordered code. If the -tb option is omitted, traceback tables are automatically restored for C++ applications that use the try and catch mechanism.
-pc Preserve csects' boundaries in reordered code.
-pp Preserve functions' boundaries in reordered code.
-RD Perform static data reordering.
-dpnf factor Set the Data Placement Normalization Factor, between 0 and 1. A value of 0 causes static variables to be reordered regardless of their size, whereas 1 places only small-sized variables first (applicable only with the -RD flag).
-dpht threshold Set the Data Placement Hotness Threshold, between 0 and 1. A value of 0 reorders the static variables in large groups based on the control flow, whereas 1 reorders the variables in very small groups based on their access frequency (applicable only with the -RD flag).
-build_dcg Build DCG (Data Connectivity Graph) for enhanced data reordering (applicable only with the -RD flag).
-tocload Perform tocload optimization.
-reduce_toc removal_factor Remove TOC entries according to a removal factor between 0 and 1, where 0 removes only non-accessed TOC entries and 1 removes all non-exported TOC entries.
-strip Strip the output file (if any is produced).
-ptrgl_opt Optimize indirect call instructions made through registers by replacing them with direct jumps.
-no_ptrgl_r11 Do not remove the R11 load instruction in the _ptrgl csect (the -ptrgl_r11 optimization is applied by default).
-O Perform code reordering with branch prediction bit setting, branch folding and NOOP instructions removal. The -O flag is applied by default.
-O2 Switch on all less aggressive optimization flags.
-O3 Switch on all aggressive optimization flags.
-O4 Switch on all aggressive optimization flags.
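
Several of the flags above take a bounded numeric argument. As a minimal sketch, a wrapper script could check such a value before invoking fdpr. The helper names valid_sid_fac and sid_fac_cmd are hypothetical and not part of fdpr, and the sketch assumes the percentage is given as a whole number; test1 and test2 are the program and workload names used in the examples of this document.

```shell
# Hypothetical helper, not part of fdpr: validate a -sid_fac percentage.
# The documented range is 50 - 100; whole numbers are assumed here.
valid_sid_fac() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -ge 50 ] && [ "$1" -le 100 ]
}

# Print (rather than run) a selective inlining command line for a factor.
sid_fac_cmd() {
    if valid_sid_fac "$1"; then
        echo "fdpr -p test1 -selective_inline -sid_fac $1 -x test2"
    else
        echo "sid_fac must be a whole number between 50 and 100" >&2
        return 1
    fi
}
```

Printing the command instead of running it keeps the check separate from the optimization run, which can be slow.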

Purpose

A performance tuning utility for improving execution time and real memory utilization of user-level post-link application programs.

Syntax

Most Common Usage:

fdpr -p ProgramFile -x WorkloadCommand

Detailed Usage:

fdpr -p ProgramFile [ -M SegNum ] [ -fd Fdesc ] [ -o OutputFile ] [ -armember ArchiveMemberList ] [ OptimizationFlags ] [ -map ] [ -disasm ] [ -disasm_data ] [ -disasm_bss ] [ -profcount ] [ -quiet ] [ -v ] [ -1 | -2 | -3 | -12 | -23 | -123 ] [ -x WorkloadCommand ]

Optimization Flags

[ -tb ] [ -pc ] [ -pp ] [ -O ] [ -O2 ] [ -O3 ] [ -O4 ] [ -selective_inline ] [ -sid_fac percent ] [ -inline_small_funcs size ] [ -inline_hot_funcs percent ] [ -hco_resched ] [ -killed_regs ] [ -lr_opt ] [ -align bytes ] [ -RD ] [ -dpnf factor ] [ -dpht threshold ] [ -build_dcg ] [ -tocload ] [ -ptrgl_opt ] [ -no_ptrgl_r11 ] [ -dcbt_opt ] [ -ignore_info ] [ -dead_code_removal ] [ -bt_csect_anchor_removal ] [ -strip ] [ -analyse_asm_csects ] [ -extra_safe_analysis ] [ -inline ] [ -reduce_toc removal_factor ]

Description

The fdpr command (Feedback Directed Program Restructuring) is a performance-tuning utility that may help improve the execution time and the real memory utilization of user-level application programs. The fdpr program optimizes the executable image of a program by collecting information on the behavior of the program while the program is used for some typical workload, and then creating a new version of the program that is optimized for that workload. The new program generated by fdpr typically runs faster and uses less real memory.

Attention: The fdpr command applies advanced optimization techniques that may result in programs that do not behave as expected. Use programs optimized with this tool with due caution, and retest them rigorously with, at a minimum, the same test suite used to test the original program in order to verify expected functionality. The optimized program is not supported.

The fdpr command builds an optimized executable program in 3 distinct phases:

  • Phase 1 (-1 flag): Creates an instrumented executable program and an empty template profile file.
  • Phase 2 (-2 flag): Runs the instrumented program and updates the profile data.
  • Phase 3 (-3 flag): Generates the optimized executable program file.

These phases can be run separately or in partial or full combination, but they must be run in order (for example, -1, then -2, then -3, or -12, then -3). The default is to run all three phases.

Note: The instrumented executable, created in phase 1 and run in phase 2, typically runs several times slower than the original program. Because of this increased execution time, invoke the executable in a way that minimizes execution duration while still fully exercising the desired code areas. Where feasible, also eliminate any time-dependent aspects of the program.
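
The phase ordering described above can be sketched as a small wrapper that runs one phase at a time and stops at the first failure. The function name run_fdpr_phases and the FDPR variable are hypothetical; the sketch also assumes that fdpr returns a nonzero exit status when a phase fails.

```shell
# FDPR names the command to run; override it (for example with a stub
# function) to preview the command lines without running fdpr.
FDPR=${FDPR:-fdpr}

# Hypothetical wrapper: run phases 1, 2 and 3 in order.
# usage: run_fdpr_phases PROGRAM WORKLOAD [WORKLOAD_ARGS...]
run_fdpr_phases() {
    prog=$1; shift
    "$FDPR" -1 -p "$prog" || return 1           # instrument
    "$FDPR" -2 -p "$prog" -x "$@" || return 1   # profile; -x comes last
    "$FDPR" -3 -p "$prog" || return 1           # optimize
}
```

Running the phases separately makes it easier to see which step failed, at the cost of three invocations instead of one.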

Flags

Item Description
-1, -2, -3 Specifies the phase to run. The default is all 3 phases (-123). The -s flag must be used when running separate phases so that the succeeding phases can access the required intermediate files. The phases must be run in order (for example, -1, then -2, then -3, or -1, then -23). The -2 flag must be used along with the invocation flag -x.
-M SegNum Specifies where to map shared memory for profiling. The default is 0x30000000. Specify an alternate shared memory address if the program to be optimized or any of the workload command strings invoked with the -x flag use conflicting shared-memory addresses. Typical alternative values are 0x40000000, 0x50000000, ... up to 0xC0000000.
-fd Fdesc Specifies which file descriptor number is to be used for the profile file that is mapped to the above shared memory area. The default value of Fdesc is 1999.
-o OutputFile Specifies the name of the output file from the optimizer. The default is program.fdpr.
-p ProgramFile Specifies the name of the executable program file, shared object file, or shared library containing shared objects or executables to optimize. This program must be an unstripped executable.
-armember ArchiveMemberList List of archive members to be optimized, within a shared archive file specified by the -p flag. If -armember is not specified, all members of the archive file are optimized.
-map Prints a map of basic blocks and static variables with their respective old -> new addresses into a suffixed .mapper file.
-disasm Prints the disassembled text section of the output optimized and instrumented program into a suffixed .dis_text file.
-disasm_data Prints the disassembled data section of the output optimized and instrumented program into a suffixed .dis_data file.
-disasm_bss Prints the disassembled bss section of the output optimized and instrumented program into a suffixed .dis_bss file.
-profcount Prints the profiling counters into a suffixed .ncounts file.
-quiet Quiet output mode.
-v Verbose output.
-x WorkloadCommand Specifies the command used for invoking the instrumented program. All the arguments after the -x flag are used for the invocation. Therefore, the -x flag must appear last in the command line. The -x flag is required when the -2 flag is used.
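
Because everything after -x is passed to the workload, a wrapper has to place all other flags first. The sketch below builds a profiling run (phases 1 and 2) with an alternate shared memory address. The function name fdpr_profile_at and the FDPR variable are hypothetical.

```shell
# FDPR names the command to run; override it to preview the command line.
FDPR=${FDPR:-fdpr}

# Hypothetical wrapper: instrument and profile with a chosen -M address.
# usage: fdpr_profile_at SEGNUM PROGRAM WORKLOAD [WORKLOAD_ARGS...]
fdpr_profile_at() {
    seg=$1; prog=$2; shift 2
    # All fdpr flags precede -x; the remaining arguments are the workload.
    "$FDPR" -12 -M "$seg" -p "$prog" -x "$@"
}
```

For example, fdpr_profile_at 0x40000000 test1 test2 corresponds to fdpr -12 -M 0x40000000 -p test1 -x test2.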

Optimization Flags

Optimization

The fdpr command performs, by default, the highest possible level of code reordering optimization together with the optimizations of branch prediction bit setting, branch folding, code alignment and removal of redundant NOOP instructions. The -pc flag reorders the entire code while preserving csects' boundaries and therefore, may result in less performance improvement than the default code reordering. Similarly, the -pp flag reorders the entire code while preserving procedures' boundaries.

Additional optimizations performed on the entire executable program file are available by the optimization flags above.

Executables built with the -qfdpr flag of the IBM® XL compiler contain information that helps fdpr produce reordered programs. Modules that are not compiled with the -qfdpr option are reordered based on the compiler signatures in the symbol table.

Additional performance enhancements may be realized by using static linking when building the program to be reordered. Since the fdpr program only reorders the instructions within the executable program specified, any dynamically linked shared library routines called by the program are not optimized. Statically linking these library routines to the executable allows for optimizing both the instructions in the program and all library routines used by the program. There are other advantages as well as disadvantages to building a statically linked program.

Output Files

All files created by the fdpr command are stored in the current directory with the exception of any files which may be created by running the workload command specified in the -x flag. During the optimization process, the original program is saved by renaming the program, and is only restored to the original program name upon successful completion of the final phase.

The profile file created by the fdpr command explicitly uses the full name of the current directory since scripts used to run the program may change the working directory before executing the program.

The files created and/or used by the fdpr command are:

Item Description
program Name of the unstripped executable to be optimized.
program.save Saved version of the original executable program.
program.nprof Name of the profile file.
program.instr Name of the instrumented version of program.
program.fdpr Default name of optimized executable output file.
program.instr.dis_text Default disassembly file in ASCII format produced by -disasm flag after instrumentation phase.
program.fdpr.dis_text Default disassembly file in ASCII format produced by -disasm flag after optimization phase.
program.instr.dis_data Default disassembly file in ASCII format produced by -disasm_data flag after instrumentation phase.
program.fdpr.dis_data Default disassembly file in ASCII format produced by -disasm_data flag after optimization phase.
program.instr.dis_bss Default disassembly file in ASCII format produced by -disasm_bss flag after instrumentation phase.
program.fdpr.dis_bss Default disassembly file in ASCII format produced by -disasm_bss flag after optimization phase.
program.instr.mapper Default mapping file in ASCII format produced by -map flag after instrumentation phase.
program.fdpr.mapper Default mapping file in ASCII format produced by -map flag after optimization phase.
program.ncounts Default profile counters file in ASCII format produced by -profcount flag.
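
The default names in this table all follow from the program name, so a script can derive them. This is a minimal sketch that covers only the four core files; the function name fdpr_files is hypothetical.

```shell
# Hypothetical helper: print the default names of the saved original,
# the profile file, the instrumented program and the optimized program.
# usage: fdpr_files PROGRAM
fdpr_files() {
    for suffix in save nprof instr fdpr; do
        echo "$1.$suffix"
    done
}
```

A loop such as for f in $(fdpr_files test1); do [ -e "$f" ] && echo "found $f"; done then reports which of these files exist in the current directory.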

Enhanced Debugging Capabilities

In order to enable a certain degree of debugging capability for optimized programs, FDPR updates the Symbol Table to reflect the changes that were made in the .text section.

Entry fields in the Symbol Table that specify the addresses of symbols relocated during FDPR reordering are modified to point to their new addresses in the .text section.

In addition, in the case where functions or files are split during reordering, FDPR creates new entries in the Symbol Table for each new part of the split function/file. These new parts of the same function are given new symbol names in the Symbol Table according to the following naming convention:

<original function name>__fdpr_<function's part number>

After code reordering, each new entry carries the __fdpr_ suffix followed by its part number.

Example: Originally, function "main" had the following entry in the Symbol Table:
  [Index] m   Value       Scn     Aux   Sclass    Type    Name
   [456]  m  0x00000230    2       1     0x02    0x0000   .main
If, after code reordering, function main is split into three parts, it has three entries in the Symbol Table, one for each part, as follows:
  [Index] m   Value       Scn     Aux   Sclass    Type    Name
   [456]  m  0x00000304    2       1     0x02    0x0000   .main
  [1447]  m  0x00003328    2       1     0x02    0x0000   .main__fdpr_1
  [1453]  m  0x000033b4    2       1     0x02    0x0000   .main__fdpr_2
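
When reading a symbol table or a traceback from an optimized program, it can help to map a split part back to its original function. The sketch below strips the suffix described by the naming convention above. The function name fdpr_base_symbol is hypothetical.

```shell
# Hypothetical helper: remove a trailing __fdpr_<part number> suffix.
# Names without the suffix are printed unchanged.
# usage: fdpr_base_symbol SYMBOL
fdpr_base_symbol() {
    printf '%s\n' "$1" | sed 's/__fdpr_[0-9][0-9]*$//'
}
```

For the entries shown above, both .main__fdpr_1 and .main__fdpr_2 map back to .main.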

Examples

The following are typical usage examples of the fdpr command.

  1. This example allows the user to run all three phases. In this example, test1 is the unstripped executable and test2 is a shell script that invokes test1. The current working directory is /tmp/fdpr.
    test2 script file:
    
    # code to exercise test1
    test1 -expand 100 -root $PATH file.jpg -quit
    # the end of test2
    Execute the fdpr command (using the default optimization):
    fdpr -p test1 -x test2
    This results in the new reordered executable test1.fdpr.
  2. To run one phase at a time, execute phase one of fdpr.
    fdpr -1 -p test1
    This command string creates an instrumented version with the name test1.instr and the empty template profile file test1.nprof.

    To execute phase two:

    fdpr -2 -p test1 -x test2
    This command string executes the script file test2 that runs the instrumented version of test1 to collect the profile data.

    To execute phase three:

    fdpr -3 -p test1
    Again, this results in the new reordered executable test1.fdpr.
  3. To run the first two phases followed by phase three, first execute phases one and two:
    fdpr -12 -p test1 -x test2
    Execute phase three using optimization level three.
    fdpr -3 -O3 -p test1
  4. If an error occurs while running an fdpr optimized program, the dbx command can be used to determine what procedure the error occurred in as follows:
    dbx program.fdpr
    which produces output similar to the following:
    Type 'help' for help.
    reading symbolic information ...warning: no source compiled with -g
     
    [using memory image in core]
     
    Segmentation fault in proc_d at 0x10000634
    0x10000634 (???) 98640000        stb   r3,0x0(r4)
    (dbx)

    A stack traceback, which is used to determine how the program arrived at the current location, is produced as follows:

    (dbx) where
    which produces the following output:
    proc_d(0x0) at 0x10000634
    proc_c(0x0) at 0x10000604
    proc_b(0x0) at 0x100005d0
    proc_a(0x0) at 0x1000059c
    main(0x2, 0x2ff7fba4) at 0x1000055c
    (dbx)
  5. The dbx subcommand stepi may also be used to single step through the instructions of a reordered executable program as follows:
    (dbx) stepi
    which produces the following output:
    stopped in proc_d at 0x1000061c
    0x1000061c (???) 9421ffc0       stwu   r1,-64(r1)
    (dbx)
    In this example, dbx indicates that the program stopped in routine proc_d at address 0x1000061c in the reordered text section.

Implementation Specifics

Software Product/Option: AIX® Performance Aide/ Local Performance Analysis & Control Commands.

Standards Compliance: None.

Files

Item Description
/usr/bin/fdpr Contains the fdpr command.