IBM Support

Using the Postprocessor Parameters on the Load Information tab of the application definition

Question & Answer


Question

How do I modify the index values that have been extracted from the documents before loading them into Content Manager OnDemand? How do I use the postprocessor?

Cause

You need to make changes to some of your index values before they are loaded into Content Manager OnDemand.

Answer

The postprocessor manipulates index data before it is loaded into the database. A postprocessor command can be written as a shell script or a C program. This document describes how to write such a script or program.
Overview of the postprocessor
The postprocessor command is called by the ARSLOAD program after the index values have been collected.
It is called in the following way on AIX, Linux, and z/OS:
cat <filename> | <postprocessor command> > <new filename>
It is called in the following way on Windows:
type <filename> | <postprocessor command> > <new filename>
Where:
  • <filename> is the file that is created by the ARSLOAD program to hold the index values.
  • <postprocessor command> is the user defined command entered in the Postprocessor Parameters field of the Application->Load Information tab.
  • <new filename> is the output file that is created by the postprocessor command and used by ARSLOAD to insert the manipulated index values into the database.
After the postprocessor command is finished, the ARSLOAD program loads the indexes from the new file into the database.
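
To try a postprocessor command outside of a load, the same pipeline can be reproduced by hand. The following is a minimal sketch, assuming a POSIX shell; the function name run_postprocessor and the file names in the usage note are hypothetical, and ARSLOAD itself is not involved.

```shell
# run_postprocessor <filename> <postprocessor command> <new filename>
# Hypothetical helper that mimics the documented call:
#   cat <filename> | <postprocessor command> > <new filename>
run_postprocessor() {
    index_file=$1
    command_line=$2
    new_file=$3
    # sh -c lets the command line carry its own arguments.
    cat "$index_file" | sh -c "$command_line" > "$new_file"
}
```

For example, run_postprocessor index.txt /path/to/script.sh index.new would write the manipulated rows to index.new, which can then be compared with index.txt before the command is entered in the application definition.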
Example

As an example, the following script replaces each account number with zeros. Enter the full path and name of the script file in the Application -> Load Information -> Postprocessor Parameters box of the OnDemand Administrator client.

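#!/usr/bin/env bash
# Copy the index file from stdin to stdout, replacing the first index value
# (the account number) on every data row with 00000.
while IFS=$'\t' read -r a b
do
       # The header row starts with '<' and is passed through unchanged.
       if [[ ${a::1} != '<' ]]
       then
              echo -e "00000\t$b"
       else
              echo -e "$a\t$b"
       fi
done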



A postprocessor command reads from stdin and writes to stdout; errors can be written to stderr. On stdin it receives the contents of the file that holds the index values collected by the indexer that the ARSLOAD program called. The file is formatted into rows to load into the database and is written in the locale of the environment in which ARSLOAD is running.
The following example shows the format of the index file that the ARSLOAD program creates and pipes to the postprocessor. The name of this file is specified by the <filename> parameter of the postprocessor command. In this example, each row contains three index values: account, date, and name. The first row is a header line. Each line ends with the line feed character x0A, even on Windows. The index values are separated by tabs, not spaces; in the following example, the tabs appear as blank space.

<account><date>      <name>
12345    12/02/07    John Smith
34567    12/05/07    Mary Lark



This file must be read and written as a binary file because of the tabs and line feeds, even though it contains index values in text.
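
The byte-level layout described above can be checked by hand. The following is a minimal sketch, assuming a POSIX shell with printf, tr, and wc available. The function names are hypothetical, and the sample assumes that the header fields are separated by tabs in the same way as the data rows.

```shell
# write_sample_index <file>
# Writes the three sample rows with tab separators and x0A line ends.
# Assumption: header fields are tab-separated like the data rows.
write_sample_index() {
    printf '<account>\t<date>\t<name>\n'   >  "$1"
    printf '12345\t12/02/07\tJohn Smith\n' >> "$1"
    printf '34567\t12/05/07\tMary Lark\n'  >> "$1"
}

# count_carriage_returns <file>
# Prints the number of x0D bytes; 0 means no line ends with x0D0A.
count_carriage_returns() {
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}
```

Running od -c on a file written this way displays each tab as \t and each line feed as \n, which makes it easy to confirm that no spaces or carriage returns have crept in.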

The following line from the script replaces each account number with 00000 and writes the line to stdout. Note that the tab character \t must be used to separate the index values:

echo -e "00000\t$b"


The postprocessor command writes the following data to stdout, which is redirected to <new filename>:

<account><date>      <name>
00000    12/02/07    John Smith
00000    12/05/07    Mary Lark
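
Before a new postprocessor is put into use, it can help to confirm that it changed only values and not the layout of the file. The following is a minimal sketch, assuming a POSIX shell with awk and a postprocessor that is meant to keep the number of rows and fields unchanged; the function name check_index_shape is hypothetical.

```shell
# check_index_shape <original file> <new file>
# Succeeds (exit status 0) when both files have the same number of rows and
# each row has the same number of tab-separated fields in both files.
check_index_shape() {
    original_shape=$(awk -F'\t' '{ print NF }' "$1")
    new_shape=$(awk -F'\t' '{ print NF }' "$2")
    [ "$original_shape" = "$new_shape" ]
}
```

A failing check usually points to a tab that was written as spaces or to a row that was dropped or split.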



On Windows, write the postprocessor in the C language, and use the gets_s or fgets function to read one line at a time from stdin until the end of the file is reached. The postprocessor can operate on each line and then write it to stdout. For information about the gets_s and fgets functions, refer to the Microsoft documentation.

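Important: On Windows, end each line with x0A (line feed) and not with x0D0A (carriage return and line feed). Because the Microsoft C runtime translates each line feed to x0D0A when stdout is in text mode, set stdout to binary mode before writing.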


Document Information

Modified date:
12 January 2024

UID

swg21303532