The load SOURCEUSEREXIT option provides
a facility through which the load utility can execute a customized
script or executable, referred to herein as a user exit.
The purpose of the user exit is to populate one or more named
pipes with data that is simultaneously read from by the load utility.
In a multi-partition database, multiple instances of the user exit
can be invoked concurrently to achieve parallelism of the input data.
As Figure 1 shows, the load utility
creates a one or more named pipes and spawns a process to execute
your customized executable. Your user exit feeds data into the named
pipe(s) while the load utility simultaneously reads.
Figure 1. The load utility reads from the pipe and
processes the incoming data.
The data fed into the pipe must reflect the load options specified,
including the file type and any file type modifiers. The load utility
does not directly read the data files specified. Instead, the data
files specified are passed as arguments to your user exit when it
is executed.
Invoking your user exit
The user exit must
reside in the bin subdirectory of the
DB2® installation
directory (often known as sqllib). The load utility invokes the user
exit executable with the following command line arguments:
<base pipename> <number of source media>
<source media 1> <source media 2> ... <user exit ID>
<number of user exits> <database partition number>
Where:
- <base pipename >
- Is the base name for named-pipes that the load utility creates
and reads data from. The utility creates one pipe for every source
file provided to the LOAD command, and each of these pipes is appended
with .xxx, where xxx is the index
of the source file provided. For example, if there are 2 source files
provided to the LOAD command, and the <base pipename> argument
passed to the user exit is pipe123, then the two
named pipes that your user exit should feed with data are pipe123.000 and pipe123.001.
In a partitioned database environment, the load utility appends the
database partition (DBPARTITION) number .yyy to
the base pipe name, resulting in the pipe name pipe123.xxx.yyy..
- <number of source media>
- Is the number of media arguments which follow.
- <source media 1> <source media 2> ...
- Is the list of one or more source files specified in the LOAD
command. Each source file is placed inside double quotation marks.
- <user exit ID>
- Is a special value useful when the PARALLELIZE option
is enabled. This integer value (from 1 to N, where N is the total
number of user exits being spawned) identifies a particular instance
of a running user exit. When the PARALLELIZE option
is not enabled, this value defaults to 1.
- <number of user exits>
- Is a special value useful when the PARALLELIZE option
is enabled. This value represents the total number of concurrently
running user exits. When the PARALLELIZE option is
not enabled, this value defaults to 1.
- <database partition number>
- Is a special value useful when the PARALLELIZE option
is enabled. This is the database partition (DBPARTITION) number on
which the user exit is executing. When the PARALLELIZE option
is not enabled, this value defaults to 0.
Additional options and features
The following
section describes additional SOURCEUSEREXIT facility
options:
- REDIRECT
- This option allows you to pass data into the STDIN handle or capture
data from the STDOUT and STDERR handles of the user exit process.
- INPUT FROM BUFFER <buffer>
- Allows you to pass information directly into the STDIN input stream
of your user exit. After spawning the process which executes the user
exit, the load utility acquires the file-descriptor to the STDIN of
this new process and passes in the buffer provided. The user exit
reads from STDIN to acquire the information. The load utility simply
sends the contents of <buffer> to the user exit using STDIN
and does not interpret or modify its contents. For example, if your
user exit is designed to read two values from STDIN, an eight-byte
userid and an eight-byte password, your user exit executable written
in C might contain the following lines:
rc = read (stdin, pUserID, 8);
rc = read (stdin, pPasswd, 8);
A user could pass this
information using the INPUT FROM BUFFER option as
shown in the following LOAD command: LOAD FROM myfile1 OF DEL INSERT INTO table1
SOURCEUSEREXIT myuserexit1 REDIRECT INPUT FROM BUFFER myuseridmypasswd
Note: The
load utility limits the size of <buffer> to the maximum size
of a LOB value. However, from within the command line processor (CLP),
the size of <buffer> is restricted to the maximum size of a
CLP statement. From within CLP, it is also recommended that <buffer>
contain only traditional ASCII characters. These issues can be avoided
if the load utility is invoked using the db2Load API,
or if the INPUT FROM FILE option is used instead.
- INPUT FROM FILE <filename>
- Allows you to pass the contents of a client side file directly
into the STDIN input stream of your user exit. This option is almost
identical to the INPUT FROM BUFFER option, however
this option avoids the potential CLP limitation. The filename must
be a fully qualified client side file and must not be larger than
the maximum size of a LOB value.
- OUTPUT TO FILE <filename>
- Allows you to capture the STDOUT and STDERR streams from your
user exit process into a server side file. After spawning the process
which executes the user exit executable, the load utility redirects
the STDOUT and STDERR handles from this new process into the filename
specified. This option is useful for debugging and logging errors
and activity within your user exit. The filename must be a fully qualified
server side file. When the PARALLELIZE option is
enabled, one file exists per user exit and each file appends a three-digit
numeric identifier, such as filename.000.
- PARALLELIZE
- This option can increase the throughput of data coming into the
load utility by invoking multiple user exit processes simultaneously.
This option is only applicable to a multi-partition database. The
number of user exit instances invoked is equal to the number of partitioning
agents if data is to be distributed across multiple database partitions
during the load operation, otherwise it is equal to the number of
loading agents.
The <user exit ID>, <number of user exits>,
and <database partition number> arguments passed into each user
exit reflect the unique identifier (1 to N), the total number of
user exits (N), and the database partition (DBPARTITION) number on
which the user exit instance is running, respectively. You should
ensure that any data written to the named pipe by each user exit
process is not duplicated by the other concurrent processes. While
there are many ways your user exit application might accomplish this,
these values could be helpful to ensure data is not duplicated. For
example, if each record of data contains a unique integer column value,
your user exit application could use the <user exit ID> and <number
of user exits> values to ensure that each user exit instance returns
a unique result set into its named pipe. Your user exit application
might use the
MODULUS property in the following way:
i = <user exit ID>
N = <number of user exits>
foreach record
{
if ((unique-integer MOD N) == i)
{
write this record to my named-pipe
}
}
The number of user exit processes spawned depends on the
distribution mode specified for database partitioning:
- As Figure 2 shows, one user exit
process is spawned for every pre-partitioning agent when PARTITION_AND_LOAD (default)
or PARTITION_ONLY without PARALLEL is specified.
.
Figure 2. The various tasks performed when PARTITION_AND_LOAD (default)
or PARTITION_ONLY without PARALLEL is specified.
- As Figure 3 shows, one user exit
process is spawned for every partitioning agent when PARTITION_AND_LOAD (default)
or PARTITION_ONLY with PARALLEL is
specified.
Figure 3. The various tasks performed
when PARTITION_AND_LOAD (default) or PARTITION_ONLY with
PARALLEL is specified.
- As Figure 4 shows, one user exit
process is spawned for every load agent when LOAD_ONLY or LOAD_ONLY_VERIFY_PART is
specified.
Figure 4. The various tasks performed
when LOAD_ONLY or LOAD_ONLY_VERIFY_PART is
specified.
Restrictions
- The LOAD_ONLY and LOAD_ONLY_VERIFY_PART partitioned-db-cfg
mode options are not supported when the SOURCEUSEREXIT PARALLELIZE option
is not specified.
Examples
Example 1: A Load userexit
script that replaces all tab characters '\t' with comma characters
',' from every record of the source media file. To invoke the Load
utility using this userexit script, use a command similar to the following:
DB2 LOAD FROM /path/file1 OF DEL INSERT INTO schema1.table1
SOURCEUSEREXIT example1.pl REDIRECT OUTPUT TO FILE /path/ue_msgs.txt
Note
that the userexit must be placed into the
sqllib/bin/ folder,
and requires execute permissions.
example1.pl:
#!/bin/perl
# Filename: example1.pl
#
# This script is a simple example of a userexit for the Load utility
# SOURCEUSEREXIT feature. This script will replace all tab characters '\t'
# with comma characters ',' from every record of the source media file.
#
# To invoke the Load utility using this userexit, use a command similar to:
#
# db2 LOAD FROM /path/file1 OF DEL INSERT INTO schema1.table1
# SOURCEUSEREXIT example1.pl REDIRECT OUTPUT TO FILE /path/ue_msgs.txt
#
# The userexit must be placed into the sqllib/bin/ folder, and requires
# execute permissions.
#--------------------------------------------------------------------
if ($#ARGV < 5)
{
print "Invalid number of arguments:\n@ARGV\n";
print "Load utility should invoke userexit with 5 arguments (or more):\n";
print "<base pipename> <number of source media> ";
print "<source media 1> <source media 2> ... <user exit ID> ";
print "<number of user exits> <database partition number> ";
print "<optional: redirected input> \n";
die;
}
# Open the output fifo file (the Load utility is reading from this pipe)
#-----------------------------------------------------------------------
$basePipeName = $ARGV[0];
$outputPipeName = sprintf("%s.000", $basePipeName);
open(PIPETOLOAD, '>', $outputPipeName) || die "Could not open $outputPipeName";
# Get number of Media Files
#--------------------------
$NumMediaFiles = $ARGV[1];
# Open each media file, read the contents, replace '\t' with ',', send to Load
#-----------------------------------------------------------------------------
for ($i=0; $i<$NumMediaFiles; $i++)
{
# Open the media file
#--------------------
$mediaFileName = $ARGV[2+$i];
open(MEDIAFILETOREAD, '<', $mediaFileName) || die "Could not open $mediaFileName";
# Read each record of data
#------------------------
while ( $line = <MEDIAFILETOREAD> )
{
# Replace '\t' characters with ','
#---------------------------------
$line =~ s/\t/,/g;
# send this record to Load for processing
#-----------------------------------------
print PIPETOLOAD $line;
}
# Close the media file
#---------------------
close MEDIAFILETOREAD;
}
# Close the fifo
#---------------
close PIPETOLOAD;
exit 0;