Writing an external user-defined function

An external user-defined function is written in a programming language and is similar to other SQL programs. You can include static or dynamic SQL statements, IFI calls, and DB2® commands that are issued through IFI calls.

About this task

Your user-defined function can also access remote data using the following methods:

  • DRDA access using CONNECT or SET CONNECTION statements

You can write an external user-defined function in assembler, C, C++, COBOL, PL/I, or Java. User-defined functions that are written in COBOL can include object-oriented extensions, just as other DB2 COBOL programs can. User-defined functions that are written in Java follow coding guidelines and restrictions specific to Java.

Restrictions on user-defined function programs: Observe these restrictions when you write a user-defined function:
  • Because DB2 uses the Resource Recovery Services attachment facility (RRSAF) as its interface with your user-defined function, you must not include RRSAF calls in your user-defined function. DB2 rejects any RRSAF calls that it finds in a user-defined function.
  • If your user-defined function is not defined with parameters SCRATCHPAD or EXTERNAL ACTION, the user-defined function is not guaranteed to execute under the same task each time it is invoked.
  • You cannot execute COMMIT or ROLLBACK statements in your user-defined function.
  • You must close all cursors that were opened within a user-defined scalar function. DB2 returns an SQL error if a user-defined scalar function does not close all cursors that it opened before it completes.
  • When you choose the language in which to write a user-defined function program, be aware of restrictions on the number of parameters that can be passed to a routine in that language. User-defined table functions in particular can require large numbers of parameters. Consult the programming guide for the language in which you plan to write the user-defined function for information about the number of parameters that can be passed.
  • You cannot pass LOB file reference variables as parameters to user-defined functions.
  • User-defined functions cannot return LOB file reference variables.
  • You cannot pass parameters with the type XML to user-defined functions. You can specify tables or views that contain XML columns as table locator parameters. However, you cannot reference the XML columns in the body of the user-defined function.
Coding your user-defined function as a main program or as a subprogram: You can code your user-defined function as either a main program or a subprogram. The way that you code your program must agree with the way you defined the user-defined function: with the PROGRAM TYPE MAIN or PROGRAM TYPE SUB parameter. The main difference is that when a main program starts, Language Environment® allocates the application program storage that the external user-defined function uses. When a main program ends, Language Environment closes files and releases dynamically allocated storage.

If you code your user-defined function as a subprogram and manage the storage and files yourself, you can get better performance. The user-defined function should always free any allocated storage before it exits. To keep data between invocations of the user-defined function, use a scratchpad.

You must code a user-defined table function that accesses external resources as a subprogram. Also ensure that the definer specifies the EXTERNAL ACTION parameter in the CREATE FUNCTION or ALTER FUNCTION statement. Program variables for a subprogram persist between invocations of the user-defined function, and use of the EXTERNAL ACTION parameter ensures that the user-defined function stays in the same address space from one invocation to another.

Parallelism considerations: If the definer specifies the parameter ALLOW PARALLEL in the definition of a user-defined scalar function, and the invoking SQL statement runs in parallel, the function can run under a parallel task. DB2 executes a separate instance of the user-defined function for each parallel task. When you write your function program, you need to understand how the following parameter values interact with ALLOW PARALLEL so that you can avoid unexpected results:
  • SCRATCHPAD

    When an SQL statement invokes a user-defined function that is defined with the ALLOW PARALLEL parameter, DB2 allocates one scratchpad for each parallel task of each reference to the function. This can lead to unpredictable or incorrect results.

    For example, suppose that the user-defined function uses the scratchpad to count the number of times it is invoked. If a scratchpad is allocated for each parallel task, this count is the number of invocations done by the parallel task and not for the entire SQL statement, which is not the result that is wanted.

  • FINAL CALL

    If a user-defined function performs an external action, such as sending a note, for each final call to the function, one note is sent for each parallel task instead of once for the function invocation.

  • EXTERNAL ACTION

    Some user-defined functions with external actions can receive incorrect results if the function is executed by parallel tasks.

    For example, if the function sends a note for each initial call to the function, one note is sent for each parallel task instead of once for the function invocation.

  • NOT DETERMINISTIC

    A user-defined function that is non-deterministic can generate incorrect results if it is run under a parallel task.

    For example, suppose that you execute the following query under parallel tasks:
    SELECT * FROM T1 WHERE C1 = COUNTER();
    COUNTER is a user-defined function that increments a variable in the scratchpad every time it is invoked. Counter is non-deterministic because the same input does not always produce the same output. Table T1 contains one column, C1, that has the following values:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    When the query is executed with no parallelism, DB2 invokes COUNTER once for each row of table T1, and there is one scratchpad for counter, which DB2 initializes the first time that COUNTER executes. COUNTER returns 1 the first time it executes, 2 the second time, and so on. The result table for the query has the following values:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10

    Now suppose that the query is run with parallelism, and DB2 creates three parallel tasks. DB2 executes the predicate WHERE C1 = COUNTER() for each parallel task. This means that each parallel task invokes its own instance of the user-defined function and has its own scratchpad. DB2 initializes the scratchpad to zero on the first call to the user-defined function for each parallel task.

    If parallel task 1 processes rows 1 to 3, parallel task 2 processes rows 4 to 6, and parallel task 3 processes rows 7 to 10, the following results occur:
    • When parallel task 1 executes, C1 has values 1, 2, and 3, and COUNTER returns values 1, 2, and 3, so the query returns values 1, 2, and 3.
    • When parallel task 2 executes, C1 has values 4, 5, and 6, but COUNTER returns values 1, 2, and 3, so the query returns no rows.
    • When parallel task 3, executes, C1 has values 7, 8, 9, and 10, but COUNTER returns values 1, 2, 3, and 4, so the query returns no rows.
    Thus, instead of returning the 10 rows that you might expect from the query, DB2 returns only 3 rows.