IBM Support

Unfenced Oracle wrapper encounters unexpected errors when using the Oracle 11gR2 client

Technote (troubleshooting)


Problem (Abstract)

If you configure an unfenced Oracle wrapper to work with the Oracle 11gR2 client, you may intermittently encounter a DB2 crash or an ORA-01013 error.

Symptom

Two cases have been encountered so far.

Case 1: A query against an Oracle nickname fails with an ORA-01013 error even though no interrupt operation was performed.

Case 2: The DB2 instance crashes with a stack similar to the following:


<POFDisassembly>
nsgetcinfo + 0x00cc (/oracle/app/oracle/product/11.2.0/client_1/lib/libclntsh.so.11.1)

0x00002B7A66C9DED8 : 4D8B942458050000
</POFDisassembly>
<StackTrace>
-----FUNC-ADDR---- ------FUNCTION + OFFSET------
0x00002B79F4F0C0E7 ossDumpStackTraceEx + 0x01f7
(/opt/ibm/db2/V9.5/lib64/libdb2osse.so.1)
0x00002B79F4F07BBA _ZN11OSSTrapFile6dumpExEmiP7siginfoPvm + 0x00b4
(/opt/ibm/db2/V9.5/lib64/libdb2osse.so.1)
0x00002B79F4F07C81 _ZN11OSSTrapFile4dumpEmiP7siginfoPv + 0x0009
(/opt/ibm/db2/V9.5/lib64/libdb2osse.so.1)
0x00002B79F1263C19 sqlo_trce + 0x0425
(/opt/ibm/db2/V9.5/lib64/libdb2e.so.1)
0x00002B79F12A5D24 sqloEDUCodeTrapHandler + 0x0138
(/opt/ibm/db2/V9.5/lib64/libdb2e.so.1)
0x000000398D40E4C0 address: 0x000000398D40E4C0 ; dladdress: 0x000000398D400000 ; offset in lib: 0x000000000000E4C0 ;
(/lib64/libpthread.so.0)
0x00002B7A66C9DED8 nsgetcinfo + 0x00cc
(/oracle/app/oracle/product/11.2.0/client_1/lib/libclntsh.so.11.1)
0x00002B7A67A0BFCD address: 0x00002B7A67A0BFCD ; dladdress: 0x00002B7A66793000 ; offset in lib: 0x0000000001278FCD ;
(/oracle/app/oracle/product/11.2.0/client_1/lib/libclntsh.so.11.1)
0x00002B7A67A13330 address: 0x00002B7A67A13330 ; dladdress: 0x00002B7A66793000 ; offset in lib: 0x0000000001280330 ;
(/oracle/app/oracle/product/11.2.0/client_1/lib/libclntsh.so.11.1)
0x00002B7A67A133BB address: 0x00002B7A67A133BB ; dladdress: 0x00002B7A66793000 ; offset in lib: 0x00000000012803BB ;
(/oracle/app/oracle/product/11.2.0/client_1/lib/libclntsh.so.11.1)
0x00002B7A67C2A886 sslssAsynchHdlr + 0x0160
(/oracle/app/oracle/product/11.2.0/client_1/lib/libclntsh.so.11.1)
0x00002B7A67C2A1A1 sslsshandler + 0x0091
(/oracle/app/oracle/product/11.2.0/client_1/lib/libclntsh.so.11.1)

If the DB2 crash occurs on AIX, "errpt -a" also prints a similar stack:

nsgetcinf 54
nioqih 24
nioqih 24
nigsuiint 88
nigsuihdl 64
sslssAsyn 16C
sslsshand D4
??


Cause

When the Oracle wrapper runs in unfenced mode, Federation Server loads the Oracle client library into the db2sysc process, where the DB2 signal handlers are installed. The Oracle 11gR2 client replaces the DB2 signal handler for SIGINT, which earlier Oracle client versions did not do. When an EDU sends an interrupt request, the signal is caught by the Oracle client's handler, which causes unexpected issues such as a DB2 crash or an ORA-01013 error.
Oracle has confirmed that this is a defect in the Oracle client and opened Bug 12877221 "11.2 CLIENT OVERRIDES APPLICATION SIGNAL HANDLER SIGINIT" to track it.
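
The mechanism can be illustrated with a minimal Python sketch. It is only an analogy, not DB2 or Oracle code: the names host_handler and library_handler are hypothetical stand-ins for the DB2 handler and the Oracle client handler. It shows that signal handlers are per-process, so a library loaded into the same process can replace the handler the host installed and then receive the host's interrupts.

```python
import signal


def demo_handler_override():
    """Show that a second SIGINT registration in the same process wins.

    Returns (replaced_host, received), where replaced_host tells whether the
    "library" registration displaced the host handler, and received lists
    which handler actually ran when SIGINT was delivered.
    """
    received = []

    def host_handler(signum, frame):
        # Hypothetical stand-in for the handler the host process installs.
        received.append("host")

    def library_handler(signum, frame):
        # Hypothetical stand-in for the handler a loaded library installs.
        received.append("library")

    # The host installs its handler first.
    original = signal.signal(signal.SIGINT, host_handler)
    try:
        # The library is loaded into the same process and registers its own
        # handler; signal.signal returns the handler it displaced.
        displaced = signal.signal(signal.SIGINT, library_handler)
        # An interrupt intended for the host is now delivered to the library.
        signal.raise_signal(signal.SIGINT)
        return displaced is host_handler, received
    finally:
        # Restore whatever was installed before the demonstration.
        signal.signal(signal.SIGINT,
                      original if original is not None else signal.SIG_DFL)


if __name__ == "__main__":
    print(demo_handler_override())
```

In a separate process, as with a fenced wrapper, the library's registration would not affect the host's handler table.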


Environment

OS: Linux/UNIX, 64-bit

Federation Server: v9.5, v9.7, v10.1, v10.5

Oracle client: 11gR2


Diagnosing the problem

If DB2 crashed, examine the FODC trap file. If you only encountered the ORA-01013 error, run the "db2pd -stack all" command to generate stack files in the directory specified by the DIAGPATH parameter.

In the pid.tid.node.stack.txt file, where pid is the process ID of db2sysc and tid is the ID of the EDU that reported the crash or the ORA-01013 error in the db2diag.log file, look for information similar to the following:

Entry 7:

Object name: /oracle/client/11.2.0/lib/libclntsh.a(shr.o)

Text range: [0x090000000f3a3100 - 0x09000000114b0851] (0x000000000210d751 bytes)

Data range: [0x09001000a1a9ff5d - 0x09001000a1c96c48] (0x00000000001f6ceb bytes)

......

Entry 59:

Object name: /opt/IBM/db2/V9.7/lib64/libdb2e.a(shr_64.o)

Text range: [0x09000000050d3100 - 0x090000000a971996] (0x000000000589e896 bytes)

Data range: [0x09001000a0e282dc - 0x09001000a159c040] (0x0000000000773d64 bytes)

......

<SignalHandlers>

SIGABRT : 9001000a0ed36c0

SIGBUS : 9001000a0ed36c0

SIGCHLD : ignored

SIGDANGER : 9001000a0ed36d8

SIGEMT : 9001000a0ed36c0

SIGGRANT : 9001000a0ed3840

SIGILL : 9001000a0ed36c0

SIGXCPU : ignored

SIGINT : 9001000a1be50b0

SIGPRE : 9001000a0ed37c8

SIGSEGV : 9001000a0ed36c0

SIGSYS : 9001000a0ed36c0

SIGTRAP : 9001000a0ed36c0

SIGALRM : 9001000a0ed0118

SIGURG : ignored

SIGPROF : ignored

SIGPIPE : ignored

SIGHUP : ignored

SIGFPE : 9001000a0ed36c0

SIGUSR1 : 9001000a0ed0f48

SIGUSR2 : 9001000a0ed0f48

</SignalHandlers>

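The SIGINT handler address 9001000a1be50b0 falls within the data range of /oracle/client/11.2.0/lib/libclntsh.a(shr.o) [0x09001000a1a9ff5d - 0x09001000a1c96c48], which shows that the SIGINT handler now belongs to the Oracle client library rather than to DB2.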
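
The range comparison can be automated. The following Python sketch is only an illustration: the function name classify_handlers is hypothetical, and the addresses and data ranges are copied by hand from the sample stack file above (it does not parse a real stack file). It reports which loaded object each handler address belongs to.

```python
def classify_handlers(handlers, data_ranges):
    """Map each signal name to the object whose data range holds its handler.

    handlers:    dict of signal name -> handler address (int)
    data_ranges: dict of object name -> (low, high) inclusive address range
    Handlers that fall in no listed range are reported as "unknown".
    """
    owners = {}
    for sig, addr in handlers.items():
        owners[sig] = "unknown"
        for obj, (low, high) in data_ranges.items():
            if low <= addr <= high:
                owners[sig] = obj
                break
    return owners


# Data ranges of Entry 7 and Entry 59 in the sample stack file.
DATA_RANGES = {
    "/oracle/client/11.2.0/lib/libclntsh.a(shr.o)":
        (0x09001000A1A9FF5D, 0x09001000A1C96C48),
    "/opt/IBM/db2/V9.7/lib64/libdb2e.a(shr_64.o)":
        (0x09001000A0E282DC, 0x09001000A159C040),
}

# A few handler addresses from the <SignalHandlers> section of the sample.
HANDLERS = {
    "SIGINT": 0x9001000A1BE50B0,
    "SIGSEGV": 0x9001000A0ED36C0,
    "SIGALRM": 0x9001000A0ED0118,
}

if __name__ == "__main__":
    for sig, obj in classify_handlers(HANDLERS, DATA_RANGES).items():
        print(f"{sig:8s} -> {obj}")
```

With the sample values, only SIGINT resolves to the Oracle client library; the other handlers checked here resolve to libdb2e.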


Resolving the problem

There are several ways to resolve the problem.

1) Alter the Oracle wrapper to fenced mode. This avoids both the crash and the ORA-01013 error because, in fenced mode, the Oracle client library is loaded into the db2fmp process, which is separate from db2sysc.

Command syntax to change the Oracle wrapper to fenced mode:

ALTER WRAPPER oracle_wrapper_name OPTIONS (SET DB2_FENCED 'Y')

2) Downgrade the Oracle client to 11gR1 or earlier, which resolves both problems.

3) Contact Oracle to obtain the patch for Bug 12877221, which resolves both problems. Note that the fix for this bug introduces a new parameter, DISABLE_INTERRUPT. Set it in the Oracle client sqlnet.ora file as shown below, and then restart Federation Server:

DISABLE_INTERRUPT=on

Document information

More support for: InfoSphere Federation Server
Data Sources and Wrappers - Oracle

Software version: 9.5, 9.7, 10.1, 10.5

Operating system(s): AIX, HP-UX, Linux, Solaris

Reference #: 1499426

Modified date: 24 November 2016