IBM Support

[Microsoft][ODBC SQL Server Driver]Communication link failure' errors are being logged in the Process Engine elogs, how can they be corrected?

Troubleshooting


Problem

Process Engine 4.5.1.x running on Windows using Microsoft SQL Server database will intermittently log the following errors: 2012/06/05 10:06:38.492 121,0,41 VW/Process (1288.7084.64 0x508.1bac) ... [CRITICAL] Error in GDBY_logoff: SQLEndTran DBC, hdbc=13038296 (&01d3e0d0), COMMIT (..\src\GDBY.c, VERSION 4.1.1.1, @2628). SQLSTATE = 08S01, NativeError = 0, ErrorMsg = '[Microsoft][ODBC SQL Server Driver]Communication link failure'

Symptom

The following errors are logged in the Process Engine 4.5.1.x elog or in the Windows application event log:

    2012/06/05 10:06:38.476 121,0,41 <S103092> VW/Process  (1288.7084.64 0x508.1bac) ... [CRITICAL]
    Error in GDBY_fetch: SQLFetch, STMT 31792776 (&01d3e410) (..\src\GDBY.c, VERSION 4.1.1.1, @3680).
    SQLSTATE = 08S01, NativeError = 11,
    ErrorMsg = '[Microsoft][ODBC SQL Server Driver][DBNETLIB]General network error. Check your network documentation.'

    2012/06/05 10:06:38.492 121,0,41 <S103092> VW/Process  (1288.7084.64 0x508.1bac) ... [CRITICAL]
    Error in GDBY_logoff: SQLEndTran DBC, hdbc=13038296 (&01d3e0d0), COMMIT (..\src\GDBY.c, VERSION 4.1.1.1, @2628).
    SQLSTATE = 08S01, NativeError = 0,
    ErrorMsg = '[Microsoft][ODBC SQL Server Driver]Communication link failure'

Microsoft SQL Server will log a corresponding entry in the SQL Server error log
    Error: 7886, Severity: 20, State: 1.
    A read operation on a large object failed while sending data to the client. A common cause for this is if the application is running in READ UNCOMMITTED isolation level. This connection will be terminated.

This can occur on Microsoft SQL Server 2003 or 2008.

Cause

Process Engine 4.5.1 relies on the Image Services (IS) 4.1.2 database libraries to communicate with and execute transactions in the database. By default, the IS database libraries, when communicating with Microsoft SQL, uses READ UNCOMMITTED isolation level for transactions.

Microsoft provided the following explanation of the 7886 error, and a condition when this error may be returned by SQL Server:

    Read failure on LOBs, probably due to read uncommitted isolation level, or nolock scans. For example, User A has fired a query to delete a row and the query finds a pointer that points to a location where actual data is lying and it deletes that data. Now at the same time User B fires a query with NO LOCK hint to access that row , It finds the pointer that redirects it to the actual data page but when it reaches there it doesn’t find the data because it has been deleted So it might return this error message.

When Microsoft SQL Server returns this error to the client application, it will also terminate the connection.

This error can occur for Process Engine when the following conditions are met:
1) The BLOB objects associated with workflow work objects are large (90K+)
2) There is reasonable load on the Process Engine server, where the VWKs processes are all processing
3) The work objects are all being processed from the same database table in the database

According to Microsoft, the way to remove or eliminate the errors is to not use the READ_UNCOMMITTED isolation level. The Microsoft SQL 7886 error does not occur with READ_COMMITED isolation level.

Resolving The Problem


Please keep in mind that even with the [CRITICAL] messages in the elogs, this error does not impact Process Engine with respect to data integrity or affect usability. After receiving the Communication failure message, Process Engine will automatically recover the terminated database connection, retry the transaction, and continue to function.

Here is a sample message of the database connection recovery. This is typically logged immediately after the Microsoft SQL Server communication failure.

    2012/08/23 13:27:33.396  <fnsw> VW/Process  (1780.3112.65 0x6f4.c28) ...
    VW (VWKs.1) (Region=1): Recovered from RDBMS connection error.

In order to change the Microsoft SQL Server Isolation Level, to remove the Communication Failure messages / Microsoft SQL 7886 errors, you need to do the following:

1) Install IS 4.1.2 FP15.
2) Change the isolation level that IS uses to communicate with MSSQL to READ_COMMITTED
    a) Create a folder \FNSW_LOC\trigger
    b) Create a blank file named GDB_ISOLATION
3) Restart Process Engine

To verify that READ_COMMITTED is enabled, look for the following log message in the elogs:
    2012/08/23 08:15:39.098  <fnsw> VW/Process  (5776.676.10 0x1690.2a4) ... [INFO]
    GDB isolation trigger file 'C:\FNSW_LOC\trigger\GDB_ISOLATION' was detected. Isolation level would be set to 'READ COMMITTED'

Once the isolation level is changed, you should not see any more 'communication failure' messages.

Note: This is not an issue for Process Engine 5.0, as the default isolation level is READ_COMMITTED.

[{"Product":{"code":"SSTHRT","label":"IBM Case Foundation"},"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Component":"Process Engine","Platform":[{"code":"PF033","label":"Windows"}],"Version":"4.5.1","Edition":"","Line of Business":{"code":"LOB45","label":"Automation"}}]

Document Information

Modified date:
17 June 2018

UID

swg21609058