PM69279: ClearCase VOB shared memory becomes wedged and does not recover without manual intervention

APAR status

  • Closed as program error.

Error description

  • ClearCase VOB shared memory becomes wedged and does not recover
    without manual intervention
    
    
    ClearCase 7.1.2.6
    
    Red Hat Enterprise Linux 5 Update 7
       * VOB databases are stored on Network Attached Storage (NAS)
    
    
    Description of the Problem:
    
    When storing VOBs on Network Attached Storage, individual VOBs
    may become unresponsive with the following messages within the
    db_server_log:
    
    db_server(*****): Error: CRRDM0656E *** db_VISTA database error
    -926 - problem in shared memory lock manager: Process *****
    timed out waiting for lock.
    db_server(*****): Error: Timeout getting lock in VOB
    '/net/nas/test.vbs/db'.
    db_server(*****): Warning: 'admin,pid==*****,euid==UNIX:UID-0,
    cleartool 'describe' 'vob:/vobs/test'' waited 299 seconds for a
    'r' lock in the usual area in /net/nas/test.vbs/db!
    
    
    A system call trace (truss, or strace on Linux) of the db_server
    process resembles the following:
    
    17331 13:36:41 futex(0xb7cb90b8, FUTEX_WAIT, 68, {59,
    360378300}) = -1 ETIMEDOUT (Connection timed out) <59.369790>
    17331 13:37:41 kill(17331, SIG_0)       = 0 <0.000008>
    17331 13:37:41 time(NULL)               = 1342039061 <0.000006>
    17331 13:37:41 time(NULL)               = 1342039061 <0.000006>
    17331 13:37:41 futex(0xb7cb900c, FUTEX_WAKE, 1) = 0 <0.000008>
    17331 13:37:41 clock_gettime(CLOCK_REALTIME, {1342039061,
    9759788}) = 0 <0.000007>
    17331 13:37:41 futex(0xb7cb90b8, FUTEX_WAIT, 70, {59,
    990240212}) = -1 ETIMEDOUT (Connection timed out) <60.000617>
    17331 13:38:41 kill(17331, SIG_0)       = 0 <0.000007>
    17331 13:38:41 time(NULL)               = 1342039121 <0.000006>
    17331 13:38:41 time(NULL)               = 1342039121 <0.000006>
    17331 13:38:41 futex(0xb7cb900c, FUTEX_WAKE, 1) = 0 <0.000007>
    17331 13:38:41 clock_gettime(CLOCK_REALTIME, {1342039121,
    10693032}) = 0 <0.000007>
    17331 13:38:41 futex(0xb7cb90b8, FUTEX_WAIT, 72, {59,
    989306968}) = -1 ETIMEDOUT (Connection timed out) <60.000586>
    17331 13:39:41 kill(17331, SIG_0)       = 0 <0.000007>
    17331 13:39:41 time(NULL)               = 1342039181 <0.000006>
    17331 13:39:41 time(NULL)               = 1342039181 <0.000006>
    17331 13:39:41 futex(0xb7cb900c, FUTEX_WAKE, 1) = 0 <0.000008>
    17331 13:39:41 clock_gettime(CLOCK_REALTIME, {1342039181,
    11622577}) = 0 <0.000010>
    17331 13:39:41 futex(0xb7cb90b8, FUTEX_WAIT, 74, {59, 988377423}
    <unfinished ...>
    .
    .
    17331 13:40:41 <... futex resumed> )    = -1 ETIMEDOUT
    (Connection timed out) <59.998220>
    17331 13:40:41 kill(17331, SIG_0)       = 0 <0.000007>
    17331 13:40:41 time(NULL)               = 1342039241 <0.000006>
    17331 13:40:41 time(NULL)               = 1342039241 <0.000006>
    17331 13:40:41 futex(0xb7cb900c, FUTEX_WAKE, 1) = 0 <0.000007>
    17331 13:40:41 clock_gettime(CLOCK_REALTIME, {1342039241,
    10196451}) = 0 <0.000006>
    17331 13:40:41 futex(0xb7cb90b8, FUTEX_WAIT, 76, {59, 989803549}
    <unfinished ...>
    .
    .
    17331 13:41:41 <... futex resumed> )    = -1 ETIMEDOUT
    (Connection timed out) <60.000063>
    17331 13:41:41 kill(17331, SIG_0)       = 0 <0.000008>
    17331 13:41:41 time(NULL)               = 1342039301 <0.000006>
    17331 13:41:41 write(2, 'CRRDM0656E *** db_VISTA database'...,
    83) = 83 <0.000012>
    17331 13:41:41 write(2, ': ', 2)        = 2 <0.000008>
    17331 13:41:41 write(2, 'Process 17331 timed out waiting '...,
    41) = 41 <0.000008>
    17331 13:41:41 write(2, '\n', 1)        = 1 <0.000007>
    
    
    Workaround: Split the VOB database from the rest of the VOB
    storage pools and store the database locally, for example on
    direct attached storage (iSCSI).
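
The db_server_log messages quoted above can be used to spot an affected VOB. The following is a minimal sketch, not an IBM-supplied tool: it assumes the log keeps the wording shown in this APAR (real logs may add timestamps or wrap lines differently), and the helper name find_lock_timeouts is hypothetical.

```python
import re

# Patterns are based only on the messages quoted in this APAR; real logs
# may carry timestamps or other prefixes (assumption).
TIMEOUT_RE = re.compile(r"Timeout getting lock in VOB\s+'([^']+)'")
VISTA_RE = re.compile(r"db_VISTA database error\s+-926\b")


def find_lock_timeouts(log_text):
    """Return (count of -926 errors, sorted VOB database paths that timed out)."""
    # Collapse whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    errors = len(VISTA_RE.findall(flat))
    vobs = sorted(set(TIMEOUT_RE.findall(flat)))
    return errors, vobs


if __name__ == "__main__":
    sample = """db_server(*****): Error: CRRDM0656E *** db_VISTA database error
-926 - problem in shared memory lock manager: Process *****
timed out waiting for lock.
db_server(*****): Error: Timeout getting lock in VOB
'/net/nas/test.vbs/db'.
"""
    print(find_lock_timeouts(sample))
```

Whitespace is collapsed before matching because the messages in this report are wrapped across lines; a line-by-line match would miss the VOB path.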
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    Under certain conditions, errors on NAS-hosted VOB storage
    could trigger a deadlock in access to the VOB database files,
    causing db_server processes to hang.
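
The trace in the error description shows db_server waiting on a futex in roughly 60-second slices, probing a process with kill(pid, SIG_0) after each timeout, and reporting the -926 error after about five minutes. The sketch below illustrates that general wait pattern only. It is not ClearCase's lock manager; the function names, the slice length, and the slice count are assumptions chosen to resemble the trace.

```python
import os
import threading


def owner_alive(pid):
    """Probe a process with signal 0 (POSIX only), like kill(pid, SIG_0)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def wait_for_lock(lock, owner_pid, slice_seconds=60.0, max_slices=5):
    """Try to take `lock` in fixed slices; give up after max_slices timeouts.

    Returns True if the lock was acquired and False if the probed process
    no longer exists. Raises TimeoutError after slice_seconds * max_slices
    (about 300 s with the defaults; both values are assumptions).
    """
    for _ in range(max_slices):
        if lock.acquire(timeout=slice_seconds):
            return True
        if not owner_alive(owner_pid):
            return False
    raise TimeoutError("timed out waiting for lock")


if __name__ == "__main__":
    wedged = threading.Lock()
    wedged.acquire()  # simulate a holder that never releases the lock
    try:
        wait_for_lock(wedged, os.getpid(), slice_seconds=0.01, max_slices=5)
    except TimeoutError as exc:
        print(exc)
```

When the holder never releases the lock, as in the deadlock described here, every waiter runs through all of its slices and fails, which matches the repeated timeouts seen in the log.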
    

Problem conclusion

  • A fix is available in ClearCase 7.1.2.9 and 8.0.0.5.
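
A quick way to compare an installed version string against the fix levels named above is sketched here. It assumes that a later fix pack in the same release stream (same first three numbers) carries the fix, which this APAR does not state explicitly; versions in other streams are reported as unknown rather than guessed. The names parse_version and includes_fix are hypothetical.

```python
def parse_version(text):
    """Turn '7.1.2.9' into (7, 1, 2, 9)."""
    return tuple(int(part) for part in text.strip().split("."))


# Fix levels named in this APAR.
FIX_LEVELS = [(7, 1, 2, 9), (8, 0, 0, 5)]


def includes_fix(version_text):
    """Return True/False for the 7.1.2.x and 8.0.0.x streams, else None.

    Assumption: fix packs at or above the named level in the same stream
    include the fix. Other streams return None (unknown).
    """
    version = parse_version(version_text)
    for level in FIX_LEVELS:
        if version[:3] == level[:3]:
            return version >= level
    return None


if __name__ == "__main__":
    for candidate in ("7.1.2.6", "7.1.2.9", "8.0.0.5"):
        print(candidate, includes_fix(candidate))
```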
    

Temporary fix

Comments

APAR Information

  • APAR number

    PM69279

  • Reported component name

    CLEARCASE UNIX

  • Reported component ID

    5724G2901

  • Reported release

    711

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2012-07-19

  • Closed date

    2012-12-15

  • Last modified date

    2012-12-15

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    CLEARCASE UNIX

  • Fixed component ID

    5724G2901

Applicable component levels

  • R711 PSN

       UP



Document information


More support for:

Rational ClearCase

Software version:

7.1.1

Reference #:

PM69279

Modified date:

2012-12-15
