
PM69279: ClearCase VOB shared memory becomes wedged and does not recover without manual intervention



APAR status

  • Closed as program error.

Error description

  • ClearCase VOB shared memory becomes wedged and does not recover
    without manual intervention
    
    
    Environment:
    
    ClearCase 7.1.2.6
    Red Hat Enterprise Linux 5 Update 7
    VOB databases stored on Network Attached Storage
    
    
    Description of the Problem:
    
    When VOBs are stored on Network Attached Storage (NAS),
    individual VOBs may become unresponsive, and messages such as
    the following appear in the db_server_log:
    
    db_server(*****): Error: CRRDM0656E *** db_VISTA database error
    -926 - problem in shared memory lock manager: Process *****
    timed out waiting for lock.
    db_server(*****): Error: Timeout getting lock in VOB
    '/net/nas/test.vbs/db'.
    db_server(*****): Warning: 'admin,pid==*****,euid==UNIX:UID-0,
    cleartool 'describe' 'vob:/vobs/test'' waited 299 seconds for a
    'r' lock in the usual area in /net/nas/test.vbs/db!
    
    
    A system call trace (strace) of the db_server process resembles
    the following:
    
    17331 13:36:41 futex(0xb7cb90b8, FUTEX_WAIT, 68, {59,
    360378300}) = -1 ETIMEDOUT (Connection timed out) <59.369790>
    17331 13:37:41 kill(17331, SIG_0)       = 0 <0.000008>
    17331 13:37:41 time(NULL)               = 1342039061 <0.000006>
    17331 13:37:41 time(NULL)               = 1342039061 <0.000006>
    17331 13:37:41 futex(0xb7cb900c, FUTEX_WAKE, 1) = 0 <0.000008>
    17331 13:37:41 clock_gettime(CLOCK_REALTIME, {1342039061,
    9759788}) = 0 <0.000007>
    17331 13:37:41 futex(0xb7cb90b8, FUTEX_WAIT, 70, {59,
    990240212}) = -1 ETIMEDOUT (Connection timed out) <60.000617>
    17331 13:38:41 kill(17331, SIG_0)       = 0 <0.000007>
    17331 13:38:41 time(NULL)               = 1342039121 <0.000006>
    17331 13:38:41 time(NULL)               = 1342039121 <0.000006>
    17331 13:38:41 futex(0xb7cb900c, FUTEX_WAKE, 1) = 0 <0.000007>
    17331 13:38:41 clock_gettime(CLOCK_REALTIME, {1342039121,
    10693032}) = 0 <0.000007>
    17331 13:38:41 futex(0xb7cb90b8, FUTEX_WAIT, 72, {59,
    989306968}) = -1 ETIMEDOUT (Connection timed out) <60.000586>
    17331 13:39:41 kill(17331, SIG_0)       = 0 <0.000007>
    17331 13:39:41 time(NULL)               = 1342039181 <0.000006>
    17331 13:39:41 time(NULL)               = 1342039181 <0.000006>
    17331 13:39:41 futex(0xb7cb900c, FUTEX_WAKE, 1) = 0 <0.000008>
    17331 13:39:41 clock_gettime(CLOCK_REALTIME, {1342039181,
    11622577}) = 0 <0.000010>
    17331 13:39:41 futex(0xb7cb90b8, FUTEX_WAIT, 74, {59, 988377423}
    <unfinished ...>
    .
    .
    17331 13:40:41 <... futex resumed> )    = -1 ETIMEDOUT
    (Connection timed out) <59.998220>
    17331 13:40:41 kill(17331, SIG_0)       = 0 <0.000007>
    17331 13:40:41 time(NULL)               = 1342039241 <0.000006>
    17331 13:40:41 time(NULL)               = 1342039241 <0.000006>
    17331 13:40:41 futex(0xb7cb900c, FUTEX_WAKE, 1) = 0 <0.000007>
    17331 13:40:41 clock_gettime(CLOCK_REALTIME, {1342039241,
    10196451}) = 0 <0.000006>
    17331 13:40:41 futex(0xb7cb90b8, FUTEX_WAIT, 76, {59, 989803549}
    <unfinished ...>
    .
    .
    17331 13:41:41 <... futex resumed> )    = -1 ETIMEDOUT
    (Connection timed out) <60.000063>
    17331 13:41:41 kill(17331, SIG_0)       = 0 <0.000008>
    17331 13:41:41 time(NULL)               = 1342039301 <0.000006>
    17331 13:41:41 write(2, 'CRRDM0656E *** db_VISTA database'...,
    83) = 83 <0.000012>
    17331 13:41:41 write(2, ': ', 2)        = 2 <0.000008>
    17331 13:41:41 write(2, 'Process 17331 timed out waiting '...,
    41) = 41 <0.000008>
    17331 13:41:41 write(2, '\n', 1)        = 1 <0.000007>
    
    
    Workaround: Use direct attached storage (iSCSI) to separate the
    VOB database from the rest of the VOB storage pools, so that the
    database is stored locally rather than on the NAS.
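To spot this condition across many VOBs, one option is to scan db_server_log for the three kinds of message quoted in the description. The following is a minimal sketch, not a ClearCase tool: `summarize_db_server_log` is a hypothetical helper, its regular expressions are modelled only on the quoted messages, and it assumes each message occupies a single line in the real log (the APAR text wraps them for display).

```python
import re
import sys
from collections import Counter

# Patterns modelled on the db_server messages quoted in this APAR.
# Assumption: each message occupies one line in the actual log file.
LOCK_TIMEOUT = re.compile(r"Timeout getting lock in VOB '([^']+)'")
LONG_WAIT = re.compile(
    r"waited (\d+) seconds for an? '([^']+)' lock in the (\w+) area in (.+?)!"
)
DB_VISTA_ERROR = re.compile(r"CRRDM0656E .*?database error (-?\d+)")


def summarize_db_server_log(lines):
    """Collect lock-related db_server messages from an iterable of log lines.

    Returns (timeouts, max_wait, vista_errors):
      timeouts     -- Counter: VOB database path -> 'Timeout getting lock' count
      max_wait     -- dict: VOB database path -> longest reported wait (seconds)
      vista_errors -- Counter: db_VISTA error code -> CRRDM0656E occurrences
    """
    timeouts = Counter()
    max_wait = {}
    vista_errors = Counter()
    for line in lines:
        m = LOCK_TIMEOUT.search(line)
        if m:
            timeouts[m.group(1)] += 1
        m = LONG_WAIT.search(line)
        if m:
            seconds, path = int(m.group(1)), m.group(4)
            max_wait[path] = max(seconds, max_wait.get(path, 0))
        m = DB_VISTA_ERROR.search(line)
        if m:
            vista_errors[int(m.group(1))] += 1
    return timeouts, max_wait, vista_errors


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 scan_db_server_log.py <path to db_server_log>
    with open(sys.argv[1], errors="replace") as log:
        timeouts, max_wait, vista_errors = summarize_db_server_log(log)
    for path, count in timeouts.most_common():
        print(f"{path}: {count} lock timeouts, "
              f"longest wait {max_wait.get(path, 0)} s")
    print("db_VISTA error codes:", dict(vista_errors))
```

The script name in the usage comment is hypothetical, and the log's location depends on the installation. A VOB that repeatedly reports error -926 together with waits near 300 seconds matches the symptom described in this APAR.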
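In the strace excerpt, each FUTEX_WAIT on 0xb7cb90b8 is given a relative timeout just under 60 seconds and returns ETIMEDOUT; between waits the process issues a kill(..., SIG_0) probe (signal 0 tests whether a process ID exists without delivering a signal), and after the fifth timeout it writes the CRRDM0656E message. Adding up the `<seconds>` durations that strace's -T option appends gives 59.369790 + 60.000617 + 60.000586 + 59.998220 + 60.000063 = 299.369276 s, which is consistent with the "waited 299 seconds" warning. The sketch below does that arithmetic; `futex_timeout_total` is a hypothetical helper, and `EXCERPT` is the trace above unwrapped to one record per line.

```python
import re

# strace -T appends the time spent in each completed call as '<seconds>'.
DURATION = re.compile(r"<(\d+\.\d+)>\s*$")

# Records from the excerpt above, unwrapped to one strace record per line.
EXCERPT = [
    "17331 13:36:41 futex(0xb7cb90b8, FUTEX_WAIT, 68, {59, 360378300}) "
    "= -1 ETIMEDOUT (Connection timed out) <59.369790>",
    "17331 13:37:41 kill(17331, SIG_0)       = 0 <0.000008>",
    "17331 13:37:41 futex(0xb7cb90b8, FUTEX_WAIT, 70, {59, 990240212}) "
    "= -1 ETIMEDOUT (Connection timed out) <60.000617>",
    "17331 13:38:41 futex(0xb7cb90b8, FUTEX_WAIT, 72, {59, 989306968}) "
    "= -1 ETIMEDOUT (Connection timed out) <60.000586>",
    "17331 13:39:41 futex(0xb7cb90b8, FUTEX_WAIT, 74, {59, 988377423} "
    "<unfinished ...>",
    "17331 13:40:41 <... futex resumed> )    = -1 ETIMEDOUT "
    "(Connection timed out) <59.998220>",
    "17331 13:40:41 futex(0xb7cb90b8, FUTEX_WAIT, 76, {59, 989803549} "
    "<unfinished ...>",
    "17331 13:41:41 <... futex resumed> )    = -1 ETIMEDOUT "
    "(Connection timed out) <60.000063>",
]


def futex_timeout_total(trace_lines):
    """Return (count, seconds) for futex calls that ended in ETIMEDOUT.

    When strace splits a call into '<unfinished ...>' and
    '<... futex resumed>', only the resumed record carries the result and
    the duration, so only that record is counted.
    """
    count, seconds = 0, 0.0
    for line in trace_lines:
        is_futex = "futex(" in line or "futex resumed" in line
        if not is_futex or "ETIMEDOUT" not in line:
            continue
        match = DURATION.search(line)
        if match:
            count += 1
            seconds += float(match.group(1))
    return count, seconds


if __name__ == "__main__":
    n, total = futex_timeout_total(EXCERPT)
    print(f"{n} timed-out futex waits, {total:.2f} s in total")
```

A comparable capture of a live db_server could be taken with `strace -f -t -T -p <pid> -o trace.txt`, whose output uses the same pid, timestamp, and duration layout, and the file's lines passed to the helper.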
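Applying the workaround first requires knowing which VOB database directories actually live on network storage. On Linux, one way to check is to find the mount that contains the path in /proc/mounts and look at its filesystem type. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the set of filesystem types counted as "network" is an assumption to adjust for the site.

```python
import os
import sys

# Filesystem types treated as network storage (an assumption; extend as needed).
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs"}


def mount_for_path(path, mounts_text):
    """Return (mount_point, fstype) for the mount that contains path.

    mounts_text is the content of /proc/mounts on Linux: one
    'device mount_point fstype options dump pass' record per line.
    The longest matching mount point wins; on equal length the later record
    wins, because a later mount on the same directory hides the earlier one.
    Escaped characters in mount points (such as '\\040' for a space) are not
    decoded in this sketch.
    """
    path = os.path.normpath(path)
    best_point, best_type = "", ""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        point, fstype = fields[1], fields[2]
        prefix = point.rstrip("/") + "/"
        inside = path == point or path.startswith(prefix)
        if inside and len(point) >= len(best_point):
            best_point, best_type = point, fstype
    return best_point, best_type


def is_network_storage(path, mounts_text):
    """True if path lies on a filesystem type listed in NETWORK_FS_TYPES."""
    return mount_for_path(path, mounts_text)[1] in NETWORK_FS_TYPES


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_db_mounts.py /net/nas/test.vbs/db [...]
    with open("/proc/mounts") as f:
        mounts = f.read()
    for db_dir in sys.argv[1:]:
        point, fstype = mount_for_path(os.path.realpath(db_dir), mounts)
        kind = "network" if fstype in NETWORK_FS_TYPES else "local"
        print(f"{db_dir}: {fstype} mounted at {point} ({kind})")
```

The path is resolved with `os.path.realpath` first so that a symbolic link from local disk into a NAS mount is reported against the NAS. `df -T <path>` gives the same answer interactively; the script form is only convenient for checking many VOB storage directories at once.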
    

Local fix

Problem summary

  • Under certain conditions, errors on NAS-hosted VOB storage
    could trigger a deadlock in access to the VOB database files,
    causing db_server processes to hang.
    

Problem conclusion

  • A fix is available in ClearCase 7.1.2.9 and 8.0.0.5.
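When checking many hosts, it can help to test the installed release against these fix levels programmatically. The sketch below is hypothetical: it compares versions numerically as tuples (a plain string comparison would wrongly rank "7.1.2.10" below "7.1.2.9") and returns None for any release stream other than 7.1.2 and 8.0.0, because this APAR does not say which other releases contain the fix. The version string would come from the installation itself, for example as reported by `cleartool -version`.

```python
# Fix levels named in this APAR, keyed by release stream (first three fields).
FIXED_IN = {(7, 1, 2): (7, 1, 2, 9), (8, 0, 0): (8, 0, 0, 5)}


def parse_version(text):
    """Turn a dotted version such as '7.1.2.6' into (7, 1, 2, 6)."""
    return tuple(int(part) for part in text.strip().split("."))


def includes_pm69279_fix(version):
    """True or False for the 7.1.2 and 8.0.0 streams, None otherwise.

    None means "not stated in this APAR", not "unaffected".
    """
    v = parse_version(version)
    fix = FIXED_IN.get(v[:3])
    if fix is None:
        return None
    return v >= fix


if __name__ == "__main__":
    for ver in ("7.1.2.6", "7.1.2.9", "8.0.0.5", "8.0.1.0"):
        print(ver, includes_pm69279_fix(ver))
```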
    

Temporary fix

Comments

APAR Information

  • APAR number

    PM69279

  • Reported component name

    CLEARCASE UNIX

  • Reported component ID

    5724G2901

  • Reported release

    711

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2012-07-19

  • Closed date

    2012-12-15

  • Last modified date

    2012-12-15

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    CLEARCASE UNIX

  • Fixed component ID

    5724G2901

Applicable component levels

  • R711 PSN

       UP




Document information

Rational ClearCase


Software version:
7.1.1


Reference #:
PM69279


Modified date:
2012-12-15
