IBM Support

IZ09423: LARGE NUMBER OF THREADS NEED 2 BUCKETS AND CRASHED.

APAR status

  • Closed as program error.

Error description

  • AMOS 5.1.0.26 with hot fix IY95481_FP26 hangs on Solaris 5.9.
    The server hung and it was not possible to log in to the machine
    at all. A crash dump was taken.
    
    Solaris server: SunOS nus637 5.9 Generic_118558-26 sun4u sparc
    SUNW,Sun-Fire-V440
    
    Analysis by Sun: In the crash dump, many processes (most of
    them sendmail and mail.local) appear to be hung in the kazndrv
    driver.
    
    
    Crash Dump Details:
    ==================
    SolarisCAT(nus637-vmcore/9U)> proc tree |grep sendmail | wc
      1155    4620   48064
    SolarisCAT(nus637-vmcore/9U)> proc tree |grep mail.local | wc
       973    2919   29124
    
    !! Hundreds of sendmail and mail.local processes exist.
    
    ==== user (LWP_SYS) thread: 0x36849112800  PID: 3183 ====
    cmd: mail.local -l
    t_wchan: 0x781c18aa sobj: condition var
    (fromkazndrv:kenv_pathMemGet+0x190)
    t_procp: 0x3684c39e048
    p_as: 0x3684b5cee50  size: 3088384  rss: 1810432
    hat: 0x36851547b48  cnum: 0x1e99  cpusran: 0,1,2,3
    t_stk: 0x2a101ca5af0  sp: 0x2a101ca4b71  t_stkbase:
    0x2a101ca2000
    t_pri: 59(TS)  pctcpu: 0.000000
    t_lwp: 0x36849110038  machpcb: 0x2a101ca5af0
    mstate: LMS_SLEEP  ms_prev: LMS_SYSTEM
    ms_state_start: 1 days 4 hours 14 minutes 29.717327700 seconds
    earlier
    ms_start: 1 days 4 hours 16 minutes 15.774093800 seconds earlier
    psrset: 0  last CPU: 3
    idle: 10165761 ticks (1 days 4 hours 14 minutes 17.61 seconds)
    start: Mon Nov 12 18:41:39 2007
    age: 101760 seconds (1 days 4 hours 16 minutes)
    syscall: #9 link(, 0xffbfb860) (sysent: kazndrv:nct_link32+0x0)
    tstate: TS_SLEEP - awaiting an event
    tflg:   T_DFLTSTK - stack is default size
    tpflg:  TP_TWAIT - wait to be freed by lwp_wait
          TP_MSACCT - collect micro-state accounting information
    tsched: TS_LOAD - thread is in memory
          TS_DONT_SWAP - thread/LWP should not be swapped
    pflag:  SLOAD - in core
          SMSACCT - process is keeping micro-state accounting
          NOCD - new creds from VSxID, do not coredump
    
    pc:      0x1084e98      genunix:cv_wait+0x3c:   call
    unix:swtch
    
    genunix:cv_wait+0x3c(0x781c18aa, 0x781c1898, 0x0, 0x3,
    0x100c824, 0x0)
    kazndrv:kenv_pathMemGet+0x190(0x2a101ca5a08, 0x15,
    0x2a101ca59d0, 0x0,0x3684b15ca60, 0xff39cc00)
    kazndrv:kenv_procGetPathMem+0x24c(0x2a101ca5a08, ,
    0x2a101ca59d0,0x2a101ca579c, 0x81010100, 0xff0000)
    kazndrv:nct_copyPath+0x4c(, 0x5, 0x2a101ca59c8, 0x0, 0x15, 0x0)
    kazndrv:nct_addAuxPath+0x168(0xffbfbf28, 0x2a101ca59c8, 0x0,
    0x5,0x3684a4b5ad0, 0x0)
    kazndrv:nct_linkCommon+0xb4(0xffbfbf28, 0xff354780,
    0x2a101ca59c8,0x2a101ca5a58, 0x3684a4b5ad0, 0x2a101ca5aec)
    kazndrv:nct_link32+0x104(0xffbfbf28?, 0xff354780, 0x0,
    0xff13c000, 0x0,0x0)
    unix:syscall_trap32+0xa8()
    -- switch to user thread's user stack --
    
    !! The longest-waiting mail.local has been stuck for over 1 day.
    
    
    1007 threads: 0x3000040b280 0x36859ba5280 0x36854efca80
    0x3684d8a2d20...
    genunix:cv_wait+0x3c
    kazndrv:kenv_pathMemGet+0x190
    kazndrv:kenv_procGetPathMem+0x24c
    kazndrv:nct_copyPath+0x4c
    kazndrv:nct_processPath+0x188
    kazndrv:nct_openCommon+0xf4
    kazndrv:c_open+0x168
    kazndrv:nct_open32+0xa0
    unix:syscall_trap32+0xa8
    
    227 threads: 0x36850a577e0 0x36859ba4800 0x3685357ea80
    0x368555f37c0...
    genunix:cv_wait+0x3c
    kazndrv:kenv_pathMemCheckWaitIfUse2+0x64
    kazndrv:kenv_pathMemGet+0x270
    kazndrv:kenv_procGetPathMem+0x24c
    kazndrv:nct_copyPath+0x4c
    kazndrv:nct_processPath+0x188
    kazndrv:nct_linkCommon+0x70
    kazndrv:nct_link32+0x104
    unix:syscall_trap32+0xa8
    
    !!  Over 1000 threads are hung in the kazndrv driver.
    
    ==== user (LWP_SYS) thread: 0x3684a3aa020  PID: 4852 ====
    cmd: /usr/bin/ksh /usr/local/soe/bin/cpuhog
    t_wchan: 0x781c18aa sobj: condition var (from
    kazndrv:kenv_pathMemGet+0x190)
    t_procp: 0x3684b4f40b8
    p_as: 0x3684c1e7c18  size: 2015232  rss: 507904
    hat: 0x3684aa2d848  cnum: 0x9e  cpusran: 1
    t_stk: 0x2a102b2baf0  sp: 0x2a102b2ab31  t_stkbase:
    0x2a102b28000
    t_pri: 59(TS)  pctcpu: 0.000000
    t_lwp: 0x3684d855510  machpcb: 0x2a102b2baf0
    mstate: LMS_SLEEP  ms_prev: LMS_SYSTEM
    ms_state_start: 1 days 4 hours 14 minutes 28.589052000 seconds
    earlier
    ms_start: 1 days 4 hours 14 minutes 29.754985300 seconds earlier
    psrset: 0  last CPU: 1
    idle: 10165765 ticks (1 days 4 hours 14 minutes 17.65 seconds)
    start: Mon Nov 12 18:43:25 2007
    age: 101654 seconds (1 days 4 hours 14 minutes 14 seconds)
    syscall: #225 open64(, 0xffbfe650) (sysent:
    kazndrv:nct_open6432+0x0)
    tstate: TS_SLEEP - awaiting an event
    tflg:   T_DFLTSTK - stack is default size
    tpflg:  TP_TWAIT - wait to be freed by lwp_wait
          TP_MSACCT - collect micro-state accounting information
    tsched: TS_LOAD - thread is in memory
          TS_DONT_SWAP - thread/LWP should not be swapped
    pflag:  SLOAD - in core
          SMSACCT - process is keeping micro-state accounting
    
    pc:      0x1084e98      genunix:cv_wait+0x3c:   call
    unix:swtch
    
    genunix:cv_wait+0x3c(0x781c18aa, 0x781c1898, 0x45, 0x1000,
    0x22000000,0x0)
    kazndrv:kenv_pathMemGet+0x190(0x2a102b2ba98, 0xa, 0x2a102b2ba60,
    0x0,0x3684b4e5298, 0x0)
    kazndrv:kenv_procGetPathMem+0x24c(0x2a102b2ba98, ,
    0x2a102b2ba60,0x2a102b2b75c, 0x81010100, 0xff00)
    kazndrv:nct_copyPath+0x4c(, 0x2, 0x2a102b2ba58, 0x2a102b2b818,
    0xa,0x0)
    kazndrv:nct_processPath+0x188(0x47f61?, 0x2a102b2ba58, 0x28,
    0x2,0x3684a4c4610, 0x8)
    kazndrv:nct_openCommon+0xf4(0x47f61, 0x2a102b2ba58, ,
    0x3684a4c4610,0x2a102b2baec, 0x8)
    kazndrv:c_open+0x168(0x47f61, 0x301, 0x2a102b2ba58,
    0x3684a4c4610,0x2a102b2baec, 0x0)
    kazndrv:nct_open6432+0xa0(, 0x301, 0x1b6, 0x0, 0x0, 0x0)
    unix:syscall_trap32+0xa8()
    -- switch to user thread's user stack --
    
    !!  cpuhog; interesting name
    ========================
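
The grouped stack summaries above can be tallied with a short script. The following is a minimal sketch in Python, assuming the summaries keep the "N threads:" header and "module:function+offset" frame layout shown in the dump. The function name count_waiters and the regular expressions are illustrative and are not part of SolarisCAT.

```python
import re
from collections import Counter

# Grouped thread summaries copied from the crash dump above.
DUMP = """\
1007 threads: 0x3000040b280 0x36859ba5280 0x36854efca80
0x3684d8a2d20...
genunix:cv_wait+0x3c
kazndrv:kenv_pathMemGet+0x190
kazndrv:kenv_procGetPathMem+0x24c
kazndrv:nct_copyPath+0x4c
kazndrv:nct_processPath+0x188
kazndrv:nct_openCommon+0xf4
kazndrv:c_open+0x168
kazndrv:nct_open32+0xa0
unix:syscall_trap32+0xa8

227 threads: 0x36850a577e0 0x36859ba4800 0x3685357ea80
0x368555f37c0...
genunix:cv_wait+0x3c
kazndrv:kenv_pathMemCheckWaitIfUse2+0x64
kazndrv:kenv_pathMemGet+0x270
kazndrv:kenv_procGetPathMem+0x24c
kazndrv:nct_copyPath+0x4c
kazndrv:nct_processPath+0x188
kazndrv:nct_linkCommon+0x70
kazndrv:nct_link32+0x104
unix:syscall_trap32+0xa8
"""

# Hypothetical patterns matching the layout shown above.
GROUP_RE = re.compile(r"^\s*(\d+) threads:")
FRAME_RE = re.compile(r"^\s*([A-Za-z_]\w*):(\w+)\+0x[0-9a-f]+\s*$")


def count_waiters(text):
    """Map the first non-genunix frame of each group to its thread count."""
    counts = Counter()
    pending = None  # thread count of the group whose caller is not yet seen
    for line in text.splitlines():
        group = GROUP_RE.match(line)
        if group:
            pending = int(group.group(1))
            continue
        frame = FRAME_RE.match(line)
        if frame and pending is not None and frame.group(1) != "genunix":
            counts[f"{frame.group(1)}:{frame.group(2)}"] += pending
            pending = None
    return counts


if __name__ == "__main__":
    for caller, nthreads in count_waiters(DUMP).most_common():
        print(f"{nthreads:5d}  {caller}")
```

Grouping by the first frame below cv_wait shows which driver routine each group of sleeping threads entered the wait from.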
    

Local fix

  • No Workaround
    

Problem summary

  • After a period of time a system can hang. Heavily loaded
    systems with many CPUs are most vulnerable. In a forced dump
    from the hung system, many processes are blocked in the TAMOS
    kazndrv kernel module. They are waiting for a free shared
    memory slot used to communicate with pdosd. They wait on the
    freeCond kenv_memState condition variable and remain blocked
    even though there is a single free shared memory slot.
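
The symptom described above (waiters asleep on a condition variable while a slot is free) is characteristic of a missed wakeup. This APAR does not describe the kazndrv code or the actual defect, so the following is only a generic sketch, in Python, of a slot pool in which a waiter cannot stay asleep while a slot is free. SlotPool, run_workers and all other names are hypothetical and do not reflect the driver's implementation.

```python
import threading


class SlotPool:
    """Fixed pool of slots guarded by a single condition variable."""

    def __init__(self, nslots):
        self._free = list(range(nslots))
        self._cond = threading.Condition()

    def acquire(self):
        with self._cond:
            # Re-check the predicate after every wakeup: another thread
            # may have taken the slot before this one reacquired the lock.
            while not self._free:
                self._cond.wait()
            return self._free.pop()

    def release(self, slot):
        with self._cond:
            self._free.append(slot)
            # Signal while holding the lock, once per released slot, so
            # the state change and the wakeup cannot be separated.
            self._cond.notify()

    def free_count(self):
        with self._cond:
            return len(self._free)


def run_workers(pool, nworkers):
    """Start nworkers threads that each take and return one slot."""
    finished = 0
    lock = threading.Lock()

    def worker():
        nonlocal finished
        slot = pool.acquire()
        pool.release(slot)
        with lock:
            finished += 1

    threads = [threading.Thread(target=worker) for _ in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    return finished


if __name__ == "__main__":
    pool = SlotPool(2)
    print(run_workers(pool, 50), "workers finished;",
          pool.free_count(), "slots free")
```

The two points that matter in this sketch are that the predicate is tested in a loop under the lock, and that every release signals under the same lock.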
    

Problem conclusion

  • Fix packs:
    6.0.0-TIV-PDO-FP0012
    5.1.0-TIV-PDO-FP0031
    

Temporary fix

Comments

APAR Information

  • APAR number

    IZ09423

  • Reported component name

    ACCESS MGR OS

  • Reported component ID

    5698PDO00

  • Reported release

    510

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2007-11-27

  • Closed date

    2007-12-31

  • Last modified date

    2008-07-08

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    ACCESS MGR OS

  • Fixed component ID

    5698PDO00

Applicable component levels

  • R510 PSY

       UP

  • R600 PSY

       UP

Document Information

Modified date:
13 November 2021