Fixes are available
APAR status
Closed as program error.
Error description
AMOS 5.1.0.26 with hot fix IY95481_FP26 hangs on Solaris 5.9. The server hung and it was not possible to log in to the machine at all. A crash dump was taken.

Solaris server:
SunOS nus637 5.9 Generic_118558-26 sun4u sparc SUNW,Sun-Fire-V440

Analysis by Sun:
Looking at the crash dump, many processes (most of them sendmail and mail.local) appear to be hung in the kazndrv driver.

Crash Dump Details:
==================
SolarisCAT(nus637-vmcore/9U)> proc tree |grep sendmail | wc
1155 4620 48064
SolarisCAT(nus637-vmcore/9U)> proc tree |grep mail.local | wc
973 2919 29124

!! But hundreds of mail processes exist.

==== user (LWP_SYS) thread: 0x36849112800 PID: 3183 ====
cmd: mail.local -l
t_wchan: 0x781c18aa sobj: condition var (from kazndrv:kenv_pathMemGet+0x190)
t_procp: 0x3684c39e048
p_as: 0x3684b5cee50 size: 3088384 rss: 1810432
hat: 0x36851547b48 cnum: 0x1e99 cpusran: 0,1,2,3
t_stk: 0x2a101ca5af0 sp: 0x2a101ca4b71 t_stkbase: 0x2a101ca2000
t_pri: 59(TS) pctcpu: 0.000000
t_lwp: 0x36849110038 machpcb: 0x2a101ca5af0
mstate: LMS_SLEEP ms_prev: LMS_SYSTEM
ms_state_start: 1 days 4 hours 14 minutes 29.717327700 seconds earlier
ms_start: 1 days 4 hours 16 minutes 15.774093800 seconds earlier
psrset: 0 last CPU: 3
idle: 10165761 ticks (1 days 4 hours 14 minutes 17.61 seconds)
start: Mon Nov 12 18:41:39 2007
age: 101760 seconds (1 days 4 hours 16 minutes)
syscall: #9 link(, 0xffbfb860) (sysent: kazndrv:nct_link32+0x0)
tstate: TS_SLEEP - awaiting an event
tflg: T_DFLTSTK - stack is default size
tpflg: TP_TWAIT - wait to be freed by lwp_wait
       TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
        TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SLOAD - in core
       SMSACCT - process is keeping micro-state accounting
       NOCD - new creds from VSxID, do not coredump
pc: 0x1084e98 genunix:cv_wait+0x3c: call unix:swtch

genunix:cv_wait+0x3c(0x781c18aa, 0x781c1898, 0x0, 0x3, 0x100c824, 0x0)
kazndrv:kenv_pathMemGet+0x190(0x2a101ca5a08, 0x15, 0x2a101ca59d0, 0x0, 0x3684b15ca60, 0xff39cc00)
kazndrv:kenv_procGetPathMem+0x24c(0x2a101ca5a08, , 0x2a101ca59d0, 0x2a101ca579c, 0x81010100, 0xff0000)
kazndrv:nct_copyPath+0x4c(, 0x5, 0x2a101ca59c8, 0x0, 0x15, 0x0)
kazndrv:nct_addAuxPath+0x168(0xffbfbf28, 0x2a101ca59c8, 0x0, 0x5, 0x3684a4b5ad0, 0x0)
kazndrv:nct_linkCommon+0xb4(0xffbfbf28, 0xff354780, 0x2a101ca59c8, 0x2a101ca5a58, 0x3684a4b5ad0, 0x2a101ca5aec)
kazndrv:nct_link32+0x104(0xffbfbf28?, 0xff354780, 0x0, 0xff13c000, 0x0, 0x0)
unix:syscall_trap32+0xa8()
-- switch to user thread's user stack --

!! The longest-waiting mail.local has been stuck for over 1 day.

1007 threads: 0x3000040b280 0x36859ba5280 0x36854efca80 0x3684d8a2d20...
genunix:cv_wait+0x3c
kazndrv:kenv_pathMemGet+0x190
kazndrv:kenv_procGetPathMem+0x24c
kazndrv:nct_copyPath+0x4c
kazndrv:nct_processPath+0x188
kazndrv:nct_openCommon+0xf4
kazndrv:c_open+0x168
kazndrv:nct_open32+0xa0
unix:syscall_trap32+0xa8

227 threads: 0x36850a577e0 0x36859ba4800 0x3685357ea80 0x368555f37c0...
genunix:cv_wait+0x3c
kazndrv:kenv_pathMemCheckWaitIfUse2+0x64
kazndrv:kenv_pathMemGet+0x270
kazndrv:kenv_procGetPathMem+0x24c
kazndrv:nct_copyPath+0x4c
kazndrv:nct_processPath+0x188
kazndrv:nct_linkCommon+0x70
kazndrv:nct_link32+0x104
unix:syscall_trap32+0xa8

!! Over 1000 threads are hung in the kazndrv driver.

==== user (LWP_SYS) thread: 0x3684a3aa020 PID: 4852 ====
cmd: /usr/bin/ksh /usr/local/soe/bin/cpuhog
t_wchan: 0x781c18aa sobj: condition var (from kazndrv:kenv_pathMemGet+0x190)
t_procp: 0x3684b4f40b8
p_as: 0x3684c1e7c18 size: 2015232 rss: 507904
hat: 0x3684aa2d848 cnum: 0x9e cpusran: 1
t_stk: 0x2a102b2baf0 sp: 0x2a102b2ab31 t_stkbase: 0x2a102b28000
t_pri: 59(TS) pctcpu: 0.000000
t_lwp: 0x3684d855510 machpcb: 0x2a102b2baf0
mstate: LMS_SLEEP ms_prev: LMS_SYSTEM
ms_state_start: 1 days 4 hours 14 minutes 28.589052000 seconds earlier
ms_start: 1 days 4 hours 14 minutes 29.754985300 seconds earlier
psrset: 0 last CPU: 1
idle: 10165765 ticks (1 days 4 hours 14 minutes 17.65 seconds)
start: Mon Nov 12 18:43:25 2007
age: 101654 seconds (1 days 4 hours 14 minutes 14 seconds)
syscall: #225 open64(, 0xffbfe650) (sysent: kazndrv:nct_open6432+0x0)
tstate: TS_SLEEP - awaiting an event
tflg: T_DFLTSTK - stack is default size
tpflg: TP_TWAIT - wait to be freed by lwp_wait
       TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
        TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SLOAD - in core
       SMSACCT - process is keeping micro-state accounting
pc: 0x1084e98 genunix:cv_wait+0x3c: call unix:swtch

genunix:cv_wait+0x3c(0x781c18aa, 0x781c1898, 0x45, 0x1000, 0x22000000, 0x0)
kazndrv:kenv_pathMemGet+0x190(0x2a102b2ba98, 0xa, 0x2a102b2ba60, 0x0, 0x3684b4e5298, 0x0)
kazndrv:kenv_procGetPathMem+0x24c(0x2a102b2ba98, , 0x2a102b2ba60, 0x2a102b2b75c, 0x81010100, 0xff00)
kazndrv:nct_copyPath+0x4c(, 0x2, 0x2a102b2ba58, 0x2a102b2b818, 0xa, 0x0)
kazndrv:nct_processPath+0x188(0x47f61?, 0x2a102b2ba58, 0x28, 0x2, 0x3684a4c4610, 0x8)
kazndrv:nct_openCommon+0xf4(0x47f61, 0x2a102b2ba58, , 0x3684a4c4610, 0x2a102b2baec, 0x8)
kazndrv:c_open+0x168(0x47f61, 0x301, 0x2a102b2ba58, 0x3684a4c4610, 0x2a102b2baec, 0x0)
kazndrv:nct_open6432+0xa0(, 0x301, 0x1b6, 0x0, 0x0, 0x0)
unix:syscall_trap32+0xa8()
-- switch to user thread's user stack --

!! cpuhog; interesting name
========================
Local fix
No Workaround
Problem summary
After a period of time, a system can hang. Heavily loaded systems with many CPUs are most vulnerable. Analysis of a forced dump from the hung system shows many processes blocked in the TAMOS kazndrv kernel module. They are waiting for a free shared memory slot used to communicate with pdosd. Specifically, they are waiting on the freeCond kenv_memState condition variable and remain blocked even though there is a single free shared memory slot.
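The kazndrv source is not part of this APAR, so the following is only a minimal user-space sketch of the general pattern the summary describes: threads sleeping on a condition variable until a slot in a fixed-size pool becomes free. It is not the TAMOS implementation or the shipped fix. The names SlotPool, acquire, release and run_workers are hypothetical and exist only for illustration. The sketch shows the usual safeguard against a waiter sleeping while a slot is free: the predicate is checked under the lock before sleeping and again after every wakeup, and every release signals the condition.

```python
import threading


class SlotPool:
    """Hypothetical fixed-size pool of slots guarded by a condition variable."""

    def __init__(self, slots):
        self._free = slots
        self._free_cond = threading.Condition()

    def acquire(self, timeout=None):
        """Block until a slot is free; return False if the timeout expires."""
        with self._free_cond:
            # wait_for checks the predicate before sleeping and again after
            # every wakeup, all while holding the condition's lock.
            if not self._free_cond.wait_for(lambda: self._free > 0, timeout):
                return False
            self._free -= 1
            return True

    def release(self):
        """Return a slot to the pool and wake one waiter."""
        with self._free_cond:
            self._free += 1
            self._free_cond.notify()


def run_workers(pool, workers, rounds):
    """Run `workers` threads that each take and return a slot `rounds` times.

    Returns the number of workers that finished every round.
    """
    completed = []
    lock = threading.Lock()

    def worker():
        for _ in range(rounds):
            if not pool.acquire(timeout=5.0):
                return  # a timeout here would indicate a stuck waiter
            pool.release()
        with lock:
            completed.append(True)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(completed)
```

With a single slot and several contending workers, for example run_workers(SlotPool(1), 8, 50), every worker is expected to finish. The timeout on acquire is an illustrative choice that turns a stuck waiter into a visible failure instead of an indefinite hang.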
Problem conclusion
| fixpack | 6.0.0-TIV-PDO-FP0012
| fixpack | 5.1.0-TIV-PDO-FP0031
Temporary fix
Comments
APAR Information
APAR number
IZ09423
Reported component name
ACCESS MGR OS
Reported component ID
5698PDO00
Reported release
510
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2007-11-27
Closed date
2007-12-31
Last modified date
2008-07-08
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
ACCESS MGR OS
Fixed component ID
5698PDO00
Applicable component levels
R510 PSY
UP
R600 PSY
UP
Document Information
Modified date:
13 November 2021