IBM Support

Storage growth caused by message build up in NetView

Troubleshooting


Problem

You are seeing messages in the NetView netlog about running low on or out of storage, and you want to find the cause without recycling NetView or taking a dump. This technote covers storage issues caused by messages building up for an Operid or Autotask.

Symptom

Any of the following messages can indicate that NetView is running low on storage, possibly because of a message buildup for an Operid or Autotask.

BNH162I - THE domainid BELOW 16M STORAGE is nn% USED....

BNH163I - THE domainid ABOVE 16M STORAGE is nn% USED....

DSI374A - THRESHOLD REACHED, number BUFFERS ON MESSAGE QUEUE OF task

DSI124I - STORAGE REQUEST FAILED FOR NCCF domainid

ABEND 878 - Indicates that NetView is out of storage.

Cause

There are many possible causes of a storage problem in NetView. This DCF focuses on a buildup of messages on the Public or Held queues as the cause of storage growth.

The messages listed above, along with an Abend 878, indicate that NetView is running out of storage. This DCF helps you determine whether the storage shortage is caused by a message buildup.

Messages can build up in two places:
1) On the Public queue, as shown in TASKUTIL. This is typically caused by a task that is looping or hung and is no longer processing messages. New messages keep arriving but are not handled, so they are buffered and wait.

2) On the Held queue, as shown by the LIST command for the Operid/Autotask. This storage growth occurs when a message has been processed but never deleted by a DOM (Delete Operator Message). We often see this with OEM or user messages that are action messages (by their descriptor codes) but are never deleted by a DOM.

Diagnosing The Problem

Several tools can help identify this condition.

1) TASKUTIL

TASKUTIL shows messages building up on the Public queue, that is, messages waiting to be processed by an Operid or Autotask. Look at the MESSAGEQ, STORAGE-K and CMD columns: over time you will see the counts and storage increase for the same Operids/Autotasks. It is good practice to run TASKUTIL on a 30-minute timer so you can look back in the netlog for trends or to see when the storage growth started. The example below shows TASKUTIL with high MESSAGEQ and STORAGE-K values. You may also see the same command or exec listed in the CMD column, which indicates a hung task or exec.

High and growing MESSAGEQ and STORAGE-K values are a good indication of a message buildup.


DWO022I
TASKNAME TYPE DPR CPU-TIME N-CPU% S-CPU% MESSAGEQ STORAGE-K    CMD
-------- ---- --- -------- ------ ------ -------- --------- ------
SCHED0   AUTO 250    66.55  21.07   0.00    92227     32427 PIPE
SCHED1   AUTO 250    56.88  19.75   0.00    87425     29210 USREXEC
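Because the useful signal is growth over time rather than a single snapshot, it can help to compare two TASKUTIL displays taken some time apart (for example, 30 minutes). The following is a minimal sketch in Python for offline analysis of text copied from the netlog, not something that runs inside NetView. It assumes the DWO022I column layout shown above; the function names parse_taskutil and growing_tasks are hypothetical.

```python
"""Compare two TASKUTIL (DWO022I) snapshots copied from the netlog.

Assumes the column layout shown above:
TASKNAME TYPE DPR CPU-TIME N-CPU% S-CPU% MESSAGEQ STORAGE-K CMD
"""


def parse_taskutil(text):
    """Return {taskname: (messageq, storage_k, cmd)} from one snapshot."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 or 9 fields with numeric MESSAGEQ and STORAGE-K;
        # the DWO022I line, the column header and the dashed line are skipped.
        if len(fields) < 8 or not (fields[6].isdigit() and fields[7].isdigit()):
            continue
        cmd = fields[8] if len(fields) > 8 else ""
        rows[fields[0]] = (int(fields[6]), int(fields[7]), cmd)
    return rows


def growing_tasks(earlier, later):
    """List tasks present in both snapshots whose MESSAGEQ and STORAGE-K grew."""
    before, after = parse_taskutil(earlier), parse_taskutil(later)
    result = []
    for task, (msgq, stor, cmd) in after.items():
        if task in before:
            old_msgq, old_stor, _ = before[task]
            if msgq > old_msgq and stor > old_stor:
                result.append((task, old_msgq, msgq, old_stor, stor, cmd))
    return result
```

Call growing_tasks(earlier, later) with the text of two TASKUTIL displays; each result lists the task, its old and new MESSAGEQ, its old and new STORAGE-K, and the CMD value, which points at the command or exec that may be hung.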

2) RESOURCE

The RESOURCE command can show storage growing over time. Here is how to read the output:



 * NTVDD    RESOURCE
 ' NTVDD
 DSI386I NETVIEW RESOURCE UTILIZATION 15:38:48
         TOTAL CPU %                  =         2.44
         T610NVSA CPU %               =         0.00
         T610NVSA CPU TIME USED       =       198.35 SEC.
         REAL STORAGE IN USE          =        80644K
         PRIVATE ALLOCATED < 16M      =         1348K
         PRIVATE ALLOCATED > 16M      =        65820K
         PRIVATE REGION    < 16M      =         7144K
         PRIVATE REGION    > 16M      =        98304K
 END OF DISPLAY

Monitor the ALLOCATED values: they show how much storage NetView is currently using below the line (<16M) and above the line (>16M). If storage is growing, these values increase over time. Compare them with the REGION values, which show the size of the private region available to NetView. As the ALLOCATED values get closer to the REGION values, less and less storage remains available. Use the RESOURCE command to confirm that there is overall storage growth, and then use TASKUTIL and/or the CHKHELD pipe (described below) to try to isolate the growth to a specific Operid/Autotask or group of them. If neither TASKUTIL nor the CHKHELD pipe shows an Operid or Autotask with a message buildup, something else is using up storage, and a dump of NetView is required to identify it.

It is also good practice to run the RESOURCE command every 30 minutes so that data is available in the netlog when investigating trends.
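The comparison of ALLOCATED with REGION can be expressed as a percentage. The following minimal Python sketch parses a DSI386I display copied from the netlog and reports ALLOCATED as a percentage of REGION on each side of the line. The function name private_usage is hypothetical, and the percentage NetView itself reports in BNH162I/BNH163I may be calculated differently, so treat this only as a rough trend indicator.

```python
import re

# Matches lines such as "PRIVATE ALLOCATED < 16M      =         1348K"
_LINE = re.compile(r"PRIVATE\s+(ALLOCATED|REGION)\s+([<>])\s*16M\s*=\s*(\d+)K")


def private_usage(resource_output):
    """Return {'<16M': pct, '>16M': pct}: ALLOCATED as a percentage of REGION."""
    values = {}
    for match in _LINE.finditer(resource_output):
        kind, side, kbytes = match.group(1), match.group(2), int(match.group(3))
        values[(kind, side)] = kbytes
    usage = {}
    for side in "<>":
        allocated = values.get(("ALLOCATED", side))
        region = values.get(("REGION", side))
        # Only report a side when both values were found and REGION is nonzero.
        if allocated is not None and region:
            usage[side + "16M"] = round(100.0 * allocated / region, 1)
    return usage
```

With the sample RESOURCE display above, this reports roughly 18.9% below the line and 67.0% above the line. Running it against each 30-minute RESOURCE display shows whether these percentages keep climbing.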

3) CHKHELD Pipe

TASKUTIL does not show messages building up on the Held message
queues. Held message counts are shown on the Messages line of the
output of the LIST command for an Operid or Autotask:

LIST JOHNA                                            
STATION: JOHNA      TERM: NTDDL701                    
HCOPY: NOT ACTIVE   PROFILE: DSIPROFB                  
STATUS: ACTIVE      IDLE MINUTES: 0                    
ATTENDED: YES       CURRENT COMMAND: LIST              
AUTHRCVR: YES       CONTROL: GLOBAL                    
NGMFADMN: YES       DEFAULT MVS CONSOLE NAME: AJOHNA10
NGMFVSPN: NNNN (NO SPAN CHECKING ON NMC VIEWS)        
NGMFCMDS: YES       AUTOTASK: NO                      
IP ADDRESS:  N/A                                      
OP CLASS LIST: NONE                                    
DOMAIN LIST: NONE                                      
ACTIVE SPAN LIST: NONE                                
Task Serial:  417977  REXX Environments: 2 (1%)        
Messages  Pending:  0  Held: 4256                        
WLM Service Class:  Not Available                      
END OF STATUS DISPLAY                                  
-------------------------------------------------------


The following NetView PIPE checks the Held message counts of all logged-on operators so you can watch them for growth. Put the PIPE in a REXX exec and run it in NetView.

/*************************************************************/
/* Rexx to determine held message counts for operators       */
/*                                                           */
/*    Issue LIST STATUS=OPS for the active operators         */
/*    Issue LIST operid to get Pending and Held counts       */
/*    Write output to the screen - best if used in a window  */
/*************************************************************/
address NETVASIS
 
'Pipe (Name CHKHELD debug 2)',
'     | NETV LIST STATUS=OPS',
'     | DROP LAST 1',
'     | EDIT WORD 2 1',
'     | EDIT /LIST / 1',
'            1.* 6',
'     | NETV ',
'     | LOCATE 1.8 /STATION:/ 1.S /Messages  Pending:/' ,
'     | CHOP AFTER STRING /TERM:/' ,
'     | JOINCONT /TERM:/' ,
'     | EDIT WORD 2.6' ,
'     | CONS CLEAR ONLY'

EXIT

The output shows the Messages Pending and Held counts for every Operid/Autotask listed. Use it to spot growing message counts or trends in held message growth. Here is an example of the CHKHELD output:
JOHNA Messages Pending: 0 Held: 4256   
AUTOAON Messages Pending: 0 Held: 0  
AUTOEDAT Messages Pending: 0 Held: 0
AUTOPSAV Messages Pending: 0 Held: 0
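Because CHKHELD shows a single point in time, growth is easiest to see by comparing two runs. The following minimal Python sketch, for offline use on output copied from the netlog, parses CHKHELD lines in the format shown above and reports the tasks whose Held count increased. The function names parse_chkheld and held_growth and the minimum parameter are hypothetical.

```python
import re

# One CHKHELD output line, e.g. "JOHNA Messages Pending: 0 Held: 4256"
_CHKHELD = re.compile(r"^(\S+)\s+Messages\s+Pending:\s*(\d+)\s+Held:\s*(\d+)")


def parse_chkheld(text):
    """Return {task: (pending, held)} from CHKHELD output."""
    counts = {}
    for line in text.splitlines():
        match = _CHKHELD.match(line.strip())
        if match:
            counts[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return counts


def held_growth(earlier, later, minimum=1):
    """Return {task: increase} for tasks in both runs whose Held count grew."""
    before, after = parse_chkheld(earlier), parse_chkheld(later)
    growth = {}
    for task, (_, held) in after.items():
        if task in before:
            increase = held - before[task][1]
            if increase >= minimum:
                growth[task] = increase
    return growth
```

Tasks that appear in only one of the two runs, for example because they logged on or off in between, are ignored rather than reported as growth.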



4) Messages identifying a Storage Problem

The following messages can identify a storage problem caused by a message buildup for an Operid/Autotask. If you see any of these messages, use the techniques described in 1-3 above to investigate.

BNH162I: Issued when NetView has used 80% of below-the-line storage.

BNH163I: Issued when NetView has used 80% of above-the-line storage.

DSI124I: Issued when a NetView storage request failed because not enough storage was available.

DSI374A: Issued to report a buildup on the Public queues (shown in TASKUTIL), based on thresholds set in the NetView constants module DSICTMOD.

878 Abend: Indicates that NetView has run out of storage, or has requested storage and was not able to get it.
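When looking back through a saved netlog, it can help to count how often these messages appeared. The following minimal Python sketch scans netlog lines for the message IDs listed above and returns a count per ID plus the matching lines. The function name scan_netlog is hypothetical, and the 878 abend is deliberately not matched, because the form in which it appears in a log varies.

```python
import re
from collections import Counter

# Message IDs from the list above that can point to a storage problem.
STORAGE_MSG_IDS = ("BNH162I", "BNH163I", "DSI124I", "DSI374A")
_PATTERN = re.compile(r"\b(" + "|".join(STORAGE_MSG_IDS) + r")\b")


def scan_netlog(lines):
    """Count storage-related message IDs in netlog lines.

    Returns (Counter of message IDs, list of (line_number, line) hits).
    """
    counts = Counter()
    hits = []
    for number, line in enumerate(lines, start=1):
        match = _PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
            hits.append((number, line.rstrip()))
    return counts, hits
```

For example, scan_netlog(open(path).readlines()) on a netlog exported to a file shows when DSI374A or BNH162I/BNH163I messages first appeared, which can be lined up with the 30-minute TASKUTIL and RESOURCE output.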

Resolving The Problem

Remember, the goal of this DCF is to identify storage issues without a dump. The best it can do is identify an Operid or Autotask experiencing a message buildup; it will not identify the actual messages causing the backup. A dump is the only way to see what the actual messages are.

1) Public Storage Growth

Public storage growth is what TASKUTIL shows. The MESSAGEQ for an Operid/Autotask grows steadily over time as messages build up in the queue waiting to be processed. The typical cause is that the Operid/Autotask is hung or looping and is not processing any messages. Watch for DSI374A messages.

There are two things to try in this case:


  • RESET: The RESET command is the first thing to try. RESET attempts to end the command or command procedure that is running, which is shown in the TASKUTIL CMD column. Review the command or exec identified to see whether it will cause the problem again.

    There is RESET NORMAL (the default) and RESET IMMED; see HELP RESET for more information. Use EXCMD to send the RESET to the Operid/Autotask showing the buildup:

    EXCMD AUTOTASK/OPERID RESET

    EXCMD AUTOTASK/OPERID RESET IMMED

  • LOGOFF: If RESET does not resolve the issue, the next course of action is to log the Operid/Autotask off. An Autotask can be logged off using EXCMD (EXCMD autotask,LOGOFF).

    If the Operid/Autotask does not accept the LOGOFF, you will have to recycle NetView to free up the storage and clear the message buffer. Note that all of the messages in the buffer waiting to be processed will be lost and will not be run through NetView message automation.

    Consider taking a dump of NetView before logging off the task or recycling NetView, for use in problem analysis. The dump provides a picture of the problem.

 

2) NetView Storage Growth

If the RESOURCE command shows the ALLOCATED storage growing above or below the 16M line, run TASKUTIL and look for high MESSAGEQ counts, and run the CHKHELD pipe to look for a message buildup on the Held queues.

If TASKUTIL and the CHKHELD pipe don't show any message growth, then the problem is elsewhere. A dump of NetView is needed to identify the cause of the storage growth.

3) HELD Storage Growth

There is no equivalent of the DSI374A message to report storage growth on a Held message queue. One indication of this scenario is TASKUTIL showing storage (STORAGE-K) growing while the MESSAGEQ count stays at zero, and the CHKHELD pipe showing held messages building up for an Operid/Autotask.
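The rules of thumb in this technote can be summarized as a small decision function: a growing MESSAGEQ points to the Public queue, a growing Held count (or growing STORAGE-K with MESSAGEQ at zero) points to the Held queue, and otherwise the storage is going somewhere else. The following Python sketch is a minimal illustration of that reasoning under those assumptions, not a NetView facility; TaskSample and classify are hypothetical names, and a result of "none" for one task does not rule out a buildup on another task.

```python
from dataclasses import dataclass


@dataclass
class TaskSample:
    """Hypothetical container for one task's values at one point in time."""
    messageq: int   # MESSAGEQ column from TASKUTIL
    storage_k: int  # STORAGE-K column from TASKUTIL
    held: int       # Held count from LIST operid / CHKHELD


def classify(before: TaskSample, after: TaskSample) -> str:
    """Apply this technote's rules of thumb to two samples of one task."""
    if after.messageq > before.messageq:
        # Public queue buildup: try RESET, then LOGOFF.
        return "public"
    if after.held > before.held or (
        after.storage_k > before.storage_k and after.messageq == 0
    ):
        # Held queue buildup: display/delete held messages, check HOLD.
        return "held"
    # No buildup seen for this task; if storage still grows, take a dump.
    return "none"
```

The order of the checks mirrors the document: the Public queue is checked first because DSI374A and TASKUTIL report it directly, and the Held queue is checked next because nothing reports its growth automatically.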

There are several things to try to free up Held storage.

DISPLAY THE MESSAGES: If the CHKHELD pipe identifies the Operid/Autotask with the message backup, it may be possible to display the messages on its Held queue. Use EXCMD to send the following to the Operid/Autotask with the message backup; it may show the messages:


EXCMD operid/autotask PIPE HELDMSG | CONSOLE

You can also try this pipe to delete the held messages:
EXCMD operid/autotask PIPE HELDMSG | CONSOLE DELETE

HOLD(DISABLE):

A NetView default named HOLD determines whether messages on the Held queue are kept or deleted automatically.

HOLD DISABLE: Indicates that the HOLD action is not taken from the NetView automation table. In addition, action messages are not queued for rerouting to the authorized receiver upon logoff unless OVERRIDE and automation table settings indicate otherwise. In other words, HELD messages are automatically deleted.

HOLD ENABLE: Indicates that the HOLD action is taken from the NetView automation table. Queued action messages are rerouted to the authorized receiver upon logoff unless automation table actions or OVERRIDE settings indicate otherwise. This HOLD action will cause HELD messages to be queued up and wait to be deleted, and can be the cause of storage growth.

To check the HOLD setting, issue the NetView command LIST DEFAULTS (you may want to run it in a window for easier viewing).


* NTVDD    LIST DEFAULTS      
' NTVDD                      
DWO654I DISPLAY    DEFAULTS  
              HOLD: ENABLE    

An Operid or Autotask can also have an override that changes the HOLD setting. Issue the NetView command "LIST OVERRIDE=Operid/Autotask" to see it. In this example, the default for AUTO1 is ENABLE, but it has been overridden to DISABLE. In this case, held messages are automatically deleted and not queued up for delete processing.


* NTVDD    LIST OVERRIDE=AUTO1            
' NTVDD                                  
DWO653I DISPLAY    DEFAULTS OVERRIDES    
              HOLD: ENABLE    DISABLE    
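When checking many tasks, the value actually in effect can be read from the HOLD line: the override column if one is shown, otherwise the default. The following minimal Python sketch assumes the layout in the two displays above ("HOLD: default [override]"); the function name effective_hold is hypothetical.

```python
def effective_hold(display_line: str) -> str:
    """Return the HOLD value in effect from a LIST DEFAULTS/OVERRIDE line.

    Assumes the layout shown above: 'HOLD: <default> [<override>]'.
    """
    fields = display_line.split()
    if len(fields) < 2 or fields[0] != "HOLD:":
        raise ValueError("not a HOLD display line: %r" % display_line)
    # The last value wins: the override column if present, else the default.
    return fields[-1]
```

For the AUTO1 display above this returns DISABLE, and for the LIST DEFAULTS display it returns ENABLE. A task whose effective value is ENABLE is one where held messages can queue up.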

Issue the NetView command "EXCMD OperID/Autotask,OVERRIDE HOLD=DISABLE" to override the existing HOLD default value with DISABLE. New held messages sent to the Operid/Autotask are then deleted automatically. This does NOT delete the messages already on the queue, only new messages arriving on the queue. The existing messages on the queue can be deleted using EXCMD operid/autotask PIPE HELDMSG | CONSOLE DELETE, as explained above. Together, these steps free the storage the current messages are using and prevent future messages from building up.

Note: When an operator logs off with action messages still on the Held queue, these messages are rerouted to an authorized receiver. If no authorized receiver is logged on, the messages go to the PPT task, and the PPT task writes all of its messages to the syslog.
This may show up as held messages being written to the syslog when NetView is shut down, because operators and autotasks are logged off during shutdown.

4) Last word.

If all of the above has been investigated and no message buildup is found on the Public or Held queues, then something else is using up storage. The only way to identify what is using up storage is to take a dump.

The NetView Troubleshooting Guide is a good place to find more information on out-of-storage conditions; see section 1.3.3.1.2, "Out-of-storage condition leading to an ABEND".

Product: IBM Z NetView; Platform: z/OS; Version: All versions; Line of Business: Mainframe SW

Document Information

Modified date:
06 December 2022

UID

swg21653573