Abstract for Problem Management
Summary of changes for z/OS Problem Management
Problem management overview
Introduction
Overview of problem resolution
Steps for diagnosing problems on z/OS
Gathering diagnosis data
Problem categories
Searching problem reporting databases
Extracting problem symptoms and search arguments
Formats for symptoms
Searching for a known problem
Steps for searching problem reporting databases
Determining the level of z/OS
Common tools for problem determination
Messages
BPXMTEXT for z/OS UNIX reason codes
IPCS
Logs
Traces
Dumps
IBM OMEGAMON for z/OS Management Console
Sending problem documentation to IBM
IBM documentation
Best practices for large stand-alone dump
Using AutoIPL for stand-alone dumps
Planning a multivolume stand-alone dump data set
Creating the multivolume SADUMP
Defining a dump directory for large stand-alone and SVC dumps
Preparing the dump for further processing with IPCS COPYDUMP
Compressing data for faster transmission and analysis
Transmitting dump data to IBM
Setting up remote access
Testing your stand-alone dump operations
Automating the SADMP process
Sample JCL for post-processing
IBM System Test example
Runtime Diagnostics
Runtime Diagnostics
How Runtime Diagnostics works
Enabling Runtime Diagnostics
Running Runtime Diagnostics with mixed releases of z/OS
Reports from Runtime Diagnostics
Runtime Diagnostics symptoms
Runtime Diagnostics messages
Understanding the messages Runtime Diagnostics issues
Test messages ignored by Runtime Diagnostics
Runtime Diagnostics DEBUG options
Messages that Runtime Diagnostics analyzes
Using OPERLOG
Determining hardcopy medium settings
Setting up OPERLOG
Steps for setting up OPERLOG
Predictive Failure Analysis
Predictive Failure Analysis overview and installation
Avoiding soft failures
Overview of Predictive Failure Analysis
How PFA works with a typical remote check
How PFA interacts with IBM Health Checker for z/OS
How PFA invokes Runtime Diagnostics
Migration considerations for PFA
Using the migrate or new parameters when running AIRSHREP.sh
How PFA uses the ini file
Installing PFA
Steps for installing PFA
Installing PFA in a z/OS UNIX shared file system environment
Updating the Java path
Managing PFA checks
Understanding how to modify PFA checks
MODIFY PFA, DISPLAY
MODIFY PFA, UPDATE
Using and configuring supervised learning
Predictive Failure Analysis checks
PFA_COMMON_STORAGE_USAGE
PFA_ENQUEUE_REQUEST_RATE
PFA_JES_SPOOL_USAGE
PFA_LOGREC_ARRIVAL_RATE
PFA_MESSAGE_ARRIVAL_RATE
PFA_SMF_ARRIVAL_RATE
Diagnosing by problem type
Diagnosing an abend
Overview of an abend
Steps for diagnosing an abend
Obtaining the abend and reason code
Steps for obtaining the abend code
Identifying the module and component
Steps for identifying the module and component
Searching the problem reporting databases
Steps for searching the problem reporting databases
Gathering additional problem data for abends
Steps for gathering additional data for abends
Steps for gathering trace data for abends
Steps for collecting additional messages and logrec for abends
Steps for obtaining a dump for the error
Diagnosing a system hang or wait state
Overview of a hang or wait
Steps for diagnosing a system hang
Collecting the problem description
Steps for collecting the problem description
Diagnosing a hang or wait during IPL
Steps for diagnosing a hang or wait during IPL
Diagnosing an enabled wait state
Steps for diagnosing an enabled wait state
Diagnosing a coded disabled wait state
Steps for diagnosing a coded disabled wait state
Diagnosing a system partitioned from a sysplex because of status update missing
Steps for diagnosing a system partitioned because of status update missing
Searching the problem reporting databases
Steps for searching the problem reporting databases
Gathering additional data for hangs and waits
Steps for gathering messages and logrec for hangs
Diagnosing a job or subsystem hang
Overview of a hang or wait
Steps for diagnosing a job or subsystem hang
Gathering additional data for a job or subsystem hang
Step for gathering additional data
Determining the status of a hung job or subsystem
Steps for determining the status of a hung job or subsystem
Determining if a job is waiting for resources
Steps for determining if a job is waiting for resources
Determining address space dispatchability
Steps for examining address space dispatchability
Examining the SRB status
Steps for examining the SRB status
Examining the TCB status
Steps for examining the TCB status
Examining why a job is not running
Steps for examining why a job is not running
Diagnosing a loop
Overview of a loop
Steps for diagnosing a loop
Gathering additional data for a loop
Steps for gathering loop data
Analyzing the dump to determine the type of loop
Step for analyzing the dump for loop type
Diagnosing a disabled loop
Steps for diagnosing a disabled loop
Diagnosing an enabled loop
Steps for diagnosing an enabled loop
Diagnosing an excessive spin (spin loop)
Steps for diagnosing an excessive spin
Analyzing a logrec error record
Steps for analyzing a logrec error record
Searching the problem reporting databases
Steps for searching the problem reporting databases
Diagnosing an output problem
Overview of analyzing output problems
Steps for diagnosing output problems
Collecting problem data for an output problem
Step for collecting problem data
Analyzing data set allocation for an output problem
Steps for analyzing data set allocation
Analyzing the inputs and outputs
Steps for analyzing the inputs and outputs
Analyzing installation exits for an output problem
Steps for analyzing installation exits
Identifying the program or component
Steps for identifying the program or component
Searching the problem reporting databases for an output problem
Step for searching the problem reporting database
Gathering additional data for output problems
Steps for gathering additional information for output problems
Messages and logrec for output problems
Determine path for output problems
Teleprocessing for output problems
Reporting output problems to IBM
Diagnosing a performance problem
Overview of a performance problem
Steps for diagnosing a performance problem
Collecting data using commands
Steps for collecting data using DISPLAY
Steps for using JES2 commands to collect data
Checking for resource contention or loop
Steps for checking resource contention
Searching the problem reporting database
Steps for searching the problem reporting databases
Gathering additional data for performance problems
Steps for gathering additional information for performance problems
Analyzing a dump for performance problems
Steps for collecting and analyzing a dump for performance problems
Reporting performance problems to IBM
Diagnosing component-specific problems
Catalog component operational problem determination
Catalog component-specific problems and recovery
Hang in the Catalog address space or in the user address space waiting on a request to the Catalog address space
Damaged or broken catalogs
Slow performance in various address spaces due to requests to the catalog address space taking excessive time
PDSE operational problem determination
PDSE specific problems
ABEND0F4 failures
MSGIGW038A possible PDSE problems
PDSE data set corruption
Failure of the SMSPDSE or SMSPDSE1 address space
RRS operational problem determination
Basic RRS problem determination functions
Collecting documentation for RRS
Dumping RRS information
Important RRS CTRACE information
RRS recovery options
RRS warm start
RRS cold start
RRS component-specific problems and recovery
RRS resource contention
Symptoms
How to investigate
Recovery actions
Actions to avoid recurrence
RRS suspended, waiting for signal from system logger
Symptoms
How to investigate
Recovery actions
RRS log stream gap condition
Symptoms
How to investigate
Recovery actions
RRS log stream data loss condition
Symptoms
How to investigate
Recovery actions
Actions to avoid recurrence
RRS high processor usage
Symptoms
Recovery actions
RRS address space hang
Symptoms
How to investigate
Recovery actions
RRS high storage usage
How to investigate
Recovery actions
Actions to avoid recurrence
Resource manager is unable to start with RRS
Symptoms
How to investigate
Recovery actions
Resource manager termination delay
Symptoms
How to investigate
Recovery actions
Actions to avoid recurrence
RRS transaction hang
Symptoms
How to investigate
Recovery actions
RRS severe error on RRS RM.DATA log stream, message ATR250E
Symptoms
How to investigate
Recovery actions
Actions to avoid recurrence
Resolving RRS problems in a sysplex cascaded transaction environment
Collecting documentation for a sysplex cascaded transaction environment
Sysplex cascaded transaction hang
Symptoms
How to investigate
Recovery actions
Sysplex cascaded transaction hang messages ATR246I and ATR247E
Symptoms
How to investigate
Recovery actions
Actions to avoid recurrence
System Data Mover (SDM) operational problem determination
SDM specific problems
ANTP0095I Unable to determine PPRC paths
ANTX5104E RC=0901 (XRC)
ANTX5104E RC=0647 REASON=0053 (XRC)
ANTX5104E RC=0647 REASON=0002 (XRC)
ANTAS00* ASIDs consuming excessive storage below 2GB
Converting to IR, RC=1017
Microcode issue impacting concurrent copy
VSAM component operational problem determination
VSAM specific problems
VSAM Index Trap
Hang in VSAM record management code
Loop in VSAM record management code
Unexpected return codes from VSAM record management
Issues opening, closing, extending VSAM data sets
VSAM record-level sharing (RLS) operational problem determination
VSAM record-level sharing (RLS) specific problems
ABEND0F4 failures
HANG/WAIT in RLS/TVS
SMSVSAM will not start up
Share Control Datasets not specified
Diagnosis reference material
Diagnosis information for z/OS base elements and features
Reporting problems to IBM
Software support service checklist
Automatic problem reporting
Invoking IPCS as a background job
Step for invoking IPCS as a background job
Problem diagnostic worksheet