II02889: ABEND002 RC04 WHEN WRITING SMF RECORDS TO AN OUTPUT FILE.


APAR status

  • Closed as canceled.

Error description

  • SMFINFO   HOWTO
    An ABEND002 rc04 (rc4) writing SMF records is usually seen
    under the following conditions:
      While running IFASMFDP to dump the SMF data sets, an abendB37
    or abendD37 is received against an output dataset.  This
    condition is intercepted by IFASMFDP.  If there are other
    output datasets available, IFASMFDP will continue writing to
    them.  If the CLEAR or ALL option was specified and other
    output datasets were available, then IFASMFDP will clear the
    input SYS1.MAN dataset.  If there are no other output datasets,
    IFASMFDP will terminate processing.  In all cases, IFASMFDP
    will issue a message indicating a problem with the dataset and
    issue a non-zero return code.  The abendB37 (or abendD37)
    condition can result in a partial record being written to the
    end of the output dataset, which we will refer to as the DAILY
    TAPE. Later in the week, the user merges all of the DAILY TAPEs
    into a new WEEKLY TAPE.  The abend002 rc04 occurs when the user
    tries to run a utility such as IEBGENER against the WEEKLY
    TAPE, because that partial record is now somewhere in the
    middle of the WEEKLY TAPE.
         IFASMFDP is working as designed and this is considered to
    be an error in the user's procedure to dump the SMF datasets.
    .
    Suggestions:
    1. Check the console log from *ALL* runs of IFASMFDP to
       determine whether there was an abendx37.  If not, find the
       partial record and route the problem to the component that
       builds the record.  The SMF SPL describes each record,
       including which module writes it.
    2. When running IFASMFDP, always ensure that the output dataset
       is as large as the input dataset(s).
    3. If possible, edit the dataset to remove the partial record
       to avoid the abend002.
    4. When most programs encounter an abend, they allow the abend
       to terminate the step.  When IFASMFDP encounters an abend,
       its ESTAE gets control and, by default, retries.  As a
       result, no abend is surfaced, and the user may be completely
       unaware that one occurred.  This behavior is controlled by a
       parameter in SMFPRMxx:
    
    DUMPABND     { (RETRY)  }
                 { (NORETRY)}
    
    Specifies whether the SMF dump program attempts to recover in
    the event an abend occurs.
         - RETRY specifies that the SMF dump program attempts to
           recover from abends and continue processing.
         - NORETRY specifies that the SMF dump program terminates
           when an abend occurs.
    
    Note: The SMF dump program will override this parameter and the
          ABEND parameter (specified on SMF dumps) if the input
          data set is to be dumped and cleared, and an ABEND occurs
          AFTER the input data set has been cleared. For this case,
          the SMF dump program will attempt to recover from the
          ABEND to prevent the output data set from being deleted
          and SMF data from being lost, when the SMF dump program
          abnormally ends. For more information about the SMF dump
          program, see z/OS MVS System Management Facilities (SMF).
    
    Default: RETRY
    
    The SMFPRMxx specification can be overridden by a parameter in
    the IFASMFDP control statements:
    
    ABEND(RETRY|NORETRY)
    
    Specifies whether the SMF dump program attempts to recover from
    an abend (abnormal end of task). When specified, this option
    overrides the SMF parmlib option (DUMPABND).
         - If RETRY is specified, then the SMF dump program attempts
           to recover from the abend.
         - If NORETRY is specified, then the SMF dump
           program terminates after the abend has occurred. The SMF
           dump program overrides NORETRY when, while SMF dump is
           dumping and clearing the input data set, an ABEND occurs
           after the input data set has been cleared. In this case,
           the SMF dump program tries to recover from the ABEND to
           prevent the deletion of the output data set and the loss
           of SMF data when the SMF dump program abnormally ends.
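    The precedence rules above (the ABEND control statement overrides
    DUMPABND, RETRY is the default, and recovery is forced once the
    input data set has been cleared) can be modeled as a small decision
    function.  This is a hypothetical Python sketch of the documented
    behavior, not IBM code; the function name and its boolean inputs
    are assumptions made for illustration.

```python
from typing import Optional

VALID = ("RETRY", "NORETRY")


def effective_abend_option(
    parmlib_dumpabnd: Optional[str],
    control_abend: Optional[str],
    dump_and_clear: bool,
    abend_after_clear: bool,
) -> str:
    """Model which recovery option IFASMFDP ends up using (hypothetical sketch).

    parmlib_dumpabnd:  DUMPABND value from SMFPRMxx, or None if not coded.
    control_abend:     ABEND(...) value from the IFASMFDP control statements,
                       or None if not coded.
    dump_and_clear:    True if the input data set is being dumped and cleared.
    abend_after_clear: True if the abend occurred after the input was cleared.
    """
    for value in (parmlib_dumpabnd, control_abend):
        if value is not None and value not in VALID:
            raise ValueError(f"unexpected option: {value!r}")

    # The control statement overrides the parmlib setting; RETRY is the default.
    if control_abend is not None:
        choice = control_abend
    elif parmlib_dumpabnd is not None:
        choice = parmlib_dumpabnd
    else:
        choice = "RETRY"

    # Documented override: once the input data set has been cleared, the dump
    # program tries to recover so the output data set and SMF data are kept.
    if dump_and_clear and abend_after_clear:
        return "RETRY"
    return choice
```

    For example, effective_abend_option("RETRY", "NORETRY", True, False)
    returns "NORETRY", while the same call with abend_after_clear=True
    returns "RETRY".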
    
    NOTE:  SMF Level 2 Support has a program called VBSFIX which
           can detect, report on, and delete "bad" (i.e., invalidly
           spanned) Variable Blocked Spanned (VBS) records.
           Please contact the Support Center for a copy of this
           program.
    
    The bad record can end up in the dataset as the result of
    a CANCEL of the dump job that is creating the dataset or
    some other interruption of the processing of the dump job.
    You can use the SMF Type 30 records to check for this case.
    If there is a Type 30, Subtype 1 record for a dump job, but no
    other records for the job, then that job was terminated; if
    there are Type 30, Subtype 4/5 records for the dump job then
    analyze the contents of the "Completion" Section to determine
    how the job ended.
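    The Type 30 check above amounts to a simple decision.  A minimal
    Python sketch, assuming the Type 30 subtypes for the dump job have
    already been extracted from the SMF data (record parsing is not
    shown, and the function name is hypothetical):

```python
from typing import Iterable


def classify_dump_job(type30_subtypes: Iterable[int]) -> str:
    """Apply the Type 30 rule of thumb to one dump job (hypothetical sketch).

    type30_subtypes: the SMF Type 30 subtypes found for the job.
    """
    seen = set(type30_subtypes)
    if not seen:
        return "no Type 30 records: nothing to check"
    if seen & {4, 5}:
        # Step/job termination records exist: examine their Completion section.
        return "analyze the Completion section of the subtype 4/5 records"
    if seen == {1}:
        # Job start only, with no other records: the job was terminated.
        return "terminated"
    # Other combinations (for example interval records only) do not match
    # either rule in the text.
    return "inconclusive"
```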
    .
    In MVS/ESA Release 4.3.0 (HBB4430) the recovery processing in
    the dump utility (IFASMFDP) has been improved to better handle
    the ABENDx37 error situation.  The recovery processing will
    close and re-open the output dataset to reset the EOF mark on
    the dataset.  Also, the input dataset is NOT cleared and the
    return code from the program is set to '8'.
    .
    ADDITIONAL SYMPTOMS:
         msgIEC036I rc04
    
    Caution:
         Both the TSO TRANSMIT (XMIT) / RECEIVE commands and the
    TERSE / UNTERSE (TRSMAIN) program will attempt to "correct"
    invalidly spanned VBS records, but they do it improperly, for
    the following reason:
         TSO XMIT does not save the spanning bits from VBS records,
    but rather, assumes that the file has correct spanning bits.
    When RECEIVE then rebuilds the actual records from the 80 byte
    records created by XMIT, it calculates the RLIs and sets the
    spanning bits based on the LRECL and BLKSIZE it's writing into.
    This is because RECEIVE can create a dataset with different DCB
    attributes from those of the original dataset.
         In order for XMIT/RECEIVE to maintain exactly the same
    spanning bits as existed in the original dataset, it would have
    to disallow changing LRECL and BLKSIZE on VBS datasets, as well
    as save the actual spanning bits.  The former would not be
    desirable, and the latter would be a lot of work whose sole
    intent would be to allow a dataset with invalid spanning bits
    to retain those invalid bits.  This is neither desirable nor
    cost effective.  The same is true for the TERSE / UNTERSE
    (TRSMAIN) program.
    

Local fix

  • Notes to Level 1:
    1. If a customer has this problem, please add them to the I/P
       page of this APAR.  Thanks.
    2. SMF Level 2 Support has a program called VBSFIX which can
       detect, report on, and delete "bad" (i.e., invalidly
       spanned) Variable Blocked Spanned (VBS) records.  Please
       contact SMF Level 2 for a copy of this program.
    

Problem summary

Problem conclusion

Temporary fix

  • .
    The following are instructions on how to recover from an
    abend002-4 when caused by a prior abendb37 or operator cancel.
    These instructions apply when the process is as follows:
      Each run of IFASMFDP dumps the SMF data to a cumulative disk
      file (DISP=MOD).
    .
    Abend002-04 usually occurs as a result of a previous abend that
    causes a partial record to be written out to the dataset.  When
    more records are mod'd to the end of the dataset, the partial
    record is followed by valid records.  Because SMF records can
    span blocks, the segment descriptor word (SDW) is used to
    determine whether a segment is a complete record, the first part
    of a segmented record, a middle part, or the last part.  An
    abendb37 may cause only the first part, or the first and middle
    parts, of an SMF spanned record to be written out.  When another
    record is mod'd onto the partial record, the segment descriptor
    words may not match up.  This results in the abend002-4.
    .
    Segment descriptor values (the segment control code, held in the
    low-order two bits of the third byte of the SDW):
        00 - complete record
        01 - first part of a segmented record
        10 - last part of a segmented record
        11 - middle of a segmented record
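    Each 4-byte SDW holds the segment length in its first two bytes
    and the segment control code in the low-order two bits of its
    third byte.  The Python sketch below, assuming the SDWs have
    already been pulled out of the blocks in order, decodes the code
    and flags each segment where the sequence breaks, which is the
    condition behind the abend002-4.  Function names are illustrative.

```python
# Segment control codes (low-order two bits of the third SDW byte).
SEGMENT_NAMES = {
    0b00: "complete record",
    0b01: "first part of a segmented record",
    0b10: "last part of a segmented record",
    0b11: "middle of a segmented record",
}


def segment_code(sdw: bytes) -> int:
    """Return the segment control code from a 4-byte SDW."""
    if len(sdw) != 4:
        raise ValueError("an SDW is 4 bytes long")
    return sdw[2] & 0b11


def find_spanning_errors(codes: list[int]) -> list[int]:
    """Return indexes of segments whose code does not fit the one before.

    A segment that starts a record (00 or 01) must follow one that ends a
    record (00 or 10); a continuation (10 or 11) must follow a segment that
    leaves a record open (01 or 11).
    """
    errors = []
    inside_record = False  # True after a 01 or 11 segment
    for i, code in enumerate(codes):
        continues_previous = bool(code & 0b10)  # 10 or 11
        if continues_previous != inside_record:
            errors.append(i)
        inside_record = bool(code & 0b01)  # 01 or 11 leave the record open
    return errors
```

    For example, a daily dump that ended after a first and a middle
    segment, followed by a complete record from the next mod'd dump,
    gives find_spanning_errors([0b00, 0b01, 0b11, 0b00]) == [3].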
    .
    To locate the invalid record for repair you can use one of two
    methods:
    1) For a VSAM data set use IDCAMS to print the bad data set (see
       steps 1 and 2 in the procedure below).
    2) For a QSAM data set use the abend 002-4 dump from IFASMFDP to
       determine the last block written.  At the time of the abend002,
       IFASMFDP register 2 points to the DCB.  The following DCB fields
       are of interest:
    DCB+x'5'  contains the full disk address of the record that was
              just read or written.  The address is in the form
              MBBCCHHR, where M is the extent number, BB the bin
              number, CC the cylinder, HH the head (track), and R the
              record number.
    DCB+x'C'  is the number of blocks read so far.  The block read
              may not be the one we failed on; the bad one may be the
              preceding one.
    DCB+x'1C' is the pointer to the IOB
      IOB+x'20' is the last seek address.  This is most probably the
                address for the invalid record.
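    Decoding the 8-byte full disk address from DCB+x'5' in a dump is
    mechanical.  A minimal Python sketch, assuming the eight bytes have
    been copied from the dump as hex; the class and function names are
    hypothetical, and the sample value is made up to match the CCHHR
    used later in this procedure.

```python
from typing import NamedTuple


class DiskAddress(NamedTuple):
    m: int   # extent number
    bb: int  # bin number (normally zero on DASD)
    cc: int  # cylinder
    hh: int  # head (track within the cylinder)
    r: int   # record number on the track


def parse_mbbcchhr(raw: bytes) -> DiskAddress:
    """Split an 8-byte MBBCCHHR full disk address into its fields."""
    if len(raw) != 8:
        raise ValueError("an MBBCCHHR address is 8 bytes long")
    return DiskAddress(
        m=raw[0],
        bb=int.from_bytes(raw[1:3], "big"),
        cc=int.from_bytes(raw[3:5], "big"),
        hh=int.from_bytes(raw[5:7], "big"),
        r=raw[7],
    )


# Hypothetical dump value: extent 0, cylinder X'0566', head X'000D', record 2.
addr = parse_mbbcchhr(bytes.fromhex("0000000566000D02"))
print(addr)  # DiskAddress(m=0, bb=0, cc=1382, hh=13, r=2)
```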
    .
    Because the output from each new dump job is mod'd to the end of
    this dataset, we know that either a complete record or the
    beginning of a record will follow the partial record.  We also
    need to know the SDW (segment descriptor word) of the preceding
    record.  When we know the segment information of the preceding
    and following records, we can zap the SDW of the partial record
    to make it compatible with both.
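    The rule in the paragraph above can be made precise: the partial
    segment must continue a record exactly when the preceding segment
    left one open, and must itself be continued exactly when the
    following segment continues a record.  A minimal Python sketch of
    that rule (hypothetical helper names), plus a helper that patches
    byte 6 of an 8-byte BDW + SDW block prefix such as the one checked
    by the VER statement in the zap example at the end of this
    procedure:

```python
MORE_FOLLOWS = 0b01  # set in 01 (first) and 11 (middle)
CONTINUES = 0b10     # set in 10 (last) and 11 (middle)


def required_code(prev_code: int, next_code: int) -> int:
    """Return the segment control code that fits between two neighbours."""
    # If the previous segment left a record open, this one must continue it.
    continues = bool(prev_code & MORE_FOLLOWS)
    # If the next segment continues a record, this one must not end it.
    more_follows = bool(next_code & CONTINUES)
    return (CONTINUES if continues else 0) | (MORE_FOLLOWS if more_follows else 0)


def patch_block_prefix(prefix: bytes, new_code: int) -> bytes:
    """Replace the code in byte 6 of an 8-byte BDW + SDW block prefix."""
    if len(prefix) != 8:
        raise ValueError("expected a 4-byte BDW followed by a 4-byte SDW")
    patched = bytearray(prefix)
    patched[6] = (patched[6] & 0b11111100) | (new_code & 0b11)
    return bytes(patched)


# Preceding segment X'03' (middle); the next segment starts a new record
# (00 or 01) because it comes from a new dump job.
code = required_code(0b11, 0b01)
print(format(code, "02b"))  # 10
print(patch_block_prefix(bytes.fromhex("100000000FFC0300"), code).hex().upper())
# 100000000FFC0200
```

    This only makes the spanning bits consistent; the data of the
    truncated record is not recovered.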
    .
    Procedure:
    1. Run IDCAMS PRINT on the data set that is incurring the
       error.  IDCAMS will fail (usually with cond code 0012).  Use
       the CHAR option on the PRINT statement to limit output.
    2. Scroll to the end of the IDCAMS print output and find the
       last record read and its RBA (relative byte address).
    3. Find the CI# of the last record read by calculating
          RBA (from step 2)
          ----------------------------  = CI#
          CISIZE (of dataset in error)
       Note: If the above calculation has a remainder then add 1 to
       the CI#
    4. Run IEHLIST with the FORMAT option against the data set in
       error and obtain the data set extent information.  Note that
       the dsname of the VSAM data set may be different from the
       VSAM cluster name.  IEHLIST requires the name that is used in
       the format 1 DSCB of the volume that the data set resides on.
    5. Calculate the number of CI's (control intervals) or physical
       records per cylinder:  Obtain cisize or physical recsize and
       use 3380/3390 Quick Reference Summary Tables.
       Example: A CISIZE of 4096 gives 10 records per track on a
                3380; at 15 tracks per cylinder, that is 150 CIs
                per cylinder.
    6. Take the CI# calculated in step 3 and convert it to its CCHHR
       equivalent to determine what location to dump using AMASPZAP
       (all divisions below are integer divisions).
       Example: If the CI# is 26682, then:
           CC = 26682 / 150 CIs per cyl = 177 (relative cylinder)
                177 cyl x 150 CIs per cyl = 26550
                26682 - 26550 = 132 (CIs into that cylinder)
           HH = 132 / 10 CIs per trk = 13 (relative head #)
                13 trk x 10 CIs per trk = 130
            R = 132 - 130 = 2 (relative record #)
    .
                                  CC     HH    R
           relative CCHHR        177     13    2
           data set extent
             from IEHLIST       1205      0    0
           ---------------------------------------
           real CCHHR           1382     13    2
    7. Convert the decimal CCHHR from step 6 to hex.
       Example: decimal  1382 - 0013 - 02
                hex      0566 - 000D - 02
    8. Set up an AMASPZAP job to dump CIs, beginning 1 record before
       the CCHHR from step 7, and dump at least 3 more CIs to see
       what types of SMF records are being read:
       Example:
       //SDUMP JOB
       //DUMP EXEC PGM=AMASPZAP
       //SYSPRINT DD SYSOUT=*
       //SYSLIB DD DSNAME=datasetname,DISP=SHR
       //SYSIN DD *
         ABSDUMPT 0566000D01 0566000D05
    9. Use the segment descriptor words (SDWs) of the SMF records to
       locate the record with the missing segment.  This can be done
       by matching up the segment control codes.  The code is in the
       byte at offset 6 (counting from 0) into the block: the 4-byte
       BDW is followed by the first segment's SDW, whose third byte
       holds the code.
       Byte 6 in the record at 0566000D01 is X'03' (B'0000 0011'),
         which indicates the middle of a segmented record.
       Byte 6 in the record at 0566000D02 is X'03' (B'0000 0011'),
         which also indicates the middle of a segmented record.
    We know the next record mod'd will be the beginning of a record.
    Therefore, we need to zap byte 6 of the record at 0566000D02 to
    X'02' so that it reflects the end of a segmented record.
    Example:
    //SDUMP JOB
    //DUMP EXEC PGM=AMASPZAP
    //SYSPRINT DD SYSOUT=*
    //SYSLIB DD DSNAME=datasetname,DISP=SHR
    //SYSIN DD *
      CCHHR 0566000d02
      VER 0000 1000,0000,0FFC,0300
      REP 0000 1000,0000,0FFC,0200
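    Steps 3 and 5 through 7 of the procedure are plain integer
    arithmetic.  A minimal Python sketch, assuming a single-extent data
    set whose extent starts at head 0 (as in the IEHLIST example) and
    following the procedure's conventions exactly, including the rule
    that a remainder in step 3 adds 1 to the CI# and that the relative
    record number is used as-is; the function names are hypothetical.

```python
from typing import Tuple


def ci_number(rba: int, cisize: int) -> int:
    """Step 3: RBA / CISIZE, adding 1 when there is a remainder."""
    if cisize <= 0:
        raise ValueError("CISIZE must be positive")
    quotient, remainder = divmod(rba, cisize)
    return quotient + 1 if remainder else quotient


def ci_to_cchhr(ci: int, ci_per_track: int, tracks_per_cyl: int,
                extent_start_cyl: int) -> Tuple[int, int, int]:
    """Steps 5 and 6: relative cylinder, head, and record, plus the extent start."""
    ci_per_cyl = ci_per_track * tracks_per_cyl        # step 5, e.g. 10 x 15 = 150
    rel_cc, within_cyl = divmod(ci, ci_per_cyl)       # 26682 -> 177, 132
    rel_hh, rel_r = divmod(within_cyl, ci_per_track)  # 132 -> 13, 2
    # Assumes the extent begins at head 0 and record 0, as in the example.
    return extent_start_cyl + rel_cc, rel_hh, rel_r


def cchhr_hex(cc: int, hh: int, r: int) -> str:
    """Step 7: format the CCHHR as hex (CC and HH: 2 bytes each, R: 1 byte)."""
    return f"{cc:04X}{hh:04X}{r:02X}"


cchhr = ci_to_cchhr(26682, ci_per_track=10, tracks_per_cyl=15, extent_start_cyl=1205)
print(cchhr, cchhr_hex(*cchhr))  # (1382, 13, 2) 0566000D02
```

    The result matches the hand calculation in steps 6 and 7.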
    

Comments

APAR Information

  • APAR number

    II02889

  • Reported component name

    V2 LIB INFO ITE

  • Reported component ID

    INFOV2LIB

  • Reported release

    001

  • Status

    CLOSED CAN

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    1987-04-27

  • Closed date

    1987-04-28

  • Last modified date

    2007-04-10

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels




Document information


More support for:

z/OS family

Software version:

001

Operating system(s):

MVS, OS/390, z/OS

Reference #:

II02889

Modified date:

2007-04-10
