IBM Support

Common Causes of VIOS Update Failures Using Updateios Command

Troubleshooting


Problem

Failure updating VIO Server (VIOS) using updateios command.

Symptom

VIOS update failed or was interrupted (e.g., the network connection was lost while the updateios command was running).

Cause

This document provides common causes and things to consider to resolve the problem.

Environment

VIOS 2.2.x and above.

Diagnosing The Problem

The /home/padmin/install.log file is key to understanding why the update failed.
Depending on your ioslevel, this file may be overwritten or appended to each time updateios is run.
Review the log, focusing on the entries whose timestamps coincide with the updateios failure. Pay close attention to the Installation Summary for FAILED results. Sometimes the log details clearly show the cause of failure.
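
The review of the Installation Summary described above can be sketched as a small filter. This is only an illustration: the function name failed_results is hypothetical, and it assumes a shell where awk is available (for example, the root shell reached through oem_setup_env) and a summary table laid out like the samples in this document (Name, Level, Part, Event, Result).

```shell
# Print the Installation Summary rows whose Result column is FAILED.
# Reads install.log text on standard input.
# Assumption: summary rows have exactly five whitespace-separated columns.
failed_results() {
    awk '
        /Installation Summary/ { in_summary = 1; next }
        in_summary && NF == 5 && $5 == "FAILED" { print $1, $2, $3, $4 }
    '
}

# Example invocation (path taken from this document):
#   failed_results < /home/padmin/install.log
```

Each output line names the fileset, level, part, and event that failed, which is a reasonable starting point for matching against the scenarios below.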

Resolving The Problem

Based on install.log details, determine if any of the following scenarios may be applicable to you.
1. The updateios process was interrupted before completing.
2. A hardware component (e.g., disk or adapter) related to VIOS rootvg may be going bad.
3. Insufficient VIOS memory resources.
4. Updateios process hung.
5. Updateios may fail if an ifix exists on the VIOS.
6. Updateios fails for ios.cli.rte due to missing /home/padmin/config files.
7. Updateios fails for perfagent.tools due to missing /etc/perf directory/files.
8. Missing Requisites reported during a VIOS update.

If none of the above scenarios are applicable to you, see the section with that title at the end of this document.

Scenario 1

The updateios process was interrupted before completing (e.g., the network connection was lost before updateios completed).


    Recommendation

    Clean up the incomplete installation before retrying updateios. As padmin, run:

    $ updateios -cleanup

    -cleanup  Specifies the cleanup flag to remove all incomplete pieces of the previous installation. Perform cleanup processing whenever, after an interrupted installation or update, any software product or update is left in a state of either applying or committing. You can run this flag manually, as needed.

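
To see whether any fileset was left in the applying or committing state mentioned above, a filter along the following lines may help. It is a sketch under assumptions: the function name incomplete_filesets is hypothetical, and the input is assumed to be an AIX `lslpp -l` style listing (Fileset, Level, State, Description columns) captured from the root shell. Long fileset names can wrap the listing onto two lines, which this simple filter does not handle.

```shell
# Print filesets whose state column reads APPLYING or COMMITTING.
# Reads an lslpp -l style listing on standard input.
# Assumption: the state is the third whitespace-separated column.
incomplete_filesets() {
    awk '$3 == "APPLYING" || $3 == "COMMITTING" { print $1, $2, $3 }'
}

# Example invocation (from oem_setup_env):
#   lslpp -l | incomplete_filesets
```

No output suggests nothing is stuck in an intermediate state; any output suggests that updateios -cleanup is worth running before the retry.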
Scenario 2

The following symptoms are commonly seen when a hardware component (e.g., disk or adapter) related to VIOS rootvg may be going bad.


    a. If install.log shows I/O errors similar to the ones below, check for possible hardware failure.
      ...
      ar: There is an input or output error.                    
      ar: 0707-114 The fread system call failed.                        
      0503-037 inurest:  Failure on system call to execute command      
      OBJECT_MODE=32_64 /usr/ccs/bin/ar -or ./usr/lib/liblpmcommon.a.new
      ./usr/lib/inst_updt/liblpmcommon.a/shr.o.
      0503-464 installp:  The installation has FAILED for the "usr" part of the following filesets:  
      <fileset name>
      ...

    b. If install.log reports "allocp" errors, that may indicate the VIOS rootvg is mirrored and there may be an issue with one of the mirrored disks. The log may show something similar to the following:

      ...
      lquerypv: Warning, physical volume hdisk1 is excluded since it may be either missing or removed.
      0516-404 allocp: This system cannot fulfill the allocation request.
      There are not enough free partitions or not enough physical volumes to keep strictness and satisfy allocation requests.  The command should be retried with different allocation characteristics.

      lquerypv: Warning, physical volume hdisk1 is excluded since it may be either missing or removed.
      0516-404 allocp: This system cannot fulfill the allocation request.
      There are not enough free partitions or not enough physical volumes to keep strictness and satisfy allocation requests.  The command should be retried with different allocation characteristics.

      0503-008 installp:  There is not enough free disk space in filesystem /usr (3017866 more 512-byte blocks are required).
      An attempt to extend this filesystem was unsuccessful.
      Make more space available then retry this operation.
      ...

    c. Other symptoms of a potential hardware issue include updateios and/or other VIOS commands failing in the padmin and/or oem_setup_env shell, e.g.

      $ snap
      ...
      /usr/sbin/snap[23]: 13697170 Bus error
      /usr/sbin/snap[32]: 13697172 Bus error
      /usr/sbin/snap[42]: 13697174 Bus error
      /usr/sbin/snap[52]: 13697176 Bus error
      /usr/sbin/snap[62]: 13697178 Bus error
      /usr/sbin/snap[72]: 13697180 Bus error
      /usr/sbin/snap[110]: 12189936 Bus error
      /usr/sbin/rsct/bin/ctsnap[2006]: 14680184 Bus error
      Bad read?
      ./etc/objrepos/lpp: I/O error
      Bad read?
      ./etc/objrepos/product: I/O error
      Bad read?
      ./usr/lib/objrepos/fix: I/O error
      Unable to open file: /home/ios/logs/ioscli_global.trace for append
      Error from cliCheckFile:-1
      ...

      # bosboot -a                                  


      /usr/bin/bosboot[24]: 14090370 Bus error(coredump)
      ...
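
The symptoms in items a through c can be searched for in one pass. The sketch below is illustrative only: the function name hardware_symptoms is hypothetical, and the patterns are taken from the sample messages above, so they are not an exhaustive list of hardware-related errors.

```shell
# Print lines that resemble the hardware-related symptoms shown above.
# Reads log or command output on standard input.
hardware_symptoms() {
    grep -E 'input or output error|0707-114|0516-404 allocp|missing or removed|Bus error|I/O error'
}

# Example invocation (path taken from this document):
#   hardware_symptoms < /home/padmin/install.log
```

A match is a hint rather than a diagnosis; the errlog checks in the Recommendation below remain the way to confirm a hardware problem.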

    Recommendation

    Determine the disk and adapter used by VIOS rootvg. Then check VIOS errlog for disk, adapter, LVM, or file system errors related to VIOS rootvg or its underlying physical devices.

    To list all physical volumes in VIOS rootvg, as padmin, run:


    $ lspv|grep rootvg
    hdisk0           00c1f1a077f16a01         rootvg           active
    OR
    $ lsvg -pv rootvg  
    rootvg:
    PV_NAME     PV STATE   TOTAL PPs   FREE PPs    FREE DISTRIBUTION
    hdisk0      active     546         328         109..15..13..82..109

    Ensure all disks show PV STATE "active". If there is more than one physical volume in the volume group and one shows in "Missing" or "Removed" state, correct the disk problem, and then try the update again.
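
The PV STATE check can be sketched as a filter over the `lsvg -pv rootvg` output shown above. The function name inactive_pvs is hypothetical, and the sketch assumes awk is available (for example, in the oem_setup_env shell) and that the output has the column layout of the sample.

```shell
# Print physical volumes whose PV STATE is anything other than "active".
# Reads `lsvg -pv <vg>` output on standard input.
inactive_pvs() {
    awk '
        /:$/ || $1 == "PV_NAME" { next }      # skip the VG name and header lines
        NF >= 2 && $2 != "active" { print $1, $2 }
    '
}

# Example invocation:
#   lsvg -pv rootvg | inactive_pvs
```

Empty output means every listed disk is active.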

    If the volume group is mirrored, you can temporarily break the mirror off the physical volume in Missing/Removed state and try the update again.

    To check if the volume group is mirrored, run:


    $ lsvg -lv rootvg

    If the number of PPs is double the number of LPs, that logical volume is mirrored.

      Note: If rootvg is mirrored and both mirrored disks show PV STATE active, check the number of FREE PPs on both disks. The 0516-404 allocp and 0503-008 installp errors may be triggered if one of the mirrored disks has too few or no FREE PPs for the filesystem(s) to be extended on both mirror copies during the updateios process. In that case, you can break the mirror off the disk that is almost full before re-attempting updateios.
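
The mirroring rule above (PPs equal to twice the LPs) can be applied mechanically. The following is a minimal sketch, assuming the usual `lsvg -lv` column order (LV NAME, TYPE, LPs, PPs, PVs, LV STATE, MOUNT POINT); the function name mirrored_lvs is hypothetical.

```shell
# Print logical volumes whose PP count is exactly double the LP count.
# Reads `lsvg -lv <vg>` output on standard input.
# Assumption: LPs is column 3 and PPs is column 4 on data rows.
mirrored_lvs() {
    awk '$3 ~ /^[0-9]+$/ && $4 == 2 * $3 { print $1 }'
}

# Example invocation:
#   lsvg -lv rootvg | mirrored_lvs
```

The numeric test on the third column skips the volume group name and the header row, whose third field is not a number.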

    To unmirror the volume group, run:
    $ unmirrorios <hdisk#_in_missing_state>

    To determine the disk and adapter type, run:


    $ lsdev -type disk|grep hdisk0
    hdisk0           Available   SAS Disk Drive

    $ lsdev -dev hdisk0 -parent


    parent

    sas0

    $ lsdev -type adapter|grep sas0


    sissas0          Available   PCI-X266 Planar 3Gb SAS Adapter

    $ errlog|pg


    ...
    ...errors to watch out for include but are not limited to:
    ...
    E86653C3   0524184215 P H LVDD           I/O ERROR DETECTED BY LVM
    B6267342   0524184215 P H hdisk0         DISK OPERATION ERROR
    ...
    ...
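
Entries like the two above can be picked out of a long error log by their type and class columns. This sketch assumes the summary layout shown in the sample (identifier, timestamp, type, class, resource name, description), where type P is permanent and class H is hardware; the function name permanent_hw_errors is hypothetical.

```shell
# Print the identifier and resource name of permanent hardware errors.
# Reads errlog summary output on standard input.
# Assumption: column 3 is the error type and column 4 is the error class.
permanent_hw_errors() {
    awk '$3 == "P" && $4 == "H" { print $1, $5 }'
}

# Example invocation:
#   errlog | permanent_hw_errors
```

The resource names in the output can then be compared with the rootvg disks and their parent adapters identified earlier.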

    For adapter and internal disk errors, contact your local Hardware Support Representative.

    For issues with SAN-attached devices, contact your local Storage Support Representative.

    If no hardware errors are found related to any of the physical devices used by VIOS rootvg, but the errlog lists LVM I/O errors, consider running hardware diagnostics or contact your local Hardware Support Representative to get a clean bill of health from the hardware perspective before pursuing further from the software side.

    If filesystem corruption errors are found, schedule a maintenance window as soon as possible to boot the VIOS partition into maintenance mode and run a thorough filesystem check (fsck).


Scenario 3

Insufficient VIOS memory resources are known to cause updateios to fail. In some cases, you may see memory-related errors in the install.log, e.g.


    "Out of memory, malloc() failed."

    or

    "...


    /usr/lib/instl/cleanup: cannot fork: no swap space
    /usr/lib/instl/cleanup[3]: cannot fork: no swap space

    ...
    installp: fork system call failed
    installp: fork system call failed
    installp: fork system call failed

    ..."

    In other cases, the install.log may not show memory errors, but after the updateios attempt, other VIOS commands may fail with similar memory errors, permission-related errors, or library errors like the ones below:


      sh: 0403-031 The fork function failed. There is not enough memory available.

      lppchk: 0504-210  Unable to read file
      /usr/sbin/rsct/install/bin/srMigrate because of permissions.


      ar: 0707-114 The fread system call failed.
      0503-037 inurest:  Failure on system call to execute command
      OBJECT_MODE=32_64 /usr/ccs/bin/ar -or ./usr/lib/liblpmcommon.a.new
      ./usr/lib/inst_updt/liblpmcommon.a/shr.o.

    Note: For VIOS 2.2, a minimum of 4 GB of memory is recommended when the VIOS is first installed. Then, once the overall configuration is in place (i.e., storage, virtual devices, etc. have been added), run the VIOS Performance Advisor tool to get a recommendation based on the VIOS workload at the time the performance data is collected.
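
A quick comparison against the 4 GB starting point in the note can be sketched as follows. The sketch rests on assumptions: the function name below_min_memory is hypothetical, and the input is assumed to contain an "Online Memory : <n> MB" line in the style of AIX `lparstat -i` output taken from the root shell. It does not replace the VIOS Performance Advisor recommendation.

```shell
# Compare the Online Memory value (in MB) with a minimum, default 4096 MB.
# Reads `lparstat -i` style output on standard input.
# Assumption: the line has the form "Online Memory   : 8192 MB".
below_min_memory() {
    awk -F: -v min_mb="${1:-4096}" '
        $1 ~ /^Online Memory/ {
            split($2, f, " ")                 # f[1] = size, f[2] = unit
            if (f[1] + 0 < min_mb) print "below minimum: " f[1] " MB"
            else                   print "ok: " f[1] " MB"
        }
    '
}

# Example invocation (from oem_setup_env):
#   lparstat -i | below_min_memory 4096
```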

    Recommendation

    Run VIOS Performance Advisor.


    Then, increase VIOS memory to the suggested value and try the update again.

Scenario 4

Updateios process hung.

    Recommendation

    Review /home/padmin/install.log to determine how far updateios got before it hung, and whether your log details match the Probable Cause below.
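
One way to see how far the update got is to find the last fileset that installp started applying. The sketch below is illustrative: the function name last_applying is hypothetical, and it assumes the log format shown in this document, where the fileset name and level appear on the first non-blank line after "APPLYING software for:".

```shell
# Print the name and level of the last fileset installp began applying.
# Reads install.log text on standard input.
last_applying() {
    awk '
        /APPLYING software for:/ { want = 1; next }
        want && NF { last = $1 " " $2; want = 0 }   # first non-blank line after the marker
        END { if (last != "") print last }
    '
}

# Example invocation (path taken from this document):
#   last_applying < /home/padmin/install.log
```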


    On the other hand, if the log details do not reveal enough information to determine why it hung, refer to the technote "Troubleshooting a Hung Process or Command on PowerVM Virtual I/O Server". The technote describes how to gather a process dump for further investigation. Then contact your local IBM SupportLine Representative and be ready to provide the following testcase:
      1. /home/padmin/install.log
      2. pdump data as per the technote
      3. VIOS snap

    Probable Cause #1 - updateios hangs on Common Agent, e.g.
      ...
      installp:  APPLYING software for:
              DirectorCommonAgent 6.3.5.0

      0513-044 The cas_agent Subsystem was requested to stop.
      Stopping The LWI Nonstop Profile...
      Waiting for the LWI Nonstop Profile to exit ...
      Waiting for the LWI Nonstop Profile to exit ...
      Waiting for the LWI Nonstop Profile to exit ...
      Waiting for the LWI Nonstop Profile to exit ...
      Waiting for the LWI Nonstop Profile to exit ...
      Waiting for the LWI Nonstop Profile to exit ...
      Stopped The LWI Nonstop Profile.

      . . . . . << Copyright notice for DirectorCommonAgent >> . . . . . . .
      Licensed Materials - Property of IBM

       5765-DRP
         Copyright International Business Machines Corp.  2010, 2011.

       All rights reserved.
       US Government Users Restricted Rights - Use, duplication or disclosure
       restricted by GSA ADP Schedule Contract with IBM Corp.
      . . . . . << End of copyright notice for DirectorCommonAgent >>. . . .

      Restarting the CAS nonstop service.......................................
      ...

      Recommendation

      Remove Systems Director from the VIO Server if it is not being used. Then try the update again.


      cas.agent is not the root of the problem. The concern is the complex set of dependencies for the install, which Systems Director on the server makes slightly more tricky.
      The following support document contains instructions, and links to further instructions, to disable and remove Systems Director: "IBM Systems Director Common Agent on AIX Options for Removing, Disabling, or Upgrading/Installing".

Scenario 5

Updateios may fail if one or more ifixes (interim fixes) are installed, because they can cause filesets to be locked by the EFIX manager.

The following is sample output from install.log:
    ...snip...
    +---------------------------------------------------------------+
    BUILDDATE Verification ...
    +---------------------------------------------------------------+
    Verifying build dates...done

    The updates being installed do not contain all the APARs to allow all existing interim fixes to be automatically removed.  Please ensure the interim fixes are enabled for automatic removal and obtain the updates that contain the APARs for the following interim fixes, or remove the interim fixes, as described below.
    IV62348s3b

    0503-006 installp:  Cannot change to directory /tmp/.workdir.8192072.6095004_1/usr/lpp/ios.cli.
    Check path name and permissions.

    EFIX MANAGER LOCKS
    ------------------

     * * * ATTENTION * * *

     The following selected filesets are locked by EFIX manager:

     ios.cli.rte

     installp has halted this operation because one or more files in the filesets listed above are registered as having an EFIX. You must remove these EFIXES before performing operations on the given fileset.

     To get a listing of all locked filesets and the locking EFIX label, execute the following command:
     # /usr/sbin/emgr -P

     To remove the given EFIX, execute the following command:
     # /usr/sbin/emgr -r -L <EFIX label>

     For more information on EFIX management please see the emgr man page and documentation.

    ...
    Recommendation

    Remove the ifix(es) prior to re-running updateios.
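
To find out which filesets the log reports as locked, the "EFIX MANAGER LOCKS" section can be extracted as sketched below. The function name locked_filesets is hypothetical, and the sketch assumes the section is laid out as in the sample above. The emgr commands in the comments are the ones quoted in that sample output.

```shell
# Print the filesets listed as locked by the EFIX manager.
# Reads install.log text on standard input.
locked_filesets() {
    awk '
        /locked by EFIX manager:/ { in_list = 1; next }
        in_list && /installp has halted/ { in_list = 0 }
        in_list && NF == 1 { print $1 }       # one fileset name per line
    '
}

# Example invocation (path taken from this document):
#   locked_filesets < /home/padmin/install.log
# Then, as root (commands quoted from the install.log sample):
#   /usr/sbin/emgr -P                    # list locked filesets and the locking EFIX label
#   /usr/sbin/emgr -r -L <EFIX label>    # remove the given EFIX
```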


Scenario 6

Part of the updateios process is to check for the existence of the following Network Time Protocol (NTP) files:

    /home/padmin/config/ntp.conf
    /home/padmin/config/ntp.drift
These files are expected to exist on a VIOS partition by default and should not be removed. If they are missing, the updateios process only partially updates ios.cli.rte, which impacts dependencies. As a result, ioslevel returns the original level rather than the new one.
When these files are missing, the install.log may look similar to the sample output below:
    . . .
    installp:  APPLYING software for:
    ios.cli.rte 6.1.9.201
    . . . . . << Copyright notice for ios.cli >> . . . . . . .
    Licensed Materials - Property of IBM
    5765G3400
      Copyright International Business Machines Corp. 2004, 2016.
    All rights reserved.
    US Government Users Restricted Rights - Use, duplication or disclosure
    restricted by GSA ADP Schedule Contract with IBM Corp.
    . . . . . << End of copyright notice for ios.cli >>. . . .
    sysck: 3001-022
    The file
    /home/padmin/config/ntp.conf
    was not found.

    sysck: 3001-022
    The file 
    /home/padmin/config/ntp.drift
    was not found.

    chmod: /etc/snmpd.conf: A file or directory in the path name does not exist.
    chown: /home/padmin/config/ntp.conf: A file or directory in the path name does not exist.
    chown: /home/padmin/config/ntp.drift: A file or directory in the path name does not exist.
    sysck: 3001-022 The file
    /home/padmin/config/ntp.conf
    was not found.
    sysck: 3001-022 The file
    /home/padmin/config/ntp.drift
    was not found.
    sysck: 3001-017
    Errors were detected validating the files
    for package ios.cli.rte
    .
    0503-464 installp:  The
    installation has FAILED for the "usr" part
    of the following filesets:
    ios.cli.rte
     6.1.9.201
    installp:  Cleaning up software for:
    ios.cli.rte 6.1.9.201

    sysck: 3001-022 The file
    /home/padmin/config/ntp.conf
    was not found.
    sysck: 3001-022 The file
    /home/padmin/config/ntp.drift
    was not found.
    sysck: 3001-017 Errors were detected validating the files
    for package ios.cli.rte.
    Filesets processed:  403 of 461  (Total time:  24 mins 31 secs).

    ...
    Installation Summary
    --------------------
    Name          Level     Part   Event    Result
    -------------------------------------------------

    . . .
    ios.cli.rte   6.1.9.201 USR    APPLY    FAILED     
    ios.cli.rte   6.1.9.201 USR    CLEANUP  SUCCESS

    . . .
    Recommendation

    If the /home/padmin/config directory was removed, recreate it and copy the files into it from another VIOS, ensuring the permissions match those of a working VIOS. Then try the VIOS update again.
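
Before retrying, the presence of the expected files can be confirmed with a check like the one below. This is a minimal sketch; the function name missing_paths is hypothetical, and the paths in the example are the two NTP files named in this scenario.

```shell
# Print each argument that does not exist as a file or directory.
missing_paths() {
    for p in "$@"; do
        [ -e "$p" ] || printf '%s\n' "$p"
    done
}

# Example invocation:
#   missing_paths /home/padmin/config/ntp.conf /home/padmin/config/ntp.drift
```

Empty output means both files are in place. The check does not verify ownership or permissions, which still need to be compared with a working VIOS.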


Scenario 7

Part of the updateios process involves accessing certain directories and files that are expected to exist on a VIOS partition by default. The /etc/perf directory is one such case.


This directory and its contents are expected to exist on a VIOS partition by default and should not be removed. If removed, the updateios process may partially update perfagent.tools fileset. The install.log may look similar to the sample output below (obtained from an updateios attempt to 2.2.4.30):
    ...
    installp:  APPLYING software for:
    perfagent.tools 6.1.9.200

    . . . . . << Copyright notice for perfagent.tools >> . . . . . . .
    Licensed Materials - Property of IBM

    5765G6200
      Copyright International Business Machines Corp. 1993, 2016.
      Copyright Regents of the University of California 1982, 1986, 1987.
      Copyright BULL 1993, 2016.

    All rights reserved.
    US Government Users Restricted Rights - Use, duplication or disclosure
    restricted by GSA ADP Schedule Contract with IBM Corp.
    . . . . . << End of copyright notice for perfagent.tools >>. . . .

    mkdir: 0653-357
    Cannot access directory /etc/perf.
    /etc/perf: A file or directory in the path name does not exist.
    chmod: /etc/perf/daily: A file or directory in the path name does not exist.
    touch: 0652-046 Cannot create /etc/perf/xmtopas.log1.
    chmod: /etc/perf/xmtopas.log1: A file or directory in the path name does not exist.
    touch: 0652-046 Cannot create /etc/perf/xmtopas.log2.
    chmod: /etc/perf/xmtopas.log2: A file or directory in the path name does not exist.
    touch: 0652-046 Cannot create /etc/perf/xmwlm.log1.
    chmod: /etc/perf/xmwlm.log1: A file or directory in the path name does not exist.
    touch: 0652-046 Cannot create /etc/perf/xmwlm.log2.
    chmod: /etc/perf/xmwlm.log2: A file or directory in the path name does not exist.
    error creating /etc/perf directories or files for xmtopas/xmwlm
    update:  Failed while executing the perfagent.tools.config_u script.

    0503-464 installp:
    The installation has FAILED for the "root" part
    of the following filesets:
    perfagent.tools 6.1.9.200


    installp:  Cleaning up software for:
    perfagent.tools 6.1.9.200

    ...
    Installation Summary
    --------------------
    Name             Level       Part   Event    Result
    -------------------------------------------------

    . . .
    perfagent.tools  6.1.9.200   USR    APPLY    SUCCESS
    perfagent.tools  6.1.9.200   ROOT   APPLY    FAILED
    perfagent.tools  6.1.9.200   ROOT   CLEANUP  SUCCESS

    Recommendation

    If the /etc/perf directory was removed, recreate it and copy its contents into it from another VIOS, ensuring the permissions match those of a working VIOS. Then try the VIOS update again.
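
To list exactly which paths the update could not find, the "does not exist" messages in install.log can be summarized as sketched below. The function name reported_missing_paths is hypothetical, and the sketch assumes the message wording shown in the sample output for this scenario.

```shell
# Print the unique paths reported with
# "A file or directory in the path name does not exist."
# Reads install.log text on standard input.
reported_missing_paths() {
    awk -F': ' '
        /A file or directory in the path name does not exist/ {
            p = $(NF - 1)                     # the field just before the message
            sub(/^[ \t]+/, "", p)             # drop any leading indentation
            print p
        }
    ' | sort -u
}

# Example invocation (path taken from this document):
#   reported_missing_paths < /home/padmin/install.log
```

The resulting list shows what to compare against a working VIOS when recreating the directory and its contents.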



If none of the above scenarios are applicable to you

Contact your local IBM SupportLine Representative and be ready to provide the following testcase:



Document Information

Modified date:
19 February 2022

UID

isg3T1022699