APAR status
Closed as program error.
Error description
When a repository is replaced, the clvdisk attribute of the cluster0 device in the ODM is updated with the UUID of the new primary repository disk. The savebase command is then executed to update the Boot Logical Volume (BLV).

Consider the case in which an entire data centre fails:
* One of two PowerHA/CAA nodes fails
* One of two storage subsystems fails. The failed storage subsystem hosts
  - the primary repository disk
  - the rootvg disk of the remaining node, which was used at boot.

In this case the savebase step of the Automatic Repository Replacement (ARR) operation fails.

syslog.caa looks like:
...
Apr 3 12:36:54 ha1clC caa:info cluster[8454150]: cluster_utils.c cl_run_log_method 11951 523 START '/usr/sbin/chdev -l cluster0 -a clvdisk='2e199ead-1ed0-aeb4-e331-8c59c3f668f6''
...
Apr 3 12:36:55 ha1clC caa:info cluster[8454150]: cluster_utils.c cl_run_log_method 11951 523 START '/usr/sbin/savebase '
Apr 3 12:36:55 ha1clC caa:info cluster[8454150]: cluster_utils.c cl_run_log_method 11982 523 FINISH return = 1
...
Apr 3 12:36:55 ha1clC caa:err|error cluster[8454150]: caa_config.c cl_th_cfg_msg 6691 523 savebase failed on ha1clC.mainz.de.ibm.com. Please run savebase on ha1clC.mainz.de.ibm.com.
...
Apr 3 12:36:57 ha1clC caa:info cluster[8454150]: cl_chrepos.c automatic_repository_update 2297 1 FINISH rc = 0
Apr 3 12:36:57 ha1clC caa:info cluster[8454150]: caa_protocols.c recv_protocol_slave 1542 1 Returning from Automatic Repository replacement rc = 0
...

Because the BLV was not updated, CAA will not start after a reboot of the remaining node.

syslog.caa looks like:
...
Apr 3 14:02:37 ha1clC caa:err|error cluster[3473558]: cluster_utils.c cluster_repository_read_data 4777 1 Could not get name of cluster repository disk from ODM (ODMDIR=/etc/objrepos).
Apr 3 14:02:37 ha1clC caa:info cluster[3473558]: cluster_utils.c cl_kern_repos_check 11858 1 Could not read the respository.
...
Apr 3 14:02:37 ha1clC caa:err|error cluster[3473558]: clusterconf_lib.c _find_and_load_repos 1482 1 cluster_repository_query() found a UUID but no corresponding disk. This condition may be temporary.
Apr 3 14:02:37 ha1clC caa:warn|warning cluster[3473558]: 1035-242 clusterconf: Non-fatal error when loading the topology.
...
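
The failure signature above (a START entry for savebase followed by a FINISH entry with a non-zero return value) can be searched for in a saved copy of syslog.caa. The following Python sketch is an illustration only and is not part of the APAR or of any IBM tool. The function name failed_commands and the assumption that each FINISH entry belongs to the most recent START entry are hypothetical; the pairing holds for the excerpt shown here but may not hold in a busy log.

```python
import re

# Patterns for the cl_run_log_method entries seen in the excerpt.
START_RE = re.compile(r"cl_run_log_method \d+ \d+ START '(?P<cmd>.*)'\s*$")
FINISH_RE = re.compile(r"cl_run_log_method \d+ \d+ FINISH return = (?P<rc>-?\d+)")


def failed_commands(lines):
    """Return (command, return_code) pairs for commands that ended non-zero.

    Assumption: a FINISH entry refers to the most recent START entry.
    """
    failures = []
    current = None
    for line in lines:
        start = START_RE.search(line)
        if start:
            current = start.group("cmd").strip()
            continue
        finish = FINISH_RE.search(line)
        if finish and current is not None:
            rc = int(finish.group("rc"))
            if rc != 0:
                failures.append((current, rc))
            current = None
    return failures


# Three entries copied from the syslog.caa excerpt in this APAR.
SAMPLE = [
    "Apr 3 12:36:54 ha1clC caa:info cluster[8454150]: cluster_utils.c "
    "cl_run_log_method 11951 523 START '/usr/sbin/chdev -l cluster0 -a "
    "clvdisk='2e199ead-1ed0-aeb4-e331-8c59c3f668f6''",
    "Apr 3 12:36:55 ha1clC caa:info cluster[8454150]: cluster_utils.c "
    "cl_run_log_method 11951 523 START '/usr/sbin/savebase '",
    "Apr 3 12:36:55 ha1clC caa:info cluster[8454150]: cluster_utils.c "
    "cl_run_log_method 11982 523 FINISH return = 1",
]

if __name__ == "__main__":
    for cmd, rc in failed_commands(SAMPLE):
        print(f"{cmd} returned {rc}")
```

A savebase entry in the result indicates that the BLV may not reflect the new repository disk, even though the ARR operation itself reports rc = 0.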
Local fix
After rebooting the remaining node, run the following command on that node:
$ clusterconf -r <new primary rep disk>
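
The chdev entry that ARR writes to syslog.caa records the UUID of the new primary repository disk, which can help confirm which disk to name in the command above. The following Python sketch extracts that UUID from saved log lines. It is an illustration only: the function name last_clvdisk_uuid is hypothetical, and translating the UUID into a disk name is not covered by this APAR and is left to the administrator.

```python
import re

# UUID in 8-4-4-4-12 hexadecimal form, as it appears in the chdev log entry.
CLVDISK_RE = re.compile(
    r"chdev -l cluster0 -a clvdisk='?"
    r"(?P<uuid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)


def last_clvdisk_uuid(lines):
    """Return the UUID from the last chdev clvdisk entry, or None if absent."""
    uuid = None
    for line in lines:
        match = CLVDISK_RE.search(line)
        if match:
            # Later entries override earlier ones: the last replacement wins.
            uuid = match.group("uuid")
    return uuid


# Entry copied from the syslog.caa excerpt in this APAR.
CHDEV_LINE = (
    "Apr 3 12:36:54 ha1clC caa:info cluster[8454150]: cluster_utils.c "
    "cl_run_log_method 11951 523 START '/usr/sbin/chdev -l cluster0 -a "
    "clvdisk='2e199ead-1ed0-aeb4-e331-8c59c3f668f6''"
)

if __name__ == "__main__":
    print(last_clvdisk_uuid([CHDEV_LINE]))
```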
Problem summary
savebase can fail on a mirrored rootvg if one disk is not accessible.
Problem conclusion
savebase was changed to look for BLVs on all disks in the mirrored rootvg.
Temporary fix
Comments
APAR Information
APAR number
IV95025
Reported component name
AIX V7.1
Reported component ID
5765H4000
Reported release
710
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2017-04-12
Closed date
2017-06-30
Last modified date
2018-09-21
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
IV97722 IV97730 IV97733 IV97748 IV97751 IJ01657 U882085
Fix information
Fixed component name
AIX V7.1
Fixed component ID
5765H4000
Applicable component levels
R710 PSY U882085
UP18/08/22 I 1000
Document Information
Modified date:
19 April 2022