IBM Support

IV61112: POWERHA: MKCLUSTER MAY FAIL IF HACMPCLUSTER EXISTS ON <NODE2> APPLIES TO AIX 7100-04

A fix is available

APAR status

  • Closed as program error.

Error description

  • During the initial PowerHA setup, the CAA cluster is
    created using verify and sync ("clmgr sync cluster" or
    via "smitty sysmirror").
    
    In some cases, mkcluster fails to create the CAA cluster and
    the following appears in /var/hacmp/log/clutils.log:
    
    <snipped>
    INFO: START '/usr/es/sbin/cluster/sbin/smcaactrl -O
    JOIN_NODE -P POST -T 2 -cERROR: ADD_NODE failed for
    <node2>
    1035-264 mkcluster: Could not add all new entities.
    1035-305 mkcluster: Could not create cluster.
            The device is not ready for operation.
    <snipped>
    INFO: = = END JOIN_NODE Op = = POST Stage = =
    INFO: START
    INFO: FINISH return = 1
    <snipped>
    INFO: nodename = <node2>
    INFO: return = -1, Failed to receive message: sock=8,
    recv rc=0, msgbytes=32, errno=73
    INFO: Failed to receive request message.
    INFO: FINISH return = -1
    
    To confirm that this APAR is the cause, enable caa.debug.
    During the mkcluster attempt, /var/adm/ras/syslog.caa on
    node2 then shows the following messages repeating in a
    loop:
    
    <snipped>
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
    cluster_utils.c    cl_canonical_nodename   11060   1
          START
    May 15 14:53:12 node2 caa:info cluster[14876770]:
    cluster_utils.c     am_i_a_powerha  10789   1       START
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
    caa_query.c        entlist_start   700     1
     Failed to get kernext topology information,
    cluster_query_ext_v2 failed with 2: A file or directory
    in
     the path name does not exist.
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
    caa_query.c        cl_query        2334    1
    Could not get the topology entlist:
    The system call does not exist on this system. (109)
    May 15 14:53:12 node2 caa:info cluster[14876770]:
    caa_query.c cl_query        2370    1       Query failed:
    line 2332: The system call does not exist on this system.
    May 15 14:53:12 node2 caa:info cluster[14876770]:
    clusterconf_lib.c   _find_and_load_repos    1358    1
         got hdisk# from ODM
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
    cluster_bootutils.c        is_clvdisk      310     1
        START in_type=3
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
    cluster_bootutils.c        cluster_repos_disk_read 71
         1       START fd=6
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
    cluster_bootutils.c        cluster_repos_disk_read 112
       1       FINISH
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
     cluster_bootutils.c        is_clvdisk      456     1
        FINISH return = 1
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
    cluster_utils.c    cluster_repository_read_data    4709
      1       START cr_read_type=17
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
     cluster_utils.c
    cluster_repository_read_data_from_disk
    4752    1       START
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
     cluster_utils.c
    cluster_repository_read_data_from_disk
      4857    1       FINISH return = 0
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
    cluster_utils.c    cluster_node_in_nodelist_by_uuid
      713     1       START
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
     cluster_utils.c    cluster_node_in_nodelist_by_uuid
         750     1       return = NULL
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
     cluster_utils.c    cluster_node_in_nodelist_by_uuid
         760     1       FINISH
    <snipped>
    
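The two log checks described above can be sketched as a small script. This is only an illustration, assuming the log text has already been read into a string. The function names (`clutils_matches`, `caa_null_lookups`) are hypothetical, and the message fragments are copied from the excerpts in this APAR. Whitespace is collapsed first because the excerpts are wrapped across lines.

```python
import re

# Message fragments copied from the clutils.log excerpt in this APAR.
CLUTILS_SIGNATURES = (
    "ADD_NODE failed for",
    "1035-264 mkcluster: Could not add all new entities.",
    "1035-305 mkcluster: Could not create cluster.",
)

# Function that repeats with "return = NULL" in the syslog.caa excerpt.
CAA_LOOP_FUNCTION = "cluster_node_in_nodelist_by_uuid"


def normalize(text):
    """Collapse all whitespace so wrapped log lines still match."""
    return re.sub(r"\s+", " ", text)


def clutils_matches(text):
    """Return the signature fragments found in clutils.log text."""
    flat = normalize(text)
    return [sig for sig in CLUTILS_SIGNATURES if sig in flat]


def caa_null_lookups(text):
    """Count '<function> <line> <n> return = NULL' entries in syslog.caa text."""
    flat = normalize(text)
    pattern = re.escape(CAA_LOOP_FUNCTION) + r" \d+ \d+ return = NULL"
    return len(re.findall(pattern, flat))
```

A plausible use is to pass in the contents of /var/hacmp/log/clutils.log from the node that ran the sync and /var/adm/ras/syslog.caa from node2. How many repeated NULL lookups count as "looping" is not stated in the APAR, so the sketch only reports the count.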

Local fix

  • 1) On node1, take a snapshot:
    #clmgr add snapshot <snapshot_name>
    2) On node2, delete the cluster definition:
    #clmgr delete cluster
    (This empties all the HACMP* ODM classes.)
    3) On node1:
    #clmgr sync cluster
    +++++++++++++++++++
    NOTES:
    1) Be very careful when running "clmgr delete cluster".
    It completely removes the cluster definition. If in doubt
    about where to run this command, call IBM services first.
    2) This local fix DOES NOT apply during a migration. In that
    case, apply APAR IV61112 first or contact IBM services
    before you proceed.
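
The order of the steps and the migration caveat can be written down as a small planning helper. This is a sketch that only prints the commands and never runs them. The function name `local_fix_plan` and the snapshot name `pre_fix_snapshot` are hypothetical; the three commands and the node on which each runs are taken from the steps above.

```python
def local_fix_plan(snapshot_name, migrating=False):
    """Return the local-fix steps as (node, command) pairs, in order."""
    if migrating:
        # Note 2: the local fix does not apply during a migration.
        raise ValueError(
            "Local fix does not apply during migration; "
            "apply APAR IV61112 or contact IBM services."
        )
    if not snapshot_name:
        raise ValueError("A snapshot name is required for step 1.")
    return [
        ("node1", f"clmgr add snapshot {snapshot_name}"),
        ("node2", "clmgr delete cluster"),
        ("node1", "clmgr sync cluster"),
    ]


# Print the plan for review; nothing is executed.
for node, command in local_fix_plan("pre_fix_snapshot"):
    print(f"[{node}] {command}")
```

Keeping the node name next to each command makes it harder to run "clmgr delete cluster" on the wrong node, which is the main risk that note 1 warns about.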
    

Problem summary

  • mkcluster fails with:
    INFO: START '/usr/es/sbin/cluster/sbin/smcaactrl -O
    JOIN_NODE -P POST -T 2 -cERROR: ADD_NODE failed for
    <node2>
    1035-264 mkcluster: Could not add all new entities.
    1035-305 mkcluster: Could not create cluster.
    The device is not ready for operation.
    <snipped>
    INFO: = = END JOIN_NODE Op = = POST Stage = =
    INFO: START
    INFO: FINISH return = 1
    <snipped>
    INFO: nodename = <node2>
    INFO: return = -1, Failed to receive message: sock=8,
    recv rc=0, msgbytes=32, errno=73
    INFO: Failed to receive request message.
    INFO: FINISH return = -1
    

Problem conclusion

  • Added check that cluster exists when confirming node is part
    of PowerHA.
    

Temporary fix

Comments

  • 6100-09 - use AIX APAR IV61060
    7100-03 - use AIX APAR IV60736
    7100-04 - use AIX APAR IV61112
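
The level-to-APAR list above can be expressed as a lookup. This is a minimal sketch: the function name `apar_for_oslevel` is hypothetical, and it assumes the input is either a bare level such as "7100-04" or a longer string in the style printed by `oslevel -s` (for example "7100-04-01-1543"), of which only the first two fields are used.

```python
# Technology level to AIX APAR, as listed in the Comments section.
APAR_BY_LEVEL = {
    "6100-09": "IV61060",
    "7100-03": "IV60736",
    "7100-04": "IV61112",
}


def apar_for_oslevel(oslevel):
    """Return the APAR for a level string, or None if it is not listed."""
    # Keep only the release and technology level fields.
    level = "-".join(oslevel.strip().split("-")[:2])
    return APAR_BY_LEVEL.get(level)
```

A result of None means only that the level does not appear in this APAR's list, not that the level is unaffected.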
    

APAR Information

  • APAR number

    IV61112

  • Reported component name

    AIX V7.1

  • Reported component ID

    5765H4000

  • Reported release

    710

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Submitted date

    2014-06-02

  • Closed date

    2014-06-02

  • Last modified date

    2016-05-11

  • APAR is sysrouted FROM one or more of the following:

    IV60736

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    AIX V7.1

  • Fixed component ID

    5765H4000

Applicable component levels

  • R710 PSY U861566

       UP15/11/22 I 1000

PTF to Fileset Mapping


Document Information

Modified date:
11 May 2016