IBM Support

IV49687: POWERHA CLUSTER MANAGER CANNOT RE-CONNECT TO SNMP DAEMON APPLIES TO AIX 7100-03

A fix is available

APAR status

  • Closed as program error.

Error description

  • If the snmpd daemon is stopped for less than 60 seconds,
    the PowerHA cluster manager will very probably be unable
    to reconnect to it once it is running again.
    The cluster manager retries the connection once a minute,
    failing each time, until it gives up after 60 minutes.
    
    The following symptoms can be observed:
    
    - the cldump command hangs instead of returning
    its usual output:
    $ cldump
    
    cldump: Waiting for the Cluster SMUX peer (clstrmgrES)
    to stabilize.............
    
    - the cluster manager debug log,
    /var/hacmp/log/clstrmgr.debug,
    reports entries such as the following:
    ...
    Thu Aug  1 03:36:53 ConnectToSnmp(): called, smuxFd is -1, retry count 17
    Thu Aug  1 03:36:53 ConnectToSnmp: smux_init returned -1, scheduled retry timer 11865
    Thu Aug  1 03:37:53 ConnectToSnmp(): called, smuxFd is -1, retry count 18
    Thu Aug  1 03:37:53 ConnectToSnmp: smux_init returned -1, scheduled retry timer 11868
    Thu Aug  1 03:38:53 ConnectToSnmp(): called, smuxFd is -1, retry count 19
    Thu Aug  1 03:38:53 ConnectToSnmp: smux_init returned -1, scheduled retry timer 11872
    Thu Aug  1 03:39:53 ConnectToSnmp(): called, smuxFd is -1, retry count 20
    Thu Aug  1 03:39:53 ConnectToSnmp: smux_init returned -1, scheduled retry timer 11876
    ...
    Once the cluster manager gives up trying to connect to
    snmpd, it reports the following:
    ...
    Thu Aug  1 04:29:53 ConnectToSnmp(): called, smuxFd is -1, retry count 59
    Thu Aug  1 04:29:53 ConnectToSnmp: smux_init returned -1, scheduled retry timer 12026
    Thu Aug  1 04:30:53 ConnectToSnmp(): called, smuxFd is -1, retry count 60
    Thu Aug  1 04:30:53 ConnectToSnmp: smux_init returned -1, scheduled retry timer 12029
    Thu Aug  1 04:31:53 ConnectToSnmp(): called, smuxFd is -1, retry count 61
    Thu Aug  1 04:31:54 HACMP: clstrmgrES: Unable to connect to SNMP after 62 retries, giving up.
    To restart connection attempts, refresh or restart clstrmgr
    ...
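
    The log symptoms above can be checked for mechanically. The
    following is a minimal sketch, not an IBM-supplied tool: the
    function name summarize_snmp_retries is hypothetical, and the
    patterns are derived only from the excerpt shown in this APAR,
    so other PowerHA levels may word the messages differently. It
    reports the highest retry count seen and whether the cluster
    manager has given up.

```python
import re

# Patterns are based solely on the log excerpt in this APAR (assumption:
# other releases may phrase these messages differently). \s+ tolerates
# entries that are wrapped across several lines.
RETRY_RE = re.compile(
    r"ConnectToSnmp\(\): called, smuxFd is\s+(-?\d+),\s+retry count\s+(\d+)"
)
GAVE_UP_RE = re.compile(
    r"Unable to connect\s+to SNMP after\s+(\d+)\s+retries, giving up"
)


def summarize_snmp_retries(log_text):
    """Return (highest retry count seen or None, True if clstrmgrES gave up)."""
    counts = [int(m.group(2)) for m in RETRY_RE.finditer(log_text)]
    gave_up = GAVE_UP_RE.search(log_text) is not None
    return (max(counts) if counts else None, gave_up)


sample = """\
Thu Aug  1 04:30:53 ConnectToSnmp(): called, smuxFd is
-1,
 retry count 60
Thu Aug  1 04:31:53 ConnectToSnmp(): called, smuxFd is
-1,
 retry count 61
Thu Aug  1 04:31:54 HACMP: clstrmgrES: Unable to connect
to SNMP after 62 retries, giving up.
"""
result = summarize_snmp_retries(sample)
print(result)
```

    In practice the text would be read from
    /var/hacmp/log/clstrmgr.debug rather than from an inline sample.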
    

Local fix

  • The workaround is to stop the snmpd daemon and all
    snmpd-dependent daemons, then restart them in the proper
    order.
    Keep the snmpd daemon stopped for longer than 60 seconds
    before restarting it.
    In some cases the cluster manager daemon must also be
    refreshed.
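
    The workaround can be sketched as an ordered command plan. This
    is an illustration under stated assumptions, not an IBM
    procedure: build_restart_plan is a hypothetical helper, the
    dependent subsystem names in the example are assumptions to be
    checked against the local system, and the ordering (dependents
    stopped before snmpd, started after it) is one reading of "proper
    order". The sketch only builds the list of SRC commands (stopsrc,
    startsrc); it does not run them, and it leaves out the optional
    cluster manager refresh.

```python
def build_restart_plan(dependents, wait_seconds=65):
    """Build the command sequence for the workaround; nothing is executed.

    dependents: snmpd-dependent subsystem names, in the order they should
    be started (caller-supplied; this APAR does not list them).
    wait_seconds: how long snmpd stays stopped; must exceed 60 seconds.
    """
    if wait_seconds <= 60:
        raise ValueError("keep snmpd stopped for longer than 60 seconds")
    # Assumed ordering: stop dependents first, in reverse start order.
    plan = [["stopsrc", "-s", name] for name in reversed(dependents)]
    plan.append(["stopsrc", "-s", "snmpd"])
    plan.append(["sleep", str(wait_seconds)])
    plan.append(["startsrc", "-s", "snmpd"])
    plan += [["startsrc", "-s", name] for name in dependents]
    return plan


# Example subsystem names are assumptions, not taken from this APAR.
plan = build_restart_plan(["hostmibd", "snmpmibd", "aixmibd"])
for command in plan:
    print(" ".join(command))
```

    Returning the plan as data keeps the sketch reviewable before
    anything is run on a cluster node.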
    

Problem summary

  • When the snmpd process is restarted, the console may show
    a message such as the following:
     snmpd: 1473-191 bind() function failed for SMUX inet socket.
    

Problem conclusion

  • The snmpd code was modified so that it binds to the SMUX
    port when it is restarted.
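
    This APAR does not describe how the snmpd code was changed. As
    general background only: a listener that is restarted quickly can
    fail in bind() while earlier connections on the same port are
    still in TIME_WAIT, and setting SO_REUSEADDR before bind() is the
    usual way to allow the rebind. The sketch below shows that
    general technique, on the assumption that it is relevant here; it
    is not the actual snmpd fix. open_listener is a hypothetical
    name, and an ephemeral loopback port is used instead of the SMUX
    port (TCP 199), which needs root privileges.

```python
import socket


def open_listener(host, port):
    """Open a TCP listener that can be rebound promptly after a restart."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEADDR allows bind() to succeed even if earlier connections
    # on this port are still in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((host, port))
        sock.listen(5)
    except OSError:
        sock.close()
        raise
    return sock


# Simulate a quick restart: bind, close, and bind the same port again.
first = open_listener("127.0.0.1", 0)   # port 0 = let the OS pick a port
port = first.getsockname()[1]
first.close()
second = open_listener("127.0.0.1", port)
rebound_port = second.getsockname()[1]
second.close()
```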
    

Temporary fix

Comments

  • 6100-07 - use AIX APAR IV58285
    6100-08 - use AIX APAR IV57918
    6100-09 - use AIX APAR IV48759
    7100-01 - use AIX APAR IV47777
    7100-02 - use AIX APAR IV48108
    7100-03 - use AIX APAR IV49687
    

APAR Information

  • APAR number

    IV49687

  • Reported component name

    AIX V7.1

  • Reported component ID

    5765H4000

  • Reported release

    710

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Submitted date

    2013-09-19

  • Closed date

    2013-09-19

  • Last modified date

    2014-08-14

  • APAR is sysrouted FROM one or more of the following:

    IV47777

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    AIX V7.1

  • Fixed component ID

    5765H4000

Applicable component levels

  • R710 PSY U859722

       UP14/05/22 I 1000

PTF to Fileset Mapping



Document information

More support for: AIX Enterprise Edition

Software version: 710

Operating system(s): AIX

Reference #: IV49687

Modified date: 14 August 2014