IBM Support

IBM PureData System for Operational Analytics has numerous SHIENT errors in errpt.

Troubleshooting


Problem

IBM PureData System for Operational Analytics Version 1.0 FP4 and V1.1 ship with an AIX version that has APAR IV66360. This results in unnecessary errors in the AIX errpt on all hosts for adapters that are not connected.

Symptom

On one or more hosts, running 'errpt' displays many messages resembling the following line:

    76C587C0 0719222915 T H ent2 Physical link down
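
The second column of an errpt summary line is a timestamp in MMDDhhmmYY form, so 0719222915 is 19 July 2015 at 22:29. As a minimal sketch, the helper below (decode_errpt_ts is a hypothetical name, not an AIX command) makes that decoding explicit. It assumes a year in the 2000s.

```shell
# decode_errpt_ts: hypothetical helper, not part of AIX.
# Converts an errpt summary timestamp (MMDDhhmmYY) to "YYYY-MM-DD hh:mm".
decode_errpt_ts() {
    ts=$1
    mm=$(echo "$ts" | cut -c1-2)   # month
    dd=$(echo "$ts" | cut -c3-4)   # day
    hh=$(echo "$ts" | cut -c5-6)   # hour
    mi=$(echo "$ts" | cut -c7-8)   # minute
    yy=$(echo "$ts" | cut -c9-10)  # two-digit year
    # Assumption: the two-digit year belongs to the 2000s.
    printf '20%s-%s-%s %s:%s\n' "$yy" "$mm" "$dd" "$hh" "$mi"
}

# Example: decode_errpt_ts 0719222915
```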

errpt -a will show messages like the following:

    ---------------------------------------------------------------------------
    LABEL:          SHIENT_PLINK_DOWN
    IDENTIFIER:     76C587C0

    Date/Time:       Mon Oct 19 20:46:36 IST 2015
    Sequence Number: 79424
    Machine Id:      00F968BF4C00
    Node Id:         hostname01
    Class:           H
    Type:            TEMP
    WPAR:            Global
    Resource Name:   ent7
    Resource Class:  adapter
    Resource Type:   e4148a169404
    Location:        U78C9.001.WZS02F5-P1-C6-T4

    VPD:
          PCIe2 4-Port (10GbE SFP+ & 1GbE RJ45) Adapter:
            FRU Number..................00E2715
            EC Level....................D77452
            Customer Card ID Number.....2CC3
            Part Number.................00E2719
            Feature Code/Marketing ID...EN0S
            Serial Number...............Y050NY44I617
            Manufacture ID..............40F2E9D34CFC
            Network Address.............40F2E9D34CFF
            ROM Level.(alterable).......30100150

    Description
    Physical link down

            Recommended Actions
            PERFORM PROBLEM DETERMINATION PROCEDURES

    Detail Data
    FILE NAME
    line: 442 file: entcore_link.c
    MAC ADDRESS
    40F2 E9D3 4CFF
    DEVICE DRIVER INTERNAL STATE
    0000 0000 2000 0000 0000 0000 0000 0001 0000 0000 0000 0000
    PCI ETHERNET STATISTICS
    0061 0852 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    TRACE RECORD SEQUENCE NUMBER
    e:0 l:442 f:entcore_link_change r:0x0 s:0 o:0
    NUMBER OF BYTES

    SENSE DATA


    Diagnostic Analysis
    Diagnostic Log sequence number: 236860
    Resource tested:        ent7
    Menu Number:            2E43702
    Description:


    No trouble was found with this resource.  However
    Error Log Analysis indicates that there recently may
    have been a network problem.

    If your Ethernet device is connected to a network,
    and if you are experiencing problems with network
    communications, check for a loose or defective
    cable or connection.

    If a switch or another system is directly attached
    to the Ethernet device, verify it is powered up,
    configured, and functioning correctly.

These messages tend to repeat every 7 minutes for every Available adapter that is not assigned an IP address, is not part of an EtherChannel, and is not a VLAN adapter.

Cause


The cause is described in the AIX APAR, available at the following URL: http://www-01.ibm.com/support/docview.wss?uid=isg1IV66350.

Environment

IBM PureData System for Operational Analytics V1.0 FP4 or earlier, V1.1

Diagnosing The Problem


On any host in the environment, look in errpt for messages like the following for adapters that are not assigned to ent11 and are listed as Available in lsdev output.

    76C587C0 0719222915 T H ent2 Physical link down
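
To see at a glance which adapters are logging the error and how often, the summary output can be grouped by resource name. The following is a rough sketch only: count_plink_down is a hypothetical helper name, and it assumes the usual errpt summary layout in which the first column is the identifier and the fifth column is the resource name.

```shell
# count_plink_down: hypothetical helper, not part of AIX.
# Reads errpt summary lines on stdin and prints "<adapter> <count>"
# for every entry whose identifier is 76C587C0 (SHIENT_PLINK_DOWN).
count_plink_down() {
    awk '$1 == "76C587C0" { n[$5]++ }
         END { for (a in n) print a, n[a] }' | sort
}

# Usage on a host:
#   errpt | count_plink_down
```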

If there are no messages, you can test for the problem using the following one-line script.

    eclist="$(lsdev | grep ent | grep EtherChannel | awk '{print $1}')";for ec in $eclist;do adapters="$(lsattr -EOl ${ec} -a adapter_names | grep -v '#' | sed 's|,| |g')";for adapter in ${adapters};do cmd="entstat ${adapter}";echo $cmd;$cmd;done;done
    entstat ent4

    entstat: 0909-003 Unable to connect to device ent4, errno = 19
    entstat ent0

    entstat: 0909-003 Unable to connect to device ent0, errno = 19
    entstat ent5

    entstat: 0909-003 Unable to connect to device ent5, errno = 19
    entstat ent1

    entstat: 0909-003 Unable to connect to device ent1, errno = 19


Then check errpt for messages that name any of the adapters listed in the stderr output.
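
When the test produces a lot of output, the adapter names can be pulled out of the error lines rather than read by eye. This is a minimal sketch that assumes the 0909-003 message format shown above. failed_entstat_adapters is a hypothetical helper name.

```shell
# failed_entstat_adapters: hypothetical helper, not part of AIX.
# Reads entstat output on stdin and prints, once each, every adapter
# named in a "0909-003 Unable to connect to device" message.
failed_entstat_adapters() {
    sed -n 's/^entstat: 0909-003 Unable to connect to device \(ent[0-9][0-9]*\),.*/\1/p' | sort -u
}

# Usage on a host, with the captured test output saved to a file
# (the file name is an assumption):
#   for a in $(failed_entstat_adapters < /tmp/entstat.out); do
#       errpt | grep -w "$a"
#   done
```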

Resolving The Problem

The following script can be created on all hosts in the environment and run at startup through an inittab entry, run as a cron job, or run by hand. It implements the workaround mentioned in the APAR bulletin by moving adapters that are not part of an EtherChannel and are not VLAN adapters to the Defined state (rmdev -l). This workaround has been shown to prevent the unnecessary messages in errpt.

If any of the free adapters are in use, update the 'good_adapter_filter' variable with an @-delimited list of their names (for example, @ent2@ent3@) so that the script leaves them Available.
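
The script below matches each adapter against the filter with the pattern @name@. As a sketch of why the delimiters matter, the two helpers here (add_to_filter and in_filter, both hypothetical names) build a filter in the same form and test membership. Without the delimiters, ent1 would also match ent11.

```shell
# add_to_filter FILTER NAME...: hypothetical helper.
# Prints FILTER with each NAME prepended in the @name@ form the script uses.
add_to_filter() {
    f=$1
    shift
    for name in "$@"; do
        f="@${name}@${f}"
    done
    printf '%s\n' "$f"
}

# in_filter FILTER NAME: hypothetical helper.
# Succeeds only when NAME appears between @ delimiters in FILTER.
in_filter() {
    case "$1" in
        *"@$2@"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example:
#   f=$(add_to_filter "@ent0@" ent2 ent3)
#   in_filter "$f" ent2 && echo "ent2 is kept Available"
```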

    #!/bin/sh

    cat<<COMMENT

    DATE    : 2015-08-27
    AUTHOR  : GLS
    Purpose : Find adapters that trigger this APAR, check whether they are Available and, if so, put them in the Defined state.

    COMMENT

    export LANG=en_US

    # Adapters that belong to the ent11 EtherChannel are in use and must stay Available.
    good_adapter_list=$(lsattr -EOl ent11 -a adapter_names | grep -v "^#" | sed "s|,| |g")
    good_adapter_filter=

    for i in ${good_adapter_list}
    do
        printf "Found adapter $i as part of ent11.\n"
        good_adapter_filter="@$i@$good_adapter_filter"
    done

    printf "Good adapter filter has been created as $good_adapter_filter.\n"

    all_adapters=$(lsdev | grep "^ent[0-9]" | egrep -v 'EtherChannel|VLAN' | grep 'Available' |  awk '{print $1}')

    # Collect the rmdev commands that are run so they can be summarized at the end.
    reccommands=
    for acheck in ${all_adapters}
    do
        printf "Found adapter $acheck.\n"
        # The @ delimiters keep ent1 from matching ent11.
        echo "$good_adapter_filter" | grep "@${acheck}@" > /dev/null
        rc=$?

        if [ $rc -eq 0 ]
        then
            printf "The adapter ${acheck} is a valid adapter.\n"
        else
            printf "The adapter ${acheck} should be in the defined state due to this APAR.\n"
            printf "Run the following: rmdev -l ${acheck} to set the adapter to defined.\n"
            reccommands="rmdev -l ${acheck}\n${reccommands}"
            printf "Running the command: rmdev -l ${acheck}\n"
            rmdev -l ${acheck}
        fi

    done

    printf "Summary:\n"
    printf "-------------\n"
    printf "$reccommands\n"
    printf "-------------\n"

This script can be run at startup or as part of a regular cron job. It can be run more than once.
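
Before letting the script change device states, it may help to preview which adapters it would act on. The sketch below applies the same selection rule to lsdev output read from stdin (Available entN devices that are neither EtherChannel nor VLAN) and only prints the rmdev commands. preview_rmdev is a hypothetical helper name, and the adapters to keep are passed as arguments rather than read from ent11.

```shell
# preview_rmdev [KEEP...]: hypothetical helper, not part of AIX.
# Reads lsdev output on stdin and prints (without running) an rmdev
# command for each Available entN device that is not an EtherChannel,
# not a VLAN adapter, and not named in the KEEP arguments.
preview_rmdev() {
    keep=" $* "
    awk '$1 ~ /^ent[0-9]+$/ && $2 == "Available" && !/EtherChannel|VLAN/ { print $1 }' |
    while read -r a; do
        case "$keep" in
            *" $a "*) ;;                     # in the keep list: leave alone
            *)        echo "rmdev -l $a" ;;  # candidate for the Defined state
        esac
    done
}

# Usage on a host (the adapter names are examples only):
#   lsdev | preview_rmdev ent0 ent1 ent4 ent5
```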

NOTE: This script must be re-run after a reboot or after running cfgmgr. Both reset the adapters to Available, which causes the extraneous messages to reappear in errpt.
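
One way to arrange the re-run is an inittab record for boot time plus a cron entry to catch later cfgmgr runs. The sketch below only prints candidate entries. The script path, the "shient" identifier, run level 2, and the 10-minute interval are all assumptions chosen for illustration, not values from the APAR.

```shell
# Hypothetical install path for the workaround script; adjust as needed.
SCRIPT=/usr/local/bin/shient_workaround.sh

# inittab_entry PATH: print an inittab record (identifier:runlevel:action:command).
# "shient" is an arbitrary identifier chosen for this sketch.
inittab_entry() {
    printf 'shient:2:once:%s > /dev/null 2>&1\n' "$1"
}

# crontab_entry PATH: print a crontab line that runs the script every 10 minutes.
# The minutes are listed explicitly rather than relying on step syntax.
crontab_entry() {
    printf '0,10,20,30,40,50 * * * * %s > /dev/null 2>&1\n' "$1"
}

# On a host, as root (assumed usage):
#   mkitab "$(inittab_entry "$SCRIPT")"
#   crontab -e    # then add the output of: crontab_entry "$SCRIPT"
```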

Document Information

Modified date:
17 October 2019

UID

swg21969287