IBM Support

IC69428: MAKE DB2V97_MONITOR.KSH SCRIPT MORE ROBUST TO REDUCE FALSE NEGATIVES


APAR status

  • Closed as program error.

Error description

  • ** This APAR applies only to integrated HA solutions **
    
    On memory-constrained or very busy systems, ps behaves
    unpredictably and may return the process name in square
    brackets (for example, [db2sysc]) instead of the full command
    line. Hence there is a chance that this check:
    
    p_pid=$(ps -u ${DB2INSTANCE?} -o args | grep -v "^db2sysc [0-9]" | grep -c "^db2sysc")
    
    returns 0, which in turn makes this script return a status of 2,
    i.e. the instance is reported as down.
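The failure mode can be reproduced without a stressed system by feeding the original check simulated ps output. This is a minimal sketch, assuming a POSIX shell: fake_ps_args is a hypothetical stand-in for `ps -u ${DB2INSTANCE} -o args`, and the process lines are illustrative rather than captured from a real host.

```shell
#!/bin/sh
# Sketch: reproduce the false negative with simulated ps output.
# fake_ps_args is a hypothetical stand-in for "ps -u $DB2INSTANCE -o args";
# the sample process lines are illustrative, not from a real system.

# Original fallback check (NN = 0 case): count db2sysc lines that
# do not carry a numeric partition argument.
original_check() {
    fake_ps_args | grep -v "^db2sysc [0-9]" | grep -c "^db2sysc"
}

# Normal conditions: ps reports the bare process name.
fake_ps_args() {
    printf '%s\n' "-ksh" "db2sysc" "db2acd"
}
echo "normal ps output:    $(original_check)"   # prints 1

# Stressed system: ps reports the name in square brackets.
fake_ps_args() {
    printf '%s\n' "-ksh" "[db2sysc]" "db2acd"
}
echo "bracketed ps output: $(original_check)"   # prints 0
```

With the bracketed line, `grep -c "^db2sysc"` counts nothing, which is the 0 that the monitor turns into a status of 2.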
    

Local fix

  • To avoid these false negatives, modify these lines to check for
    a process name in square brackets and return a status of 0
    (unknown) if one is found.
    
    
    Original:
      p_pid=$(ps -u ${DB2INSTANCE?} -o args | grep -c "^db2sysc ${NN?}[ ]*$")
      if [[ $p_pid == 0 && $NN -eq 0 ]]; then
          p_pid=$(ps -u ${DB2INSTANCE?} -o args | grep -v "^db2sysc [0-9]" | grep -c "^db2sysc")
      fi
    
    New:
      p_out=$(ps -u ${DB2INSTANCE?} -o args | egrep "^db2sysc ${NN?}[ ]*$|^db2sysc[ ]*$|^\[db2sysc\]")
      # Quote $p_out so each ps line stays separate and [db2sysc] is not glob-expanded.
      p_pid=$(echo "$p_out" | grep -c "\[db2sysc\]")
      if [[ $p_pid != 0 ]]; then
          logger -i -p err -t $0 "ps returns [db2sysc]"
          echo 0
          return 0
      fi

      p_pid=$(echo "$p_out" | grep -c "db2sysc ${NN?}")
      if [[ $p_pid == 0 && $NN -eq 0 ]]; then
          p_pid=$(echo "$p_out" | grep -c "db2sysc")
      fi
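The "New" logic can be exercised the same way. The sketch below, again assuming a POSIX shell, wraps it in a hypothetical check_db2sysc function that takes the partition number (NN) followed by simulated ps lines. It prints "unknown" where the real script logs the condition and returns status 0, and otherwise prints the match count. It uses `[ ]` and `printf` instead of ksh `[[ ]]` and `echo` so that it runs under any POSIX shell; it is an illustration, not the shipped script.

```shell
#!/bin/sh
# Sketch of the "New" local-fix logic as a testable function.
# Hypothetical names: fake_ps_args stands in for "ps -u $DB2INSTANCE -o args";
# check_db2sysc prints "unknown" where the real script would log and
# report status 0, otherwise the number of matching db2sysc lines.

fake_ps_args() {
    printf '%s\n' "$@"
}

# $1 = partition number (NN); remaining args = simulated ps lines.
check_db2sysc() {
    nn=$1
    shift
    # Keep only the lines the monitor cares about, including "[db2sysc]".
    p_out=$(fake_ps_args "$@" | grep -E "^db2sysc ${nn}[ ]*\$|^db2sysc[ ]*\$|^\[db2sysc\]")

    # A bracketed name means ps could not report the real command line,
    # so the state is unknown rather than down.
    if [ "$(printf '%s\n' "$p_out" | grep -c '\[db2sysc\]')" -ne 0 ]; then
        echo unknown
        return 0
    fi

    p_pid=$(printf '%s\n' "$p_out" | grep -c "db2sysc ${nn}")
    if [ "$p_pid" -eq 0 ] && [ "$nn" -eq 0 ]; then
        p_pid=$(printf '%s\n' "$p_out" | grep -c "db2sysc")
    fi
    echo "$p_pid"
}

check_db2sysc 0 "-ksh" "db2sysc"          # prints 1
check_db2sysc 0 "-ksh" "[db2sysc]"        # prints unknown
check_db2sysc 1 "db2sysc 0" "db2sysc 1"   # prints 1
check_db2sysc 2 "db2sysc 0" "db2sysc 1"   # prints 0
```

The bracket test runs first, so any "[db2sysc]" line short-circuits to unknown before the normal count is taken.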
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All users of the integrated HA solution                      *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * In highly stressed environments, the ps command can report   *
    * the db2sysc process name in square brackets. The monitor     *
    * script then fails to find db2sysc, reports the instance as   *
    * down, which can cause involuntary cycling of the instance.   *
    * The fix makes the check more robust, so that false           *
    * negatives are reduced to almost zero.                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Fixed in v97fp3.                                             *
    ****************************************************************
    

Problem conclusion

  • Fixed in v97fp3.
    
    In the script db2V97_monitor.ksh, replace the ps command checks
    with the following lines:
    
    # If home dir not accessible, use plain old ps ...
      p_out=$(ps -u ${DB2INSTANCE?} -o args | egrep "^db2sysc ${NN?}[ ]*$|^db2sysc[ ]*$|^\[db2sysc\]")
      # Quote $p_out so each ps line stays separate and [db2sysc] is not glob-expanded.
      p_pid=$(echo "$p_out" | grep -c "db2sysc ${NN?}")
      if [[ $p_pid == 0 && $NN -eq 0 ]]; then
          p_pid=$(echo "$p_out" | grep -c "db2sysc")
      fi

      if [[ $p_pid == 0 ]]; then
          # No db2sysc match: a bracketed name means the ps output is unreliable.
          p_pid=$(echo "$p_out" | grep -c "\[db2sysc\]")
          if [[ $p_pid != 0 ]]; then
              logger -i -p err -t $0 "ps returns [db2sysc]: returning 0"
              echo 0
              return 0
          fi
          rc=1
      else
          rc=0
      fi
    
    Please confirm this change with DB2 support before deploying it in
    production.
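This version differs from the local fix in ordering: the bracketed name is consulted only when no normal db2sysc line matches. A sketch under the same assumptions (hypothetical conclusion_check and fake_ps_args, POSIX shell, simulated input) makes the resulting statuses visible:

```shell
#!/bin/sh
# Sketch of the Problem conclusion logic as a testable function.
# Hypothetical names: fake_ps_args stands in for "ps -u $DB2INSTANCE -o args";
# conclusion_check prints "unknown" where the script logs and reports 0,
# otherwise the rc it would set (0 = db2sysc found, 1 = not found).

fake_ps_args() {
    printf '%s\n' "$@"
}

# $1 = partition number (NN); remaining args = simulated ps lines.
conclusion_check() {
    nn=$1
    shift
    p_out=$(fake_ps_args "$@" | grep -E "^db2sysc ${nn}[ ]*\$|^db2sysc[ ]*\$|^\[db2sysc\]")

    # Look for the normal process name first.
    p_pid=$(printf '%s\n' "$p_out" | grep -c "db2sysc ${nn}")
    if [ "$p_pid" -eq 0 ] && [ "$nn" -eq 0 ]; then
        p_pid=$(printf '%s\n' "$p_out" | grep -c "db2sysc")
    fi

    if [ "$p_pid" -eq 0 ]; then
        # Only when nothing matched: a bracketed name means "unknown".
        if [ "$(printf '%s\n' "$p_out" | grep -c '\[db2sysc\]')" -ne 0 ]; then
            echo unknown
            return 0
        fi
        rc=1
    else
        rc=0
    fi
    echo "$rc"
}

conclusion_check 1 "db2sysc 1"   # prints 0
conclusion_check 1 "[db2sysc]"   # prints unknown
conclusion_check 1 "db2sysc 0"   # prints 1
conclusion_check 0 "[db2sysc]"   # prints 0
```

Traced through this sketch, a bracketed name for a non-zero partition yields unknown, while for partition 0 the fallback `grep -c "db2sysc"` also matches "[db2sysc]", so that case is reported as found (rc=0) rather than unknown.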
    

Temporary fix

  • Apply the same db2V97_monitor.ksh change given under Problem
    conclusion, and confirm it with DB2 support before deploying it in
    production.
    

Comments

APAR Information

  • APAR number

    IC69428

  • Reported component name

    DB2 FOR LUW

  • Reported component ID

    DB2FORLUW

  • Reported release

    970

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2010-06-23

  • Closed date

    2011-04-13

  • Last modified date

    2011-04-13

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    IC69573

Fix information

  • Fixed component name

    DB2 FOR LUW

  • Fixed component ID

    DB2FORLUW

Applicable component levels

  • R970 PSY

       UP



Document information

More support for: DB2 for Linux, UNIX and Windows

Software version: 9.7

Reference #: IC69428

Modified date: 13 April 2011