This presentation is also available as PDF: rsct_event_notification.pdf

IBM Tivoli System Automation for Multiplatforms V3.2
An example of using RSCT's Event Notification feature

Objectives

When you have completed this module, you can perform these tasks:
- Create a condition
- Create a response
- Associate a condition with a response
- Activate and deactivate the condition-response pair

Introduction

RSCT (Reliable Scalable Cluster Technology, the cluster software) provides an event notification function, which is serviced by the event response resource manager (IBM.ERRMd). At a high level, you define a condition, define a response, associate the condition with the response, and finally activate the condition-response pair.

You can have multiple responses per condition, and you can have multiple condition-response pairs active at the same time. You can easily toggle any condition-response pair between active and not active, for example, to turn off event notification during planned restarts or other maintenance activities that would otherwise trigger an unwanted alert.

This module is only a primer, and you can do far more than the simple example covered here. To learn more about the event notification feature, see the man pages for each command demonstrated in these slides.

Scenario that is detailed in this module

Assume a two-node clustered environment: under normal circumstances, node01 is your designated primary and node02 is your designated standby.

In a simple high-availability (HA) setup, if node01 stops functioning and the resources that were running on node01 fail over to node02, it is helpful to be notified of the failover. Instead of alerting on each individual resource that moves to node02, this module focuses on monitoring only the status of node01 and sending an alert if
it goes down. In other words, the alert covers failovers that are triggered by node-level outages. You can also define the reverse, that is, send an alert if node02 goes down.

Event notification can be extended as needed, because the concepts for setting up condition and response definitions are the same regardless of the resource being monitored or the alert type.

The rest of this module takes you through the four steps needed to set up an alert for when node01 goes down. Perform the steps on node02 (your standby) as the root user.

Step A. Make the condition

You want to monitor the OpState attribute of the IBM.PeerNode resource called node01, and you want the condition to trigger when OpState changes to 2 (Offline):

    root@node02:# export CT_MANAGEMENT_SCOPE=2
    root@node02:# mkcondition -r IBM.PeerNode -e "OpState == 2" -d "Node node01 went offline" -E "OpState == 1" -D "Node node01 is back online" -s "Name = 'node01'" -n node01 -m p -S w "PrimaryNodeDown"

Some key points:
- The -E expression is the rearm expression: the condition resets when the OpState of this node changes back to 1 (Online).
- The condition name in this example is PrimaryNodeDown, but you can change it to something more meaningful, for example, by replacing Primary with the actual host name.
- Substitute every reference to node01 with the host name of your designated primary server. The host name format must match the output of lsrsrc IBM.PeerNode.

You can list all defined conditions:

    root@node02:# lscondition
    Displaying condition information:
    Name              Node     MonitorStatus
    "PrimaryNodeDown" "node02" "Monitored"

You can list all the attribute=value pairs for an individual condition:

    root@node02:# lscondition "PrimaryNodeDown"
    Displaying condition information:

    condition 1:
        Name                        = "PrimaryNodeDown"
        Node                        = "node02"
        MonitorStatus               = "Monitored"
        ResourceClass               = "IBM.PeerNode"
        EventExpression             = "OpState == 2"
        EventDescription            = "Node node01 went offline"
        RearmExpression             = "OpState == 1"
        RearmDescription            = "Node node01 is back online"
        SelectionString             = "Name = 'node01'"
        Severity                    = "w"
        NodeNames                   = {"node01"}
        MgtScope                    = "p"
        Toggle                      = "Yes"
        EventBatchingInterval       = 0
        EventBatchingMaxEvents      = 0
        BatchedEventRetentionPeriod = 0
        BatchedEventMaxTotalSize    = 0
        RecordAuditLog              = "ALL"

The Node attribute shows where the condition is monitored. In this case, because you want to check whether the primary server (node01) is up or down, you create and start the condition on node02.

Step B. Make the response

This example uses notifyevent, a utility provided with RSCT, to send an email as the alert mechanism:

    root@node02:# mkresponse -n EmailAction -s "/usr/sbin/rsct/bin/notifyevent jsmith@company.com" -d 1-7 -t 0000-2400 -e a "Email 24x7 Notification"

Some key points:
- You can add multiple email addresses, separated by spaces, after the notifyevent command.
- -d specifies the days of the week on which this response is valid; 1-7 means Sunday through Saturday, inclusive.
- -t specifies the time period within the day; 0000-2400 means 24-hour coverage.
- -e a runs the action for all events, that is, both the event and the rearm event, so you are also notified when node01 comes back online.
- -n EmailAction names the action; the response itself is named Email 24x7 Notification.

You can list all defined responses, including the predefined ones:

    root@node02:# lsresponse
    Displaying response information:
    ResponseName                          Node
    "Email 24x7 Notification"             "node02"
    "Broadcast details of event any time" "node01"
    "Generate SNMP trap"                  "node01"
    "Critical notifications"              "node01"
    "Warning notifications"               "node01"
    "Informational notifications"         "node01"
    "Log event anytime"                   "node01"
    "E-mail root anytime"                 "node01"
    "E-mail root off-shift"               "node01"
    "Broadcast event on-shift"            "node01"
    "Broadcast details of event any time" "node02"
    "Generate SNMP trap"                  "node02"
    "Critical notifications"              "node02"
    "Warning notifications"               "node02"
    "Informational notifications"         "node02"
    "Log event anytime"                   "node02"
    "E-mail root anytime"                 "node02"
    "E-mail root off-shift"               "node02"
    "Broadcast event on-shift"            "node02"

You can list all the attribute=value pairs for an individual response:

    root@node02:# lsresponse "Email 24x7 Notification"
    Displaying response information:

        ResponseName    = "Email 24x7 Notification"
        Node            = "node02"
        Action          = "EmailAction"
        DaysOfWeek      = 1-7
        TimeOfDay       = 0000-2400
        ActionScript    = "/usr/sbin/rsct/bin/notifyevent jsmith@company.com"
        ReturnCode      = 0
        CheckReturnCode = "n"
        EventType       = "a"
        StandardOut     = "n"
        EnvironmentVars = ""
        UndefRes        = "n"
        EventBatching   = "n"

The mkresponse man page describes how to define more complex day and time periods, for example, sending notifications at any time on Saturday and Sunday but only after 5 p.m. on the other days.

You can create your own response script (mkresponse -s myscript) to do far more than send a simple email alert, for example, to automatically collect diagnostic data when a certain condition is triggered.

Step C. Associate a condition with a response

Associate the newly created condition with the custom response:

    root@node02:# mkcondresp "PrimaryNodeDown" "Email 24x7 Notification"

When the condition in PrimaryNodeDown is met, the actions in Email 24x7 Notification are carried out, but only after the pair is activated in Step D.

As a reminder, all these commands are run on node02, which is both:
- where the condition is monitored from
- where the response is executed

You can list all condition-response pairs:

    root@node02:# lscondresp
    Displaying condition with response information:
    Condition         Response                  Node     State
    "PrimaryNodeDown" "Email 24x7 Notification" "node02" "Not active"

The state remains Not active until you activate the pair.

Step D. Activate or deactivate a condition and response

The last step is to activate the new condition and response:

    root@node02:# startcondresp "PrimaryNodeDown"

List the condition-response pairs again to check that the state changed to Active:

    root@node02:# lscondresp
    Displaying condition with response information:
    Condition         Response                  Node     State
    "PrimaryNodeDown" "Email 24x7 Notification" "node02" "Active"

To deactivate a condition-response pair:

    root@node02:# stopcondresp "PrimaryNodeDown"

Completion and beyond

That completes the setup for the scenario in which node01 goes down. To cover the case in which node02 goes down, create a condition for node02 and activate a suitable condition-response pair on node01; that is, repeat the previous steps on node01 with the node names swapped.

Summary

Now that you have completed this module, you can accomplish these tasks:
- Create a condition
- Create a response
- Associate a condition with a response
- Activate and deactivate the condition-response pair
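Steps A through D can also be scripted, which is convenient when you repeat them on node01 for the reverse case. The following is a minimal sketch under stated assumptions, not an IBM-provided tool: the function setup_node_down_alert, the run helper, and the DRY_RUN switch are hypothetical names, while the RSCT commands and flags are the ones used in the steps above. By default it only prints the commands (shell-quoted with printf %q) so that you can review them first.

```shell
#!/usr/bin/env bash
# Sketch only: Steps A-D of this module wrapped in one function so the same
# definitions can be created for either node. setup_node_down_alert, run and
# DRY_RUN are hypothetical names; the RSCT commands and flags are the ones
# shown in the slides. Check each command's man page before relying on it.
set -euo pipefail

# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [[ "$DRY_RUN" == "1" ]]; then
        printf '%q ' "$@"   # shell-quoted so the line can be pasted back
        printf '\n'
    else
        "$@"
    fi
}

# Usage: setup_node_down_alert NODE CONDITION RESPONSE EMAIL...
# Run as root on the node that watches NODE (node02 watches node01).
setup_node_down_alert() {
    local node="$1" cond="$2" resp="$3"
    shift 3
    local mail="$*"   # one or more addresses, separated by spaces

    export CT_MANAGEMENT_SCOPE=2   # peer domain scope, as in Step A
    if [[ "$DRY_RUN" == "1" ]]; then
        echo "export CT_MANAGEMENT_SCOPE=2"
    fi

    # Step A: trigger when OpState becomes 2 (Offline), rearm when it is 1 (Online)
    run mkcondition -r IBM.PeerNode \
        -e "OpState == 2" -d "Node $node went offline" \
        -E "OpState == 1" -D "Node $node is back online" \
        -s "Name = '$node'" -n "$node" -m p -S w "$cond"

    # Step B: email through notifyevent, 7 days a week, 24 hours a day
    run mkresponse -n EmailAction \
        -s "/usr/sbin/rsct/bin/notifyevent $mail" \
        -d 1-7 -t 0000-2400 -e a "$resp"

    # Step C: associate the condition with the response
    run mkcondresp "$cond" "$resp"

    # Step D: activate the condition-response pair
    run startcondresp "$cond"
}

# Run directly with four or more arguments, for example:
#   DRY_RUN=0 ./setup_alert.sh node01 PrimaryNodeDown "Email 24x7 Notification" jsmith@company.com
if [[ $# -ge 4 ]]; then
    setup_node_down_alert "$@"
fi
```

For the reverse case you would run it on node01 with node02 as the first argument and a new condition name of your choice (for example, a hypothetical SecondaryNodeDown). Because responses are defined per node, mkresponse is expected to fail if a response with the same name already exists on that node; in that case, skip Step B.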
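Step B mentions that mkresponse -s can run your own script instead of notifyevent, for example to collect diagnostic data. The sketch below assumes that ERRM passes event details to the script in environment variables such as ERRM_COND_NAME, ERRM_TYPE, ERRM_RSRC_NAME, ERRM_NODE_NAME, and ERRM_COND_SEVERITY; confirm the names and their values for your RSCT level before relying on them. The handle_event function, the LOG_DIR variable, and the default log directory are hypothetical. The script writes a short report and, for the event itself but not the rearm event, appends the output of lsrsrc IBM.PeerNode.

```shell
#!/usr/bin/env bash
# Hypothetical response script for "mkresponse -s" (see Step B).
# Assumption: ERRM sets ERRM_COND_NAME, ERRM_TYPE, ERRM_RSRC_NAME,
# ERRM_NODE_NAME and ERRM_COND_SEVERITY; verify them for your RSCT level.
# LOG_DIR and handle_event are illustrative names, not part of RSCT.
set -u

handle_event() {
    local dir="${LOG_DIR:-/var/log/rsct-alerts}"
    local cond="${ERRM_COND_NAME:-unknown}"
    local safe="${cond//[^A-Za-z0-9._-]/_}"   # file-name-safe condition name
    local report

    mkdir -p "$dir" || return 1
    report="$(mktemp "$dir/${safe}.$(date +%Y%m%d-%H%M%S).XXXXXX")" || return 1

    # Summary of the event as reported by ERRM
    {
        echo "Condition:  $cond"
        echo "Event type: ${ERRM_TYPE:-unknown}"
        echo "Resource:   ${ERRM_RSRC_NAME:-unknown}"
        echo "Node:       ${ERRM_NODE_NAME:-unknown}"
        echo "Severity:   ${ERRM_COND_SEVERITY:-unknown}"
    } > "$report"

    # Collect extra data only for the event, not for the rearm event
    if [[ "${ERRM_TYPE:-}" == "Event" ]]; then
        {
            echo "--- lsrsrc IBM.PeerNode Name OpState ---"
            lsrsrc IBM.PeerNode Name OpState 2>&1 || echo "(lsrsrc failed)"
        } >> "$report"
    fi

    echo "$report"   # print the path of the report that was written
}

# ERRM is assumed to set ERRM_COND_NAME when it runs the script; do nothing otherwise.
if [[ -n "${ERRM_COND_NAME:-}" ]]; then
    handle_event
fi
```

One way to try it is to register it as a second response for the same condition and activate that pair; the script path and the response and action names below are placeholders:

    root@node02:# mkresponse -n CollectDiag -s /usr/local/bin/collect_diag.sh -d 1-7 -t 0000-2400 -e a "Collect diagnostics"
    root@node02:# mkcondresp "PrimaryNodeDown" "Collect diagnostics"
    root@node02:# startcondresp "PrimaryNodeDown" "Collect diagnostics"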
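The Introduction notes that you can turn notification off during planned restarts or other maintenance. A minimal sketch of that toggle follows, assuming the maintenance step is a single command that returns only when the work is finished. The names with_alerts_paused, rsct, and DRY_RUN are hypothetical; only stopcondresp and startcondresp come from Step D.

```shell
#!/usr/bin/env bash
# Sketch: pause a condition-response pair while a maintenance command runs.
# with_alerts_paused, rsct and DRY_RUN are hypothetical names; only
# stopcondresp and startcondresp come from Step D of this module.

# DRY_RUN=1 (default) prints the RSCT commands; DRY_RUN=0 runs them (as root).
DRY_RUN="${DRY_RUN:-1}"

rsct() {
    if [[ "$DRY_RUN" == "1" ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: with_alerts_paused CONDITION COMMAND [ARGS...]
with_alerts_paused() {
    local cond="$1" rc
    shift

    rsct stopcondresp "$cond" || return 1   # do not start maintenance if this fails

    "$@"          # maintenance step; it should return only when the work is done
    rc=$?

    # Reactivate even if the maintenance step failed
    rsct startcondresp "$cond" || echo "warning: could not reactivate $cond" >&2
    return "$rc"
}
```

After setting DRY_RUN=0, you might call it as with_alerts_paused PrimaryNodeDown /usr/local/sbin/patch_node01.sh, where the script is a placeholder for your own procedure. The function returns the maintenance command's exit status. If the wrapped command returns before node01 is back online (for example, a remote reboot that does not wait), the pair is reactivated too early and the alert may still fire, so make the command wait for the node to return.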