How does the attribute KQ5_NODE_TO_ACTIVE_GROUP.RG_Node_Changed work in a cluster environment?
This attribute is of type "isValueChanged".
Its definition looks something like this:
<Attribute RG_Node_Changed >
An attribute of type isValueChanged returns 'True' if the underlying value has changed since it was last read.
So here, RG_Node_Changed is True if the value of OwnerNodeName has changed since it was last accessed.
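The behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the agent's actual implementation: the class name ValueChangedAttribute and its read() method are hypothetical, and the sketch assumes that the very first read reports False because there is no earlier value to compare against.

```python
class ValueChangedAttribute:
    """Reports whether a watched value differs from the value seen at the previous read."""

    _UNSET = object()  # sentinel meaning "nothing has been read yet"

    def __init__(self):
        self._last = self._UNSET

    def read(self, current):
        # Assumption: the first read has nothing to compare against, so it reports False.
        changed = self._last is not self._UNSET and current != self._last
        self._last = current  # remember this value for the next read
        return changed


# Hypothetical stand-in for RG_Node_Changed, fed with successive OwnerNodeName values.
rg_node_changed = ValueChangedAttribute()
first = rg_node_changed.read("Node_A")   # no earlier value
second = rg_node_changed.read("Node_A")  # owner unchanged
third = rg_node_changed.read("Node_B")   # owner changed
```

Only the third read reports a change, because it is the only one where the owner differs from the previously read value.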
Here is what happens during failover.
Let's say we have a resource group containing the SQL service.
It is currently owned by Node_A. When failover occurs, the owner of this resource group changes to Node_B.
The Node_Changed situation is evaluated, and because the value of OwnerNodeName has changed from Node_A (previous value) to Node_B (current value), RG_Node_Changed becomes true and the situation is triggered.
After 30 seconds the situation is evaluated again, but this time the value of OwnerNodeName is the same as last time, i.e. Node_B.
So RG_Node_Changed becomes false and the situation is cleared.
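The trigger-then-clear sequence can be traced with a small simulation. It is a sketch under assumptions: the function evaluate_situation and the list of owner samples are hypothetical, and the 30-second interval is taken from the description above rather than from any agent configuration.

```python
INTERVAL_SECONDS = 30  # evaluation interval mentioned in the text


def evaluate_situation(owner_samples):
    """Yield (elapsed_seconds, owner, fired) for each evaluation of the situation."""
    previous = None  # nothing has been read before the first evaluation
    for i, owner in enumerate(owner_samples):
        fired = previous is not None and owner != previous
        previous = owner
        yield i * INTERVAL_SECONDS, owner, fired


# Two evaluations before the failover, two after it.
timeline = list(evaluate_situation(["Node_A", "Node_A", "Node_B", "Node_B"]))
for elapsed, owner, fired in timeline:
    print(f"t={elapsed:>3}s owner={owner} RG_Node_Changed={fired}")
```

In this trace the situation fires only at the first evaluation after the owner changes and is cleared at the next one, one interval later.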
Now, what is actually happening in the environment? The cluster agent service resource is in the same resource group as the SQL resource, so when the SQL group fails over, the cluster agent service fails over with it.
After failover, the cluster agent on the previously active node goes offline and the agent on the currently active node comes online.
Because this newly started agent begins from scratch, it first starts all its situations. When the Node_Changed situation checks RG_Node_Changed, the attribute is still false for this agent on the current node (Node_B), since the agent has no earlier value of OwnerNodeName to compare against.
So the bottom line is that if the agent service also fails over from Node_A to Node_B, the agent on the currently active node (Node_B) cannot detect that the owning node of the resource has changed.
To avoid this, it is recommended to create a separate resource group for the cluster agent service, so that the agent service does not fail over along with the other resources when other service resources crash.
If the whole node crashes, we cannot prevent failover of the cluster agent service, so in that case the situation will not fire, as explained above.
The cluster agent is designed to monitor all the resources and resource groups present in a cluster.
It does not matter which node in the cluster owns them.
So even if the cluster agent is on Node_A, it will also monitor the resources owned by Node_B (in addition to the resources owned by Node_A, of course).
So when failover occurs, the SQL group moves to Node_B but continues to be monitored by the cluster agent on Node_A. On detecting the node change for the SQL group, that agent fires the Node_Changed situation.
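The difference between the two placements of the agent service can be compared in a toy model. Everything here is hypothetical (the ClusterAgent class, the node_change_detected function, and the group name "SQL Group"); the only assumption carried over from the text is that a freshly started agent has no remembered owner and therefore cannot report a change on its first evaluation.

```python
class ClusterAgent:
    """Toy agent that remembers the last owner it saw for each resource group."""

    def __init__(self, host):
        self.host = host
        self.last_owner = {}  # empty on every fresh start

    def evaluate(self, group, owner):
        """Return True if the group's owner differs from the one seen last time."""
        previous = self.last_owner.get(group)
        self.last_owner[group] = owner
        return previous is not None and owner != previous


def node_change_detected(agent_in_sql_group):
    """Simulate a SQL group failover from Node_A to Node_B."""
    agent = ClusterAgent("Node_A")
    agent.evaluate("SQL Group", "Node_A")  # baseline reading before the failover
    if agent_in_sql_group:
        # The agent service fails over with SQL, so a new agent starts on Node_B
        # with no memory of the previous owner.
        agent = ClusterAgent("Node_B")
    return agent.evaluate("SQL Group", "Node_B")


same_group = node_change_detected(agent_in_sql_group=True)
separate_group = node_change_detected(agent_in_sql_group=False)
print("agent in SQL group:", same_group)
print("agent in separate group:", separate_group)
```

In this model the change is detected only when the agent keeps running through the failover, which is what the separate resource group is meant to achieve.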
You can verify this in practice by keeping the cluster agent service in a separate group and manually failing over the SQL group.
The cluster agent should also be installed on the other (passive) nodes of the cluster, with the agent on a passive node kept stopped.
Having the cluster agent installed only on the active node would defeat the purpose of having a cluster environment, and of course the cluster will not allow us to add such a service as a service resource.
So to conclude:
1. Install the cluster agent on both nodes of the cluster.
2. Create a separate resource group for the cluster agent service.
3. Keep the agent on the active node up and running and the agent on the passive node stopped, so that if the active node crashes, the agent service can fail over and the agent on the other node can take over.
4. There is no need to keep agents running on both nodes, because the agent on the active node monitors the resources on the active node as well as any resources on the passive node.