The standard high availability (HA) architecture for the IBM Smart Analytics System 7700 uses a single IBM Reliable Scalable Cluster Technology (RSCT) domain to manage the resources in the solution. However, the maximum limit of 130 nodes for Tivoli SA MP can be addressed through the use of multiple RSCT domains.
The use of a single RSCT domain or multiple RSCT domains in the system is transparent to the operation of the core warehouse database. No changes are required in the DB2 software to support multiple domains. The HA resources managed within each domain have no cross-domain dependencies. Therefore, the maximum number of nodes for Tivoli SA MP does not limit the overall size of an IBM Smart Analytics System 7700 environment. The only limitation on size is that one database can contain a maximum of 1000 database partitions.
The HA configuration for the IBM Smart Analytics System contains Tivoli SA MP resources for managing the database partitions, the JFS2 file systems on shared external storage, the volume groups on shared external storage, and the service IPs. The HA configuration uses Tivoli SA MP equivalencies to monitor network interfaces, local SSD (solid-state device) file systems, and the instance home file system that is shared using IBM General Parallel File System (GPFS) software. Relationships between the components of the solution are used to define dependencies and the start order of resources.
Resources and equivalencies that are grouped together, that have defined interrelationships, or that are both grouped together and have defined interrelationships must be contained in a common RSCT domain. The smallest subset of nodes that have self-contained resources and equivalencies and relationships is an HA group.
Each HA group contains the following components:
- One internal application (FCM) network equivalency containing all of the interfaces from each host in the HA group
- One corporate network equivalency containing two or more interfaces from a host in the HA group. This equivalency is typically defined only on the host where the coordinator partition runs.
- Eight SSD equivalencies containing one SSD file system from each host in the HA group
- One database instance home equivalency containing the /db2home resource from each host in the HA group
- A maximum of four host resource groups, one for each active host in the HA group. Each host resource group contains the following resources:
- One volume group resource group that contains:
- Eight volume group resources, one per database partition
- One database partition resource group that contains:
- One database partition resource
- Four or five JFS2 file system resources
- One corporate service IP resource, which is typically defined only on the coordinator partition. Optionally this might be defined for each host.
The following relationships are defined between the resources and equivalencies in an HA group:
- Each database partition resource depends on the internal application network equivalency
- Each database partition resource depends on an SSD equivalency
- Each database partition resource depends on the GPFS instance home equivalency
- Each database partition resource depends on the file systems in its database partition resource group
- The JFS2 file system resources in each database partition resource group depend on the volume group resource group in the common DB2 host resource group
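As an illustration, dependencies of this kind are expressed with the RSCT mkequ and mkrel commands. The sketch below is hypothetical: the equivalency, resource, relationship, interface, and host names (fcm_network_equ, db2_part0-rs, en1, host01, and so on) are placeholders, not the names generated by the IBM Smart Analytics System configuration.

```shell
# Hypothetical names, for illustration only.
# Create a network equivalency from the FCM interface on each host:
mkequ fcm_network_equ IBM.NetworkInterface:en1:host01,en1:host02

# Define a DependsOn relationship from a database partition resource
# to that network equivalency:
mkrel -p DependsOn \
      -S IBM.Application:db2_part0-rs \
      -G IBM.Equivalency:fcm_network_equ \
      db2_part0_dependson_fcm_network
```

These commands must be issued on a node in the RSCT peer domain that contains the resources, which is consistent with each HA group's resources and relationships being self-contained within one domain.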
Because each HA group consists of a self-contained set of interdependent components with no external dependencies, each set of five hosts (four active, one standby) in an HA group can be managed in a separate RSCT domain. The smallest possible RSCT domain is therefore a single HA group. A multiple RSCT domain solution is defined using N HA groups per domain, where N is less than or equal to 26, because 26 HA groups of five hosts each reach the 130-node Tivoli SA MP maximum.
The maximum node limit for a GPFS cluster on AIX is 1530 nodes. Therefore, a single GPFS cluster can be used in the solution to support 1000 database partitions. The instance home file system is shared from the administrator node and its standby host to all other hosts in the solution.
Managing a system with multiple RSCT resource domains
With multiple RSCT domains, the Tivoli SA MP commands for starting, stopping, and moving DB2 resources must be issued to each domain. For a single domain configuration, a set of HA management tools is used that issues the relevant Tivoli SA MP commands for resource management.
For a multiple domain configuration, the HA toolkit can be extended to issue Tivoli SA MP commands to each relevant RSCT domain.
- hastartdb2: for each domain, issue chrg -o online resource_group
- hastopdb2: for each domain, issue chrg -o offline resource_group
- hafailover: on a node in the domain containing the resources to be moved, issue rgreq -o move resource_group
- hafailback: on a node in the domain containing the resources to be moved, issue rgreq -o move resource_group
- hareset: in each domain containing resources that are to be reset, issue resetrsrc -s "Name = resource_name"
- hals: collect lssam output from each domain and summarize resource status
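For example, hastartdb2 could be extended along the following lines. This is a minimal sketch, assuming hypothetical domain names, hostnames, and resource groups whose names begin with db2; RUN defaults to echo, so the script performs a dry run that prints the Tivoli SA MP commands instead of executing them.

```shell
#!/bin/sh
# Sketch of a multi-domain hastartdb2 wrapper. Domain and host names
# are hypothetical placeholders; real names come from the system
# configuration. RUN=echo gives a dry run that prints each command.
RUN=${RUN:-echo}
DOMAINS="ha_dom01 ha_dom02"

# A reachable node in each RSCT domain (hypothetical hostnames).
node_for() {
    case "$1" in
        ha_dom01) echo host01 ;;
        ha_dom02) echo host06 ;;
    esac
}

hastartdb2_all() {
    for dom in $DOMAINS; do
        node=$(node_for "$dom")
        # Bring every DB2 resource group in this domain online.
        $RUN ssh "$node" "chrg -o online -s \"Name like 'db2%'\""
    done
}

hastartdb2_all
```

The Tivoli SA MP commands operate only within the domain of the node they run on, which is why the wrapper must contact a node in each domain rather than issuing one cluster-wide command.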
Start and stop procedures
When nodes in the cluster are started or stopped, a multiple RSCT domain configuration requires that the Tivoli SA MP resources be started or stopped, and that the RSCT domain start or stop steps be performed, in each domain.
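The stop sequence, for example, could be scripted per domain along these lines. This is a sketch with hypothetical domain names: chrg takes the resource groups offline and stoprpdomain then stops each RSCT peer domain (the chrg command must be issued on a node in the corresponding domain). RUN defaults to echo for a dry run.

```shell
#!/bin/sh
# Sketch of the stop sequence for a multiple RSCT domain configuration.
# Domain names are hypothetical placeholders. RUN=echo prints the
# commands (dry run) instead of executing them.
RUN=${RUN:-echo}
DOMAINS="ha_dom01 ha_dom02"

stop_domains() {
    for dom in $DOMAINS; do
        # 1. Take the Tivoli SA MP resource groups offline
        #    (issued on a node in domain $dom).
        $RUN chrg -o offline -s "Name like 'db2%'"
        # 2. Stop the RSCT peer domain itself.
        $RUN stoprpdomain "$dom"
    done
}

stop_domains
```

The start sequence is the reverse: startrpdomain for each domain, wait for the domain to come online, then chrg -o online for its resource groups.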