Managing auto-scaling and auto-recovery services

IBM® Cloud Manager with OpenStack supports both auto-scaling and auto-recovery services.

About this task

These services are based on the orchestration capabilities that are provided by the OpenStack Heat service.
Auto-scaling
Auto-scaling gives you the ability to maintain application availability by increasing or decreasing the number of instances in response to usage rates. The purpose of this function is to allow cloud application developers to deploy applications that respond to changes in demand and use resources only when needed. At a high level, here is a description of how the function works.
  • A collection of Nova instances, called an AutoScalingGroup, is created. The definition of the AutoScalingGroup includes the details of the Nova instances that are created within the group.
  • A Ceilometer alarm is created to monitor the average usage on a Ceilometer meter for all instances within the AutoScalingGroup. To enable both scale-up and scale-down function, an individual alarm must be created to monitor each scenario.
  • A scaling policy is created to define the action that is performed when an alarm triggers. Individual policies must be created to support scale-up and scale-down actions.
  • When the average usage for all instances within the AutoScalingGroup violates the alarm threshold, the scalability policy that is associated with the alarm is called.
  • The scalability policy triggers the creation or deletion of instances within the group.
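The steps above can be sketched as a Heat Orchestration Template fragment. This is a minimal illustration, not a complete template: the image and flavor names, the `cpu_util` meter, and the threshold and cooldown values are assumptions that you would replace with values from your own environment.

```yaml
heat_template_version: 2013-05-23
description: Auto-scaling sketch; names and values are illustrative assumptions

resources:
  scaling_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 3
      resource:
        type: OS::Nova::Server
        properties:
          image: my-image        # assumption: an image available in Glance
          flavor: m1.small

  scale_up_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: scaling_group }
      cooldown: 60
      scaling_adjustment: 1      # add one instance when triggered

  scale_down_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: scaling_group }
      cooldown: 60
      scaling_adjustment: -1     # remove one instance when triggered

  cpu_alarm_high:
    type: OS::Ceilometer::Alarm
    properties:
      meter_name: cpu_util
      statistic: avg
      period: 60
      evaluation_periods: 1
      threshold: 80              # scale up above 80% average CPU
      comparison_operator: gt
      alarm_actions:
        - { get_attr: [ scale_up_policy, alarm_url ] }

  cpu_alarm_low:
    type: OS::Ceilometer::Alarm
    properties:
      meter_name: cpu_util
      statistic: avg
      period: 60
      evaluation_periods: 1
      threshold: 10              # scale down below 10% average CPU
      comparison_operator: lt
      alarm_actions:
        - { get_attr: [ scale_down_policy, alarm_url ] }
```

Note that two alarms and two policies are paired: the high-usage alarm calls the scale-up policy through its `alarm_url` attribute, and the low-usage alarm calls the scale-down policy the same way.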
Auto-recovery
Auto-recovery is the ability to maintain the necessary compute capacity by automatically replacing damaged or unresponsive instances. Auto-recovery allows the cloud application developer to automatically start a new instance when predefined monitors indicate that something is amiss with a specific instance. These problems might include network connectivity issues, software issues on the physical host, hardware issues on the physical host, or anything else that prevents the instance from responding. At a high level, here is a description of how the function works.
  • Instances capable of auto-recovery must be added to a Neutron load balancer pool.
  • A load balancer health monitor is created to monitor the state of all instances in the pool.
  • An individual Ceilometer alarm must be created for each instance in the load balancer pool. To bridge the load balancer health monitor results with the Ceilometer alarm, the alarm is configured to use an existing Ceilometer meter that examines the health monitor's samples for each instance.
  • For each instance, a Heat HARestarter resource is created to orchestrate the deletion of the sick instance and the creation of the replacement.
  • When an instance is determined to be inactive per the load balancer health monitor’s result, the Ceilometer alarm is triggered and the HARestarter action is called.
  • The HARestarter deletes the unresponsive virtual machine and creates a replacement for it. Other resources, such as the alarm associated with the virtual machine and the PoolMember resource that represents the instance, are also deleted and re-created based on the new virtual machine.
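The steps above can be sketched as a Heat Orchestration Template fragment. This is a minimal illustration under stated assumptions: the image and flavor names, the probe intervals, and the meter name in the alarm are placeholders, and the exact Ceilometer meter that carries per-member health samples depends on how metering is configured in your environment.

```yaml
heat_template_version: 2013-05-23
description: Auto-recovery sketch; names and values are illustrative assumptions

resources:
  monitor:
    type: OS::Neutron::HealthMonitor
    properties:
      type: TCP            # probe each pool member over TCP
      delay: 5             # seconds between probes
      max_retries: 3
      timeout: 5

  pool:
    type: OS::Neutron::Pool
    properties:
      protocol: HTTP
      lb_method: ROUND_ROBIN
      subnet_id: { get_param: subnet_id }
      monitors: [ { get_resource: monitor } ]
      vip: { protocol_port: 80 }

  server:
    type: OS::Nova::Server
    properties:
      image: my-image      # assumption: an image available in Glance
      flavor: m1.small

  member:
    type: OS::Neutron::PoolMember
    properties:
      pool_id: { get_resource: pool }
      address: { get_attr: [ server, first_address ] }
      protocol_port: 80

  restarter:
    type: OS::Heat::HARestarter
    properties:
      InstanceId: { get_resource: server }

  member_down_alarm:
    type: OS::Ceilometer::Alarm
    properties:
      # Assumption: a meter that exposes the health monitor's per-member
      # status samples; the exact meter name varies by configuration.
      meter_name: network.services.lb.member
      statistic: avg
      period: 60
      evaluation_periods: 1
      threshold: 0
      comparison_operator: le
      alarm_actions:
        - { get_attr: [ restarter, AlarmUrl ] }
```

In this sketch, one server, one PoolMember, one alarm, and one HARestarter are defined together; in a template with several protected instances, that group of resources is repeated for each instance because each alarm and HARestarter acts on a single server.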

To review an example Heat template file that uses auto-recovery, see Sample templates: Heat (auto-scaling and auto-recovery).

For more general information about managing these services (provided by Heat), see the following resources.