IBM WebSphere Cast Iron appliances in a High Availability (HA) configuration may, in rare cases, experience internal database corruption during a failover event. The affected databases store projects, configurations, running jobs and job history.
The corruption may occur when an appliance is actively writing data to these databases at the time of HA failover. Typical behavior is that an HA appliance is unable to start or operate normally after an HA failover.
The workaround is to empty the affected database. The following commands reset the internal databases, causing them to be re-created in an empty state.
system clean orchmon
system clean running
system clean deploy
Alternatively, the command system clean all empties all internal databases and returns the appliance to factory default settings. Note that "system clean ..." commands permanently remove the corresponding contents from the appliance. Ensure that proper configuration backups are available for restoring before issuing "system clean ...".
The workaround stated above can be used to restore an affected HA appliance, but on its own it does not prevent the situation from re-occurring. Cast Iron is providing a firmware fix that alters the way the internal databases are created, which prevents the problem from re-occurring. The new settings only take effect when the internal databases are created. Therefore, customers who have experienced this behavior in the past or who wish to eliminate the possibility of it occurring should follow this procedure:
1. Back up the appliance:
- In the Command Line Interface (CLI), issue the command config save system ftp [ftphostname] user [username] passwd [userpassword] file [filename].cfg and replace the values in square brackets ([]) with custom values.
- Or in the Web Management Console (WMC), on the left click "Repository" and then "Import/Export" and choose to export "Project and user settings" to a file.
2. Upgrade to a version of the Cast Iron firmware which contains the fix. Announcements of new fix packs and releases are made public on the WebSphere Cast Iron Support Portal.
3. Clear the databases with one of the following methods:
- Issue the command system clean all. Wait a few minutes for the operation to complete. The appliance will reboot when the reset is complete. This command clears all settings, including network settings, which will then need to be reconfigured.
- Issue the following commands in order, waiting after each one until you can log into the WMC again: system clean orchmon, then system clean running, then system clean deploy. Do not move on to the next step until you can log into the WMC again after the last command.
4. Re-enable HA using the command system haconfig enable ... on both appliances.
5. Import the backed-up settings using the file from step 1:
- In the CLI, issue the command config load system ftp [ftphostname] user [username] passwd [userpassword] file [filename].cfg and replace the values in square brackets ([]) with custom values.
- Or in the WMC, navigate to "Repository" > "Import/Export" and import the file previously exported in step 1.
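The ordering in step 3 (issue one clean command, wait for the WMC to come back, then issue the next) and the command lines in steps 1 and 5 can be sketched in Python as below. This is only an illustration, not an IBM-provided tool. The names config_command, clean_databases, send and wmc_ready are hypothetical: send stands for whatever mechanism you use to submit a line to the appliance CLI, and wmc_ready for whatever check you use to confirm that you can log into the WMC again. Neither is supplied here, and the polling interval and retry count are arbitrary assumptions.

```python
import time
from typing import Callable, List

# Clean commands from step 3, in the order the procedure gives them.
CLEAN_COMMANDS = (
    "system clean orchmon",
    "system clean running",
    "system clean deploy",
)


def config_command(action: str, ftp_host: str, user: str,
                   password: str, filename: str) -> str:
    """Build the 'config save' (step 1) or 'config load' (step 5) CLI line."""
    if action not in ("save", "load"):
        raise ValueError("action must be 'save' or 'load'")
    return (f"config {action} system ftp {ftp_host} "
            f"user {user} passwd {password} file {filename}.cfg")


def clean_databases(send: Callable[[str], None],
                    wmc_ready: Callable[[], bool],
                    poll_seconds: float = 30.0,
                    max_polls: int = 60) -> List[str]:
    """Issue each clean command, then wait for the WMC before continuing.

    'send' and 'wmc_ready' are hypothetical callables supplied by the caller;
    this sketch does not know how to reach a real appliance.
    """
    issued = []
    for command in CLEAN_COMMANDS:
        send(command)
        issued.append(command)
        # Poll until the caller's check reports that the WMC is usable again.
        for _ in range(max_polls):
            if wmc_ready():
                break
            time.sleep(poll_seconds)
        else:
            raise TimeoutError(f"WMC not reachable after '{command}'")
    return issued
```

For example, config_command("save", "ftp.example.com", "admin", "secret", "backup") produces the step 1 command line with the bracketed placeholders filled in (the host, user, password and file name here are made-up values). Note that wmc_ready must be written so that it does not report success before the appliance has actually begun resetting. Otherwise the next command could be issued too early.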