What are the requirements for networked storage for multi-instance queue managers? Which environments have IBM used to test multi-instance queue managers?
This document is not a support statement. The support statement can be found in "IBM MQ's support position on Virtualization, low-level hardware, file systems on networks and high availability". This document describes the testing that IBM has conducted on network file systems for use with the IBM MQ multi-instance queue manager feature.
To validate an environment that IBM has not tested, follow the guidance in "Testing a shared file system for compatibility with WebSphere MQ Multi-instance Queue Managers".
To use the multi-instance queue manager feature of IBM MQ, you will need a shared file system on networked storage, such as a NAS, or a cluster file system, such as IBM's General Parallel File System (GPFS). You can use a SAN as the storage infrastructure for the shared file system.
It can be advantageous to use a cluster file system, such as GPFS, in preference to a standard network file system, such as NFS. Cluster file systems differ in that both the server and client parts of the solution are usually provided by the same vendor, often making problem diagnosis and resolution quicker.
There are three fundamental requirements that a shared file system must meet to work reliably with IBM MQ:
- Data write integrity. Data write integrity is sometimes called "Write through to disk on flush". The queue manager must be able to synchronize with data being successfully committed to the physical device. In a transactional system, you need to be sure that some writes have been safely committed before continuing with other processing and that the ordering of writes across multiple files is honored.
- Guaranteed exclusive access to files. In order to synchronize multiple queue managers, there needs to be a mechanism for a queue manager to obtain an exclusive lock on a file.
- Release locks on failure. If a queue manager fails, or if there is a communication failure with the file system, files locked by the queue manager need to be unlocked and made available to other processes without waiting for the queue manager to be reconnected to the file system. Modern file systems, such as NFS v4, use leased locks to detect failures and then release locks following a failure. Older file systems, such as NFS v3, that do not have a reliable mechanism to release locks after a failure must not be used with multi-instance queue managers.
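The first two requirements correspond to ordinary operating system primitives, which the following minimal Python sketch illustrates on a POSIX system. It is an illustration only, not how IBM MQ itself is implemented: the function names and the temporary file are hypothetical, and a successful `os.fsync` call shows only that the operating system reported the flush, not that a network file server has actually committed the data to disk.

```python
import fcntl
import os
import subprocess
import sys
import tempfile


def durable_write(path, data):
    """Write data, then ask the OS to flush it through to the storage device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # returns once the OS reports the data as flushed


def try_exclusive_lock(path):
    """Return an open file holding an exclusive lock, or None if another process holds it."""
    f = open(path, "r+b")
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f


# Source for a child process: exit code 0 if it obtained the lock, 1 otherwise.
_PROBE = (
    "import fcntl, sys\n"
    "f = open(sys.argv[1], 'r+b')\n"
    "try:\n"
    "    fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)\n"
    "except OSError:\n"
    "    sys.exit(1)\n"
    "sys.exit(0)\n"
)


def other_process_can_lock(path):
    """Check from a separate process whether the exclusive lock is available."""
    return subprocess.run([sys.executable, "-c", _PROBE, path]).returncode == 0


def demo():
    """Return (lock available while held, lock available after release)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "lockfile")
        durable_write(path, b"state")
        holder = try_exclusive_lock(path)
        while_held = other_process_can_lock(path)
        holder.close()  # closing the file releases the lock
        after_release = other_process_can_lock(path)
    return while_held, after_release
```

The probe runs in a separate process because POSIX record locks are owned by a process, so a second attempt from the same process would not conflict. The third requirement, release of locks after a failure, cannot be shown on a single machine because it depends on how the file server detects a lost client.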
If a shared file system does not meet these requirements, the queue manager data and logs might get corrupted when using the shared file system in a multi-instance queue manager configuration. This might result in a failure to start IBM MQ, and possible data loss.
On operating systems other than Microsoft Windows, IBM MQ provides a tool called amqmfsck to assist with checking the suitability of networked storage for use with multi-instance queue managers. The tool can verify the basic configuration of the networked storage, such as access permissions. It can also help to check the second and third requirements above (exclusive access to files and release of locks on failure). It cannot check that data write integrity is maintained because it cannot observe whether data is being safely committed to disk as opposed to being held in a cache.
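As a rough sketch of how the amqmfsck checks might be scripted, the Python helper below builds three command lines: a basic directory check, a concurrent write check (`-c`) and a lock wait and release check (`-w`), with an optional verbose flag (`-v`). The directory `/shared/qmdata` and the function names are hypothetical, and the flags should be confirmed against the amqmfsck documentation for the installed IBM MQ version. The `-c` and `-w` checks are only meaningful when run at the same time on both machines that will host queue manager instances.

```python
import shutil
import subprocess


def amqmfsck_commands(shared_dir, verbose=False):
    """Build the amqmfsck command lines for one shared directory."""
    base = ["amqmfsck"] + (["-v"] if verbose else [])
    return [
        base + [shared_dir],        # basic check of the directory
        base + ["-c", shared_dir],  # concurrent write check (run on both machines)
        base + ["-w", shared_dir],  # lock wait and release check (run on both machines)
    ]


def run_basic_check(shared_dir):
    """Run the basic check; return its exit code, or None if amqmfsck is not on PATH."""
    if shutil.which("amqmfsck") is None:
        return None
    return subprocess.run(amqmfsck_commands(shared_dir)[0]).returncode
```

For example, `amqmfsck_commands("/shared/qmdata")` yields the three argument lists in the order shown, ready to pass to `subprocess.run` on each machine.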
Multi-instance queue managers do not work with mandatory file locking. The NFS support provided by some NAS devices enforces mandatory file locking. Although this is permitted by the NFS v4 specification, multi-instance queue managers were designed to use the less restrictive advisory file locking scheme and are not compatible with mandatory file locking. IBM has encountered mandatory file locking only with NAS devices from the EMC Celerra family. Please note that the version of amqmfsck supplied with IBM MQ v7.0.1 does not test for mandatory file locking, although later versions do.
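The difference between the two locking schemes can be observed with a small probe. Under advisory locking, a process that never asks for the lock can still write to a file that another process has locked. Under mandatory locking, that write would be refused or would block. The Python sketch below is a hypothetical illustration for POSIX systems, not a substitute for amqmfsck: the function names and the 30 second timeout are assumptions.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Source for a child process that appends to the file without taking any lock.
_WRITER = (
    "import sys\n"
    "with open(sys.argv[1], 'ab') as f:\n"
    "    f.write(b'child')\n"
)


def unlocked_write_succeeds(path, timeout=30):
    """Hold an exclusive lock, then let a child process write without locking.

    Returns True when the write goes through (advisory behaviour) and False
    when it fails or does not finish within the timeout.
    """
    with open(path, "r+b") as holder:
        fcntl.lockf(holder, fcntl.LOCK_EX)
        try:
            result = subprocess.run(
                [sys.executable, "-c", _WRITER, path], timeout=timeout
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0


def probe_directory(directory):
    """Run the probe on a temporary file created inside the given directory."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        return unlocked_write_succeeds(path)
    finally:
        os.remove(path)
```

Calling `probe_directory` on a directory in the shared file system gives a quick indication of which scheme is in force there. A `False` result suggests that locks are being enforced on writers that did not request them.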
The following file systems are known not to work as they do not meet IBM MQ's technical requirements:
- Network File System (NFS) version 3 - does not provide lease-based file locking.
- Red Hat Global File System (GFS, or GFS1) - does not provide the correct locking semantics.
- Oracle Cluster File System version 2 (OCFS2) - does not provide the correct locking semantics in version 1.4.
- Oracle ASM Cluster File System (ACFS) - does not provide the correct locking semantics.
The following file systems have been tested by IBM MQ and have been found to meet IBM MQ's technical requirements:
- IBM AIX 5.3 TL10 NFS v4 server (see notes 1 to 5)
- IBM General Parallel File System 3.2.1
- IBM General Parallel File System 3.4.0
- IBM i5/OS NetServer V6R1
- IBM System Storage N series Data ONTAP 7.3.2 NFS v4 server (see notes 1 to 5)
- Microsoft Windows 8 (see note 6)
- Microsoft Windows Server 2008 (see note 6)
- Microsoft Windows Server 2012 (see note 6)
- Red Hat Enterprise Linux 5.3 NFS v4 server (see notes 1 to 5)
- Red Hat Enterprise Linux 6.5 NFS v4 server (see notes 1 to 5)
- Red Hat Global File System 2 (GFS2)
- SUSE Linux Enterprise Server 10 NFS v4 server (see notes 1 to 5)
- Veritas Storage Foundation V5.0 MP3 RP3 Cluster File System
- Veritas Storage Foundation V5.1 SP1 Cluster File System
Notes:
- Note 1: Multi-instance queue managers on IBM AIX 5.3 TL6 to TL9 using NFS v4 require AIX APAR IZ29559 (or equivalent for the specific technology level).
- Note 2: NFS v4 has been found not to work with IBM i.
- Note 3: NFS v4 was tested using the following mount options 'rw,bg,hard,intr,vers=4,sec=sys' and the following export options 'rw,sync,no_wdelay,fsid=0'.
- Note 4: SUSE Linux Enterprise Server V10 Update 3 introduced a suspected problem in the NFS v4 server which prevents correct operation of multi-instance queue managers. The problem was rectified in kernel level 22.214.171.124-0.60.1.
- Note 5: Multi-instance queue managers on Solaris using NFS v4 require Solaris 10 with patch 147440-13 (SPARC) or patch 147441-13 (x86-64). This patch supersedes IDR 145513 revision 3 and patch 147268-01 (SPARC), and IDR 145514 revision 3 and patch 147269-01 (x86-64) which are no longer supported.
- Note 6: If using a Windows cluster file system, a failover of the cluster file system will trigger an IBM MQ multi-instance queue manager failover as Windows will return a file system error to IBM MQ. To avoid this use SMB 3.0 or later with the 'Continuous Availability' option. SMB 3.0 became available in Microsoft Windows Server 2012.
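To compare a planned NFS v4 client configuration with the mount options quoted above ('rw,bg,hard,intr,vers=4,sec=sys'), a small helper along the following lines may be useful. It is a sketch with hypothetical function names that compares option strings exactly as they are written, for example in /etc/fstab. It does not interpret how a running kernel reports its mounts, and a difference from the tested options does not by itself mean a configuration is unsuitable.

```python
# Mount options that IBM used when testing NFS v4, as quoted in this document.
TESTED_MOUNT_OPTIONS = "rw,bg,hard,intr,vers=4,sec=sys"


def parse_options(option_string):
    """Turn 'rw,vers=4' into {'rw': True, 'vers': '4'}."""
    options = {}
    for item in option_string.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options


def missing_tested_options(actual, tested=TESTED_MOUNT_OPTIONS):
    """List the tested options that are absent from, or differ in, the actual string."""
    actual_opts = parse_options(actual)
    missing = []
    for key, value in parse_options(tested).items():
        if actual_opts.get(key) != value:
            missing.append(key if value is True else f"{key}={value}")
    return sorted(missing)
```

For instance, `missing_tested_options("rw,soft,vers=3")` reports every tested option except `rw`, which makes it easy to see that the mount is neither `hard` nor NFS v4.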