Requirements for shared file systems

Shared file systems must provide data write integrity, guarantee exclusive access to files, and release locks on failure to work reliably with IBM® WebSphere® MQ.

Requirements that a shared file system must meet

There are three fundamental requirements that a shared file system must meet to log messages reliably:
  1. Data write integrity

    Data write integrity is sometimes called Write through to disk on flush. The queue manager must be able to synchronize with data being successfully committed to the physical device. In a transactional system, you need to be sure that some writes have been safely committed before continuing with other processing.

    More specifically, IBM WebSphere MQ on UNIX platforms uses the O_SYNC open option and the fsync() system call to explicitly force writes to recoverable media, and is dependent upon these options operating correctly.

    Attention: On Linux, you should mount the file system with the async option, which still supports synchronous writes and gives better performance than the sync option.

    Note, however, that if the file system has been exported from Linux®, you must still export the file system using the sync option.

  2. Guaranteed exclusive access to files

    In order to synchronize multiple queue managers, there needs to be a mechanism for a queue manager to obtain an exclusive lock on a file.

  3. Release locks on failure

    If a queue manager fails, or if there is a communication failure with the file system, files locked by the queue manager need to be unlocked and made available to other processes without waiting for the queue manager to be reconnected to the file system.

A shared file system must meet these requirements for IBM WebSphere MQ to operate reliably. If it does not, the queue manager data and logs get corrupted when using the shared file system in a multi-instance queue manager configuration.
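Requirement 1 can be illustrated with a minimal Python sketch of the pattern described above: open the file with O_SYNC and call fsync() before treating the data as committed. The function name write_durably is hypothetical and chosen for illustration only; this is not IBM WebSphere MQ's own code. The sketch shows what an application asks of the file system. Whether the data actually reaches the physical device still depends on the file system and its mount options honoring these calls.

```python
import os


def write_durably(path: str, payload: bytes) -> int:
    """Append payload to path; return the byte count once the flush is reported."""
    # O_SYNC asks the OS to complete each write() only after the data has been
    # passed to the storage device. getattr() guards platforms that lack it.
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        written = 0
        while written < len(payload):
            # os.write() may write fewer bytes than requested, so loop.
            written += os.write(fd, payload[written:])
        # fsync() blocks until the OS reports the file's data as flushed.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)
```

If the file system acknowledges these calls before the data is on recoverable media, the caller has no way to detect it, which is why the requirement is placed on the file system itself.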

For multi-instance queue managers on Microsoft Windows, the networked storage must be accessed by the Common Internet File System (CIFS) protocol used by Microsoft Windows networks. The CIFS client does not meet IBM WebSphere MQ's requirements for locking semantics on platforms other than Microsoft Windows, so multi-instance queue managers running on those platforms must not use CIFS as their shared file system.

For multi-instance queue managers on other supported platforms, the storage must be accessed by a network file system protocol that is POSIX-compliant and supports lease-based locking. Modern file systems, such as Network File System (NFS) Version 4, use leased locks to detect failures and then release locks following a failure. Older file systems, such as Network File System Version 3, which do not have a reliable mechanism to release locks after a failure, must not be used with multi-instance queue managers.
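Requirements 2 and 3 can be pictured with POSIX advisory record locks. The following is a minimal Python sketch, assuming a UNIX or Linux system; the function names try_exclusive_lock and release_lock are hypothetical, and the sketch does not represent how IBM WebSphere MQ itself takes its locks.

```python
import errno
import fcntl
import os
from typing import Optional


def try_exclusive_lock(path: str) -> Optional[int]:
    """Return an fd holding an exclusive lock on path, or None if it is held elsewhere."""
    # An exclusive lockf() lock requires the file to be open for writing.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None  # another process holds the lock
        raise
    return fd


def release_lock(fd: int) -> None:
    """Release the lock explicitly and close the descriptor."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
```

These locks belong to the process, so a second call from the same process does not conflict; contention appears only between separate processes. On a local file system the kernel drops the lock when the holding process exits. When the holder is a network client that fails or loses its connection, release depends on the file system's own mechanism, such as the leases described above.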

Checks on whether the shared file system meets the requirements

You must check whether the shared file system you plan to use meets these requirements. You must also check whether the file system is correctly configured for reliability. Shared file systems sometimes provide configuration options to improve performance at the expense of reliability.

Under normal circumstances, IBM WebSphere MQ operates correctly with attribute caching and it is not necessary to disable caching, for example by setting NOAC on an NFS mount. Attribute caching can cause issues when multiple file system clients are contending for write access to the same file on the file system server, as the cached attributes used by each client might not be the same as those attributes on the server. Examples of files accessed in this way are the queue manager error logs for a multi-instance queue manager. The queue manager error logs might be written to by both an active and a standby queue manager instance, and cached file attributes might cause the error logs to grow larger than expected before rollover of the files occurs.

To help check the file system, run the task Verifying shared file system behavior. This task checks whether your shared file system meets requirements 2 and 3. You need to verify requirement 1 from your shared file system documentation, or by experimenting with logging data to the disk.
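One rough way to experiment with logging data to the disk is to time synchronous writes on the shared file system. The following Python sketch is a heuristic only, and the function name mean_sync_write_seconds is hypothetical. Timing cannot prove data write integrity: a figure far lower than the known latency of the storage is merely a hint that writes might be acknowledged from a cache, and the file system documentation remains the authoritative source.

```python
import os
import time


def mean_sync_write_seconds(path: str, count: int = 50, size: int = 4096) -> float:
    """Return the mean time, in seconds, of one write followed by fsync()."""
    block = b"\0" * size
    # The scratch file is created (or truncated) here and removed afterwards.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            os.fsync(fd)  # wait until the OS reports the data as flushed
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed / count
```

Run the function against a scratch file on the shared file system, and compare the result with the same measurement on storage whose behavior you already trust.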

Disk faults can cause errors when writing to disk, which IBM WebSphere MQ reports as First Failure Data Capture errors. You can run the file system checker for your operating system to check the shared file system for any disk faults. For example, on UNIX and Linux platforms the file system checker is called fsck. On Windows platforms the file system checker is called CHKDSK or SCANDISK.

NFS server security

Note: You should put only queue manager data on a Network File System (NFS) server. When you mount the NFS file system, use the following three options with the mount command to make the system secure:
noexec
By using this option, you stop binary files from being run from the NFS file system, which prevents a remote user from running unwanted code on the system.
nosuid
By using this option, you prevent the use of the set-user-identifier and set-group-identifier bits, which prevents a remote user from gaining higher privileges.
nodev
By using this option, you stop character and block special devices from being used or defined, which prevents a remote user from getting out of a chroot jail.
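To confirm that a mounted file system carries all three options, you can inspect the mount table. The following Python sketch assumes the Linux /proc/mounts format, in which the fourth field of each entry is the comma-separated option list. The function name, the server name nfsserver, and the paths /export/mqdata and /MQHA are hypothetical values used for illustration.

```python
REQUIRED_OPTIONS = {"noexec", "nosuid", "nodev"}


def missing_security_options(mounts_text: str, mount_point: str) -> set:
    """Return the required options that are absent from mount_point's entry."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep the last matching entry, which is the most recent mount.
            found = set(fields[3].split(","))
    if found is None:
        raise LookupError(f"{mount_point} is not mounted")
    return REQUIRED_OPTIONS - found


# Hypothetical entry for illustration; on Linux, the real table can be read
# with open("/proc/mounts").read().
sample = "nfsserver:/export/mqdata /MQHA nfs4 rw,nosuid,nodev,hard 0 0\n"
print(missing_security_options(sample, "/MQHA"))  # prints {'noexec'}
```

An empty set means that all three options are present on the mount.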