
IBM Tivoli Storage Manager for Space Management (HSM) V6.3 known problems

Question & Answer


Question

This document describes warnings and known problems for Tivoli Storage Manager for Space Management (HSM) V6.3.

Answer

Tivoli Storage Manager for Space Management known problems and limitations



Contents

  • HSM warnings
  • HSM GPFS warnings
  • HSM known problems and limitations
  • HSM GPFS known problems and limitations
  • HSM AIX GPFS known problems and limitations
  • HSM AIX JFS2 known problems and limitations
  • HSM Linux warnings
  • HSM Linux problems and limitations

HSM warnings
  • Do not run dsmmigundelete when a target replication TSM server is used for this activity. See APAR IC94316 for additional information.
  • Do not copy premigrated or stub files to a new or different file system by using block level commands (for example dd, cpio, or mksysb). Instead, use the commands provided by TSM. Block level commands can change file attributes, which in turn can cause reconciliation to fail to expire obsolete migrated files from the TSM server storage.
  • If a volume group containing HSM file systems is imported to an AIX system, make sure that the major number of the device does NOT change. Otherwise the handles of the files in the HSM file systems will change, leading to the inability to expire obsolete copies.
  • Do not use the NFSTIMEOUT option with HSM. Using the NFSTIMEOUT option on a system with HSM can lead to unpredictable behavior of the HSM applications, which includes applications stopping unexpectedly. The NFSTIMEOUT option is used for backing up NFS file systems, which cannot be managed by HSM. If this option is required for the backup-archive client, then it is recommended that you use two different server stanzas for the backup-archive client and HSM.
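    If you follow this recommendation, a minimal sketch of the separation in dsm.sys could look as follows (the stanza names server_ba and server_hsm and the server address are placeholders; the NFSTIMEOUT option appears only in the backup-archive stanza):

       SErvername  server_ba
         COMMMethod         TCPip
         TCPServeraddress   tsm.example.com
         NFSTIMEOUT         10

       SErvername  server_hsm
         COMMMethod         TCPip
         TCPServeraddress   tsm.example.com

    The backup-archive client then selects server_ba in its options, while the HSM client uses server_hsm, so the NFSTIMEOUT setting never affects the HSM daemons.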
  • The HSM root daemon (dsmrootd) uses the registered rpc program number 300767 (decimal). Do not use this rpc program number for other applications.
  • The HSM recall daemon (dsmrecalld) uses the registered rpc program number 300781 (decimal). Do not use this rpc program number for other applications.
  • The HSM boot time script sets the file size resource limit to unlimited before starting HSM daemons.

Back to Contents

HSM GPFS warnings
  • When accessing the HSM managed file system via NFS, set the dmapiEventTimeout option of GPFS to a finite value of a few seconds. The option is changed using mmchconfig, for example:

       /usr/lpp/mmfs/bin/mmchconfig dmapiEventTimeout=1000

    Refer to the GPFS documentation for more information.

  • HSM adds a call to /etc/rc.gpfshsm to the /var/mmfs/etc/gpfsready script. This script might be overwritten with each GPFS update, so be sure to add the call to /etc/rc.gpfshsm again to /var/mmfs/etc/gpfsready after updating GPFS. For more information refer to the installation log and "GPFS: Concepts, Planning, and Installation Guide".
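    A minimal check after a GPFS update could look like the following sketch (the exact line that HSM adds is recorded in the installation log; the commands below are illustrative):

       grep rc.gpfshsm /var/mmfs/etc/gpfsready            # verify the HSM call is still present
       echo "/etc/rc.gpfshsm" >> /var/mmfs/etc/gpfsready  # re-add it if missing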
  • If failover is enabled on the local node, failover is triggered in the following cases:
    • GPFS shutdown or failure
    • Reboot

    The success of failover depends on whether there is an eligible node for taking over the file systems from the failing node. This means that another node in the same cluster must have:
    • Failover enabled
    • Time synchronized with all other HSM nodes
    • GPFS running
    • Mounted the file systems that the failing node managed before. In addition, the same HSM package must be installed on the failing node and on the node that takes over the file systems.
  • Before you uninstall HSM on a node where it still manages a file system, move the ownership of the file system to another HSM node in the same cluster, for example, by using the dsmmigfs takeover command.
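    For example, issued on the node that should take over management of a hypothetical file system /gpfs/fs1 (syntax as used later in this document):

       dsmmigfs takeover /gpfs/fs1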
  • HSM only manages file systems belonging to the local (home) GPFS cluster, but not remotely mounted file systems.
  • If you are using CSM/CFM to maintain /etc/inittab, be sure to also add the dsmwatchd entry to the copy of the file in the file server repository. Refer to the CSM manual for more information.
  • To prevent problems with handling of the ENOSPC (out of space) condition, configure the GPFS file systems as follows:
    • Configure at least one disk for metadataOnly
    • Configure all remaining disks for dataAndMetadata
  • AIX GPFS HSM and Linux GPFS HSM can distribute recall requests over several cluster nodes. To enable this feature, the following requirements must be met:
    • The recall daemons must run on every cluster node that takes part in distributed recall.
    • All nodes need to be able to access the same filespace on the TSM server. This can be achieved in two ways:

      1. All nodes use the same node name.
      2. Each node uses its own node name and uses the asnodename option to get access to the filespace.

      Recommendation: Use option 2 to avoid having to share the same password for a common node name on all nodes (see the dsm.sys sketch after this list).

    • The option files should be identical on all nodes, with the only exception being the node name option as explained above.
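    A minimal sketch of option 2, assuming hypothetical node names NODE_A and NODE_B that were granted proxy authority for a common target node HSM_CLUSTER on the TSM server (for example with GRANT PROXYNODE TARGET=HSM_CLUSTER AGENT=NODE_A,NODE_B):

       * dsm.sys stanza on the first node
       SErvername  server_hsm
         NODENAME    NODE_A
         ASNODENAME  HSM_CLUSTER

       * dsm.sys stanza on the second node
       SErvername  server_hsm
         NODENAME    NODE_B
         ASNODENAME  HSM_CLUSTER

    Each node authenticates with its own node name and password, while all space-managed data is stored under the shared HSM_CLUSTER filespace.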
  • HSM can no longer support stub size as a multiple of the file system fragment size. Specify a value of zero or a multiple of the file system block size. The default is zero.

Back to Contents

HSM known problems and limitations
  • When HSM logging is activated, the HSM log file is located on a GPFS file system, and the file is removed or its content is deleted, subsequent log entries can be corrupted. This is a GPFS issue. Choose another file system type for the log file (for example ext3, ext4, or jfs2).
  • Problems with viewing managed file systems: managed file systems might not be viewable using the dsmdf command after moving from Tivoli Storage Manager Version 6.3 to 5.5 or a lower version.

    If Tivoli Storage Manager Version 6.3 is installed and file systems are set up to be managed at this level, these file systems need to be removed from space management before installing Tivoli Storage Manager at version 5.5.x. If space management is not removed before uninstalling TSM V6.3, these file systems are not viewable via the dsmdf command at the lower versions of TSM.

    If file systems are created and space managed at TSM version 5.5.x, these file systems continue to be viewable via the dsmdf command even if you move to TSM Version 6.3 and then return to a prior version. If changes are made to these managed file systems with TSM Version 6.3 (for example, new values are used for high and low threshold or for another TSM server), then these changes are not viewable with TSM version 5.5.x when you return to this prior level.

    Any file systems that are created and added to space management while at the TSM version 6.3 level will have to be removed from space management before moving to the prior TSM version.

  • If, for any reason, it is required to kill a running HSM process (for example dsmrecalld, dsmmonitord, or dsmscoutd), use the command:

       kill -15 <pid>

    NEVER kill an HSM process with signal 9 (SIGKILL). This can result in corrupting files that are being migrated or recalled at that moment. This also applies to the dsmrecalld, dsmmonitord, and dsmscoutd daemons.

    Killing a running dsmmigrate or dsmrecall process with -9 (SIGKILL) leaves the affected files in an invalid state and causes problems with the migration or recall of those files. Restarting dsmrecalld fixes that problem: the dsmrecalld cleans up the state of those files. A file left in an inconsistent state by an interrupted migration is set back to resident, and a file left in an inconsistent state by an interrupted recall is set back to migrated.

  • When the scout daemon is scanning the file system, information about the files found is stored in the CFI database. The more files have been scanned, the more information is stored, and therefore the more time is needed to process the stored data when information about newly found files is added. As a result, the average scan rate might decrease during the scan.
  • In its current implementation, the reconciliation process, including the orphan check, is a long running task. It requires a considerable amount of main memory for a file system containing several million files in combination with several million objects migrated to the TSM server. For HSM levels lower than 6.1, removing a file system, a full file system recovery by the backup-archive client, or initiating dsmmigundelete will start an orphan check on the file system. For HSM 6.1 and higher, removing space management from a file system will start an orphan check.
  • All files with ACL data attached can be migrated, but the ACLs will not be written back during recall. This might result in the loss of ACL data if a stub file has been deleted and later recreated using the dsmmigundelete command. If a backup copy of the stub file exists on the TSM server, use the dsmc restore command. This will copy both the file data and the ACL data back to the local file system.
  • If the ACL data of a premigrated file is modified, the changes are not sent to the TSM server when the file is migrated after this change. To avoid losing the modified ACL data, use the MIGREQUIRESBACKUP YES option. With this setting, files whose ACL data has been modified cannot be migrated until a current backup version exists on the server.
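    A minimal sketch, assuming the option is placed in the client options used by HSM (server stanza in dsm.sys):

       MIGREQUIRESBACKUP  YES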
  • The "Space Management Agent" (hsmagent) does not start if DSM_DIR is set and no link pointing to the hsmagent.opt configuration file was set.
  • Automigration does not migrate hidden files or directories. Hidden files or directories are files or directories that begin with a ".".
  • Automigration does not migrate files if the file names exceed an internal TSM limit. The maximum combined length of the file name and path name is 1024 characters.
  • Files created during the 20 seconds preceding an out-of-space condition are not available for demand automigration.
  • The CFI (Complete File Index) file of the dsmscoutd cannot be created if the ulimit setting prevents the creation of large files.
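    A quick way to check the limit in the shell that starts the HSM daemons, as a sketch (the HSM boot time script already sets this resource limit to unlimited, as noted in the warnings above):

       ulimit -f            # show the current file size limit
       ulimit -f unlimited  # remove the limit for the current shell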
  • In rare cases, the output of the dsmscoutd scanplan shows negative values. To solve this problem, restart the scout daemon with the command 'dsmscoutd restart'.
  • When running the "dsmmigundelete" command for a file system that does not exist or is not managed by HSM, no error message is issued.
  • HSM Java GUI known problems and limitations
    • Some JRE versions have problems transferring the focus to the GUI components depending on the version of the operating system you are using. For example, accessing shortcuts on a menu (for example, by pressing ALT-F to open the file menu) and combo-boxes in a modal dialog by clicking the mouse button might not get the focus correctly. To solve this problem, you need to transfer the focus manually to that component by pressing the TAB key (or CTRL-TAB) several times.
    • If you have set the DSM_DIR environment variable, you need to create a link in DSM_DIR pointing to the hsmagent.opt XML configuration file, otherwise the "Space Management Agent" will not start correctly. For example:
         ln -s /opt/tivoli/tsm/client/hsm/bin/hsmagent.opt  \
                $DSM_DIR/hsmagent.opt
     
    • Changes to the hsmagent.opt file will be effective only after the hsmagent is restarted.
    • "IBM Tivoli Enterprise Space Management Console" needs to be disconnected from all the HSM nodes if you change dsm.opt or dsm.sys configuration files, otherwise the previous configuration will be used.
    • Adding Space Management to a file system might result in showing a yellow warning icon in the "state" column of the File System table regardless of whether the Master scout daemon is running. The "Space Management Agent" requires some time to update this information.
    • The "Stub File Size" option is disabled with this version. Please use the command line as workaround to change the size of the stub file that replaces a migrated file in a HSM managed file system.
    • The "Import Setting" menu item is disabled with this version. Customized settings like the list of HSM nodes, and other table view customization cannot be imported from other machines.
    • Secure Sockets Layer (SSL) is not yet supported on this version.
    • Status and warning messages for Monitor, Recall, Watch and Root daemons are not yet implemented.
    • The functionality to change the TSM password is not yet implemented.
    • TSM HSM Node properties cannot yet be modified from the GUI.
    • It is not yet possible to register a new node on the TSM server.
    • It is not yet possible to start "IBM Tivoli Enterprise Space Management Console" remotely from a browser by using the "Space Management Agent".
    • To display the "IBM Tivoli Enterprise Space Management Console" in the system default language (if it is not English), install the related language pack as mentioned in the "Installing the product" section.
    • The non-root user should set the TSM system environment variable DSM_LOG to a directory with write permission (e.g. export DSM_LOG=/home/) and verify that the non-root user has write permission to the configuration and log files (dsmsm.cfg and dsmsm.log).
    • Russian, Czech, Hungarian and Polish are not localized in the JRE by Sun. For this reason, some system messages such as "Ok" and "Cancel" might be displayed in English regardless of whether the language pack is installed.
    • Some NLS messages (uil_nls.jar) are not provided for Russian, Czech, Hungarian and Polish languages. Messages that are affected are located in the status bar and by right-clicking (filter and sorter) the header columns of the HSM Nodes table and File System table.
    • The Czech, Hungarian and Polish languages require the following font: -dt-*-medium-r-normal-*-15-100-100-100-m-70-iso8859-2
    • Russian language requires the following font: -dt-*-medium-r-normal-*-15-100-100-100-m-70-iso8859-5
    • The help link "TSM for Space Management Technical Support Home Page" in the "Help" menu can result in the "Problem with Shortcut" Windows dialog message. This is a url.dll problem detected on Windows XP SP1 only. All other supported Windows versions are not affected by this problem. As a workaround for Windows XP SP1, open the following link directly from your browser: http://www.ibm.com/software/sysmgmt/products/support/IBMTivoliStorageManagerforSpaceManagement.html
Back to Contents

HSM GPFS known problems and limitations
  • If an application reads a file that is currently being migrated, the application might only read zeros instead of the real data. This happens rarely when the reading of the file occurs exactly at the point when the migration process starts stubbing the file. In this situation, the file is not corrupted nor is the data lost. It is a synchronization problem where a read does not trigger a recall of the migrated file. A second read to the file recalls it and the correct data is presented to the application. The problem will be fixed by GPFS in a future release.
  • When a file system is full, demand migration occurs: valid files in the file system are migrated and free space becomes available. However, due to a GPFS limitation, an error message like 'No space left on device' can still be returned.
  • In a GPFS environment, the migration of a small file that is less than or equal to two GPFS fragments will not free up space. Migration of a file that is less than or equal to a GPFS fragment will lead to an increase in the file system space usage. This is due to GPFS allocating at least two GPFS fragments for storing HSM migration attributes. As a result, if a file system is filled to its maximum capacity with many small files, it is possible that the file system will run out of space during file migration. As a workaround, set the MINMIGFILESIZE option to two GPFS fragments. You can use the GPFS command "mmlsfs" to view the GPFS fragment size. For example: "mmlsfs /dev/gpfs -f".
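    A minimal sketch of the workaround, assuming a hypothetical GPFS device /dev/gpfs:

       mmlsfs /dev/gpfs -f        # reports the minimum fragment size in bytes

    If the reported fragment size were, for example, 32768 bytes, the option would be set to twice that value in the HSM client options:

       MINMIGFILESIZE  65536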
  • After uninstalling and reinstalling HSM, you might observe problems with mounting the HSM file systems. If this happens, you must reboot the GPFS cluster.
  • Before initiating a takeover, make sure that all running jobs that work on the involved file system have been stopped.
  • Newly created files on large GPFS file systems might fail during migration with the following warning message: 'ANS9288W File: <filename> of size 0 is too small to qualify for migration.' The reason is that GPFS has not yet committed the file changes to disk at that moment. Retry the migration attempt a few minutes later.
  • If you change the HSMDISABLEAUTOMIGDAEMONS option setting when installing HSM, you must restart the already running dsmwatchd daemon process by terminating it (kill -15 <dsmwatchd-pid>) so that it re-reads the new option value when it is restarted by the init process.
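    A minimal sketch of the restart, mirroring the command shown above (init respawns the daemon, which then reads the new option value):

       kill -15 $(ps -ef | grep dsmwatchd | grep -v grep | awk '{print $2}')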
  • If the primary or secondary GPFS cluster data repository server fails, HSM cannot properly update its failover information. The 'dsmmigfs q -f' command might show incorrect information after an HSM node failure.
  • Running dsmls or dsmdu on remotely mounted GPFS file systems does not show correct values.
  • Streaming recall with the options: MAXRECALLD=2 and MINRECALLD=2 leads to normal recall.
  • When issuing the "dsmmigundelete" command on a node that is not currently managing a file system, the following error is returned:

       ANS9085E dsmmigundelete: file system <FILE SYSTEM NAME> is not managed by space management.

Back to Contents

HSM AIX GPFS known problems and limitations
  • For files with a size between 4 and 8 KB, dsmmigrate fails with ANS9523E, ANS9999E. Files greater than 8 KB can be migrated.
  • The restore of stub files does not work with multiple storage pools, or with files that have ACLs.
  • Filesets are not supported.
  • Unlink of Filesets is not allowed.
  • The ctime option of GPFS should be set to no (default) to prevent unwanted backups of files with the backup-archive client after GPFS file migration from pool to pool.
  • If migrated files that have streaming mode or partial file recall activated are deleted and later restored (by using the commands "dsmc rest", "dsmc retr", or "dsmmigundelete"), the files are reset to the normal recall mode.
  • During backup of a migrated file with ACLs, the file is recalled and is in the premigrated state afterwards.
  • If you need to delete and re-add an HSM node from the GPFS cluster, the GPFS node number might change (use 'mmlscluster' for verification). In this case, the HSM node numbering must be adjusted by executing the following procedure:

       -- dsmmigfs stop

    Wait until all HSM daemons, except the dsmwatchd, have stopped before proceeding to the next step.

       -- rm /etc/adsm/SpaceMan/config/instance
       -- kill -15 <DSMWATCHD PID>

    Wait until the dsmwatchd has been restarted by the system before proceeding to the next step.

       -- dsmmigfs start

    If there are file systems that have been managed by this node previously, issue the following command:

       -- dsmmigfs takeover </fs>  (for each of these file systems)

    Afterwards, you will see two entries for this node when executing 'dsmmigfs q -f': one with the old number and failover deactivated, and one with the new node number. If you observe problems with mounting HSM managed file systems afterwards, you must reboot the GPFS cluster.

  • If HSM is globally deactivated on the owner node (in a GPFS HSM environment with at least 2 nodes), a recall attempt of a migrated file from another node will produce the following message:

       ANS4007E Error processing 'xxx': access to the object is denied

    In order to correctly recall the migrated files, HSM must first be globally reactivated on the owner node. Note that the same ANS4007E message is also produced in other cases (when the HSM environment is not correctly set up or a recall fails due to other reasons).


Back to Contents

HSM AIX JFS2 known problems and limitations
  • Tivoli Storage Manager for Space Management (HSM) does not support local automounts for directories contained in HSM managed file systems by using the "autofs" file system type introduced with AIX 4.3. The workaround is to set the environment variable COMPAT_AUTOMOUNT to 1 and restart the automounter.
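    A minimal sketch of the workaround (how the variable is made visible to the automounter depends on how automountd is started on your system; /etc/environment is one common place on AIX):

       COMPAT_AUTOMOUNT=1        # for example, added to /etc/environment

       stopsrc -s automountd     # then restart the automounter, for example via SRC
       startsrc -s automountd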
  • Threshold migration might migrate slightly below the low threshold limit.
  • Threshold values of Ht=Lt=100% and Ht=Lt=0% should not be defined since threshold migration will not be reliable with these settings.
  • If stub file sizes are larger than the sizes of the original files and these files are manually recalled using the dsmrecall command, these files will be recalled to the resident state. Note: Stub file sizes can only be larger than the original file sizes if the MINMIGFILESIZE option is set to a value smaller than the stub size.
  • After a HACMP failover, restart the recall daemon. The best solution is to integrate the restart of the recall daemon in the start_HSM script (see the Space Management for UNIX and Linux User's Guide) after the import of your file systems.
      ...
      # import the HSM file systems after the HACMP failover
      `echo "Starting to import FS.." >> $LOG`;
      for (my $i = 0; $i < @FS; $i++) {
        `dsmmigfs import $FS[$i] >> $LOG 2>&1`;
      }

      # restart the recall daemon so it picks up the imported file systems
      `echo "Killing dsmrecalld ..." >> $LOG`;
      `kill -15 \$(ps -aef | grep dsmrecalld | grep -v grep \
              | awk '{print \$2}') >> $LOG 2>&1`;
      `sleep 5`;
      `echo "Starting dsmrecalld ..." >> $LOG`;
      `echo \$DSM_DIR \$DSM_CONFIG >> $LOG; dsmrecalld >> $LOG 2>&1`;
      ...

Back to Contents




HSM Linux warnings
  • Do not update HSM using "rpm -U" or "rpm -F". Follow the update procedure documented in the HSM manual instead.
  • All HSM nodes must have the same OS level and HSM version installed. The GPFS cluster itself can contain AIX 6.1, SLES 10, and other nodes at the same time, but if you want to install HSM on several nodes within one cluster, you must do this on a subset of nodes with the same OS level and the same HSM version. Thus, you can install HSM on three AIX 6.1 nodes, or on three SLES 10 nodes, but you cannot install HSM on one AIX 6.1 node and one SLES 10 node belonging to the same cluster.
  • GPFS limitations for parallel processes (for example different recall processes at the same time):
    • To prevent problems with limited GPFS resources, increase the number of dmapiWorkerThreads and worker1Threads.
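      A minimal sketch of how these GPFS parameters could be raised with mmchconfig (the values are placeholders; choose values appropriate for your cluster and refer to the GPFS documentation, since some of these parameters take effect only after GPFS is restarted):

         /usr/lpp/mmfs/bin/mmchconfig dmapiWorkerThreads=32,worker1Threads=72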
  • Setting the DSM_LOG environment variable on Linux86 does not work with /etc/environment on SuSE and RedHat. Use the corresponding configuration file on Linux86 to set the environment for the HSM daemons.

Back to Contents




HSM Linux problems and limitations
  • When SELinux is used in enforcing or permissive mode, the restore of stub files does not work. These files will be restored to the resident state; in other words, they will be restored completely.
  • The Java GUI supports Japanese, Korean, Simplified and Traditional Chinese. However, Java supports these encodings only on some Linux platforms. For more information see http://java.sun.com/javase/6/webnotes/install/locales.html#jfc-table for JRE 1.6. If you would like to display the Java GUI on Linux for other languages, you can modify the font configuration as described in this document: http://java.sun.com/javase/6/docs/technotes/guides/intl/fontconfig.html for JRE 1.6.
  • When GPFS 3.4 is used, HSM cannot be run with SELinux enabled in enforcing mode. In enforcing mode, HSM commands fail with the error message "Cannot restore segment prot after reloc: Permission denied". This error is generated by the Linux dynamic linker because GPFS 3.4 does not support SELinux in enforcing mode. HSM on GPFS 3.5 is not affected by this restriction.
  • If you shut down GPFS on a node in your GPFS cluster where HSM daemons are active, GPFS will not be able to unload all kernel modules on this specific node. This leads to DMAPI session problems with the HSM daemons and can cause issues if the GPFS level is upgraded on that node. The only way to avoid this behavior is to stop all running HSM daemons before shutting down GPFS on that node. The recommendation is to uninstall HSM on the GPFS node where the upgrade will take place. If no GPFS node upgrade is planned, this issue does not affect HSM operations. Non-GPFS systems are not affected.
  • When using the command-line help, the Table 3 description for the dmkilld command should state:

       Only valid on AIX GPFS and Linux x86/x86_64 GPFS.
       Stops the master recall daemon and all of its children, and interrupts all active recalls.
  • The restore of stub files does not work with files that have ACLs.
  • Filesets are not supported.
  • Unlink of Filesets is not allowed.
  • The ctime option of GPFS should be set to no (default) to prevent unwanted backups of files with the backup-archive client after GPFS file migration from pool to pool.
  • Some severe performance problems have been observed regarding HSM non-root user support.
  • During error recovery, the HSM daemons log information into the dsmerror.log file. This file can grow large, causing the root file system to run out of space. You can prevent this by using a separate file system (for example, /tmp) and redirecting dsmerror.log to that file system by setting the DSM_LOG environment variable.
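    A minimal sketch, assuming /tmp/hsm is a hypothetical directory on a separate file system and that the variable is set in the environment from which the HSM daemons are started (note the warning above about /etc/environment on Linux):

       mkdir -p /tmp/hsm
       export DSM_LOG=/tmp/hsm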
  • In case of a GPFS restart, sometimes GPFS kernel modules are not unloaded. This leads to DMAPI session problems with HSM daemons.
  • Ensure that you do not unmount HSM managed GPFS file systems if they are busy.
  • It is possible to set the recall mode to migonclose or readwithoutrecall using the dsmattr command. Although dsmls will display these modes, they are ignored.
  • If you need to delete and re-add an HSM node from the GPFS cluster, the GPFS node number might change (use 'mmlscluster' for verification). In this case, the HSM node numbering must be adjusted by executing the following procedure:

       -- dsmmigfs stop
       -- rm /etc/adsm/SpaceMan/config/instance
       -- kill -15 <DSMWATCHD PID>
       -- dsmmigfs start

    If there are any file systems that have been managed by this node previously, execute:

       -- dsmmigfs takeover </fs>  (for each of these file systems)

    Afterwards, you will observe two entries for this node when executing 'dsmmigfs q -f': one with the old number and failover deactivated, and one with the new node number. If you observe problems mounting HSM managed file systems afterwards, you need to reboot the GPFS cluster.
  • If HSM is globally deactivated on the owner node (in a GPFS HSM environment with at least 2 nodes), recall attempts of migrated files from another node will produce the following message: "ANS4007E Error processing 'xxx': access to the object is denied". In order to correctly recall the migrated files, HSM must first be globally reactivated on the owner node. Note that the same ANS4007E message is also produced in other cases (when the HSM environment is not correctly set up or a recall fails due to other reasons).

Back to Contents

[{"Product":{"code":"SSSR2R","label":"Tivoli Storage Manager for Space Management"},"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Component":"--","Platform":[{"code":"PF002","label":"AIX"},{"code":"PF016","label":"Linux"}],"Version":"6.3","Edition":"","Line of Business":{"code":"LOB26","label":"Storage"}}]

Document Information

Modified date:
17 June 2018

UID

swg21508282