Policy rules: Examples and tips

Before you write and apply policies, consider the following advice.

You are advised to test your rules using the mmapplypolicy command with the -I test option and the -L 3 (or higher) option. This will help you understand which files are selected as candidates, and which candidates are chosen.

Do not apply a policy to an entire file system of vital files until you are confident that the rules correctly express your intentions. To test your rules, find or create a subdirectory with a modest number of files, some that you expect to be selected by your SQL policy rules and some that you expect will be skipped.

Then run the following command:
mmapplypolicy /TestSubdirectory  -L 6  -I test
The output will show you exactly which files are scanned, and which match rules or no rules. If a problem is not apparent, you can add a SHOW() clause to your rule or rules to examine the values of the file attributes of interest or the value of any SQL expression. To examine several, use the following:
SHOW('x1=' || varchar(Expression1) || ' x2='  || varchar(Expression2) || ... )
where ExpressionX is the SQL variable or expression of function that you suspect or do not understand. Beware that if any expression evaluates to SQL NULL, the entire show clause will be NULL, by the rules of SQL. One way to show null vs. non-null values is to define a macro and use it as in the following example:
define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) END])

rule list a SHOW( 'x1=' || DISPLAY_NULL(xattr('user.marc')) || ' and  x2=' || DISPLAY_NULL(xattr('user.eric')))
Note: For examples and more information on the -L flag, see the topic The mmapplypolicy -L command in the IBM Spectrum Scale: Problem Determination Guide.
  1. Delete files from the storage pool named pool_1 that have not been accessed in the last 30 days, and are named like temporary files or appear in any directory that is named tmp:
    RULE 'del1' DELETE FROM POOL 'pool_1'
       WHERE (DAYS(CURRENT_TIMESTAMP) – DAYS(ACCESS_TIME) > 30)
            AND (lower(NAME) LIKE '%.tmp' OR PATH_NAME LIKE '%/tmp/%')
  2. Use the SQL LIKE predicate to test file names and path names:
    RULE '*/_*'  DELETE WHERE PATH_NAME LIKE '%/x_%' ESCAPE 'x'
    RULE '*XYZ*' DELETE WHERE NAME LIKE '%XYZ%'
    RULE '12_45' DELETE WHERE NAME LIKE '12x_45' ESCAPE 'x'
    RULE '12%45' DELETE WHERE NAME LIKE '12x%45' ESCAPE 'x'
    RULE '12?45' DELETE WHERE NAME LIKE '12_45'
    RULE '12*45' DELETE WHERE NAME LIKE '12%45'
    RULE '*_*'   DELETE WHERE NAME LIKE '%x_%' ESCAPE 'x'
    Where:
    • A percent % wildcard in the name represents zero or more characters.
    • An underscore (_) wildcard in the name represents one byte.

    Use the optional ESCAPE clause to establish an escape character, when you need to match '_' or '%' exactly.

  3. Use the SQL UPPER and LOWER functions to ignore case when testing names:
    RULE 'UPPER'  DELETE WHERE upper(PATH_NAME) LIKE '%/TMP/OLD/%' 
    RULE 'lower'  DELETE WHERE lower(PATH_NAME) LIKE '%/tmp/old/%' 
  4. Use the SQL SUBSTR or SUBSTRING functions to test a substring of a name:
    RULE 's1' DELETE WHERE SUBSTRING(NAME FROM 1 FOR 5)='XXXX-'
    RULE 's2' DELETE WHERE SUBSTR(NAME,1,5)='YYYY-'
  5. Use the SQL SUBSTR and LENGTH functions to test the suffix of a name:
    RULE 'sfx' DELETE WHERE SUBSTR(NAME,LENGTH(NAME)-3)='.tmp'
  6. Use a WHEN clause to restrict rule applicability to a particular day of the week:
    RULE 'D_SUN' WHEN (DayOfWeek(CURRENT_DATE)=1) /* Sunday */
      DELETE WHERE PATH_NAME LIKE '%/tmp/%'

    CURRENT_DATE is an SQL built in operand that returns the date portion of the CURRENT_TIMESTAMP value.

  7. Use the SQL IN operator to test several possibilities:
    RULE 'D_WEEKEND' WHEN (DayOfWeek(CURRENT_DATE) IN (7,1)) /*  Saturday or Sunday */
      DELETE WHERE PATH_NAME LIKE '%/tmp/%'

    For information on how to use a macro processor such as m4 to make reading and writing policy rules easier, see Using macro processing utilities with policy rules.

  8. Use a FILESET clause to restrict the rule to files within particular filesets:
    RULE 'fsrule1' MIGRATE TO POOL 'pool_2'
             FOR FILESET('root','fset1')

    In this example there is no FROM POOL clause, so regardless of their current storage pool placement, all files from the named filesets are subject to migration to storage pool pool_2.

  9. Use an EXCLUDE rule to exclude a set of files from all subsequent rules:
    RULE 'Xsuper' EXCLUDE WHERE USER_ID=0
    RULE 'mpg'    DELETE  WHERE lower(NAME) LIKE '%.mpg' AND FILE_SIZE>20123456
    Notes:
    1. Specify the EXCLUDE rule before rules that might match the files that are being excluded.
    2. You cannot define a list and what to exclude from the list in a single rule. You must define two LIST statements, one specifying which files will be in the list, and one specifying what to exclude from the list. For example, to exclude files containing the word test from the LIST rule allfiles, define the following:
      RULE EXTERNAL LIST 'allfiles' EXEC '/u/brownap/policy/CHE/exec.list'
      
      RULE 'exclude_allfiles' LIST 'allfiles' EXCLUDE where name like '%test%'
      
      RULE 'all' LIST 'allfiles' SHOW('misc_attr ='|| MISC_ATTRIBUTES || HEX(MISC_ATTRIBUTES)) \
        where name like '%'
  10. Use the SQL NOT operator with keywords, along with AND and OR:
    RULE 'D_WEEKDAY' WHEN (DayOfWeek(CURRENT_DATE) NOT IN (7,1)) /*  a weekday */
      DELETE WHERE (PATH_NAME LIKE '%/tmp/%' OR NAME LIKE '%.tmp')  
              AND (KB_ALLOCATED > 9999 AND NOT USER_ID=0)
  11. Use a REPLICATE clause to increase the availability of selected files:
    RULE 'R2' MIGRATE FROM POOL 'ypooly' TO POOL 'ypooly'
      REPLICATE(2) WHERE USER_ID=0

    Before increasing the data replication factor for any file, the file system must be configured to support data replication.

  12. The difference of two SQL Timestamp values may be compared to an SQL Interval value:
    rule 'a' migrate to pool 'A' where CURRENT_TIMESTAMP - MODIFICATION_TIME > INTERVAL  '10' DAYS
    rule 'b' migrate to pool 'B' where CURRENT_TIMESTAMP - MODIFICATION_TIME > INTERVAL  '10' HOURS
    rule 'c' migrate to pool 'C' where CURRENT_TIMESTAMP - MODIFICATION_TIME > INTERVAL  '10' MINUTES
    rule 'd' migrate to pool 'D' where CURRENT_TIMESTAMP - MODIFICATION_TIME > INTERVAL  '10' SECONDS

    For the best precision, use the INTERVAL...SECONDS construct.

  13. By carefully assigning both weights and thresholds, the administrator can formally express rules like this:

    If the storage pool named pool_X has an occupancy percentage above 90% now, bring the occupancy percentage of storage pool named pool_X down to 80% by migrating files that are three months or older to the storage pool named pool_ZZ. But, if you can find enough year-old files to bring the occupancy percentage down to 50%, do that also.

    RULE 'year-old' MIGRATE FROM POOL 'pool_X'
       THRESHOLD(90,50) WEIGHT(weight_expression) 
       TO POOL 'pool_ZZ'
       WHERE  DAYS(CURRENT_TIMESTAMP) – DAYS(ACCESS_TIME) > 365
    
    RULE '3month-old' MIGRATE FROM POOL 'pool_X'
       THRESHOLD(90,80) WEIGHT(weight_expression) 
       TO POOL 'pool_ZZ'
       WHERE  DAYS(CURRENT_TIMESTAMP) – DAYS(ACCESS_TIME) > 90
    More information about weights is available in the next example.
    A goal of this mmapplypolicy job is to reduce the occupancy percentage of the FROM POOL to the low occupancy percentage specified on the THRESHOLD clause, if possible. The mmapplypolicy job does not migrate or delete more files than are necessary to produce this occupancy percentage. The task consists of these steps:
    1. Each candidate file is assigned a weight.
    2. All candidate files are sorted by weight.
    3. The highest weight files are chosen to MIGRATE or DELETE until the low occupancy percentage is achieved, or there are no more candidates.
    The administrator who writes the rules must ensure that the computed weights are as intended, and that the comparisons are meaningful. This is similar to the TSM convention, where the weighting function for each file is determined by the equation:
    X * access_age + Y * file_size
    where:
    • access_age is DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)
    • file_size is FILE_SIZE or KB_ALLOCATED
    • X and Y are weight factors chosen by the system administrator.
  14. The WEIGHT clause can be used to express ideas like this (stated informally):
    IF access_age > 365 days THEN weight = 100000 + access_age
    ELSE IF access_age < 30 days THEN weight = 0
    ELSE weight= KB_ALLOCATED
    This means:
    • Give a very large weight bias to any file older than a year.
    • Force the weight of any file younger than 30 days to 0.
    • Assign weights to all other files according to the number of kilobytes occupied.
    The formal SQL syntax is this:
    CASE
     WHEN DAYS(CURRENT_TIMESTAMP) – DAYS(ACCESS_TIME) > 365
      THEN 100000 + DAYS(CURRENT_TIMESTAMP) – DAYS(ACCESS_TIME)
     WHEN DAYS(CURRENT_TIMESTAMP) – DAYS(ACCESS_TIME) < 30
      THEN 0
     ELSE
      KB_ALLOCATED
    END
  15. The SHOW clause has no effect in matching files but can be used to define additional attributes to be exported with the candidate file lists. It may be used for any purpose but is primarily used to support file aggregation.

    To support aggregation, you can use the SHOW clause to output an aggregation value for each file selected by a rule. You can then output those values to a file list and input that list to an external program that groups the files into aggregates.

  16. If you have a large number of filesets against which to test, use the FILESET_NAME variable as shown in the following example:
    RULE 'x' SET POOL 'gold' WHERE FILESET_NAME LIKE 'xyz.%.xyz' 
    However, if you are testing against just a few filesets, you can use the FOR FILESET(’xyz1’, ’xyz2’) form instead.
  17. You may convert a time interval value to a number of seconds using SQL cast syntax; for example:
    define([toSeconds],[(($1) SECONDS(12,6))])
    
    define([toUnixSeconds],[toSeconds($1 - '1970-1-1@0:00')])
    
    RULE external list b
    RULE list b SHOW('sinceNow=' toSeconds(current_timestamp-modification_time) )
    RULE external list c
    RULE list c SHOW('sinceUnixEpoch=' toUnixSeconds(modification_time) )
    The following is also supported:
    define(access_age_in_days,( INTEGER(( (CURRENT_TIMESTAMP - ACCESS_TIME) SECONDS)) /(24*3600.0) ) )
    
    RULE external list w exec ''
    RULE list w weight(access_age_in_days) show(access_age_in_days)