Phase two: Choosing and scheduling files

In the second phase of the mmapplypolicy job, some or all of the candidate files are chosen.

Chosen files are scheduled for migration or deletion, taking into account the weights and thresholds determined in Phase one: Selecting candidate files, as well as the actual pool occupancy percentages. Generally, candidates with higher weights are chosen ahead of those with lower weights.

File migrations to and from external pools are done before migrations and deletions that involve only GPFS disk pools.

File migrations that do not target group pools are done before file migrations to group pools.

File migrations that target a group pool are scheduled so that candidate files with higher weights are migrated to the more preferred GPFS disk pools within the group pool, while respecting the LIMITs specified in the group pool definition.
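The placement described above can be pictured with a minimal sketch. It is only an illustrative model, not the policy engine's implementation: the Pool class, the (path, weight, kb) tuples, and the place_in_group_pool function are hypothetical names introduced here, and occupancy is simplified to used kilobytes divided by capacity.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical model of one disk pool inside a group pool."""
    name: str
    capacity_kb: int
    used_kb: int
    limit_pct: float                      # LIMIT from the group pool definition
    assigned: list = field(default_factory=list)

    def fits(self, kb):
        # A file fits if the pool stays at or under its LIMIT after the move.
        return 100.0 * (self.used_kb + kb) / self.capacity_kb <= self.limit_pct


def place_in_group_pool(candidates, pools):
    """Place higher-weight files into more preferred pools first.

    candidates: iterable of (path, weight, kb) tuples (assumed format).
    pools: list of Pool objects, most preferred first.
    Returns the paths that did not fit in any pool.
    """
    unplaced = []
    for path, weight, kb in sorted(candidates, key=lambda c: c[1], reverse=True):
        for pool in pools:
            if pool.fits(kb):
                pool.used_kb += kb
                pool.assigned.append(path)
                break
        else:
            # No pool could take the file without exceeding its LIMIT.
            unplaced.append(path)
    return unplaced
```

In this model, a high-weight file lands in the first pool that still has room under its LIMIT, and lower-weight files spill over to less preferred pools once the preferred ones are full.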

The following two options can be used to adjust the method by which candidates are chosen:
--choice-algorithm {best | exact | fast}
Specifies one of the following types of algorithms that the policy engine is to use when selecting candidate files:
best
Chooses the optimal method based on the rest of the input parameters.
exact
Sorts all of the candidate files completely by weight, then serially considers each file from highest weight to lowest weight, choosing feasible candidates for migration, deletion, or listing according to any applicable rule LIMITs and current storage-pool occupancy. This is the default.
fast
Works together with the parallelized -g /shared-tmp -N node-list selection method. The fast choice method does not completely sort the candidates by weight. Instead, it uses a combination of statistical, heuristic, and parallel computing methods to favor higher-weight candidate files over lower-weight ones. As a result, the set of chosen candidates may differ somewhat from the set that the exact method would choose, and the order in which the candidates are migrated, deleted, or listed is somewhat more random. The fast method uses statistics gathered during the policy evaluation phase, and it is especially fast when those statistics indicate that either all or none of the candidates are feasible.
--split-margin n.n
A floating-point number that specifies the percentage within which the fast-choice algorithm is allowed to deviate from the LIMIT and THRESHOLD targets specified by the policy rules. For example, if you specify a THRESHOLD of 80% and a split-margin value of 0.2, the fast-choice algorithm might stop choosing files when occupancy reaches 80.2%, or it might choose files that bring the occupancy down to 79.8%. A nonzero value for split-margin can greatly accelerate the execution of the fast-choice algorithm when there are many small files. The default is 0.2.
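As a rough illustration of the two options, the sketch below models the exact method as a full sort by weight followed by a serial walk, and models split-margin as a tolerance window around a target percentage. The function names and the (path, weight, kb) tuple format are assumptions made for this example. The real exact method also applies rule LIMITs, and the real fast method relies on statistics and parallelism that are not modeled here.

```python
def choose_exact(candidates, used_kb, capacity_kb, low_pct):
    """Simplified model of the exact method for one source pool.

    candidates: iterable of (path, weight, kb) tuples (assumed format).
    Files are considered from highest to lowest weight until pool
    occupancy is at or below low_pct. Treating "exactly at low_pct"
    as done is an assumption of this sketch.
    Returns (chosen paths, resulting occupancy percentage).
    """
    chosen = []
    for path, weight, kb in sorted(candidates, key=lambda c: c[1], reverse=True):
        if 100.0 * used_kb / capacity_kb <= low_pct:
            break
        chosen.append(path)
        used_kb -= kb
    return chosen, 100.0 * used_kb / capacity_kb


def split_margin_window(target_pct, split_margin=0.2):
    """Range of occupancy percentages the fast method may stop within."""
    return (target_pct - split_margin, target_pct + split_margin)


def acceptable_for_fast(occupancy_pct, target_pct, split_margin=0.2):
    """True if occupancy_pct lies inside the split-margin window."""
    low, high = split_margin_window(target_pct, split_margin)
    return low <= occupancy_pct <= high
```

With a target of 80% and the default margin of 0.2, acceptable_for_fast accepts any occupancy between roughly 79.8% and 80.2%, which matches the range described for --split-margin.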

File grouping and the SIZE clause

When scheduling files, mmapplypolicy by default groups them in batches of 100 files. You can set a different number of files per batch with the -B option.

However, you can set up mmapplypolicy to schedule files so that each invocation of the InterfaceScript gets approximately the same amount of file data to process. To do so, use the SIZE clause of certain policy rules to specify that scheduling be based on the sum of the sizes of the files. The SIZE clause can be applied to the following rules (for details, see Policy rules):
  • DELETE
  • EXTERNAL LIST
  • EXTERNAL POOL
  • LIST
  • MIGRATE
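The difference between grouping by file count and grouping by total size can be sketched as follows. This is a hedged illustration only: batch_by_count and batch_by_size are hypothetical helpers, the (path, kb) tuples are an assumed format, and the way mmapplypolicy actually forms groups when a SIZE clause is present may differ.

```python
def batch_by_count(files, batch_size=100):
    """Group files into consecutive batches of batch_size (the -B idea)."""
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]


def batch_by_size(files, max_kb):
    """Group (path, kb) tuples so each batch carries roughly max_kb of data.

    A batch is closed when adding the next file would push it past max_kb.
    A single file larger than max_kb still gets a batch of its own.
    """
    batches, current, current_kb = [], [], 0
    for path, kb in files:
        if current and current_kb + kb > max_kb:
            batches.append(current)
            current, current_kb = [], 0
        current.append((path, kb))
        current_kb += kb
    if current:
        batches.append(current)
    return batches
```

Count-based batches can vary widely in the amount of data they carry, while size-based batches give each script invocation a comparable amount of work.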

Administrator-specified customized file grouping or aggregation

In addition to using the SIZE clause to control the amount of work passed to each invocation of an InterfaceScript, you can also specify that files with similar attributes be grouped or aggregated together during the scheduling phase. To do so, use an aggregator program to take a list of chosen candidate files, sort them according to certain attributes, and produce a reordered file list that can be passed as input to the user script.

You can accomplish this by following these steps:
  1. Run mmapplypolicy with the -I prepare option to produce a list of chosen candidate files without passing the list to an InterfaceScript.
  2. Use your aggregator program to sort the list of chosen candidate files into groups with similar attributes and write each group to a new, separate file list.
  3. Run mmapplypolicy with the -r option, specifying a set of file list files to be read. When invoked with the -r option, mmapplypolicy does not choose candidate files; rather, it passes the specified file lists as input to the InterfaceScript.
    Note: You can also use the -q option to specify that small groups of files are to be taken in round-robin fashion from the input file lists (for example, take a small group of files from x.list.A, then from x.list.B, then from x.list.C, then back to x.list.A, and so on, until all of the files have been processed).

    To prevent mmapplypolicy from redistributing the grouped files according to size, omit the SIZE clause from the appropriate policy rules and set the bunching parameter of the -B option to a very large value.
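The aggregation step and the round-robin behavior described in the note can be sketched as below. The record layout is hypothetical (a plain tuple of path and attribute, not the actual file list format written by mmapplypolicy), and aggregate and round_robin are names introduced only for this example. A real aggregator would need to parse and rewrite the file lists produced by the -I prepare run.

```python
from collections import defaultdict
from itertools import islice


def aggregate(records, key):
    """Group records that share an attribute, preserving input order.

    records: iterable of arbitrary records (hypothetical format).
    key: function that extracts the grouping attribute from a record.
    Returns a dict mapping each attribute value to its list of records.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return dict(groups)


def round_robin(lists, group_size):
    """Yield small groups taken in turn from each list until all are empty.

    This models the idea behind the -q option; the group size used by
    mmapplypolicy itself is not specified here.
    """
    iterators = [iter(lst) for lst in lists]
    while iterators:
        still_active = []
        for it in iterators:
            chunk = list(islice(it, group_size))
            if chunk:
                yield chunk
                still_active.append(it)
        iterators = still_active
```

For example, aggregating by a file-type attribute yields one list per type, and each list could then be written out as its own file list for the -r run.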

Reasons for candidates not to be chosen for deletion or migration

Generally, a candidate is not chosen for deletion from a pool, nor migration out of a pool, when the pool occupancy percentage falls below the LowPercentage value. Also, candidate files will not be chosen for migration into a target TO POOL when the target pool reaches the occupancy percentage specified by the LIMIT clause (or 99% if no LIMIT was explicitly specified by the applicable rule).

The LIMIT clause does not apply when the target TO POOL is a group pool; the limits specified in the rule defining the target group pool govern the action of the MIGRATE rule. The policy-interpreting program (for example, mmapplypolicy) may issue a warning if a LIMIT clause appears in a rule whose target pool is a group pool.
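The two general conditions above can be expressed as a pair of checks. This is a simplified sketch under stated assumptions: the function names are hypothetical, occupancy is modeled as used kilobytes over capacity, boundary handling (treating a pool exactly at LowPercentage as already satisfied) is an assumption, and group pool targets are not modeled.

```python
DEFAULT_LIMIT_PCT = 99.0  # applies when the rule has no explicit LIMIT


def occupancy_pct(used_kb, capacity_kb):
    """Pool occupancy as a percentage."""
    return 100.0 * used_kb / capacity_kb


def can_leave_source(source_used_kb, source_capacity_kb, low_pct):
    """A candidate may be deleted or migrated out only while the source
    pool is still above LowPercentage."""
    return occupancy_pct(source_used_kb, source_capacity_kb) > low_pct


def can_enter_target(target_used_kb, target_capacity_kb, limit_pct=None):
    """A candidate may be migrated in only while the target pool has not
    reached its LIMIT (99% if the rule specifies none)."""
    limit = DEFAULT_LIMIT_PCT if limit_pct is None else limit_pct
    return occupancy_pct(target_used_kb, target_capacity_kb) < limit
```

A candidate would be chosen for migration in this model only when both checks pass for its source and target pools.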