
Readme and release notes for release 3.5.1.8 of (LL) IBM Tivoli Workload Scheduler LoadLeveler: LL-3.5.1.8-power-AIX

Fix Readme


Abstract

This readme describes the IBM Tivoli Workload Scheduler LoadLeveler 3.5.1.8 fix pack for AIX (LL-3.5.1.8-power-AIX), including download and installation information, known issues and limitations, and the history of problems fixed in the 3.5.1.x service levels.

Content

Readme file for: LL-3.5.1.8-power-AIX
Product/Component Release: 3.5.1.8
Update Name: LL-3.5.1.8-power-AIX
Fix ID: LL-3.5.1.8-power-AIX
Publication Date: 07 October 2010
Last modified date: 07 October 2010

Installation information

Download location

Below is a list of components, platforms, and file names that apply to this Readme file.

Fix Download for AIX

Product/Component Name: (LL) IBM Tivoli Workload Scheduler LoadLeveler
Platform: AIX 5.3, AIX 6.1
Fix: LL-3.5.1.8-power-AIX

Prerequisites and co-requisites

None

Known issues

  • - Known Issues

    For more known issues, please see the HPC Central wiki.

    [February 22, 2010]

    In TWS LoadLeveler 3.5.1.4, 4.1.0.2 and 4.1.0.3, jobs will not be started in a login shell. The environment in which the job runs may not be set as expected and the job may fail to run correctly.

    The workaround is to set the environment keyword in the job command file to COPY_ALL (see the sketch after this entry).

    For TWS LoadL 3.5.1.4 - Apply apar IZ70280 emergency fix package available from IBM service.

    For TWS LoadL 4.1.0.2 and 4.1.0.3 - Apply apar IZ70442 emergency fix package available from IBM service.
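
    A minimal sketch of the COPY_ALL workaround, assuming a simple serial job (the executable and file names below are hypothetical):

        # Job command file: copy the submitting shell's environment to the job
        # @ environment = COPY_ALL
        # @ executable  = /bin/hostname
        # @ output      = job.$(Cluster).out
        # @ error       = job.$(Cluster).err
        # @ queue

    Submit with llsubmit as usual; with environment = COPY_ALL, the job inherits the environment of the submitting session even though it is not started in a login shell.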

    [May 18, 2009]

    A coexistence issue was introduced in TWS LoadLeveler 3.5.0.5 that also affects TWS LoadLeveler 3.5.1.1. Jobs will not be able to run in a mixed cluster that combines the TWS LoadLeveler 3.5.0.1 - 3.5.0.4 service levels with either TWS LoadLeveler 3.5.0.5 or TWS LoadLeveler 3.5.1.1.

    The coexistence problem introduced in TWS LoadLeveler 3.5.0.5 cannot be corrected.

    The entire cluster will need to be migrated to either TWS LoadLeveler 3.5.0.5 or TWS LoadLeveler 3.5.1.1 at the same time.

    There is no coexistence issue between TWS LoadLeveler 3.5.0.5 and TWS LoadLeveler 3.5.1.1.

    [January 19, 2009]

    Do not apply TWS LoadLeveler 3.5.0.2 maintenance level. If the job queue is not empty when upgrading to TWS LoadLeveler 3.5.0.2, the jobs in the job queue will be removed.
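
    Before applying an upgrade, one way to confirm that the job queue is empty is to run the llq command with no options; when nothing is queued it reports that there is no job status to report:

        llq
        llq: There is currently no job status to report.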

Known limitations

TWS LoadLeveler 3.5 does not support checkpointing for data staging jobs.

Installation information

  • - Installation procedure

    Install the LoadLeveler updates on your system by using the normal smit update_all command.

    For further information, consult the LoadLeveler Library for the appropriate version of the LoadLeveler AIX Installation Guide.
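
    As an illustration, assuming the fix images were downloaded to a hypothetical directory /tmp/LL-3.5.1.8, the update could be applied through SMIT or directly with installp:

        cd /tmp/LL-3.5.1.8
        inutoc .               # build the .toc catalog that installp reads
        smit update_all        # interactive path: specify . as the input directory
        # or, non-interactively:
        installp -agXd . all   # apply all applicable updates, pulling in requisites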

Additional information

  • - Package contents

    LoadL.full.3.5.1.8
    LoadL.msg.en_US.3.5.1.6
    LoadL.so.3.5.1.8
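
    After installation, the installed fileset levels can be verified with the standard AIX lslpp command:

        lslpp -l "LoadL.*"

    The LoadL.full and LoadL.so filesets should report level 3.5.1.8; LoadL.msg.en_US remains at 3.5.1.6, as listed above.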

  • - Changelog

    Notes

    Unless specifically noted otherwise, this history of problems fixed for LoadLeveler 3.5.1.x applies to:

    • LoadLeveler 3.5.1.x for AIX 6
    • LoadLeveler 3.5.1.x for AIX 5
    • LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 11 (SLES11) on POWER servers
    • LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 10 (SLES10) on POWER servers
    • LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 9 (SLES9) on POWER servers
    • LoadLeveler 3.5.1.x for Red Hat Enterprise Linux 5 (RHEL5) on POWER servers
    • LoadLeveler 3.5.1.x for Red Hat Enterprise Linux 4 (RHEL4) on POWER servers
    • LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 11 (SLES11) on Intel based servers
    • LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 10 (SLES10) on Intel based servers
    • LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 9 (SLES9) on Intel based servers
    • LoadLeveler 3.5.1.x for Red Hat Enterprise Linux 5 (RHEL5) on Intel based servers
    • LoadLeveler 3.5.1.x for Red Hat Enterprise Linux 4 (RHEL4) on Intel based servers
    • LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 11 (SLES11) on servers with 64-bit Opteron or EM64T processors
    • LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 10 (SLES10) on servers with 64-bit Opteron or EM64T processors
    • LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 9 (SLES9) on servers with 64-bit Opteron or EM64T processors
    • LoadLeveler 3.5.1.x for Red Hat Enterprise Linux 5 (RHEL5) on servers with 64-bit Opteron or EM64T processors
    • LoadLeveler 3.5.1.x for Red Hat Enterprise Linux 4 (RHEL4) on servers with 64-bit Opteron or EM64T processors
    Warning section

    A coexistence problem that cannot be corrected was introduced in TWS LoadLeveler 3.5.0.5 and TWS LoadLeveler 3.5.1.1. The entire cluster will need to be migrated to either TWS LoadLeveler 3.5.0.5 or TWS LoadLeveler 3.5.1.1 at the same time.

    Restriction section
    • TWS LoadLeveler 3.5 does not support checkpointing for data staging jobs.
    General LoadLeveler problems

    Problems fixed in LoadLeveler 3.5.1.8 [October 8, 2010]

    • Fixed the llsummary command to display the correct job id for jobs that have been moved from one schedd to another using the llmovespool command.
    • Fixed the startd daemon to ignore the completion job command state if the job step was already terminated, preventing jobs from being stuck in the job queue.
    • Fixed jobs to run on partitions whose default class configuration had the exclude_bg keyword removed.
    • Fixed LoadLeveler to retry the getpwnam() API so that the correct passwd and group information is retrieved when there are network issues, instead of returning a "NOT FOUND" error.
    • Fixed a central manager deadlock and core dump by removing the completed step from the user and group class queues before the dependent steps are requeued.
    • Fixed LoadLeveler crashes by calling thread-safe dirname() and filename() APIs during multithreaded execution.
    • Fixed LoadLeveler to accept jobs with environment variables of up to 100 KB.
    • Fixed the job step's completion code to return the wait3() UNIX system call status when the job is cancelled.

    Problems fixed in LoadLeveler 3.5.1.7 [July 20, 2010]

    • Resources are now held correctly when two reservations in the cluster reserve the same resources and the second reservation's start time matches the first reservation's end time.
    • Prevented the llsummary command from crashing when the history file was being modified while the command was reading it.
    • Fixed the llsummary command to handle small data fragments in the history file so that job steps are now displayed correctly.
    • Prevented the llacctmrg command from crashing when the global history file was larger than 2 GB.
    • Fixed the llsummary and llacctmrg commands to be able to access history files larger than 2 GB.
    • Prevented the central manager from crashing by locking the job step so that different threads cannot operate on it concurrently.
    • Fixed the llqres command so that it now works in a mixed 32-bit and 64-bit cluster environment without producing the 2512-301 error message.
    • Fixed the user prolog environment variables so that they are passed to the user epilog.
    • Fixed LoadLeveler to prevent duplicate job id errors by trying other remote inbound schedds for remote job submission when the network connection to the inbound schedd is not stable.
    • Implemented new mechanisms to reaccess file handles during file system failures so that LoadLeveler recovers to a working state: a new timer enables the schedd to come up automatically when file access becomes available again, and the schedd is set to the drain state if its file handles cannot be recovered.

    Problems fixed in LoadLeveler 3.5.1.6 [May 20, 2010]

    • Fixed the consumable cpus calculation for jobs that dynamically turn SMT on and off so that jobs are scheduled properly on POWER5 or POWER6 systems.
    • Fixed the negotiator core dump that occurred when the "START" expression was not configured for a machine.
    • Fixed the design of dependent steps to get a new qdate when they are put onto the idle queue due to enforcement of the maxqueued and maxidle limits.

    Problems fixed in LoadLeveler 3.5.1.5 [March 22, 2010]

    • Fixed LoadLeveler to honor the task order in the task_geometry keyword when assigning cpus to task ids.
    • Fixed llstatus to display the correct configuration expressions for all expression keywords.
    • Fixed the dispatch cycle of routed jobs so that when a central manager failover takes place, the preempted jobs are now able to run.
    • Fixed LoadLeveler to set environment variables from the prolog output when each line contains at most 65534 characters; lines containing more than 65534 characters are ignored.
    • Fixed LoadLeveler jobs to start correctly in the login shell and to know when to run under a login shell so that the pmd does not hang during execution.
    • Fixed the reservation debug message field so that the central manager does not core dump.

    Problems fixed in LoadLeveler 3.5.1.4 [January 18, 2010]

    • Prevent LoadLeveler from crashing when started in drain mode.
    • Prevent the LoadL_negotiator daemon from core dumping by initializing an internal variable before it is used.
    • Prevent LoadLeveler jobs from hanging in the preempt pending state by correcting the machine state for the jobs being preempted.
    • Fix the schedd daemon memory leak when processing reservations by removing the reservation element object after use.
    • Fix the llsummary command segmentation fault by skipping over data that is not valid when processing the history file.
    • Fix user id names to support lengths of up to 256 characters so that jobs submitted under such ids can now run.
    • Fix llqres -l to output the correct days of the month under the Recurrence section when the month has fewer than 31 days.
    • Fix LoadLeveler to send emails to the right administrator accounts when LoadLeveler detects errors.
    • Fix LoadLeveler to execute the rescan function so that jobs can now be scheduled once the running jobs are completed when using the default scheduler.
    • Fix submitted jobs to be rejected when the user id is not valid.
    • Fix LoadLeveler to not send notification emails if the api process has already reported the errors.
    • Fix LoadLeveler jobs to run with the correct gid on the AIX platform.
    • Fix LoadLeveler multi-step jobs to run with the correct umask value.
    • Fix the negotiator daemon to ensure that resource counts are updated correctly when a step is canceled during the window of time after it has been scheduled but before the job start order has been dispatched.

    Problems fixed in LoadLeveler 3.5.1.3 [November 2, 2009]

    • Fix the job command file parsing error 2512-059 that occurs when the first non-blank line is neither a comment line nor a LoadLeveler keyword, or when the first character of the first non-blank line is not a '#' sign.
    • Fix the resource count for coschedule job steps so that if a step is canceled after it has been scheduled and is waiting for preemption to take place, the resource counts are now updated correctly for future dispatching cycles.
    • Fix LoadLeveler performance by reducing the overhead of handling llq query requests so that the impact on overall scheduling progress is also reduced.
    • Fix the documentation to explain why using different flags with llq generates different output for the same job.

    Problems fixed in LoadLeveler 3.5.1.2 [August 19, 2009]

    • Fix the LoadL_schedd SIGSEGV termination when many jobs are submitted by correcting the reference counting on a data area that threads were still referencing.
    • Fix LoadLeveler to use unsigned int64 variables instead of integers for file size calculations whenever transmitting files, including transmitting history files that are larger than 2 GB to the llacctmrg command.
    • Fix the llqres output to show the correct month value under the "Recurrence" section.
    • Fix the increased LoadL_startd memory consumption by modifying LoadLeveler to dynamically load the libraries only once.
    • Fix the schedd memory leak when performing an llctl reconfig while parallel user space jobs are on the queue in the running state by correcting the memory leaks in the adapter objects.
    • Fix the job step staying in the complete state for a long period of time by changing the central manager job termination/cleanup processing.
    • Fix LoadLeveler to have better performance when scheduling jobs, especially in a cluster that has a huge number of nodes with similar resources on each node.

    Problems fixed in LoadLeveler 3.5.1.1 [May 18, 2009]

    Notice: This is a mandatory service update to TWS LoadLeveler 3.5.1.0.

    • Data staging options DSTG_NODE=MASTER and DSTG_NODE=ALL can now be used.
    • Fix the accounting output of the llsummary command to not contain multiple entries for the same step after LoadLeveler restarts during a multistep job.
    • Fix the child starter process to ensure it is started as root so that the process can set up the environment and credentials to run the job.
    • Fix the negotiator handling of step dependencies so that jobs that are supposed to run do run and those that should not do not.
    • Linux: On Linux platforms with multiple CPUs, it is possible for the seteuid function to malfunction. When the LoadLeveler startd daemon encounters this failure, its effective user id may be set incorrectly, in which case it is possible for jobs to become stuck in ST state. A workaround for the glibc issue is provided in this service update.

    LoadLeveler Multicluster problems

    Problems fixed in LoadLeveler 3.5.1.3 [November 2, 2009]

    • Fix the LoadL_schedd memory leak when running the llstatus -X command in a multicluster environment.
    • Fix LoadLeveler so that jobs can be submitted to the remote cluster in a mixed 3.5.X and 3.4.3.X multicluster environment.

    Problems fixed in LoadLeveler 3.5.1.1 [May 18, 2009]

    Prevent llstatus -X from core dumping when there are adapters or MCMs on the nodes.


    LoadLeveler Blue Gene problems

    Problems fixed in LoadLeveler 3.5.1.6 [May 20, 2010]

    The duration of an active Blue Gene partition can now be modified on the Blue Gene/P system.


    Problems fixed in LoadLeveler 3.5.1.5 [March 22, 2010]

    Fixed LoadLeveler Blue Gene jobs to start on free nodes by skipping over invalid partitions in the Blue Gene database during partition loading and continuing to load valid partitions.


    Problems fixed in LoadLeveler 3.5.1.1 [May 18, 2009]

    Added scheduling enhancements to make it easier to find resources to run jobs on large Blue Gene systems.


    TWS LoadLeveler Corrective Fix listing
    Fix Level     APAR numbers
    LL 3.5.1.8    AIX:   IZ80738 IZ83416 IZ83769 IZ84426 IZ85339 IZ85341 IZ85385 IZ85390 IZ85406
                  LINUX: IZ78027 IZ81226 IZ82094 IZ83772
    LL 3.5.1.7    AIX:   IZ69053 IZ75381 IZ75989 IZ76705 IZ77606 IZ77612 IZ78490 IZ78493 IZ79273 IZ79274
                  LINUX: IZ77000 IZ77610
    LL 3.5.1.6    AIX:   IZ52059 IZ73349 IZ75142 IZ75265
                  LINUX: IZ70744
    LL 3.5.1.5    AIX:   IZ56208 IZ67241 IZ67479 IZ67760 IZ69047 IZ70280 IZ70760 IZ70787
                  LINUX: IZ66534
    LL 3.5.1.4    IZ60760 IZ62312 IZ64121 IZ64435 IZ64717 IZ64913 IZ64956 IZ65273 IZ65278 IZ66454 IZ66461 IZ66874 IZ66914 IZ67156
    LL 3.5.1.3    IZ55661 IZ58696 IZ59450 IZ59799 IZ59841 IZ59842 IZ63572
    LL 3.5.1.2    IZ51180 IZ51401 IZ52219 IZ52926 IZ52927 IZ53764 IZ53825 IZ54439
    LL 3.5.1.1    IZ48410 IZ48545 IZ48548 IZ50225 IZ50226

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SGS8DD","label":"Tivoli Workload Scheduler LoadLeveler for AIX"},"Component":"","ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB57","label":"Power"}}]

Document Information

Modified date:
07 October 2010

UID

isg400000312