IBM Support

SDK: Introduction: Platform LSF API Services

Troubleshooting


Problem

SDK: Introduction: Platform LSF API Services

Resolving The Problem

What do Platform LSF API Services do?

Platform LSF API Services

Platform LSF services are natural extensions of operating system services. Platform LSF services glue heterogeneous operating systems into a single, integrated computing system.

Platform LSF APIs provide easy access to the services of Platform LSF servers.

Platform LSF APIs have been used to build numerous load sharing applications and utilities. Some examples of applications built on top of the Platform LSF APIs are lsmake, lstcsh, lsrun, and the LSF Batch user interface.

Platform LSF Base API services

The Platform LSF Base API (LSLIB) allows application programmers to get services provided by LIM and RES. The services include:

  • Configuration information service
  • Dynamic load information service
  • Placement advice service
  • Task list manipulation service
  • Master selection service
  • Remote execution service
  • Remote file operation service
  • Administration service

Configuration information service

This set of function calls provides information about the Platform LSF cluster configuration, such as the hosts belonging to the cluster, the total amount of installed resources on each host (for example, number of CPUs, amount of physical memory, and swap space), special resources associated with individual hosts, and the types and models of individual hosts.

Such information is static and is collected by LIMs on individual hosts. By calling these routines, an application gets a global view of the distributed system. This information can be used for various purposes. For example, the Platform LSF command lshosts displays such information on the screen. LSF Batch also uses such information to know how many CPUs are on each host.

Flexible options are available for an application to select the information that is of interest to it.

Dynamic load information service

This set of function calls provides comprehensive dynamic load information collected periodically from individual hosts. The load information is provided in the form of load indices detailing the load on various resources of each host, such as CPU, memory, I/O, disk space, and interactive activity. Because a site-installed External LIM (ELIM) can optionally be plugged into the LIM to collect information that the LIM does not already collect, this set of services can be used to collect virtually any type of dynamic information about individual hosts.

Example applications that use such information include lsload and lsmon. This information is also valuable to an application making intelligent job scheduling decisions. For example, LSF Batch uses such information to decide whether or not a job should be sent to a host for execution.

These service routines provide a powerful mechanism for selecting the information that is of interest to the application.
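The kind of decision described above, whether a host is lightly loaded enough to receive a job, can be pictured as a threshold check over load indices. The following is a conceptual sketch in Python, not LSLIB code (LSLIB is a C library, and nothing here contacts a LIM). The index names r1m, mem, and swp follow common LSF load index names; the sample values, the thresholds, and the functions host_is_eligible and eligible_hosts are hypothetical.

```python
# Conceptual model only: this does not call LSLIB or contact a LIM.
# Sample load indices per host; all values are made up for illustration.
SAMPLE_LOAD = {
    "hostA": {"r1m": 0.3, "mem": 2048, "swp": 4096},
    "hostB": {"r1m": 3.7, "mem": 512, "swp": 1024},
    "hostC": {"r1m": 0.9, "mem": 128, "swp": 2048},
}

# Hypothetical thresholds: the run queue length must not exceed its limit,
# and free memory and swap (in MB) must not fall below theirs.
MAX_LIMITS = {"r1m": 1.0}
MIN_LIMITS = {"mem": 256, "swp": 512}


def host_is_eligible(load, max_limits, min_limits):
    """Return True if every load index is within its threshold."""
    within_max = all(load[name] <= limit for name, limit in max_limits.items())
    within_min = all(load[name] >= limit for name, limit in min_limits.items())
    return within_max and within_min


def eligible_hosts(load_table, max_limits, min_limits):
    """Return the sorted names of hosts that pass all thresholds."""
    return sorted(
        host
        for host, load in load_table.items()
        if host_is_eligible(load, max_limits, min_limits)
    )


print(eligible_hosts(SAMPLE_LOAD, MAX_LIMITS, MIN_LIMITS))
```

In this sample data, hostB is rejected for its run queue length and hostC for its free memory, so only hostA remains a candidate.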

Placement advice service

The Platform LSF Base API provides functions to select the best host among all the hosts. The selected host can then be used to run a job or to log in. Platform LSF provides a flexible syntax for an application to specify the resource requirements or criteria for host selection and sorting.

Many Platform LSF utilities use these functions for placement decisions, such as lsrun, lsmake, and lslogin. It is also possible for an application to get the detailed load information about the candidate hosts together with a preference order of the hosts.

A parallel application can ask for multiple hosts in one LSLIB call for the placement of a multi-component job.

The performance differences between different models of machines as well as the number of CPUs on each host are taken into consideration when placement advice is made, with the goal of selecting qualified hosts that will provide the best performance.
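To illustrate how machine speed and CPU count might enter a placement decision, the sketch below ranks candidate hosts by run queue length per CPU, discounted by a relative speed factor, and can return several hosts for a multi-component job. It is a conceptual Python model and does not call LSLIB. The ranking formula, the Host fields, and the functions score and place are assumptions made for illustration and are not LSF's actual placement algorithm.

```python
# Conceptual model only: the ranking formula is an assumption made for
# illustration and is not LSF's actual placement algorithm.
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    r1m: float         # 1-minute run queue length (sample value)
    ncpus: int         # number of CPUs on the host
    cpu_factor: float  # relative speed of the host model (higher is faster)


def score(host):
    """Lower is better: queue length per CPU, discounted by CPU speed."""
    return host.r1m / (host.ncpus * host.cpu_factor)


def place(hosts, count=1):
    """Return the names of the `count` best hosts, best first."""
    if count > len(hosts):
        raise ValueError("not enough candidate hosts")
    # Ties on score are broken by host name so the order is deterministic.
    ranked = sorted(hosts, key=lambda h: (score(h), h.name))
    return [h.name for h in ranked[:count]]


CANDIDATES = [
    Host("hostA", r1m=1.0, ncpus=1, cpu_factor=1.0),
    Host("hostB", r1m=2.0, ncpus=4, cpu_factor=2.0),
    Host("hostC", r1m=0.5, ncpus=1, cpu_factor=1.0),
]

print(place(CANDIDATES))           # single best host
print(place(CANDIDATES, count=2))  # two hosts for a multi-component job
```

Under this made-up formula, hostB ranks first even though it has the longest run queue, because its four faster CPUs leave more capacity per queued task.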

Task list manipulation service

Task lists are used to store default resource requirements for users. Platform LSF provides functions to manipulate the task lists and retrieve resource requirements for a task. This is important for applications that need to automatically pick up the resource requirements from a user's task list. The Platform LSF command lsrtasks uses these functions to manipulate a user's task list. Platform LSF utilities such as lstcsh, lsrun, and bsub automatically pick up the resource requirements of the submitted command line by calling these LSLIB functions.
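A task list can be pictured as a mapping from a task name to a resource requirement string, consulted with the program name taken from a command line. The sketch below is a conceptual Python model and does not call LSLIB. The entries in TASK_LIST, the requirement strings, and the functions add_task, remove_task, and resreq_for_command are all hypothetical.

```python
# Conceptual model only: a task list as a mapping from task name to a
# resource requirement string. The entries below are illustrative.
import os
import shlex

TASK_LIST = {
    "cc": "mem>100",
    "make": "r1m<1.0",
}


def add_task(task_list, name, resreq):
    """Add a task or replace its stored resource requirement."""
    task_list[name] = resreq


def remove_task(task_list, name):
    """Remove a task; return True if it was present."""
    return task_list.pop(name, None) is not None


def resreq_for_command(task_list, command_line, default=""):
    """Look up the requirement for the program named in a command line."""
    words = shlex.split(command_line)
    if not words:
        return default
    task = os.path.basename(words[0])  # "/usr/bin/cc" is looked up as "cc"
    return task_list.get(task, default)


print(resreq_for_command(TASK_LIST, "/usr/bin/cc -O2 main.c"))
```

A command whose program name is not in the list falls back to the default, which here is an empty requirement string.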

Master selection service

If your application needs some kind of fault tolerance, you can make use of the master selection service provided by the LIM. For example, you can run one copy of your application on every host and only allow the copy on the master host to be the primary copy and others to be backup copies. LSLIB provides a function that tells you the name of the current master host.

LSF Batch uses this service to achieve improved availability. As long as one host in the Platform LSF cluster is up, LSF Batch service will continue.
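The primary and backup pattern described above can be sketched as follows: every copy of the application asks which host is the current master and acts as primary only if that is its own host. This is a conceptual Python model and does not call LSLIB. The functions role_for and roles are hypothetical, and lookup_master stands in for whatever call reports the current master host.

```python
# Conceptual model only: the master lookup is passed in as a plain function
# and stands in for whatever call reports the current master host.

def role_for(local_host, get_master_name):
    """Return "primary" on the master host and "backup" elsewhere."""
    return "primary" if local_host == get_master_name() else "backup"


def roles(hosts, get_master_name):
    """Return the role each copy of the application would take."""
    return {host: role_for(host, get_master_name) for host in hosts}


HOSTS = ["hostA", "hostB", "hostC"]
current_master = {"name": "hostA"}  # simulated cluster state


def lookup_master():
    return current_master["name"]


print(roles(HOSTS, lookup_master))

# Simulate a failover: the master role moves to hostB, and the copy
# running there becomes the primary the next time it checks.
current_master["name"] = "hostB"
print(roles(HOSTS, lookup_master))
```

Because each copy re-checks the master name rather than caching its role, exactly one copy considers itself primary at any time in this model.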

Remote execution service

The remote execution service provides a transparent and efficient mechanism for running sequential as well as parallel jobs on remote hosts. The services are provided by the RES on the remote host in cooperation with the Network I/O Server (NIOS) on the local host. The NIOS is a per-application stub process that handles the details of terminal I/O and signals on the local side. LSLIB starts the NIOS automatically as needed.

RES runs as root and runs tasks on behalf of all users in the Platform LSF cluster. Proper authentication is handled by RES before running a user task.

Platform LSF utilities such as lsrun, lsgrun, ch, lsmake, and lstcsh use the remote execution service.

Remote file operation service

The remote file operation service allows load sharing applications to operate on files stored on remote machines. Such services extend the UNIX and Windows file operation services so that files that are not shared among hosts can also be accessed by distributed applications transparently.

LSLIB provides routines that extend UNIX and Windows file operations such as open(2), close(2), read(2), write(2), fseek(3), and stat(2).

The Platform LSF utility lsrcp is implemented with the remote file operation service functions.

Administration service

This set of function calls allows application programmers to write tools for administering the Platform LSF servers. The operations include reconfiguring Platform LSF clusters, shutting down a particular Platform LSF server on some host, restarting a Platform LSF server on some host, turning logging on or off, and locking or unlocking a LIM on a host.

The lsadmin utility uses the administration services.

LSF Batch API services

The LSF Batch API, LSBLIB, gives application programmers access to the job queueing and processing services provided by the LSF Batch servers. All LSF Batch user interface utilities are built on top of LSBLIB. The services that are available through LSBLIB include:

  • LSF Batch system information service
  • Job manipulation service
  • Log file processing service
  • LSF Batch administration service

LSF Batch system information service

This set of function calls allows applications to get information about LSF Batch system configuration and status, including host, queue, and user configurations and status.

The batch configuration information determines the resource sharing policies that dictate the behavior of LSF Batch scheduling.

The system status information reflects the current status of hosts, queues, and users of the LSF Batch system.

Example utilities that use the LSF Batch configuration information services are bhosts, bqueues, busers, and bparams.

Job manipulation service

The job manipulation service allows LSF Batch application programmers to write utilities that operate on user jobs. The operations include job submission, signaling, status checking, checkpointing, migration, queue switching, and parameter modification.

LSF Batch administration service

This set of function calls is useful for writing LSF Batch administration tools.

The LSF Batch command badmin is implemented with these library calls.


Document Information

Modified date:
23 June 2018

UID

isg3T1014341