Published on 04-Dec-2012
"With the very rich way in which Platform RTM allows us to view grid performance data, we can demonstrate how users are getting their fair share of the infrastructure and identify any sharing conflicts." - A company representative
Global Pharmaceutical Firm
Two main user groups in R&D share a single, worldwide computing grid.
Both groups were concerned they were not getting timely access to resources.
Find a tool capable of monitoring grid activity and job execution that met group SLAs.
Platform RTM helped IT determine when to add hardware to increase grid capacity.
Users are satisfied with the grid performance and acknowledge the benefit of sharing grid resources.
A collaborative effort that enables users to maximize their ability to do work, and helps IT manage the computing infrastructure.
The Global Research & Development (GRD) group at a global leader in the pharmaceutical industry depends heavily on the company’s worldwide computing cloud for the vast amounts of processing power needed to run sophisticated software applications that support their drug development activities. The organization discovers, develops, manufactures and markets leading prescription drugs for conditions such as arthritis, high cholesterol and penile dysfunction. The GRD organization consists primarily of Research: Computational Science, which discovers new drugs through the process of creating new molecular compounds, and Development: Pharmacometrics, which brings the new drugs to market. Both groups rely heavily on computers and advanced computational software to support their activities.
To manage the application processing workload across this dynamic data center, the company uses IBM® Platform™ LSF®, intelligent, policy-driven workload management software for distributed computing environments. The two user groups were concerned about sharing a single grid. Each believed that the other’s workload was interfering with its ability to access the computing resources it needed to complete its own work on time. Caught in the middle, the IT organization needed a way to demonstrate that it could meet the SLAs established with both groups and that each group was getting better application throughput and performance than it could get from its own dedicated computing cluster.
In-depth visibility into job operations and performance
To optimize utilization of computing hardware and to provide maximum performance for the applications used by both communities, the company employs a cloud computing model that links 400 CPUs across different geographies into a single, centrally managed resource. Platform LSF workload management software manages the batch and parallel application processing workload across this computing cluster.
In spite of an innovative approach to implementing job priority queues, however, the IT organization was buffeted by complaints from both R&D groups that their needs for timely access to adequate computing resources were not being met and that they should each have their own dedicated computing cluster. IT Management needed a way to demonstrate to both groups that they were in fact experiencing better application performance from the shared environment than they would if they each had their own dedicated cluster. IT also needed a way to determine when they needed to add more computers to the Platform LSF cluster to meet growing business requirements.
The company chose IBM Platform RTM, a reporting, tracking and monitoring solution, to provide in-depth visibility into the detailed job operations and performance within its Platform LSF computing grid. Platform RTM enables the company to make timely decisions for proactively managing computing resources against business demand and to demonstrate to both user groups that SLAs are being met. It also makes it easier to diagnose application performance problems, and helps IT develop job submission guidelines to maximize job throughput and performance.
A key part of what Platform RTM provides is the Contention Report, which allows individual users to see whether they are getting their ‘fair share’ of the computing resources and to what extent their jobs are being preempted by other jobs. This report also enables users to estimate when their jobs will be completed.
By graphically depicting how the grid resources are being used over time, Platform RTM’s SLA Report makes it possible for IT to actively manage the computing cluster, while providing a tool that enables them to demonstrate that they are meeting the SLAs established with each user group.
The Completion Factor Report, which shows how long it takes jobs to run on the grid, and compares this against a theoretical, standalone grid of 100 dedicated CPUs, enables IT to show users that they are getting better service delivery from a shared grid than they would if they each had their own cluster. The report is also a useful capacity planning tool that helps IT determine when they need to add more CPUs to meet growing business requirements.
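The comparison behind a report of this kind can be illustrated with a small simulation. The sketch below is not Platform RTM's implementation, and the case study does not give the formula. It assumes a hypothetical definition in which the completion factor is a group's mean job turnaround on a dedicated 100-CPU cluster divided by its mean turnaround on the shared 400-CPU grid, with single-CPU jobs scheduled first come, first served. The function names, the job tuples and the workload figures are all invented for illustration.

```python
import heapq
from collections import defaultdict
from statistics import mean


def simulate(jobs, cpus):
    """Schedule single-CPU jobs first come, first served on `cpus` CPUs.

    jobs: iterable of (submit_time, runtime, group) tuples (hypothetical format).
    Returns {group: [turnaround, ...]}, where turnaround = finish - submit.
    """
    free_at = [0.0] * cpus          # min-heap: time at which each CPU is next free
    heapq.heapify(free_at)
    turnaround = defaultdict(list)
    for submit, runtime, group in sorted(jobs, key=lambda job: job[0]):
        start = max(submit, heapq.heappop(free_at))  # wait for the earliest free CPU
        finish = start + runtime
        heapq.heappush(free_at, finish)
        turnaround[group].append(finish - submit)
    return turnaround


def completion_factor(jobs, group, shared_cpus=400, dedicated_cpus=100):
    """Assumed definition: dedicated-cluster turnaround / shared-grid turnaround.

    A value above 1 means the group's jobs finish sooner on the shared grid.
    """
    shared = mean(simulate(jobs, shared_cpus)[group])
    own_jobs = [job for job in jobs if job[2] == group]
    dedicated = mean(simulate(own_jobs, dedicated_cpus)[group])
    return dedicated / shared


if __name__ == "__main__":
    # Invented workload: both groups submit one-hour jobs at time 0.
    research = [(0.0, 1.0, "research")] * 200
    development = [(0.0, 1.0, "development")] * 50
    all_jobs = research + development
    for name in ("research", "development"):
        print(name, completion_factor(all_jobs, name))
```

In this invented workload, the 200 research jobs all start at once on the shared grid but would run in two waves on 100 dedicated CPUs, so their factor is 1.5. The 50 development jobs fit on either configuration and score 1.0. A factor that drifts toward or below 1 for every group would suggest the shared grid is running out of headroom.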
Together, these three reports make it easier for IT to manage the grid and to demonstrate that it is meeting the SLAs established with each user community. They also help diagnose grid performance problems and establish job submission guidelines to maximize job turnaround and performance.
“I find myself going into the Platform RTM site every day just to get the warm fuzzies that everything is operating properly, because it gives me that instant view into how things are working,” says a company representative.
For more information
To learn more about IBM Platform Computing, please contact your IBM marketing representative or IBM Business Partner, or visit the following website: ibm.com/platformcomputing
Products and services used
IBM products and services that were used in this case study.
Platform LSF, Platform RTM
© Copyright IBM Corporation 2012. IBM Corporation, Systems and Technology Group, Route 100, Somers, NY 10589. Produced in the United States of America, May 2012. IBM, the IBM logo, ibm.com, Platform Computing, Platform LSF and Platform RTM are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide.