Durham University plays a starring role in computational cosmology

Building a powerful and efficient cluster for big data research projects with IBM System x iDataPlex

Published on 15-May-2013

"The IBM Platform Computing software is more stable and user-friendly than our old open source software. One of our objectives is to sell HPC capacity to business customers, and we feel that these tools will put us in a better position to deliver professional levels of service." - Dr Lydia Heck, Senior Computer Manager, Institute for Computational Cosmology, Durham University

Customer:
Durham University

Industry:
Education

Deployment country:
United Kingdom

Solution:
Technical Computing, Business Resiliency, Energy Efficiency, General Parallel File System (GPFS), High Availability

IBM Business Partner:
OCF

Overview

Established in 2002, Durham University’s Institute for Computational Cosmology (ICC) has become a leading international centre for research into the origin and evolution of the universe, using high-performance computing (HPC) clusters to simulate cosmological events and answer some of the most fundamental questions in science: What were the first objects in the Universe? How do galaxies form? What is the nature of dark matter and dark energy? Where does the large-scale structure of the universe come from? What is the fate of the Universe?

Business need:
To maintain the UK’s high standing in fields such as cosmology, Durham University’s Institute for Computational Cosmology (ICC) needed an HPC facility capable of handling big data research projects.

Solution:
The ICC created COSMA5, a cluster built on IBM® System x® iDataPlex® and IBM Platform Computing software that is capable of processing huge datasets to simulate the formation of structure in the universe and to solve problems in many other fields.

Benefits:
Runs four times as fast as the previous-generation cluster, enabling more sophisticated simulations. Achieves over 90 percent efficiency on the LINPACK benchmark. Draws just 1.2 watts of data-centre electricity for every watt used by the cluster itself (a PUE of around 1.2).

Case Study

Dr Lydia Heck, Senior Computer Manager at the ICC, comments: “Our research helps to maintain the UK’s position as a leader in the fields of cosmology and astronomy. In partnership with other HPC centres around the country, and with international consortia such as Virgo, we have run projects whose results have underpinned some of the most important cosmological research papers of the past decade.”

Winning the bid for DiRAC 2
Following the success of the original DiRAC (Distributed Research utilising Advanced Computing) initiative, the government allocated a second round of funding for a new generation of HPC investment, known as DiRAC 2.

The ICC was keen to become a centre for DiRAC 2, and made a successful bid for funds to create a new cluster, which would be known as COSMA5. A key factor in the bid’s success was that Durham University was able to fund an upgrade to the power supply in its HPC machine room, providing a world-class facility with the power, cooling and physical attributes required for the latest generation of HPC clusters.

Designing clusters for specific workloads
“The strategy behind DiRAC 2 is that the clusters are designed for different types of research problems,” says Dr Heck. “COSMA5 is optimised for ‘big data’ projects, while, for example, the Leicester cluster and the Cambridge HPCS cluster are designed for optimal high-performance interprocess communication, and the Cambridge Cosmos system for shared memory. Running the right code on the right platforms makes a huge difference to efficiency, as well as enabling the UK’s universities to tackle a wider range of projects.

“To make sure COSMA5 would deliver the petascale storage and data-processing power required for our role in DiRAC 2, we needed to put the best possible infrastructure in place. We performed a full EU procurement exercise to select the right hardware and implementation partners to help us build the cluster, and a joint proposal from IBM and OCF achieved the highest score of all the bids we received. Their success was not just based on price and technical capability, but also on support and service levels – and they had good references to support their proposal, which made us confident that they could deliver what we needed.”

Building COSMA5
Following the completion of the power supply upgrade, the IBM and OCF teams worked closely with the ICC to install the new cluster, which comprises 420 IBM System x iDataPlex dx360 M4 nodes with a total of 6,720 processor cores and 53,760 GB of memory, plus three IBM System x3750 M4 servers that act as login and development nodes.
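
As a quick sanity check on those headline figures, the short sketch below works out the per-node resources implied by the totals. The assumption that resources are spread evenly across identical compute nodes is ours, although the result (16 cores and 128 GB of memory per node) is consistent with a two-socket dx360 M4 configuration.

    # Per-node breakdown implied by the COSMA5 totals quoted above.
    # Assumes an even split across identical compute nodes (our assumption).
    nodes = 420
    total_cores = 6_720
    total_memory_gb = 53_760

    cores_per_node = total_cores // nodes          # 16 cores per node
    memory_per_node_gb = total_memory_gb // nodes  # 128 GB per node

    print(f"{cores_per_node} cores and {memory_per_node_gb} GB of memory per node")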

DDN disk systems provide 2.4 PB of storage, which is managed by the IBM General Parallel File System (GPFS™) and backed up by IBM Tivoli® Storage Manager.

Performance and energy efficiency
“COSMA5 is four times as fast as the previous-generation COSMA4 cluster – achieving sustained performance of 126 TeraFLOPS at 91 percent efficiency on the LINPACK benchmark,” says Dr Heck. “It is also 28 times as fast as the old air-cooled COSMA3 cluster, which we have now retired and removed from the machine room.

“Replacing COSMA3 with a more powerful water-cooled cluster means that our total HPC landscape is much more energy-efficient: the machine itself not only uses considerably less electricity per FLOP, it also requires no air conditioning.”

The ICC now needs no air chillers; all of the cooling requirements are met by the iDataPlex water-cooling system. As a result, the machine room’s power usage effectiveness (PUE) has been reduced to around 1.2, which means that more than 80 percent of the electricity used by the room goes directly to the clusters themselves, rather than powering ancillary systems.
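
To make the two efficiency figures concrete, the short calculation below, a sketch using only the numbers quoted in this case study, shows how the 91 percent LINPACK efficiency implies a theoretical peak of roughly 138 teraflops, and why a PUE of around 1.2 means more than 80 percent of the room's electricity reaches the clusters.

    # Figures quoted for COSMA5 in this case study.
    sustained_tflops = 126.0   # measured LINPACK performance (Rmax)
    linpack_efficiency = 0.91  # fraction of theoretical peak achieved

    # Theoretical peak implied by the two figures above: Rpeak = Rmax / efficiency.
    peak_tflops = sustained_tflops / linpack_efficiency
    print(f"Implied theoretical peak: {peak_tflops:.0f} TFLOPS")  # ~138 TFLOPS

    # PUE is total facility power divided by power delivered to the IT equipment,
    # so its reciprocal is the share of electricity that reaches the clusters.
    pue = 1.2
    it_share = 1.0 / pue
    print(f"Share of machine-room power reaching the clusters: {it_share:.0%}")  # ~83%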

Dr Heck has also been impressed with the reliability of the cluster: “We have reached a stable state quite quickly. If I look at the cluster right now, all but one of the 420 nodes are currently operational, and all but ten of them have been running since the most recent system update and subsequent reboot.”

Easier cluster management
With COSMA5, the ICC has also adopted new cluster-management software: IBM Platform HPC, which includes IBM Platform MPI for communication between processes and IBM Platform LSF® for workload management.
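
As an illustration of how a researcher might hand work to the workload manager, the sketch below composes and submits a simple LSF batch job from Python. The queue name, job name, core count and executable are hypothetical placeholders rather than details of the ICC’s actual configuration; only the bsub command and its standard options belong to IBM Platform LSF.

    import subprocess

    # Hypothetical job parameters: placeholders, not the ICC's real setup.
    cores = 256
    queue = "cosma"           # placeholder queue name
    job_name = "structure-formation-test"
    executable = "./galaxy_sim"

    # bsub is LSF's job-submission command: -n requests processor slots,
    # -q selects a queue, -J names the job and -o sets the output file
    # (%J expands to the LSF job ID).
    cmd = [
        "bsub",
        "-n", str(cores),
        "-q", queue,
        "-J", job_name,
        "-o", f"{job_name}.%J.out",
        f"mpirun {executable}",
    ]
    subprocess.run(cmd, check=True)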

“The IBM Platform Computing software is more stable and user-friendly than our old open source software,” comments Dr Heck. “One of our objectives is to sell HPC capacity to business customers, and we feel that these tools will put us in a better position to deliver professional levels of service.”

Powering scientific progress
Meanwhile, the scientific community is already working with COSMA5. A cosmological research project known as “EAGLE” is using the cluster to model galaxy formation. The UK magnetohydrodynamics community is also harnessing the cluster to investigate star formation and interstellar clouds.

Dr Heck concludes: “COSMA5 is playing an active role in the advancement of cosmology and astronomy, helping the UK maintain its position as a leader in these areas. We’re excited by its potential to support academic and commercial science projects across the country and around the world.”

Legal Information

© Copyright IBM Corporation 2013

IBM United Kingdom Limited
PO Box 41, North Harbour
Portsmouth
Hampshire PO6 3AU

Produced in the United Kingdom, May 2013

IBM, the IBM logo, ibm.com, GPFS, iDataPlex, LSF, System x and Tivoli are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at: www.ibm.com/legal/copytrade.shtml.

IBM and OCF are separate companies and each is responsible for its own products. Neither IBM nor OCF makes any warranties, express or implied, concerning the other’s products.

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates. The client examples cited are presented for illustrative purposes only. Actual performance results may vary depending on specific configurations and operating conditions.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

Actual available storage capacity may be reported for both uncompressed and compressed data and will vary and may be less than stated.