Published on 01-Dec-2010
Validated on 07 Jun 2012
"Our work with IBM GPFS demonstrates the viability and usefulness of a global file system for collaboration and data-sharing between supercomputing centers—even when the individual supercomputing clusters are based on very different technical architectures. The flexibility of GPFS and its ability to support all the different DEISA supercomputers is highly impressive." - Mr. Stefan Heinzel, Director of the Rechenzentrum Garching at the Max Planck Society
Customer:
DEISA (Distributed European Infrastructure for Supercomputing Applications)
Industry:
Government
Deployment country:
Germany
Solution:
IT/infrastructure, Deep Computing, General Parallel File System (GPFS)
Overview
The Distributed European Infrastructure for Supercomputing Applications (DEISA) is a consortium of leading national supercomputing centers that aims to foster pan-European scientific research. The 15 partners are located in 10 countries and separated by great distances, from Scotland, Sweden and Finland in the north to Italy and Spain in the south.
Business need:
DEISA wanted to advance European computational science through close collaboration between Europe’s most important supercomputing centers by supporting challenging computing tasks and sharing data across a wide-area network.
Solution:
To allow different IBM and non-IBM supercomputer architectures to access data from across a wide-area network, DEISA worked closely with the IBM Deep Computing team to create a global multicluster file system based on IBM General Parallel File System (GPFS™).
Benefits:
Enables scientists in different countries to share supercomputing resources and allocate computing tasks to the most suitable systems. Provides a rapid, reliable and secure shared file system for a variety of supercomputing architectures.
Case Study
The Distributed European Infrastructure for Supercomputing Applications (DEISA) is a consortium of leading national supercomputing centers that aims to foster pan-European scientific research. The 15 partners are located in 10 countries and separated by great distances—from Scotland, Sweden and Finland in the north, to Italy and Spain in the south. DEISA’s task, funded by the European Commission’s Seventh Framework Program, is to design, develop and enhance an infrastructure that will help these partners work together more effectively.
Supporting a heterogeneous environment
“Several of the members of DEISA run supercomputing clusters that are based on IBM Power Systems™, PowerPC® or Blue Gene®/P hardware, while the others use a variety of architectures including Cray XT5/6, SGI Altix and NEC SX-9,” comments Dr. Stefan Heinzel, Director of the Rechenzentrum Garching at the Max Planck Society, which is one of the principal partners of DEISA. “Creating an infrastructure that would allow all these different supercomputing architectures to share data effectively across distances of several thousand kilometers was a significant challenge, and something that very few organizations had ever attempted before.”
The sites were connected via the pan-European GÉANT communications infrastructure—but the creation of the physical network was not the only challenge DEISA faced. To develop an effective distributed supercomputing infrastructure, one of the main requirements was to build a shared file system that would allow each center to access data from the other locations.
Making the most efficient use of resources
“Different research projects have different computational needs, and are suited to the characteristics of different supercomputing clusters,” explains Dr. Hermann Lederer, Deputy Director of the Rechenzentrum Garching. “So, for example, it might be more efficient for a team of German scientists to run some parts of a specific project on one of the French supercomputing clusters, or vice versa. We wanted to make it possible to route the jobs from one center to another for processing, and then bring the results home again, without involving the end-users in the complexity of managing the infrastructure.”
Mr. Heinzel adds: “To achieve this, we needed a way to share large files across the wide-area network, and the best option was IBM General Parallel File System, or GPFS. Many of the DEISA partners had already been using GPFS within their own clusters, and we also had confidence in IBM’s expertise in deep computing solutions. In fact, it was DEISA’s close relationship with IBM that stimulated the development of many of the GPFS features that we are using today.”
Boosting reliability
The first major challenge was to ensure that the infrastructure was reliable enough to support international projects without having any negative impact on local research.
“We had to ensure that the individual supercomputing clusters were autonomous, so that a failure on one cluster wouldn’t affect the others,” explains Dr. Lederer. “IBM helped us by considerably enhancing the multiclustering capabilities of GPFS and the client-side reliability. It was essential to reengineer the communication patterns so that compute nodes in one cluster wouldn’t need to communicate with compute nodes in another cluster, which helped to improve reliability and performance.”
Solving network communication challenges
A second issue was the solution’s reliance on the network infrastructure. Even a minor problem with the wide-area network could cause I/O errors that disrupted long-running computing jobs. To resolve this, DEISA decided to store all job-related data on a local file system for the duration of the job, and then copy it to the global GPFS file system at the end of the job.
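The local-storage-then-copy approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DEISA's actual tooling: the function name run_job_with_stage_out, the job callable and the retry parameters are all hypothetical, and global_dir simply stands in for a directory on the shared wide-area file system.

```python
import shutil
import tempfile
import time
from pathlib import Path


def run_job_with_stage_out(job, global_dir, retries=3, delay_s=1.0):
    """Run `job` against local scratch, then copy its output to `global_dir`.

    `job` is a hypothetical callable that writes its files into the
    directory it is given. `global_dir` stands in for a directory on the
    shared wide-area file system (an assumption for illustration).
    """
    global_dir = Path(global_dir)
    scratch = Path(tempfile.mkdtemp(prefix="job_scratch_"))

    # During the job, all I/O touches only the local file system, so a
    # brief wide-area network problem cannot interrupt the computation.
    job(scratch)

    # Stage out once, at the end of the job. Retry a few times in case
    # the shared file system is briefly unreachable.
    for attempt in range(1, retries + 1):
        try:
            shutil.copytree(scratch, global_dir, dirs_exist_ok=True)
            break
        except OSError:
            if attempt == retries:
                raise  # scratch is left in place so no results are lost
            time.sleep(delay_s)

    # Remove the local copy only after the stage-out has succeeded.
    shutil.rmtree(scratch)
    return sorted(p.name for p in global_dir.iterdir())
```

The design choice worth noting is that the scratch directory is deleted only after a successful copy, so a network outage at the end of a long job costs a retry rather than the results. The sketch requires Python 3.8 or later for the dirs_exist_ok argument.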
“In the future, we are interested in testing a new IBM technology called Panache, which serves as a clustered file system cache for parallel data-intensive applications that require wide-area file access,” says Mr. Heinzel. “If it works as intended, it should help to further combat WAN latencies and outages by storing all updates of data and metadata in the cache, and then asynchronously replicating them to the remote GPFS cluster.”
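The general idea Mr. Heinzel describes, writing updates to a cache first and replicating them to the remote side asynchronously, can be illustrated with a toy Python sketch. It is not Panache and does not use any GPFS interface: the class WriteBackCache, its methods and the two directories are hypothetical stand-ins, and only flat file names (no subdirectories) are handled.

```python
import queue
import shutil
import threading
from pathlib import Path


class WriteBackCache:
    """Toy write-back cache: writes land locally, replication is async.

    `cache_dir` and `remote_dir` are hypothetical stand-ins for a local
    cache and a remote file system; this is not how Panache is built.
    """

    def __init__(self, cache_dir, remote_dir):
        self.cache_dir = Path(cache_dir)
        self.remote_dir = Path(remote_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.remote_dir.mkdir(parents=True, exist_ok=True)
        self.failed = []  # names whose replication raised OSError
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._replicate, daemon=True)
        worker.start()

    def write(self, name, data):
        # The caller returns as soon as the local write completes;
        # remote latency or a short outage does not block it.
        (self.cache_dir / name).write_bytes(data)
        self._pending.put(name)

    def _replicate(self):
        # Background thread: push queued updates to the remote side.
        while True:
            name = self._pending.get()
            try:
                shutil.copy2(self.cache_dir / name, self.remote_dir / name)
            except OSError:
                self.failed.append(name)
            finally:
                self._pending.task_done()

    def flush(self):
        # Block until every queued update has been attempted.
        self._pending.join()
```

A caller would invoke write() during the computation and flush() once at the end, then inspect failed to decide whether any file needs to be replicated again.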
Advances in science and technology
By working closely with the IBM Deep Computing team, DEISA has succeeded in creating a distributed supercomputing infrastructure that supports scientists and researchers at companies and institutions throughout Europe. Each year, the DEISA Extreme Computing Initiative (DECI) calls for proposals for “grand challenge” projects that use the DEISA infrastructure to enable groundbreaking research in all areas of science and technology. The most promising projects are selected by the DEISA Executive Committee and granted access to DEISA computing resources.
DECI projects have already enabled great advances in a wide range of scientific fields. One notable example is nuclear fusion research.
“The fusion community has benefited from over 10 million hours of computer resources under DEISA in the last couple of years, via the different organizations and projects,” says Dr. Duarte Borba, a researcher at the European Fusion Development Agreement (EFDA). “This has allowed significant progress to be made in the use of large-scale computer resources for fusion applications. DEISA facilitates access to a diverse set of computer architectures, which has created new opportunities for the fusion community.”
DEISA has also started working on the development of an intercontinental distributed supercomputing architecture. By working with TeraGrid and leveraging the robustness of GPFS, DEISA has been able to demonstrate the viability of collaboration between sites in Europe and the USA, linking supercomputing resources across enormous physical distances—from Bologna, Italy to San Diego, California.
Building a strong relationship
“We have come a long way since the first launch of DEISA,” comments Dr. Lederer. “Thanks to dedicated support from IBM and collaboration with our colleagues across Europe, we have created a reliable, high-performance infrastructure that has contributed to many important scientific projects.”
Mr. Heinzel concludes: “Our work with IBM GPFS demonstrates the viability and usefulness of a global file system for collaboration and data sharing between supercomputing centers—even when the individual supercomputing clusters are based on very different technical architectures. The flexibility of GPFS and its ability to support all the different DEISA supercomputers is highly impressive.”
Products and services used
IBM products and services that were used in this case study.
Hardware:
Power Systems
Software:
General Parallel File System
Legal Information
© Copyright IBM Corporation 2010
IBM Systems and Technology Group
Route 100
Somers, New York 10589
U.S.A.
Produced in the United States of America
November 2010
All Rights Reserved
IBM, the IBM logo, ibm.com, AIX, Blue Gene/P, GPFS and Power Systems are trademarks of International Business Machines Corporation in the United States, other countries or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml
Other company, product and service names may be trademarks or service marks of others.
References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates. Offerings are subject to change, extension or withdrawal without notice.
All client examples cited represent how some clients have used IBM products and the results they may have achieved.
The information in this document is provided “as-is” without any warranty, either expressed or implied.