Published on 10-Oct-2012
University of Groningen
Technical Computing, Automation, General Parallel File System (GPFS), High Availability, Virtualization, Virtualization - Server
To support its Target project, the University of Groningen needed to build a computing platform that could handle the massive data processing and storage requirements of large-scale research initiatives.
IBM designed a solution comprising an IBM® Intelligent Cluster™ built on IBM System x® servers with Intel® Xeon® processors, together with IBM System Storage® technology and IBM GPFS™.
The new infrastructure fully meets the enormous data processing and archiving requirements of the Target project, delivering extremely high throughput levels and excellent resource utilization.
Founded in 1614, the University of Groningen in the Netherlands is one of the oldest and largest research universities in Europe. In 2010 the University launched Target, an ambitious research project aimed at revolutionizing the management of very large amounts of data. Target is supported by Samenwerkingsverband Noord Nederland and Gemeente Groningen, and operates under the auspices of Sensor Universe. The project is financially supported by the European Fund for Regional Development, the Dutch Ministry of Economic Affairs (Pieken in de Delta), the Province of Groningen and the Province of Drenthe.
Under the Target project, a number of university research groups and commercial partners will collaborate to design and implement intelligent information systems capable of transforming huge amounts of raw scientific data into valuable knowledge.
Meeting the challenges of modern research
As today’s research projects push information technology to its limits, institutions like the University of Groningen are challenged to provide computing infrastructures capable of storing and processing the constantly increasing flood of data generated by large-scale research.
Edwin Valentijn, Target Program Manager and Head of OmegaCEN, explains: “Big data is incredibly important to the majority of research projects, but it is also problematic in that it creates a requirement for massive data storage and data processing capabilities. Target was born out of a need to provide a common platform that could meet these requirements, allowing scientists to collaborate on research and development for big data projects.”
Most research projects share a common need to host enormous amounts of data, but they must also perform highly complex and diverse analyses of that data. Any platform the University developed therefore had to integrate large-scale data processing and archiving while supporting a broad range of application data.
Solid partnership with IBM
“We had already been collaborating with IBM to develop a supercomputing environment for the Low Frequency Array (LOFAR) telescope. Here, we tackled similar challenges around processing very high volumes of data,” notes Valentijn. “That collaboration has been highly successful, so we were keen to partner with IBM once again.”
IBM Global Technology Services® worked with the Target team to design a powerful, scalable infrastructure based on IBM System x servers, IBM System Storage hardware and IBM General Parallel File System. The entire hardware infrastructure is hosted at the Donald Smits Center for Information Technology (CIT) at the University of Groningen.
Multi-tier storage and high-performance computing cluster
The core of the Target platform is an open-standards storage solution called the “test bed”, featuring five different storage pools that represent distinct environments, each with its own functionality and specifications. The use of several storage tiers helps to provide an effective balance between storage costs and performance, combining tape and disk storage to support a range of application data.
Specific storage devices were defined for each storage pool, which support different operational and performance requirements, and enable growth in volume and bandwidth as required. IBM donated two petabytes (PB) of initial storage space to the test bed, and the University extended the environment by a further 8 PB.
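The cost-versus-performance trade-off behind a multi-tier design can be illustrated with a minimal Python sketch. It assumes five hypothetical pools with made-up relative costs and bandwidths (the case study does not publish the test bed's pool specifications) and simply picks the cheapest pool that still meets a workload's bandwidth requirement.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StoragePool:
    """One storage tier. Every name and number here is hypothetical."""
    name: str
    media: str             # "disk" or "tape"
    cost_per_tb: float     # relative cost units, not real prices
    bandwidth_mb_s: float  # sustained bandwidth the tier can deliver


# Five illustrative tiers, from fastest and most expensive to slowest and cheapest.
POOLS: List[StoragePool] = [
    StoragePool("fast-disk", "disk", 10.0, 2000.0),
    StoragePool("work-disk", "disk", 6.0, 1000.0),
    StoragePool("bulk-disk", "disk", 3.0, 400.0),
    StoragePool("nearline-disk", "disk", 1.5, 150.0),
    StoragePool("archive-tape", "tape", 0.5, 80.0),
]


def cheapest_pool(required_mb_s: float, pools: List[StoragePool] = POOLS) -> StoragePool:
    """Return the lowest-cost pool whose bandwidth meets the requirement."""
    candidates = [p for p in pools if p.bandwidth_mb_s >= required_mb_s]
    if not candidates:
        raise ValueError(f"no pool delivers {required_mb_s} MB/s")
    return min(candidates, key=lambda p: p.cost_per_tb)


if __name__ == "__main__":
    print(cheapest_pool(500).name)
    print(cheapest_pool(50).name)
```

With these assumed figures, a workload needing 500 MB/s lands on the hypothetical `work-disk` tier, while a 50 MB/s archival stream goes to tape. A real deployment would also weigh capacity, latency and access patterns, not bandwidth alone.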
As it was necessary to stream data to the most suitable devices and storage pools, the IBM team deployed an IBM Intelligent Cluster solution comprising 58 IBM System x servers running SUSE Linux Enterprise Server. The IBM Intelligent Cluster leverages Intel Xeon processors for high processing density and speed.
The IBM System x servers are linked by a 10 Gigabit Ethernet cluster interconnect, while the storage components are connected to the servers through a combination of Fibre Channel and SAS switches. The IBM Intelligent Cluster solution primarily provides the backend storage layer for IBM GPFS, and also hosts a number of virtual machines, databases and user applications.
Managing complexity with IBM General Parallel File System
IBM GPFS was chosen as the file system layer for the storage infrastructure at Target. GPFS is a high-performance, clustered file system that offers fast, consistent access to a common set of file data. Currently, the Target GPFS handles approximately 380 million files and is one of the largest distributed file systems in the world.
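Parallel file systems such as GPFS gain much of their throughput by striping each file's blocks across the disks of a storage pool, so that a large read or write is served by many devices at once. The Python sketch below shows a simplified round-robin layout; the device names and block size are hypothetical, and the actual GPFS allocator is more sophisticated than this.

```python
from typing import Dict, List


def stripe_blocks(file_size: int, block_size: int, devices: List[str]) -> Dict[str, List[int]]:
    """Assign each block of a file to a device in round-robin order.

    Simplified model: real GPFS allocation also accounts for free space,
    failure groups and replication.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if not devices:
        raise ValueError("at least one device is required")
    n_blocks = -(-file_size // block_size)  # ceiling division
    layout: Dict[str, List[int]] = {d: [] for d in devices}
    for i in range(n_blocks):
        layout[devices[i % len(devices)]].append(i)
    return layout


if __name__ == "__main__":
    MiB = 1024 * 1024
    # Hypothetical device names; GPFS calls its disks NSDs (Network Shared Disks).
    print(stripe_blocks(10 * MiB, 1 * MiB, ["nsd1", "nsd2", "nsd3", "nsd4"]))
```

With a 10 MiB file, 1 MiB blocks and four devices, each device holds two or three of the ten blocks, so a sequential read can draw on all four devices in parallel.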
The automated lifecycle management capabilities of IBM GPFS help to simplify data management and provide excellent control over data placement. After being categorized by a number of parameters, including type and importance, data is assigned to the optimal storage pool based on its particular requirements. IBM GPFS also spreads data across multiple storage devices automatically, ensuring high-performance data access.
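Rule-driven placement of this kind can be sketched as an ordered list of rules evaluated first-match-wins, which is also how GPFS applies its file placement rules. This Python version is only an illustration: the pool names, the `importance` label, the `.fits` rule and the 90-day migration threshold are hypothetical assumptions, and real GPFS policies are written in the GPFS SQL-like policy language rather than in Python.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class FileInfo:
    name: str
    importance: str  # hypothetical label: "critical", "normal" or "scratch"


@dataclass(frozen=True)
class PlacementRule:
    name: str
    pool: str
    matches: Callable[[FileInfo], bool]


# Evaluated top to bottom; the first matching rule decides the pool.
RULES: List[PlacementRule] = [
    PlacementRule("critical", "fast-disk", lambda f: f.importance == "critical"),
    PlacementRule("raw-images", "bulk-disk", lambda f: f.name.lower().endswith(".fits")),
    PlacementRule("scratch", "work-disk", lambda f: f.importance == "scratch"),
    PlacementRule("default", "nearline-disk", lambda f: True),
]


def place(f: FileInfo, rules: List[PlacementRule] = RULES) -> str:
    """Return the pool chosen by the first rule that matches the file."""
    for rule in rules:
        if rule.matches(f):
            return rule.pool
    raise LookupError(f"no placement rule matched {f.name!r}")


def migration_target(days_since_access: int, threshold_days: int = 90) -> Optional[str]:
    """Lifecycle step (hypothetical): files idle past the threshold move to tape."""
    return "archive-tape" if days_since_access > threshold_days else None


if __name__ == "__main__":
    print(place(FileInfo("survey_0001.fits", "normal")))
    print(place(FileInfo("survey_0001.fits", "critical")))
```

Rule order matters: a `.fits` file marked critical goes to `fast-disk` because the importance rule is checked first, and the catch-all `default` rule guarantees every file receives a pool.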
Hans Gankema, Project Manager for GPFS and Target Facility Manager, comments: “The Target infrastructure is very complex, and IBM GPFS helps to manage that complexity and hide it from our end-users. All they see is a huge, generic storage environment because GPFS unites storage resources with different characteristics into a single file system. The solution is also highly flexible and scalable, which means that as the project progresses it will be easy for us to expand our capacity enormously without disruption or loss of performance.”
Supporting diverse data requirements and high processing speeds
A number of applications are now in the initial stages of leveraging the new infrastructure. For example, the LifeLines initiative is using the Target test bed to expand its research database, which stores health information from 165,000 project participants. The aim is to collect genotype and phenotype data from patients over a period of 30 years, analyze it and find ways to promote healthy aging.
Considerable gains have also been made in terms of throughput. “The levels of throughput that we have been seeing with the new IBM storage infrastructure are phenomenal,” states Valentijn. “For example, we are using Target to manage data from the largest optical survey telescope in the world – the VLT Survey Telescope – located in Chile and designed for surveying the sky in visible light.
“We needed to be able to process data twice as fast as it was captured by the telescope, in order to keep up with the approximately 100 GB of imagery that is produced each day. With Target, we can now process data ten times as fast as it is captured, which is a major achievement. This survey data will be extremely valuable in helping us to map dark matter and improve our understanding of how galaxies evolve.”
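The speed-ups in the quote can be checked with simple arithmetic. This Python sketch assumes roughly 100 GB of imagery per day (the figure quoted), decimal units (1 GB = 1,000 MB) and an even load across the day, which a real telescope pipeline will not have.

```python
HOURS_PER_DAY = 24
MB_PER_GB = 1000  # decimal units assumed


def processing_window_hours(speedup: float) -> float:
    """Hours needed to process one day's data at `speedup` times the capture rate."""
    return HOURS_PER_DAY / speedup


def average_rate_mb_s(daily_gb: float, speedup: float) -> float:
    """Average effective processing rate needed to finish within that window."""
    seconds = processing_window_hours(speedup) * 3600
    return daily_gb * MB_PER_GB / seconds


if __name__ == "__main__":
    daily_gb = 100  # approximate daily imagery, as quoted
    for speedup in (1, 2, 10):
        print(f"{speedup:>2}x: {processing_window_hours(speedup):5.1f} h, "
              f"{average_rate_mb_s(daily_gb, speedup):5.2f} MB/s")
```

At the original 2x target, a day's imagery must be processed within 12 hours; at the achieved 10x rate the same ~100 GB is processed in about 2.4 hours, an average effective rate of roughly 11.6 MB/s.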
Breaking new ground
With the Target infrastructure now in place, the University of Groningen and its project partners will have the ability to break new ground in scientific research and collaborate on an unprecedented scale.
Valentijn concludes, “Target offers us exciting new opportunities to control and manage the avalanche of data being produced by modern research projects, and to ultimately turn that data into valuable information. The infrastructure that has been put in place by IBM Global Technology Services will play a pivotal role in helping us to realize these goals.
“It is challenging to work on a project that truly pushes the boundaries of scientific research, but also highly rewarding. In IBM, we have a partner who shares the excitement of tackling these challenges and developing the innovative technology that will power the intelligent information systems of tomorrow and enhance our understanding of the world around us.”
Products and services used
IBM products and services that were used in this case study.
Intelligent Cluster, Intelligent Cluster running Linux - SUSE, Storage: DS3200, Storage: DS5300, Storage: DCS3700, Storage: EXP3000, Storage: TS3500 Tape Library, System x, System x: System x running Linux - SUSE, System x: System x3550 M3, System x: System x3650 M3, System x: System x3850 X5
Novell SUSE Linux
© Copyright IBM Corporation 2012 IBM Corporation Systems and Technology Group Route 100 Somers, NY 10589 Produced in the United States of America October 2012 IBM, the IBM logo, ibm.com, Intelligent Cluster, Global Technology Services, GPFS, System Storage, System x, and Tivoli are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml Intel, the Intel logo, Xeon and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates. The performance data and client examples cited are presented for illustrative purposes only. Actual performance results may vary depending on specific configurations and operating conditions. THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided. Actual available storage capacity may be reported for both uncompressed and compressed data and will vary and may be less than stated. XSC03128-USEN-00