Penn State slashes backup time by 80 percent

Published on 19-Dec-2012

"With the RamSan we literally just turned it on and that’s all the performance tuning we did. It just worked out of the box." - Michael Fenn, Systems Administrator, Penn State

Customer:
Penn State University

Industry:
Education

Deployment country:
United States

Solution:
General Parallel File System (GPFS)

Overview

The Research Computing and Cyberinfrastructure (RCC) group at Penn State University provides systems and services that are used extensively in computational and data-driven research and teaching by more than 5,000 faculty members and students.

Business need:
RCC at Penn State needed to speed up nightly backups

Solution:
Use of a Flash array instead of continuing to increase the number of traditional disk spindles

Benefits:
· Instant 6 times performance increase in nightly backup
· Two 1U Flash arrays replaced 200 power-hungry 15K hard disks
· Reduced power consumption by 90 percent
· Flash storage system can scale to 10 TB
· Flash delivers affordable high speed and reliability
· Requires less power to operate, produces less heat and consumes less rack space

Case Study

The Research Computing and Cyberinfrastructure (RCC) group at Penn State University provides systems and services that are used extensively in computational and data-driven research and teaching by more than 5,000 faculty members and students.

The situation

Research initiatives slowed by HDD

To deliver on that mission, RCC operates computational clusters composed of 1,000 servers with 10,000 CPU cores. RCC’s storage infrastructure consists of a cluster of 14 servers running Red Hat Enterprise Linux 5. The servers are connected through 8 Gbps Fibre Channel HBAs to 900 TB of disk storage on a Storage Area Network (SAN) with that data served out over Ethernet to the computational clusters.

The RCC storage cluster is built on IBM® General Parallel File System (GPFS), a highly scalable clustered file system that uses file metadata for policy-based active management of files, including backup, migration, archiving and index tagging. Processing that metadata on an ongoing basis also places an extreme random I/O workload on the storage infrastructure. The GPFS metadata files are small, only 512 bytes each, so capacity is not the issue. Rather, the IOPS required to process millions of files on an ongoing basis were overwhelming Penn State RCC’s SAN infrastructure.

The challenge

Speed up nightly backups

RCC had allocated two hundred 15K RPM hard disk drives to handle the metadata processing that managed the nightly backup of user data. The group had to overprovision capacity of the drives in an attempt to generate acceptable IOPS to complete the backup operation in a brief overnight window so as not to impact production processes.

Despite allocating a massive number of disks to the operation, the nightly incremental backups were still taking up to 6 hours and degrading other system operations.

The solution
High-performance, reliable RamSan-810 flash storage

The team at RCC determined that a solid state storage solution was needed to effectively handle the IOPS load and replace the HDDs, which had been overprovisioned in an attempt to boost IOPS.

“We didn’t need the capacity but we did need the performance,” said Vijay Agarwala, Sr. Director for Penn State's RCC.

“There were operations, such as nightly backups which have to check the status of all the files in the system, that were being bottlenecked by the performance of those disks. So it made sense to go with a Flash media array rather than continuing to throw traditional disk spindles at the problem,” said Jason Holmes, lead systems administrator at RCC.

The RCC team evaluated SSD products from four vendors. It was not an easy process: RCC administrators struggled with most of the SSD systems, tweaking and tuning them in an attempt to wring out anything close to the promised performance. Their experience with the RamSan-810 SSD from Texas Memory Systems (TMS), an IBM Company, was completely different.


The result
Instant six times performance boost

“With some of the other solutions we tested, we poked and pried at them for weeks to get the performance where the vendors claimed it should be,” said Michael Fenn, systems administrator at Penn State. “With the RamSan we literally just turned it on and that’s all the performance tuning we did. It just worked out of the box.”

Penn State tested a pair of RamSan-810 1U flash storage systems in a high-availability mirrored configuration that took advantage of the replication functionality of GPFS. The RamSan units each contain 2 TB of solid state Flash storage and can be scaled up to 10 TB.

The RamSan-810 is the first Flash storage system from TMS to use Enterprise Multi Level Cell (eMLC) Flash memory, a new technology that pairs high speed and reliability with the affordability of MLC. eMLC is well suited to read-intensive environments such as Penn State’s. According to Holmes, application usage at RCC has an 85/15 read-write ratio, so eMLC offered better value than a Single Level Cell system such as the RamSan-710, which delivers higher write performance at a higher cost.

The benefits for Penn State were immediate. After installing the RamSan-810 systems, the nightly backup times improved by 6 times, dropping from the previous 6 hours down to just 1 hour.
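The headline figures can be checked with simple arithmetic. The short Python sketch below is an illustration only and is not part of RCC's tooling; the function names are hypothetical. It takes the 6-hour and 1-hour backup times quoted above and derives the speedup factor and the percentage of time saved.

```python
def speedup(before_hours: float, after_hours: float) -> float:
    """How many times faster the job runs after the change."""
    return before_hours / after_hours


def percent_reduction(before: float, after: float) -> float:
    """Share of the original duration that was eliminated, in percent."""
    return (before - after) * 100.0 / before


# Figures quoted in the case study: up to 6 hours before, about 1 hour after.
before_h, after_h = 6.0, 1.0

print(f"speedup: {speedup(before_h, after_h):.1f}x")                 # speedup: 6.0x
print(f"time saved: {percent_reduction(before_h, after_h):.1f}%")    # time saved: 83.3%
```

With these inputs the reduction works out to roughly 83 percent. That is consistent with the headline figure of 80 percent, given that the 6-hour value is an upper bound ("up to 6 hours") and the 1-hour value is approximate.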

“GPFS allows us to move the metadata from the disk to the RamSan online and once we did that the backups were reduced down to about an hour. So that was a huge difference,” said Holmes. “TMS for this application was the best solution largely because of its maturity and performance. It seemed very stable and it just worked out of the box.

“Another thing we’ve seen is increased responsiveness in the file system with users, and commands like listing large directories run five times as fast as they did previously. The IOPS capacity of the RamSan has significantly speeded those up.”

Swapping 200 power-hungry 15K hard disks for two RamSan-810s provided yet another benefit for RCC, reducing its power consumption for metadata storage in the GPFS file systems by 90 percent, from about 5,000 watts down to just 500 watts. The power savings are just part of the cascading cost savings that RamSan Flash storage creates. RamSan Flash storage systems not only require less power to operate, but also produce less heat and consume less rack space, lowering data center cooling and floor space costs.
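The power figures above can be worked through the same way. The Python sketch below is illustrative only; the function names are hypothetical, and the annual energy estimate rests on an assumption that is not stated in the case study: that the metadata storage draws its quoted power continuously, 24 hours a day for 365 days. Cooling overhead is not included.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: storage is powered on all year


def power_saving(before_watts: float, after_watts: float) -> tuple[float, float]:
    """Return (watts saved, percent saved) for a change in power draw."""
    saved = before_watts - after_watts
    return saved, saved * 100.0 / before_watts


def annual_kwh(watts: float) -> float:
    """Energy used in one year at a constant draw, in kilowatt-hours."""
    return watts * HOURS_PER_YEAR / 1000.0


# Figures quoted in the case study: about 5,000 W before, about 500 W after.
saved_w, saved_pct = power_saving(5000.0, 500.0)

print(f"saved: {saved_w:.0f} W ({saved_pct:.0f}%)")            # saved: 4500 W (90%)
print(f"per year: {annual_kwh(saved_w):.0f} kWh (assumed 24x7)")  # per year: 39420 kWh (assumed 24x7)
```

Under the continuous-operation assumption, the 4,500-watt difference corresponds to roughly 39,000 kWh per year for the storage hardware alone.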

Penn State projects that the current 2 TB capacity of the RamSan will be sufficient for at least 2 to 3 years, while retaining plenty of expansion headroom of up to 10 TB for each RamSan-810. According to Holmes, Penn State absolutely made the right decision in choosing Texas Memory Systems.

“We’ve had zero problems with the RamSans. During the evaluation we brought them in, plugged them in, zoned them into our Fibre Channel SAN, created LUNs and they’ve just worked ever since then. They’re very well made and a very mature product. Their support has been exceptional.”

Products and services used

IBM products and services that were used in this case study.

Hardware:
Storage: IBM FlashSystem 810

Legal Information

© Copyright IBM Corporation 2012

IBM Corporation Systems and Technology Group Route 100 Somers, NY 10589

Produced in the United States of America May 2012

IBM, the IBM logo, ibm.com, Texas Memory Systems, RamSan-810 and RamSan-710 are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.

The performance data discussed herein is presented as derived under specific operating conditions. Actual results may vary.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

Actual available storage capacity may be reported for both uncompressed and compressed data and will vary and may be less than stated.