Open Mic Webcast: Overview of SAN Performance - 24 April 2013 [presentation, audio replay and Q&A transcript attached]
Walter Scanlan, an IBM Senior IT Specialist and Distinguished IT Specialist, provided an overview on improving performance of IBM Collaboration Solutions software on a Storage Area Network (SAN).
Attendees were given an opportunity to ask questions.
Topic: Overview of Storage Area Network (SAN) Performance
Day: Wednesday, April 24, 2013
Time: 11:00 AM EDT (15:00 UTC/GMT, UTC-4 hours) for 60 minutes
For more information about our Open Mic webcasts, visit the IBM Collaboration Solutions Support Open Mics page.
OpenMic_Overview of SAN performance_04242013.pdf
Q: I've been in meetings where vendors are looking for how many "IOPs" our systems can support. Are there any tools I can run as a system administrator to find that?
A: IOPS is typically calculated based on MB/sec versus the size of each Fibre Channel packet; we typically see 15 - 22k for each packet. At the SAN it is based on the SAN writing blocks, so that would be a 4k block size. I argued during the call that basing your Domino performance on an IOPS sizing is likely to lead to poor performance. Domino is generally 85% read and 15% write, and the reads are random, not sequential; this should be taken into consideration when sizing.
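As a rough illustration of the arithmetic in the answer above, the sketch below converts a throughput figure into an I/O rate for a given transfer size. The 80 MB/sec throughput, the 16 KB transfer size and the function name are assumptions chosen for illustration; only the 4k block size comes from the answer.

```python
def iops_from_throughput(mb_per_sec: float, io_size_kb: float) -> float:
    """Approximate I/O operations per second for a throughput and I/O size."""
    if io_size_kb <= 0:
        raise ValueError("io_size_kb must be positive")
    # 1 MB is treated as 1024 KB in this sketch.
    return mb_per_sec * 1024 / io_size_kb


# Hypothetical throughput of 80 MB/sec at two transfer sizes.
for size_kb in (4, 16):
    rate = iops_from_throughput(80, size_kb)
    print(f"{size_kb} KB transfers at 80 MB/sec: about {rate:.0f} IOPS")
```

The same throughput yields very different IOPS figures depending on the transfer size, which is one reason a bare IOPS number says little about Domino performance.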
Absolutely, we need to measure MB/sec, latency (read & write queue length), cache hits vs misses and balance that against the cost of storage, from SATA through RAID 5, RAID 1/0 and SSD. Optimally, if the SAN can move blocks between the various types (and costs) of disks available to it, we will get the most cost-effective blend of disks.
Of course quality of disk and brilliance of SAN architecture all go to pot if the Fabric has massive Fan-In from Guest to SAN, insufficient throughput (2Gbps FC for example) and if the Guest has insufficient HBAs and too little Cache to function effectively. IOPS at the SAN end won't win you anything if the rest of the architecture is just plain badly designed.
Measure disk i/o queue lengths....
Q: If we use Domino on VMware and the storage is on a really fast SAN, do I really need a separate disk (LUN or whatever the equivalent is), or is one virtual disk partition each for Domino data, TransLogs and DAOS sufficient - even if shared with other systems? (We do not use TransLogs or DAOS yet, so I don't know yet if it would lead to issues in our environment or not).
A: DAOS is happy on slower, larger disks. IBM recommends that the Translogs be on "fast disk": this is a sequential, write-heavy workload, so it can take full advantage of write cache if available, and it uses almost no read cache. Also use a bigger Element size, since TxL is fast sequential. On Domino servers that use a block size different from the default block size of 512 bytes, set the NOTES.INI option Create_R85_Log=1 to get the format of the transaction logs optimized.
And Domino .NSFs seem to be happy on RAID 5 in the middle, with plenty of OS cache for re-reads, at Guest and at NVRAM end of the FC connection.
4/24/13 11:39 AM BrianM
Q: Physical server 2003 standard (32 bit) Domino R8.5.3FP3. WinTel team is complaining of really bad disk fragmentation on the SAN.
A: Not enough information to answer this, but the Windows file system by design is open to fragmentation. The OS fragmentation must be addressed. Domino NSF structure can also "fragment" within the .NSF context, which means COMPACT must be used to address it. Please remember that Domino is a random IO server. May be worth looking at Defrag.NSF "the world's first Domino-specific database defragmentation product that runs natively as a Domino server task" from Preemptive in Oz: www.preemptive.com.au/defrag.
One of the principal causes of Domino disk fragmentation is the crazy insistence of some users on running compact -c to "reduce" the size of .NSF files. This fails dismally in theory and in practice! Domino needs those "recovered" blocks, so almost as soon as you restart Domino, it is grabbing new blocks to extend the .NSF, adding further to the fragmentation, and so on. The correct technique is to allow all the NSF files to reach their "natural" size, defragment the disk (quickest method is copy off, format the disk, then copy back), and then run compact -b (no space recovery) to push the white space to the end of the file, where it can be re-used within the .NSF with no need to extend the file externally to find free blocks. This will make Domino fly within the i/o context.
Q: When running IOMeter tests, we consistently see performance ramp-up times of 2-5 minutes before the system really starts to perform. But we can't figure out where, or why, this is happening. This is on Windows 2008 R2 on VMWare on a Netapp SAN w/16 disks. We're also using 16K blocks... is that an optimal size for Domino?
A: I would suspect the VM has insufficient resources (RAM & CPU) defined as its start-up resources, and VMWare takes a few minutes grabbing more resources from the central pool, which in turn may have to steal resources from another VM that has too much resource allocated. This will take a few minutes before Domino has adequate resources to really fly. Try setting the initial resources for Domino to those you see when it's flying, rather than making it ask for them after startup.
Q: Any new disk i/o performance improvements we can look for in the Domino 9 Server?
A: Mostly in 8, we tried to better align ourselves to the way SANs do things, and the way VMWare et al do things with bigger buffering and minimizing unnecessary "internal" i/o. Domino 9 was more about Social integration and Client than i/o performance.
If you are suffering i/o issues, look closer to home .... Domino (Mail) primary goal is to save your email to disk, try using Connections and sharing your emails and documents with Connections, that will save 75% of your email generated i/o!
“Email is where knowledge goes to die!” (Bill French)
Q: Thanks so far! ok, translogs: fast and expensive, daos: slow and cheap - what about domino nsf's?
A: RAID 5 is fine for .NSF. "middling" ;-) Even better if you have an intelligent SAN that can move Least Recently Used blocks downwards to cheaper SATA storage and "migrate" heavily used blocks (Names.NSF, Transaction Logs et al) upwards to SSD with the .NSFs in the middle on RAID 5.
Some SANs will auto place busy blocks onto SSD and downgrade DAOS to SATA slower disks.
Q: My main historical challenge is that slow disks for DAOS take a long time to backup.
A: Indeed, but why not do incremental backup for DAOS rather than Full?
4/24/13 11:45 AM Colin Stamp : Use archiving!!! Move old unwanted Mail documents and DAOS files to long term Archive to reduce DAOS & .NSF volumes.
Q: Sure, incremental is essential, but one needs to run full backup occasionally. e.g. week end or month end.
A: Agreed, but the low cost of DAOS SATA storage has down sides ... and technically you don't need to run Full Backups very often at all, the only snag with Incrementals and very infrequent Full Backups is the number of incremental backups you need to restore from to fully recover from the last Full Backup. Same rules as TxL.
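The trade-off in the answer above can be put into numbers. This is a minimal sketch, assuming one incremental backup per day between full backups; the function name and the schedules are hypothetical and not taken from any backup product.

```python
def backup_sets_to_restore(days_since_full: int, incrementals_per_day: int = 1) -> int:
    """Count the backup sets needed for a full recovery: the last full
    backup plus every incremental taken since it."""
    if days_since_full < 0:
        raise ValueError("days_since_full must not be negative")
    return 1 + days_since_full * incrementals_per_day


# Hypothetical schedules, worst case: the failure happens on the last
# day before the next full backup would have run.
for label, days in (("weekly full", 6), ("monthly full", 29)):
    print(f"{label}: restore {backup_sets_to_restore(days)} backup sets")
```

The less often a full backup runs, the longer the chain of incrementals that has to be replayed on recovery.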
Q: Can we have access to that white paper? Yeah, that white paper would be nice to have some facts for the users to argue with..
Q: Those with virtualized environments, are your virtual servers directly connecting to the LUN? Or are you using virtual drives? I've found the latter don't work well performance wise
A: We used to recommend direct attach in VMWare to the disk (LUN) but more recently they have equalized, and both are about the same.
Q: We had to lock DAOS onto Tier 2/3 drives as they kept creeping up onto Tier1 due to high I/O - attachments are big and create lots of I/O
A: Initially DAOS is highly used, but after a few minutes it drops to zero use (ever again, frequently!)
So the SAN needs to continuously monitor for Least recently Used blocks and degrade them down from SSD to RAID 5 and finally to SATA .
The SAN needs to be constantly re-assessing and making these decisions about ALL blocks, not taking an early guess, placing them, then leaving them where they are. I saw a guy who virtualised all his domino servers and put the virtual drives for every server on the same LUN! We call that "dumb"!
Q: So every domino server sharing the same bank of drives in the SAN?
A: Although it depends on the LUN, of course. If it is serviced by 10 HBAs and has 10 channels across the Fabric and 10 ports on the SAN it will be fine....
Q: I'm not sure on that, but I guess my SAN colleagues think on the lines of "the SAN system is intelligent enough to use the disks wisely", but from your side I understand that is not necessarily the case?
A: Do the math... each Domino can hit 60 - 80MB/sec sustained, peaking to 100MB/sec, multiply that by the number of DPARs and make sure the LUN can withstand that level of assault...!!
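The math suggested above can be sketched as follows. The per-server rates are the ones quoted in the answer; the number of DPARs is a hypothetical value, and the model assumes the worst case where every DPAR is busy at the same time.

```python
def lun_demand_mb_per_sec(dpar_count: int, mb_per_sec_per_dpar: float) -> float:
    """Aggregate throughput a shared LUN must absorb if every DPAR is busy at once."""
    return dpar_count * mb_per_sec_per_dpar


DPARS = 6  # hypothetical number of Domino partitions sharing one LUN

# Rates from the answer: 60 - 80 MB/sec sustained, peaking to 100 MB/sec.
for label, rate in (("sustained (low)", 60), ("sustained (high)", 80), ("peak", 100)):
    print(f"{label}: {lun_demand_mb_per_sec(DPARS, rate):.0f} MB/sec")
```

Comparing these totals with what the HBAs, Fabric channels and SAN ports behind the LUN can carry shows whether the shared LUN can withstand the load.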
Q: Monitoring disk queue lengths usually flags up the issue.
A: Correct, CPU may be short of RAM and of course short of i/o cache forcing CPU up as system scavenges pages and swaps memory pages to refill the free page pool.
Q: I do view rebuilds on a RAM drive on some servers, RAM is cheap and view indexes small.
A: Great, that's my recommendation.
Q: Around 2 years ago I took the advice from someone on some session to read "SAN for dummies" to be able to talk to SAN admins in their language - I have bought it, but not yet read.. :-)
A: Happy to do SAN for Dummies talks if anyone wants one. With a 64bit OS on VMWare, 32GB of RAM will very significantly improve i/o by catching as many re-reads as possible and thus reduce the actual number of i/os across the Fabric to the SAN.
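A simplified model of the effect described above: only reads can be satisfied from cache in the guest, so the higher the read hit ratio, the fewer i/os cross the Fabric. The 85% read share is the Domino mix quoted earlier in this Q&A; the 1000 i/os per second and the hit ratios are hypothetical.

```python
def san_ios_per_sec(total_ios: float, read_fraction: float, cache_hit_ratio: float) -> float:
    """I/Os per second that still cross the Fabric after read-cache hits.

    Writes always go to the SAN in this simplified model; only reads
    can be served from the guest's cache.
    """
    reads = total_ios * read_fraction
    writes = total_ios - reads
    return reads * (1.0 - cache_hit_ratio) + writes


# Hypothetical load of 1000 I/Os per second at three read hit ratios.
for hit_ratio in (0.0, 0.5, 0.9):
    remaining = san_ios_per_sec(1000, 0.85, hit_ratio)
    print(f"hit ratio {hit_ratio:.0%}: about {remaining:.0f} I/Os per second reach the SAN")
```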
Q: I put translogs on a RAM drive for fun once, but not recommended :)
A: Not if system fails ... no!! But these days they rarely fail at the OS level (panic) which is the ONLY point at which you would lose RAM disks contents.
But as they say "needs discussion!"
Q: A vendor is asking how many IOPs can our system support, and are there any tools that I can run as a system admin to find that?
A: What you are asking is how many I/Os per second your system can support in the environment. There is going to be a theoretical I/O capacity available for the physical environment, but that is not going to provide your vendor or you with any accurate or valid information about the actual workload that can be supported. The reason is that the IOP definitions are going to be for the environment, meaning the entire physical allocation that has been made from the SAN, the switch and the physical system. And when you deploy any solution, bottlenecks are created as a result of both the application and the physical topology that you are deploying on. I'll return to Domino as an example. Domino has an I/O hot point in the environment: names.nsf. When names.nsf is written to disk, it is written to one or more drives, but that is a small subset of the I/O capacity of the environment. So it's important to understand not what your maximum IOP rates are in the environment, but what maximum IOP density you will have. That is harder to define because you need to understand where, in that example, names.nsf ends up. Does it end up on solid state drives on the back end, or on fast SAS drives? Is there caching in front of it? There are a number of variables that play into whether the number of IOPs your system can support has any relevance. The short answer: the system specifications, both from the SAN and from whatever hardware vendor supplied your fiber card adapters, define the maximum IOPs on every system. So if you bought a Storwize system, for example, there is going to be a maximum IOP rate that system can support based on the configuration you deploy, but it is going to have little to no value in helping you understand what the real performance characteristics of that solution might be.
Q: We use Domino and vmware and the storages are really FAST SAN. Do I need separate disks or one virtual disk partition for each Domino data translogs DAOS even if we share with other systems. We do not use DAOS translogs yet, so I'd like to know if that 's going to be continuation of the environment or not.
A: I think the question is: IBM's best practice is to have dedicated mount points for transaction logging and for DAOS, separate from the Domino data directory which hosts your end user mailfiles; do I really need that if my SAN is really, really fast? The short answer is "no, you don't", but we still strongly recommend that you manage those requirements as separate volumes. There are multiple reasons for that: DAOS is going to take up a lot of space; transaction logging is not, but transaction logging is very I/O intensive without using a lot of space. So it makes sense, for example, in a large enterprise environment, to put your transaction logs on your fastest drives, whether that's SSD or fast SAS drives. But it doesn't make sense to put DAOS there; the cost is high for the benefit it provides. So even in a very, very fast SAN environment, I promise, your network administrator, your storage administrator, has cost structures for each layer or level or type of service being delivered out of that SAN. If you can quantify where your I/O cost is going to be, which IBM does for you by calling out that the transaction log should be on your fastest drives available, that can help reduce the administrator pain in managing the I/O requirements for the environment.
Q: Note from our Win Tel people, running Domino 8.5.3 that we're getting bad fragmentation on the SAN. Indexing happening against mailfiles etc...is fairly high data or data volume. So is there something we can do about that?
A: Notes databases are not sequential files, and as a result, as the file size grows, if it's distributed, you're going to have a lot of non-sequential I/O requests, which are the most expensive to perform on disk. The best answer I can give you is that the cost of managing views is a multiple of the size of the index, so manage your views to be as small as possible. For mailfiles we recommend using the inbox management tool to get messages out of the inbox, which will reduce the size of the index and reduce the indexing cost. There are also a number of I/O operations that occur on the index for Full Text Indexing; you can mount point that into a separate structure so that the disk I/O for FTI is isolated from your end user read requests against the actual data itself. Use the INI parameter.
Q: When you have people with 5 -25 Gb mailfiles it's not helping.
A: That's a whole other topic, on which I wrote a white paper on the cost of large mailfiles. To summarize it, the majority of the cost is in view indexing, and the majority of that cost is, effectively: if I have an inbox with 5K documents and I make one change to it, I am doing 5000 things for one change. If I have an inbox with 100 documents in it and I add one document, I only have 100 changes to manage. So the size of your inbox is the biggest impact on performance that you are going to see in your environment. Cutting down the inbox size is why I added the inbox management tool, to move aged documents out of the inbox. Because the inbox is refreshed every time the user checks mail, we want to reduce that cost, and it's a multiplier. So effectively you can predict the cost increases in your environment not based on mailfile size, but based on the number of entries in your inbox. Managing that down will really reduce the cost. To give you an idea: I did a comparison of 1000 entries in an inbox and 14000 entries in an inbox. Same data, just moved out of the inbox, and the CPU costs were 1/3 lower, the memory costs were 2/3 lower, and the disk I/O costs were 28% lower. You're not getting rid of data, you're just reducing the amount of work Domino has to do every time data changes. The link to the white paper I wrote is put into the discussion now.
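The multiplier described above can be sketched with a deliberately simplified model in which one change touches every entry in the inbox view. This is an illustration of the speaker's rule of thumb, not Domino's actual indexing algorithm, and the function name is hypothetical; the measured figures quoted in the answer remain the real data.

```python
def index_work(inbox_entries: int, changes: int = 1) -> int:
    """Units of index work under the simplified rule 'one change touches every entry'."""
    return inbox_entries * changes


large = index_work(5000)  # inbox with 5K documents, one change
small = index_work(100)   # inbox with 100 documents, one change
print(f"large inbox: {large} units, small inbox: {small} units, ratio {large // small}x")
```

In this model the work per change depends on the number of inbox entries, not on the size of the mailfile, which is why moving aged documents out of the inbox helps without deleting any data.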
Q: We currently have a PMR open with IBM to chase down some intermittent performance issues across all of our servers in our environment which was largely virtualized last year. It smells like a disk issue but we're really looking for the smoking gun. My question would be, based on the information we are getting from events generator, we are seeing consistently notifications that the recovery manager logfile is full. This smells like a backlog issue with disk I/O at the root of it, I was wondering if you had any comment based on your experience?
A: I don't want to prejudge based on this event, I will say that's generally a system constraint. Not necessarily disk, but it is a system constraint. And I am willing to review your information after the call. What needs to be collected are those 4 layers. We need the information statrep data, and prints data from Domino, we need the OS data, and then we need the SAN information. And for your virtualized environment we are going to need, if it's, I don't know if you are on Windows or vmware, or you are using vm controller or whatever you are using, we are going to need that layer as well to understand where our bottleneck is. You're kind of in the position of the classic problem that I work with, in that you are just the application administrator that wants this resolved and there are 5 layers below you that are contributing to the problem. That message just reflects the fact that we are constrained at the Operating System. It is usually disk, but I don't want to prejudge because I've also seen problems where memory caused those types of problems.
Q: We have virtualized our whole organization and the Domino server was the first to go on an EqualLogic array. As the years went by, more and more systems were added to the VMware environment and our Domino servers and mail began to slow. They're currently planning on moving towards a VNX array and using auto tiering to handle performance. Would there really be a place for me to dedicate and put names.nsf, for example, on SSD? Is that technology going to know what to do with it? If we use transaction logging and actually implement it, is it going to know that we are keeping that mono tiered in the SSD area on the arrays?
A: We have to be careful here. For things like Transaction Logging, because the rate is steady, it's going to be pretty good with that. Where you might have problems is probably outside of mail if you host Domino applications and your application workflow is either (a) lumpy or (b) you have a large dataset over which the application is reading. Then any time that application goes into a critical view update or a Full Text index update, which is going to drive massive disk I/O, if that application had been dropped into the same tier 2 service, it's probably not going to be a great experience for your end users. The question is, do you want to try to flag those workflows to be hard stopped into the tier 1 environment in your solution? That's an operational question, meaning, you as the business owner have to decide whether or not the service that has been provided needs to be tier 1 level from a performance and availability standpoint rather than dropping it to tier 2. But, where Domino is going to "catch you" is that, anytime it goes into a disk I/O storm, which is basically view indexing, that's going to create your challenge for end user response if you have a bottleneck in your application. Why I say this is, as bad as mailfiles are, they're effectively single-threaded. One user is trying to access a mailfile, maybe two with an administrator. But with an application you have a large user community all going into this database. So if a user has to wait 3 seconds for a view to update, it's not that bad. But if an application has a 3 second view update, and you have a new user coming in every second, you're just backing users up because their changes are coming in faster than the view index can actually refresh to show those changes in the view. And you are eventually going to create a hang in the environment if you've had enough users coming in to overwhelm your TCP/IP thread pool. And that's actually a common problem I see with applications.
The first question is if you're all mail, then you'll probably manage it OK. If you're a Domino application deployment you might want to evaluate whether or not you have high scalability application from an end user standpoint and then I'd recommend you tag those into your tier 1 environment.
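The backing-up effect described in the answer (a 3 second view update with a new user arriving every second) can be sketched as a simple queue. This is a minimal model that assumes view updates are served one at a time; the function name and the 60 second window are hypothetical.

```python
def backlog(seconds: int, arrivals_per_sec: int, update_seconds: int) -> int:
    """Users still waiting after `seconds`, if view updates run one at a time."""
    arrived = seconds * arrivals_per_sec
    # No more updates can complete than users have arrived.
    completed = min(arrived, seconds // update_seconds)
    return arrived - completed


# One new user per second, each triggering a 3 second view update.
for elapsed in (3, 30, 60):
    print(f"after {elapsed} s: {backlog(elapsed, 1, 3)} users waiting")
```

The backlog grows without limit as long as users arrive faster than updates complete, which is how a slow view update in a busy application eventually exhausts the thread pool.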
Q: Full Text Indexing. There was an issue on a server the other day; a lot of the full text indexes are set for immediate update, not every hour. There was an IBM recommendation against certain full text indexes being immediate. Comments, points of reference, something like that..
A: I am not a proponent, from a cost standpoint, of allowing users to full text index their mailfiles or update their FTI on the fly. If they're applications, that's a different story, because you're trying to cost offset. By creating and updating the FTI on the fly, frequently, you're trying to reduce the cost of end users doing searches in the application, and you have concurrent users in the application, so there is a value and a benefit to doing that. Without data I couldn't tell you which would end up being the better cost for you, meaning end users having slower or outdated search results because the FTI is rebuilt or updated only at specific preselected points, versus allowing indexing on demand. But my default recommendation for applications is to go ahead and allow Full Text indexing and update on demand. Therefore, Immediate is recommended.
Q: I am trying to make sure I understand the goal of the presentation. We could use the information ourselves and do our own analysis, but is this presentation also intended to give us a better idea of what kind of information we need to provide if we call IBM Support about an issue?
A: I was not intending to try to give you a template on what you would need to collect before calling IBM on a performance issue. The goal of this session, in my mind, was to give administrators, solution administrators, a better understanding of what the logical and physical configuration of a SAN is and what datapoints are available within the SAN to better understand the performance characteristics of that SAN. This was really coming out of a lot of customer prompts I worked on in the last couple of years where the solution administrator is just getting a signal from the storage network management team that performance is OK, without understanding what they can tell from the information they could ask for to really understand what is happening. So the example I've given, I'll go back to Domino and names.nsf: we had a customer whose storage team reported that disk performance is fine, average response time is 8 ms and everything is great. When we went back in and looked at the actual performance data, what we saw is that 99% of the responses were within 1 and 1/2 ms. But there were specific object requests that were taking 2 and 3 seconds, and it turned out to be a chokepoint in names.nsf when they were doing view indexing. The storage location wasn't appropriate for that particular database. But it's a question of understanding the right questions to ask. If we're not being fed information as solution administrators, and we're not being fed information from our downstream service component drivers, then we make bad assumptions or decisions on where the problem might be. That was my goal in putting together this deck: help you understand what information is available and what to look for. If you're trying to work a performance problem, this would be the set of information I would focus on. If you call IBM for a performance issue, then you're going to end up getting the same request for data.
It is going to be tailored to your environment, because in this example I happened to talk about a Storwize solution and I talked about 3 different OS's; you might be on Sun for all I know. There is going to be a specific set of data collection. The base forms around these data points, but specific to your environment.
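The names.nsf example above shows how an average can hide a chokepoint. The sketch below uses synthetic response times (assumed values, not the customer's data) chosen so that the mean comes out near 8 ms while a small number of requests take 2.5 seconds.

```python
import statistics

# Synthetic response times in milliseconds (assumed values, not customer data):
# most requests are fast, a small number take 2.5 seconds.
response_ms = [1.5] * 9974 + [2500.0] * 26


def percentile(values, fraction):
    """Nearest-rank style percentile computed on a sorted copy of the data."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(fraction * len(ordered)))
    return ordered[index]


mean_ms = statistics.mean(response_ms)
p99_ms = percentile(response_ms, 0.99)
worst_ms = max(response_ms)

print(f"mean {mean_ms:.1f} ms, 99th percentile {p99_ms} ms, worst {worst_ms} ms")
```

Asking the storage team for percentiles and maximums per object, rather than a single average, is what exposes the slow requests.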
The resources mentioned in this Open Mic have also been posted to our support Twitter account at http://twitter.com/IBM_ICSsupport #icsopenmic
You can also find us on Facebook at: http://www.facebook.com/IBMLotusSupport
Redbook: "Sizing Large-Scale Domino Workloads on iSeries": http://www.redbooks.ibm.com/redpapers/pdfs/redp3802.pdf
Optimizing server performance: Semaphores http://www.ibm.com/developerworks/lotus/library/ls-Semaphores_Part1/side1.html
Check for Disk bottlenecks using Perfmon on Windows Version 6.x servers http://www.ibm.com/support/docview.wss?uid=swg21615403
AIX 6.1 nmon Command syntax reference http://pic.dhe.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.cmds%2Fdoc%2Faixcmds4%2Fnmon.htm
Detailed disk performance information available from IBM i Disk Watcher http://pic.dhe.ibm.com/infocenter/iseries/v7r1m0/topic/rzahx/rzahxdiskwatcher.htm
Collecting diagnostic information for VMware ESX/ESXi using the vSphere Client: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=653
Optimizing Virtual Infrastructure with PowerVM and the IBM Systems Director https://ibm.biz/BdxgUX
More support for:
Performance / Hang
Software version: 8.5
Operating system(s): AIX, IBM i, Linux, Solaris, Windows, z/OS
Reference #: 7038065
Modified date: 09 May 2013