Know your I/O

We recently had one of our student-workers who worked on the ADMCS helpdesk up and graduate. On his last day, he spent a good 30 minutes pumping me for storage information. We're a teaching institution, and I don't get asked that often, so I didn't mind. It did, however, get me thinking about how storage is managed. Now, there are better books (and blogs) on this than what I'm writing here, but this is how I think of it.

Know your I/O: Access Patterns

When planning the storage for a system, the one thing you need to know above all else is how that data is going to be accessed. This isn't WebDAV vs. SMB; this is more storage-specific. I'm talking about 95% reads 5% writes, 1.2Gb/minute average transfer, highly latency sensitive. That kind of thing. You can make some assumptions based on the applications that'll be accessing the storage, but if you really need to know, the only way to find out is to measure.

Once you know how that data is going to be accessed, you can build or provision its storage accordingly. How likely the dataset is to grow is also something you need to know, but that's a luxury we often don't get. And for the love of performance metrics, don't forget peak loading and behavior under fault conditions.

Here are several areas to be thinking of when looking at a storage request.

Read/Write Percentage

Knowing what percentage of your I/O operations are reads and what percentage are writes tells you a lot about what kind of storage the application needs. A database transaction log, for instance, is by definition 100% writes, unless a roll-back is called for. A web-server like WWU's MyWeb service (home for this blog from 2004-2009) is 100% reads except for logging. A file-server supporting the Purchasing and Accounting offices is probably 30% writes.
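If you're measuring rather than guessing, the percentage falls out of two readings of whatever cumulative read and write operation counters your platform exposes. Here is a minimal sketch of that arithmetic; the snapshot tuples and the function name are made up for illustration, and how you collect the counters depends on your operating system or array.

```python
def read_write_mix(before, after):
    """Return (read_pct, write_pct) for the interval between two snapshots.

    Each snapshot is a hypothetical (reads_completed, writes_completed)
    tuple of cumulative operation counters.
    """
    reads = after[0] - before[0]
    writes = after[1] - before[1]
    total = reads + writes
    if total == 0:
        # No I/O happened in the interval, so there is no mix to report.
        return 0.0, 0.0
    return 100.0 * reads / total, 100.0 * writes / total


# Made-up counters taken at the start and end of a work-day.
print(read_write_mix((1_000, 500), (8_000, 3_500)))  # (70.0, 30.0)
```

Take the snapshots over a representative period. A reading that happens to span the backup window will look far more read-heavy than the business day does.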

You need to know this percentage since writes are more expensive than reads. Something that's mostly writes, like that database transaction log, probably shouldn't go on a parity-based RAID like RAID5, because a write there incurs a parity calculation and a second write (and, for small writes, typically a read of the old data and old parity first) before the data is considered committed. Something that's mostly reads, like that MyWeb service, can make great use of caching for speed improvements, which allows the use of slower storage.
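To put rough numbers on why the write percentage matters, here is a small sketch that converts a front-end I/O rate into the load the disks actually see. The penalty values are the commonly quoted rules of thumb (two disk operations per write for mirroring, four for a small RAID5 write) and are my assumption, not measurements; controller caches and full-stripe writes will change the real figures.

```python
# Rule-of-thumb disk operations per front-end write (assumed values).
WRITE_PENALTY = {"raid10": 2, "raid5": 4}


def backend_iops(frontend_iops, write_fraction, raid_level):
    """Estimate disk-level IOPS for a given front-end load and write mix."""
    penalty = WRITE_PENALTY[raid_level]
    reads = frontend_iops * (1 - write_fraction)
    writes = frontend_iops * write_fraction
    # Reads cost one disk operation each; writes are multiplied by the penalty.
    return reads + writes * penalty


# An all-write load, like a transaction log, at 1000 front-end IOPS.
print(backend_iops(1000, 1.0, "raid5"))   # 4000.0
print(backend_iops(1000, 1.0, "raid10"))  # 2000.0
# An all-read load costs the same on either layout.
print(backend_iops(1000, 0.0, "raid5"))   # 1000.0
```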

Also be aware what impact your backup system has on this percentage. That's a process that is by definition 100% read when it occurs. Make sure your system can handle that when it happens.

Average and Peak I/O Rates

How much data is in transit at any given time, and how bursty is it? A web-server like MyWeb ran between 3-6GB a day, and was fairly bursty. A file-server like our big guys can do 500GB in a work-day with a constant but highly variable transfer rate. A backup-to-disk server is running as close to flat-out as it can get, all writes, for 12-18 hours a day and can do multiple Terabytes a day. And that's just average I/O rates.

Peak I/O still needs to be designed for. In a lot of cases, the biggest peak I/O event is when the backup hits. Knowing what your peak is allows you to make sure your storage system can keep up. If your application is almost entirely reads, even during peak, then increased caching should be sufficient to handle the peak. If your peak is mostly writes, then you may need more/better storage hardware to handle it. Knowing the peak allows you to be prepared for it.
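Average and peak both come out of the same measurements. Here is a minimal sketch, assuming you already have a list of megabytes transferred per fixed-length sampling interval; the sample values and names are invented for illustration.

```python
def summarize_transfer(samples_mb, interval_s):
    """Return (average MB/s, peak MB/s, peak-to-average ratio)."""
    if not samples_mb:
        raise ValueError("need at least one sample")
    rates = [mb / interval_s for mb in samples_mb]
    average = sum(rates) / len(rates)
    peak = max(rates)
    # The ratio is a crude burstiness figure: 1.0 means perfectly flat.
    ratio = peak / average if average else 0.0
    return average, peak, ratio


# Four made-up one-minute samples, in MB moved per minute.
print(summarize_transfer([60, 120, 600, 180], 60))  # (4.0, 10.0, 2.5)
```

Mind the sampling interval. Averaging over an hour hides a five-minute burst that a one-minute interval would catch.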

Latency Sensitivity

Not all I/O operations are created equal. Some applications require I/O operations to be in a specific sequence and will wait for each I/O op to complete before continuing on to the next operation. Others, such as file-servers, aren't sensitive in that way and can happily handle multiple I/O streams even if one is bogging down for some reason. Knowing how sensitive to latency an application is tells you how much performance variability it can tolerate.

A database doesn't consider a transaction committed until the transaction-log write is committed. If the transaction-log volume is suffering high write latency for some reason, perhaps a RAID rebuild is underway, it can slow the entire database down even though the much larger data volume is still working just fine. This is how poor performance in one area can slow an entire service down.
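The arithmetic behind this is simple enough to sketch. If an application truly waits for each write to finish before issuing the next, its ceiling is one operation per latency period. This is a simplification on my part: real databases often batch several commits into one log write, so treat it as a worst-case bound rather than a prediction.

```python
def max_serial_ops_per_second(latency_ms):
    """Upper bound on operations/second with one outstanding I/O at a time."""
    return 1000.0 / latency_ms


# Healthy log volume with 2 ms write latency.
print(max_serial_ops_per_second(2))   # 500.0
# Same volume at 20 ms, say during a rebuild.
print(max_serial_ops_per_second(20))  # 50.0
```

A tenfold rise in latency is a tenfold drop in the commit ceiling, no matter how fast the data volume is.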

Not all applications are latency sensitive. For applications that are mostly read, caching can provide a buffer between storage that's suddenly slow and the data consumer. Still others, such as a print-server, just plain don't care.

For applications that are highly latency sensitive, you need to ensure that latency stays low during average and peak I/O periods, as well as during exceptional events such as disk outages in the underlying storage. Applications that don't care can tolerate a much wider range of storage performance, which allows you to engineer your storage around average I/O rather than peak I/O.

I/O Access Type

How random are your I/O operations? For a transaction-log on a database, the I/O is almost entirely sequential. For the data volume on a database the accesses are very likely to be highly random. For a file-server, accesses are going to be very highly random with occasional bursts of sequential. For a disk-image server, accesses are going to be very significantly sequential. A backup-to-disk server is going to be exclusively sequential (unless a de-duplication technology is in use, at which point it may be highly random). Each of these has its own problems.
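If you can capture a trace of requests, one rough way to put a number on "how random" is to count how often a request starts exactly where the previous one ended. The sketch below assumes a trace of (offset, length) pairs in blocks, in the order they were issued; the trace format and the sample data are invented for illustration.

```python
def sequential_fraction(requests):
    """Fraction of requests that begin where the previous request ended.

    requests: list of (offset, length) tuples in blocks, in issue order.
    """
    if len(requests) < 2:
        return 0.0
    sequential = 0
    for (prev_off, prev_len), (off, _length) in zip(requests, requests[1:]):
        if off == prev_off + prev_len:
            sequential += 1
    return sequential / (len(requests) - 1)


# A log-like trace: every request follows directly on the last one.
log_trace = [(0, 8), (8, 8), (16, 8), (24, 8)]
# A data-volume-like trace: mostly jumps, with one sequential pair.
data_trace = [(0, 8), (512, 8), (64, 8), (72, 8)]

print(sequential_fraction(log_trace))   # 1.0
print(sequential_fraction(data_trace))  # one of three transitions is sequential
```

This only looks at adjacent requests in a single stream. Several interleaved sequential streams will score as random here, which is also roughly how the disks experience them.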

Largely random I/O requires storage that can keep up with that kind of access pattern, which historically has been SCSI and its descendant technologies, Fibre Channel and SAS. Solid-state drives excel at this kind of I/O pattern. SATA is not so good, as it hits bottlenecks faster than an equivalent SAS drive would.

Largely sequential I/O is what storage makers dream of, since it's the easiest. Rotational media can actually beat out solid-state in many cases, as this is the single best access pattern for rotational media (especially highly sequential writes). Largely sequential I/O is also highly sensitive to fragmentation, which turns that nice sequential read into a bunch of random reads and tanks performance.

This is one area where choice of operating system and file-system can make a very significant impact. Some file-systems, such as extent-based systems like XFS or EXT4, are designed around minimizing file-level fragmentation. Others, such as DOS FAT, have no awareness of fragmentation and are therefore highly prone to allowing it.

This is another area where backup processing can make a major difference. A file-server that's very highly randomized during the business day can have a significantly sequential I/O pattern during the backup. Backup performance will suffer fragmentation penalties much sooner than end-users will notice anything. For a generic file-server that's probably OK. For a backup system that's struggling to stay within a backup window, it needs attention.

Storage Failure Handling

This is a sneaky one. You need to know how your storage performs when something has gone wrong. Perhaps a controller died and only one is left to handle I/O. Perhaps a RAID5 array lost a drive and is rebuilding. Perhaps a path failed on the Fibre Channel fabric which forced a fail-over to another path, pausing I/O for a few critical seconds. All of these can impact your storage performance.

If you know that RAID rebuilds take several days, you need to plan your storage so it can handle peak I/O events while simultaneously handling a rebuild. This very thing caught me out; just search this blog for "MSA1500" to get all the details. You really do not want to go live on a storage platform, only to find out the hard way that a failed drive causes the application to crash due to latencies.

Know how your storage system behaves during a failure. This may require simulating failures after you receive it just so you can find out for yourself. This kind of thing is very rarely included on spec-sheets, so you have to investigate on your own.
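Once you have simulated a failure and measured how much performance you lose, the planning question becomes a simple comparison: does what's left still cover peak? Here is a sketch of that check. The function name and the numbers are hypothetical, and the overhead fraction is something you have to measure on your own hardware.

```python
def survives_rebuild(healthy_iops, rebuild_overhead_fraction, peak_iops):
    """Return (ok, degraded_iops) for a peak load arriving during a rebuild.

    rebuild_overhead_fraction: share of healthy performance lost while
    degraded, as measured in your own failure test (an assumption here).
    """
    degraded = healthy_iops * (1 - rebuild_overhead_fraction)
    return degraded >= peak_iops, degraded


# Made-up array: 5000 IOPS healthy, 3000 IOPS needed at peak.
print(survives_rebuild(5000, 0.5, 3000))   # (False, 2500.0)
print(survives_rebuild(5000, 0.25, 3000))  # (True, 3750.0)
```

For a latency sensitive application, run the same comparison against latency under rebuild rather than raw throughput, since that's what it will notice first.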

Size and Growth

Last on my list, but still important. How much data you're working with makes a big difference. If your storage request is asking for 32GB of storage for use with a highly latency sensitive app, highly random I/O, and minimal growth prospects, you can fulfill that request with solid state drives, and even do it at a reasonable cost. If they're asking for 3.2TB of storage with the same requirements, that's another story altogether; that kind of storage system is going to be very expensive.

Like I said, there are many things to consider with a storage request. How things are handled at the hardware level makes a big difference as well. The HP EVA, and I believe the Xiotech line and probably the EMC stuff too, purposely randomize block locations on spindles so all I/O is effectively random I/O as far as the physical drives are concerned. Your storage components can hide problems in any of the above. Or make them much, much worse. That's for the next article in the series.

Know your I/O: Access Patterns

Know your I/O: The Components

Know your I/O: The Technology

Know your I/O: Caching

Know your I/O: Putting it together, Blackboard

Know your I/O: Putting it together, Exchange 2007 Upgrade