The new era of big storage...


...is full of flash. And that changes things.

Not a surprise at all to anyone paying attention, but there it is. Flash is changing things in many ways:

  • Hybrid SSD+HD drives are now out there on the market, bringing storage tiering to the consumer space.
  • SSD is now pretty much the standard for laptops, or should be. The cheap option still ships with an HD, but... SSD, man. Just do it.
  • One SSD can fully saturate a 6Gb SATA or SAS link (rough numbers in the sketch after this list). This changes things:
    • A channel with 12 of those things is going to seriously under-utilize the individual drives.
    • There is no way a RAID setup (hardware, software, or ZFS) can keep up with parity calculations and still keep the drives performant, so parity RAID of any stripe is a bad choice.
    • A system with a hundred of these things on it, channeled appropriately of course, won't have enough system-bus speed to keep them fed.
  • Large-scale enterprise systems are increasingly using an SSD tier for either caching or top-level tiering (not all solutions are created equal).
    • ZFS L2ARC + Log
  • They're now coming in PCIe flavors so you don't even have to bother with an HBA.
    • Don't have to worry about that SAS speed-limit anymore.
    • Do have to worry about how many PCIe slots you've got.
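
To put rough numbers on the saturation claim (the link rates and drive figures below are ballpark assumptions, not measurements from any particular hardware): a 6Gb SATA/SAS link with 8b/10b encoding yields about 600 MB/s usable, which a single decent SSD can nearly fill on its own, while 12 of them behind a shared 4-lane 6Gb wide port each get only a fraction of that.

```python
# Back-of-the-envelope sketch of the "one SSD saturates the link" math.
# All figures below are ballpark assumptions for illustration.

LINK_GBPS = 6.0                 # SATA/SAS link rate, gigabits per second
ENCODING_EFFICIENCY = 0.8       # 8b/10b encoding: 8 data bits per 10 line bits
SSD_SEQ_MBPS = 500              # assumed sequential throughput of one SSD, MB/s

usable_mbps = LINK_GBPS * 1000 / 8 * ENCODING_EFFICIENCY   # ~600 MB/s per link

# One SSD on a dedicated 6Gb link: already close to saturation.
print(f"Usable per-link bandwidth: {usable_mbps:.0f} MB/s")
print(f"One SSD uses {SSD_SEQ_MBPS / usable_mbps:.0%} of a dedicated link")

# Twelve SSDs sharing a 4-lane 6Gb SAS wide port (~4x one link):
drives = 12
wide_port_mbps = 4 * usable_mbps
per_drive_share = wide_port_mbps / drives
print(f"Per-drive share on a shared wide port: {per_drive_share:.0f} MB/s "
      f"({per_drive_share / SSD_SEQ_MBPS:.0%} of what each SSD could deliver)")
```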

Way back in elder days, when Windows NT was a scrappy newcomer challenging the industry-dominant incumbent and said incumbent was making a mint selling certifications, I got one of those certifications to be a player in the job market (it actually helped). While studying for that certification I was exposed to a concept I had never seen before:

The Hierarchical Storage Management System.

NetWare had hooks for it. In short, it does for files what Storage Tiering does for blocks. Pretty easy concept, but it required some tricky engineering when the bottom layer of the HSM tree was a tape library[1]. All scaled-out (note, not distributed[2]) storage these days is going to end up using some kind of HSM-like system. At the very tippy-top you'll get your SSDs. They may even be in the next layer down as well. Spinning rust (disks) will likely form the tier that used to belong to spooling rust (tape), but they'll still be there.
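
For a taste of what a file-level HSM policy looks like, here's a toy demotion pass: files that haven't been accessed in a while get moved down to a slower tier. This is purely illustrative; the paths and the 90-day knob are made up, and real HSM systems hook into the filesystem and leave stubs behind rather than just moving files.

```python
# Toy sketch of an HSM-style demotion pass: move files that haven't been
# accessed recently from a fast tier to a slow one. Illustrative only.
import os
import shutil
import time

FAST_TIER = "/mnt/fast"        # hypothetical paths for this example
SLOW_TIER = "/mnt/slow"
DEMOTE_AFTER_DAYS = 90         # arbitrary policy knob

def demote_cold_files(fast_root: str, slow_root: str, max_age_days: int) -> None:
    cutoff = time.time() - max_age_days * 86400
    for dirpath, _dirs, files in os.walk(fast_root):
        for name in files:
            src = os.path.join(dirpath, name)
            if os.path.getatime(src) < cutoff:        # last access before cutoff
                rel = os.path.relpath(src, fast_root)
                dst = os.path.join(slow_root, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.move(src, dst)                 # demote to the slow tier

if __name__ == "__main__":
    demote_cold_files(FAST_TIER, SLOW_TIER, DEMOTE_AFTER_DAYS)
```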

And that tier? It can RAID5 all it wants. It may be 5-disk sets, but it'll have umpty different R5 sets to stripe across, so it's all good. The famous R5 write-penalty won't be a big issue, since this tier is only written to when the higher tier is demoting data. It's not like the HSM systems of yore, where data had to be promoted to the top tier before it could even be read; we can read directly from the slow/crappy stuff now![3]
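
For the curious, the write penalty in question is the textbook one: each small random write to RAID 5 costs four back-end I/Os (read old data, read old parity, write new data, write new parity). The spindle counts and IOPS figures below are generic assumptions, just to show why striping across many sets makes it tolerable for a demotion-only tier.

```python
# Rough arithmetic for the classic RAID 5 small-write penalty.
# Figures below are generic assumptions for illustration.

SPINDLE_IOPS = 150          # assumed random IOPS of one 7.2k-rpm disk
DISKS_PER_R5_SET = 5
R5_SETS = 10                # stripe across ten 5-disk sets
WRITE_PENALTY = 4           # back-end I/Os per small front-end write

raw_iops = SPINDLE_IOPS * DISKS_PER_R5_SET * R5_SETS
random_write_iops = raw_iops / WRITE_PENALTY

print(f"Raw back-end IOPS:           {raw_iops}")
print(f"Front-end random-write IOPS: {random_write_iops:.0f}")
# Plenty for a tier that only sees demotions; nowhere near SSD-tier territory.
```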

All-flash solutions will exist, and heck, are already on the market. Not the best choice for bulk storage, which is why they're frequently paired with big deduplication engines, but for things like, say, being the Centralized Storage Array for a large VM (sorry, "private cloud") deployment featuring hundreds/thousands of nearly identical VMs... they pay off.
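
The dedup payoff is easy to see with a toy model. The VM count and overlap ratio below are invented for illustration: if a fleet of VM images shares most of its blocks with a common golden image, the array only has to keep the shared blocks once.

```python
# Toy illustration of why near-identical VM images deduplicate so well.
# Blocks are identified by a content hash; identical blocks are stored once.
# The VM count and overlap ratio are made-up numbers for illustration.
import hashlib

def dedup_ratio(vm_block_lists):
    """Logical blocks written vs. unique blocks actually stored."""
    logical = sum(len(blocks) for blocks in vm_block_lists)
    unique = {hashlib.sha256(b).hexdigest() for blocks in vm_block_lists for b in blocks}
    return logical / len(unique)

# 100 VMs, each 1000 blocks: 900 blocks from a shared "golden image",
# 100 blocks unique to each VM.
golden = [f"common-{i}".encode() for i in range(900)]
vms = [golden + [f"vm{v}-{i}".encode() for i in range(100)] for v in range(100)]

print(f"Dedup ratio: {dedup_ratio(vms):.1f}:1")   # roughly 9:1 for this mix
```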

Spinning disks will stick around the way spooling tape has stuck around. Farther and farther from the primary storage role, but still very much used.


[1]: Yes, these systems really did have a tape drive as part of a random-access storage system. If you needed a file off of tape, you waited. Things were slower back then, OK? And let us not speak of what happened when Google Desktop showed up and tried to index 15 years' worth of archival data, and did so on 200 end-user workstations within a month.

[2]: Distributed storage is another animal. The flash presence there is less convincing, but it'll probably happen anyway.

[3]: Remember that bit about Google Desktop? Well... "How did we go from 60% used to 95% used on the home-directory volumes in a week? OUR USERS HAVEN'T BEEN THAT USERY!!!" That's what happened. All those brought-from-archive files now landed on the precious, precious hard-drives. Pain teaches, and we figured out how to access the lower tiers.

Also, I'm on Twitter now. Thanks for reading.

2 Comments

"There is no way a RAID setup (hardware, software, or ZFS) can keep up with parity calculations and still keep the drives performant, so parity RAID of any stripe is a bad choice."

How then do you get drive redundancy*? So you can't have a RAID 5 or any RAID that calculates a checksum, because that calculation will just kill your speed. Would a RAID 1 (mirrored) array solve this issue?

What then happens on a RAID 1 array where you have two near-identical drives being written to in a near-identical manner? If drive 1 collapses, why would drive 2 not collapse at the same time? Sure, there might be tiny, tiny imperfections in the manufacturing process, but would that differentiate the drives enough for us to have enough time to replace them?

I believe that trust is a great factor in stopping people from moving to SSD. We trust spindles; we need to ditch them, but we need to trust their replacement. Sure, we can monitor SSD block writes and check as sectors go bad, but our big question is: if drive A0 fails, why won't drive A1 fail at almost the same time?

*Maybe drive redundancy is a thing of the past. We just have more and more nodes, with each node being a backup of another node. That being the case, people are still buying bare-metal servers and installing them on-site.