Aging solid-state drives

There is a misconception about solid-state drives that's rather pernicious. Some people latched onto the paranoia surrounding the SSDs of several years ago and have clung to it as gospel truth. What am I referring to? The fundamental truth that current flash technology has an upper limit on the number of writes per memory cell, coupled with a lack of faith in engineering ingenuity.

Once upon a time, it was commonly bandied about that SSD memory cells only had 100,000 writes in them before going bad. Cross that with past painful experience with hard-drive bad sectors, and you scared off a whole generation of storage administrators. These scars seem to linger.

The main problem cited was hot-spotting on the drive itself. Certain blocks get written to a LOT (the journal, the free-space bitmap, certain critical inodes, etc.), and once those wear out, well... you have a brick.

This perception has some basis in truth, especially for the el-cheapo SSDs of several years ago, but not anymore. Enterprise-class solid-state drives haven't had this problem for a very long time. The exact technical details have been covered quite a lot in the media, and AnandTech has published several good articles on the subject.

Part of the problem here is the misconception that a storage block as seen by the operating system corresponds to a single physical block on the storage device. That hasn't been the case since the 1980s, when SCSI drives introduced sector reallocation as a way to handle bad sectors. Back in the day, and indeed right now for rotational media, each hard drive keeps a stash of spare sectors that act as substitutes for sectors that go bad. When this happens, most operating systems will throw pre-fail alarms, but the data is still intact. What's more, the operating system doesn't necessarily know which sectors got reallocated. What looks like a contiguous run of blocks in the file-allocation tables may actually include a sector located far from the rest, which can impact that file's performance.
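The remapping trick above can be sketched in a few lines. This is a conceptual toy, not real drive firmware; the class and method names are mine, and the numbers are made up:

```python
# Conceptual sketch (not real firmware): how a drive can silently remap
# a bad physical sector to a spare while the OS keeps using the same LBA.

class SectorRemapper:
    def __init__(self, total_sectors, spare_sectors):
        # Spare pool lives past the advertised capacity
        self.spares = list(range(total_sectors, total_sectors + spare_sectors))
        self.remap = {}  # logical sector -> substitute physical sector

    def physical(self, lba):
        """The OS always asks for the same LBA; the drive may answer
        from a spare it substituted behind the scenes."""
        return self.remap.get(lba, lba)

    def mark_bad(self, lba):
        """Grown defect: swap in a spare; the address the OS sees never changes."""
        if not self.spares:
            raise RuntimeError("spare pool exhausted; bad sectors now visible")
        self.remap[lba] = self.spares.pop(0)

d = SectorRemapper(total_sectors=1000, spare_sectors=8)
assert d.physical(42) == 42    # healthy: logical == physical
d.mark_bad(42)
assert d.physical(42) == 1000  # remapped to the first spare; LBA 42 still "works"
```

Note how LBA 42 now physically lives nowhere near LBAs 41 and 43, which is exactly the seek-penalty scenario described above for rotational media.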

Solid-state drives take this to another level. SSD vendors know darned well that flash ages, so they allocate a much larger chunk of storage to this exact reallocation scheme. Enterprise SSDs reserve a larger percentage of this spare space than consumer-grade drives do. As blocks wear out, they're substituted in real time from the reallocation pool, and since solid-state drives don't incur extra I/O latency when accessing non-contiguous blocks, you'll never know the difference.
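To put rough numbers on that reserve (the industry calls it over-provisioning), here's some back-of-envelope arithmetic. The figures are illustrative, not any specific product's spec:

```python
# Back-of-envelope sketch: over-provisioning is the raw flash the drive
# holds back from the advertised capacity. All figures are illustrative.

def overprovisioning_pct(raw_gib, advertised_gib):
    """Spare flash as a percentage of user-visible capacity."""
    return 100.0 * (raw_gib - advertised_gib) / advertised_gib

# A consumer drive might expose 512 GiB of raw flash as "512 GB"
# (~476.8 GiB), giving ~7% spare; an enterprise part commonly holds
# back far more of the same raw flash.
consumer = overprovisioning_pct(raw_gib=512, advertised_gib=476.8)
enterprise = overprovisioning_pct(raw_gib=512, advertised_gib=400)
print(f"consumer ~{consumer:.0f}% spare, enterprise ~{enterprise:.0f}% spare")
```

More spare flash means more substitute blocks to burn through before any wear becomes visible, which is a big part of what the enterprise price premium buys.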

The other thing SSDs do is something called wear-leveling. The exact methods vary by manufacturer, but they all do it. The controller on the drive makes sure that no cell gets pounded with writes more than the others. For instance, when handling an 'overwrite' operation, it will write the data to a fresh block and mark the old block as free. Thanks to this, the physical block backing a logical block can change on a daily basis. Blocks that get written to constantly (that darned journal again) will be constantly on the move.
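The idea is easy to demonstrate with a toy flash translation layer. This is a deliberately simplified sketch of the general technique; no vendor's actual algorithm looks like this:

```python
# Toy flash translation layer (illustrative, not any vendor's algorithm):
# every overwrite lands on the least-worn free physical block, so a hot
# logical block (the journal!) migrates constantly.

class ToyFTL:
    def __init__(self, n_physical):
        self.erase_count = [0] * n_physical  # wear per physical block
        self.mapping = {}                    # logical -> physical
        self.free = set(range(n_physical))

    def write(self, logical):
        # Pick the least-worn free block as the new home
        target = min(self.free, key=lambda p: self.erase_count[p])
        self.free.discard(target)
        old = self.mapping.get(logical)
        if old is not None:
            # Overwrite: the stale copy is invalidated, erased, and freed
            self.erase_count[old] += 1
            self.free.add(old)
        self.mapping[logical] = target

ftl = ToyFTL(n_physical=8)
for _ in range(1000):          # hammer ONE logical block, journal-style
    ftl.write(logical=0)
# The wear is spread across all physical blocks instead of one:
assert max(ftl.erase_count) - min(ftl.erase_count) <= 1
```

One thousand overwrites of a single logical block cost each physical block only about 125 erases here, which is the whole point of the scheme.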

The really high-end SSDs have a super-capacitor and onboard cache built into them. The controller keeps the high-write blocks in that cache to further reduce write wear. The super-cap is there in case of sudden power loss, when it supplies enough power to commit the cached blocks to flash. When you're paying over $2K for 512GB of space, this is the kind of thing you're buying.
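Why a power-protected cache saves wear is worth a tiny sketch. Assuming a simple write-back scheme (names and numbers here are mine, for illustration only):

```python
# Illustrative sketch of a power-protected write-back cache: repeated
# writes to the same hot block coalesce in RAM and hit flash only once,
# because the super-cap guarantees the cache can always be flushed.

class WriteBackCache:
    def __init__(self):
        self.cache = {}        # block -> latest data (RAM, supercap-backed)
        self.flash_writes = 0  # actual wear-causing programs to flash

    def write(self, block, data):
        self.cache[block] = data  # absorbed in cache; no flash wear yet

    def flush(self):
        # On sudden power loss, the super-cap powers exactly this step
        self.flash_writes += len(self.cache)
        self.cache.clear()

c = WriteBackCache()
for i in range(10_000):
    c.write(block=7, data=i)   # 10,000 journal updates to one hot block...
c.flush()
assert c.flash_writes == 1     # ...cost exactly one flash program
```

Without the super-cap, the drive couldn't safely hold dirty data in RAM like this, so cheap drives have to program flash far more often.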

All of these techniques combine to ensure your shiny new SSD will NOT wear itself out after only 100K writes. Depending on your workload, these drives can happily last three years or more. Obviously, a workload that's 100% writes will shorten that, but you generally don't want SSDs for 100%-write loads anyway; you use SSDs for the blazing-fast reads.
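A rough endurance calculation shows why multi-year lifetimes are the expected case, not the exception. All the numbers here are illustrative assumptions, not any vendor's spec sheet:

```python
# Rough endurance arithmetic (illustrative numbers, not a spec sheet):
# with wear leveling spreading writes evenly, lifetime is roughly the
# total program/erase budget divided by the sustained write rate.

def lifetime_years(capacity_gb, pe_cycles, write_gb_per_day, write_amp=1.5):
    """write_amp models extra internal writes the FTL performs."""
    total_write_budget_gb = capacity_gb * pe_cycles / write_amp
    return total_write_budget_gb / write_gb_per_day / 365

# A 512 GB drive of 3,000-cycle flash, hammered with 100 GB of writes
# every single day, still has decades of budget under these assumptions:
print(f"{lifetime_years(512, 3000, 100):.0f} years")  # roughly 28 years
```

Even if you assume much worse write amplification or cheaper flash, the answer stays comfortably past the three-year mark for ordinary workloads.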

For modern SSD drives:
  • You do NOT need special SSD-aware filesystems. Generally, those are only for stupid flash devices, like a RAID array of MicroSD cards.
  • For most common workloads you do NOT need to worry about write-minimization.
  • They can handle millions to tens of millions of write operations per logical block (yes, that'll consume multiple physical blocks over the drive's lifetime, but that's how this works).
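That last bullet is just multiplication. With wear leveling, one hot logical block draws on the erase budget of every physical block the controller can rotate it through (numbers below are illustrative):

```python
# Sanity check on "tens of millions of writes per logical block"
# (illustrative numbers): wear leveling lets a single hot logical block
# spend the program/erase budget of many physical blocks.

def writes_per_logical_block(rotation_pool_blocks, pe_cycles_per_block):
    return rotation_pool_blocks * pe_cycles_per_block

# Even a modest 1,000-block rotation pool of 10,000-cycle flash:
print(writes_per_logical_block(1_000, 10_000))  # 10,000,000 writes
```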
It's time to move on.