A needed bit of info about SATA drives

| 1 Comment
Anandtech ran an article recently about enterprise storage. In it they go over SATA vs. SCSI vs. SAS. Most of it I already knew, but towards the back was a kernel of information that I hadn't caught before.

We know that generally speaking SATA drives can't quite keep up to the same kind of workloads that SCSI can. Differences in the manufacturing process, quality control, and the like. I don't fully understand it, which irks me, but there it is. One of those areas is something called 'nonrecoverable read error' rate.

Take a look at this Seagate drive. It's almost the last thing on the spec page. The Nonrecoverable Read Error rate is 1 bit in 1014 bits, or 1 bad read in 12.5TB. Mainline SCSI and FC drives have that error rate as high as 10 to the 15th or 16th. Every 12.5TB of data transferred includes a corrupted bit.

We don't see this as a problem in most enterprise situations because they all run in some form of redundant array setup. RAID5 drivers, usually in the RAID controller, see the bad bit and go to the parity data to fill in the real value. RAID1 drivers go to the mirror. No biggie. The problem comes with RAID5 rebuilds, when the entire array is read in order to generate the parity data. If you have 14 500GB drives in your RAID5 array, that means during a rebuild you transfer around 7TB of data. If a bad bit shows up during the rebuild process, a 56% chance, game over. That's a from-tape rebuild.

This is why systems such as RAID6 are showing up. That's a double parity system, so rebuilding one bad disk does not risk the whole array if a nonrecoverable read error occurs. You lose two disks to parity, but you can still have a 30-disk array without much risk.

One more reason why SATA isn't quite ready for realtime data applications. Nearline, yes, but not realtime. This'll play hob with our ideas for our BCC cluster.


1 Comment

Thanks for the heads up. I run an HP MSA/20 with two Logical arrays, 3 disks each, for my direct to disk backup tasks. If I lost one of those arrays, it's not the end of the world. But I have considered going with SATA for application storage. Now I'll rethink that option.