NTFS and fragmentation

I've known for a while that filesystem fragmentation can seriously affect NTFS performance. This isn't just the run-of-the-mill "fragmentation means random access patterns for what should be sequential I/O" kind of degradation, either. I'm seeing it on volumes backed by EVA arrays, which purposely randomize I/O across the disk spindles. Clearly, something in the meta-data handling degrades significantly once fragmentation passes a certain point.

Today I found out why that is.

http://blogs.technet.com/askcore/archive/2009/10/16/the-four-stages-of-ntfs-file-growth.aspx

As the number of fragments increases, the MFT record for the file has to track more and more extents. Once the extent list no longer fits directly in the file's base MFT record, NTFS starts adding layers of indirection: an attribute list pointing to additional records that hold the rest of the file's extent mappings.
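To make that indirection concrete, here's a rough toy model in Python. It is illustrative only: 1KB is the classic MFT file record size, but the overhead and per-extent byte counts are assumptions, since real NTFS packs extents as variable-length mapping pairs.

```python
# Toy model of NTFS file-growth stages (illustrative only; real NTFS
# uses variable-length mapping pairs, so these sizes are rough
# assumptions, not on-disk constants).

MFT_RECORD_BYTES = 1024   # classic MFT file record size
RECORD_OVERHEAD = 400     # assumed: header, standard info, file name, etc.
BYTES_PER_EXTENT = 8      # assumed average size of one extent mapping entry

def records_needed(fragments: int) -> int:
    """Estimate how many MFT records it takes to map a file
    with the given number of fragments (extents)."""
    room_per_record = MFT_RECORD_BYTES - RECORD_OVERHEAD
    extents_per_record = room_per_record // BYTES_PER_EXTENT
    if fragments <= extents_per_record:
        return 1  # everything fits in the base record
    # Base record now holds an attribute list pointing at child
    # records, each of which maps another chunk of extents.
    child_records = -(-fragments // extents_per_record)  # ceiling division
    return 1 + child_records

for frags in (1, 4, 100, 5_000, 50_000):
    print(f"{frags:>7} fragments -> ~{records_needed(frags)} MFT records")
```

The exact numbers don't matter; the point is that past a threshold, every additional batch of fragments costs another MFT record that has to be allocated, linked from the attribute list, and later read back.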

If you have, say, a 10GB file on your backup-to-disk system, and that file has 50,000 fragments, you are absolutely at the 'stage 4' described in that blog post. Meta-data operations on that file, such as tracking down the next extent to read during a restore or copy, will be correspondingly more expensive than on a 10GB file with 4 fragments. At the same time, writing a large file that ends up that badly fragmented requires a LOT more meta-data operations than the same big write on an empty filesystem.
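A quick back-of-the-envelope comparison for that restore scenario, reusing the assumed constants from the sketch above (again, illustrative numbers, not measurements):

```python
# Back-of-envelope: how much mapping metadata NTFS has to walk to read
# a 10 GB file at 4 fragments vs. 50,000. Sizes reuse the assumed
# constants from the toy model above; they are not measured values.

FILE_GB = 10
BYTES_PER_EXTENT = 8             # assumed average mapping entry size
EXTENTS_PER_CHILD_RECORD = 78    # (1024 - 400) // 8, from the toy model

for fragments in (4, 50_000):
    avg_extent_mb = FILE_GB * 1024 / fragments
    mapping_kb = fragments * BYTES_PER_EXTENT / 1024
    records = max(1, -(-fragments // EXTENTS_PER_CHILD_RECORD))
    print(f"{fragments:>6} fragments: ~{avg_extent_mb:8.2f} MB per extent, "
          f"~{mapping_kb:6.1f} KB of extent metadata, "
          f"~{records} MFT records to consult")
```

Reading the 4-fragment file means consulting one MFT record holding a handful of mapping entries; reading the 50,000-fragment version means walking hundreds of kilobytes of extent metadata spread across hundreds of records before and during the data reads.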

And this, boys and girls, is why you really really really want to avoid large fragmentation on your NTFS-based backup-to-disk directories. Really.