OpenSolaris

I've been checking out OpenSolaris for a NAS possibility, and it's pretty nifty. A different dialect than I'm used to, but still nifty.

Unfortunately, it seems to have a nasty problem in file I/O. Here are some iozone metrics (40GB file, with 32K and 64K record sizes; throughput in KB/sec).

OpenFiler                                 random  random
              KB  reclen   write    read    read   write
        41943040      32  296238  118598   15682   62388
        41943040      64  297141  118861   23731   86620

OpenSolaris                               random  random
              KB  reclen   write    read    read   write
        41943040      32  259170 1179515    8458    7461
        41943040      64  244747 1133916   13894   13001

Identical hardware, but a different operating system. I've figured out that the stellar Read performance is due to the ZFS 'recordsize' being 128K. When I drop it down to 4K, similar to the block size of XFS in OpenFiler, the Read performance is very similar. What I don't get is what's causing the large difference in random I/O. Random-write is exceedingly bad. With the recordsize dropped to 4K to match XFS, the random-read gets even worse; I haven't stuck with it long enough to see what it does to random-write.
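
For reference, the recordsize tweak is just a filesystem property change; 'tank/nas' below is a stand-in name rather than my actual pool, and the new value only applies to blocks written after the change:

zfs get recordsize tank/nas       # 'tank/nas' is a placeholder dataset name
zfs set recordsize=4K tank/nas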

Poking into iostat shows that both OpenFiler and OpenSolaris are striping I/O across the four logical disks available to them. I know the storage side is able to pump the I/O, as witnessed by the random-write speed on OpenFiler. The chosen file size is larger than local RAM, so local caching effects are minimized.
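
For the curious, spotting that was just a matter of watching iostat on each box while the benchmark ran, along these lines (the five-second interval is arbitrary):

iostat -xn 5     # OpenSolaris
iostat -x 5      # OpenFiler/Linux

The per-disk read and write columns make it obvious whether all four logical disks are taking the load.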

As I mentioned back in the know-your-IO article series, random-read is the best analog of the I/O pattern your backup process follows when backing up large, disorganized piles of files. Cache/pre-fetch will help with this to some extent, but the above numbers give a fair idea of the lower bound on speed. OpenSolaris is w-a-y too slow. At least, the way I've got it configured, which is largely out-of-the-box.

Unfortunately, I don't know if this bottleneck is a driver issue (HP's fault) or an OS issue. I don't know enough of the internals of ZFS to hazard a guess.

6 Comments

Please post details of the underlying hardware, RAID volumes, how you create the ZFS pool, etc. The same for Linux/XFS and how you tested performance.

It might be that your RAID controller is actually flushing its write cache every time ZFS commits a transaction. Try this as root:

echo zfs_nocacheflush/W0t1 | mdb -kw

And see if it makes any difference for your random writes.

To revert to default:

echo zfs_nocacheflush/W0t0 | mdb -kw

To make it permanent set the following parameter in the /etc/system file:

set zfs:zfs_nocacheflush = 1

and reboot the server.
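
You can also read the current value back without changing anything (no -w needed when just reading):

echo zfs_nocacheflush/D | mdb -k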

For more details see http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

Another thing: you are doing iozone -s 40G -r 32k -r 64k,
so I would set ZFS's recordsize to 32K and not the default 128KB.
For random writes, the default recordsize with the above iozone run will cause each 32KB write to become a 128KB read, modify, and 128KB write, and that may drive your numbers down badly.
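
Something along these lines, where 'tank/nas' and the test-file path are placeholders for your actual filesystem; remember recordsize only applies to files written after the change, so recreate the test file:

zfs set recordsize=32K tank/nas                    # 'tank/nas' is a placeholder dataset name
iozone -s 40G -r 32k -r 64k -f /tank/nas/iozone.tmp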


Starting with build 132, the CPQary driver is integrated into OpenSolaris.
So maybe you should try build 134?