One of my post-brainshare tasks is to rebenchmark some OES performance. I did a benchmark series back in September and the results there weren't terribly encouraging. I learned at BrainShare that a mid-December NCPSERV patch fixed a lot of performance issues, and I should rerun my tests. Okay, I can do that.
One test I did underlines the need to tune your cache correctly. Using the same iozone
tool I've used in the past, I ran the throughput test with multiple threads. Three tests:
20 threads processing against a separate 100MB file (2GB working set)
40 threads processing against a separate 100MB file (4GB working set)
20 threads processing against a separate 200MB file (4GB working set)
The server I'm playing with is the same one I used in September. It is running OES SP2, patched as of a few days ago. 4GB of RAM, and 2x 2.8 P4 CPU's. The data volume is on the EVA 3000 on a Raid0 partition. I'm testing OES througput not the parity performance of my array. Due to PCI memory, effective memory is 3.2GB. Anyway, the very good table:
20x100M 40x100M 20x200M
Initial write 12727.29193 12282.03964 12348.50116
Rewrite 11469.85657 10892.61572 11036.0224
Read 17299.73822 11653.8652 12590.91534
Re-read 15487.54584 13218.80331 11825.04736
Reverse Read 17340.01892 2226.158993 1603.999649
Stride read 16405.58679 1200.556759 1507.770897
Random read 17039.8241 1671.739376 1749.024651
Mixed workload 10984.80847 6207.907829 6852.934509
Random write 7289.342926 6792.321884 6894.767334
The 2GB dataset fit inside of memory. You can see the performance boost that provides on each of the Read tests. It is especially significant on the tests designed to bust read-ahead optimization such as Reverse Read, Stride Read, and Random Read. The Mixed Workload test showed it as well.
One thing that has me scratching my head is why Stride Read is so horrible with the 4GB data-sets. By my measure about 2.8GB of RAM should be available for caching, so most of the dataset should fit into cache and therefore turn in the fast numbers. Clearly, something else is happening.
Anyway, that is why you want to have a high cache-hit percentage on your NSS cache. This is also why 64-bit memory will help you if you have very large working sets of data that your users are playing on, and we're getting to the level where 64-bit will help. And will help even though OES NCP doesn't scale quite as far as we'd like it to. That's the overall question I'm trying to answer here.