Monday, January 07, 2008

I/O starvation on NetWare, another update

I've spoken before about my latency problems on the MSA1500cs. Since my last update I've spoken with Novell at length. Their own back-line HP people were thinking firmware issues to, and recommended I open another case with HP support. And if HP again tries to lay the blame on NetWare, to point their techs at the NetWare backline tech. Who will then have a talk about why exactly it is that NetWare isn't the problem in this case.

This time when I opened the case I mentioned that we see performance problems on the backup-to-disk server, which is Windows. Which is true, when the problem occurs B2D speeds drop through the floor; last Friday a 525GB backup that normally completes in 6 hours took about 50 hours. Since I'm seeing problems on more than one operating system, clearly this is a problem with the storage device.

The first line tech agreed, and escalated. The 2nd line tech said (paraphrased):
I'm seeing a lot of parity RAID LUNs out there. This sort of RAID uses CPU on the MSA1000 controllers, so the results you're seeing are normal for this storage system.
Which, if true, puts the onus of putting up with a badly behaved I/O system onto NetWare again. The tech went on to recommend RAID1 for the LUNs that need high performance when doing array operations that disable the internal cache. Which, as far as I can figure, would work. We're not bottlenecking on I/O to the physical disks, the bottleneck is CPU on the MSA1000 controller that's active. Going RAID1 on the LUNs would keep speeds very fast even when doing array operations.

That may be where we have to go with this. Unfortunately, I don't think we have 16TB of disk-drives available to fully mirror the cluster. That'll be a significant expense. So, I think we have some rethinking to do regarding what we use this device for.

Labels: , , , ,


Comments:

<< Home

This page is powered by Blogger. Isn't yours?