Results: Conclusions

The objective of this series of tests was to determine how well Open Enterprise Server - Linux (here referred to as 'Linux') scales when compared to Open Enterprise Server - NetWare (here referred to as 'NetWare'). One of the prime goals was to figure out if we need to throw hardware at our cluster if we decide to migrate to Linux soon. My earlier test had shown that for a single station pounding on a Linux and NetWare server, the Linux server turned in better performance.

I was testing the performance of an NSS volume mounted over NCP. In part this is because NetWare clustering only works with NSS, but mostly because of two other reasons. First, the only other viable file server for Linux is Samba, and I already know it has 'concurrency issues' that crop up well below the level of concurrency we see on the WUF cluster. Second, we make extensive use of the rich metadata that NSS provides; I don't believe any native Linux file system has an equivalent for directory quotas.

Hardware

  • HP ProLiant BL20P G2
  • 2x 2.8GHz CPU
  • 4GB RAM
  • HP EVA3000 fibre attached
OES-NetWare config
  • NetWare 6.5, SP5 (a.k.a. OES NetWare SP2)
  • N65NSS5B patch
  • nw65sp5upd1
  • 200GB NSS volume, no salvage, RAID0, on EVA3000
OES-Linux config
  • OES Linux SP2
  • Post-patches up to 9/12/06
  • 200GB NSS volume, no salvage, RAID0, on EVA3000
No attempt was made to tune either operating system; default settings were used to better resemble 'out of the box' performance. The one exception was the NetWare IOZONE tests, where MAXIMUM SERVICE PROCESSES was bumped from 750 to 1000 (to no measurable effect, as it turned out).

To facilitate the testing I was granted the use of one of the computer labs that was in mothballs between terms. The lab had 32 stations in it, though only 30 were ever used in a test. My thanks to ATUS for lending me the lab.

Client Configuration
  • Windows XP Sp2, patched
  • P3 1.6GHz CPU
  • 256MB RAM
  • Dell
  • Novell Client version 4.91.2.20051209 + patches
  • NWFS.SYS dated 11/22/05
In situations where the Linux server was not bogged down with CPU load, it turned in performance that rivaled, and in some cases exceeded, that of the NetWare server. This is consistent with my January benchmark. File-create and dir-create both showed very comparable performance when load was low.

Unfortunately, the Linux configuration hits its performance ceiling well before the NetWare server does; Linux simply doesn't scale as well as NetWare here. I/O operations on Linux are much more CPU bound than on NetWare, as CPU load on the Linux server was excessive in every test. The impact of that load was highly variable, though, so there is some leeway.
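
For what it's worth, the user/system/iowait split is easy to watch during a run; the following is a minimal sketch (Python on the server is an assumption on my part, and top or vmstat show the same numbers):

```python
#!/usr/bin/env python
# Minimal sketch: sample /proc/stat once a second and print the
# user/system/iowait split while a test runs. The 60-second window
# is a placeholder; 'top' or 'vmstat' give the same picture.
import time

def cpu_times():
    # First line of /proc/stat: "cpu user nice system idle iowait irq softirq ..."
    return [int(x) for x in open('/proc/stat').readline().split()[1:]]

prev = cpu_times()
for _ in range(60):
    time.sleep(1)
    cur = cpu_times()
    delta = [c - p for c, p in zip(cur, prev)]
    total = float(sum(delta)) or 1.0
    user, nice, system, idle, iowait = delta[:5]
    print("user=%4.1f%%  system=%4.1f%%  iowait=%4.1f%%  idle=%4.1f%%" % (
        100 * user / total, 100 * system / total,
        100 * iowait / total, 100 * idle / total))
    prev = cur
```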

The file-create and dir-create tests each created 600,000 objects per run. This is a clearly synthetic benchmark, but it also happened to highlight one of the weaknesses of the NCP Server on Linux. During both tests it was 'ndsd' showing the high load, and that is the process that hosts the NCP Server. Very little time was spent in IO WAIT, with the rest evenly split between USER and SYSTEM.
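
The create tests themselves were simple loops run from the lab stations against the mapped volume. The sketch below shows the general shape of them; the drive letter and per-station count are placeholders, not the exact harness I used.

```python
#!/usr/bin/env python
# Sketch of the file-create / dir-create loops, run from a lab station
# against the NCP-mapped NSS volume. TARGET and COUNT are placeholders;
# the real runs totaled 600,000 objects per test.
import os
import time

TARGET = r"H:\bench"   # hypothetical drive letter mapped to the test volume
COUNT = 20000          # objects created by this station

def run(make_object, label):
    base = os.path.join(TARGET, label)
    if not os.path.isdir(base):
        os.makedirs(base)
    start = time.time()
    for i in range(COUNT):
        make_object(os.path.join(base, "obj%06d" % i))
    elapsed = time.time() - start
    print("%s: %d objects in %.1fs (%.0f objects/sec)" %
          (label, COUNT, elapsed, COUNT / elapsed))

run(lambda path: open(path, "w").close(), "file-create")   # zero-byte files
run(os.mkdir, "dir-create")                                # empty directories
```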

The IOZONE tests also drove CPU quite high due to the NCP traffic, but actual I/O throughput did not seem to be greatly affected by the load. In this test Linux may even have outrun NetWare in terms of how fast it drove the network. The difference is slight, a few percentage points, but it looks to be real. I regret not having firm data on that point, but what I do have is suggestive.
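
For those unfamiliar with IOZONE, its sequential phases boil down to timed record-sized writes and reads of a large file over the mapped volume. A stripped-down sketch (not the iozone tool itself; the file and record sizes are placeholders) looks like this:

```python
#!/usr/bin/env python
# Stripped-down sketch of an IOZONE-style sequential write/read pass over
# the NCP-mapped volume. Not the iozone tool itself; sizes are placeholders,
# and client-side caching can flatter the read number on small files.
import os
import time

TESTFILE = r"H:\bench\seq.tmp"    # hypothetical path on the mapped volume
FILE_SIZE = 512 * 1024 * 1024     # 512MB test file
RECORD = 64 * 1024                # 64KB records
block = b"x" * RECORD

start = time.time()
f = open(TESTFILE, "wb")
for _ in range(FILE_SIZE // RECORD):
    f.write(block)
f.flush()
os.fsync(f.fileno())
f.close()
write_rate = (FILE_SIZE / (1024.0 * 1024.0)) / (time.time() - start)

start = time.time()
f = open(TESTFILE, "rb")
while f.read(RECORD):
    pass
f.close()
read_rate = (FILE_SIZE / (1024.0 * 1024.0)) / (time.time() - start)

print("sequential write %.1f MB/s, sequential read %.1f MB/s" % (write_rate, read_rate))
os.remove(TESTFILE)
```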

But what does that mean for WWU?

The answer comes from understanding the I/O pattern of the WUF cluster. The vast majority of it is read/write, with create and delete thrown in as very small minority operations. Backups are exclusively read traffic, and backup is the most I/O-intensive thing we do with these volumes. There are a few middling-sized Access databases on some of the shared volumes, but most of our major databases are housed in the MS SQL server (or Oracle).

For a hypothetical reformat of WUF onto OES-Linux, I can expect CPU on the file-serving nodes to be in the 60-80% range with frequent peaks to 100%, and 100% CPU during backups. That, I believe, is at the high end of the acceptable performance envelope for the server hardware we have right now. With half of the nodes scheduled for hardware replacement in the next 18 months, dual- and even quad-core systems become much more attractive if OES Linux is to be a long-term goal.

OES-Linux meets our needs. Barely, but it does. Now to see what OES2 does for us!


1 Comment

Great write-up. Your weblog is one of the most informative on the web. I have a slightly related question. On the topic of clusters, do you find the benefits of a cluster/SAN setup outweighed by the increased complication of node upgrades/patching and the "all your eggs in one basket" nature of storage on the SAN? We currently have our servers broken down by school (K-12), admin, and BorderManager. In a perfect world, I wouldn't have everything so spread out; it makes administration somewhat more complicated for those less familiar with eDirectory and file-server duties. Shared storage worries me for two reasons. The first is what I alluded to earlier: I have the impression that all your eggs are in one basket. The second is that it's an expensive proposition. If I ever try to push for it, it needs to be as rock solid as you can get. iSCSI is an option from a feature standpoint, but the performance probably isn't there compared to a Fibre Channel SAN.