September 2006 Archives

Oops, BrainShare 07 big thing

Shows what I pay attention to.

http://www.novell.com/coolblogs/?p=394

So. The next Zen will be the big focus of BS-07, most likely.

Here is more:

http://www.novell.com/coolblogs/?p=542


2nd day of classes

Today is the second day of classes. So far it is largely going well, except for some lingering Blackboard issues relating to automatic enrollment in classes. Printing is back to its normal in-term volumes. Web usage of myfiles is up, though myweb has stayed mostly flat. All computer labs are up and running.

This also means that project-wise, things are really quieting down around here. The first week of classes is always spent mostly on fire watch, so not a lot else gets done. It also means that after-hours "it's broke, fix it right now" calls are sadly more common. It's amazing how easy it is to get used to not having students around; having them all back suddenly is a bit of a shock.

One amusing note. My boss is in a corner office that looks across the street at an apartment complex that houses a lot of students. He has remarked humorously at the number of wireless access points his laptop picks up. So far, at least 15 are visible.

OES2 release pushed beyond BrainShare

To quote:
Please note that Open Enterprise Server services currently run on SUSE Linux Enterprise Server 9. New purchases of Open Enterprise Server will not include SUSE Linux Enterprise Server 10 until it officially becomes part of Open Enterprise Server in the next release, scheduled for mid-2007.
Hmm. This tells me that what we'll be seeing at BrainShare '07 will be beta builds of OES2. March is not 'mid-2007'.

This further raises the question of what the Big Thing will be at BS-07. Last year it was SUSE 10. All. Over. The. Place. OES2 will be big for me, but I'm not convinced that Novell will give the next OES the same push it gave SLES 10. I'm a bit irked that they seem to be minimizing the file and print serving that made the company, but that's just business; file servers don't drive profit anymore. On the other hand, I may be wrong.

The flagship products are SLES, GroupWise, Zen, and Identity Manager. Identity Manager is a big consulting driver, and still a hot technology, so that'll still get a big focus. Zen7 SP1 is recently enough out the door that SP2 or even a version 8 is probably not going to happen by BrainShare time. GroupWise 7 has been out a while now, but I haven't heard any mumblings about a v8 for that product.

On the other hand, openSUSE 10.2 is in Alpha right now. According to the roadmap, 10.2 will release December 7th. What this means for SLES is unclear to me, but it could mean that beta builds of SLES 10.2 may be available at BrainShare. You can find a list of changes from 10.1 to 10.2 (for openSUSE, not SLES) here. The changes aren't terribly significant, just improvements to the X Window environment (both Gnome and KDE) and related applications.

So no, I can't yet tell what the Big All Consuming Message will be. Eh. Time will tell.


Making data-junkies happy

Today I made a data-junkie happy. I pointed out the ODBC driver for eDirectory. With that you can do anything you could normally do with ODBC, such as write reports, or find out which test accounts haven't been used in 12 months. Fun stuff like that.

Once Upon a Time this was available only from developer.novell.com, but somewhere along the line it snuck into the ConsoleOne directory. Take a look at...

ConsoleOne\1.2\reporting\bin\odbc.exe

Use and enjoy.
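For illustration, here's a minimal Python sketch of the kind of report a data-junkie might build on top of that driver, using pyodbc. The DSN name ('eDirectory') and the table and column names are assumptions on my part; check the schema the ODBC driver actually exposes.

```python
# Sketch: reporting on eDirectory through its ODBC driver.
# Table/column names ("USER", "CN", "LOGIN_TIME") and the DSN name
# are assumptions -- inspect the driver's actual schema first.

def stale_account_query(table: str, name_col: str, login_col: str) -> str:
    """Build a report query for accounts whose last login predates a
    cutoff.  The cutoff is left as a '?' placeholder so it can be
    bound at execution time."""
    return (
        f"SELECT {name_col}, {login_col} "
        f"FROM {table} "
        f"WHERE {login_col} < ? OR {login_col} IS NULL"
    )

# Usage sketch (requires the driver installed and a DSN configured):
#   import datetime
#   import pyodbc  # third-party: pip install pyodbc
#   cutoff = datetime.datetime.now() - datetime.timedelta(days=365)
#   sql = stale_account_query("USER", "CN", "LOGIN_TIME")
#   with pyodbc.connect("DSN=eDirectory") as conn:
#       for row in conn.execute(sql, cutoff):
#           print(row)
```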


Plugging holes with Zen!

This morning's SANS diary says that the infocon level is at Yellow. This happens pretty rarely, and it only elevates for a good reason. In this case a VML vulnerability and exploit have emerged in the last few days. You can read CERT's description of it here.

There are a number of ways to get around the problem; Microsoft has suggested a few. You can read their take on things here.

It turns out that one of the methods recommended by Microsoft is actually pretty easily done through Zen for Desktops.

Un-register Vgx.dll

Microsoft has tested the following workaround. While this workaround will not correct the underlying vulnerability, it helps block known attack vectors. When a workaround reduces functionality, it is identified in the following section.

Note The following steps require Administrative privileges. It is recommended that the system be restarted after applying this workaround. It is also possible to log out and log back in after applying the workaround; however, the recommendation is to restart the system.

To un-register Vgx.dll, follow these steps:

1. Click Start, click Run, type "regsvr32 -u "%ProgramFiles%\Common Files\Microsoft Shared\VGX\vgx.dll"" (without the quotation marks), and then click OK.

2. A dialog box appears to confirm that the un-registration process has succeeded. Click OK to close the dialog box.

Impact of Workaround: Applications that render VML will no longer do so once Vgx.dll has been unregistered.

To undo this change, re-register Vgx.dll by following the above steps. Replace the text in Step 1 with "regsvr32 "%ProgramFiles%\Common Files\Microsoft Shared\VGX\vgx.dll"" (without the quotation marks).

As I said, this is fairly simple to do through ZenWorks. Create a new Application Object and enter the details manually. Put this in the "Path to file" field:

%*WINDIR%\System32\regsvr32

And this in the Parameters:

-u "%*ProgramFiles%\Common Files\Microsoft Shared\VGX\vgx.dll"

Set it to run in system impersonation and associate it how you will, with a force-run and probably run-once. To undo it once the patch is out, or once you have confidence that your antivirus vendor will catch the bug, re-registering the DLL the same way is just as easy.
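For testing the workaround on a single machine before wrapping it in an Application Object, here's a small Python sketch that builds the same regsvr32 command lines. This is a hedged sketch, not deployed tooling; the paths come from the standard Windows environment variables.

```python
import os

def regsvr32_cmd(register: bool) -> list:
    """Build the regsvr32 command line Microsoft recommends for vgx.dll.

    register=False un-registers the DLL (blocks VML rendering);
    register=True re-registers it to undo the workaround after patching.
    """
    windir = os.environ.get("WINDIR", r"C:\WINDOWS")
    program_files = os.environ.get("ProgramFiles", r"C:\Program Files")
    dll = os.path.join(program_files, "Common Files",
                       "Microsoft Shared", "VGX", "vgx.dll")
    cmd = [os.path.join(windir, "System32", "regsvr32")]
    if not register:
        cmd.append("-u")  # -u = un-register
    cmd.append(dll)
    return cmd

# To actually run it (Windows, admin rights required):
#   import subprocess
#   subprocess.run(regsvr32_cmd(register=False), check=True)
```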

Note: This is just a wild idea, not something we have running. We might, but we have several layers of approvals to get through before we push something like this out to everyone. Feel free to riff on this idea to your own needs.

The pro/con of clustering

A question was posed:
On the topic of clusters, do you find the benefits of a cluster/SAN setup outweighed by the increased complication in node upgrades/patching and the "all your eggs in one basket" problem when it comes to storage on the SAN?
One of the biggest things to get used to with clustering is that your uptimes for your cluster nodes will go down dramatically from what you're used to with your existing mainline servers, but your service uptimes will go up. Once we put in the cluster we haven't had an unplanned multi-hour outage that wasn't attributable to network issues. The key here is 'unplanned'. We've had several planned outages for both service-packing and actual hardware upgrades to the SAN array itself.

Prior to the cluster, WWU had three 'facstaff' servers and three 'student' servers to handle user directories and shared directories. This way when one server died, only a third of that class of user was out of luck. The cluster still follows this design for the user directories, but that's more for load-balancing between the cluster nodes than disaster resilience. Since the cluster went in we've merged all of our facstaff shared volumes into a single volume. This was done because we were getting more and more cases of departments needing access to both Share1 and Share3, and we didn't have the drive letters for that.

Patching and service-packing the cluster is easier than it would be with stand-alone servers. I can script things so that three of our six cluster nodes vacate themselves from the cluster in the middle of the night, so I can apply service-packs to them in the middle of the day. Repeat the same trick the next day. I can have a service-pack rolled out to the cluster in 48 hours with no after hours work on my part. THAT is a savings (unless you're counting on the overtime pay, which I don't get anyway).
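The batching logic behind that script is trivial; here's a Python sketch of it. The node names are hypothetical, and the actual vacate/rejoin would be done with the cluster's own console commands, which aren't shown here.

```python
def rolling_batches(nodes, batch_size):
    """Split cluster nodes into batches so that only batch_size nodes
    are vacated and patched at a time; the rest keep serving."""
    return [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]

# Hypothetical node names for a six-node cluster.
nodes = ["wuf-node%d" % i for i in range(1, 7)]
plan = rolling_batches(nodes, 3)
# Night 1: vacate plan[0] and patch those nodes the next day;
# night 2: repeat with plan[1].  Whole cluster done in ~48 hours.
```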

The downside is the 'eggs in one basket' problem. If this building sinks into the Earth right now, WWU is screwed. Recovering from tape, after we get replacement hardware of course, would take close to a week. Don't think we haven't noticed this problem.

To be fair, though, we'd have this problem even if we were still on separate servers. True disaster recovery requires multiple locations for data and services, something stand-alone servers don't provide either. Under the old architecture, and presuming those servers were split between campus and our building, the 'building sinking into the ground' scenario would cause a significant portion of campus to stop working and a significant portion of students to lose everything for the days it'd take us to recover from tape. During that time WWU's teaching function would probably halt, as 'the Earth ate my homework' would be a very valid excuse.

In our case losing a third or two thirds of all user-directory and shared-directory data would halt the business of the university. While the outage wouldn't be quite as severe as it would be if our SAN melted, it would be just as disruptive. Because of that, going for an 'all or nothing' solution that increases perceived uptime was very much in order.

We're in the process of trying to replicate our SAN data to a backup datacenter on campus. We can't afford Novell's Business Continuity Cluster, which would provide automation to make this exact thing work, so we're having to make do on our own. We don't yet have a firm plan on how to make it work, and the 'fail back' plan is just as shaky; we only got the hardware for the backup SAN a month ago. It will happen, we just don't know what the final solution will look like.

As for iSCSI versus FibreChannel, my personal bias is for FC. However, I fully realize that gigabit ethernet is w-a-y cheaper than any FC solution out there today. I prefer FC because the bandwidth is higher, and due to how it is designed, I/O contention on the wire has less impact on overall performance. Just remember that iSCSI really, really likes jumbo frames (MTU >1500 bytes), and not all router techs are OK with twiddling that; you may end up with a parallel and separate ethernet setup between your servers and the iSCSI storage.

As for iSCSI throughput, I haven't done tests on that. However, I just got done looking at a whole bunch of throughput tests in and out of our FC SAN. During the IOZONE tests on NetWare, I recorded a high-water mark of 101 MB/s out of the EVA. This is 80% of GigE speed, so theoretically this transfer rate was doable over iSCSI. The true high-water mark was achieved by running IOZONE locally on the Linux server while a cluster node ran TSATEST on a locally mounted volume. At that time I saw a maximum transfer rate of 146 MB/s, which is 117% of GigE speed, so iSCSI wouldn't have been able to handle that. On the other hand, during day-to-day operations and during backups the transfer rate has never exceeded the 125 MB/s GigE mark. It's come close, but not exceeded it.
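For reference, those GigE percentages are just observed throughput against the theoretical 125 MB/s payload ceiling of gigabit ethernet (1 Gbit/s divided by 8, ignoring framing overhead). A quick Python sanity check:

```python
GIGE_MB_S = 1000 / 8  # 1 Gbit/s ethernet = 125 MB/s, ignoring framing overhead

def pct_of_gige(mb_per_sec):
    """Observed throughput as a percentage of the raw GigE ceiling."""
    return 100.0 * mb_per_sec / GIGE_MB_S

print(round(pct_of_gige(101), 1))  # -> 80.8, the ~80% figure above
print(round(pct_of_gige(146), 1))  # -> 116.8, beyond what GigE can carry
```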


Results: wild speculation

The question on my mind is: why is NCP serving on OES-Linux so much more resource-intensive than on OES-NetWare? The answers are not immediately clear, and I lack certain developer tools to find out why that may be. So I'm left with wild speculation, which I'll indulge in.

I strongly suspect a contributing factor is where the code executes. In NetWare everything is in Ring 0 (kernel-land) unless exiled to a Protected Memory Space, whereupon it executes in Ring 3 (user-land). My CNE classes said that stuff running in a protected memory space typically runs 3-5% slower than in the OS memory space on NetWare. On Linux, at least as of the 2.6 kernels, memory accessible from Ring 0 is limited to the first 1GB of RAM, and most processes are supposed to run in Ring 3. This is the architecture that permits things like "kill -9 [pid]" to work on Linux, where the equivalent would abend the server on NetWare.

There was a very handy slide at BrainShare 2006 that showed the differences in the NCP/NSS architecture in NetWare and Linux. The session was IO104: File System Roadmap by Richard Jones. Because you can purchase your very own BrainShare DVD, I'm going to assume that any NDAs on this information have lapsed. You'll want to open these links in different tabs, I'll be referring to the contents of them.

IO104 Slide 40: Linux and NetWare Architectures

The NetWare architecture is very familiar; I've been looking at that chart for years. The thing to note is that the NSS and NCP bits are right next to each other in kernel-land, so they run well together with little interference.

IO104 Slide 41: NSS on Linux in OES

This is how NSS and NCP are crammed into Linux. The 'up call' box is how communication between kernel-land and user-land is performed. Every piece of I/O that comes in on an NSS volume over any file protocol (NCP, Samba, NFS, or AFP) has to pass the user/kernel interface. If you look at slide 40 you can see that this is true for all file-systems on Linux.

The side information on slide 41 hints at a major problem when OES-Linux first shipped. At that time the file-cache was being kept in kernel-land, like it is in NetWare. This gave some screaming numbers. Unfortunately, Linux is limited to 1GB of RAM in kernel-land, and that has to be shared with everything else in kernel-land. So it screamed... so long as you had very small file systems. Ahem. SP1 changed that so NSS could use Linux's native caching mechanism. It dropped the speed a bit, but it could again handle large file-systems.

Since every I/O request on a file-system has to pass the computing equivalent of the blood/brain barrier, this introduces certain lags. The true impact of this is unknown to me, as my Linux-fu is too weak to know where to stick the probes to get an idea of where all that CPU is going. Watching the split of load types, I clearly saw that the CPU spent very little time in IOWAIT, with the rest split roughly evenly between USER and SYSTEM. The NCP server was doing something, but NSS (all that SYSTEM time) clearly was quite busy as well. Due to how file-servers are handled on Linux, if I had run this against Samba the busy process would have been SMBD, since CPU for file-system work is 'charged' against the calling process.

Then there is the possibility of just not having fully optimized code. I've heard that NSS as a linux file system runs 'only' 12% slower than reiser (when called locally on the Linux server, and not over a file-serving protocol), which says that NSS is pretty butch as it is. Scale is the key question, though.

The same File System Futures presentation had a few slides about where NSS is likely to go in future revisions of OES and SLES, where 'future' is likely the version past the one coming out Real Soon Now, and it looks quite promising. The block diagram for how the NetWare Services shim into Linux is much cleaner. The plan, as of March, was to shim in a 'NetWare Modular Features' layer between the file-systems and the Virtual File Services layer. The advantage to this would be at a minimum NetWare-style trustees on reiser, JFS, UFS, etc.

Once the next version of OES ships I'll see if I can get the hardware to re-run the dir-create and file-create tests. Even doing a single workstation should tell me what improvements, if any, were put into OES when it comes to scalability.


Results: Conclusions

The objective of this series of tests was to determine how well Open Enterprise Server - Linux (here referred to as 'Linux') scales when compared to Open Enterprise Server - NetWare (here referred to as 'NetWare'). One of the prime goals was to figure out whether we need to throw hardware at our cluster if we decide to migrate to Linux soon. My earlier test had shown that for a single station pounding on a Linux and a NetWare server, the Linux server turned in better performance.

I was testing the performance of an NSS volume mounted over NCP. In part this is because NetWare clustering only works with NSS, but mostly because of two other reasons. First, the only other viable file-server for Linux is Samba, and I already know it has 'concurrency issues' that crop up well below the level of concurrency we see on the WUF cluster. Second, we make extensive use of the rich metadata that NSS provides; I don't believe any Linux file system has an equivalent for directory quotas.

Hardware

  • HP ProLiant BL20P G2
  • 2x 2.8GHz CPU
  • 4GB RAM
  • HP EVA3000 fibre attached
OES-NetWare config
  • NetWare 6.5, SP5 (a.k.a. OES NetWare SP2)
  • N65NSS5B patch
  • nw65sp5upd1
  • 200GB NSS volume, no salvage, RAID0, on EVA3000
OES-Linux config
  • OES Linux SP2
  • Post-patches up to 9/12/06
  • 200GB NSS volume, no salvage, RAID0, on EVA3000
No attempt was made to tune the operating systems. Default settings were used to better resemble 'out of the box' performance. The one exception was on the NetWare IOZONE tests, where MAXIMUM SERVICE PROCESSES was bumped from 750 to 1000 (to no measurable effect, as it turned out).

To facilitate the testing I was granted the use of one of the computer labs in mothballs between terms. This lab had 32 stations in it, though only 30 stations were ever used in a test. I thank ATUS for lending the lab.

Client Configuration
  • Windows XP Sp2, patched
  • P3 1.6GHz CPU
  • 256MB RAM
  • Dell
  • Novell Client version 4.91.2.20051209 + patches
  • NWFS.SYS dated 11/22/05
When you look at situations where the Linux server was not bogged down with CPU load, it turned in performance that rivaled and in some cases exceeded that turned in by the NetWare server. This is consistent with my January benchmark. File-create and Dir-create both showed very comparable performance when load was low.

Unfortunately, the Linux configuration hits its performance ceiling well before the NetWare server does. Linux just doesn't scale as well as NetWare. I/O operations on Linux are much more CPU bound than on NetWare, as CPU load on all tests on the Linux server was excessive. The impact of that loading was very variable, though, so there is some leeway.

The file-create and dir-create tests each created 600,000 objects per run. This is a clearly synthetic benchmark that also happened to highlight one of the weaknesses of the NCP Server on Linux. During both tests it was 'ndsd' showing the high load, and that is the process that handles the NCP server. Very little time was spent in IOWAIT, with the rest split evenly between USER and SYSTEM.

The IOZONE tests also drove CPU quite high due to NCP traffic, but it seems that actual I/O throughput was not greatly affected by the load. In this test it seems that Linux may have out-run NetWare in terms of how fast it drove the network. The difference is slight, a few percentage points, but looks to be present. I regret not having firm data for that, but what I do have is suggestive of this.

But what does that mean for WWU?

The answer to this comes with understanding the characteristics of the I/O pattern of the WUF cluster. The vast majority of it is read/write, with create and delete thrown in as very small minority operations. Backup performance is exclusively read, and that is the most I/O intensive thing we do with these volumes. There are a few middling sized Access databases on some of the shared volumes, but most of our major databases have been housed in the MS SQL server (or Oracle).

For a hypothetical reformat of WUF to be OES-Linux based, I can expect CPU on the servers doing file-serving to be in the 60-80% range with frequent peaks to 100%. I can also expect 100% CPU during backups. This, I believe, is the high end of the acceptable performance envelope for the server hardware we have right now. With half of the nodes scheduled for hardware replacement in the next 18 months, the possibility of dual and even quad-core systems becomes much more attractive if OES Linux is to be a long term goal.

OES-Linux meets our needs. Barely, but it does. Now to see what OES2 does for us!


Passing of an era

A quick break from benchmark stuff, but over the past 7 days an era has passed: the feed file for this blog was passed up by another file as the most-hit file. The file itself is a page-header on another html page that is quickly going to pass me right up as well. Though, if you go to the professor's page you can see why its rank is suddenly rising. The title, "Research on Romantic Relationships", is catchy enough, but the prof is offering payment for participation. Same-sex partners only, please.

Results: IOZONE and throughput tests

Unfortunately for me there were significant problems with the iozone and throughput tests. With iozone it is very, very clear that some form of client-side caching took place during the NetWare tests that did not occur during the Linux test. This seriously tainted the data. On NetWare, one station recorded a throughput of 292715 for a 16MB file size and 32KB record size, yet that same station at the same data-set recorded a throughput of 6602 on Linux. Yet, somehow, the total run-time for that workstation was not 44 times longer for the OES-Linux run than the OES-NetWare run.
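That 44x figure is just the ratio of the two recorded throughputs:

```python
# One station, 16MB file size / 32KB record size, IOZONE throughput units.
netware_rate = 292715  # NetWare run
linux_rate = 6602      # same station, same data-set, Linux run

print(round(netware_rate / linux_rate))  # -> 44
# Yet total run-time was nowhere near 44x longer on Linux, which is
# how the client-side caching on the NetWare run gave itself away.
```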

With the throughput tests, there were no perceptible differences between 16 simultaneous threads and 32 simultaneous threads. The NetWare throughput test showed signs of client-side caching as well, so those results are tainted. Plus, I learned that there were some client-side considerations that impacted the test: the clients all had WinXP SP2 in 256MB of RAM, and instantiating 16 to 32 simultaneous IOZONE threads caused serious page-faulting during the test.

As such, I'm left with much more rough data from these tests. CPU load for the servers in question, network load, and fibre-channel switch throughput. Since these didn't record very granular details, the results are very rough and hard to draw conclusions from. But I'll do what I can.

At the outset I predicted that these tests would be I/O intensive, not CPU intensive. It turns out I was wrong for Linux, as CPU loads approached those exhibited by the dir-create and file-create tests for the whole iozone run. On the other hand, the data are suggestive that the CPU loading did not affect performance to a significant degree. CPU load on NetWare did approach 80% during the very early phases of the iozone tests, when file-sizes were under 8MB, and decreased markedly as the test went on. It was during this time that the highest throughputs were reported on the SAN.

Looking at the network throughput graphs for both the lab-switch uplink to the router core and the NIC on the server itself suggests that throughput to/from OES-Linux was actually faster than OES-NetWare. The difference is slight if it is there, but at a minimum both servers drove an equivalent rate of data over the ethernet. Unfortunately, the presence of client-side caching on the clients during the NetWare run prevents me from determining the actual truth of this.

On the fibre-channel switch attached to the server and the disk device (an HP EVA) I watched the throughputs recorded on the fibre ports for both devices. The high-water mark for data transfer occurred during the first 30 minutes of the iozone run with NetWare; the Linux test may have posted an equivalent level, but that test was run overnight and therefore its high-water mark was unobserved. At the time of the NetWare high-water mark all 32 stations were pounding on the server with file-sizes under 16MB. The level posted was 101 MB/s (or 6060 MB/minute), which is quite zippy. This transfer rate coincided quite well with the rate observed on the ethernet. It translates to about 80% utilization on the ethernet, which is pretty close to the maximum expected throughput for parallel streams.

For comparison, the absolute maximum transfer rate I've achieved with this EVA is 146 MB/s (8760 MB/Min). This was done with iozone running locally on the OES-Linux box and TSATEST running on one of the WUF cluster nodes backing up a large locally mounted volume. Since this setup involved no ethernet overhead, it did test the EVA to its utmost. It was quite clear that the iozone I/O was contending with the TSATEST data, as when the iozone test was terminated the TSATEST screen reported throughput increasing from 830 MB/Min to 1330 MB/Min. I should also note that due to the zoning on the Fibre Channel switch, this I/O occurred on different controllers on the EVA.

These tests suggest that when it comes to shoveling data as fast as possible in parallel, OES-Linux performs at a minimum equivalently to OES-NetWare, and may even surpass it by a few percentage points. The tests exercised modify, read, and write operations, which, except for the initial file-create and final file-delete operations, are metadata-light. Unlike file-create, the modify, read, and write operations on OES-Linux appear not to be significantly impacted by CPU loading.

Next, conclusions.


Results: create operation differences

Charts that show just the create operations on the two platforms are interesting.

[Graph comparing file-create and dir-create operations on OES-Linux]

I just put the Min values in the error bars to make a cleaner graph. Here you can see the trend mentioned in the file-create tests about the 4000-object line, except that here 4500 objects seems to be the point where file-create passes dir-create in terms of time per operation. This is a result of CPU usage, and of the fact that file-create appears to be more affected by it than dir-create is. The identical NetWare chart is illustrative, but since CPU never went above 70% for more than a few moments it isn't a pure apples-to-apples comparison.

[Graph comparing file-create and dir-create operations on OES-NetWare]

In this case, file-create remains below dir-create for the whole run. What's more, dir-create drove CPU a lot harder than file-create did. The early data in the Linux run shows that OES-Linux would follow this file-create-is-faster pattern given sufficient CPU.

Exactly why file-create performance degrades so fast when CPU contention begins is unclear to me. In terms of disk bandwidth, all four tests barely twitched the needle on the SAN monitor; these tests do not involve big I/O transfers. As far as NSS is concerned, a directory and a file are very similar objects in the grand scheme of things. Yet NSS seems to track more data for directories than for files, so it seems counterintuitive that file-create would lag when CPU becomes a problem. This question is one I should bring with me to BrainShare 2007.

Next, IOZONE and throughput tests.


Results: file-create

The Test:
30 workstations create a sub-directory, and in that sub-directory create 20,000 files. At each 500 files it does a directory listing and times how long it takes to retrieve the list. A running total of the time taken to create files is kept, and a log of how long each entry takes to create is also kept.
[Graph comparing file-create times between OES-Linux and OES-NetWare on an NSS volume]

This chart is interesting in several ways. First of all, note the lower error bars for the Linux line. Those bars overlap the NetWare line, and up to about 4000 files the Linux minimum is actually below the NetWare average. This says to me that when there is CPU room, Linux may be faster than NetWare at responding to file creates. This particular feature has the same cause as in the previous test, namely that some test stations started up to 30 seconds before the whole group was running and therefore had a window of uncontended I/O. Those same workstations finished their tests while others were still around 12,000 files, which further explains the downward trend of the Linux line above that threshold.

The second interesting thing is the sheer variability of the results. As with the dir-create test, CPU was completely utilized on the OES-linux box. The reported load-averages were very similar to dir-create. Some test workstations were able to run a complete test before others even got to 12000 files. Yet others took a really long time to process. The file-create test ran well over an hour, where the same test on NetWare took just under 30 minutes.
[Graph comparing file enumeration between OES-Linux and OES-NetWare]

This graph shows significant differences between the two platforms. As with the first chart, at 4000 entries and under, some workstations turned in NetWare-equivalent response times when speaking to OES-Linux. As with the above, this was due to uncontended I/O. But once all the clients were running the test, the response time for directory enumeration was greatly degraded.

Because file-create seems to clog the I/O channels more than dir-create did, directory enumeration had to compete in the same channels and thus response times suffered. Towards the end of the test when some workstations had finished early response times were creeping back towards parity with OES-NetWare.

Next, create operation differences.



Results: dir-create

Taking a look at the data for the dir-create test, you can see the differences between the two platforms.

The Test:
30 workstations create a sub-directory, and in that sub-directory create 20,000 directories. At each 500 directories it does a directory listing and times how long it takes to retrieve the list. A running total of the time taken to create directories is kept, and a log of how long each entry takes to create is also kept.

Directory Create graph comparing NetWare to Linux
This chart shows it very well. As I've said before, the state of the server affected this run. At its peak, the NetWare server had a CPU load around 65%. The Linux server had a load average around 18, which roughly translates to a CPU load of 900%. Directory Create is an expensive operation due to the amount of meta-data involved. This is clearly much more expensive on the Linux platform than it is on the NetWare platform.
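A note on the 900% figure: a Unix load average counts runnable processes, so dividing it by the CPU count gives a rough per-CPU oversubscription percentage. Assuming the dual-CPU test boxes from the hardware list:

```python
def load_as_cpu_pct(load_avg, num_cpus):
    """Rough translation of a Unix load average into a 'CPU load'
    percentage: 100% means every CPU has exactly one runnable task."""
    return 100.0 * load_avg / num_cpus

print(load_as_cpu_pct(18, 2))  # -> 900.0, i.e. 9 runnable tasks per CPU
```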

The range of results is also quite interesting. Generally speaking, when speaking to the NetWare server the clients had a pretty even spread of response times; some were faster than others, it just happens. Because of testing limits I was not able to start all stations at exactly the same time; however, start times were within 30 seconds of each other. The stations that went first recorded really good times for the first 3000 directories or so, then slowed down as everyone got going. This effect was quite clear in the raw Linux data, though it is hidden in the above chart.

A side effect of that is that when the fast clients finished, it removed some of the ongoing I/O contention. You can see that in the downward curve of the Linux line towards the end of the test. That doesn't indicate that Linux was getting faster, just that some clients had finished working and had removed themselves from the testing environment.

Directory Enumeration graph comparing NetWare and Linux
This is the chart that describes how long it takes to enumerate a single directory inside a dir-list of the created sub-directory. As the test progressed there were more directories to enumerate. Mere enumeration isn't an expensive operation, as it involves only a subset of the metadata in the directory entries. As with the dir-create test, dir-enum shows that Linux is slower on the ball than NetWare is under heavy load conditions. This is pretty clearly CPU-related, as a single client running these tests shows very little difference between the platforms.

The hump and fall-off of the Linux line is an artifact of faster workstations getting done quicker and getting out of the way. The sheer variability of the Linux line is interesting in and of itself. Further testing might identify the cause, but I'm limited on time and other resources so I won't be investigating it now.

Next, on Monday, file-create and file-enumerate.


Backups for OES

One of the things that has prevented us from seriously considering a move to OES-Linux has been the backup problem. Apparently there has been some movement on that issue. At Brainshare this year SyncSort was quite prominent in pointing out that they had full support for backing up NSS volumes on Linux.

Today over at Cool Blogs, Richard Jones posted about the progress of this technology in the industry. The short version is that when Novell implemented SMS on Linux, even vendors that already had a solid Linux client had to completely rewrite it. Which would explain why it has taken almost two years for the big storage players to come out with supported product. Novell has taken steps to support the really big storage players in UnixLand (IBM, et al.) in their clients, using extended attributes (xattrs).

Turns out that xattr thing was slipped into a patch on the 11th of August. I wonder if that's the same package that had shadow volumes included?

Tags: ,

Testing completed

And now I enter the data-analysis phase. It'll be a while until I release numbers.

But, I figured I'd give some impressions I got from the tests. For brevity purposes, when I say NetWare I mean, "OES NetWare 6.5 SP3 with patches up to 8/23/06", and when I say Linux, I mean, "OES Linux SP2, with patches up to 9/1/06". Also, when talking about I/O, I'm referring to, "I/O performed over the network via NCP to an NSS volume."
  • I/O on Linux is more CPU bound than on NetWare. Dir-create and file-create in particular are much more expensive operations CPU-wise. Both platforms perform similarly on unloaded systems, but the system hit for a create on Linux is much higher than on NetWare. This could be due to System/User memory barriers, but my testing isn't robust enough to test that sort of thing. NetWare is all Ring 0, whereas by necessity Novell has brought a lot of the file-sharing functions in Linux into Ring 3.
  • Bulk I/O speed is similar. When talking about bulk I/O functions, in my case the IOZONE test, both platforms perform similarly. Unfortunately, caching played a big role in the NetWare test and played no role in the Linux test. This is the inverse of my findings in January. The testing gods frowned on me.
  • Linux seems to support faster network I/O than NetWare. Unfortunately, this may just be a side-effect of the caching. But network loads were higher when running the bulk IO tests on Linux than they were with NetWare. This can be a good thing (Linux supports more network I/O than NetWare) or a bad thing (Linux requires more network I/O for similar performance). Not sure at this time which it is.
CPU loads on the WUF cluster nodes during term generally average in the 8-12% range. The CPU-load multiplier was similar for dir-create and file-create operations; if you assume (incorrectly) that all file I/O costs CPU the way creates do, Linux machines performing the same duties would report load-averages around 8.0. Since most I/O operations are reads, and a read is not as load-inducing as a create, the averages would actually be under 100% (a load-average of 2.0 for these 2-CPU boxes). But still higher than for NetWare.
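That back-of-envelope projection can be written out. A minimal sketch, where the create-cost multiplier is a hypothetical stand-in (the real ratio came out of comparing the test runs):

```python
# Rough projection: if NetWare runs at 8-12% CPU during term, and the same
# workload costs N times more CPU on Linux, what load-average results?
cpus = 2
netware_cpu_pct = 10.0      # midpoint of the observed 8-12% in-term range
create_multiplier = 40.0    # hypothetical Linux-vs-NetWare CPU cost ratio

# On these boxes a load-average of 2.0 corresponds to 100% CPU (2 CPUs),
# so scale the projected percentage into load-average units.
projected_pct = netware_cpu_pct * create_multiplier   # 400.0
projected_load = projected_pct / 100.0 * cpus         # 8.0

print(projected_load)   # 8.0
```

The point of the sketch is only the unit conversion: percentages compare against one box's capacity, while a load-average compares against the CPU count.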

Another thing to note is that the bulk I/O test with IOZONE also induced very high load-averages on Linux, but the apparent throughput was very comparable to NetWare. IOZONE works by creating a file of size X and running a series of tests on records of size Y. Unlike the dir-create and file-create tests, this test doesn't measure how fast you can create files; it measures how fast you can move data. Clearly record I/O within files still induces CPU load in the form of NDSD activity; however, unlike the dir-create and file-create tests, the apparent throughput is not nearly as affected by high-CPU conditions.
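IOZONE's pattern, a file of size X exercised in records of size Y, can be sketched like this. This is a toy local stand-in (the real tests ran over NCP from Windows clients, and the filename here is hypothetical):

```python
import os
import time

def record_rw(path, file_size, record_size):
    """Write then re-read a file in fixed-size records; return MB/s for each pass."""
    buf = b"x" * record_size
    records = file_size // record_size

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(records):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())        # push past the page cache for the write pass
    write_mbs = file_size / (time.perf_counter() - start) / 1e6

    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(record_size):  # re-read in the same record size
            pass
    read_mbs = file_size / (time.perf_counter() - start) / 1e6
    return write_mbs, read_mbs

# 1 MB file in 64 KB records; IOZONE sweeps both sizes across a whole matrix.
w, r = record_rw("bench.dat", file_size=1 << 20, record_size=64 * 1024)
```

The distinction from the dir-create tests is visible in the code: only two file-create operations happen here, and everything else is record I/O within an existing file.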

From this early stage it looks like we could convert WUF to Linux and still not need new hardware. But we'd be running that hardware harder, much harder, than it would have run under NetWare. Since we're not pushing the envelope with our NetWare servers now, we have the room to move. If our servers were running closer to 20% CPU, the answer would be quite different.

As I read the documentation, it looks like NCPserv is a function of ndsd. Therefore, seeing ndsd taking up CPU cycles that way was due to NCP operations, not DS operations. If that's the case, substituting a reiser partition for the NSS partition would decrease CPU loading some, but probably not by the order of magnitude it needs.

Tags: ,

Is brocade lying to me?

| 1 Comment
The Throughput monitor on the Fibre channel switch has questionable values. The FC spec says that we're using 2Gb ports, and the ethernet on the server is 1Gb. Logically, if that ethernet is running flat out, it should use around 50% of the Fibre bandwidth.

Yet, when the fibre throughput monitor is reporting 125 MB/s (1Gb/s), it also shows a utilization of only 25%. Buh? Am I missing something here?
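The arithmetic I'm doing, plus one guess at the discrepancy. This sketch assumes (and it is only an assumption) that the switch averages utilization over both directions of the full-duplex link:

```python
# 2 Gb/s Fibre Channel port, fed by a 1 Gb/s Ethernet link on the server.
fc_gbps = 2.0
eth_gbps = 1.0

# Naive expectation: 1 Gb/s of traffic on a 2 Gb/s port = 50% utilization.
expected_pct = eth_gbps / fc_gbps * 100          # 50.0

# Hedged guess: if the monitor averages utilization over both directions of
# the full-duplex link (2 Gb/s each way), mostly-one-way traffic would show
# half the naive figure, which matches the 25% on the monitor.
duplex_pct = eth_gbps / (fc_gbps * 2) * 100      # 25.0

print(expected_pct, duplex_pct)   # 50.0 25.0
```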

Shadow volumes

I mentioned a few months ago something called Shadow Volumes. I just noticed something today in ncpcon on the test server that grabbed my eye something fierce:
BENCHTEST-LIN:help

change volume
config
connection
create shadow_volume
create volume
dismount
exit
help
mount
purge volume
remove volume
rights
send
shadow
shift
stats
volume
enable login
disable login
files
Note the shadow commands ('create shadow_volume' and 'shadow'). Perhaps Novell has slipped Shadow Volumes into a post-SP2 update? Doing help on the 'create shadow_volume' command gives this output:

BENCHTEST-LIN:help create shadow_volume

NAME: create shadow_volume - Create NCP shadow volume

SYNTAX:
create shdadow_volume ncp_volume_name path

DESCRIPTION:
Use this command to create an association between an NCP volume
and a NCP shadow volume. This command only adds the NCP shadow
volume mount information to "/etc/opt/novell/ncpserv.conf".

This command can be added to a cluster load script.

You can run ncpcon console commands without entering NCPCON by
prefacing the command with ncpcon.



EXAMPLE:
create shadow_volume vol1 /home/shadows/vol1
and "help shadow"
BENCHTEST-LIN:help shadow

NAME: shadow - Perform Shadow Volume operations on a NCP Volume - (null)

SYNTAX:
shadow volumename operation [options]

DESCRIPTION:
You can run ncpcon console commands without entering NCPCON by
prefacing the command with ncpcon.


OPTIONS:
operation=[lp][ls][mp][ms] - (lp) List primary files
(ls) List shadow files
(mp) Move files to primary
(ms) Move files to shadow

pattern="searchPattern" - File pattern to match against

owner="username.context" - Username and Context

uid=uidValue - User ID

time=[m][a][c] - (m) Last Time Modified (a) Last Time Accessed
(c) Last Time Changed

range=[time period] - See Time period

size=[size differential] = See Size differential

output="filename" - Output all results to the specified filename

time period=[a][b][c][d][e][f][g][h][i][j]
(a) Within Last Day
(b) 1 Day - 1 Week
(c) 1 Week - 2 Weeks
(d) 2 Weeks - 1 Month
(e) 1 Month - 2 Months
(f) 2 Months - 4 Months
(g) 4 Months - 6 Months
(h) 6 Months - 1 Year
(i) 1 Year - 2 Years
(j) More Than 2 Years

size differential=[a][b][c][d][e][f][g][h][i][j][k]
(a) Less than 1KB
(b) 1 KB - 4 KB
(c) 4 KB - 16 KB
(d) 16 KB - 64 KB
(e) 64 KB - 256 KB
(f) 256 KB - 1 MB
(g) 1 MB - 4 MB
(h) 4 MB - 16 MB
(i) 16 MB - 64 MB
(j) 64 MB - 256 MB
(k) More than 256 MB


EXAMPLE:

Yes, 'EXAMPLE:' is blank in the HELP. Hmmmmmm. I don't see any documentation updates, but those commands are indeed present. Richard Jones mentioned that shadow volumes are an OES2 feature, and to try it out in the beta. Perhaps there is an OES2 beta in the near future? Who knows.

Tags: ,

return of.. part 2

Now that I'm looking at the network loading data I am seeing something interesting. The NetWare server handled the first 30 minutes of load better than the OES-Linux server did, but after that the OES-Linux server provided better throughput. The difference isn't great, just a few percentage points on the GbE link, but it is there. CPU is still pretty high, but it's more than keeping up.

Unfortunately, we seem to have an 'apples to apples' problem. While the network utilization appears to be higher with the OES Linux server, implying better throughput, it is clear from the few clients that have finished the run that there was no caching involved in this particular test. Comparing numbers, therefore, will be a bear.

Ideally I'd rerun the NetWare test with client caching and oplock 2 disabled, but I don't have time for that. This server needs to be given back to the service I borrowed it from.

Tags: ,

return of differences.

| 1 Comment
The big iozone test kicked off about 45 minutes ago. When I did this on NetWare, CPU hung at around 60-65%, and the Telecom guys made happy noises as their new monitoring software turned colors they'd never seen before.

Okay, it turned 'warning'. Before it was either green/working, or red/broken. They'd never seen yellow/high-load before. They were quite happy.

Anyway... the 1Gb link between the lab with all the workstations and the router core was running 79-81% utilization. Nice!

On the SAN link we had around 20% utilization, the highest I'd ever seen the EVA driven before.

Right now I can't tell what that link is running, but the link into the server itself is running in the 50-60% range. Better analysis will occur tomorrow when I can ask the Telecom guys how that link behaved overnight. As for the server, load-levels are well above 3.0 again. Right at this moment it's at 14ish, with ndsd being the prime process.

At this point I'm beginning to question what Unix load-averages mean when compared to the CPU percentage reported by NetWare. Are they comparable? How does one compare them? Anyway, the dir-create and file-create tests showed themselves to be much more CPU-bound on Linux than NetWare, and this sort of bulk I/O seems to have a similar binding. Late in the test, CPU on NetWare was fairly low, in the 20% range, with the prime indicator of loading being allocated Service Processes. So I'm pretty curious as to what load will look like when all the stations get into the 128MB file sizes and larger.
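One hedged way to line the two up: treat load-average divided by CPU count as a rough utilization figure, keeping in mind that Linux load-averages also count processes waiting on I/O, which NetWare's CPU gauge does not, so this comparison can only overstate the Linux side:

```python
def load_to_pct(load_avg, cpus):
    """Very rough: load-average as a percentage of total CPU capacity.
    Values over 100% mean runnable work is queueing behind the CPUs."""
    return load_avg / cpus * 100

# The test box has 2 CPUs; a load-average of 14 is far past saturation,
# nothing like NetWare's reported 20% during the same phase.
print(load_to_pct(14, 2))   # 700.0
print(load_to_pct(3.0, 2))  # 150.0
```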

Tags: ,

differences continued

I finally managed to get the 'big file' test done. Results in the end were similar to those reported in the previous post. The file-create process isn't so much cheaper than dir-create that the tests ran fast, though there was a marked difference: during the dir-create test the uptime load-levels were in the 17-19 range, whereas with the file-create test they were in the 4-6 range. Much improved, but still pushing CPU well past 100%.

I haven't looked at the data closely yet, but I suspect the same trends reported in the dir-create test hold here. I didn't do a test for dir-create and file-create on NetWare with a smaller number of stations, but it didn't seem like I needed to. The 'break even' point, where CPU is just under 100%, looks to be in the 4-6 station range for dir-create, and on or around 10 stations for file-create.

Tags: ,

differences bloom

I got the OES-Linux SP2 server formatted and installed this morning. And the NSS volume created. I ran the first benchmark, and golly there is a difference.

Test 1 is the 'big directory' test. The client stations create 20,000 sub-directories inside a sub-directory named after the machine. The time to create each directory is tracked, and the time it takes to enumerate each directory is also tracked. In testing out the benchmark it became clear that mkdir is a more expensive operation than the 'touch' used in the make-file test (also 20,000 files).
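The client-side loop looks roughly like this. A minimal local stand-in, not the actual benchmark: the real clients ran against an NCP-mapped drive, and the directory name and scaled-down count here are hypothetical:

```python
import os
import time

def dir_create_bench(base, count):
    """Create `count` subdirectories under `base`, timing each mkdir and
    each follow-up enumeration of `base`, in milliseconds."""
    os.makedirs(base, exist_ok=True)
    create_ms, enum_ms = [], []
    for i in range(count):
        start = time.perf_counter()
        os.mkdir(os.path.join(base, f"sub{i:05d}"))
        create_ms.append((time.perf_counter() - start) * 1000.0)

        start = time.perf_counter()
        os.listdir(base)                 # enumeration cost grows each pass
        enum_ms.append((time.perf_counter() - start) * 1000.0)
    return create_ms, enum_ms

creates, enums = dir_create_bench("bench_mkdir", 200)   # real test: 20,000 per client
```

Note the enumeration pass runs after every create, so the later samples are walking a much larger directory, which is what makes the per-directory trend line interesting.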

On NetWare with 30 client machines pounding the server, CPU rose to about 80% or so and stayed there. Load on the CPUs was equal. There was some form of bottlenecking going on, because some clients finished much faster than others, and it isn't clear what separated the two classes.

On Linux the load-average is pretty stable around 18. The process taking up that CPU is ndsd. The numbers I'm getting back from the clients are vastly worse than NetWare. The first time I ran it I figured that this was due to the workstation objects not having the posixAccount extension. So I fixed that, and now the percentages are better, but still much worse than NetWare. I'll run this test again with only 10 clients, so I get to compare smaller concurrent access numbers.

That kind of load is not exactly 'real user load'; it's a synthetic load designed to show how well either platform handles abuse. The iozone benchmark should be closer to comparable, since that's just a single file, and ndsd shouldn't be involved with those accesses much at all. That'll be almost entirely I/O subsystem.

Tags: ,

progressing

Right now I'm running the mass IOZONE test. 30 workstations are pounding the test NetWare server with IOZONE, running this command-line:

iozone -Rab \report-dump\IOZONE-std\%COMPUTERNAME%-iozone1.xls -g 1G -i 0 -i 1 -i 2 -i 3 -i 4 -i 5

Right now all the stations are chewing on the 1GB file, and are all at various record-size stages. But the fun thing is the "nss /cachestats" output:
BENCHTEST-NW:nss /cachestat
***** Buffer Cache Statistics *****
Min cache buffers: 512
Num hash buckets: 524288
Min OS free cache buffers: 256
Num cache pages allocated: 414103
Cache hit percentage: 63%
Cache hit: 3407435
Cache miss: 1978789
Cache hit percentage(user): 60%
Cache hit(user): 3031275
Cache miss(user): 1978789
Cache hit percentage(sys): 100%
Cache hit(sys): 376160
Cache miss(sys): 0
Percent of buckets used: 48%
Max entries in a bucket: 7
Total entries: 399112
Yep. All that I/O is only partially being satisfied by cache reads. As it should be at this stage of the game.
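The hit percentages in that output check out against the raw counters, worked through here just to confirm how the numbers relate (the displayed percentages appear to be truncated, not rounded):

```python
def hit_pct(hits, misses):
    """Cache hit rate as a percentage of all lookups."""
    return hits / (hits + misses) * 100

total = hit_pct(3_407_435, 1_978_789)   # overall: hits vs. misses
user  = hit_pct(3_031_275, 1_978_789)   # user I/O only
sys_  = hit_pct(376_160, 0)             # system I/O: every lookup hit cache

print(int(total), int(user), int(sys_))   # 63 60 100
```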

What surprised me yesterday when I kicked off this particular test was how badly hammered the server was at the very beginning. This is the small file-size test, and it better approximates actual usage. CPU during the first 30 minutes of the test was in the 70-90% range, and was asymmetric: CPU1 was nearly pegged. During that phase we also drove a network utilization of 79-83% on the GigE uplink between the switch serving the testing machines and the router core. And on the Fibre Channel switch serving the test server, the high-water mark for transfer speed was 101 MB/second (~20% utilization).

The FC speed is notable. Before this, the fastest throughput I had been able to produce on the port linking the EVA was about 25 MB/second, and that was done with TSATEST running against local volumes in parallel on three machines. Clearly our EVA is capable of much higher performance than we've been demanding of it. Nice to know.

Depending on how the numbers look once this test is done, I might change my testing procedure a bit: run a separate 'small file' run in IOZone to capture the big-load periods, and perhaps a separate 'big file' run with 1G files to capture the 'cache exhaustion' performance.

From a NetWare note, the 'Current MP Service Processes' counter hit the max of 750 pretty fast during the early stages of the test. Upping the max to 1000 showed how utilization of service processes progressed during the test. Right now it's steady at 530 used processes. Since I don't think Linux has a similar tunable parameter, this could be one factor making a difference between the platforms.

Tags: ,

Prognostication

Hmmm....

Item: Apple is now shipping on Intel hardware.
Item: OS 10.5 will be shipping with Boot Camp built in.

Given those, I suspect that game makers now have less incentive to create games for OS X. The presumption being that anyone who needs to can boot Windows and run their games there. Why go to the extensive effort to port games to OS X?

Therefore: There may be fewer games released for the Mac
Therefore: More gaming will be done on Mac hardware, but Windows OS.
Therefore: Macintosh machines will come under similar upgrade pressure as PC machines, thanks to increased gaming numbers.
Therefore: Apple will be under increased pressure to permit at least graphics card upgrades to their mid-line machines (iMac).

Hmmmm....

Technology is cooooool

I just worked out a REALLY NEAT trick to help with managing my benchmarking clients. I figured something like this was possible, but actually seeing it work was one of those moments that makes what I do so fun. I came real close to shouting, "I am the zombie master," but I held off. Just.

Anyway, the trick:
  1. Make sure all the clients are imported as Workstation Objects.
  2. Create a Workstation Group, and add all of the clients into it.
  3. Add the newly created Workstation Group as a R/W trustee of the volume I'm benchmarking against. This allows the workstations as themselves, not users, to write files.
  4. Create a Workstation Policy, associate it to the group.
  5. In the Workstation Policy, create a Scheduled Task. Point it at the batchfile I wrote that'll map a drive to the correct volume, run the tests, and clean up.
  6. Modify the schedule so it'll run at a specific time, making sure to uncheck the 'randomize' box.
  7. Force a refresh of the Policies on the clients (restarting the Workstation Manager service will do it).
And the best part? I don't have to be physically present to kick off the activities! Woo! I can even run the big I/O ones in the depths of night.

The jobs all seem to start within 30 seconds of the scheduled time. This doesn't seem to be due to differences in the workstation clocks, which on checking are all within 3 seconds of 'true'; rather, it's the Workstation Manager task polling interval. I wish I could get true 'everyone right now' performance, but that's not possible without w-a-y more minions.

On the 'large number of sub-directories' test, the early jumpers seemed to keep a continued edge over the late starters. The time to create directories for the early jumpers was consistently in the 3-5ms range, where the late jumpers were in the 10-13ms range. Significant difference there. And some started fast and became slow, so there is clearly some threshold involved here beyond just the server dealing with all those new directory entries. CPU load on the NetWare box (what I have staged up first) during the test with 32 clients creating and enumerating large directories was in the 55-70% range. That load is spread equally over both CPUs, so those bits of NSS are fully MP enabled.
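The early-jumper vs. late-starter split can be pulled out of the per-client timing logs with something like this. The client names and sample values are hypothetical stand-ins for the real data:

```python
def summarize(times_ms):
    """Mean, min, and max of per-directory create times for one client."""
    mean = sum(times_ms) / len(times_ms)
    return mean, min(times_ms), max(times_ms)

# Hypothetical per-client samples illustrating the two observed bands.
clients = {
    "lab-ws01": [3.2, 4.1, 4.8, 3.9],     # started early: 3-5ms range
    "lab-ws17": [10.4, 12.7, 11.1, 13.0], # started late: 10-13ms range
}
for name, samples in sorted(clients.items()):
    mean, lo, hi = summarize(samples)
    print(f"{name}: mean {mean:.1f} ms (min {lo}, max {hi})")
```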

Tags: ,

Progress

The cut-over in the data center to the new Cisco 6509 is moving apace. By Friday everything in there will be on the new switch. One of the happy side effects is that now all of the data center is on Gig-Ether ports. Not that we have much that pushes that kind of data, but still, nice to have.

In Benchmarking news, I'm still building the testing protocol. But one thing has shown itself one more time. Back in the original benchmark I noticed an artifact in the data: the IOZONE "random read" test shows a trough in the NCP-on-NetWare data. When using a 64kb record size, there is a marked decrease in performance. It showed up in the original data, and in some of the test runs I've just completed. The hardware behind the January test and this one is different; the current server is a bit older, but it is hooked up to the SAN.


Record size:     32kb    64kb    128kb
8192KB file:     9216    8450    9101

This is sample data. For the 8MB file size, you can see the three record sizes either side of the 64kb column. The performance drop exhibited there is repeated throughout the whole random read test. I wonder why that particular record size is so slow?

ATT in December

A partial session list has been posted.

Find it: http://www.novell.com/training/attlive/sessions.html

To which I say, 'eh.'
  • We don't do GroupWise.
  • We don't do Identity Manager.
  • We don't do high-performance Linux. We barely do web-serving with Linux at the moment.
  • Which leaves ZEN, the one bright spot.
Eh. BrainShare is better for me at this point.

Tags: ,