January 2006 Archives

New client!

A week ago Novell released version 1.1 of their Novell Client for Linux!

Download it here.

The link they posted to the 'changes' document is broken. Happily, I found the document for you.

This being V1.something software, the list of changes is large, and the bugs fixed non-trivial. This software is under active development, folks. But it still beats the pants off of ncpmount if you are able to run it on your system.

Some examples of fixed items:
20. Login thinks that admin.novell, .cn=admin.o=novell and .admin.novell are all different people. (97333)
22. Unable to purge files on a Clustered Volume. (112861)
31. If a context is not specified for login, a user in the server's context cannot authenticate. (96430)
45. The "CN" user identifier variable displays the wrong information. (97089)
There are more, but those are just some highlights. Clearly, more still needs to be done to make this a true equivalent to the Novell Client for Windows in terms of usability and uniformity.

Benchmark results summary

These eight articles were written as part of a benchmark I ran. The goal was to check out two separate variables: NetWare vs. Linux, and NCP vs. CIFS. The hardware used in this test was identical.
Server Hardware:
HP ProLiant BL20, G3
2x 3.2GHz CPUs
2GB RAM
2x 72GB U320 HD, RAID1
Hyperthreading off
100Mb Ethernet port

Client Info:
3.00GHz CPU
1GB RAM
Novell Client 4.91.0.20050216
100Mb Ethernet port, different subnet from server
WinXP SP2, fully patched

Switched ethernet between Server and Client

NetWare Config
NetWare 6.5 SP4a (a.k.a. OES-NW SP1)
No post-SP4a patches
No changed NSS settings
No Proliant RomPaq applied (i.e. Novell supplied drivers, not HP-supplied)
10GB NSS volume
Purge-Immediate flagged in test directory

OES-Linux Config
OES-Linux SP1
Novell Samba
No post-patches (risky, I know, but best apples-to-apples since SP2 was on the Red Carpet servers)
10GB NSS Volume
Purge-Immediate flagged in test directory
The performance tests were performed with IOZONE over the network. As you would expect, certain tests were constrained by network performance, but the data was rich enough to draw conclusions from all levels of file size.

These tests were done such that only my I/O was being handled by the servers. I don't have the resources to check out how the two platforms and protocols handle high levels of contention. That'll have to be handled by people other than me.
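For a sense of what IOZONE actually sweeps in a run like this, here is a rough sketch of the file-size/record-size grid involved. The 64K-512M file range and 4K-16M record range are the ones quoted throughout these articles; the helper below is my own illustration of the sweep, not IOZONE itself.

```python
# Sketch of the file-size x record-size grid an IOZONE auto-mode sweep
# covers. IOZONE only tests record sizes that fit within the file, so
# small files get fewer record-size cells. Sizes are in KB.

def sweep_grid(min_file=64, max_file=512 * 1024,
               min_rec=4, max_rec=16 * 1024):
    """Return (file_kb, record_kb) pairs, powers of two, record <= file."""
    pairs = []
    f = min_file
    while f <= max_file:
        r = min_rec
        while r <= max_rec and r <= f:
            pairs.append((f, r))
            r *= 2
        f *= 2
    return pairs

grid = sweep_grid()
print(len(grid), "file/record combinations per test")
```

Each of those cells gets measured once per test (Writer, Reader, and so on), which is why the charts in the later parts are two-dimensional surfaces rather than single curves.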

Part 1: Caching
Part 2: CIFS
Part 3: NCP
Part 4: Comparing Cache, NCP-on-Linux vs CIFS-on-NetWare
Part 5: Comparing Uncached, NCP-on-NetWare vs CIFS-on-Linux
Part 6: Conclusions so far
Part 7: Uncached NCP
Part 8: NCP vs CIFS on Linux

The Bottom Line
NCP-on-Linux is the best bet. This is a surprising result, but it goes to show that Novell has done a good job in porting NCP over to the Linux platform. I did not expect to find that NetWare was second to Linux for file-serving over Novell's 20-year-old file-serving protocol. The improvement for running NCP clients against a Linux server was not jaw-dropping, only single-digit improvements, but the fact that it is better at all says something right there.

And as a bonus, the data I drew it all from!
Summary

Since I now have data runs for both protocols that do not include client-side caching, this comparison should be a lot easier. So far we have learned that NCP overall is better than CIFS for the kinds of file-access our users do most. I expect this to show here as well. Earlier tests showed that NCP-on-Linux (cached) is better than CIFS-on-NetWare (cached), and NCP-on-Linux (uncached) is better than NCP-on-NetWare (uncached). Since I've already shown that NCP-on-NetWare is better than CIFS-on-Linux, and NCP-on-Linux is better than NCP-on-NetWare, it is a foregone conclusion that CIFS-on-Linux will be worse than NCP-on-Linux.

But by how much? Same OS back end for the two, so let's go see!

Write Tests

The Write test turned in an overall performance increase of 17% for using NCP versus CIFS. Like the previous NCP vs. CIFS comparisons, the differences in performance are very visible in the Record Size scale. The 4K record size shows a performance increase of 97%, 8K at 95%, 16K at 59%, 32K at 44%, and 64K at 13%. After 64K CIFS starts performing better. Each progressive record size up to 16M gets a little bit worse for NCP, until it gets to 16M and has a performance hit of -13%. The file-sizes show a similar but flatter curve, with the inflection between NCP vs CIFS occurring between the 16M and 32M file-sizes. The 64K files perform 75% faster, and the 512M files perform 6% slower.
NCP vs CIFS on Linux, Writer test
The ski-jump look of the graph shows it all right there. As with the previous NCP vs CIFS, file-size doesn't have a LOT to do with performance, but it does have an impact. The slope of the 4K line shows that the larger file-sizes probably wouldn't be able to match NCP's performance for the smaller files.
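The per-record-size percentages quoted above are straightforward ratio math on the raw throughput numbers. A minimal sketch of that comparison, with made-up throughput values; the function names are my own illustration, not anything from the article's spreadsheets.

```python
# Hypothetical sketch of the comparison math used in these articles:
# for each file-size/record-size cell, the quoted figure is the percent
# improvement of NCP throughput over CIFS throughput. All numbers here
# are invented for illustration.

def pct_improvement(ncp_kbps, cifs_kbps):
    """Positive means NCP is faster; negative means CIFS is faster."""
    return (ncp_kbps - cifs_kbps) / cifs_kbps * 100.0

def avg_improvement_by_record(results):
    """results: {(file_kb, rec_kb): (ncp, cifs)} -> {rec_kb: avg %}."""
    by_rec = {}
    for (_file_kb, rec_kb), (ncp, cifs) in results.items():
        by_rec.setdefault(rec_kb, []).append(pct_improvement(ncp, cifs))
    return {rec: sum(v) / len(v) for rec, v in by_rec.items()}

sample = {(64, 4): (7880, 4000), (128, 4): (7760, 4000),
          (64, 64): (4520, 4000)}
print(avg_improvement_by_record(sample))
```

Averaging the cells down one axis is what produces the per-record-size or per-file-size curves discussed in each test writeup.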

The Re-Writer test showed an overall improvement of NCP over CIFS by 16%, a bit lower than the Writer performance. This was also reflected in the record-size and file-size performances. The movement isn't great, but it does suggest that CIFS contains slightly better metadata-handling than NCP.

The Random Write test showed an overall improvement of NCP over CIFS by 4%. The reason for the poorer showing is that NCP's small record-size performance, which did so well in the Writer and Re-Writer tests, isn't nearly as good on this test. The same ski-jump is visible in the graph, but not with the same slope.

The Record Rewrite test showed an overall improvement of NCP over CIFS by 6%. Like the Random Write test, NCP wasn't able to show the stellar performance at the smaller record-sizes that it showed on the Write test. The inflection point is between the 64K and 128K record-sizes.

Read Tests

The Reader test turned in an average performance boost for using NCP of 6%. Like the earlier test comparing CIFS-on-Linux to NCP-on-NetWare, there isn't a strong correlation with record-size and performance.
NCP vs CIFS on Linux, Reader test
The performance was almost entirely better than CIFS, but in many cases only by a few percentage points.

The Re-Reader test performed much the same as the Reader test, and posted a performance increase of only 5%. Like the re-writer test, this is probably due to better meta-data handling in CIFS than with NCP. The data looks much like the Reader chart in shape and form.

The Random Read test posted a performance boost of 4%. NCP performed a bit better (up to 9%) at the smaller record sizes, but overall performance was generally just a few points above the break-even line.

The Backward Read test turned in a performance boost of 5%. As with most CIFS tests, NCP performed better at smaller record sizes. As with the Random Read test, performance was overall better than CIFS by only a few points on most of the chart.

Conclusions

While CIFS has NCP beat on writes to large files, NCP has CIFS beat on reads. This matches earlier results. In fact, NCP-on-Linux is enough better than NCP-on-NetWare that the large file reads are now above the 1.00 line. Novell has done a good job getting NCP ported to Linux.

Which protocol to use depends on what you are going to use the server for. For general office file-server usage, NCP is by far the better protocol. For GIS, large DB, or other large media files, CIFS probably is the better choice in those cases. In our case, though, NCP's access patterns fit our usage patterns better.

Summary

The run is complete, and I now have a true apples-to-apples comparison of NCP performance. The result is a rather surprising one! In every single test, NCP-on-Linux out-performed NCP-on-NetWare. The lowest margin was 1%, and the highest margin was 9%, so the advantage isn't stellar. On the other hand, NCP started life on NetWare so you would expect it to do better on that platform.

T'ain't so.

One small trend did show up in the test data. Tests that involved a write component showed a slight (1-3%) increase over the NetWare data. Tests that involved a read component showed somewhat better performance (6-8%). The reasons for this are unclear, but the pattern is very consistent.

Write Tests

The Writer test showed the best performance gain in the range of file-sizes 2M and under, and record-sizes 32K and under. The average improvement in this range was a rather respectable 5%, which is much higher than the overall average for the test of 1%. Performance seems to be affected more by file-size than by record-size, as the range of improvement over record-size was smaller than the range of improvement over the file-sizes. There is a hint in the data that 4K record-sizes for files larger than 16M are much better handled on NetWare, but data beyond that point was not gathered.

The Re-Write test showed similar patterns to the Write test, but slightly faster. As the description of the test says, a re-write doesn't affect meta-data to the same degree that a new file would. As with the Writer test, the best performance gain was in the range of file-sizes 2M and under, and record-sizes 32K and under. In that range the improvement was also 5%. Overall, the test showed a 2% improvement for running NCP on Linux. An interesting outlier in the data is the file-size of 8M, which turned in the worst result of the test at a -3%.

The Random Write test showed a 2% improvement in performance over NCP-on-NetWare. The best consistent performance was at the 64K and 256K file-sizes, each with a performance increase of 11%.

The Record Rewrite test showed the best performance of the write tests, at 3%. Every single record size tested showed at least a .5% improvement over NetWare. The best record-size was 4K, with a performance boost of 10%, and the worst was 16M, with performance just a hair over parity with NetWare. On the file-size front the results were very scattershot, with the best performance (21%) being turned in at the 128K file-size, and the worst (-5%) at the 4M file-size. The 'sweet spot' identified in the Writer test had an average improvement of 12%.

There were some trends over all of the writer tests as well. In every case, file-sizes of 16M and larger turned in a positive performance difference when run against NCP-on-Linux. The sweet-spot, file size of 2M or smaller and record size of 32K and smaller, turned in performance markedly better than the overall performance for that test.

Read Tests

The Reader test turned in an overall performance gain of 6% over NetWare. The tendency of the Writer report to show a decrease in performance at the 4K record size doesn't show up here. In fact, the number two and number three highest performance gain values on the chart were in the 4K record size column at the 2M (+38%) and 16M (+31%) file sizes. The 2M file-size showed the highest variability in performance as it had both the highest and lowest performance values on the chart. The 2M file-size with a 256K record size showed a -41% performance hit, and the 2M file-size with a 512K record size showed a +52% performance gain. The overall average for that file-size was 6%.

The Re-Reader test turned in a performance gain of 8%, which is presumably due to server-side caching of data being faster on Linux than on NetWare. There were two far outliers in the data which turned in performance 100% or more better than the NetWare data. Looking at the raw data, these two results were due to NCP-on-NetWare turning in really bad numbers for 512K file-size and 8K record size, and 1M file-size and 64K record-size. Other than these two, the data is pretty even. As with the Reader test, the 4K record-size turned in very good numbers, especially at larger file-sizes.

The Random Read test turned in a performance gain of 6% over NCP-on-Netware. This was a hair faster than the initial Reader test, which shows that server-side caching still has a role to play. The range of values on this test was narrower than that reported by the Re-Reader test. There were no real 'hot spots' on the chart. The 4K record-size continued to show the largest variability.

The Backward Read test turned in the best value of the lot with a performance increase of 8% over NCP-on-NetWare. This test also had a far outlier at the 512K file-size/64K record-size level, where the NCP-on-NetWare test turned in an abysmal number. That value was excluded from the averaging, otherwise the performance increase of the test would have been a 9% and change. This test also showed a very strong value for the 4K record size, with an average performance increase of 21%. Another interesting result on this test is that the sweet spot identified in the Writer tests shows up on this one, with an average performance increase of 14%.
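The outlier exclusion described above can be done with a simple cutoff before averaging. Here is a sketch with invented numbers; the article doesn't state the exact rule that was used, so the threshold here is an assumption of mine.

```python
# A simple sketch of excluding a far outlier before averaging, as was
# done for the Backward Read figure above. The cutoff value and the
# sample gains are illustrative, not the article's actual rule or data.

def trimmed_mean(values, cutoff=100.0):
    """Average the values, ignoring any whose absolute value exceeds
    the cutoff (treating them as measurement outliers)."""
    kept = [v for v in values if abs(v) <= cutoff]
    return sum(kept) / len(kept)

# One cell where the NetWare run was abysmal dominates the raw average;
# trimming it gives a figure representative of the rest of the chart.
gains = [5.0, 8.0, 21.0, 3.0, 450.0]
print(round(trimmed_mean(gains), 2))
```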

Unlike the Writer tests, the Reader tests didn't have any trouble at the 4K record-sizes on larger files. Overall performance was better than NetWare by a noticeable margin. There were a few exceptions, but generally speaking the results were consistent.

Conclusions

It is clear from the data that Novell has somehow managed to make NCP-on-Linux better than it was on NetWare. NetWare's historic claim as the end-all-be-all of File Servers may finally be coming to an end. Now to compare NCP-on-Linux (uncached) vs CIFS-on-Linux (uncached).

Part 8: NCP vs CIFS on Linux
Summary

The analysis is done, and now it is time to make some decisions about what works best for us. As I've stated before, the majority of file-access to the NetWare cluster is with smaller files, and by definition smaller file sub-ranges. A lot of data on there is in larger files, but the count of those files is pretty small. On the User and Shared volumes, at least 50% of files are 64K or smaller, which is the smallest file-size in these tests.
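A claim like "at least 50% of files are 64K or smaller" can be spot-checked by walking a mounted volume and bucketing files by size. A minimal sketch; the mount path shown is hypothetical, so point it at wherever the User or Shared volume is actually mounted.

```python
# Sketch of checking what fraction of files on a volume are at or under
# a size limit. The path passed in below is hypothetical.

import os

def small_file_fraction(root, limit=64 * 1024):
    """Fraction of regular files under root that are <= limit bytes."""
    small = total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish or deny access mid-walk
            total += 1
            if size <= limit:
                small += 1
    return small / total if total else 0.0

if __name__ == "__main__":
    print(small_file_fraction("/mnt/vol1/users"))  # hypothetical mount point
```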

I analyzed two big groups: NCP vs CIFS/SMB, and cached vs uncached. The cached-vs-uncached split was an unexpected side effect of the local client settings, and it does taint the data. I hope to do another run with NCP-on-Linux in an uncached mode in order to better compare it against NCP-on-NetWare, which seemed to run in an uncached state.

The NCP vs CIFS benchmarks were pretty clear. NCP is engineered to be better at handling files and access patterns in the range our users are most likely to use. This is unsurprising considering that Novell designed NCP to be a file-serving protocol from the ground up, while CIFS/SMB was designed with more general-purpose uses in mind. As such, for big files or large sub-ranges CIFS is the better protocol. In both the cached and uncached comparisons NCP came out the winner.

When it comes to caching mechanisms, NCP worked best for our environment, with one big exception in the 'Re-Reader' test. Microsoft's client-side cache handled that test, so performance in that case was vastly better than the uncached NCP performance.

In the end what have I learned? The fact that the Novell Client performed local caching for the NCP-on-Linux test blew my testing objectives out of the water. In order to make any real tests I need to be able to test NCP-on-Linux in an uncached state, and I'm working on that. According to the tests, NCP-on-Linux is the best combination of protocol and caching.

Look for Part 7, where I compare NCP-on-Linux (uncached) against NCP-on-NetWare, and CIFS-on-Linux.

Part 7: Uncached NCP
Summary

In this section I'm going to compare the two access methods that didn't have any local caching, NCP-on-NetWare and CIFS-on-Linux. The margin of difference between the two shouldn't be as large as it was for the cached methods, simply because the speed of the network involved is a major limiter.
Write: This test measures the performance of writing a new file. When a new file is written not only does the data need to be stored but also the overhead information for keeping track of where the data is located on the storage media. This overhead is called the 'metadata'. It consists of the directory information, the space allocation and any other data associated with a file that is not part of the data contained in the file. It is normal for the initial write performance to be lower than the performance of rewriting a file due to this overhead information.
The graph for this test shows a strong correlation to record-size in performance. Clearly NCP-on-NetWare is much better at handling small sub-ranges of files than CIFS-on-Linux. Once the sub-range gets to a certain size between 128K and 512K (depending on file-size), CIFS-on-Linux provides better performance. For most types of file accesses our users perform, NCP-on-NetWare would provide the better performance.
Re-write: This test measures the performance of writing a file that already exists. When a file is written that already exists the work required is less as the metadata already exists. It is normal for the rewrite performance to be higher than the performance of writing a new file.
As this graph also shows, there is a strong correlation to record-size in performance. The point where CIFS provides better performance comes a bit earlier, but the general trend remains.
Read: This test measures the performance of reading an existing file.
This graph doesn't show as strong a correlation to record size. The performance boost that NCP-on-NetWare provides isn't nearly as strong as it was with the previous two writing tests. It seems to do best on files of 64K and in smaller record sizes.
Re-Read: This test measures the performance of reading a file that was recently read. It is normal for the performance to be higher as the operating system generally maintains a cache of the data for files that were recently read. This cache can be used to satisfy reads and improves the performance.
This graph looks a lot like the "read" graph. As above, the performance boost isn't terribly great. File-Size/Record-Size combinations that give a performance difference in excess of 10% are rare.
Random Read: This test measures the performance of reading a file with accesses being made to random locations within the file. The performance of a system under this type of activity can be impacted by several factors such as: size of the operating system's cache, number of disks, seek latencies, and others.
This graph continues the trend of the previous 'read' graphs in that it isn't quite as impressive. Record sizes of 128K and smaller yield small gains, and above that line CIFS-on-Linux is the better bet. With a few visible exceptions, most performance is also within 10%.
Random Write: This test measures the performance of writing a file with accesses being made to random locations within the file. Again the performance of a system under this type of activity can be impacted by several factors such as: size of the operating system's cache, number of disks, seek latencies, and others.
This graph shows very similar trends with the previous Write graph. As with that graph, the break between NCP-on-NetWare being faster and CIFS-on-Linux being faster is when the record-size gets in the 128K-512K range. In terms of raw numbers, the Random Write is slower than the Write test, but this is to be expected.
Backwards Read: This test measures the performance of reading a file backwards. This may seem like a strange way to read a file but in fact there are applications that do this. MSC Nastran is an example of an application that reads its files backwards. With MSC Nastran, these files are very large (Gbytes to Tbytes in size). Although many operating systems have special features that enable them to read a file forward more rapidly, there are very few operating systems that detect and enhance the performance of reading a file backwards.
This graph looks like the previous 'read' graphs.
Record Rewrite: This test measures the performance of writing and re-writing a particular spot within a file. This hot spot can have very interesting behaviors. If the size of the spot is small enough to fit in the CPU data cache then the performance is very high. If the size of the spot is bigger than the CPU data cache but still fits in the TLB then one gets a different level of performance. If the size of the spot is larger than the CPU data cache and larger than the TLB but still fits in the operating system cache then one gets another level of performance, and if the size of the spot is bigger than the operating system cache then one gets yet another level of performance.
This graph looks nearly identical to the 'random write' test before.

While the results aren't as dramatic as they were for the cached methods, they are at least consistent. NCP-on-NetWare provides consistent and real performance improvements over a hardware-identical CIFS-on-Linux (Samba) configuration. Writing performance was much better in the file and record sizes we generally see on our NetWare servers. Large file sizes and record sizes were better handled by CIFS-on-Linux, but such access is a minority on our network. If we had a lot of video-editing types around, I'd be singing a different tune.

Part 6: Conclusions so far
Summary

In this section I'm comparing the two cached methods, NCP-on-Linux, and CIFS-on-NetWare. I'll do the uncached ones in the next section.

The comparison here is not as much apples-to-apples as I'd like. Microsoft caching, and Novell's caching use different mechanisms, and we're also going over different protocols and platforms as well. Because of this, the trends aren't nearly as clear cut as they were in the previous sections where we compared the differences between platforms.
Write: This test measures the performance of writing a new file. When a new file is written not only does the data need to be stored but also the overhead information for keeping track of where the data is located on the storage media. This overhead is called the 'metadata'. It consists of the directory information, the space allocation and any other data associated with a file that is not part of the data contained in the file. It is normal for the initial write performance to be lower than the performance of rewriting a file due to this overhead information.
For this test, NCP-on-Linux outperforms CIFS-on-NetWare in the areas of most interest. As with a few tests so far, the 'sweet spot' seems to be with a file-size under 32MB and a record size under 512KB. NCP-on-Linux particularly out-performs CIFS-on-NetWare in the small file ranges. Improvements of 200-400% are pretty common within the sweet-spot range, with a few combinations (such as 512KB file, 64KB record size) going as high as 1300%.
Re-write: This test measures the performance of writing a file that already exists. When a file is written that already exists the work required is less as the metadata already exists. It is normal for the rewrite performance to be higher than the performance of writing a new file.
For this test, CIFS-on-NetWare outperforms NCP-on-Linux. However, the magnitude isn't nearly to the scale of the Write test. Record size again has something to do with the performance. The two methods reach near parity at a record size around 1MB. For files over 32MB, though, CIFS-on-NetWare provides a consistent 5-10% performance increase over NCP-on-Linux across the board.
Read: This test measures the performance of reading an existing file.
For this test there is no clear winner. NCP-on-Linux generally outperforms CIFS-on-NetWare when the record-size equals the file-size, or half of it. It also has small increases, 5-10%, for 16KB record-sizes and files around 8MB. Generally speaking, though, CIFS-on-NetWare outperforms NCP-on-Linux by an average of 7% across the board.
Re-Read: This test measures the performance of reading a file that was recently read. It is normal for the performance to be higher as the operating system generally maintains a cache of the data for files that were recently read. This cache can be used to satisfy reads and improves the performance.
This is very clear-cut. CIFS-on-NetWare blows the pants off of NCP-on-Linux for this test. The average performance increase, for everything right up through the 512MB file-size, is about 9000%. Why is this? Because NCP-on-Linux does NOT cache this particular test, and CIFS-on-NetWare does. This is presumably a design choice by Novell.
Random Read: This test measures the performance of reading a file with accesses being made to random locations within the file. The performance of a system under this type of activity can be impacted by several factors such as: size of the operating system's cache, number of disks, seek latencies, and others.

For this test NCP-on-Linux is the winner, especially for small record sizes or small files. For 4K records the performance increase is 33%, and for files 512K and under the performance increase averages about 10% over CIFS-on-NetWare. Overall, performance is better by 10-15%.
Random Write: This test measures the performance of writing a file with accesses being made to random locations within the file. Again the performance of a system under this type of activity can be impacted by several factors such as: size of the operating system's cache, number of disks, seek latencies, and others.
NCP-on-Linux is the winner in the ranges important to me. CIFS-on-NetWare has better performance for large files at large record-sizes. NCP-on-Linux is clearly better with record sizes 16K and under. 72% better at 4K record size, 55% better at 8K, 37% better at 16K, and 11% better at 32K.
Backwards Read: This test measures the performance of reading a file backwards. This may seem like a strange way to read a file but in fact there are applications that do this. MSC Nastran is an example of an application that reads its files backwards. With MSC Nastran, these files are very large (Gbytes to Tbytes in size). Although many operating systems have special features that enable them to read a file forward more rapidly, there are very few operating systems that detect and enhance the performance of reading a file backwards.
This is another test where NCP-on-Linux beats out CIFS-on-NetWare. The margin is not great, but consistent. As with the previous test, the best performance is with a 4KB record size. You have to get to the 16MB record-size to find a category where CIFS-on-NetWare outperforms NCP-on-Linux, and even there the difference is 3%. The overall performance increase of NCP-on-Linux is a shade under 9%.
Record Rewrite: This test measures the performance of writing and re-writing a particular spot within a file. This hot spot can have very interesting behaviors. If the size of the spot is small enough to fit in the CPU data cache then the performance is very high. If the size of the spot is bigger than the CPU data cache but still fits in the TLB then one gets a different level of performance. If the size of the spot is larger than the CPU data cache and larger than the TLB but still fits in the operating system cache then one gets another level of performance, and if the size of the spot is bigger than the operating system cache then one gets yet another level of performance.
This test showed a mixed result. For file-sizes 8MB and under, NCP-on-Linux clearly has a lead across all record-sizes. Results get a lot more spotty when file sizes go over that line. Performance is near parity when record-sizes are at 512KB and larger. CIFS-on-NetWare does best in the record-size 512KB and larger, and also in file-sizes 32MB and up. For the most common of file-access types, NCP-on-Linux would provide the best performance.

Overall, NCP-on-Linux appears to beat out CIFS-on-NetWare. The big exception is the Re-Read test, where NCP-on-Linux doesn't even attempt to cache and the results are raw I/O. On a client station with small amounts of RAM, these results may be different, since the caching being tested here is a function of the local machine rather than the servers. The servers do play a role, however, so this does need to be included.

Part 5: Comparing Uncached, NCP-on-NetWare vs CIFS-on-Linux
Summary

As with the CIFS test, caching was present in one half of the environment, so analysis isn't straightforward. NCP-on-Linux involved local caching where NCP-on-NetWare apparently did not. This was a confusing result, since identical client settings were used for both environments. Also interesting to note is that the results of NCP-on-Linux were similar to the results for CIFS-on-NetWare. There are differences, but the general trends were similar.

As with the CIFS tests, the two tests that give the best uncached results are the Reader and Backward Reader tests.
NCP Reader comparison: Reader Test, Comparing NetWare vs. Linux. The value is the multiplier that NetWare is faster than Linux. Units are in KB.

As with the CIFS test, the key value here was the record size. The sense is inverted from the CIFS test, in that it is the Linux environment that performs faster than the NetWare environment. This is a surprising result, considering that NCP is a native NetWare protocol, and NCP on Linux is a relative newcomer. The magnitude of the improvement is comparable to that of CIFS-on-NetWare over CIFS-on-Linux, which is an interesting result by itself. Also, as with CIFS-on-NetWare, the improvement of NetWare over Linux in the larger record sizes is quite visible. The improvement is on the order of 5%, which comes close to the 7% improvement on larger file-sizes reported by CIFS-on-Linux.
NCP Backward Reader, file size view, comparison: Backward Read Test, File Size view, NetWare vs. Linux. The value is the multiplier that NetWare is faster than Linux. Units are in KB.

This test showed a difference between the CIFS and NCP tests. Unlike the CIFS test, file-size was not a strong determiner of performance. Record size was more closely associated.
NCP Backward Reader, record size view, comparison: Backward Read Test, Record Size view, NetWare vs. Linux. The value is the multiplier that NetWare is faster than Linux. Units are in KB.

As you can see from the chart, record size is the thing that separates performance. The break comes between the 64K and 128K record sizes. Unlike the CIFS results on this test, the level of improvement for NCP-on-Linux is not to the same magnitude as the improvement for CIFS-on-NetWare. As has proven to be common with cached vs. uncached access, larger file-size access for the uncached method is a little faster. In this case about 5%, which isn't close to the 14% gain reported by CIFS-on-Linux.

As with the CIFS tests, the cache mechanism makes checking true performance of the file system under certain levels hard. However, the cache only provides performance boosts below certain file-size and record-size levels, so we do have some data to play with. Not as much as I'd like, but it is still there.

The 'Record Rewrite' test shows how the effectiveness of caching falls off as the rewrite size grows.
NCP Record Rewrite, NetWare vs Linux comparison
Record Rewrite Test, average improvement per record-size, NetWare vs. Linux. The value is the average multiplier of NetWare performance over Linux performance, averaged across all file-sizes.

That is a very clear curve, and shows rather well that caching only handles the first 256K of a record rewrite and the rest is handled through normal methods. The point where NCP-on-NetWare pulls ahead of NCP-on-Linux is between 2MB and 4MB. The curve suggests that rewrites higher than 16MB would pull even farther ahead, but that sort of file-access is rather rare, all things considered.
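Reading a crossover point like that off the curve is just a scan for the first record size whose NetWare/Linux multiplier passes 1.00. A small sketch; the ratio values below are invented to have the shape described, not the actual chart data.

```python
# Sketch of finding where a per-record-size ratio curve crosses parity,
# i.e. the first record size where NetWare pulls ahead of Linux.
# Ratio values here are made up for illustration.

def crossover(record_sizes_kb, ratios):
    """Return the first record size (KB) whose ratio exceeds 1.0,
    or None if the curve never crosses parity."""
    for rec, ratio in zip(record_sizes_kb, ratios):
        if ratio > 1.0:
            return rec
    return None

recs = [256, 512, 1024, 2048, 4096, 8192, 16384]      # KB
ratios = [0.55, 0.70, 0.82, 0.93, 1.04, 1.10, 1.15]   # NetWare / Linux
print(crossover(recs, ratios))  # prints 4096, i.e. parity crossed by 4MB
```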

The 'Random Write' test does not show an improvement for the uncached method like it did with the CIFS tests. The improvement is linear starting at 128K and never breaks 1.00. Again, once the record size gets beyond 32MB there may be a point where it does, but again this sort of file-access is rather rare.

The 'Writer' test is cached, which is in the Novell Client spec. The improvements are for file sizes larger than 32MB and record sizes larger than 2MB. In that range the improvement of NCP-on-NetWare is about 7%. In the FileSize > 32MB range, regardless of record-size, the improvement is 1%. In the RecordSize > 2MB range, regardless of file-size, the improvement is 7%. This tells us that record size is again the biggest determiner of performance.

Like the CIFS tests, the data here show that NCP-on-Linux is the better bet, which has much more to do with NCP-on-Linux being cached better than NCP-on-NetWare. I am suspicious of this result, since NCP-on-NetWare should have been doing local caching as well, but it wasn't. Since the majority of file-access on our file-servers will involve files under 64K in size, NCP-on-Linux wins from a pure performance perspective.

Part 4: Comparing Cached, NCP-on-Linux vs CIFS-on-NetWare
Summary

Because caching was a big difference between the CIFS-on-NetWare and CIFS-on-Linux runs, analysis is a bit difficult. CIFS-on-NetWare had local caching involved, so apparent performance will clearly be better for most usages when connected to CIFS-on-NetWare. True performance is another story, and that is what I'm trying to define in this section.

There were a couple of tests that don't involve the cache mechanism: the "Reader" and the "Backward Read" tests. The IOZONE documentation defines the tests as:
Reader: This test measures the performance of reading an existing file.

Backward Read: This test measures the performance of reading a file backwards.

CIFS Reader comparison. Reader Test, comparing NetWare vs. Linux performance. The value is the multiplier by which NetWare beats Linux performance. Units in KB.

As you can see from the graph, record size is the key determiner of performance for this test. For smaller record sizes, NetWare is clearly better than Linux at CIFS performance. This is on the first read; the 're-read' test had caching enabled, and CIFS-on-NetWare was vastly better than CIFS-on-Linux as a result.

For record-sizes larger than 64K, CIFS-on-Linux provided an average improved performance of about 7%.
CIFS Backward Read comparison. Backward Read Test, comparing NetWare vs. Linux performance. The value is the multiplier by which NetWare is faster than Linux. Units in KB.

As you can see from the graph, it is file-size that determines performance, not record size. The correlation here is less clear than it was for the Reader test, but it is present. For large files, performance on Linux is somewhat better than on NetWare. When the chart is rotated to present the RecordSize view, there is some improvement for small records, but only at really small file-sizes.

For file sizes larger than 8MB, CIFS-on-Linux provided about 14% better performance.

For the tests that do involve the cache-mechanism, we can only compare results for the data-sets that don't involve the cache; specifically, large file-sizes and large record-sizes. For the 'Record Rewrite' test, which rewrites sub-sets of larger files, Linux provides about a 10% improvement over the same test on NetWare. For the 'Random Write' test, which writes to random locations within the file, the improvement for Linux is also about 10%. For the 'Writer' test, which just lays down the file, the improvement is about 15% over NetWare. In all three cases, the larger the sub-range, the better Linux performs relative to NetWare.

In the end, for the data most likely to be used by an end-user at WWU, CIFS-on-NetWare is the better choice of the two. Large Access databases, big PowerPoint slideshows, and GIS maps may perform slower, but most file-access will be faster.

Part 3: NCP
This'll go over a few posts, just due to the nature of the data and how I'm analyzing it.

Without going into detailed analysis of the data, a certain structure leaps out. Both CIFS-on-NetWare and NCP-on-Linux show clear signs of a local caching mechanism in use. This is odd, since I could have sworn I had NCP-on-NetWare enabled for local caching, but the data does not support that. What I have found is a sort of rule for caching.

For file operations on files 32MB or less, and in nibbles of 256K or less, the caching features strongly affect performance.

For file operations on files between 32MB and 64MB, and in nibbles of between 256K and 512K, caching features weakly affect performance.

For file operations on files larger than 64MB, or in nibbles of 1MB or larger, caching features do not affect performance.
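The three rules above can be encoded as a small classifier. This is just a restatement of the thresholds stated in this post, not anything derived from the client's actual cache implementation:

```python
# Encodes the caching rules observed in the benchmark data. File sizes
# and record ("nibble") sizes are in KB; the thresholds are the ones
# stated above, not documented Novell Client internals.
def cache_effect(file_kb, record_kb):
    """Return how strongly client-side caching affects this operation."""
    if file_kb > 64 * 1024 or record_kb >= 1024:
        return "none"    # files over 64MB, or records of 1MB and up
    if file_kb > 32 * 1024 or record_kb > 256:
        return "weak"    # 32MB-64MB files, or 256K-512K records
    return "strong"      # files of 32MB or less, records of 256K or less

print(cache_effect(1024, 64))         # small file, small record: strong
print(cache_effect(48 * 1024, 64))    # 48MB file: weak
print(cache_effect(128 * 1024, 64))   # 128MB file: none
```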

Caching does not improve performance on all tests. Two tests very clearly show no influence of the caching mechanism. The Reader test, and the Backward Reader test. The Reader is the initial read of a file, so that makes sense that the caching would not affect performance. The Backward Reader test reads a file backwards, which is something that the caching mechanisms do not seem to pick up.

When you look at the rules and compare them to file-system statistics, you very clearly see that caching should improve performance on almost all operations the average user will perform. Inventories of our User-directory volumes show that 70% of files, by file-count, are 64K or smaller. Files larger than 32MB are a tiny, tiny percentage of files.

The situation on our big FacStaff shared volume is a little different. There, 66% of files are 64K or smaller. In both cases, files larger than 64K make up the majority of the data on the volumes. In the case of the shared volume, a larger percentage of files exceeds the 32MB caching cut-off.
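The sort of inventory quoted above is easy to reproduce with a short walk of a mounted volume. A hedged sketch; the mount path is a placeholder, and the 64KB and 32MB thresholds are the ones used in this post:

```python
# Sketch of the volume inventory: walk a tree and report what share of
# files fall under the 64KB mark and over the 32MB caching cut-off.
import os

def size_profile(root):
    """Count files <= 64KB and files > 32MB under a directory tree."""
    small = big = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total += 1
            small += size <= 64 * 1024         # <= 64KB
            big += size > 32 * 1024 * 1024     # > 32MB cut-off
    return small, big, total

small, big, total = size_profile("/mnt/uservol")  # placeholder mount point
if total:
    print(f"{100 * small / total:.0f}% of {total} files are 64KB or less")
```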

The conclusion you can draw from the above data is that NCP-on-Linux will be perceived as faster than NCP-on-NetWare. NCP-on-NetWare didn't show any caching behavior, so it suffers a major setback when compared to NCP-on-Linux, which did exhibit that behavior. The above data does not show what the 'true performance' of the two setups is. That'll come later, and the first-look data is that NCP-on-NetWare performs better in a no-caching state than NCP-on-Linux does.

Part 2: CIFS

W-a-y early benchmark results

I have an NCP and CIFS benchmark under my belt against a NW65SP4a box, and the results are weird.

First, the environment:
The Server:
OES SP1 (a.k.a. NW65SP4a)
NCP Caching Enabled
OPLOCK 2 Enabled
CIFS with Oplock2 enabled
2x 3.2GHz Intel CPU
2GB RAM
Hyperthreading OFF
100MB Ethernet

The Client:
WinXP SP2
Novell Client, Caching Enabled
1x 3.0GHz Intel CPU
1GB RAM
Hyperthreading ON
100MB Ethernet


The thing that stands out most clearly is that CIFS/SMB's caching mechanism is far better than NCP's. In several of the test types, throughputs were reported in the 'jaw dropping' range for CIFS, and that can only be attributed to pretty aggressive caching. Though once file-sizes get much above 128M, caching only goes so far and you start getting a feel for the efficiency of the base filesystem and network I/O.

That said, probably the best way to test the base system is what IOZONE calls the 'Backward Read' test. The test consists of the file being read backward, so caching mechanisms have to be designed to handle that case. This is the only test where NCP-on-NW stomped CIFS-on-NW across the board (mostly), and even there the performance increase was on the order of 5-15%. The one area on that test that CIFS-on-NW beat out NCP-on-NW was at the 64K file-size with 4K records, where the performance increase for using CIFS-on-NW was on average 13% better.

The performance of the network caching is interesting. It is STILL a common thread in the support forums for the sysops to recommend turning off NetWare's file-caching features due to continuing and ongoing bugs. Yet in a benchmark I read a couple of years ago, comparing NetWare against the just-released Windows 2003 Server, caching and oplocks had to be turned on for NetWare to beat the Windows server on file-system performance. At the time, that configuration was a known-unstable one in the support forums.

Another thing to note in the data I have now is that network I/O is more of a bottleneck than raw disk I/O. Results in the graph that are higher than the theoretical 100Mb Ethernet maximum have to be, by definition, the result of client-side caching. This is an important distinction, since our file-servers' performance will be judged by how zippy they seem to end-users on mapped drives, not by the performance of web/db applications hosted on the file-server.
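The wire-speed ceiling argument above is just arithmetic, and it makes a handy filter for spotting cache-inflated results. A sketch; the 1.2x slack factor is my own fudge for measurement noise, not anything from the benchmark:

```python
# Back-of-envelope check: 100Mb Ethernet tops out around 12.5 MB/s
# before framing overhead, so any reported throughput well past that
# ceiling has to be client-side cache, not the wire.
WIRE_MBPS = 100  # megabits per second
CEILING_KB_S = WIRE_MBPS * 1000 / 8  # ~12500 KB/s, ignoring overhead

def looks_cached(throughput_kb_s, slack=1.2):
    """Flag results that exceed what the network could physically carry."""
    return throughput_kb_s > CEILING_KB_S * slack

print(looks_cached(90_000))  # way past wire speed: True, must be cached
print(looks_cached(11_000))  # plausible for 100Mb Ethernet: False
```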

Keep in mind, this is just the very early look at the data. I haven't done nearly enough work to draw conclusions. For instance, our Novell Client build may turn off client-side caching in a way I'm not familiar with. These things need checking.

I (re)discovered iozone today. I first used it at OldJob to do some testing of new file-servers we had received, but I had forgotten the NAME of the tool. And without the name, googling didn't get me what I was looking for. Happily, I found it today. YAY!

As I've mentioned before, one of the stopping points for us regarding the future of Novell operating systems is fileserver performance. Now that I have a tool I can run, I plan to do multiple runs of the test suite against both a NetWare/NSS back-end and a Linux/NSS back-end (all over NCP) and see what differences we have. This'll be on an HP bladeserver, so the base hardware is identical. I'll also do what I can to make sure that the partition sizes and code-base are similar.

If I have time, I may also do the suite over SMB: CIFS for NetWare, and Samba for Linux. Due to our love-affair with the Novell Client, even 'whoa!' results won't get us off of NCP and onto SMB. But still, it would satisfy a certain intellectual itch.

One thing I'd like to do, but just don't have the resources to pull off, is multiple simultaneous runs of iozone. That would better simulate the multi-hit environment we have, and give a better indication of performance under heavy loads.

I'm doing a run right now against a volume in the NetWare cluster. The results so far are interesting. IOZone runs multiple tests on files of various sizes, and also on sub-sets of data inside those files. When the 'record size' inside a file gets over 64K, the network card on my workstation saturates. Task Manager tells me my NIC is running at 70-85% utilization, which is really close to the saturation point for switched Ethernet. Since the server is on the other end of a router, my I/O is contending with other I/O at the router level, so I can't get much faster than I am right now.

What I do want to check into is file-access for smaller file-sizes. That represents the vast majority of files on these servers, so represents the key area we're concerned about. If it was all whonking huge GIS files, NO PROBLEM! But no, it is bajillions of .wpd, .ppt, and .jpg files.

If I can make sense of the data, I'll share! How's that for fun?
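Making sense of the data mostly means reshaping iozone's grid output into something chartable. A hedged sketch: it assumes the standard auto-mode column layout ("KB reclen write rewrite ..."), and the sample text is made up to show the shape, not real results.

```python
# Reduce iozone -a style output to {(file_kb, record_kb): {test: KB/s}}.
# SAMPLE is fabricated data illustrating the column layout only.
SAMPLE = """\
              KB  reclen    write  rewrite     read    reread
              64       4    52000    61000    58000   175000
              64       8    54000    63000    59000   178000
             128       4    51000    60000    57000   174000
"""

def parse_iozone(text):
    """Return {(file_kb, record_kb): {test_name: KB/s}}."""
    lines = [l for l in text.splitlines() if l.strip()]
    header = lines[0].split()      # ['KB', 'reclen', 'write', ...]
    tests = header[2:]
    table = {}
    for line in lines[1:]:
        cols = line.split()
        key = (int(cols[0]), int(cols[1]))
        table[key] = dict(zip(tests, (int(c) for c in cols[2:])))
    return table

data = parse_iozone(SAMPLE)
print(data[(64, 4)]["reread"])  # cache-assisted re-read stands out
```

Once the grid is a dict keyed by (file-size, record-size), comparing two runs is just dividing matching cells.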

POWER

The last two days have been spent in the datacenter tracing power-cables. These are the big blind-spots when it comes to labeling, since most of the time we don't know the name of the server when it gets racked. We do have some documentation, but like all datacenters everywhere, there is a certain amount of re-use. For instance, the cable labeled "galaxy" is plugged into the "upsmon" server. Ahem.

The reason for all of this is to see if we can shuffle circuits enough to cram in another blade-rack. Our problem is only partly an underpowered UPS; the bigger problem is a lack of available circuits. My explorations have proven that we under-utilize the circuits we have, and consolidating circuits will help liberate some. But those old circa-2000 servers with 3 power-supplies in them are a pain in the rear, let me tell you.

When we get the new UPS, we're gonna have to plan for more 3-phase 30-amp circuits from the get-go. Planning for more 120v 1-phase 30A circuits would be a good idea anyway, since it eases rack power-feeding.

OES-SP2 / NW65SP5 released

It came out. Apparently the short lead time for the NetWare patch was to sync patch-cycles with the Linux side of the house. I'm not seeing any real new features listed, but I'm still digging.

OES Readme with Sp2

That does say that Novell is no longer discriminating between NetWare and OES service-packs, which was an expected move so no biggie. Also the Server Consolidation Utility and the Server Migration Utility have been merged into the same tool, which is only a logical progression so also no biggie.

It also looks like there is a bug/feature that prevents secure iPrint from working correctly with OS X (Tiger), which will cause problems for us in the dorms. "Happily", Macs are not officially supported by us, and this is done in a 'best effort' way. The fix is to turn off secure iPrint, but that violates certain immutable security principles we have in place. This means that OS X (Tiger) users will just have to live without iPrint until Novell catches up. I'm not sure if this is new with the service-pack, or just a known issue that has gotten enough traction that it made it to the SP documentation.

Other than that, from a NetWare perspective SP5 hasn't changed much. Considering the dearth of post-SP4a patches out there right now, I expect SP5 to be dubbed "a good patch" by the support forum folk pretty quick. I know it includes my libC and xdav fixes, which both improve things :).

Playing with OES-Linux SP1

| 1 Comment
I've installed OES-Linux before, so I didn't expect the kinds of problems I had with this latest round. I was attempting to install OES-Linux SP1 into a VMware session, and things went a little pear-shaped when it came time for the eDir install. For reasons beyond my understanding, the eDir install failed.

I've spent the last three days poking at this one. I'm surprised that the GUI didn't work. But the bright side to this is that troubleshooting the problem has caused me to learn a lot more about how all those fun YaST installers work under the hood.

The problem that nailed me was that eDir wouldn't install. It got to the point where it attempted to contact the newly-configured eDir, and gave me an 'unable to bind to LDAP' error. Much poking later, I discovered that eDir was in fact up, but secure LDAP wasn't running. Since eDir was up, I was able to point ConsoleOne at it and discovered that the LDAP Server object didn't have a certificate associated with it. Weird. I hand-associated one, and then secure LDAP started working.

But 'nldap -s' did not report that it was up, even though I could do secure-LDAP sessions with binds and everything. Odd. More poking, and I suspected that this was due to the eDir install process probably exporting keys or something and nldap not having the right trusted-root to play with. Or something.

The main reason I was trying to install OES-Linux in the first place was that I wanted to get NSS-on-Linux running. Once I got an eDir responding to hails, I attempted to get NSS up and running. And it just plain would not install the user-object it needs to manage NSS-on-Linux, no matter what I did. It spat a '6f' error, which did not yield to googling.

That wasn't the only feature. There were other oddities in the process that just made it a pain to work with.

So this afternoon I tried an eDir-from-scratch again by way of the 'ndsconfig new -t treename -a admin.me -n o=me' command. And miracle of miracles, this time it ran to completion without barking about not being able to associate an SSL certificate with LDAP. W-e-i-r-d. The ONLY difference that I know of between this attempt and earlier attempts is that I gave up and told it to install the server in the O rather than in an OU. Why that would make a difference, I just don't know. But it did. eDir installed just fine, and the oddities went away. NSS was able to get its user installed.

Unfortunately, by now I had uninstalled and reinstalled iManager, NSS, SMS, and LUM so many times that things were messy in the file-system. So I don't have a working iManager right now since my web-server on that box has gone away somehow. I don't understand that right now.

Friday, when I'm next at work, I plan on blowing the whole thing away and installing fresh. Only this time with the server in the O from the get-go, so I can see what a 'normal' OES install is supposed to look like. Perhaps then I'll get the NSS-on-linux thing up and running.

Now if only I had a test-box that wasn't also 4 years old, I could get some performance tests out of it. Hmmmmm.

Fun in the dorms

| 1 Comment
Safe Havens had a good one today. (Jan 05, 2006).

Print auditing

One of the things I learned at last year's BrainShare is that Novell is working on expanding their own built-in print auditing. This is an interesting development, since pCounter is well developed for this. In the words of my support tech during the print-pooling problems, Andy is one of the sharpest 3rd-party developers he's worked with. My understanding is that the Novell solution wouldn't be as flexible as pCounter, but it wasn't clear how 'less flexible' would manifest.

It is my prediction that the upcoming SP5 (OES SP2) will include the auditing. But we'll see when that hits filefinder.

On migrations

| 2 Comments
Since Novell is moving away from NetWare as a stand-alone server, the topic of 'what now?' has come up. So I built a chart of the key services we provide on NetWare and what could replace them.

Service       | NetWare       | OES-Lin              | Windows
Printing      | iPrint        | iPrint               | Spooler
Print-Audit   | pCounter      |                      | pCounter
NetStorage    | NetStorage    | NetStorage           | [custom or none]
MyWeb         | mod_edir      | mod_userdirs         |
SFTP          | OpenSSH       | OpenSSH [restricted] |
Login Scripts | Login Scripts | Login Scripts        | GPO + vb-scripts
OES-Linux is the easier option, but still is not without effort. While the whole thing would present to the end users the same way it always has, there are significant changes on the back end as far as we're concerned. Part of that is learning how to deal with the problems of a new OS, and how to keep it un-hacked. Plus, our print-auditing package doesn't support Linux yet (if ever), so that has non-trivial impacts on how we manage paper around here.

Windows... it'll get brought up, even if we pee on it at every chance we get. The fact of the matter is that such a migration would be a lot more visible to the end users, and they'd lose certain services they've come to expect. I personally don't know of a Microsoft analog to NetStorage. The serving of web-pages from home directories is something I can't google an answer for, and I suspect it would involve non-trivial engineering (Apache and judicious use of mod_rewrite might be able to fake it, but I don't know). SSH-based SFTP is something that'll probably require a product purchase from SSH.COM, since the free stuff gives me the willies (I don't wanna run anything that requires a Cygwin installation). In short, any migration to a Microsoft-based system would be the most painful of our options.

We're still a year or two from the decision point. That'll come when the nodes in the student half of the cluster start showing their age and need to be replaced. Events at BrainShare 2006 (and probably 2007) will act as wildcards as Novell makes their future plans more and more clear. We're gonna need a lot of training.

That was fast

Today I learned that NW65SP5 (yes, SP5) will be released in some form within a week. I don't know if it'll be entering beta, or going public, but there will be an SP5 out soon. My LibC patch fun will be in there. If I'm counting days right, this is about one quarter after SP4a came out. There aren't a lot of post-SP4a patches out there (for the NetWare side, anyway), so I'm wondering what's pushing it.

This also has me wondering about BrainShare timing. That's less than three months away now, and Novell has a tendency to release major things right before or at BrainShare in order to get the maximum spin off of it. Perhaps OES 2.0 will be coming out then?