First post of the year, status report


Welcome to 2005. Remember to write "05" on your checks.

The cluster has survived the weekend on NW6.5. However, I don't like some of what I'm seeing. Judging from some of my other servers, NW6.5 looks more vulnerable to memory fragmentation than NW6.0 was. One cluster node locked hard two days ago, which caused failovers.

One of the services that didn't completely survive was MyWeb for FacStaff, the web server that hosts this very blog. After looking into what went screwy, it appears the server served pages for several hours before it started returning a "111" error. When I look up that error code, I find that it means "Generic file system error; see filesyserrno". From previous troubleshooting of mod_edir and LibC, I know that filesyserrno is an error-trapping variable that holds the more specific cause. In this case it isn't being written to the logfiles, so I don't know what it contained. The only way I know to capture it is to set a breakpoint, and that just isn't nice to do on a cluster node. What the error does tell me is that something went wrong and the server couldn't handle it.
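To make the logging gap concrete, here is a small Python sketch of the pattern described above: a call that returns a generic code (111) and stashes the specific cause in a separate variable, the way the error message points at filesyserrno. Everything in it is hypothetical (GENERIC_FS_ERROR, last_fs_detail, hypothetical_fs_call, log_line); it models the idea only and is not LibC's actual interface or mod_edir's logging code.

```python
GENERIC_FS_ERROR = 111  # the generic "file system error" code seen in the log

# Hypothetical stand-in for a separate detail variable like filesyserrno;
# real LibC code has its own mechanism, this dict is only for illustration.
last_fs_detail = {"errno": 0}


def hypothetical_fs_call(fail_detail=None):
    """Simulated file-system call: returns 0 on success, or the generic code
    while stashing the more specific cause in last_fs_detail."""
    if fail_detail is None:
        return 0
    last_fs_detail["errno"] = fail_detail
    return GENERIC_FS_ERROR


def log_line(rc, capture_detail):
    """Build a log entry; without capture_detail only the generic code survives."""
    if rc == 0:
        return "ok"
    if capture_detail:
        return f"error {rc} (filesys detail {last_fs_detail['errno']})"
    return f"error {rc}"
```

If the logging code records only the return value, the log shows just "error 111" and the specific cause is lost; the detail has to be read at the moment of failure, before a later failing call overwrites it.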

This is an example of a problem in LibC, not mod_edir. LibC is the NetWare library that is multi-processor aware, long-filename aware, and getting more POSIX-like as time goes on. The old CLib library, which began life in the NW2.x products, is none of these. Apache1.3 and its accompanying modules, mod_hdirs and mod_rdirs, were ported to NetWare using CLib. This is why the Apache1.3 version of MyWeb is far more stable than the Apache2.0 version, which is linked against LibC instead of CLib.

MyFiles (a.k.a. NetStorage) has survived the move to NW6.5 very well. It's getting used, and I now have logfiles to gauge usage. Also important: since we're using Apache2.0 instead of Apache1.3, I can now use RotLogs to keep my logfiles from getting insanely large. Back in the NW6.0 days we had some MyFiles outages caused by the error_log file hitting 4GB.
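Log rotation helps because the server's log output goes to something that decides when to close one file and start the next, so no single file keeps growing toward a size that causes trouble. Apache's rotatelogs does this through a piped log and is usually set to rotate on a time interval (some versions can also rotate at a size limit). The Python sketch below shows the size-based variant of the idea. It is an illustration under assumptions, not rotatelogs' code: the class name SizeCappedLog, the numbered-suffix file naming, and the byte cap are all made up for the example.

```python
import os
import tempfile


class SizeCappedLog:
    """Illustrative size-based log rotation; not the actual rotatelogs implementation.

    Lines go to '<base>.0', '<base>.1', ...; a new file is started whenever the
    next write would push the current file past max_bytes.
    """

    def __init__(self, base_path, max_bytes):
        self.base_path = base_path
        self.max_bytes = max_bytes
        self.index = 0
        self._open_current()

    def _current_name(self):
        return f"{self.base_path}.{self.index}"

    def _open_current(self):
        self._fh = open(self._current_name(), "ab")
        # Count bytes already in the file so the cap still holds after a restart.
        self.size = os.path.getsize(self._current_name())

    def write(self, line):
        data = line.encode("utf-8")
        # Move to the next file while this line would exceed the cap; an empty
        # file always accepts one line, so an oversized line is never dropped.
        while self.size > 0 and self.size + len(data) > self.max_bytes:
            self._fh.close()
            self.index += 1
            self._open_current()
        self._fh.write(data)
        self.size += len(data)

    def close(self):
        self._fh.close()


if __name__ == "__main__":
    # Demo: five 7-byte lines with a 20-byte cap are split across three files.
    work_dir = tempfile.mkdtemp()
    log = SizeCappedLog(os.path.join(work_dir, "error_log"), max_bytes=20)
    for i in range(5):
        log.write(f"line {i}\n")
    log.close()
    for name in sorted(os.listdir(work_dir)):
        print(name, os.path.getsize(os.path.join(work_dir, name)))
```

With a realistic cap (hundreds of megabytes rather than 20 bytes), the same loop keeps every error_log piece well below the size where the outages started, at the cost of having to look across several files when troubleshooting.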

1 Comment

Yay it's back!