Recently in apache Category

NetStorage and IE7

Looks like there is a bug in NetStorage (NW65SP7; not sure if SP8 fixes it or not) and IE7. When you're browsing along, select a file to download, and go to File -> Download, you get a login screen. No matter what you enter, it won't let you download the file. Access forbidden!

You can get around it by double-clicking on the file you want to download.

However, the bug also breaks upload, and there is no workaround for that.

Works just fine in non-IE7 browsers. I understand Novell knows about this issue.

On the server side, I can see a few signs of this in the log-files. There is a line like this for a failed download attempt:

140.160.246.45 - - [22/Jan/2009:11:15:40 -0800] "GET /oneNet/netstorage/Home@WWU/ac228.tgz HTTP/1.1" 401 1370 "-" "Java/1.4.2_13"

That IP is the server's IP address, not the client's. The user agent is Java. Clearly (to me, anyway) Tomcat is proxying the download request and thus creating the new user-agent string. The rest of this session has a normal IE7/WinXP user-agent.

Now a successful download (Firefox):

140.160.246.45 - username [22/Jan/2009:11:16:39 -0800] "GET /oneNet/NetStorage/Home@WWU/cert.txt HTTP/1.1" 200 3329 "-" "Java/1.4.2_13"

The observant may notice some case differences there ("netstorage" vs. "NetStorage"). I noticed that too, and did some poking around in IE to get this:

140.160.246.45 - - [22/Jan/2009:11:18:14 -0800] "GET /oneNet/NetStorage/Home@WWU/cert.txt HTTP/1.1" 401 3329 "-" "Java/1.4.2_13"

Same case as the Firefox access, but it still failed. I don't know why, but clearly something inside Tomcat isn't happy with how IE7 handles the POST request that asks for the download.

Website stats

We purchased Urchin's web stats package before Google bought them, and we're still using that creaky old software even though its ability to interpret user-agent strings is diminishing. I'm not on the webmaster crew for this university; I'm just a client. But I do track the MyWeb stats through Urchin.

I also track our MyFiles (NetStorage) stats. This became more interesting the other day when I noticed that just over 40% of the bandwidth used by the Fac/Staff NetStorage server was transferred to user agents that are obviously WebDAV. This is of note because WebDAV clients don't run JavaScript, and thus will never register with things like Google Analytics. If I had been relying on Google Analytics for NetStorage stats, I'd have missed the largest single agent-string.
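
That 40% number is easy enough to pull out of the raw logs. A rough sketch of how I'd do it, assuming the standard combined log format; the user-agent substrings below are just the WebDAV clients I'd expect to see, not a definitive list:

#!/usr/bin/env python
# Rough estimate of how much transfer went to WebDAV clients, from an
# Apache combined-format access log. The agent substrings are assumptions
# (the usual desktop WebDAV clients), not an exhaustive list.
import re
import sys

LOGLINE = re.compile(r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"')
DAV_AGENTS = ('Microsoft-WebDAV-MiniRedir', 'WebDAVFS', 'davfs', 'cadaver', 'DAV')

total = dav = 0
for line in open(sys.argv[1]):
    m = LOGLINE.match(line)
    if not m:
        continue
    nbytes = 0 if m.group(7) == '-' else int(m.group(7))   # bytes field can be '-'
    total += nbytes
    if any(tag in m.group(9) for tag in DAV_AGENTS):        # group 9 is the user-agent
        dav += nbytes

print("%.1f%% of %d bytes went to WebDAV-looking agents" % (100.0 * dav / max(total, 1), total))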

Even though Google bought Urchin, it makes sense that they dropped the logfile-parsing technology in favor of a JavaScript-based one. Google is an advertising firm with added services to attract people to their advertising, and it's hard to embed advertising in a WebDAV connection. RSS feeds used to be similar, but that's now a solved problem (as anyone who has looked at Slashdot's feed knows).

In my specific case I want to know what the top pages are and which files are getting a lot of attention, as well as the usual gamut of browser/OS statistics (student MyWeb has a higher percentage of Mac hits, for one). One of the things I regularly look for is .MP3 files on the student MyWeb service getting a lot of attention. For a while there, the prime user-agents hitting those files were Flash-based media players embedded on web pages, just the sort of thing that only logfile parsing would catch.

One thing the NetStorage logs give me is a good idea of how popular the service is. Since Apache logs show the username when the user is authenticated, I can count how many unique user IDs used the service over a specific period, which tells me how strong uptake is. I may be wrong, but I believe Google Analytics can't do that.
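
From the log side it's barely a script's worth of work. A minimal sketch, assuming the standard log layout where the third field is the authenticated username ('-' when anonymous); illustrative only, not the actual report we run:

#!/usr/bin/env python
# Count distinct authenticated user IDs in an Apache access log.
# The username is the third whitespace-separated field; '-' means anonymous.
import sys

users = set()
for line in open(sys.argv[1]):
    fields = line.split()
    if len(fields) > 2 and fields[2] != '-':
        users.add(fields[2])
print("%d unique user IDs touched the service" % len(users))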

The Urchin we have is still providing the data I need, but it's getting stale. Its OS detection is falling apart in this era of Vista and 64-bit anything. Even so, it's still way better than Analytics for what I need.

Web-servers

Looking at usage stats, the amount of data transferred by MyWeb for Students has gone down somewhat from its heyday in 2006. I blame Web 2.0. MyWeb is a static HTML service; we don't allow any server-side processing of any kind other than server-side includes, and that's not how web development is done anymore. This very blog is database-backed, but Blogger publishes static HTML pages to represent that database, which is why I'm able to host it on MyWeb for FacStaff.

If we were to provide a full-out hosting service for our students (and staff), I'm sure there would be a heck of a lot more uptake. A few years ago there was a push in certain Higher Ed circles to provide a "portfolio service," which would host a student's work for a certain time after graduation so they could point employers at it as a reference. We never did that for a variety of reasons (cost being a big one), but the sentiment is still there.

If we were to provide not only full-out hosting, but actual domain-hosting for students, it could fill this need quite well. Online brand is important, and if a student can build a body of work on "$studentname.[com|org|net|biz]" it can be quite useful in hunting down employment. Several of the ResTek technicians I know have their own domains hosting their own blogs, so the demand is there.

I've never worked for a company that did web hosting as a line of business, so I've only heard horror stories of how bad it can get. First, we'd need a full LAMP-stack server farm to run the thing. That's money. Second, we'd need the organizational experience with the technology to keep badly configured WordPress or phpBB installs from being exploited and DoSing the other co-hosted sites through resource exhaustion. That's a worker-hours thing.

Then we'd have to figure out the graduation problem. Once a student graduates, do we keep hosting for them? Do we charge them? Do we force them off the system after a specific time? These are questions that need answers, and they're the kind of questions that contributed to killing the portfolio-server idea.

Personally, I think this is something we could provide. However, someone needs to kick the money tree hard enough to shake loose the funds to make it happen. Perhaps Student Tech Fee could do it. Perhaps it could be a 'discounted' added-cost service we provide. Who knows. But we could probably do it.

Web site statistics

We use Urchin 5.6 for our web site statistics. It works better for us than Google Analytics for a number of reasons, which is why it's somewhat irksome that a newer version of Urchin hasn't come out. I hear reports that Google, who bought Urchin a while back, is working on a new version of the software, but I haven't heard much.

I hope it comes out.

Google Analytics is unabashedly designed around advertising-related statistics. No surprise, since that's where the money is to be made. And for that, it works great.

What it doesn't do is tell me a few, very key things:
  • How many total bytes did this web-server serve in this time period? Network monitoring will give me totals for the whole box, but this gives me the numbers for the web-server itself.
  • What are the top 10 hit files?
  • What are the top 10 files generating traffic?
These are things I'm concerned about as a webmaster. This is stuff you can only get by parsing web-server logs.
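
Something like this rough sketch covers all three (standard combined log format assumed; purely illustrative, not the script we actually run):

#!/usr/bin/env python
# Total bytes served, top-10 files by hits, and top-10 files by bytes,
# pulled from an Apache combined-format access log. Illustrative sketch.
import re
import sys
from collections import defaultdict

LOGLINE = re.compile(r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+) "([^"]*)" "([^"]*)"')

hits = defaultdict(int)
xfer = defaultdict(int)
total = 0
for line in open(sys.argv[1]):
    m = LOGLINE.match(line)
    if not m:
        continue
    path = m.group(6)                                     # requested path
    nbytes = 0 if m.group(8) == '-' else int(m.group(8))  # bytes field can be '-'
    hits[path] += 1
    xfer[path] += nbytes
    total += nbytes

print("Total bytes served: %d" % total)
print("Top 10 files by hits:")
for path, count in sorted(hits.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print("  %8d  %s" % (count, path))
print("Top 10 files by bytes:")
for path, nbytes in sorted(xfer.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print("  %12d  %s" % (nbytes, path))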

Of the top 10 most-hit files on student MyWeb, 6 would be revealed by Google Analytics.
Of the top 10 traffic-generating files on student MyWeb, which together account for 81% of total data transfer, not a single one would be revealed by Google Analytics.

The top file last week for student MyWeb was an MP3 generating 31% of total data transfer. After digging into the actual log files to see what was referring that traffic, I learned that there is a new Flash-based music search service out there. While Analytics would track the loading of the Flash file itself on those non-WWU servers, it won't track the transfer from my server. That Flash player definitely doesn't execute custom JavaScript.
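
The digging itself was nothing fancy. A sketch of the sort of thing involved, assuming the combined log format where the request and referrer are the first and second quoted fields on each line:

#!/usr/bin/env python
# Top referrers for requests that fetch .mp3 files, from an Apache
# combined-format access log. Illustrative sketch; the field positions are
# assumptions based on the standard "request" ... "referer" "agent" layout.
import sys
from collections import defaultdict

refs = defaultdict(int)
for line in open(sys.argv[1]):
    quoted = line.split('"')
    if len(quoted) < 7:
        continue
    request, referer = quoted[1], quoted[3]
    if '.mp3 ' in request.lower():
        refs[referer] += 1

for referer, count in sorted(refs.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print("%6d  %s" % (count, referer))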

Google Analytics and server-log parsing programs serve different market segments. Google, understandably, is only interested in the ad-driven segment. I just wish they'd get off their butts and release a new version of the log-parsing Urchin software.

Recent events in Virginia sparked discussions today about what would happen if something like that happened to us. All that national attention is akin to getting www.wwu.edu slashdotted, especially any emergency page we might think to prepare. This is why, even in this day and age, old-fashioned media is the best way to get a specific message to LOTS of people. The WWU front page as it exists RIGHT NOW would melt the web-server should something that nationally visible occur.

That said, given warning we could put together a server that can handle slashdotted loads. We know how. A static page works best, and we have enough web-servers scattered about that running the page through the BigIP to fork the load across 12 servers would let us keep up. Heck, I still maintain that the MyWeb servers could handle those loads by themselves if given the go-ahead.

Running a server with a database of all the students, staff, campus visitors, and Bellingham residents confirmed to be Not Dead, which is the information most in demand by people concerned about those folks, is a lot more work and a lot more resource-intensive. Anything database-driven requires orders of magnitude more resources to support that level of load.

This isn't something we've felt the need to prepare for, though. We do have an emergency page that can be hosted off-site somewhere, but it isn't designed for this type of disaster. It was designed with a Katrina-level disaster in mind (or, more likely in our case, a &*!$ big earthquake in the area), where the school is closed and the whole region is suffering. Something like the previous paragraph could be hosted in town, even. Heck, even Mt. Baker popping wouldn't do us in because:
  1. We're up wind, so the ashfall wouldn't hit us.
  2. WWU is not in any of the historic lahar paths.
  3. Baker has no history of 'catastrophic flank collapse' eruptions (like Mt. St. Helens in 1980).
Who knows. These sorts of events are the type that change disaster planning nationwide.

Myweb and KML files

I had a call today about getting some Google Earth extension working correctly from MyWeb. Not a big deal. I gather there are some hot feelings about this, but I'm at least two removes from where said feelings are. The long and short of it is that .KML files hosted on MyWeb were not rendering correctly because our web-server wasn't configured for that particular MIME type. The fix is as simple as the linked document says: add those two lines to the MyWeb config file, and done. Or just add the lines to the "mime.types" file.
mime.types:

application/vnd.google-earth.kml+xml kml
application/vnd.google-earth.kmz kmz
Again, not a big thing.

Point of fact, end-users can add unsupported MIME types to their own MyWeb directories by creating what Apache calls a ".htaccess" file. Because Windows Explorer won't create a file whose name starts with a dot, I've set things up so that any of the following three file names can be used for the same thing (a sketch of the server-side directive that enables this follows the list). This is a file that goes into the directory being served:

.ht
ht.acl
.htaccess
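
For the record, the server-side piece that lets all three names work is (going from memory) just Apache's stock AccessFileName directive, with something along these lines in the MyWeb server config:

# accept per-directory override files under any of these names
AccessFileName ht.acl .htaccess .ht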

Create an "ht.acl" file, and add the two suggested lines to it:

AddType application/vnd.google-earth.kml+xml .kml
AddType application/vnd.google-earth.kmz .kmz

And TADA! Your KML file works, and I didn't have to do anything. Clearly a system-wide approach is preferred, but this would get an obscure app to work.

I have a directory off of my own MyWeb directory that I use for "emailing" large attachments. I put the file into that directory and mail a link to it. Useful for things like 50MB files. Or, on one occasion, a 724MB NetWare service pack. These are the contents of that ht.acl file:
deny from all
allow from 140.160.0.0/16
Which means no off-campus access. Which is good, since I don't want that directory crawled by Google and indexed ;). If I ever mail large files to any of the ResTek folk, I'll add their 'net into the list.

Oo, not good

Just had an abend on the server handling student MyFiles. And I don't like the look of this Abend.log. Icky.
*********************************************************
Novell Netware, V6.5 Support Pack 5 - CPR Release
PVER: 6.50.05

Server STUSRV2 halted Wednesday, August 2, 2006 5:45:31.592 pm
Abend 1 on P00: Server-5.70.05-1937: CPU Hog Detected by Timer

Registers:
CS = 0060 DS = 007B ES = 007B FS = 007B GS = 007B SS = 0068
EAX = FBF17BC3 EBX = 045F34E0 ECX = 045F3554 EDX = 00000046
ESI = 05ED3DE4 EDI = 05ED3DE4 EBP = 1AFCCF56 ESP = 96C5E970
EIP = 00000000 FLAGS = 00200002


Running process: Apache_Worker 145 Process
Thread Owned by NLM: APACHE2.NLM
Stack pointer: 96C5E988
OS Stack limit: 96C50840
Scheduling priority: 67371008
Wait state: 3030070 Yielded CPU
Stack: --FBF17BC3 ?
00114541 (LOADER.NLM|WaitForSpinLock+71)
--00000000 (LOADER.NLM|KernelAddressSpace+0)
0011435D (LOADER.NLM|kspinlock_patch+76)
8EF637FB (NWUTIL.NLM|_SCacheFreeMP+3B)
--045F3554 ?
--05ED3DE4 ?
--05ED3DE4 ?
8EF62BF5 (NWUTIL.NLM|NWUtilFree+25)
--045F34E0 ?
--05ED3DE4 ?
8EF571E9 (NWUTIL.NLM|dt$ConfigFile+C9)
--05ED3DE4 ?
--8EFB6280 ?
--05ED3DE4 ?
--8D098D60 ?
--1E8011B4 ?
8EF57211 (NWUTIL.NLM|CF_Delete+11)
--05ED3DE4 ?
--00000002 (LOADER.NLM|KernelAddressSpace+2)
003630FD (SERVER.NLM|FunnelingWrapperReturnsHere+0)
--05ED3DE4 ?
--CCCCCCCC ?
--CCCCCCCC ?
--CCCCCCCC ?
--CCCCCCCC ?
--CCCCCCCC ?
--CCCCCCCC ?
--CCCCCCCC ?
--CCCCCCCC ?
--CCCCCCCC ?


See the end of that stack trace? All those "CCCCC" entries? That 0xCC pattern is a common fill value for uninitialized memory, so methinks something busted a buffer somewhere. We'll see if this recurs. Oh, I hope I don't have another cycle of NetStorage abends. Those are hard on servers.
