Recently in stats Category

Our tape library is showing its years, and it's time to start moving the mountain required to get it replaced with something. So this afternoon I spent some quality time with google, a spread-sheet, and some oldish quotes from HP. The question I was trying to answer is what's the optimal mix of backup to tape and backup to disk using HP Data Protector. The results were astounding.

Data Protector licenses backup-to-disk capacity by the amount of space consumed in the B2D directories. You have 15TB parked in your backup-to-disk archives, you pay for 15TB of space.

Data Protector has a few licenses for tape libraries. They have costs for each tape drive over 2, another license for libraries with between 61-250 slots, and another license for unlimited slots. There is no license for fibre-attached libraries like BackupExec and others do.

Data Protector does not license per backed up host, which is theoretically a cost savings.

When all is said and done, DP costs about $1.50 per GB in your backup to disk directories. In our case the price is a bit different since we've sunk some of those costs already, but they're pretty close to a buck fiddy per GB for Data Protector licensing alone. I haven't even gotten to physical storage costs yet, this is just licensing.

Going with an HP tape library (easy for me to spec, which is why I put it into the estimates), we can get an LTO4-based tape library that should meet our storage growth needs for the next 5 years. After adding in the needed DP licenses, the total cost per GB (uncompressed, mind) is on the order of $0.10 per GB. Holy buckets!

Calming down some, taking our current backup volume and apportioning the price of largest tape library I estimated over that backup volume and the price rises to $1.01/GB. Which means that as we grow our storage, the price-per-GB drops as less of the infrastructure is being apportioned to each GB. That's a rather shocking difference in price.

Clearly, HP really really wants you to use their de-duplication features for backup-to-disk. Unfortunately for HP, their de-duplication technology has some serious deficiencies when presented with our environment so we can't use it for our largest backup targets.

But to answer the question I started out with, what kind of mix should we have, the answer is pretty clear. As little backup-to-disk space as we can get away with. The stuff has some real benefits, as it allows us to stage backups to disk and then copy to tape during the day. But for long term storage, tape is by far the more cost-effective storage medium. By far.

Browser usage on tech-blogs

| No Comments | No TrackBacks
Ars Technica just posted their August browser update. They also included their own browser breakdown. ArsTechnica is a techie site, so it comes as no surprise what so ever that Firefox dominates at 45% of browser-share. This made me think about my own readership.

Browser share piechart for September 09
As you can see, Firefox makes up even more of the browser-share here (50.34%). Interestingly on the low end, Opera is actually the #3 browser (4.46%), not Safari (3.43%). Looking at the version breakdown for those IE users, only 17% of them are on IE6. Yay!

ArsTechnica's Safari numbers are not at all surprising, since they cover a fair amount of Apple news and I don't.

So yeah, Tech blogs and sites don't have a lot of IE traffic. Or, so I believe. What are your numbers?

Printing habits

| 3 Comments | No TrackBacks
Some students are going to be in for a rude, rude surprise real soon. Today alone there is a student who has printed off 210 pages. Looking at their print history, they printed off 100 copies of two specific handouts (in batches of 50), and that's 40% of their entire quota for the quarter. Once they hit the ceiling, they'll have to pay to get more. This is different from last year!

We always got a few students who rammed their head against the 500 page limit within two weeks of quarter start. I'm sure we'll get some this quarter too. There may be heated tempers at the Helpdesk as a result, but thems the breaks.
As I've mentioned several times here, our data-center was designed and built in the 1999-2000 time frame. Before then, Technical Services had offices in Bond Hall up on campus. The University decided to move certain departments that had zero to very little direct student contact off of campus as a space-saving measure. Technical Service was one of those departments. As were Administrative Computing Services, Human Resources, Telecom, and Purchasing.

At that time, all of our stuff was in the Bond Hall data-center and switching room. This predates me (December 2003), so I may be wrong on some of this stuff. That's a tiny area, and the opportunity to design a brand new data-center from scratch was a delightful one for those who were here to partake of it.

At the time, our standard server was, if I've got the history right, the HP LH3. Like this:
An HP LH3
This beast is 7U's high. We were in the process of replacing them with HP ML530's, another 7U server, when the data-center move came, but I'm getting a bit ahead of myself. This means that the data-center was planned with 7U servers in mind. Not the 1-4U rack-dense servers that were very common at that time.

Because the 2U flat-panel monitor and keyboard drawers for rack-dense racks were so expensive, we decided to use plain old 15-17" CRTs and keyboard drawers in the racks themselves. These take up 14U. But that's not a problem!

A 42U rack can take 4x of those 7U servers, and one of the 14U monitor/keyboard combinations for a total of...42U! A perfect fit! The Sun side of the house had their own servers, but I don't know anything about those. With four servers per rack, we put in a Belkin 4-port PS-2 KVM switch (USB was still too new fangled in this era, our servers didn't really have USB ports in them yet) in each. As I said, a perfect fit.

And since we could plan our very own room, we planned for expansion! A big room. With lots of power overhead. And a generator. And a hot/cold aisle setup.

Unfortunately... the designers of the room decided to use a bottom-to-top venting strategy for the heat. With roof mounted rack fans.
Rack fans

And... solid back doors.

Rack back doors

We got away with this because only HAD four servers per rack, and those servers were dual processor 1GHz servers. So we only had 8 cores running in the entire rack. This thermal environment worked just fine. I think each rack probably drew no more than 2KW, if that much.

If you know anything about data-center air-flow, you know where our problems showed up when we moved to rack-dense servers in 2004-8 (and a blade rack). We've managed to get some fully vented doors in there to help encourage a more front-to-back airflow. We've also put some air-dams on top of the racks to discourage over-the-top recirculation.

And picked up blanking panels. When we had 4 monster servers per rack we didn't need blanking panels. Now that we're mostly on 1U servers, we really need blanking panels. Plus a cunning use of plexi-glass to provide a clear blanking panel for the CRTs still in the racks.

And now, we have a major, major budget crunch. We had to fight to get the fully perforated doors, and that was back when we had money. Now we don't have money, and still need to improve things. We're not baking servers right now, but temperatures are such that we can't raise the temp in the data-center very much to save on cooling costs. Doing that will require spending some money, and that's very hard right now.

Happily, rack-dense servers and ESX have allowed us to consolidate down to a lot fewer racks, where we can concentrate our good cooling design. Those are hot racks, but at least they aren't baking themselves like they would with the original kit.

We go through paper

| No Comments | No TrackBacks
Spring quarter is done, and grades are turned in. Time to take a look at student printing counts.

Things are changing! Printing costs do keep rising, and for a good discussion of realities check this article from ATUS News. The numbers in there are from Fall quarter. Now I'll give some Spring quarter numbers (departmental break down is different than the article, since I'm personally not sure which labs belong to which department):

Numbers are from 3/29/09-6/17/09 (8:17am)
Total Pages Printed: 1,840,186
ATUS Labs: 51.5%
Library Printers: 22.9%
Housing Labs: 14%
Comms Bldg: 6.8%
Average pages printed per user: 152
Median pages printed per user: 112
Number of users who printed anything: 12,119
Number of users over 500 pages: 292
Number of users over 1000 pages: 7
"Over quota" pages printed: 181,756

Students get a free quota of 500 pages each quarter. Getting more when you run out is as simple as contacting a helpdesk and getting your quota upped. As you can see, 2.4% of users managed to account for 9.9% of all pages printed.

It is a common misconception that the Student Tech Fee pays for the 500 pages each student gets. In actuality, that cost is eaten by each of the departments running a student printer, which means that printing has historically been a gratis benefit. There is a proposal going forward that would change things so you'd have to pay $.05/page for upping your quota. The top print-user of this quarter would have had to have paid a bit over $69 in additional quota if we were using the pay system. If ALL of the users who went over 500 pages had to pay for their additional quota, that would have generated a bit under $9088 in recovered printing costs.

Of course, if added print quota were a cost-item it will depress that overage activity. Another curve-ball is the desire of certain departments to offer color printing, but they really want to recapture their printing costs for those printers. So far we haven't done a good job of that, but we are in the process of revising the printing system to better handle it. It'll require some identity-management code changes to handle the quota management process differently, but it is at least doable.

Once you start involving actual money in this transaction, it opens up a whole 'nother can o worms. First off, cash-handling. Our help-desks do not want to be in the business of dealing with cash, so we'll need some kind of swipe-kiosh to handle that. Also, after-hours quota-bumps will need to be handled since our helpdesks are not 24/7 and sometimes students run out of quota at 6am during finals week and have an 8am class.

Second, accounting. Not all labs are run by ATUS. Some labs are run by departments that charge lab-fees that ostensibly also support lab printing. Some way to apportion recovered printing costs to departments will have to be created. Those departments that charge lab-fees may find it convenient to apply a portion of those fees to their student's page-quota.

Third, extra-cost printing like color. PCounter allows printers to be configured to not allow the use of 'free quota' for printing. So in theory, things like color or large-format-printers could be configured to only accept paid-for quota. These devices will have a cost higher than $.05/page, perhaps even $.25/page for color, or much higher for the LFP's. It could also mean some departmental lab-managers may decide to stop accepting free-quota in their labs and go charge-only.

Fourth, carry-over of paid quota. If a student forks over $20 for added quota, that should carry over until it is used. Also, it is very likely that they'll demand a refund of any unspent print-quota on graduation. Some refund mechanism will also need to be created.

I don't know at what step the proposal is, all I know is that it hasn't been approved yet. But it is very likely to be approved for fall.

MyWeb statistics

| No Comments | No TrackBacks
An interesting inversion has happened recently. The Fac/Staff myweb has overtaken the Student myweb in terms of data moved. Most of the student-side top-transfer stuff are mp3 files and other media files, where on the fac/staff side the top transfers seem to be big zip-files for classwork.

It would not surprise me to learn that some faculty have figured out that you can get around Copy Services by putting together PDFs of the articles they want to pass out to students. I already know that many faculty use MyWeb for distributing things to their students, not everyone loves BlackBoard.

Paper paper paper

| 1 Comment | No TrackBacks
It is spring break, so I can now run nice statistics on our pcounter printers! Not all of our labs are on pcounter, so these numbers are not the total printing at WWU. However, if plans to recover printing costs by charging for quota increases go through, I expect I should have the entire campus on pcounter within 3 quarters of go-live. Students WILL find the 'free' labs. And the lab admins will notice that their printing costs go way up. And then they'll call me.

Winter quarter
Pages printed: 1,781,272
Total number of students who printed at least one page: 12,482
Average pages printed by a student: 142.6
Median pages printed: 106
The busiest printer printed 133,791 pages
Res-halls printed 255,344 pages

That's a lot of paper.

Spam stats

| No Comments | No TrackBacks
It has been a while. Some current spam stats from our border appliance:


Processed Spam Suspected Spam Attacks Zombie Blocked Allowed Viruses Suspected Virus Worms Unscannable Malware Passworded .zip file
Content
Last 30 days
9,489,662 4,328,334 (46%) 8,895 (<> 221,926 (2%) 4,163,253 (44%) 1 (<> 10,248 (<> 1,265 (<> 1,233 (<> 2,590 (<> 26,508 (<> 0 (0%) 13 (<> 0 (0%)

That is a lot of spam. The 'Zombie' class are connections rejected due to IP reputation. Between zombies and outright spam detections, 95% of all incoming mail has been rejected over the last 30 days. Shortly after the McColo shutdown that percentage dropped a lot. We're now back to where we were before the shutdown.

Website stats

| No Comments | No TrackBacks
We purchased Urchin's web stats package before Google bought them. We're still using that creaky old software, even though its ability to interpret user agent string is diminishing. I'm not in the webmaster crew for this university, I'm just a client. But I do track the MyWeb stats through Urchin.

I also track our MyFiles (NetStorage) stats. This became more interesting the other day when I noticed that just over 40% of the bandwidth used by the Fac/Staff NetStorage server was transferred to user agents that are obviously WebDav. This is of note because WebDav clients are not javascript enabled, and thus will not register to things like Google Analytics. If I had been relying on Google Analytics for stats for NetStorage, I'd have missed the largest single agent-string.

Even though Google bought Urchin, it makes sense that they dropped the logfile-parsing technology in favor of a javascript enabled one. Google is an advertising firm with added services to attract people to their advertising, and it's hard to embed advertising in a WebDav connection. It used to be the case that RSS feeds were similar, but that's now a solved problem (as anyone who has looked at slashdot's feed knows).

In my specific case I want to know what the top pages are, which files are getting a lot of attention, as well as the usual gamut of browser/OS statistics (student myweb has a higher percentage of Mac hits, for one). One of the things I regularly look for are .MP3 files on the Student MyWeb service getting a lot of attention. For a while there the prime user-agents hitting those files were flash-based media player/browsers embedded on web pages, just the sort of thing that only logfile-parsing would catch.

One thing that the NetStorage logs give me is a good idea as to how popular the service is. Since apache logs will show the username if the user is authenticated, I can count how many unique user ID's used the service over a specific period. That can tell me how strong uptake is. I may be wrong, but I believe Google Analytics wouldn't do that.

The Urchin we have is still providing the data I need, but it's still getting stale. It's OS detection is falling apart in this era of Vista and 64-bit anything. But, it's still way better than Analytics for what I need.

Browser stats

| No Comments | No TrackBacks
Some stats of who reads this blog (roughly):
FF3

45%
IE

42%
FF2
4%
Other
9%

And operating system:

XP
63.20%
Vista
14.98%
Linux
12.86%
OS-X
5.40%
Other
2.33%
Win2003

1.23%

Firefox out numbers IE, which is no surprise considering the type of search-traffic I get. I'm a bit surprised at how little Safari I see. Also interesting is that Linux edges out Mac on the OS chart, again not that surprising considering what I talk about here.

Other Blogs

My Other Stuff

About this Archive

This page is an archive of entries from June 2010 listed from newest to oldest.

May 2010 is the previous archive.

July 2010 is the next archive.

Find recent content on the main index or look in the archives to find all content.