Recently in stats Category

Well, that'll be fun to watch


With Google shutting down Google Reader, which about 87% of my subscribers use to read my blog, it's going to be fun to watch how the reader percentages shift over the next four months. Back when I started tracking what consumes the feed, Google Reader wasn't around; Bloglines was the over-50% leader. That's since changed.

As of right now, the #2 reader is 'unknown' at 3.4%. Mozilla's built-in reader is in the #3 spot at 1.9%.

In four months' time, when Google shuts off Google Reader, I'm sure those numbers will be radically different. I'll probably lose a very large number of subscribers through simple inertia. Hey, that happens. I'm interested to see how the feed-reading market solidifies in a post-GReader world.

Interestingly, they're not shutting down FeedBurner. Considering that the vast majority of the readers hitting FeedBurner are, well, Google Reader, I wouldn't be surprised if it also goes in the next round of Spring Cleaning.

One of the questions that SysAdmins frequently get asked is:

I have a web application based on $Platform. It needs to support $NumUsers concurrent connections. How much server do I need?

$Platform can be anything from 'php' to 'tomcat' to the incredibly unhelpful 'linux'. $NumUsers can be anything from a reasonable number to completely unreasonable numbers representing the anticipated worst-case (or maybe that's 'best case') scenario (50,000 users! Two meeeeelion Users!).

The answer they're looking for is:

Two AWS large instances.

The answer they'll get is:

It depends on the application code.

They're laboring under the misconception that $Platform and $NumUsers are the only variables in the Grand Equation of Scaling. HAHahahahaahaha. There actually is a GEoS, but I'm getting ahead of myself.
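To make that concrete, here's a back-of-the-envelope sketch of the kind of arithmetic hiding in there. Every number in it is invented for illustration; the point is that the answer swings on per-request response time, which is a property of the application code, not of $Platform or $NumUsers.

```python
# Back-of-the-envelope capacity estimate.  Every number here is an
# assumption you'd have to measure for *your* application code --
# which is exactly why "it depends on the application code" is the answer.

concurrent_users    = 50_000   # the $NumUsers figure from the question
think_time_s        = 30.0     # seconds a user idles between requests (guess)
avg_response_time_s = 0.200    # measured per-request service time (guess)
workers_per_server  = 64       # app-server threads/processes per box (guess)

# Little's Law: request arrival rate = population / (think time + response time)
requests_per_second = concurrent_users / (think_time_s + avg_response_time_s)

# Each worker can service roughly 1 / response_time requests per second.
requests_per_worker = 1.0 / avg_response_time_s
servers_needed = requests_per_second / (requests_per_worker * workers_per_server)

print(f"~{requests_per_second:,.0f} req/s, ~{servers_needed:.1f} servers")
# Change avg_response_time_s from 200ms to 2s and the server count jumps
# by an order of magnitude.  That's the application code talking.
```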



A year of stats

2011 was a busy year. I changed jobs and became a moderator on ServerFault, both of which impacted my blogging activity. The former more than the latter, due to greatly reduced incidences of boredom. What kind of traffic did I get last year? Not a lot, but enough.

  • 40K pageviews over the year.
  • 20K separate visitors; clearly I get a lot of search traffic.
  • 1.1M feed-hits.
  • 412K feed-article views. Clearly most of my readers are using, er, readers.
  • Busiest day: January 27 (I made reddit)
  • Max feed-subscribers: 450
  • Max feed-hits on a day: 1,295

Top Content, Web

  1. The Linux Boot Process, a chart
  2. LIO-Target on OpenSUSE
  3. Powershell and ODBC
  4. Reverting LVM Snapshots
  5. Sysadmin Best Practices

Top Referring Sites

  1. Reddit (nearly all of it for the boot-process chart)
  2. Google (thank you search)
  3. Stumbleupon (more boot-process chart, but a few others as well)
  4. Planet Sysadmin (an RSS aggregator of a bunch of sysadmin blogs)
  5. ServerFault (some from questions, more from links in the chat-room. Hi guys).

Is this the big-time? Hardly. Making reddit that one time added about 40 subscribers. Leaving WWU lost me about 50 subscribers, which I've since gained back with interest for various reasons.

This does represent growth over 2010, which I'm quite OK with. Blogging for a for-profit company with secret sauce to protect, secret sauce I'm also working on, has constrained what I can talk about here, which is why ServerFault content has been more prominent of late. However, 2012 is the year when some of what I'm working on will be let out the door to soar (or crater), so hopefully I'll be able to talk more about that stuff once it's out.

Meta-commentary on sysadminly things in general will continue, though. 

Happy New Year!

Changing student storage habits

I had to do some maintenance on my script that gathers disk-space usage, so the stats database has been on my mind lately. It's been a while since I posted any graphs. This particular graph is a unified chart of the student home-directory volumes over time. I merged the NetWare and Windows volumes into a single space-used chart.

[Chart: stu-vols-2011.png, combined student home-directory space used over time]
This is a very noisy chart. The discontinuities are mostly the student-account purge events that happen once a quarter; the fall purge is by far the largest.

Note the downward tail at the end! The same chart for staff is a smooth line heading straight up at a pretty steady slope. This? Clearly usage habits are changing. I don't know whether it reflects habitual USB-drive use or storing files in the cloud in some way, but student-driven storage demand (at least for home directories) is clearly falling.

One area where it is clearly increasing is the Blackboard Content volume.
[Chart: bbcontent-2011.png, Blackboard Content volume space used over time]
This data is noisy in that we purge old courses, but we've also changed how many quarters of courses we keep in the system. Looking at this growth chart, it's pretty clear to me that the downtick in student home-directory and class-volume consumption is made up for by increased Blackboard usage. Each quarter more professors sign on, other professors increase their usage, and the average size of the files being passed into the system increases.

The ebb and flow of student life

Looking at our bandwidth chart for the last day, I can really tell it's finals week.
[Chart: Student-Ebb-Flow.png, campus bandwidth over the last day]
See that trail-off around 1am last night (Sunday)? That normally starts earlier and bottoms out faster when students aren't up doing finals-related things. I know from watching overnight printing-activity reports that for the first part of finals week, our printing activity between 1:30am and 6am is markedly higher than at any other time during the quarter. Bandwidth usage also increases during this time as they take YouTube breaks and whatnot whilst typing madly. By Friday the chart should be a lot flatter, as students who don't have end-of-week finals uproot and leave for home mid-week.

Back on printing, our morning peak starts earlier during finals. During most of the quarter, usage starts rising about 6:30am and doesn't really get going until 7:30-8:00. This time of the quarter, the rise starts at 6am and is a lot busier a lot earlier. The steady drumbeat of mid-terms means there are usually a couple of people pulling all-nighters starting roughly the third week of classes, but finals really focus everyone.

Usage statistics

There is a not-at-all-surprising disconnect between what Google Analytics reports for this blog and what logfile analysis reports. In light of the FTC's push for an "opt out" button for tracking, I'm guessing the JavaScript method of website tracking is going to become less effective.

Operating system:

                       Google Analytics    Log analysis
  Windows              66.7%               58.7%
  Linux                23.8%               22.9%
  Mac                  9.2%                3.7%
  Other                --                  14.6%

Interestingly, log analysis also breaks down the OS versions in use. I'm happy to note that the large majority of the Linux users are on SUSE variants. XP users still outnumber Vista/Win7 users.

Browser:

                       Google Analytics    Log analysis
  Firefox              37.3%               44.3%
  Internet Explorer    22.22%              21.9%
  Chrome               29.37%              9.8%
  Opera                3.17%               6.7%
  Safari               4.76%               1.6%
  Other/unknown        3.18%               15.7%

The other/unknown bucket is likely the log-analysis engine's inability to figure out some agent strings; at a guess, it's really under-reporting all the Chrome users out there. Even so, there are significant differences between the two. To me this looks like Firefox users are much more likely to be running NoScript.
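To illustrate that guess: log analyzers bucket hits by matching substrings in the User-Agent header, and Chrome's UA string also contains "Safari", so an engine with an out-of-date match list (or one that tests in the wrong order) can easily misfile Chrome. Here's a toy version of that matching, my own illustration rather than whatever the log-analysis engine actually does:

```python
# Minimal sketch of User-Agent bucketing.  Order matters: Chrome's UA
# string also advertises "Safari", so the Chrome test has to come first
# or every Chrome hit gets filed under Safari (or other/unknown).

def classify(user_agent: str) -> str:
    ua = user_agent.lower()
    if "opera" in ua or "opr/" in ua:
        return "Opera"
    if "msie" in ua or "trident" in ua:
        return "Internet Explorer"
    if "chrome" in ua:              # must be tested before "safari"
        return "Chrome"
    if "safari" in ua:
        return "Safari"
    if "firefox" in ua:
        return "Firefox"
    return "Other/unknown"

# A (trimmed) Chrome UA of the era:
print(classify("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.3 "
               "(KHTML, like Gecko) Chrome/6.0 Safari/534.3"))  # -> Chrome
```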

And finally, once browsers start scrambling the User-Agent string, even that won't be useful for this kind of tracking.

We go through paper

We're a university, and you'd expect that in this modern era of iPads replacing textbooks and suchlike, our paper costs would be going down. You'd be wrong. We go through a heck of a lot of paper in a quarter. Spring quarter earlier this year generated 1,899,865 pages of printing, which is actually a bit up from what we did last year. Ouch.

For a nice visual clue to what we go through in a day, here is Monday of this week:
[Chart: pages-per-hour for Monday, September 27, 2010]
41,375 pages is the total for Monday, which is also our heaviest printing day. That spike you see between 11am and noon is a regular feature; we've had the 11am printing peak for years. There is a smaller spike between 1 and 2pm. This time of quarter we don't have any printing going on at 5am, though the closer we get to Finals Week the more dark-of-night printing goes on.


Times change, alas

Right now we're giving serious consideration to using folder mount-points in Windows to solve a specific storage problem. The one thing that makes me go, "oh, please, no," is the fact that the disk-space monitoring script I've been using for years, the one that also monitors NetWare, Windows, and ESX, can't handle folder-mounts. Why? Because the Windows SNMP agent doesn't give any information about folder-mounts, just drive-letter mounts.

SNMP was very nice since I didn't have to use Windows to get the information I needed. However, Microsoft hasn't really been paying attention to SNMP in recent versions, so I'm not at all surprised that this hasn't been put in place. Or if it has, it's in a MIB I don't know about.
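For the curious, the query in question is a walk of the standard HOST-RESOURCES-MIB hrStorage table. A rough sketch of that kind of poll is below; it's not my actual script, and the host name and community string are placeholders:

```python
# Rough sketch: pull per-volume disk usage from a Windows server over
# SNMP by walking the HOST-RESOURCES-MIB hrStorage table with the
# net-snmp snmpwalk CLI.  The limitation described above shows up right
# here: on Windows, this table only contains drive-letter volumes (plus
# memory rows), so folder mount-points never appear at all.
import subprocess

HOST = "fileserver.example.edu"       # placeholder host
COMMUNITY = "public"                  # placeholder community string
HR_STORAGE = "1.3.6.1.2.1.25.2.3.1"   # hrStorageEntry
COLUMNS = {"descr": "3", "alloc_units": "4", "size": "5", "used": "6"}

def walk(column):
    """Return {row_index: value} for one hrStorage column."""
    oid = f"{HR_STORAGE}.{COLUMNS[column]}"
    out = subprocess.check_output(
        ["snmpwalk", "-v", "2c", "-c", COMMUNITY, "-Oqn", HOST, oid],
        text=True)
    rows = {}
    for line in out.splitlines():
        full_oid, _, value = line.partition(" ")
        rows[full_oid.rsplit(".", 1)[-1]] = value.strip().strip('"')
    return rows

descr, units, size, used = (walk(c) for c in COLUMNS)
for idx in descr:
    # hrStorageSize/Used are counted in allocation units; convert to GB.
    gb_used = int(used[idx]) * int(units[idx]) / 1024**3
    gb_size = int(size[idx]) * int(units[idx]) / 1024**3
    print(f"{descr[idx]:<40} {gb_used:10.1f} GB used of {gb_size:10.1f} GB")
```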

I suspect I'll have to carve my script up in twain, into Windows and non-Windows variants. That way I can continue to keep data in this particular database (with data that goes back to 2004!).

But still, the core engineering of this guy was done back in 2001, with efforts later on to shim in Windows and ESX support. I looked into Linux a couple of years ago and determined that I could add support for it pretty simply, but never did, as we didn't have a call for it yet. Nine years is a long life for a script like this. I suppose it's time.

Or maybe we can not use folder-mounts.

The costs of backup upgrades

Our tape library is showing its years, and it's time to start moving the mountain required to get it replaced with something. So this afternoon I spent some quality time with Google, a spreadsheet, and some oldish quotes from HP. The question I was trying to answer: what's the optimal mix of backup-to-tape and backup-to-disk using HP Data Protector? The results were astounding.

Data Protector licenses backup-to-disk capacity by the amount of space consumed in the B2D directories. If you have 15TB parked in your backup-to-disk archives, you pay for 15TB of space.

Data Protector has a few licenses for tape libraries. There is a cost for each tape drive beyond the second, another license for libraries with 61-250 slots, and another license for unlimited slots. There is no license for fibre-attached libraries, the way BackupExec and others have.

Data Protector does not license per backed up host, which is theoretically a cost savings.

When all is said and done, DP costs about $1.50 per GB in your backup-to-disk directories. In our case the price is a bit different since we've sunk some of those costs already, but it's pretty close to a buck fiddy per GB for Data Protector licensing alone. I haven't even gotten to physical storage costs yet; this is just licensing.

Going with an HP tape library (easy for me to spec, which is why I put it into the estimates), we can get an LTO-4-based tape library that should meet our storage growth needs for the next 5 years. After adding in the needed DP licenses, the total cost per GB (uncompressed, mind) is on the order of $0.10. Holy buckets!

Calming down some: taking our current backup volume and apportioning the price of the largest tape library I estimated over that volume, the price rises to $1.01/GB. Which means that as we grow our storage, the price per GB drops, as less of the infrastructure is apportioned to each GB. That's a rather shocking difference in price.
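The arithmetic behind those numbers is simple apportionment. Here's a sketch with invented figures; only the roughly $1.50/GB backup-to-disk licensing number comes from the quotes above, and the rest is made up just so the output lands in the same ballpark:

```python
# Hypothetical apportionment math.  Only the ~$1.50/GB Data Protector
# backup-to-disk licensing figure comes from the post; the library,
# license, and volume numbers below are invented for illustration.

b2d_license_per_gb = 1.50          # DP backup-to-disk licensing (from the post)

library_capex    = 55_000.0        # tape library + LTO-4 drives (guess)
dp_tape_licenses = 25_000.0        # DP drive/slot licenses for the library (guess)
lto4_native_gb   = 800             # native (uncompressed) capacity per LTO-4 tape
slots            = 1_000           # tapes the library holds (guess)

library_capacity_gb = lto4_native_gb * slots
fixed_costs = library_capex + dp_tape_licenses

current_backup_gb = 80_000         # today's backup volume (guess)

print(f"B2D licensing alone:          ${b2d_license_per_gb:.2f}/GB")
print(f"Tape, library fully used:     ${fixed_costs / library_capacity_gb:.2f}/GB")
print(f"Tape, at current volume only: ${fixed_costs / current_backup_gb:.2f}/GB")
# As the backup volume grows toward the library's capacity, the fixed
# costs get spread over more GB and the per-GB figure keeps falling.
```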

Clearly, HP really really wants you to use their de-duplication features for backup-to-disk. Unfortunately for HP, their de-duplication technology has some serious deficiencies when presented with our environment so we can't use it for our largest backup targets.

But to answer the question I started out with (what kind of mix should we have?), the answer is pretty clear: as little backup-to-disk space as we can get away with. The stuff has some real benefits, as it allows us to stage backups to disk and then copy them to tape during the day. But for long-term storage, tape is by far the more cost-effective medium. By far.

Browser usage on tech-blogs

Ars Technica just posted their August browser update, and they included their own browser breakdown. Ars Technica is a techie site, so it comes as no surprise whatsoever that Firefox dominates at 45% of browser share. This made me think about my own readership.

[Chart: browser share pie chart for this blog, September 2009]
As you can see, Firefox makes up even more of the browser share here (50.34%). Interestingly, on the low end, Opera is actually the #3 browser (4.46%), not Safari (3.43%). Looking at the version breakdown for those IE users, only 17% of them are on IE6. Yay!

ArsTechnica's Safari numbers are not at all surprising, since they cover a fair amount of Apple news and I don't.

So yeah, tech blogs and sites don't have a lot of IE traffic. Or so I believe. What are your numbers?