March 2009 Archives

When perfection is the standard

The disaster recovery infrastructure is an area where perfection is the standard, and anything less than perfection is a fault that needs fixing. It shares this distinction with other things like Air Traffic Control and sports officiating. In any area where perfection is the standard, any failure of any kind brings wincing. There are ways to manage around faults, but there really shouldn't be faults in the first place.

In ATC there are constant cross-checks and procedures to ensure that true life-safety faults only happen after a series of faults. In sports officiating, the advent of 'instant replay' rules assists officials in seeing what actually happened from angles other than the ones they saw, all as a way to improve the results. In DR, any time a backup or replication process fails, it leaves an opening through which major data-loss can possibly occur. Each of these has its unavoidable, "Oh *****," moments, which leads to frustration when they happen too often.

At my old job we had taken some paperwork steps towards documenting DR failures. We didn't have anything like a business-continuity process, but we did have tape backup. When backups failed, there was a form that needed to be filled out and filed, explaining why the fault happened and what can be done to help it not happen again. I filled out a lot of those forms.

Yeah, perfection is the standard for backups. We haven't come even remotely close to perfection for many, many months. Some of it is simple technology faults, like DataProtector and NetWare needing tweaking to talk to each other well or over-used tape drives giving up the ghost and requiring replacement. Some of it is people faults, like forgetting to change out the tapes on Friday so all the weekend fulls fail due to a lack of non-scratch media. Some of it is management process faults, like discovering the sole tape library fell off of support and no one noticed. Some of it is market-place faults, like discovering the sole tape library will be end-of-lifed by the vendor in 10 months. Some of these haven't happened yet, but they are areas that can fail.

If the stimulus fairy visits us, backup infrastructure is top of the list for spending.

Budget, conflicting info

Apparently President Shepard heard different things than I did in the news. I'm going to trust him more than us. In his words:
This morning, the Senate announced a budget that proposes a cut of $513,000,000 for public higher education. For our institution, that would be a reduction of 25% in our 2009-10 budget.
Which is much higher than the 14% I mentioned earlier today. He also confirmed the tuition cap is kept. He did not reveal if WWU is getting its student FTE count cut. As with all such things, we still need to hear from the House, and the two bills need to get reconciled.

Still, 25% is a major blow.

Budget non-rumors

The Senate Democrats had a presentation today where they went over budget items. In one of the documents they released:
  • Held tuition increases at current-law rate.
  • Cut all of higher education by 14 percent.
  • WHO’S AFFECTED: 10,500 fewer students will attend college
14% is a heck of a lot better than the 25-30% rumored earlier. The student FTE reduction will hit the UofWA harder than it will us, but each lost student is its own reduction in state funds. Of course, the House version may be very different. What we don't yet know is if there is any WWU-specific legislation in this proposal. Plus, both bills have to be reconciled before it is signed by the Governor.

Still, it is nice to see a number significantly lower than the one we'd been expecting. Here is to hoping it stays low.

Update: The President speaks.

Ars Technica has an article up called "When every student has a laptop, why run computer labs?"

It's a good question. But before I go into it, I should mention something. What I do for WWU doesn't have a lot to do with our labs. The biggest interaction I have with them is for printing and maybe some Zen or GPO policies. I also know some of the people who support them, and I sit in meetings where other people gripe about them. So I'm speaking as someone who works around people who deal with them, not as someone who deals with them or has any decision-making power.

Why run computer labs?

In the beginning it was to provide computers to students who didn't have one.
Then, it was to provide on-campus computers to students who didn't have a laptop.

Now that almost every student has a computer, and most of those are laptops, it makes less sense. Centralized printers where they can print off assignments from their own hardware? Yes. 60 seat general computing labs? Um.

The Ars Technica article makes the point that specialized software students generally wouldn't have, such as SPSS or the full Adobe Acrobat suite, is a good reason to have labs. This is true. We have not only the general computing labs run by ATUS, but we also have special purpose labs run by ATUS and the various colleges. We now have a lab that has a large format printer, something I guarantee no student has in their dorm or apartment, and a flat-bed scanner. One non-ATUS lab has VMware Workstation installed on all the workstations. Some of the general computing labs are actual classrooms some of the time.

In our specific case, we have one software package in universal use that greatly encourages the existence of the general computing lab.

The Novell Client.

In order to get drive-map access to the NetWare cluster, you need that. This is not a package you want to inflict on a home machine without the victim knowing what they're in for. So we need to provide computers with the client installed so students can get at their files simply. WebDAV through NetStorage goes some of the way, but it can be tricky to set up.

If we were a pure Windows network, it wouldn't be so bad. Both OSX and all the major Linuxes come with Samba pre-installed, which eases access to Windows networks. Printing isn't quite as convenient, but at least you can get at your files easy enough once you're inside the firewall.

In the end, except for our NCP dependencies, we could possibly close some of our GC labs to save money. However, we do track lab utilization, and those numbers may tell a different story. I know some students don't bother hauling their laptop to campus so long as they can use a lab machine for a quick social-networking fix. If we start closing labs those students will start hauling their gear to campus and we can save money. I still think we need to provide general access printers at various spots, which is something that Novell iPrint is rather good for. We also need to provide access to the special software packages that are needed for teaching, things like SPSS and MATLAB.

The role of the computer lab has changed now that all but a few students have laptops. We still need them for specialized teaching functions, but general access to computing is no longer a primary function. The convenience factor of simple internet access drives some usage, and it may even be a majority. But the labs aren't going away any time soon. Their printers, even less so.

Budget rumors

The President sent out an email updating us on the budget problems. He expects the Legislature to announce the nearly final budget by the end of the week. If I'm remembering my legislative process right, both houses have to reconcile in time to send the bill to the Governor around the 22nd of April. In the words of the email,
We have no inside information. And, for months, each day has brought a different rumor. But, the conversations the last few days that we have all been having -- faculty, staff, unions, university leadership -- are coalescing.
He made a speech recently where he talked about what he has been hearing. Those remarks are posted, and on page 6 is the good stuff. The current 'possibly real' rumor is a state-funds cut of 25-30%, which contrasts with the 20% cut (IIRC) that was in the Governor's proposed budget back in December. It also seems that the Legislature is not allowing us to exceed the 7% maximum tuition hike. What this means for us remains to be seen, and we still have to wait until official word.

Budgetary efficiency

As I've mentioned, there is a major budget shortfall coming real soon. In the past two weeks various entities have gone before the Board of Trustees discussing the effects of budget cuts on their units. The documents submitted can be found here. The Vice Provost of Information and Telecommunication, my grand-boss, also had a presentation, which you can view here. The especially nosy can even listen to the presentation here; it is the 12:45 file, and it starts about a minute-plus in.

There were some interesting bits in the presentation:
"In 2007, our central IT staff (excluding SciTech & Secretarial) totaled 73 persons. The average for our peer group (with greater than 10,000 student FTE) was 81 people. While a difference of 8 FTE may not seem great, it has a significant impact on our ability to support our users. This is compounded if we consider that student FTE grew a cumulative 16% in the past decade; faculty and staff FTE grew at a cumulative 14% while ITS staff declined 3% "
So, our supported environment grew, and we lost people. Right. Moving on...
"Similarly the budget numbers reveal the same trend. Western's 2007 operating budget for ITS was 6.65 million. The average for our peer institutions (with greater than 10,000 student FTE) was 8.17 million. Total budgets including recharge and student technology fees were 7.8 million for Western and 10.3 million for our peer group."
And we're under-resourced compared to our peer institutions. Right.
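
To put those quoted figures side by side, here is a minimal sketch that works out how far below the peer average each number sits. The numbers are the ones in the quotes above; the function name and the choice to express the gap as a percentage of the peer average are mine, purely for illustration.

```python
def shortfall_pct(ours, peers):
    """How far below the peer average we sit, as a percentage of the peer average."""
    return (peers - ours) / peers * 100

# (Western, peer average) pairs taken from the quoted presentation.
figures = {
    "central IT staff (people)": (73, 81),
    "operating budget ($M)": (6.65, 8.17),
    "total budget ($M)": (7.8, 10.3),
}

for label, (ours, peers) in figures.items():
    print(f"{label}: {shortfall_pct(ours, peers):.1f}% below peer average")
```

By this rough measure the staffing gap is the smallest of the three, and the gap widens once recharge and student technology fees are included.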

This can be spun a couple of ways. The spin being given right now, when we're being faced with a major budget cut, is that we're already running a very efficient operation, and cutting now would seriously affect provided services. A couple years ago when times were more flush, the spin was that we're under resourced compared to our peer institutions, and this is harming service robustness.

Both, as it happens, are true. We are running a very lean organization that gets a lot done for the dollars being spent on it. At the same time, the very same shoe-string attitude has overlooked certain business continuity concerns that worry those of us who will have to rebuild everything in the case of a major disaster. Like the facilities budget, we also run a 'deferred maintenance' list of things we'd like to fix now but aren't critical enough to warrant emergency spending. Since every dollar is a dear dollar, major purchases such as disk arrays or tape libraries have to last a long, long time. We still have some HP ML530 servers in service, and that is a 9-year-old server (old enough that HP lists Banyan VINES drivers for the server).

This is continually vexing to vendors who cold-call me. Even in more flush times, anything that costs more than $3000 required pushing to get, and anything that cost over $20,000 was pretty much out of the question. Storage arrays that even on academic discount cost north of $80,000 require exceptional financing and can take several years to get approved. In budget constrained times such as these, anything that costs over $1000 has to go before a budget review process.

It is continually aggravating to work in an organization as under resourced as we are. Our disaster recovery infrastructure is questionable, and business-continuity is a luxury we just plain can't afford. Two years ago there was a push for some business continuity, but it ran smack into the shoe-string. The MSA1500 that I've railed about so long was purchased as a BC device, but it is fundamentally unsuited to the task. Getting data onto it was a mish-mash of open source and hand-coded rigging. We've since abandoned that approach as it looks like 2012 may be the earliest we can afford to think about it again.

As a co-worker once ranted, "They expect enterprise level service for a small-business budget."

You'd think this would be the gold plated opening for open source software. It hasn't been. Our problem isn't so much software as it is hardware. If we can GET the hardware for business continuity, it'll probably be open source software that actually handles the data replication. Replacing Blackboard with Moodle will require new hardware, since we will have to dual-stack for two years in order to handle grade challenges for classes taught on Blackboard. Moodle would also require an additional FTE due to the amount of customizations required to make it as Blackboard-like as possible. And these are only two examples.

It was very encouraging to see that the top level of our organization (that Vice Provost) is very aware of the problem.

Paper paper paper

It is spring break, so I can now run nice statistics on our pcounter printers! Not all of our labs are on pcounter, so these numbers are not the total printing at WWU. However, if plans to recover printing costs by charging for quota increases go through, I expect I should have the entire campus on pcounter within 3 quarters of go-live. Students WILL find the 'free' labs. And the lab admins will notice that their printing costs go way up. And then they'll call me.

Winter quarter
Pages printed: 1,781,272
Total number of students who printed at least one page: 12,482
Average pages printed by a student: 142.6
Median pages printed: 106
The busiest printer printed 133,791 pages
Res-halls printed 255,344 pages

That's a lot of paper.

Storage that makes you think

Anandtech has a nice article up right now that compares SAS, SATA, and SSD drives in a database environment. Go read it. I'll wait.

While the bulk of the article is about how much the SSD drives blow the pants off of rotational magnetic media, the charts show how SAS performs versus SATA. As they said at the end of the article:
Our testing also shows that choosing the "cheaper but more SATA spindles" strategy only makes sense for applications that perform mostly sequential accesses. Once random access comes into play, you need two to three times more SATA drives - and there are limits to how far you can improve performance by adding spindles.
Which matches my experience. SATA is great for sequential loads, but is bottom of the pack when it comes to random I/O. In a real world example, take this MSA1500CS we have. It has SATA drives in it.

If you have a single disk group with 14 1TB drives in it, this gives a theoretical maximum capacity of 12.7TB (that storage industry TB vs OS TB problem again). Since you can only have LUNs as large as 2TB due to the 32-bit block addressing problem (2^32 sectors of 512 bytes comes to 2TB), this would mean this disk group would have to be carved into 7 LUNs. So how do you go about getting maximum performance from this setup?
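
The arithmetic behind those two numbers can be sketched in a few lines. This assumes the drives are sold as 10^12-byte terabytes, that the OS reports 2^40-byte terabytes, and that no capacity is lost to RAID overhead (hence "theoretical maximum"); the variable names are mine.

```python
import math

DRIVES = 14
VENDOR_TB = 10**12   # drive vendors count a terabyte as 10^12 bytes
OS_TB = 2**40        # the OS reports 2^40 bytes as a "TB"
MAX_LUN_TB = 2       # per-LUN ceiling mentioned above

raw_bytes = DRIVES * VENDOR_TB
usable_tb = raw_bytes / OS_TB                    # capacity as the OS sees it
luns_needed = math.ceil(usable_tb / MAX_LUN_TB)  # round up: a partial LUN is still a LUN

print(f"{usable_tb:.1f} TB as the OS sees it")
print(f"{luns_needed} LUNs of at most {MAX_LUN_TB} TB each")
```

Six full 2TB LUNs cover only 12TB, so the leftover 0.7TB or so is what forces the seventh LUN.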

You'll have to configure a logical volume on your server such that each LUN appends to the logical volume in order, and then make sure your I/O writes (or reads) sequentially across the logical volume. Since all 7 LUNs are on the same physical disks, any out-of-order arrangement of LUN on that spanned logical volume would result in semi-random I/O and throughput would drop. Striping the logical volume just ensures that every other access requires a significant drive-arm move, and would seriously drop throughput. It is for this reason that HP doesn't recommend using SATA drives in 'online' applications.
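
A toy model makes the difference between appending and striping easier to see. The sketch below maps a logical block number to a (LUN, offset) pair under each layout and counts how often a purely sequential read has to hop from one LUN to another. The LUN size and stripe width are made-up, tiny values chosen for illustration, and this is a rough model of the layout rather than a measurement of the MSA.

```python
LUN_COUNT = 7
LUN_BLOCKS = 1000    # hypothetical LUN size in blocks, kept tiny for illustration
STRIPE_BLOCKS = 64   # hypothetical stripe width in blocks

def concat_map(block):
    """Appended (linear) layout: fill LUN 0, then LUN 1, and so on."""
    return block // LUN_BLOCKS, block % LUN_BLOCKS

def stripe_map(block):
    """Striped layout: consecutive stripes rotate across the LUNs."""
    stripe = block // STRIPE_BLOCKS
    lun = stripe % LUN_COUNT
    offset = (stripe // LUN_COUNT) * STRIPE_BLOCKS + block % STRIPE_BLOCKS
    return lun, offset

def lun_switches(mapper, blocks):
    """Count how often a run of block reads moves from one LUN to another."""
    luns = [mapper(b)[0] for b in blocks]
    return sum(1 for a, b in zip(luns, luns[1:]) if a != b)

sequential = range(0, 2000)
print("appended:", lun_switches(concat_map, sequential))
print("striped: ", lun_switches(stripe_map, sequential))
```

Because every LUN lives on the same physical spindles, each hop between LUNs lands on a different region of the same disks. In this model the appended layout hops once over 2000 sequential blocks while the striped one hops at every stripe boundary, which is the drive-arm movement described above.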

Another thing in the article that piqued my interest is there on page 11. This is where they did a test of various data/log volume combinations between SAS and SSD. The conclusion they draw is interesting, but I want to talk about it:
Transactional logs are written in a sequential and synchronous manner. Since SAS disks are capable of delivering very respectable sequential data rates, it is not surprising that replacing the SAS "log disks" with SSDs does not boost performance at all.
This is true, to a point. If you have only one transaction log, this is very true. If you put multiple transaction logs on the same disk, though, SSD becomes the much better choice. They did not try this configuration. I would have liked to have seen a test like this one:
  • Three Data volumes running on SAS drives
  • One Log volume running on an SSD with all three database logs on it
I'm willing to bet that the performance of the above would match, if not exceed, running three separate log volumes running on SAS.

The most transactional database in my area is probably Exchange. If we were able to move the Logs to SSDs, we very possibly could improve performance of those databases significantly. I can't prove it, but I suspect we may have some performance issues in that database.

And finally, it does raise the question of file-system journals. If I were to go out and buy a high quality 16GB SSD for my work-rig, I could use that as an external journal for my SATA-based filesystems. As it is an SSD, running multiple journals on it should be no biggie. Plus, offloading the journal-writes should make the I/O on the SATA drives just a bit more sequential and should improve speeds. But would it even be perceptible? I just don't know.

Evolution and Exchange 2007

The question came up today, so I googled around. Turns out the MAPI plugin for Evolution is out there and can be installed on most OpenSUSE builds.

http://download.opensuse.org/repositories/GNOME://Evolution://mapi/

However, it requires Samba 4, presumably for the RPC interface. So if you're not willing to upgrade your Samba, then you can't use it. Still, nice to see it out there!

Stimulus fairy wish list

There is a list on my whiteboard. It is the wish list of infrastructure projects I'd really like to see if the stimulus fairy decides to pay WWU a visit. We're higher-ed. And the stimulus bill includes funding for higher-ed. It could happen! Heck, I have two coworkers who are in the Seattle area right now listening to a demo just in case aforementioned fairy does drop by.

Anyway, the wish-list (or Santa! Give me hardware!)
  • 2 more enclosures for the existing EVA6100
  • A new EVA4400 with all eight enclosures
  • HP EVA replication software, so we can mirror the 6100 to the new 4400.
  • Data Protector licensing for everything we need
  • A new tape library, the one we have is creaky
  • New 64-bit servers for the main file-serving cluster
These would make me a happy, happy geek. Said coworkers are working on something else above and beyond these. But I'm not saying what that is. If the fairy does drop by, I will.

However, and there is always one, there is a problem with air-lifting big wads of cash into an IT environment and then spending it all. When it comes time to replace the existing EVA6100, we will have to pay for something equivalent. Since it would have had the EVA replication software on it, it is now about twice as expensive as a simple hardware replacement would suggest. The replication software would quickly become line-of-business with significant future expenses deriving from its purchase.

Maintenance has to be factored in to anything we spend fairy-money on. There is a certain amount of money we can spend on catching up our IT deferred maintenance backlog, like that creaky tape library I just mentioned, but that won't come close to the fairy-money numbers being bandied about. As hard as it is to leave big piles of cash lying by the side of the road, there is some money that it is safer not to touch, such as anything with a yearly maintenance fee, or version upgrade fees for the upgrade we'll need to do in 3-6 years.

Those organizations who live on grant-money know this very well. However, here at ITS at WWU, other than Student Tech Fee funds we don't live on grant-money. The stimulus-fairy counts as grant money that could leave behind a future liability that STF can't come close to being able to cover.

We'll see what happens.

Death of cc.wwu.edu

Part of the process of moving the students over to Exchange Labs is decommissioning the cc.wwu.edu domain for email. Students have been there for a loooong time, and once upon a time faculty/staff mail was there as well. We've since moved to wwu.edu for our fac/staff domain.

Next week we're turning off cc.wwu.edu for fac/staff. The students still over there will be moved slowly over to the hosted solution. The Fac/Staff users will be moved to Exchange, period.

This has created some heated feelings as there are professors who've published books and have "@cc.wwu.edu" printed in the books. I'm not sure how we're handling that, but... that's not my email system. Email addresses do tend to get stale after a while, and that's just a fact of the internet.

However, one of the guys in the office here was one of the very first people to get an email address at cc.wwu.edu way back in the dark and misty reaches of a more trusting internet. I don't know how long he had that address, but it very well could have been over 20 years. He's letting it go with a tear in his eye, but not a big one. He's one of the unlucky schmucks with his first name as his username, and it's in every. single. solitary. mail-list known to God and man. His @cc.wwu.edu account has been nothing but a spam trap for years now.

GroupWise survey

Novell just posted on Cool Solutions a GroupWise survey they're running. It sounds like they're looking for futures for the product. As GW8 is out, I'm guessing Novell is looking to refine their GroupWise road map. So, if you're a GW user/admin, go forth and take it!

I didn't do the survey because, erm, I'm no longer a GroupWise admin. But that's no reason not to share!

But what about GroupWise

Today I picked up my dead tree version of NetworkWorld, and saw an item on the cover:

Looking to exchange Exchange?
Joel Snyder tested six alternatives to Microsoft's Exchange 2007. Our findings: the Exchange alternatives are adequate for midsized networks, but Exchange offers the most comprehensive set of features and management hooks for networks of all sizes. Page 22
(online version)

GroupWise was NOT in this test. This surprised me greatly, as the Big Three mailers have always been Exchange, Notes, and GroupWise. Notes was also left out of this test. The online version already has a few comments regarding GroupWise, and Joel Snyder replied with this:
By Joel Snyder on Tue, 03/10/2009 - 10:09am.

Sorry, Groupwise fans, but Novell just didn't show up on our radar in the mid-size email business.

When you're looking at this space, Microsoft and Lotus together own 96% of the on-site mail service in businesses (the numbers that IDC, Ferris, and Radicati offer all vary a lot, but no one seems to give the non-MS/non-Lotus camp more than 10% total for everyone). Slicing up the remaining piece is a pretty difficult task, with lots of little players. While Groupwise used to be a major mover-and-shaker, there is no obvious "#3" in this business anymore.

It's clear that we've got a pile of Groupwise fans (why am I remembering the Windows vs. Netware war of about 10 years ago???) here, so maybe we should take a quick look at Groupwise and see how it stacks up.

Considering Novell has spent quite a lot of effort trying to convince people that they're the number three behind the MS/IBM duopoly, this is somewhat concerning. I have no idea what the real market-share numbers are for the mid-size enterprise groupware market.

Anatomy of an adware install

A bit of analysis I had to do in the past couple days. I'm sharing because I don't do this all that often. I'm pretty handy with Wireshark, so I got asked to interpret a capture of an infection process.

The sequence of events, as near as I can figure:
  • User runs the bad file
  • postcard.exe checks http://whatismyip.com/autmation/n09230945.asp to get the local IP address
  • File throws them at Hallmark.com displaying the ecard. Awww.
  • Hallmark throws the user some advertising from a bunch of places.
  • 5 minutes pass where nothing happens
  • Postcard.exe does an HTTP POST to 85.12.43.102 (Netherlands) with encrypted data
  • 85.12.43.102 replies with a bunch of encrypted data. Presumably, this is the command file.
  • Postcard.exe opens three connections to 82.98.235.205 (Belgium), getting a trio of Windows files of some kind. I think they're DLL files that complement postcard.exe. That or the chopped up pieces of javawm.exe.
  • Postcard.exe does an HTTP POST to 85.17.169.56 (Netherlands) with a bunch of HTTP headers populated with encrypted data.
  • 85.17.169.56 replies with an HTTP 200/OK, a bunch of HTTP headers that contain redirection servers, stats servers, and other information useful for adware, as well as a 143KB file of some kind.
  • Infected computer connects to 83.149.75.33 (Netherlands) and does an HTTP GET with a series of parameters. This is probably a status message of some kind. Remote side returns 404-not-found.
  • 5 minutes pass where nothing happens on the network, but the local machine falls deeper into the clutches of the adware czars.
  • Someone launches IE, and it goes to http://runonce.msn.com/, the default XP home page. Probably just to see what happens.
  • HTTP connection to Key Bank, redirected to https://www.key.com/, where I can't see squat. SSL doing its job.
  • Parallel to the Key Bank connection, an SSL connection to 216.236.233.68, an IP hosted in Denmark. This resolves to "key.tcliveus.com", which is very probably legitimate traffic directed by www.key.com.
  • Connection to 83.149.115.156 (Netherlands), almost definitely the adware, phoning in that IE went to http://runonce.msn.com/. The reply directs the client to connect to 82.98.235.58. Meanwhile, the Key Bank session continues.
  • SSL Connection to 66.235.132.62, a host in the 2o7.net advertising network. Very probably legitimate from Key Bank.
  • HTTP connection to 82.98.235.58 (Netherlands), as directed. Supplies URL given to it by 83.149.115.156. Server returns the URL http://privacyscanner15.com/sysgd09_2/3/10232 (don't go there). Meanwhile, the Key Bank session continues.
  • HTTP connection to 209.249.222.48, which is privacyscanner15.com, with the supplied URL.
  • Key Bank session finishes cleanly.
  • HTTP connections to privacyscanner15.com, clearly rendering the page, pulling graphics and the evil javascripts.
  • Key Bank session resumes. SSLed, so I have no idea what's going on.
  • HTTP connection to 83.149.75.33, but I can't tell what it does because…
  • End of capture.
Ripping into the javascript with a very, very handy Firefox plugin called "JavaScript Deobfuscator", I hit the page from my Linux machine to see what those scripts did. If you click "yes", it forces the download of an executable file that contains a Trojan. I haven’t unpacked it to see what it does.

This is pretty clearly the trace of an adware installer. However, the adware points the user to a site where they'll get further infected first thing. Depending on how gullible the user is, they may or may not fall for it.

All the Netherlands addresses come from the same netblock owner, a place called “LeaseWeb”.
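
A quick way to eyeball how the remote addresses cluster is to bucket them by network prefix. The sketch below uses Python's standard ipaddress module on the addresses copied from the sequence above. The /16 bucket size is an arbitrary assumption for illustration, not a real allocation boundary, and confirming who actually owns a netblock still takes a whois lookup.

```python
import ipaddress
from collections import defaultdict

# Remote addresses copied from the sequence above (the non-bank, non-Hallmark hosts).
observed = [
    "85.12.43.102",
    "82.98.235.205",
    "85.17.169.56",
    "83.149.75.33",
    "83.149.115.156",
    "82.98.235.58",
]

PREFIX_LEN = 16  # assumption: arbitrary bucket size, not a real allocation boundary

def bucket_by_prefix(addresses, prefix_len=PREFIX_LEN):
    """Group address strings under the network that contains them."""
    buckets = defaultdict(list)
    for addr in addresses:
        # strict=False lets a host address stand in for its enclosing network.
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        buckets[str(net)].append(addr)
    return dict(buckets)

for net, members in sorted(bucket_by_prefix(observed).items()):
    print(net, members)
```

Grouping like this only shows which addresses sit near each other numerically; it says nothing on its own about geography or ownership.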

Budget crunch, the first laws

The first law relating to the Washington State budget crunch was signed last week. A synopsis provided by the Professional Services Organization here at WWU (the unit I belong to, what with me being a salaried, exempt employee) can be found here.

There are a number of things in there:
  • Pay freeze for non-classified positions (non-union) in the 2007-2011 time-frame. All things considered, I wasn't expecting to get one until 2012 anyway. At least it isn't a pay cut!
  • Formalizing the hiring freeze with a few exceptions. Academic Affairs, the group Tech Services belongs to, is part of the exemption. So if my office mate dies in a horrible car accident, we can (theoretically) replace him before 2012. However, HR positions are NOT covered.
  • A ban on equipment purchases exceeding $5000. There is an exception process requiring proof of emergency. This will be interesting to see work out, as a bunch of our stuff turns 5 years old next May, and that'll mean greatly increased support-contract costs. We'll be able to make the case that replacing the old crap with new stuff will save money in the 1-3 year range, but the costs will be over the cut-off. That'll be a fun fight.
  • Formalizing the out-of-state travel ban.
More when it comes out.