Thursday, February 11, 2010

Spending money

Today we spent more money in one day than I've ever seen done here. Why? Well-substantiated rumor had it that the Governor had a spending freeze directive on her desk. Unlike last year's freeze, this one would be the sort passed down during the 2001-02 recession; nothing gets spent without OFM approval. Veterans of that era noted that such approval took a really long time, and only sometimes came. Office scuttlebutt was mum on whether or not consumable purchases like backup tapes would be covered.

We cut purchase orders today and rushed them through Purchasing. A Purchasing department that was immensely snowed under, as can well be expected. I think the final signatures get signed tomorrow.

What are we getting? Three big things:
  1. A new LTO4 tape library. I try not to gush lovingly at the thought, but keep in mind I've been dealing with SDLT320 and old tapes. I'm trying not to let all that space go to my head. 2 drives, 40-50 slots, fibre attached. Made of love. No gushing, no gushing...
  2. Fast, cheap storage. Our EVA6100 is just too expensive to keep feeding. So we're getting 8TB of 15K fast storage. We needs it, precious.
  3. Really cheap storage. Since the storage area networking options all came in above our stated price-point, we're ending up with direct-attached. Depending on how we slice it, between 30-35TB of it. Probably software iSCSI and all the faults inherent in that setup. We still need to dicker over software.
But... that's all we're getting for the next 15 months at least. Now when vendors cold call me I can say quite truthfully, "No money, talk to me in July 2011."

The last thing we have is an email archiving system. We already know what we want, but we're waiting on a determination of whether or not we can spend that already-earmarked money.

Unfortunately, I'll be finding out a week from Monday. I'll be out of the office all next week. Bad timing for it, but can't be avoided.

Labels: , , ,


Wednesday, February 03, 2010

Free information

Charles Stross had a nice piece this morning about that long-time hacker slogan, "Information wants to be free". It's a good read, so I'll wait while you go read it. It focuses on the different definitions of free. One means, "no cost," like those real-estate fliers you see at the grocery store. The other means, "free to move," like Amazon MP3 Store mp3 files. Different, see.

Part of his point is that it is one thing to enable information to be free, and quite another to create free information. Information creation is the ultimate validation of this credo. In his case, he can work with his publishers to release novels in a non-DRMed format; something he has done once and will do again soon.

But he closes with a question:
What have you created and released lately?
That's a very good question. The quick answer to that is this blog. My experiences wrestling with technology have proven useful to others. The search keywords that drive people here have evolved over time, but they give a nice snapshot of what issues people are having and looking for answers about. For a long time that was news about the Novell client for Vista. Right this moment the top trending keywords all include two of the following terms: 'cifs', 'Windows 7', 'NetWare', and 'OES', which strongly suggests people are looking for how to connect Vista/Win7 to NetWare/OES. Comments I've received have also proven that what I've posted here has been useful to others.

But what about beyond that? I've written a couple of AppNotes for Novell over the years, covering topics that the NetWare-using community didn't have adequate coverage of. Novell has always had a stake in 'community', which fosters this sort of information sharing.

I've also been active on ServerFault, a sort of peer-support community for system administrators. I don't get as good data about how my contributions there are being used, but I do still get comments on accepted answers months after their original posting. I'm in the top 25 for reputation there, so that's something.

It doesn't look like a lot, but it is free information out there. In both senses of the word.

Labels:


Tuesday, February 02, 2010

Budget plans

Washington State has a $2.6 billion deficit for this year. In fact, the finance people point out that if something isn't done, the WA treasury will run dry sometime in September and we'll have to rely on short-term loans. As this is not good, the Legislature is attempting to come up with some way to fill the hole.

As far as WWU is concerned, we know we'll be passed some kind of cut. We don't know the size, nor do we know what other strings may be attached to the money we do get. So we're planning for various sizes of cuts.

One thing that is definitely getting bandied about is the idea of 'sweeping' unused funds at end-of-year in order to reduce the deficits. As anyone who has ever worked in a department subject to a budget knows, the idea of having your money taken away from you for being good with your money runs counter to every bureaucratic instinct. I have yet to meet the IT department that considers itself fully funded. My old job swept funds like that; our fiscal year ended 12/15, which meant that we bought a lot of stuff in October and November with the funds we'd otherwise have to give back (a.k.a. "Christmas in October"). Since WWU's fiscal year starts 7/1, this means that April and May will become 'use it or lose it' time.

Sweeping funds is a great way to reduce fiscal efficiency.

In the end, what this means is that the money tree is actually producing at the moment. We have a couple of crying needs that may actually get addressed this year. It's enough to completely fix our backup environment, OR do some other things. We still have to dicker over what exactly we'll fix. The backup environment needs to be made better at least somewhat, that much I know. We have a raft of servers that fall off of cheap maintenance in May (i.e. they turn 5). We have a need for storage that costs under $5/GB but is still fast enough for 'online' storage (i.e. not SATA). As always, the needs are many, and the resources few.

At least we HAVE resources at the moment. It's a bad sign when you have to commiserate with your end-users over not being able to do cool stuff, or tell researchers they can't do that particular research since we have nowhere to store their data. Baaaaaad. We haven't quite gotten there yet, but we can see it from where we are.

Labels: , , ,


Thursday, January 28, 2010

Evolving best-practice

As of this morning, everyone's home directory is now on the Microsoft cluster. The next Herculean task is to sort out the shared volume. And this, this is the point where past-practice runs smack into both best-practice and common-practice.

You see, since we've been a NetWare shop since, uh, I don't know when, we have certain habits ingrained into our thinking. I've already commented on some of it, but that thinking will haunt us for some time to come.

The first item I've touched on already: how you set permissions at the top of a share/volume. In the Land of NetWare, practically no one has any rights to the very top level of the volume. This runs contrary to both the Microsoft and POSIX/Unix ways of doing it, since both environments require a user to have at least read rights to that top level for anything to work at all. NetWare got around this problem by creating traverse rights based on rights granted lower down the directory structure. Therefore, granting a right 4 directories deep gave an implicit 'read' to the top of the volume. Microsoft and POSIX both don't do this weirdo 'implicit' thing.

The second item is the fact that Microsoft Windows allows you to declare a share pretty much anywhere, while NetWare limited the 'share' to being the volume. This changed a bit when Novell introduced CIFS to NetWare, as they introduced the ability to declare a share anywhere; NCP networking, however, still required root-of-volume only. At the same time, Novell also allowed a 'map root' to pretend there is a share anywhere, but it isn't conceptually the same. The side-effect of being able to declare a share anywhere is that, if you're not careful, Windows networks suffer share-proliferation to a very great extent.

In our case, past-practice has been to restrict who gets access to top-level directories, greatly limit who can create top-level directories, and generally grow more permissive/specific rights-wise the deeper you get in a directory tree. Top level is zilch, first tier of directories is probably read-only, second tier is read/write. Also, we have one (1) shared volume upon which everyone resides for ease of sharing.

Now, common-practice among Microsoft networks is something I'm not that familiar with. What I do know is that shares proliferate, and many, perhaps most, networks have the shares as the logical equivalent of what we use top-level directories for. Where we may have a structure like this, \\cluster-facshare\facshare\HumRes, Microsoft networks tend to develop structures like \\cluster-facshare\humres instead. Microsoft networks rely a lot on browsing to find resources. It is common for people to browse to \\cluster-facshare\ and look at the list of shares to get what they want. We don't do that.
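For illustration, this is the kind of one-liner that makes share-proliferation so easy on the Windows side. A hedged sketch only; the path, share name, and group are made up:

  # Any directory can become its own share, with its own share-level ACL, in one line.
  net share "HumRes=F:\FacShare\HumRes" "/grant:UNIV\HumRes-Staff,CHANGE"

Do that a few hundred times over a decade and you end up with the browse-to-the-share style of network described above.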

One thing that really gets in the way of this model is Apple OSX. You see, the Samba version on OSX machines can't browse cluster-shares. If we had 'real' servers instead of virtual servers this sort of browse-to-the-resource trick would work. But since we have a non-trivial number of Macs all over the place, we have to pay attention to the fact that all a Mac sees when it browses to \\cluster-facshare\ is a whole lot of nothing. We're already running into this, and we only have our user directories migrated so far. We have to train our Mac users to enter the full share path as well. For this reason, we really need to stick to the top-level-directory model as much as possible, instead of the more commonly encountered MS model of shares. Maybe a future Mac Samba version will fix this. But 10.6 hasn't fixed it, so we're stuck for another year or two. Or maybe until Apple shoves Samba 4 into OSX.

Since we're on a fundamentally new architecture and can't use common-practice, our sense of best-practice is still evolving. We come up with ideas. We're trying them out. Time will tell just how far our heads are up our butts, since we can't tell from here just yet. So far we're making extensive use of advanced NTFS permissions (those permissions beyond just read, modify, and full-control) in order to do what we need to do. Since this is a deviation from how the Windows industry does things, it is pretty easy for someone who is not completely familiar with how we do things to mess things up out of ignorance. We're doing it this way due to past-practice and all those Macs.
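To make that concrete, here is a hedged sketch of the layered model in PowerShell. The volume paths and group names are hypothetical, and the production ACLs are considerably more involved:

  # Top of the share: just enough rights to traverse and list, applied to
  # this folder only so nothing inherits downward.
  $topRule = New-Object System.Security.AccessControl.FileSystemAccessRule `
      -ArgumentList "UNIV\All-Staff", "Traverse, ListDirectory, ReadAttributes", "None", "None", "Allow"
  $acl = Get-Acl "F:\FacShare"
  $acl.AddAccessRule($topRule)
  Set-Acl "F:\FacShare" $acl

  # Second tier and below: real read/write, inherited by subfolders and files.
  $deepRule = New-Object System.Security.AccessControl.FileSystemAccessRule `
      -ArgumentList "UNIV\HumRes-Staff", "Modify", "ContainerInherit, ObjectInherit", "None", "Allow"
  $acl = Get-Acl "F:\FacShare\HumRes\Shared"
  $acl.AddAccessRule($deepRule)
  Set-Acl "F:\FacShare\HumRes\Shared" $acl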

In 10 years I'm pretty sure we'll look a lot more like a classic Windows network than we do now. 10 years is long enough for even end-users to change how they think, and is long enough for industry-practice to erode our sense of specialness more into a compliant shape.

In the meantime, as the phone ringing off the hook today foretold, there is a LOT of learning, decision-making, and mind-changing to go through.

Labels: , ,


Monday, January 25, 2010

Storage tiers

Events have pushed us to give a serious look at cheaper storage solutions. What's got our attention most recently is HP's new LeftHand line. That's some nice-looking kit, there. But one exchange really demonstrated how the storage market has changed in the last two years:

HP: What kind of disk are you thinking of?
US: Oh, probably mid tier. 10K SAS would be good enough.
HP: Well, SAS only comes in 15K, and the next option down is 7.2K SATA. And really, the entire storage market is moving to SAS.

Note the lack of Fibre Channel drives. Those, it seems, are being deprecated. Two years ago the storage tier looked like this:
  1. SATA
  2. SAS/SCSI
  3. FC
Now the top end has been replaced:
  1. SATA
  2. SAS
  3. SSD
We don't have anything that requires SSD-levels of performance. Our VMWare stack could run quite happily on sufficient SAS drives.

Back in 2003 when we bought that EVA3000 for the new 6 node NetWare cluster, clustering required shared storage. In 2003, shared storage meant one of two things:
  1. SCSI and SCSI disks, if using 2 nodes.
  2. Fibre Channel and FC Disks if using more than 2 nodes.
With 6 nodes in the cluster, Fibre Channel was our only choice. So that's what we have. Here we are 6+ years later, and our I/O loads are very much mid-tier. We don't need HPC-level I/O ops. CPU on our EVA controllers rarely goes above 20%. Our I/O is significantly randomized, so SATA is no good. But we need a lot of it, so SSDs become prohibitive. Therefore SAS is what we should be using if we buy new.

Now if only we had some LTO drives to back it all up.

Labels: ,


Monday, January 04, 2010

On living as Root

Yesterday on Slashdot, one of the Ask Slashdot questions was: " In your experience, do IT administrators abuse their supervisory powers?"

That's a good question. BOFH humor aside, it has been my experience that the large majority of us don't do so intentionally. Most of what happens is the petty stuff that even regular helpdesk staff do, like taking home enterprise license keys. We shouldn't do that, and licensing technology is improving to the point where such pilferage is becoming a lot easier to detect; at some point Microsoft will blacklist some large org's enterprise key for having been pirated, and woe unto the IT department that lets that happen.

But what about IT administrators?

First, IT Administrators come in many types. But I'll focus on my own experiences living with enhanced privs. As it happens, I've spent the large majority of my IT career with a user account with better than average privs.

File Access

I can see everything! One of the harder things to keep in mind is what files I can see as me, and what files I can see in my role as sysadmin. This can be hard, especially when I'm rooting about out of curiosity. We still add my user to groups even though I can see everything, and I consciously limit myself to only those directories those groups have access to when privately data-mining. You want this. This is one of the hardest things for a new sysadmin to get used to.

With my rights it is very easy for me to pry into HIPAA-protected documents, confidential HR documents, labor-relations negotiation documents, and all sorts of data. I don't go there unless directed to as part of the normal execution of my duties, such as setting access controls, troubleshooting inaccessible files, and restoring data.

I haven't met any sysadmins who routinely spelunk into areas they're not supposed to. They are out there, sadly enough, but it isn't a majority by any stretch.

Email Access

I read your email, but only as part of my duties. Back when we were deploying the Symantec Brightmail anti-spam appliances I read a lot of mail tagged as 'suspect'. I mean, a lot of it. It took a while to tune the settings. Even just subject-lines can be damning. For instance, the regular mails from Victoria's Secret were getting flagged as 'suspect', so anyone who ordered from them and used their work account as the email account was visible to me. A BOFH would look for the male names, print out the emails, and post them on the office bulletin board for general mockery. Me? I successfully forgot who got what.

One gray area is the 'plain view' problem. If I'm asked to set or troubleshoot Outlook delegates on a specific mailbox, I have to open their mailbox. During that time certain emails are in plain view as I navigate to the menu options I need to go to in order to deal with delegates. Some of those emails can be embarrassing, or downright damning. So far I don't officially notice those mails. Very happily, I've yet to run into anything outright illegal.

Another area that has me looking for specific emails is phishing. If we identify a phishing campaign, the Exchange logs are very good at identifying the people who responded to it. I then take that list and look for specific emails in specific mailboxes to see what exactly the response was. While this also has the plain-view problem described above, it does allow us to identify people who gave legitimate password info, and those replying with derision and scorn (a blessed majority). Those that reply with legitimate login info get Noticed.
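The "identifying who responded" piece is mostly message-tracking-log work. A hedged sketch, assuming an Exchange 2007-style management shell and a made-up reply-to address for the phishing campaign:

  # List internal senders who mailed the campaign's reply-to address recently.
  Get-MessageTrackingLog -Recipients "replies@phish.example.net" `
      -Start (Get-Date).AddDays(-2) -ResultSize Unlimited |
      Select-Object Timestamp, Sender

The mailbox-by-mailbox check of what was actually said comes after that list exists, and that's where the plain-view problem kicks in.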

Internet Monitoring

This varies a LOT from organization to organization. WWU doesn't restrict internet access except in a few cases (no outbound SMTP, no outbound SMB), so we're not a good example. My old job was putting into place internet blockers and an explicit login before access to the internet was granted, which allowed very detailed logs to be kept on who went where. As it happened, IT was not the most privileged group; that honor was held by the Attorney's office.

While IT was restricted, I knew the firewall guys. They worked two cubes down. So if I needed to access something blocked, I could walk down the hall and talk to them. I'd have to provide justification, of course, but it'd generally get granted. The fact that I was one of the people involved with Information Security, and had helped them build the filters, unavoidably helped in this.

But the Slashdot questioner does make a good point. IT-related sites do generally get let through the filters. I strongly suspect this is because the IT users are very close to the managers setting filtering policy, and so are able to make the convincing "but these sites are very useful in support of my job" arguments. Sites such as ServerFault and StackOverflow are very useful for solving problems without expensive vendor contracts. Sites supporting the function of non-IT departments are not so lucky.

Whether or not the grand high IT Admins get unrestricted access to the internet depends a LOT on the organization in question. My old place was good about that.

Firewall Exceptions

This is much more of a .EDU thing, since we're vastly more likely to have a routable IPv4 address on our workstations than non-educational employers. In smaller organizations where your server guys are the same guys who run the network, the good-ole-boys network comes into play and exceptions are much more common. For larger orgs like ours that have server-admin and network-admin split out, it depends on how buddy-buddy the two are.

This is one area where privilege hath its perks.

As it happens, I do have the SSH port on my workstation accessible from the internet. The firewall guys let me have that exception because I also defend servers with that exception, and therefore I know what I'm doing. Also, it allows me into our network in case either VPN server is on the fritz. And considering that I manage one of the VPN servers, having a back-door in is useful.

Other areas

Until a couple weeks ago the MyWeb service this blog used to be served from was managed by me. Which meant I got to monitor the log files for obvious signs of abuse. Generally, if something didn't break into the top 10 most-accessed files, I officially didn't notice. If a specific file broke 25% of total traffic, I had to take notice. Sometimes those files were obviously fine files (home-shot video, pre-YouTube), others (MP3 archives, DivX movies) were not so innocent.

One day the user in question was a fellow IT admin. This was also the first time I saw staff doing this, so the protocols were non-existent. What I did was print off the report in question, circle the #1 spot they occupied, and write a note that said, in brief:
If this had been a student, the Provost would have been notified and their accounts suspended. The next time I'll have to officially notice.
And then put it on their chair. It never happened again.

Another area is enterprise software licenses. I mentioned that at the top of this post, but as more and more software gets repackaged for point-n-click distribution, fewer and fewer IT staff need to know these impossible-to-memorize numbers. Also helping this trend is the move towards online license servers, where packages (I'm thinking SPSS right now) need to be told what the license server is before they'll function; you can't take something like that home with you.

Things like Vista or Windows 7 activation codes are another story, but Microsoft has better tracking than they did in the XP days. If you activate our code on your home network, Microsoft will notice. The point at which they'll take action is not known, but when they do, all the Vista and Windows 7 machines we have will start throwing the "your activation code has been reported as stolen, please buy a new one" message.

The software industry as a whole is reconfiguring away from offline activation keys, so this 'perk' of IT work is already going by the wayside. Yes, taking them home has always been illegal in the absence of a specific agreement allowing it. And yet, many offices had an unofficial way of not noticing that IT staff were re-using their MS Office codes at home.

That got long. But then, there are a lot of facets to professional ethics and privilege in the IT Admin space.

Labels:


Monday, December 21, 2009

How I got into this in the first place

How did you get into sysadmin stuff?

The flip answer is, "A case of mono in 1998."

The full answer is that I intended to get into system administration right out of college. When I made the decision not to pursue graduate school, I chose to join the real world. There were several reasons for this, chief among them being that the field I was interested in involves a lot of math, and I was strictly a C student there. As for what I'd do in the real world, well... by this time I had learned something about myself, something taught ably by the CompSci program I was getting around to finishing.

Broken code makes me angry. Working on broken code all day makes me angry all around. Since programming involves working on code that is by definition always broken, it didn't seem like the right career for me.

I realized this in early 1996, at a time when I had friends who had skipped college altogether to work in internet startups, get paid in stock options, and otherwise make a lot of money. I didn't see much of them except online (those wacky startup death-marches). That wasn't a gravy train I could get on and survive sane. So, no programming career for me. SysAdmin it was!

I paid for my own Certified Novell Administrator (NW4.10 IIRC) that September while I was working temp jobs. One of the temp jobs went permanent in January of 1997, and I was hired at the bottom rung: Help Desk.

This wasn't all bad, as it happened. Our helpdesk had all of 4 people on it at the time, one dispatcher who half-timed with Solaris and Prime admin work, and three technicians. We pretty much did it all. Two of 'em handled server side stuff (NetWare exclusively) when server stuff needed handling, and all three of us dealt with desktop stuff.

Then I got mono in the summer of 1998. I was out for a week. When I came back, my boss didn't believe I was up for the full desktop rotation and grounded me to my desk to update documentation. Specifically, update our Windows 95 installation guides. What was supposed to take a week took about 6 hours. Then I was bored.

And there was this NetWare 3.11 to NetWare 4.11 upgrade project that had been languishing unloved due to lack of time from the three of us. And here I was, desk-bound and bored. So I dug into it. By Thursday I had a full migration procedure mapped out, from the server side to the things that needed doing on the desktop. We did the first migration that August, and it worked pretty much like I documented. The rest of the NW3.x to NW4.11 migrations went as easily.

From there it was a slam-dunk that I'd get into NetWare sysadmin work. I got into Windows admin that December while I was attending my Windows NT Administration classes. On Monday of Week 2 (the advanced admin class, if I remember right) I got a call from my boss telling me that the current NT administrator had given 2 weeks' notice and announced he was going on 2 weeks of vacation, and I'd be the new NT guy when I got back from class.

In his defense, he was a Solaris guy from way back and was actively running Linux at home and other places. He had, "I don't do Windows," in his cube for a while before management tapped him to become the NT-guy. When I got his servers after he left I found the Cygwin stack, circa 1998, on all of them. He had his preferences. And he left to do Solaris admin Somewhere Else. He really didn't want to do Windows.

So within 8 months of getting a fortuitous case of mononucleosis, I was a bona-fide sysadmin for two operating systems. Sometimes life works that way.

Labels: ,


Thursday, December 10, 2009

Old hardware

Watching traffic on the opensuse-factory mailing list has brought home one of the maxims of Linuxdom that has been true for over a decade: People run Linux on some really old crap. And really, it makes sense. How much hardware do you really need for a router/firewall between your home network and the internet? Shoving packets is not a high-test application if you only have two interfaces. Death and fundamental hardware speed-limits are what kills these beasts off.

This is one feature that Linux shares with NetWare. Because NetWare gets run on some really old crap too, since it just works, and you don't need a lot of hardware for a file-server serving only 500 people. Once you get over a thousand users, or into very large data-sets, the problem gets more interesting, but for general office-style documents... you don't need much. This is/was one of the attractions of NetWare: you don't need much hardware and it runs for years.

On the factory mailing list people have been lamenting recent changes in the kernel and entire environment that has been somewhat deleterious for really old crap boxes. The debate goes back and forth, but at the end of the day the fact remains that a lot of people throw Linux on hardware they'd otherwise dispose of for being too old. And until recently, it has just worked.

However, the diaspora of hardware over the last 15 years has caught up to Linux. Supporting everything sold in the last 15 years requires a HELL of a lot of drivers. And not only that, but really old drivers need to be revised to keep up with changes in the kernel, and that requires active maintainers with that ancient hardware around for testing. These requirements mean that more and more of these really old, or moderately old but niche, drivers are drifting into abandonware-land. Linux as an ecosystem just can't keep up anymore. The Linux community decries Windows for its obsession with 'backwards compatibility' and how that stifles innovation. And yet they have a 12-year-old PII box under the desk happily pushing packets.

NetWare didn't have this problem, even though it's been around longer. The driver interfaces in the NetWare kernel changed only a few times over the last 20 years (such as the DRV to HAM conversion during the NetWare 4.x era, and the introduction of SMP later on), which allowed really old drivers to continue working without revision for a really long time. This is how a 1998-vintage server could be running in 2007, and running well.

However, Linux is not NetWare. NetWare is a special-purpose operating system, no matter what Novell tried in the late 90's to make it a general-purpose one (NetWare + Apache + MySQL + PHP = a LAMP server that is far more vulnerable to runaway-thread-based DoS). Linux is a general-purpose operating system. This key difference between the two means that Linux got exposed to a lot more weird hardware than NetWare ever did. SCSI-attached scanners made no sense on NetWare, but they did on Linux 10 years ago. Putting any kind of high-test graphics card into a NetWare server is a complete waste, but on Linux it'll give you those awesome wibbly-windows.

There comes a time when an open source project has to cut away the old stuff. Figuring this out is hard, especially when the really old crap is running under desks or in closets, entirely forgotten. It is for this reason that Smolt was born: to create a database of hardware that is running Linux, as a way to figure out driver development priorities, both in creating new, missing drivers and in keeping up old but still frequently used drivers.

If you're running a Pentium 2-233 machine as your network's NTP server, you need to let the Linux community know about it so your platform maintains supportability. It is no longer good enough to assume that if it worked in Linux once, it'll always work in Linux.

Labels: , , ,


Monday, December 07, 2009

Account lockout policies

This is another area where how Novell and Microsoft handle a feature differ significantly.

Since NDS was first released back at the dawn of the commercial internet (a.k.a. 1993), Novell's account lockout policies (known as Intruder Lockout) have been settable based on where the user's account exists in the tree. This was done per Organizational Unit or Organization. In this way, users in .finance.users.tree could have a different policy than .facilities.users.tree. This was the case in 1993, and it is still the case in 2009.

Microsoft only got a hierarchical tree with Active Directory in 2000, and even then they didn't get around to making account lockout policies granular. For the most part, there is a single lockout policy for the entire domain with no exceptions. 'Administrator' is subjected to the same lockout as 'Joe User'. With Server 2008 Microsoft finally got some kind of granular policy capability in the form of "Fine-Grained Password and Lockout Policies."

This is where our problem starts. You see, with the Novell system we'd set our account lockout policies to lock after 6 bad passwords in 30 minutes for most users. We kept our utility accounts in a spot where they weren't allowed to lock, but gave them really complex passwords to compensate (as they were all used programmatically in some form, this was easy to do). That way the account used by our single-signon process couldn't get locked out and crash the SSO system. This worked well for us.

Then the decision was made to move to a true-blue solution and we started to migrate policies to the AD side where possible. We set the lockout policy for everyone. And we started getting certain key utility accounts locked out on a regular basis. We then revised the GPOs driving the lockout policy, removing the settings from the Default Domain Policy and creating a new "ILO policy" that we applied individually to each user container. This solved the lockout problem!

Since all three of us went to class for this 7-9 years ago, we'd forgotten that AD lockout policies are monolithic and only work when specified in the Default Domain Policy. They do NOT work per-container the way they do in eDirectory. By doing it the way we did, no lockout policies were being applied anywhere. Googling on this gave me the page for the new Server 2008-era granular policies. Unfortunately for us, it requires the domain to be brought to the 2008 functional level, which we can't do quite yet.
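For future reference, once the functional level is there, those granular policies are Password Settings Objects. A hedged sketch using the Server 2008 R2 ActiveDirectory module; the policy and group names are hypothetical and the values are placeholders:

  Import-Module ActiveDirectory

  # A PSO that never locks out, for the utility/service accounts
  New-ADFineGrainedPasswordPolicy -Name "UtilityAccounts" -Precedence 10 `
      -ComplexityEnabled $true -MinPasswordLength 20 `
      -MinPasswordAge "1.00:00:00" -MaxPasswordAge "365.00:00:00" `
      -PasswordHistoryCount 24 -ReversibleEncryptionEnabled $false `
      -LockoutThreshold 0 -LockoutDuration "0.00:30:00" `
      -LockoutObservationWindow "0.00:30:00"

  # Apply it to the group holding those accounts
  Add-ADFineGrainedPasswordPolicySubject -Identity "UtilityAccounts" -Subjects "SSO-Service-Accounts"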

What's interesting is a certain Microsoft document that suggested a setting of 50 bad logins every 30 minutes as a way to avoid DoSing your needed accounts. That's way more than 6 every 30.

Getting the forest functional level raised just got more priority.

Labels: , , , , , ,


Tuesday, November 17, 2009

Restrictive internet policies

A friend of mine griped today:
In a stroke of utter WTF-ness... my workplace has blocked access to LinkedIn.com.
It's not so WTF for me as I can see why it was blocked. LinkedIn is seen as a tool for people looking to transition jobs. So if you're blocking Monster and Dice, then LinkedIn is right up there with it. The fact that it also is a useful way to network for business is beside the point. From earlier gripes, this particular workplace is on a crusade to block all social-networking sites. I only saw this post because of email-to-post gateways, and they haven't blocked gmail yet.

It is situations like these that give rise to the scenario I described back in June: I Want my SSH. Additionally, a lot of social networking sites are publishing apps for the various app-driven smartphones out there. For users willing to invest a bit of money into it, corporate firewalls are no longer the barrier to slacking they once were.

Labels: ,


Thursday, October 29, 2009

A matter of policy

This has been a long-standing policy in Technical Services, dating to the previous VP-IT and endorsed by the current one. This policy concerns email like this, generally from a manager of some kind:
"[Person X] no longer works here. Please change their password and give it to [Person Y] so they can handle email. And please set an out-of-office rule notifiying people of [Person X's] absence."
To which we politely decline. What we will do is set the out-of-office rule, that's just fine. We'll also either give a PST extract of Person X's mailbox, or if there really is no other way (the person was the Coordinator of the Z's for 20+ years and handled all the communications themselves before retiring/dying) we'll grant read-access to the mailbox to another person, and effectively turn the Person X account into a group account but lacking send-as rights.
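When we do have to grant mailbox access, it stays one-directional. A hedged sketch, assuming an Exchange 2007-style management shell and made-up account names: full access to read the departed user's mail, while Send-As, which is a separate AD extended right, is deliberately never granted.

  # Person Y can open Person X's mailbox, but cannot send as Person X.
  Add-MailboxPermission -Identity "PersonX" -User "UNIV\PersonY" -AccessRights FullAccess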

What we will categorically not do is change a password for an inactive user and give the login to someone else. It comes down to identity theft. If we give Person Y the login info for Person X, Person Y can send email impersonating Person X. And that is wrong on a number of levels.

We resist giving access to the mailbox as well, since a non-trivial proportion of end-users give their work email as the email address for web-registration pages all over the internet. And thus that's where the "password reminder" emails get sent. Having access to someone else's mailbox is a good way to start the process of hacking an identity.

Yes, we do occasionally get a high-level manager pushing us on this. But once we explain our rationale, they've backed down so far. There is a reason we say no when we say no.

Labels: ,


Wednesday, October 28, 2009

You can tell I've been at this a while

Last night while I was sleeping, I had a dream. In my dream I was at my desk at work. I picked up my flashlight for some reason and just then the power decided to drop. DARKNESS. And the UPS alarm in the distance. This was concerning since my workstation is on a power outlet attached to the datacenter UPS, so if my computer was out, chances were real good the entire datacenter was also down. Very bad.

Happily, I just happened to have my flashlight in hand! So I powered it on and went to the datacenter door. But my access card wouldn't work. The card-reader has its own internal battery, so the fact that it wasn't reading me at all, or even giving me the access-denied angry-beep, was doubly bad. Happily, a coworker dropped by and could get in, so I ghosted on in behind him. The room was noisy and had all the right lights. But the UPS was still alarming. Not surprising, it's supposed to do that.

Then I woke up. I checked the clock, still had power. And there was a beep in the distance.

A smoke alarm was crying for a new battery. At 5:30am. It's just a single beep, but it seems my unconscious mind interpreted it as a UPS alarm, even though those are usually three beeps.

Labels:


Thursday, October 22, 2009

Windows 7 releases!

Or rather, its retail availability is today. We're on a Microsoft agreement, so we've had it since late August. And boy do I know that. I've been getting a trickle of calls and emails ever since the beta was released about various ways Win7 isn't working in my environment and whether I have any thoughts about that. Well, I do. As a matter of fact, Technical Services and ATUS both have thoughts on that:

Don't use it yet. We're not ready. Things will break. Don't call us when it does.

But as with any brand new technology there is demand. Couple that with the loose 'corporate controls' inherent in a public Higher Ed institution and we have it coming in anyway. And I get calls when people can't get to stuff.

The main generator of calls is our replacement for the Novell Login Script. I've spoken about how we feel about our login script in the past. Back on July 9, 2004 I had a long article about that. The environment has changed, but it still largely stands. Microsoft doesn't have a built-in login script the same way NetWare/OES has had since the 80's, but there are hooks we can leverage. One of my co-workers has built a cunning .VBS file that we're using for our login script; it does the kinds of things we need out of a login script (a rough sketch follows the list):
  • Runs a series of small applications we need to run, which drive the password change notification process among other things.
  • Maps drives based on group membership.
  • Maps home directories.
  • Allows shelling out to other scripts, which allows less privileged people to manage scripts for their own users.
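The real thing is VBScript and considerably more involved, but a minimal sketch of the group-driven mapping idea, rendered here in PowerShell with hypothetical server, share, and group names, looks something like this:

  $net = New-Object -ComObject WScript.Network

  # Home directory and the shared volume everyone gets
  $net.MapNetworkDrive("H:", "\\cluster-home\home\$env:USERNAME")
  $net.MapNetworkDrive("P:", "\\cluster-facshare\facshare")

  # Group-gated mappings: only map the HR tree for HR staff
  if (whoami /groups | Select-String -Quiet "HumRes-Staff") {
      $net.MapNetworkDrive("S:", "\\cluster-facshare\facshare\HumRes")
  }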
A fair amount of engineering did go into that script, and it works. Mostly. And that's the problem. It works well enough that at least one department on campus decided to put Vista in their one computer lab and rely on this script to get drive mappings. So I got calls shortly after quarter-start to the effect of, "your script don't work, how can this be fixed." To which my reply was (summarized), "You're on Vista and we told y'all not to do that. This isn't working because of XYZ, you'll have to live with it." And they have, for which I am grateful.

Which brings me to XYZ and Win7.

The main incompatibility has to do with the NetWare CIFS stack, which I describe here. The NetWare CIFS stack doesn't speak NTLMv2, only LM and NTLM. In this instance, that makes it similar to much older Samba versions. This conflicts with Vista and Windows 7, which both default their LAN Manager Authentication Level to "NTLMv2 Responses Only." Which means that out of the box both Vista and Win7 will require changes to talk to our NetWare servers at all. This is fine so long as they're domained; we've set a Group Policy to change that level down to something the NetWare servers speak.
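For the curious, that GPO ("Network security: LAN Manager authentication level") ultimately lands in one registry value on the client. A hedged sketch; in our shop Group Policy pushes it, nobody sets it by hand, and whether 1 or 2 is the right level depends on what your servers will accept:

  # 1 = "Send LM & NTLM - use NTLMv2 session security if negotiated",
  # which the NetWare CIFS stack can live with. Vista/Win7 default to 3.
  Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Lsa" `
      -Name LmCompatibilityLevel -Value 1 -Type DWord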

That's not all of it, though. Windows 7 introduced some changes into the SMB/CIFS stack that make talking to NetWare a bit less of a sure thing even with the LAN Man Auth level set right. Perhaps this is SMB2 negotiation getting in the way. I don't know. But for whatever reason, the NetWare CIFS stack and Win7 don't get along as well as Vista's SMB/CIFS stack did.

The main effect of this is that the user's home directory will fail to mount a lot more often on Win7 than on Vista. Also, other static drive mappings will fail more often. It is for reasons like these that we are not recommending removing the Novell Client and relying on our still-in-testing Windows login script.

That said, I can understand why people are relying on the crufty script rather than the just-works Novell Login Script. Due to how our environment works, the Vista/Win7 Novell Client is dog slow. Annoyingly slow. So annoyingly slow that not getting some drives when you log in is preferable to dealing with it.

This will all change once we move the main file-serving cluster to Windows 2008. At that point, the Windows script should Just Work (tm), and getting rid of the Novell Client will allow a more functional environment. We are not at that point yet.

Labels: , , ,


Thursday, October 15, 2009

It's the little things

Right now our Microsoft migration schedule is hung up on backup licenses. Backing up clustered servers requires extensions, which we didn't notice back when we priced out the project. It is things like these that make for cost-overruns. The long and the short of it is, we're not migrating anything until we can legally back up the new environment. Period. That's just how it is.

As most of the budget arm-wrestling happens above me, I only get bits and pieces. Since we don't spend our own money but other people's, we have to convince those other people that this money needs to be spent. I understand there was some pushback when the quote came in, and we've been educating about what exactly it would mean if we don't do this.

I understand the order is in the works, and we're just waiting on license codes. But until they arrive (electronic delivery? What's dat?) we simply cannot move forward. That's just how it is.

Labels: ,


Friday, September 25, 2009

More thoughts on the Novell support change

Something struck me in the comments on the last post about this that I think needs repeating in a full post.

Novell spent quite a bit of time attempting to build up their 'community' forums for peer support, even going so far as to seed the community with supported 'sysops' who helped catalyze others into participating, creating a vibrant peer-support community. This made sense because it built both goodwill and brand loyalty, but also reduced the cost-center known as 'support'. All those volunteers were taking the minor-issue load off of the call-in support! Money saved!

Fast forward several years. Novell bought SuSE and got heavily into Open Source. Gradually, as the OSS products started to take off commercially, the support contracts became the main money maker instead of product licenses. Just as suddenly, this vibrant goodwill-generating peer-support community is taking vital business away from the revenue-stream known as 'support'. Money lost!

Just a simple shift in the perception of where 'support' fits in the overall cost/revenue stream makes this move make complete sense.

Novell will absolutely be keeping the peer support forums going because they do provide a nice goodwill bonus to those too cheap to pay for support. However.... with 'general support' product-patches going behind a pay-wall, the utility of those forums decreases somewhat. Not all questions, or even most of them for that matter, require patches. But anyone who has called in for support knows the first question to be asked is, "are you on the latest code," and that applies to forum posts as well.

Being unable to get at the latest code for your product version means that the support forum volunteers will have to troubleshoot your problem based on code they may already be well past, or not have had recent experience with. This will necessarily degrade their accuracy, and therefore the quality of the peer support offered. This will actively hurt the utility of the peer-support forums. Unfortunately, this is as designed.

For users of Novell's actively developed but severely underdog products such as GroupWise, OES2, and Teaming+Conferencing, the added cost of paying for a maintenance/support contract can be used by internal advocates of Exchange, Windows, and SharePoint as evidence that it is time to jump ship. For users of Novell's industry-leading products such as Novell Identity Management, it will do exactly as designed and force those people into maintaining maintenance contracts.

The problem Novell is trying to address is the kind of company that only buys product licenses when it needs to upgrade, and doesn't bother with maintenance unless it's very sure that a software upgrade will fall within the maintenance period. I know many past and present Novell shops who pay for their software this way. It has its disadvantages, because it requires convincing upper management to fork over big bucks every two to five years, and you have to justify Novell's existence every time. The requirement to have a maintenance contract in order for your highly skilled staff to get at TIDs and patches, something that used to be both free and very effective, is a major real-world added expense.

This is the kind of thing that can catalyze migration events. A certain percentage will pony up and pay for support every year, and grumble about it. Others, who have been lukewarm towards Novell for some time due to adherence to the underdog products, may take it as the sign needed to ditch these products and go for the industry leader instead.

This move will hurt their underdog-product market-share more than it will their mid-market and top-market products.

If you've read Novell financial statements in the past few years you will have noticed that they're making a lot more money on 'subscriptions' these days. This is intentional. They, like most of the industry right now, don't want you to buy your software in episodic bursts every couple years. They want you to put a yearly line-item in your budget that reads, "Send money to Novell," that you forget about because it is always there. These are the subscriptions, and they're the wave of the future!

Labels: , ,


Thursday, September 24, 2009

Very handy but terrible plugin

Yes, this plugin is a terrible idea.

But then, so are appliances with built in self-signed SSL certificates you can't change. You take what you can get.

Labels: ,


Tuesday, September 08, 2009

DNS and AD Group Policy

This is aimed a bit more at local WWU users, but it is more widely applicable.

Now that we're moving to an environment where the health of Active Directory plays a much greater role, I've been taking a real close look at our DNS environment. As anyone who has ever received any training on AD knows, DNS is central to how AD works. AD uses DNS the way WinNT used WINS, the way IPX used SAPs, or NetWare uses SLP. Without it things break all over the place.

As I've stated in a previous post, our DNS environment is very fragmented. As we domain more and more machines, the 'univ.dir.wwu.edu' domain becomes the spot where the vast majority of computing resources are resolvable. Right now, the BIND servers are authoritative for the in-addr.arpa reverse-lookup domains, which is why the IP address I use for managing my AD environment resolves to something not in the domain. What's more, the BIND servers are the DNS servers we pass out to every client.

That said, we've done the work to make it work out. The BIND servers have delegation records to indicate that the AD DNS root domain of dir.wwu.edu is to be handled by the AD DNS servers. Windows clients are smart enough to notice this and do the DNS registration of their workstation names against the AD DNS servers and not the BIND servers. However, the in-addr.arpa domains are authoritative on the BIND servers, so the clients' attempts to register their reverse-lookup records all fail. Every client on our network has Event Log entries to this effect.

Microsoft has DNS settings as a possible target for management through Group Policy. This could be used to help ensure our environment stays safe, but will require analysis before we do anything. Changes will not be made without a testing period. What can be done, and how can it help us?

Primary DNS Suffix
Probably the simplest setting of the lot. This would allow us to force all domained machines to consider univ.dir.wwu.edu to be their primary DNS domain and treat it accordingly for Dynamic DNS updates and resource lookups.

Dynamic Update
This forces/allows clients to register their names into the domain's DNS domain of univ.dir.wwu.edu. Most already do this, and this is desirable anyway. We're unlikely to deviate from default on this one.

DNS Suffix Search List
This specifies the DNS suffixes that will be applied to all lookup attempts that don't end in period. This is one area that we probably should use, but don't know what to set. univ.dir.wwu.edu is at the top of the list for inclusion, but what else? wwu.edu seems logical, and admcs.wwu.edu is where a lot of central resources are located. But most of those are in univ.dir.wwu.edu now. So. Deserves thought.

Primary DNS Suffix Devolution
This determines whether to include the component parts of the primary dns suffix in the dns search list. If we set the primary DNS suffix to be univ.dir.wwu.edu, then the DNS resolver will also look in dir.wwu.edu, and wwu.edu. I believe the default here is 'True'.

Register PTR Records
If the in-addr.arpa domains remain on the BIND servers, we should probably set this to False. At least so long as our BIND servers refuse dynamic updates that is.

Registration Refresh Interval
Determines how frequently to update Dynamic registrations. Deviation from default seems unlikely.

Replace Addresses in Conflicts
This is a setting for handling how multiple registrations for the same IP (here defined as multiple A records pointing to the same IP) are to be handled. Since we're using insecure DNS updates at the moment, this setting deserves some research.

DNS Servers
If the Win/NW side of Tech Services wishes to open warfare with the Unix side of Tech Services, we'll set this to use the AD DNS servers for all domained machines. This setting overrides client-side DNS settings with the DNS servers defined in the Group Policy. No exceptions. A powerful tool. If we set this at all, it'll almost definitely be the BIND DNS servers. But I don't think we will. Also, it may be true that Microsoft has removed this from the Server 2008 GPO, as it isn't listed on this page.

Register DNS Records with Connection-Specific DNS Suffix
If a machine has more than one network connection (very, very few non-VMWare host-machines will), this allows them to register those connections against their primary DNS suffix. Due to the relative dearth of such configs, we're unlikely to change this from default.

TTL Set in the A and PTR Records
Since we're likely to turn off PTR updates, this setting is redundant.

Update Security Level
As more and more stations domain, there will come a time when we may wish to cut out the non-domained stations from updating into univ.dir.wwu.edu. If that time comes, we'll set this to 'secure only'. Until then, we won't touch it.

Update Top Level Domain Zones
This allows clients to update a TLD like .local. Since our tree is not rooted in a TLD, this doesn't apply to us.

Some of these can have wide-ranging effects, but they are helpful. I'm very interested in the search-list settings, since each of our desktop techs has tens of DNS domains to choose from depending on their duty area. Something here might greatly speed up resource resolution times.
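As a concrete example of the suffix search list, the policy ends up as a value in the DNS client policy key on each workstation. A hedged sketch; the registry path is the standard policy location as I understand it, and the actual list is only a guess at what we'd choose:

  # Roughly what the "DNS Suffix Search List" policy leaves behind on a client.
  $key = "HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\DNSClient"
  if (-not (Test-Path $key)) { New-Item -Path $key -Force | Out-Null }
  Set-ItemProperty -Path $key -Name SearchList -Value "univ.dir.wwu.edu,wwu.edu,admcs.wwu.edu"

  # And the quick way to see what a workstation actually ended up with:
  Get-ItemProperty -Path $key | Select-Object SearchList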

Labels: , ,


Tuesday, September 01, 2009

Pushing a feature

One of the things I have missed since Novell went from SLE9 to SLE10 is the machine name in the title-bar for YaST. It used to look like this:

The old YaST titlebar

With that handy "@[machinename]" in it. These days it is much less informative.

The new YaST titlebar

If you're using SSH X-forwarding to manage remote servers, it is entirely possible you'll have multiple YaST windows open. How can you tell them apart? Back in the SLE9 days it was simple: the window told you. Since 10, the marker has gone away. This hasn't changed in 11.2 either. I would like this changed, so I put in a FATE request!

If you'd also like this changed, feel free to vote up feature 306852! You'll need a novell.com login to vote (the opensuse.org site uses the same auth back end so if you have one there you have one on OpenFATE).

Thank you!

Labels: , ,


Friday, August 28, 2009

Fabric merges

When you're doing a fabric merge with Brocade gear and they say that the zone configuration needs to be exactly the same on both switches, they mean exactly that. The merge process does no parsing; it just compares the zone configs. If the metaphorical diff returns anything, it doesn't merge. So if one zone has the order of two nodes swapped but is otherwise identical, it won't merge.

Yes, this is very conservative. And I'm glad for it, since failure here would have brought down our ESX cluster and that's a very wince-worthy collection of highly visible services. But it took a lot of hacking to get the config on the switch I'm trying to merge into the fabric to be exactly right.

Labels: , ,


Tuesday, August 18, 2009

Didn't know that

The integrated network card in the HP DL380-G2 doesn't have a Windows Server 2008 driver. Anywhere. And the forum post that says you can use the 2003 driver on it lies, unless there is some even sneakier way of getting a driver in than I know of.

This is a problem, as that's one of our Domain Controllers. But not much of one, since it's one of the three DC's in the empty root (our forest is old enough for that particular bit of discredited advice) and all it does is global-catalog work. And act as our ONLY DOMAIN CONTROLLER on campus. In the off chance that a back-hoe manages to cut BOTH fiber routes to campus, it's the only GC up there.

Also, since it couldn't boot from a USB-DVD drive I had to do a parallel install of 2008 on it. So I still had my perfectly working 2003 install available. So I just dcpromoed the 2003 install and there we are!

Once we get a PCI GigE card for that server I can try getting 2008 working again.

Labels: ,


Thursday, August 13, 2009

Why we still use WINS when we have AD

WINS... the Windows Internet Name Service. Introduced in, I believe, Windows NT 3.5 in order to allow Windows name resolution to work across different IP subnets. NetBIOS relies on broadcasts for name resolution, and WINS allowed it to work by using a unicast to the WINS server to find addresses. In theory, DNS in Active Directory (now nine years old!) replaced it.

Not for us.

There are two things that drive the continued existence of WINS on our network, and will ensure that I'll be installing the Server 2008 WINS server when I upgrade our Domain Controllers in the next two weeks:
  1. We still have a lot of non-domained workstations
  2. Our DNS environment is mind-bogglingly fragmented
Here is a list of the domains we have, and these are just the ones we're serving with DHCP. There are a lot more:
  • admcs.wwu.edu
  • ac.bldg.wwu.edu
  • ae.bldg.wwu.edu
  • ah.bldg.wwu.edu
  • ai.bldg.wwu.edu
  • cv.bldg.wwu.edu
  • es.bldg.wwu.edu
  • om.bldg.wwu.edu
  • rh.bldg.wwu.edu
  • rl.bldg.wwu.edu
  • archives.wwu.edu
  • bh319lab.wwu.edu
  • bldg.wwu.edu
  • canada.wwu.edu
  • ci.wwu.edu
  • clsrm.wwu.edu
  • cm.wwu.edu
  • crc.wwu.edu
  • etd110.lab01.wwu.edu
  • fm.wwu.edu
  • hh101lab.wwu.edu
  • hh112lab.wwu.edu
  • hh154lab.wwu.edu
  • hh245lab.wwu.edu
  • history.wwu.edu
  • lab03.wwu.edu
  • math.wwu.edu
  • mh072lab.wwu.edu
  • psych.wwu.edu
  • soclab.wwu.edu
  • spmc.wwu.edu
  • ts.wwu.edu
There are more we're serving with DHCP; I just got bored making the list. The thing is, a lot of those networks, and especially the labs, contain 100% domained workstations. Since we only have the one domain, this means all those computers are in a flat DNS structure. In effect, each domained workstation on campus has two DNS names: the one on our BIND servers, and the one in the MS-DNS servers.
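A hypothetical lab machine shows the effect; the host name here is made up, but the pattern holds for anything that has been domained:

  nslookup hh101-01.univ.dir.wwu.edu   # the record it registered itself in AD DNS
  nslookup hh101-01.hh101lab.wwu.edu   # the same box's entry on the BIND side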

That said, for those machines that AREN'T in the domain, the only way they can find anything is to use WINS. We will be using WINS until the University President says unto the masses, "Thou Shalt Domain Thy PC, Or Thou Shalt Be Denied Service." Until then, WINS will continue to be the best way to find Windows resources on campus.

Labels: ,


Tuesday, August 11, 2009

Changing the CommandView SSL certificate

One of the increasingly annoying things that IT shops have to put up with is web-based administration portals using self-signed SSL certificates. Browsers are increasingly making this setup annoying, and for good reason. That is why I try to get these pages signed with a real certificate whenever they allow me to.

HP's Command View EVA administration portal annoyingly overwrites the custom SSL files when it does an upgrade. So you'll have to do this every time you apply a patch or otherwise update your CV install.
  1. Generate an SSL certificate with the correct data.
  2. Extract the certificate into base-64 form (a.k.a. PEM format) in separate 'certificate' and 'private key' files.
  3. On your command view server overwrite the %ProgramFiles%\Hewlett-Packard\sanworks\Element Manager for StorageWorks HSV\server.cert file with the 'certificate' file
  4. Overwrite the %ProgramFiles%\Hewlett-Packard\sanworks\Element Manager for StorageWorks HSV\server.pkey file with the 'private key' file
  5. Restart the CommandView service
At that point, CV should be using your generated certificates. Keep these copied somewhere else on the server so you can quickly copy them back in when you update Command View.
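A hedged sketch of steps 1 through 5; the paths come from the list above, but the openssl invocation, certificate subject, and service display name are assumptions, and your CA process will differ:

  $cv = "$env:ProgramFiles\Hewlett-Packard\sanworks\Element Manager for StorageWorks HSV"

  # 1-2. Key plus a signing request to hand to a real CA; you get back a
  #      PEM-format certificate (cv-server.cert here).
  & openssl req -new -newkey rsa:2048 -nodes `
      -subj "/CN=commandview.example.edu" `
      -keyout cv-server.pkey -out cv-server.csr

  # 3-4. Drop the PEM cert and key over the files Command View reads at startup.
  Copy-Item cv-server.cert "$cv\server.cert" -Force
  Copy-Item cv-server.pkey "$cv\server.pkey" -Force

  # 5. Bounce the service so it picks up the new files (display name varies by version).
  Get-Service -DisplayName "*Command View*" | Restart-Service -Force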

Labels: , , ,


Non-paid work hours

Ars Technica has an article up today about workers who put in a lot of unpaid hours thanks to their mobile devices. This isn't a new dynamic by any means; we had a lot of this crop up when corporate web-mail started becoming ubiquitous, and before that with the few employees using remote desktop software (PCAnywhere, anyone?) to read email from home over corporate dialup. The BlackBerry introduced the phenomenon to the rest of the world, and the smartphone revolution is bringing it to the masses.

My old workplace was union, so it was in the process of figuring out how to compensate employees for after-hours call-outs shortly after we got web-mail working. There were a few state laws and similar rulings that directed how it should be handled, and ultimately they decided on no less than 2 hours of overtime pay for issues handled on the phone, and no less than 4 hours of overtime pay for issues requiring a site-visit. Yet there was no payment for being officially on-call with a mandatory response time; the view was that actually responding to the call was the payment. Even if being on-call meant not being able to go to a child's 3-hour dance recital.

Now that I'm an exempt employee, I don't get anything like overtime. If I spend 36 hours in a weekend shoving an upgrade into our systems through sheer force of will, I don't automatically get Monday off or a whonking big extra line-item on my next paycheck. It's between me and my manager how many hours I need to put in that week.

As for on-call, we don't have a formal on-call schedule. All of us agree we don't want one, and strive to make the informal one work for us all. No one wants to plan family vacations around an on-call schedule, or skip out of town sporting events for their kids just so they can be no more than an hour from the office just in case. It works for us, but all it'll take to force a formal policy is one bad apple.

For large corporations with national or global workforces, such gentleman's agreements aren't really doable. Therefore, I'm not at all surprised to see some lawsuits being spawned because of it. Yes, some industries come with on-call rotations baked in (systems administration being one of them). Others, such as tech-writing, don't generally have much after-hours work, and yet I've seen, second-hand, such after-hours work (working on docs, conference calls, etc.) consume an additional 6 hours a day.

Paid/unpaid after-hours work gets even more exciting if there are serious timezone differences involved. East Coast workers with the home-office on the West Coast will probably end up with quite a few 11pm conference calls. Reverse the locations, and the West Coast resident will likely end up with a lot of 5am conference calls. Companies that have drunk deeply from the off-shoring well have had to deal with this, but have had the benefit of different labor laws in their off-shored countries.

"Work" is now very flexible. Certain soulless employers will gleefully take advantage of that, which is where the lawsuits come from. In time, we may get better industry standard practice for this sort of thing, but it's still several years away. Until then, we're on our own.

Labels: ,


Friday, August 07, 2009

Identity Management in .EDU land

We have a few challenges when it comes to an identity management system. As with any attempt to automate identity management, it is the exceptions that kill projects. This is an extension of the 80/20 rule: 80% of the cases will be dead easy to manage, and the special 20% is where most of the business-rules meeting-time will be spent.

In our case, we have two major classes of users:
  • Students
  • Employees
And a few minor classes littered about like Emeritus Professors. I don't quite know enough about them to talk knowledgeably.

The biggest problem we have is how to handle the overlaps: student workers, and staff who take classes. We have a lot of student workers; staff who take classes are another story. The existence of these people makes it impossible to treat the two big classes as mutually exclusive.

Banner handles this case pretty well from what I understand. The systems I manage, however, are another story. With eDirectory and the Novell Client, we had two big contexts named Students and Users; whichever one your object was in determined the login script you ran. Active Directory was until recently Employee-only because of Exchange. We put the students in there (with no mailboxes, of course) two years ago, largely because we could and it made the student-employee problem easier to manage.

One of the thorniest questions we have right now is defining, "when is a student a student with a job, and when is a student an employee taking classes." Unfortunately, we do not have a handy business rule to solve that. A rule, for example, like this one:
If a STUDENT is taking less than M credit-hours of classes, and is employed in a job-class of C1-F9, then they shall be reclassed EMPLOYEE.
That would be nice. But we don't have it, because the manual exception-handling process this kicks off is not quite annoying enough to warrant the expense of deciding on an automatable threshold. Because this is a manual process, people rarely get moved back across the Student/Employee line in a timely way. If the migration process were automated, certain individuals would probably flop over the line every other quarter.

This is one nice example of the sorts of discussions you have to have when rolling out an identity management automation system. If we were given umpty thousand dollars to deploy Novell IDM in order to replace our home-built system, we'd have to start having these kinds of discussions again, even though we've had some kind of identity provisioning system since the early 90's. Because we DO have an existing one, some of the thornier questions of data-ownership and workflow are already solved. We'd just have to work through the current manual-intervention edge cases.

Labels: ,


Monday, August 03, 2009

Robust NTP environments

Due to my background as a NetWare guy, time-synchronization is something I pay attention to. Early versions of NDS were touchy about it, since the time-stamp was used in the conflicting-edits resolution process. NetWare didn't use a full NTP client for this; Novell built their own variant based on NTP code and called it TimeSync. Unlike NTP, TimeSync did what it could to ensure the entire environment was within a second or two of a single time. Because of the lower time resolution, it synced a lot faster than NTP did, and this was considered a good thing since out-of-sync time was considered an outage.

With that in mind, it is no surprise that I like to have a solid time-sync process in place on my networks. One of the principles of Novell's TimeSync config was the concept of a time-group: a group of servers that coordinated time among themselves, and a bunch of clients that polled members of that group for correct time. Back before internet connections were as ubiquitous as air, this was a good way for an office network to maintain a consensus time. Later on, TimeSync gained the ability to talk over TCP/IP and could use NTP sources for external time, which allowed TimeSync to hook into Coordinated Universal Time (UTC).

You can create much the same kind of network with NTP as you could with TimeSync. It requires more than one time server, but your clients only have to directly speak with one of the time servers in the group. Yet the same type of robustness can be had.

The concept is founded on the "peer" association in NTP. The official definition of this directive is rather dry:
For type s addresses (only), this command mobilizes a persistent symmetric-active mode association with the specified remote peer.
And doesn't tell you much. This is much clearer:
Symmetric active/passive mode is intended for configurations where a clique of low-stratum peers operate as mutual backups for each other. Each peer operates with one or more primary reference sources, such as a radio clock, or a set of secondary (stratum 2) servers known to be reliable and authentic. Should one of the peers lose all reference sources or simply cease operation, the other peers will automatically reconfigure so that time and related values can flow from the surviving peers to all hosts in the subnet. In some contexts this would be described as a "push-pull" operation, in that the peer either pulls or pushes the time and related values depending on the particular configuration.
Unlike TimeSync, if all the peers lose their upstreams (the internet connection is down) then the entire infrastructure goes out of sync. This can be mitigated somewhat through judicious use of the 'maxpoll' parameter; set it high enough, and it can be hours (or days if you set it really high) before the peer even notices it can't talk to its upstream and will continue to report in-sync time to clients.

It is also a very good idea to use ACLs in your ntp.conf file to restrict what types of connections clients can mobilize. It is quite possible to be evil to NTP servers. You can turn on enough options to allow trouble-shooting, but not allow config changes.

It is a very good idea for your peers to be cryptographically associated with each other as well. There are at least two methods for this with NTP: the symmetric-key scheme that has been around since NTPv3, and NTPv4's Autokey public-key scheme. Symmetric keys are a pre-shared-secret system and are simpler to set up; Autokey avoids distributing shared secrets but takes more work to configure. Either is preferable to nothing.
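The example below uses the symmetric-key method: both peers share a key file referenced by the 'keys', 'trustedkey', and 'key' directives. The key file is just a numbered list of shared secrets; a minimal sketch (the secret itself is made up):

# /etc/ntp.keys -- identical on both peers, readable only by root
# format: key-number  type  key-string  (M = MD5)
1 M WwuTimeLordsOnly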

Here is a pair of /etc/ntp.conf files for a hypothetical set of WWU time-servers (items like drift-file and logging options have been omitted). The first file goes on the server at 140.160.11.86, the second on its peer at 140.160.247.31:

# /etc/ntp.conf on the first time server (140.160.11.86)
server 0.north-america.pool.ntp.org maxpoll 13
server 1.north-america.pool.ntp.org maxpoll 13
peer 140.160.247.31 key 1

enable auth monitor
keys /etc/ntp.keys
trustedkey 1
requestkey 1

restrict default ignore
restrict 140.160.0.0 mask 255.255.0.0 nomodify nopeer
restrict 140.160.247.31

# /etc/ntp.conf on the second time server (140.160.247.31)
server 2.north-america.pool.ntp.org maxpoll 13
server 3.north-america.pool.ntp.org maxpoll 13
peer 140.160.11.86 key 1

enable auth monitor
keys /etc/ntp.keys
trustedkey 1
requestkey 1

restrict default ignore
restrict 140.160.0.0 mask 255.255.0.0 nomodify nopeer
restrict 140.160.11.86

The 'maxpoll' values ensure that once time has been synchronized for long enough, the time between polls of the upstream NTP servers stretches out to 2^13 seconds, or about 137 minutes. Hopefully, any internet outages will be shorter than that. Setting 'maxpoll' to even higher values allows longer times between polling intervals, and therefore longer internet-outage tolerance. This can get QUITE long; I've seen some NTP servers that poll twice a week.

The key settings set up symmetric-key authentication between the peers. The "key 1" option on the peer line indicates that the designated connection should use crypto validation. The actual data passed isn't encrypted; the crypto is used for identity validation. This prevents spoofing of time, which can lead to wildly off time values.

The 'restrict' lines tell ntpd to ignore off-campus requests entirely (with 'default ignore' it silently drops them rather than answering), allow on-campus hosts to get time and run queries but not modify the configuration or peer with the server, and allow full access to the peer time server. In theory, inbound NTP traffic should be stopped at the border firewall, but just in case, this drops any that get through.

This is a two-server setup, but three or more servers could easily be involved. For a network our size (large) and complexity (simple), two to three time-servers is probably all we need. The peered time-servers will all report in-sync so long as one of them still considers itself in-sync with an upstream time-server.

Because peers sync time amongst themselves, clients only have to talk to a single time-server to get valid time. Of course, that introduces a single-point-of-failure in the system if that time-host ever has to go down. Because of this, I strongly recommend configuring NTP clients to use at least two upstreams.
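A client config for that is tiny; a minimal sketch using the two campus time servers from the example above (the driftfile path varies by distribution):

# minimal client /etc/ntp.conf -- use both campus time servers
server 140.160.11.86 iburst
server 140.160.247.31 iburst
driftfile /var/lib/ntp/drift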

Enjoy high quality time!

Labels:


Thursday, July 30, 2009

Datacenter environment

We're having a major heat-wave. Sea-Tac airport set a record yesterday for the hottest temperature on record, at 103 degrees. Bellingham, too: the old record of 94, set in 2007, was surpassed by yesterday's reading of 96. Today is cooler, but still well above average for out here.

Much as I was tempted to show up for work today in a tank top, shorts, and flip-flops, I resisted. First of all, I did have a meeting up on campus with some executive-types higher than me so I had to keep up appearances. Also, flip-flops aren't that good for hiking the mile plus to campus.

Of course, today is a day when I get to do a surprise server rebuild in the datacenter! I just spent the last hour standing on a vent tile setting up a server reformat. While I'm not wearing flip-flops, I am wearing shorts. I was cold, so I went for a walk around the building to warm up, and it performed admirably.

Happily, since we have a data-center in the building, the building itself has AC. Not all buildings here do. In fact, the building I had that meeting in did not have any AC, just some moving air.

We have enough AC in the datacenter that the room isn't any hotter today than it gets in mid January. That's nice to have.

Labels:


Monday, July 27, 2009

Service delivery in .EDU-land

Matt of standalone-sysadmin fame asked:
I take it from the terminology ("fall quarter") that you work at a university.

How often do you re-engineer your infrastructure, or roll out new servers? Do you align them to the school quarters? I'm interested in knowing how other people make decisions on roll-outs.
Until a couple weeks ago, this blog was hosted on a server named, "myweb.facstaff.wwu.edu," which should give you a real good idea of where I work ;). So yes, a university. We're also on quarters, not semesters, so our school year is carved up a bit differently than at schools with three terms a year instead of four.

Anything that requires disruptive downtime on critical systems for more than a few hours, we keep to the times we're not teaching. We have on the order of 21,000 actual students kicking around (the FTE count is much smaller; we have a lot of part-timers), so outages get noticed. We have students actively printing and handing in homework to Blackboard at 4am, so 'dark of night' is only good for so many things.

The biggest window is the summer intersession, which this year runs from 8/25 @ Noon (the point when grades are due from faculty) to roughly 9/18 (when students start moving into the dorms); it is reserved for the big and disruptive projects. Things like completely migrating every file we have to new hardware, upgrading the ERP system we use (SCT Banner), replacing the router core, upgrading our SAN-based disk-arrays, or upgrading Blackboard. Winter break and Spring break are the other times during the year when this kind of activity can take place.

Winter has a couple weeks to work with, but we're generally rather short-staffed during that period so we try not to do big stuff. Spring is just a few days, so things like a quick point-level upgrade to Blackboard could be done, something that doesn't require extensive testing, validation, or data conversion. Summer intersession is where the big heavy lifting can take place, and we do try and work our various vacations around this particular time of the year.

But we can and do roll new stuff out during session. It's a lot easier if the new thing isn't disruptive to established work-flow, or if it just adds functionality to something people are already using. Anything student-visible gets extra scrutiny, as the potential for massive amounts of work on the part of our helpdesk is a lot higher. A lot of our decisions have significant input from the, "How much extra work will our Helpdesk experience as a result of this change?" question.

Also, the work varies. Some years we have a lot going on in the summer. This year we only have the one major project. In years when we have a lot going on, we've started planning the summer project season as early as March. Some things, like the router core update and the Banner updates, are known about 18 months or more in advance due to budgeting requirements. Other things, like Blackboard updates and oddly enough this Novell -> Windows migration project, aren't really committed to until May or later.

As for determining when what gets updated/upgraded, that starts with the maintainers of that application, infrastructure, or hardware. Due to the budget cycle, big ticket items are generally known about very far in advance of the actual project implementation stage. Everything eventually falls into the project coordination sphere, which is a very large part of the Technical Services Manager's job (you too can be my new boss! But wouldn't THAT be awkward?). The TS Manager coordinates with the Academic Computing director and the Administrative Computing director, as well as the Vice Provost of course, to mutually set priorities and allocate resources.

p.s.: The Technical Services page for Organization Size is horribly, horribly wrong. We have more servers than that for both MS and Linux, fewer NetWare servers, by now fewer Unix servers, and way more disk space than that.

Labels:


Tuesday, July 21, 2009

Digesting Novell financials

It's a perennial question: "why would anyone use Novell any more?" It typically comes from people who only know Novell as "that NetWare company," or perhaps "the company that we replaced with Exchange." These are the same people who are convinced Novell is a dying company that just doesn't know it yet.

Yeah, well. Wrong. Novell managed to turn the corner and wean themselves off of the NetWare cash-cow. Take the last quarterly statement, which you can read in full glory here. I'm going to excerpt some bits, but it'll get long. First off, their description of their market segments. I'll try to include relevant products where I know them.

We are organized into four business unit segments, which are Open Platform Solutions, Identity and Security Management, Systems and Resource Management, and Workgroup. Below is a brief update on the revenue results for the second quarter and first six months of fiscal 2009 for each of our business unit segments:



Within our Open Platform Solutions business unit segment, Linux and open source products remain an important growth business. We are using our Open Platform Solutions business segment as a platform for acquiring new customers to which we can sell our other complementary cross-platform identity and management products and services. Revenue from our Linux Platform Products category within our Open Platform Solutions business unit segment increased 25% in the second quarter of fiscal 2009 compared to the prior year period. This product revenue increase was partially offset by lower services revenue of 11%, such that total revenue from our Open Platform Solutions business unit segment increased 18% in the second quarter of fiscal 2009 compared to the prior year period.

Revenue from our Linux Platform Products category within our Open Platform Solutions business unit segment increased 24% in the first six months of fiscal 2009 compared to the prior year period. This product revenue increase was partially offset by lower services revenue of 17%, such that total revenue from our Open Platform Solutions business unit segment increased 15% in the first six months of fiscal 2009 compared to the prior year period.

[sysadmin1138: Products include: SLES/SLED]



Our Identity and Security Management business unit segment offers products that we believe deliver a complete, integrated solution in the areas of security, compliance, and governance issues. Within this segment, revenue from our Identity, Access and Compliance Management products increased 2% in the second quarter of fiscal 2009 compared to the prior year period. In addition, services revenue was lower by 45%, such that total revenue from our Identity and Security Management business unit segment decreased 16% in the second quarter of fiscal 2009 compared to the prior year period.

Revenue from our Identity, Access and Compliance Management products decreased 3% in the first six months of fiscal 2009 compared to the prior year period. In addition, services revenue was lower by 40%, such that total revenue from our Identity and Security Management business unit segment decreased 18% in the first six months of fiscal 2009 compared to the prior year period.

[sysadmin1138: Products include: IDM, Sentinel, ZenNAC, ZenEndPointSecurity]



Our Systems and Resource Management business unit segment strategy is to provide a complete “desktop to data center” offering, with virtualization for both Linux and mixed-source environments. Systems and Resource Management product revenue decreased 2% in the second quarter of fiscal 2009 compared to the prior year period. In addition, services revenue was lower by 10%, such that total revenue from our Systems and Resource Management business unit segment decreased 3% in the second quarter of fiscal 2009 compared to the prior year period. In the second quarter of fiscal 2009, total business unit segment revenue was higher by 8%, compared to the prior year period, as a result of our acquisitions of Managed Object Solutions, Inc. (“Managed Objects”) which we acquired on November 13, 2008 and PlateSpin Ltd. (“PlateSpin”) which we acquired on March 26, 2008.

Systems and Resource Management product revenue increased 3% in the first six months of fiscal 2009 compared to the prior year period. The total product revenue increase was partially offset by lower services revenue of 14% in the first six months of fiscal 2009 compared to the prior year period. Total revenue from our Systems and Resource Management business unit segment increased 1% in the first six months of fiscal 2009 compared to the prior year period. In the first six months of fiscal 2009 total business unit segment revenue was higher by 12% compared to the prior year period as a result of our Managed Objects and PlateSpin acquisitions.

[sysadmin1138: Products include: The rest of the ZEN suite, PlateSpin]



Our Workgroup business unit segment is an important source of cash flow and provides us with the potential opportunity to sell additional products and services. Our revenue from Workgroup products decreased 14% in the second quarter of fiscal 2009 compared to the prior year period. In addition, services revenue was lower by 39%, such that total revenue from our Workgroup business unit segment decreased 17% in the second quarter of fiscal 2009 compared to the prior year period.

Our revenue from Workgroup products decreased 12% in the first six months of fiscal 2009 compared to the prior year period. In addition, services revenue was lower by 39%, such that total revenue from our Workgroup business unit segment decreased 15% in the first six months of fiscal 2009 compared to the prior year period.

[sysadmin1138: Products include: Open Enterprise Server, GroupWise, Novell Teaming+Conferencing]

The reduction in 'services' revenue is, I believe, a reflection of a decreased willingness among companies to pay Novell for consulting services. Also, Novell has changed how they advertise their consulting services, which seems to have had an impact. That's the economy for you. The raw numbers:


Three months ended (in thousands):

                                       -------- April 30, 2009 --------    -------- April 30, 2008 --------
                                       Net rev.   Gross      Op. income    Net rev.   Gross      Op. income
                                                  profit     (loss)                   profit     (loss)
Open Platform Solutions                $ 44,112   $ 34,756   $  21,451     $ 37,516   $ 26,702   $  12,191
Identity and Security Management         38,846     27,559      18,306       46,299     24,226      12,920
Systems and Resource Management           45,354     37,522      26,562       46,769     39,356      30,503
Workgroup                                 87,283     73,882      65,137      105,082     87,101      77,849
Common unallocated operating costs                   (3,406)   (113,832)                 (2,186)   (131,796)
Total per statements of operations     $215,595   $170,313   $  17,624     $235,666   $175,199   $   1,667

Six months ended (in thousands):

                                       -------- April 30, 2009 --------    -------- April 30, 2008 --------
                                       Net rev.   Gross      Op. income    Net rev.   Gross      Op. income
                                                  profit     (loss)                   profit     (loss)
Open Platform Solutions                $ 85,574   $ 68,525   $  40,921     $ 74,315   $ 52,491   $  24,059
Identity and Security Management         76,832     52,951      35,362       93,329     52,081      29,316
Systems and Resource Management           90,757     74,789      52,490       90,108     74,847      58,176
Workgroup                                177,303    149,093     131,435      208,840    173,440     155,655
Common unallocated operating costs                   (7,071)   (228,940)                 (4,675)   (257,058)
Total per statements of operations     $430,466   $338,287   $  31,268     $466,592   $348,184   $  10,148

So, yes. Novell is making money, even in this economy. Not lots, but at least they're in the black. Their biggest growth area is Linux, which is making up for deficits in other areas of the company, especially the sinking 'Workgroup' area. Once upon a time, "Workgroup" constituted over 90% of Novell revenue.
Revenue from our Workgroup segment decreased in the first six months of fiscal 2009 compared to the prior year period primarily from lower combined OES and NetWare-related revenue of $13.7 million, lower services revenue of $10.5 million and lower Collaboration product revenue of $6.3 million. Invoicing for the combined OES and NetWare-related products decreased 25% in the first six months of fiscal 2009 compared to the prior year period. Product invoicing for the Workgroup segment decreased 21% in the first six months of fiscal 2009 compared to the prior year period.
Which is to say, companies dropping OES/NetWare constituted the large majority of the losses in the Workgroup segment. Yet that loss was almost wholly made up by gains in other areas. So yes, Novell has turned the corner.

Another thing to note in the section about Linux:
The invoicing decrease in the first six months of 2009 reflects the results of the first quarter of fiscal 2009 when we did not sign any large deals, many of which have historically been fulfilled by SUSE Linux Enterprise Server (“SLES”) certificates delivered through Microsoft.
Which is pretty clear evidence that Microsoft is driving a lot of Novell's Operating System sales these days. That's quite a reversal, and a sign that Microsoft is officially more comfortable with this Linux thing.

Labels: , , , , , , , ,

