September 2012 Archives

Founding a chapter takes people

Two kinds of people are needed for this DC chapter:

  1. LOPSA members, or people willing to become one. These will be our Charter Members.
  2. People willing to present on things.

The two can be one and the same person, but we still need to cover both types.

Presenting really isn't that hard; we're not holding you to the standard shown at industry conferences. What we need is interesting. We already do interesting in the halls at conferences as we swap stories, so plenty of us already have the technical story to tell. The next step is working up some slides, or a demo, to show off whatever that is.

What qualifies as a good topic? Plenty!

  • Solving a tricky cross-platform issue in Puppet/cfengine/chef/bcfg.
  • Adding Windows to an existing Linux-only configuration-management system.
  • Disaster recovery post-mortem notes.
  • Office campaigns to convince financial powers-that-be that a major upgrade really is in their best interest, and how the campaign was fought and won.
  • Anything having to do with scaling out systems, and problems encountered.
  • Creative ways of dealing with bring-your-own-device policies.

We have tech-startups, big government, and mid-size private companies all around the area, so there is a lot of potential audience for your story. And these meetings are the kind of place you can share just those stories.

Think you'd like to listen to these stories?

Think you could maybe share one or two?

Fill out this form!

Or drop a comment on this post! They're screened so I'll see them before they go public, and I promise I won't publish the comment if you ask me not to.



This came across Twitter this morning:

[Image: copyright-teens.png]
You know, that works suspiciously well. Certain Forces are most definitely pursuing correct behavior the same way they still tell kids to "just wait, it's better if you do. Trust us."

Starting a local LOPSA chapter

One thing led to another and I'm now helping to co-found a Washington DC LOPSA chapter with LOPSA board-member Evan Pettery. We've had a chapter in the area for some time, the Crabby Admins, but I've yet to make a meeting since getting from downtown DC to where those meetings are is quite a challenge. Of the "get home from work early then drive there through Rush Hour traffic to a spot equidistant between DC and Baltimore" kind of challenge. We expect this new chapter will draw from the central DC and Northern Virginia areas.

It also helps that our meeting site is on Metro, just off Franklin Square. In fact, it's the same spot as the DC RUG and some MongoDC meetings.

What we need right now are two big things:

  • People who want to attend
  • People who want to show off what they've been working on

Drop a note here or on the website, and we'll get in touch.





Arr, there be testin

That paragon of admin-like personage, Captain Limoncelli, has gone and thrown up a new website. If you be the type who finds yer self roped to a semaphore all the bloody day, or deals with damned scriveners any time something explodes the wrong way, then you be the kind who'd like to know about that there site.

http://www.opsreportcard.com/

A while back now he made this test, see. It kinda got around. So the good Cap'n took the ole quill in hand and did up a good bit o' writin'. During that there test's travels a few clarifying comments, shall we say, were made. So an update were in order, and this site be it. Chock full of details, and thick enough to club even the most weevil-eaten powder-head into some sense.

The cloud will happen

Like many olde tyme sysadmins, I look at 'cloud' and shake my head. It's just virtualization the way we've always been doing it, but with yet another abstraction layer on top to automate deploying certain kinds of instances really fast.

However... it's still new to a lot of entities. The concept of an outsourced virtualization plant is very new. For entities that use compliance audits for certain kinds of vendors it is most definitely causing something of a quandary. How much data-assurance do you mandate for such suppliers? What kind of 3rd party audits do you mandate they pass? Lots of questions.

Over on 3 Geeks and a Law Blog, they recently covered this dynamic in a post titled The Inevitable Cloud as it relates to the legal field. In many ways, the Law field has information-handling requirements similar to the Health-Care field, though we don't have HIPAA. We handle highly sensitive information, and who had access to what, when, and what they did with it can be extremely relevant details (it's called spoliation). Because of this, certain firms are very reluctant to go for cloud solutions.

Some of their concerns:

  • Who at the outsourcer has access to the data?
  • What controls exist to document what such people did with the data?
  • What guarantees are in place to ensure that any modification is both detectable and auditable?

For an entity like Amazon AWS (a.k.a. Faceless Megacorp) the answer to the first may not be obtainable without lots of NDAs being signed. The answers to the second may not even be given by Amazon unless the contract is really big. The answers to the third? How about this nice third-party audit report we have...

The pet disaster for such compliance officers is a user with elevated access deciding to get curious and exploiting a maintenance-only access method to directly access data files or network streams. The ability of an entity to respond to such fears to satisfaction means they can win some big contracts.

However, the costs of such systems are rather high; and as the 3 Geeks point out, not all revenue is profit-making. Firms that insist on end-to-end transport-mode IPSec and universally encrypted local storage all with end-user-only key storage are going to find fewer and fewer entities willing to play ball. A compromise will be made.




However, at the other end of the spectrum you have the 3-person law offices of the world, and there are a lot more of them out there. These are offices that don't have enough people to bother with a Compliance Officer. They may very well be using Dropbox to share files with each other (though possibly TrueCrypted), and are practically guaranteed to be using outsourced email of some kind. These are the firms that are going into the cloud first, pretty much by default. The rest of the market will follow along, though at a remove of some years.

Exciting times.

Charging by the hour, a story of clouds

Question: When an IaaS cloud provider charges per hour for a machine, what's it an hour of? Do I get charged when it's doing nothing? If so, why is that fair?

All the IaaS cloud providers I've run into (which isn't all of them by any stretch) charge by the running hour. If that micro-mini instance is doing nothing but emailing the contents of a single file once a day, it'll still get charged for 24 hours of activity if left on. The same goes for a gargantuGPU instance doing the same work; it'll just cost more to do nothing.
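To make that concrete, here is a back-of-the-envelope run through the billing arithmetic. The instance names echo the ones above, but the hourly rates are made up for illustration and aren't any provider's real price list.

    # Rough monthly cost of leaving an instance running, busy or not.
    # Hourly rates below are hypothetical, not a real price list.
    HOURS_PER_MONTH = 24 * 30

    rates_per_hour = {
        "micro-mini": 0.02,      # hypothetical $/hour
        "gargantuGPU": 2.10,     # hypothetical $/hour
    }

    for instance, rate in rates_per_hour.items():
        monthly = rate * HOURS_PER_MONTH
        print(f"{instance}: ${monthly:.2f}/month, even if it does nothing")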

Why is that fair?

Because of resources.

The host machine running all of these virtual machines has many resources. CPU, memory, disk, network, the usual suspects. These resources have to be shared between all the virtual machines. Let's take a look at each and see how easy that is.

CPU

To share CPU between VMs the host has to be able to share execution between them. Much like we do... well, practically everywhere now. We've been doing multiprocess operating systems for a long time. Sharing CPU cycles is dead easy. If a process needs a lot, it gets what's available. If it needs none, it gets none. A thousand processes all doing nothing causes... nothing to happen! It's perhaps the easiest thing to share. But we'll see.

Memory

We've been sharing RAM between processes, with good isolation even, for some time now. Even Apple has joined that game to great effect. Unlike CPU, processes sit on RAM the entire time they're running. It may be swapped out by the OS, but it's still accounted for.
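If you want to see that accounting in action on a Linux box, the kernel tracks it per-process. A minimal sketch that reads this process's own numbers out of /proc (Linux-only; the 50 MB allocation is just there to make the resident figure move):

    # Compare how much memory this process has claimed vs. what is
    # resident in RAM right now. Linux-only: reads /proc/self/status.
    def meminfo():
        fields = {}
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith(("VmSize", "VmRSS", "VmSwap")):
                    key, value = line.split(":", 1)
                    fields[key] = value.strip()
        return fields

    ballast = bytearray(50 * 1024 * 1024)   # touch 50 MB so RSS grows
    for key, value in meminfo().items():
        print(f"{key}: {value}")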

Disk

Disk? Disk is easy. It's just files. Each file gets so much, and more if needed up until you run out. At which point you run into problems. Each VM uses disk to store its files, as you'd expect.
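One common way that "more if needed" works for VM disk images is thin provisioning: the image claims its full size up front, but real blocks only get allocated as the guest writes. A quick sketch, assuming a Unix-like host with a filesystem that supports sparse files:

    import os

    # A "10 GB" disk image that only allocates blocks where data was written.
    # Assumes sparse-file support (ext4, XFS, and friends).
    image = "scratch-disk.img"
    with open(image, "wb") as f:
        f.truncate(10 * 1024**3)     # apparent size: 10 GB
        f.seek(4096)
        f.write(b"guest data")       # only this region gets real blocks

    st = os.stat(image)
    print(f"apparent size:      {st.st_size / 1024**3:.1f} GB")
    print(f"actually allocated: {st.st_blocks * 512 / 1024:.0f} KB")
    os.remove(image)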

Network

To share network, a host machine has to proxy network connections from a VM. Which... it kinda already does for normal OS processes, like, say, Apache, or MySQL. If a process doesn't need any network resources, none gets used. If it needs some, it uses up to what's available. A thousand processes all doing nothing uses no network resources. Same for VMs, really. It's right up there with CPU for ease of sharing.

Now ask yourself. Of these four major resources, which of them are always consumed when a VM (or if you rather, a process) is running?

If you said "memory and disk" you've been paying attention.

If you said "all but network, and maybe even that too", you've been auditing this answer for technical accuracy and probably noticed a few (gross) simplifications so far. Please bear with me!

Now of the two always-consumed resources, memory and disk, which is going to be the more constrained one?

If you look at it from the old memory hierarchy chart based on "how long does the CPU have to wait if it needs to get data from a specific location", you can begin to see a glimmer of the answer here. This is usually measured in CPU cycles spent waiting for data. The lower down the chart you get (faster) the more expensive the storage. A 2.5GHz CPU will have 2.5 billion cycles in a second. Remember that number.

A 7.2K RPM hard-drive, the type you can get in 1TB sizes for cheap, has a retrieval latency of 8.9 milliseconds. Which means that best-case the 2.5GHz CPU will wait 22,250,000 cycles before it gets the data it needs. That's... really slow, actually.

The RAM in that 2.5GHz server can be fetched in 10 nanoseconds. Which means that best-case the 2.5GHz CPU will wait only... 25 cycles.
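The arithmetic behind those two wait figures, using the same 2.5GHz clock and the latencies quoted above:

    # Best-case cycles a 2.5GHz CPU spends waiting on each storage tier.
    cpu_hz = 2.5e9                 # 2.5 billion cycles per second

    disk_latency_s = 8.9e-3        # 8.9 ms, 7.2K RPM drive
    ram_latency_s = 10e-9          # 10 ns, main memory

    print(f"disk wait: {cpu_hz * disk_latency_s:,.0f} cycles")  # 22,250,000
    print(f"RAM wait:  {cpu_hz * ram_latency_s:,.0f} cycles")   # 25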

Biiiiiiig differences there! RAM is vastly faster. Which means it's also vastly more expensive[1]. Which in turn means that RAM is going to be the more constrained resource.



So we have determined that of the four resource types RAM is the most expensive, always-on resource. Because of that, RAM amount is the biggest driver of cost for cloud-computing providers. It's not CPU. This is why that 64MB RAM VM is so much cheaper per-hour than something with 1.6GB in it, even if they get the same CPU resources.

Because RAM amount used is the cost-center, and a 1.6GB VM is using that 1.6GB of RAM all the time, the cloud providers charge by hour of run-time. And this is fair. Now you know.
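Put as a toy pricing model: if the per-hour price is driven almost entirely by RAM held, with only a token CPU component, the familiar pricing tiers fall right out. The coefficients here are invented for illustration, not any provider's actual formula.

    # Toy per-hour price model: RAM dominates, CPU barely registers.
    # Both coefficients are made up for illustration.
    def hourly_price(ram_gb, cpu_shares=1):
        return 0.045 * ram_gb + 0.002 * cpu_shares

    print(f"64MB VM:  ${hourly_price(0.0625):.4f}/hour")
    print(f"1.6GB VM: ${hourly_price(1.6):.4f}/hour")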



[1]: How much more expensive? A 1TB disk can be had for $90. 1 TB of RAM requires a special machine (higher end servers), and will run you a bit under $12,000 at today's prices.

This conversation seems to come in waves, but there is a growing sentiment in the community that ServerFault is beginning to drown in the clueless. This is of concern for several reasons:

  • When sysadmins feel besieged they get mean.
  • When long-time answerers feel that they're being taken advantage of, they leave.
  • When the site looks like a bunch of first-timers asking questions, it is not inviting to the long-time pros we're looking to attract.

All of this is bad for the community. When sysadmins revert to dragon-in-the-datacenter behavior it certainly does drive people off the site; this is intended, but the effects are more damaging than the egos of the driven-off. We have had long-time answerers leave the site specifically because they got tired of answering the same few basic-basic questions over and over again, and then got tired of closing those questions as duplicates once a canonical answer was created.

ServerFault is unique among StackExchanges (for released sites; one or two in beta are thinking of going the same route) in being focused on professionals, and not anyone with an interest who can talk intelligently. This is a purposeful restriction in target-market, and as it turns out it's really hard to do it right when also going for open-admission.

The problem is this:

[Image: surly-target-market.png]

There are a lot more unprofessional sysadmins out there than professional ones. A LOT more. This is a classic long-tail problem. Previous attempts at building gathering-houses for sysadmins to share knowledge have gone the closed-admission route as a way of filtering the population, but none of them ever gained the ubiquity of StackOverflow. Or came even close.

Part of it is simple discoverability; if people don't know about it, they won't know that joining is a good idea.

Part of it is barriers; by giving a hurdle to get over, the lazyAdmin won't bother going over that hurdle unless they know darned good and well that it's worth the effort.

The StackOverflow model improves both of those by leaps and bounds. But what's to prevent it from turning into an IRC-style cesspit of snarkasm, mockery, and belittling? That's the hard part. So far ServerFault has done pretty well at keeping itself from falling into the IRC pitfalls, but it's continual work.

Another problem we're facing is that our discoverability has improved over the last year.

But our question and answer rates are flat or only weakly growing. Being more discoverable means that a larger portion of people interested in "server stuff" can find the site. Even though the professional share of those people (by the SF community's consensus definition) is rather small, a larger pool finding us should still mean an overall increase in question-rates.

Since question growth is flattish, where does the perception of being flooded with the unclued come from? Worryingly, it could come from more clued users leaving us and being replaced by the untrained hordes. Whatever the cause, it is causing frustration.

We've made it this far by being good about not using snark and mockery to correct those who stray from the best-practices path. If we want ServerFault to continue growing in the good way, we need to keep that up.

Genesis of Mr Grouch

It begins innocuously, a trouble-ticket from one of your users:

Printer praccounting02 no longer prints anything.
After half an hour the helpdesk has added a few notes to the ticket:

Has been happening for about a week. Seemed to happen to a few people at a time. Now no one can print.
Now it's up to you.

In the course of your investigation you discover the following:

  • The office received an upgrade to MS Office 2010 in stages over the last three weeks.
  • That upgrade project completed two days ago.
  • The user who reported it was on vacation until yesterday.
  • The department has three of the same kind of printer, but only one seems to be experiencing issues.
  • The print-server shows received and printed jobs for all three, but that one provably is not ironing paper despite what the print-logs show.

That pretty clearly points a finger at the Office upgrade as being somehow involved, but the other two printers not being affected are, shall we say, confounding variables. What's up with that one printer?

You dig deeper:

  • That one printer has some upgrades the other two don't:
    • It has a 4th paper-tray that can hold 2000 sheets.
    • It has had a memory upgrade.
  • The drivers to all three are the same, since that model of printer gets the same driver enterprise-wide.
  • That printer seems to only print whacking huge Excel spreadsheets.
  • The other two printers have the more normal mix of email, Word, and printed off web-sites but very few Excel jobs.

Ahah! Whatever is going on is related to very big Excel jobs. You relay this to the helpdesk and they're able to reproduce it with the printer by them (same model, by chance). Big Excel files, usually more than 10 pages, with at least one Hidden column. Jobs hit the printer and nothing happens, but are recorded as 'printed' by the print-server. At least it beats vomiting paper...

Suspecting that Office 2010 may have added something printer-drivers don't like, you hunt up an upgraded driver for that particular printer model. The changelog for the driver doesn't make much sense, but it is newer than what's in use by a good 18 months. You give it to the helpdesk, and it fixes their problem neatly; jobs enter and get printed as you'd expect. Hooray! A solution.

Since this is the first in-the-field driver install, the helpdesk invites you out to make sure they're doing it right. No problem; we like consistency around here. So you go with them to the affected office, doing the trouble-reporter first since it'll give them resolution.

You two walk up and announce that you need to upgrade a printer-driver to make that printer work again.

They're not buying it.

What? No! I asked you to fix the ****** printer not my *** ****** computer. Fix that!
It takes a while, but eventually the two of you convince this departmental dragon that it wasn't his fault the printer was broken, that it was the standard driver, that everyone was experiencing it, and that it was a failure of the Office 2010 Deployment Project to catch the very-large-Excel problem back in testing. He's still grumbling, but at least he lets you at his computer.

And it works the first time.

He grumbles a thank-you, and you two move on to the next station.

Just another day in the office for the Helpdesk tech (they are front-line customer-service professionals, after all), but not so much for the SA who got dragged into it. No one likes to have their good work thrown back in their face and rejected for provably wrong reasons. It gets to you after a while. And if you're like a LOT of sysadmins out there, you cultivate a fine sense of sarcasm, because so very few people out there are able to appreciate the finely reasoned research that led you to this particular conclusion.

Eventually the helpdesk will stop asking you out into the field (all that sarcasm makes their job harder, something they really don't need). Which is all to the good! Fewer end users to deal with. Except for the few who get your direct phone number or email, but sarcasm is good for that too, so that goes down to a trickle.

This is how sysadmins earn their reputation for being unapproachable grumpasaruses.

This is a defense mechanism, pure and simple.

However, other people are key parts of our jobs no matter how much some of us would wish otherwise. There is almost guaranteed to be a boss of some kind somewhere. Unless you do all of your own part-sourcing and replacement, vendors are going to touch some parts of your infrastructure. Peers in other departments. Other SAs in IRC while you troubleshoot a problem. Or end users noticing a problem not covered by the monitoring infrastructure and passing the word on.

That last bit is perhaps the most important. While the human layer of the monitoring environment is the most error-prone, it can notice errors that the rest of the automation doesn't. So it pays to be sure such error reports get to you. Which means being at least somewhat approachable.