Yes, that happens


We all know it can happen: a BIOS update of some kind bricks whatever just got flashed. But it's one of those things you hope happens to other people first so you know not to go there. It happened to me recently, which got me thinking about continuous deployment from a hardware POV. Hardware being what it is, hard, you can't iterate and roll back the way you can with software. There is no such thing as Vagrant for Embedded Systems that I've found!

The problem of, "when do I update the firmware for my server," is one that faces anyone with a physical infrastructure. There isn't really a globally accepted best-practice for this one, though the closest I can find is:

  • If the vendor lists the update as critical, apply it.
  • If you're experiencing one of the problems listed in the fixes, apply it.
  • If vendor tech-support tells you to apply it, apply it.
  • Otherwise, don't apply it.

But only apply it to a test device first to verify it actually fixes the problem. Then roll it out.
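For the code-minded, the rule of thumb above can be written down as a tiny decision function. This is only a sketch of the checklist, not anybody's official policy; the `FirmwareUpdate` class, its field names, and `rollout_plan` are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FirmwareUpdate:
    """Hypothetical record of what we know about a pending firmware update."""
    vendor_marked_critical: bool = False
    fixes_problem_we_have: bool = False
    support_told_us_to: bool = False


def should_apply(update: FirmwareUpdate) -> bool:
    """Apply only if at least one of the three reasons holds; otherwise don't."""
    return (
        update.vendor_marked_critical
        or update.fixes_problem_we_have
        or update.support_told_us_to
    )


def rollout_plan(update: FirmwareUpdate,
                 test_devices: List[str],
                 production_devices: List[str]) -> List[str]:
    """Return the flashing order: test devices first, then production.

    An empty list means the update should be left alone.
    """
    if not should_apply(update):
        return []
    return list(test_devices) + list(production_devices)
```

Calling `rollout_plan` with an update that meets none of the three conditions gives back an empty list, which is the "otherwise, don't apply it" case.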

Applying updates pro-actively is kind of risky, and only really useful in repurposing scenarios. Also, this 'best practice' assumes you have identical hardware to actually test with, which a lot of us don't, and often can't due to slight differences between servers of the same model.

So. For those of us who are working on infrastructures either small enough to not be able to afford test hardware, or diverse enough that there is no such thing as a common class of machine, what are we to do?

Hope, mostly, and trust in your vendor support contracts to ship you new hardware in case you get a brick.

Or, trust in your redundancies and treat new-firmware-updates like a lost-server outage. If you get a brick, you're still within your failure tolerance and know not to go there for the rest of 'em. This is the approach we ended up taking, and it worked. We were running without our scale-test environment for a few days but production was unaffected until we could unbrick the affected machines.
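Roughly, that approach looks like the sketch below: flash one machine at a time, and stop the moment one bricks so the rest stay untouched. The `rolling_flash` function, the `flash` callback, and the host names are hypothetical; how you actually flash a box and detect a brick depends entirely on your hardware.

```python
from typing import Callable, Iterable, List, Tuple


def rolling_flash(hosts: Iterable[str],
                  flash: Callable[[str], bool],
                  failure_tolerance: int) -> Tuple[List[str], List[str], List[str]]:
    """Flash hosts one at a time, treating each flash as a possible lost server.

    `flash` is a caller-supplied function returning True if the host came
    back healthy and False if it bricked. We stop at the first brick, so at
    most one host is lost; that only makes sense if the environment can
    tolerate losing at least one host.
    """
    if failure_tolerance < 1:
        raise ValueError("no spare capacity: a single brick would be an outage")

    hosts = list(hosts)
    updated: List[str] = []
    bricked: List[str] = []
    skipped: List[str] = []

    for position, host in enumerate(hosts):
        if flash(host):
            updated.append(host)
        else:
            # First brick: record it and leave every remaining host alone.
            bricked.append(host)
            skipped = hosts[position + 1:]
            break

    return updated, bricked, skipped
```

The returned `skipped` list is the "know not to go there for the rest of 'em" part: those machines keep their old firmware until you figure out what went wrong.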

In our case I suspect we had a v1.0 hardware revision, and the newest firmware was only backwards compatible with v1.0a and newer or something. I don't have proof of this, but that's what it feels like. Of course, this eventuality was not mentioned in the release-notes anywhere. Thus, testing.

Announcement.

Like, really early. And this is a good thing! In previous years it opened just about the time my employers stopped accepting travel requests, since they both wanted several months' lead time to get the best deals on flights. This is a good thing they're finally doing!

Not that I need it this year since LISA is in DC this time around. And on the Metro, so I won't even need a hotel.

But still, a trend I encourage!

Intern season comes soon


And we're big fans of them. Being a small company, our interns actually do interesting stuff.

This summer I'd like to have a more Ops-oriented dev-Intern, since we need help with things like:

  • Deployment automation
    • Software package install automation (not everything can be puppeted, alas)
    • OS configuration optimization
  • Automation of statistics gathering
  • Automation of new-machine staging
  • Building pretty interfaces to the automation for a diverse audience (all of our Engineers are in the on-call schedule, you see)
  • Building pretty interfaces for status tracking of how things are running (application-specific things, not just OS/HW level things)

In short, we need a DevOps intern. Or if you're old-school, we need a Systems Programmer. But this also involves more than just automation engineering! Oh, yes!

  • Working through the software-package dependency tree for upgrading the Linux distro-version underlying large parts of our infrastructure!
    • Package names change, which affects the configuration management setups!
    • Libraries perform differently!
    • Libraries go away completely, which means we need to roll them and deploy them ourselves!
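To give a flavor of the dependency-tree work, here is a minimal sketch of sorting an old release's package list into "still there", "renamed", and "gone, roll it yourself". Everything in it is hypothetical: the `diff_packages` name, the package names, and the idea that you already have a hand-built rename map.

```python
from typing import Dict, List, Set, Tuple


def diff_packages(old_release: Set[str],
                  new_release: Set[str],
                  renames: Dict[str, str]) -> Tuple[List[str], Dict[str, str], List[str]]:
    """Classify each package we use on the old release against the new one.

    Returns (unchanged, renamed, missing):
      unchanged - same name exists in the new release
      renamed   - old name mapped to its new name (config management needs editing)
      missing   - no known equivalent; we would have to build and deploy it ourselves
    """
    unchanged: List[str] = []
    renamed: Dict[str, str] = {}
    missing: List[str] = []

    for pkg in sorted(old_release):
        if pkg in new_release:
            unchanged.append(pkg)
        elif renames.get(pkg) in new_release:
            renamed[pkg] = renames[pkg]
        else:
            missing.append(pkg)

    return unchanged, renamed, missing
```

The `renamed` mapping is the part that feeds back into the configuration management setups, and `missing` is the list of things to package in-house.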

And that's not all! Such an intern is guaranteed to be involved in our August Major Maintenance window, which is very likely to include a bunch of hardware things! We don't know what that'll be yet, as that is determined by how fast we grow in the next two months, but it's likely to involve hardware, and it's likely to involve this Intern in the process of getting it integrated! It'll be a long weekend, but that's how these things work. Experience!

The Write The Docs conference is running right now, and a session just got done about search-oriented documentation (the slide-deck) and it rang all kinds of bells for me. I'm a technical user of a very wide variety of documentation, and I work in an industry that coined the term RTFM. We are consumers of documentation in all of its various forms:

  • Straight up manuals on paper, sitting on a shelf, that arrived with the product (back when manuals still shipped with product)
  • Offline manuals in CD-ROM form (in that time between when physical manuals stopped shipping and everyone had an Internet connection).
  • Online manuals in HTML form.
  • Support databases listing targeted resolutions of problems and technical notes highlighting obscure bits of config-trivia.
  • Random Internet forums with posts from fellow lost people having the same problem.
  • Random wikis.
  • Vendor-specific product forums attempting to provide a 'Community' experience to support (and take some load off of their support people).
  • Internal ticketing databases.
  • Internal and external bugtrackers.

How do we find all that crap?

Google.

This is why none of us have cracked open an offline manual (or put in a CD-ROM) in years. Our portal into documentation is the search-engine, either the majors or the one built into our internal tools (where the majors can't find it). But search is our index.

For those of us who write end-user visible documentation, keeping this in mind is paramount. Enhancing searchability means enhancing metadata like tags, so tag your doc with the words users actually search for, not just the official names of the things it documents. As a consumer of documentation I can only cheer this kind of effort to improve discoverability.
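As a toy illustration of why tags matter, here is a minimal sketch of a keyword index where a doc can be found by its tags as well as its title. The `build_index` and `search` functions and the document titles are invented for the example; real search engines do a great deal more than this.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(docs: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map every lowercase word in a doc's title or tags to that doc's title."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for title, tags in docs.items():
        for term in [title, *tags]:
            for word in term.lower().split():
                index[word].add(title)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return titles of docs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)
```

If a doc titled "Session Broker Tuning Guide" carries the tag "slow login", then a search for "slow login" finds it even though neither word is in the title. Without the tag, the user who doesn't know the product calls it a Session Broker gets nothing.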

In defense of monoculture


Sarcasm setting: Subtle (some of you will miss this disclaimer)

In recent months I've noticed a decided trend towards considering WebKit to be the Internet Default Browser. This is nothing but good, as that is most definitely a driver of industry innovation. The decision by Opera to drop Presto and adopt WebKit was one I cheered; for years Opera has been pissed on by web developers as 'weird', so hopefully this will cause more sites to put that browser-badge on their Supported Browser shelf.

Such monocultures are actually good for the web, as they provide a driver for innovation. Only having to build web-sites to a single quirk-standard makes it a lot easier to create well-working web-sites, and that drives growth. More startups can get out the door faster in order to make more money, and as we all know it's startups who disrupt industries. The fact that so many of these startups are using Chrome (and by extension WebKit) as their standard browser is a clear indication that we're heading towards another era of monoculture-centered growth.

WebKit conquered the old standard for two big, big reasons:

  1. It works on mobile. Mobile is where all the growth is these days, so working on mobile is a major, major thing. And Microsoft wasn't the first mover in this space, Apple (WebKit) was. Firefox (Gecko) didn't work on mobile until very, very recently, so it wasn't going to do it either.
  2. It also works on Macs. So many webdevs are doing their dev-work on Apple hardware these days that "works on a Mac" is a key driver for growth. IE doesn't. WebKit does.

The modern web is a decidedly heterogeneous place, so the ability to run on anything is very key. IE can't, Firefox only recently got that ability, but WebKit has already been doing that for years. A WebKit monoculture is in the cards.

At least until Google decided to fork it. We don't need more fragmentation in the web rendering spaces, we need less. This saddens me.

The push for IPv6


This is inspired by last night's LOPSA DC meeting. The topic was IPv6 and we had a round-table.

One of the big questions brought up was, "What's making me go IPv6?"

The stock answer to that is, "IPv4 addresses are running out, we'll have to learn at some point or be left behind."

That's all well and good, but for us? Most of us are working in, for, or with the US Government, an entity that is not going to be experiencing v4 address scarcity any time soon. What is going to push us to go v6 (other than the already existing mandate to have support, that is)?

In my opinion, it'll come from the edges. IPv6 is a natural choice for rapidly expanding networks such as wireless networks, and for extremely large networks like the ones Comcast/Verizon run for their kit. These are two areas that sysadmins in general don't deal with much at all (VPN and mobile-access being the two major exceptions).

If your phone has an IPv6 address and accesses the IPv4 internet through a carrier-grade NAT device, you may never notice. Joe Average User is going to be even less likely to notice so long as that widget just works. Once v6 is in the hands of the "I don't care how it works so long as it works" masses, it'll start becoming our problem.
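One way that trick can work is NAT64, where the IPv4 destination is tucked into the low 32 bits of an IPv6 address. The sketch below assumes the carrier uses the well-known 64:ff9b::/96 prefix from RFC 6052; plenty of carriers use their own prefix or a different mechanism entirely, and the function names here are mine.

```python
import ipaddress

# Well-known NAT64 prefix from RFC 6052. Assumption: the carrier uses this
# prefix rather than a network-specific one.
NAT64_WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_nat64(ipv4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of the NAT64 prefix."""
    v4 = ipaddress.IPv4Address(ipv4)
    prefix_bits = int(NAT64_WELL_KNOWN_PREFIX.network_address)
    return ipaddress.IPv6Address(prefix_bits | int(v4))


def extract_ipv4(ipv6: str) -> ipaddress.IPv4Address:
    """Recover the IPv4 destination from a NAT64-synthesized address."""
    v6 = ipaddress.IPv6Address(ipv6)
    if v6 not in NAT64_WELL_KNOWN_PREFIX:
        raise ValueError(f"{v6} is not inside {NAT64_WELL_KNOWN_PREFIX}")
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)
```

From the phone's side it is talking v6 the whole way; the translation back to v4 happens at the carrier's box, which is exactly why nobody notices.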

Once having a native v6 site means slightly better perceived mobile performance (those DNS lookups do cause a bit of latency you know), you can guarantee that hungry startups are going to start pushing v6 from launch. Once that ecosystem develops it'll start dragging the entrenched legacy stuff (the, er, government) along with it.  Some agency sites are very sensitive to performance perception and will adapt early. Others only put their data online because they were told to and will only move when the pain gets to be too much.

Business-to-business links (or those between .gov agencies, and their .com suppliers) will likely stay v4 for a very, very long time. Those will also be subject to pain-based mitigation strategies.

But the emergence of v6 on mobile will likely push a lot of us to get v6 to at least our edges. Internal use may be a long time coming, and when it does show up it'll be because of the need to connect with others.

Because...

ComplexityIsTheMindKiller.png

This is why you see outage notices like:

Things broke. We fixed it. Carry on.

And security bulletins like:

This patch fixes a remote access vulnerability in Windows.

Which tends to inflame our detail-oriented sysadminny sensibilities. Our whole world is complexity, we like to see it. Lets us know that things are normal.

Well, that'll be fun to watch


With Google shutting down Google Reader, which is how about 87% of my subscribers read my blog, it's going to be fun to watch how the reader percentages shift over the next four months. Back when I started tracking what's consuming the feed, Google Reader wasn't around and Bloglines was the over-50% leader. That's since changed.

As of right now, the #2 reader is 'unknown' at 3.4%. Mozilla's built-in reader is in the #3 spot at 1.9%.

In four months' time, when Google shuts off Google Reader, I'm sure those numbers will be radically different. I'll probably lose a very large number of subscribers from simple inertia. Hey, that happens. I'm interested to see how the feed-reading market solidifies in a post-GReader world.

Interestingly, they're not shutting down FeedBurner. Considering that the vast majority of the readers hitting FeedBurner are, well, Google Reader, I wouldn't be surprised if that also goes in the next round of Spring Cleaning.

Incubating culture


This article drifted across my social-sphere in the last couple days:

http://blog.prettylittlestatemachine.com/blog/2013/02/20/what-your-culture-really-says/

It's a critique of startup-culture, especially agile culture. But what do we all mean by culture?

Culture: It's the unwritten rules and expectations governing interpersonal relationships.


Culture is the expectation that the Owner shall never be talked to except through a manager (jumping the line is really frowned on).

Culture tells you never to leave for home until your manager has left (don't get to the office at 7am, leave at 7pm).

Culture is what keeps the newest-hire from taking more than a single day vacation near Christmas/New Years (we were all in that barrel, now it's your turn).

Culture is the expectation that you're not really working unless you are seen to be in the office with your butt in a chair (don't be the first one in the office, never leave first).

Culture is what forces you to go to after-work outings with your co-workers when it is the owner organizing it, no matter how 'optional' they say it is. (when the boss says 'optional' what he really means is he'll be very disappointed in you but won't fire you).

Culture is what causes all of your coworkers to ask you where your job-interview is when you show up to work in a button-up and tie (the only reason a dev wears a tie is to get a job).

Culture is not having a beer fridge in the office. Culture is being thought Not A Team Player if you don't drink.

Culture is not having ping-pong tournaments. Culture is being unable to get in the In Crowd if you have all the hand/eye coordination of a gerbil.

Having lived in startup-land for a while now, rubbed shoulders with the residents, chatted about work/life balance around the free meals at conferences, and all in all been more aware of people talking about startup culture, this article makes a lot of good points.

One friend of mine called out a specific line in this article, the "We don't have managers, and the company is managed without a hierarchy" one. That's one I hadn't heard of before, but apparently it's a thing. My job isn't like that. We have managers, they... manage. Like they should. Go, team!

A couple of the others are great ideas for smaller companies but completely fail to scale to larger sizes. I'm thinking of "meetings are evil, we have as few of them as possible." Culturally speaking, the failure-mode of no meetings is siloization. If you're small enough that everyone is in the same silo, it works. If you're not... problems. This line is pushed in job-adverts to attract creators, but their managers most definitely have meetings. And sometimes those meetings are sneaky, they're one-on-ones at your desk.

The "we don't have a vacation policy" thing is spot on. Without that little tickle of, "You have 12 days of vacation left, you're going to lose them if you don't use them by the end of the year," you don't actually take them. At both prior jobs, both with vacation carry-over limits, once most people had enough time on the job to hit those limits, they did hit them. Especially at WWU where I had 4 or 5 weeks of vacation; one co-worker took Fridays off for two months as a way to burn his balance back down. If left to our own devices we'd probably take 2-3 weeks a year.

My current employer is one of the "don't have a vacation policy" places, and people do not take as much vacation as they would if it were being accounted for. Due to the gobs of it I got at WWU I'm already used to just taking time off when I need it, but I am missing the 'vacation at home for a week' I ended up taking once in a while to make the books balance.

The "we have a team of people who are responsible for organizing frequent employee social events" item is not one we have (1: not VC funded, 2: not big enough yet) but I know people who work at such places. And yes, the person in charge of this is a woman, or if it's a team it's mostly women on it. The critique on diversity is very much valid.

What I did this last Tuesday


Last Tuesday was a Grand Unified Meetup of the DC-area DevOps-like groups. LOPSA DC was there, as were Crabby Admins, DevOpsDC and a bunch of others I hadn't heard of before. It was great. And now the YouTube videos are posted!

The first presentation on Ops School:

Ops School is something I first heard about at this last LISA conference, where I even spoke with the guys for a while. This is a project that sprang out of Etsy, but is worth it for the rest of us. While there are some sysadminly degree programs out there, the vast majority of us learned it on-the-job as it were. Ops School is aiming to build an actual curriculum for people to try and get basically fluent or to brush up on areas we're not all that familiar with.

I already have ideas on how I can contribute.

The second talk, on how to have a career in what it is that we do:

Occasionally profane, but a good overview of how we have jobs in this thing. And a critique of the whole DevOps idea in the first place, which is a very nice contrast to the gestalt view I've picked up elsewhere.

DevOps is all about teaching Ops to Dev.

Theo's point is that Ops is something that everyone needs to get fluent in. That's how a business, especially a web-oriented one, survives. It means that Dev learns to be cautious about making changes to live systems. It means the sysadmins learn that certain kinds of outage (the partial, some-features-are-disabled kind) can be more damaging to a company than a full-stop outage.

And Ops is not the sole purview of the sysadmin-staff; the Sales staff are extremely ops-oriented. It means customer care is a major factor in deciding which changes are worth the risk, and it's usually not the Sysadmin staff who know best what the customers want.

Anyway, watch the video.
