August 2013 Archives

On-call is a young-person's game

Or: the older you get, the stronger the imperative is to automate fault handling in your environment.

Last night I got a text at 2:36am. Being the sysadminly on-call type that I am, I leaped out of bed still half asleep. I keep my phones on a cell phone stand that's really loud when something is vibrating on it, so I've been trained to get up when that racket starts. This was my phone going buzz, which meant I missed the work-phone going buzz (I have two for a reason).

It was when the phone was in my hand that I remembered that I'm not on any on-call schedule right now.

R u male or female?

From a west-coast area code.

An interesting question as it happens, but not my infrastructure crying for parental guidance. Or worse, the actual on-call people being in over their heads and needing my help. Nope, just some lost soul hoping for love.

Shut off phone, went back to bed.

Whereupon I stayed awake another two hours.

Had this been an actual emergency, work would have been understanding of me coming in late after a late-night call-out. Like that time I dealt with a major infrastructure failure on 3.5 hours of sleep. They're nice that way.

Sleep disruption like that is getting more common as I get older. I give thanks that my infrastructure doesn't usually cry mommy in the wee hours, and if there is crying it's more likely to happen in the evenings than at 3am. That evening call-out may have me working until 3am, but that's better than getting 2 hours of sleep and getting woken up.

With Manning announcing that she'll spend her 35 years of incarceration as a self-assigned woman, the US is getting a brief look at a particularly nasty state of affairs that has been there for years. The US Army does not provide any treatment services other than mental health for people with Gender Identity Disorder. Which means Manning will spend her 35 years in a men's prison with no access to hormones, surgery, or even simple hair-removal.

If you dig out the big coverage document that came with your (US-based) health-care plan (assuming you even have one) there is a section you probably never bothered to look at titled EXCEPTIONS. This is the list of things that the plan will NOT cover. This is the list that tells you that, no, they won't cover things like:

  • Going to Aruba for your (otherwise covered) kidney transplant.
  • Costs related to medical studies of pre-market drugs and treatments.
  • Costs relating to anything the FDA labels as 'Experimental'.
  • Purely cosmetic procedures.

There is something else that almost always shows up on this list that really, really gets in the way of treating people like me and Chelsea Manning.

  • Costs related to treatment of Gender Identity Disorder.

Yep, even though the DSM recognizes GID as an actual treatable disorder, and there is even a widely accepted treatment protocol for it, it's explicitly not covered in most plans. It has been this way for decades. By the protocol, treatment of GID requires interaction with three different medical professionals:

  1. Mental health professionals who guide the person through the whole process.
  2. Endocrinologists for the administration of hormones.
  3. Surgeons for any surgeries that may be needed.

My current plan covers only the first step. They'll happily talk me out of it, but won't cover any actual medical interventions. This is the same coverage that Manning will get.

My plan at WWU didn't cover any of it. This is progress of a sort, but only a grudging one. Hormones and Endocrinologist visits are thousands of dollars a year. Surgeries such as double mastectomies will be completely out of pocket and can easily end up close to $10K. Hair removal takes years and multiple treatments (hair grows in cycles, you see).

Employers have to specifically negotiate coverage, which some do. San Francisco made news several years ago when they started covering the full costs. Several large tech companies advertise that they do so as well. It can be done; the effort just has to be made.

Why is this protocol treated so very differently than anything else?

Dicks, but I'll get to that.

The only other thing that got even close to the exclusions of GID coverage is:

  • Ovariohysterectomies in women under 30

And even that has fallen off in recent years.

Way back in the 1960's when the male-to-female surgery first became generally available, people started doing it. It was very scandalous since men were cutting off their dicks. Unfortunately, some of those transitioners experienced buyer's remorse and learned that the surgery is a one-way street, and the results aren't as good as the imagination suggests. And some of those remorse sufferers suicided.

Cue the epic pearl-clutching.

Something had to be Done, and Something certainly Was Done. Regulation started to fall down on this elective surgery in a haphazard way. It was in light of this that the Harry Benjamin Standards of Care were created in the 1970's, as a way to provide a widely accepted protocol for treatment. It worked.

However, those suicides haunted the insurance actuaries. Wrongful death suits are really, really expensive. Treating GID can lead to death, therefore, we won't cover it. QED.

That was 40 years ago, though.

One of the big reasons those early transitioners suicided was regret over not being able to have kids. The BSC is big on making clear that sterility is one of the side effects of transition, and is a major component of the mental health requirement being satisfied before going on hormones.

However, we've gotten a lot better at reproductive technology in the last 40 years. Sperm donation is a lot easier than it used to be, and they're viable longer. Egg donation is a thing now. I've known transitioners who've done gamete donation before taking the sterilization steps because of plans for maybe-kids later on.


Numbers are illustrative, not scientific. Do not cite.

40 years ago society was a lot more divided along gender lines and the concept of genderqueer wasn't really a thing. You were either male or you were not (things were also a weensy bit more sexist too), there was no between. It was a much more gender essentialist time. Men transitioning to women were told to always wear skirts, grow their hair out, and learn how to be demure (failure to comply could mean not getting access to hormones). Never mind that gender performance varies considerably even among those who never question their gender, that's a pointless detail; these people need to over-perform in order to pass at all.

Another reason those transitioners suicided was because they were crammed into a role they didn't want to fit into. Perhaps they didn't want to change their job from the one they spent 20 years in to one more in line with Women's Work like teaching, but that's what the therapists demanded... and ended up hating it. And wanting the old life back, just different. But that's impossible so...

Speaking from direct personal experience, having between be an option really takes the stress off many people who are in the middle of the gender spectrum. Not having to be shoved into a -8 or +8 on the spectrum in order to have the gatekeeper open the door for you takes a lot of the stress out of the process.

The assumptions of 40 years ago no longer hold true, and it's time for that needless exclusion to be dropped.

We're getting more people suiciding from untreated GID than we ever did for treated. The continued presence of this health-care exclusion is inexcusable discrimination.

An unexpectedly long evening.

Friday evening that is.

Right before I left for the day I noticed my computer lost network. Seeing as it's directly connected to a switch, this was surprising. When I bipped into the utility room to see what was going on, I found the switch in reboot mode and a fellow employee behind the rack doing perfectly legitimate business things.

Perfectly legitimate business things that, over the course of a year or so, had managed to work the power cable out of the Ethernet switch. We didn't get one with redundant power supplies (it's just the office network, not critical like our actual revenue systems), so this caused a switch reboot.

It didn't come up. Crap.

Very, very happily I'd already figured out what combination of serial cable and minicom settings I needed to talk to this switch over the console port so I was able to plug in and see WTF was going wrong. 

Error, /cfa0/boot.ini corrupted; please reboot to console and repair.


Happily, I already had software images on that laptop so I proceeded to set my baud rate to 115,200 and uploaded a new one via XModem. Since I was not doing this at 9600 baud, this only took about 10 minutes for an 8.2MB file.
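As a rough sanity check on why the baud rate matters so much here, a minimal sketch follows. It assumes 8N1 framing (10 bits on the wire per byte), treats the 8.2MB image as 8.2 MiB, and ignores XModem's per-block overhead and ACK round-trips entirely; the function name is made up for illustration.

```python
def serial_transfer_seconds(size_bytes: int, baud: int, bits_per_byte: int = 10) -> float:
    """Best-case time to push size_bytes over a serial line.

    Assumes 8N1 framing (1 start + 8 data + 1 stop = 10 bits per byte)
    and no protocol overhead, so a real XModem transfer will differ.
    """
    bytes_per_second = baud / bits_per_byte
    return size_bytes / bytes_per_second


image_bytes = int(8.2 * 1024 * 1024)  # the 8.2MB image, taking MB as MiB (assumption)

for baud in (9600, 115200):
    minutes = serial_transfer_seconds(image_bytes, baud) / 60
    print(f"{baud:>6} baud: about {minutes:.0f} minutes at best")
```

Under those assumptions the estimate is roughly 12 minutes at 115,200 baud against roughly two and a half hours at 9600, which is the same ballpark as the transfer described above.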

Software image corrupted.

Bugger. Looking around the file-system I saw a strange directory in there:


Huh. A ".Trash-$Username" folder is dropped by Gnome2 on removable media if something is deleted on it. How in blazes did that get onto a factory firmware image? A bit of Googling brought me to a certain HP Customer Advisory. Yep, looks like from 2009 to May of 2010 that directory was indeed baked into switches, and was definitely causing problems.

Since my switch was in the "switch has already failed to reboot or failed on software update" state, I had to follow that workflow. Running the given lshw command and removing the bad boot.ini file did allow the switch to boot into its normal state. I tried updating to "a switch software version which automatically removes the extraneous files", but no matter how I tried to update the firmware I got

Software image corrupted.

USB, TFTP, even another XModem upload. Same thing, every time, from fresh downloads even. Clearly, this option wasn't going to work for me, so I had to go to the "show tech custom" script they mention.

Fraught with peril.

Google is famous for their 20% policy, where their creative people are allowed 20% of their normal work-time to work on untracked projects. This is where some famous products have come from. It's a way for Google to further capture the wisdom of crowds in coming up with interesting projects to capture more market.

A 20% policy, one day a week, is a very admirable thing and has been copied many places. However, it is not a stable policy over the long-haul without a lot of maintenance work from On High. The normal give and take of business means increasing output from said creative professionals is a sound business move; a 20% policy deliberately introduces inefficiencies into that, and runs counter to how things work.

It looks like Google has given in on the maintenance work. They've introduced Stack Ranking into their management structure, which is exactly the kind of disincentive needed to turn 20% time into 20%-overtime (or 120% time as they seem to be calling it). How Google is using Stack Ranking is to identify the bottom 20% of their creative professionals and get them mentoring or other help to get them to do better. It's employment enrichment, and helps the company as a whole maintain a productive workforce.

However, to those creative professionals it sounds like...

We've noticed you suck. How can we help you suck less?

Which is a really, really great way to get people to stop using their 20% time and focus 100% on the job they're paid for, and seem to be at risk of losing. If they've got any 20% projects going... they'll be sidelined or pushed into after-work territory.

This effect extends to the entire workforce, as the specter of the Grim Disapprover coming for the slowest 20% is a good incentive to not be in that 20%. So perhaps I should give more time to my tracked projects? Yeah...

And now the 20% policy is a 120% time policy.

This is a more metastable state than the old 20% policy was. Google gets 100% out of all but their most self-assured employees, and the really motivated are working away on projects that could be big in the future. The only cost is allowing 'personal use' of the Google Infrastructure. Even better, those really-motivated employees are doing this work beyond what they'd normally give to just daily work!

20% time, as originally implemented, is a great way for a company to innovate massively to gain entry into many markets. The Google of 2002 needed that. The Google of 2013 doesn't so much. Backsliding into a more stable state makes sense.

Oh look, recognition

My LOPSA membership is up for renewal so I was dealing with that today. I practically never log in over there, so I missed an addition to their profile page.


They've got more than one option for gender in there! Demographically, I'm likely not the first one to check that box. Likely not even in the first five (jokers notwithstanding), though first ten is much more probable. But it's there! And LOPSA is big enough that there are probably more of us than statistical noise! Yay!

On marketing, input into

One of the side-effects of working for a much smaller company is that I'm now solicited for input into marketing things. Copy, images, scripts, videos, etc. I realized not long after starting that I really shouldn't comment on them because I never have anything constructive to say about them. Constructive criticism is good, simple negative criticism is to be avoided unless it's to illuminate something egregiously offensive that could reflect badly on us (which has happened, this is a cross-check that worked).

This is in large part because I've been in a job title that gets pitched by vendors hoping to get my organization to drop five or six figures on a nice little bit of technology (with attached annually renewed maintenance contracts), so I'm cynically familiar with what that looks like. The cynical bastard bonus I get to my save vs. marketing is impressive; I say "um, no. Just.... no" really, really easily.

A very large part of how I grew into this impressive bonus is having spent 7.5 of the last 10 years working for a governmental organization. This means that my lead time for 5-6 figure purchases that are not part of a master-contract arrangement (tool of choice by civil servants everywhere for avoiding RFPs at all costs) is 8-24 months depending on how big it is.

That same time was also spent in a title that only has "recommend" or "influence" powers over purchases. Worthy of being pitched; but if I put up a fight, also worthy of doing an end-run around. Happily my bosses were even better at saying, "Can't do lunch, it's illegal. No." than I was.

Also, for most of those 7.5 years my organization was working from a place of fiscal austerity. It gets really easy to say no when there isn't any money to spend. And very frustrating when money needs to get spent, but isn't allowed to be spent.

Do you anticipate a need for this in the next [quarter|half year]?

Yes, but any purchase orders won't be cut until #{now.add(:months => 18)}, and that's after we get it pushed through the budget process. Sorry.

This awesome pricing I got you is only good until the end of the month, when our quarter ends. If you can get a PO cut by then, you'll be golden.

I'd love to, but due to fiscal austerity all purchasing decisions of this size are being routed through the state capital and our own Purchasing department tells me that the average lag time is 4 months right now. Thanks for trying.

We can come in and pitch it to the right people, help them understand the problem. We want to help you.

Very thoughtful, but I'm the right person to talk to for this, and the fiscal/political environment right now says we can't afford a solution like this even though we need one. Sorry. Talk to me next year, maybe the Legislature will give us more money by then.

All of this made me the wrong person to talk to about assessing the impact of marketing copy. If we were pitching to cynical bastards like myself, then I could definitely help with that. CB's like me are damned hard to pitch at since we dismiss so much of the presented information as we attempt to distill what is real out of the spit-shine. But we're not pitching to CB's like me, so I don't poke my nose in.

Happily, we have many people who don't have "avoid getting sold stuff costing lots of money" as part of their work-history, so they are able to provide this needed feedback. I celebrate them.

This is one trade I'm not a jack of.

The new era of big storage...

full of flash. And that changes things.

Not a surprise at all to anyone paying attention, but there it is. Flash is changing things in many ways:

  • Hybrid SSD+HD drives are now out there on the market, bringing storage tiering to the consumer space.
  • SSD is now kind of a standard for Laptops, or should be. The cheap option still has HD on it, but... SSD man. Just do it.
  • One SSD can fully saturate a 6Gb SATA or SAS link. This changes things:
    • A channel with 12 of those things is going to seriously under-utilize the individual drives.
    • There is no way a RAID setup (hardware, software, or ZFS) can keep up with parity calculations and still keep the drives performant, so parity RAID of any stripe is a bad choice.
    • A system with a hundred of these things on it, channeled appropriately of course, won't have enough system-bus speed to keep them fed.
  • Large scale enterprise systems are increasingly using a SSD tier for either caching or top-level tiering (not all solutions are created equal).
    • ZFS L2ARC + Log
  • They're now coming in PCIe flavors so you don't even have to bother with a HBA.
    • Don't have to worry about that SAS speed-limit anymore.
    • Do have to worry about how many PCIe slots you've got.
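The channel arithmetic behind those bullets can be sketched quickly. The 8b/10b encoding figure is standard for 6Gb SATA and SAS; the 500 MB/s per-SSD throughput and the 12-drives-on-a-4-lane-wide-port layout are assumptions picked for illustration, as are the function names.

```python
def usable_mb_per_s(line_rate_gbit: float) -> float:
    """Payload bandwidth of one SATA/SAS lane.

    6Gb SATA and SAS use 8b/10b encoding, so only 8 of every 10 bits
    on the wire carry data: 6 Gbit/s works out to about 600 MB/s.
    """
    return line_rate_gbit * 1000 * (8 / 10) / 8


def per_drive_share(drives: int, lanes: int, line_rate_gbit: float = 6.0) -> float:
    """MB/s each drive gets if every drive on the channel is busy at once."""
    return lanes * usable_mb_per_s(line_rate_gbit) / drives


SSD_MB_PER_S = 500.0  # assumed sequential throughput of one SATA SSD

# 12 SSDs behind a 4-lane wide port (assumed layout)
share = per_drive_share(drives=12, lanes=4)
print(f"each SSD gets {share:.0f} MB/s, {share / SSD_MB_PER_S:.0%} of what it could deliver")
```

With those numbers each drive is held to well under half of what it could push on its own, which is the under-utilization the list is pointing at.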

Way back in elder days, when Windows NT was a scrappy newcomer challenging the industry dominant incumbent and said incumbent was making a mint on selling certifications, I got one of those certifications to be a player in the job market (it actually helped). In the studying for that certification I was exposed to a concept I had never seen before:

The Hierarchical Storage Management System.

NetWare had hooks for it. In short, it does for files what Storage Tiering does for blocks. Pretty easy concept, but required some tricky engineering when the bottom layer of the HSM tree was a tape library(1). All scaled-out (note, not distributed(2)) storage these days is going to end up using some kind of HSM-like system. At the very tippy-top you'll get your SSDs. They may even be in the next layer down as well. Spinning rust (disks) will likely form the tier that used to belong to spooling rust (tape), but they'll still be there.
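For anyone who hasn't met HSM, the core idea fits in a toy sketch: files that sit idle too long get pushed down a tier. Everything here (tier names, thresholds, the `FileEntry` type, the function name) is hypothetical and has nothing to do with how NetWare or any real product implemented it.

```python
from dataclasses import dataclass

TIERS = ["ssd", "disk", "archive"]  # fastest to slowest; names are illustrative


@dataclass
class FileEntry:
    name: str
    tier: str
    idle_days: int  # days since last access


def demote_cold_files(files: list[FileEntry], max_idle_days: dict[str, int]) -> list[FileEntry]:
    """Move each file that has sat idle too long down one tier.

    max_idle_days maps a tier name to the idle threshold for that tier;
    the bottom tier has nowhere to demote to, so it is left alone.
    """
    moved = []
    for f in files:
        level = TIERS.index(f.tier)
        limit = max_idle_days.get(f.tier)
        if limit is not None and f.idle_days > limit and level + 1 < len(TIERS):
            f.tier = TIERS[level + 1]
            moved.append(f)
    return moved


files = [
    FileEntry("hot.db", "ssd", idle_days=0),
    FileEntry("last-quarter.xlsx", "ssd", idle_days=45),
    FileEntry("2009-audit.pdf", "disk", idle_days=900),
]
for f in demote_cold_files(files, {"ssd": 30, "disk": 365}):
    print(f"{f.name} -> {f.tier}")
```

A real system would also promote files on access and track placement per block or per file in metadata; this only shows the demotion half.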

And that tier? It can RAID5 all it wants. It may be 5 disk sets, but it'll have umpty different R5 sets to stripe across so it's all good. The famous R5 write-penalty won't be a big issue, since this tier is only written to when the higher tier is demoting data. It's not like the HSM systems of yore where data had to be promoted to the top tier before it could even be read, we can read directly from the slow/crappy stuff now!(3)
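The write penalty mentioned above comes from the read-modify-write cycle a small RAID5 write needs. A minimal sketch, with tiny made-up blocks and hypothetical function names, that counts the I/Os and checks the parity shortcut against a full recompute:

```python
from functools import reduce


def parity(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together to get the RAID5 parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def small_write(stripe: list[bytes], old_parity: bytes, index: int,
                new_data: bytes) -> tuple[bytes, int]:
    """Update one data block the read-modify-write way.

    Costs four disk I/Os: read old data, read old parity,
    write new data, write new parity.
    """
    old_data = stripe[index]                               # I/O 1: read old data
    new_parity = parity([old_parity, old_data, new_data])  # old_parity was I/O 2
    stripe[index] = new_data                               # I/O 3: write new data
    return new_parity, 4                                   # I/O 4: write new parity


# Four data blocks, as in a 5-disk set (values are arbitrary)
stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]
p = parity(stripe)
p, ios = small_write(stripe, p, index=1, new_data=b"\xff\x00")
print(ios, p == parity(stripe))
```

One logical write turning into four physical I/Os hurts on a busy primary tier, and matters much less on a tier that only sees writes when data is demoted.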

All flash solutions will exist, and heck, are already on the market. Not the best choice for bulk-storage, which is why they're frequently paired with big deduplication engines, but for things like, say, being the Centralized Storage Array for a large VM (sorry, "private cloud") deployment featuring hundreds/thousands of nearly identical VMs... they pay off.

Spinning disks will stick around the way spooling tape has stuck around. Farther and farther from the primary storage role, but still very much used.

[1]: Yes, these systems really did have a tape drive as part of a random-access storage system. If you needed a file off of tape, you waited. Things were slower back then, OK? And let us not speak of what happened when Google Desktop showed up and tried to index 15 years worth of archival data, and did so on 200 end-user workstations within a month.

[2]: Distributed storage is another animal. The flash presence there is less convincing, but it'll probably happen anyway.

[3]: Remember that bit about Google Desktop? Well... "How did we go from 60% used to 95% used on the home-directory volumes in a week? OUR USERS HAVEN'T BEEN THAT USERY!!!" That's what happened. All those brought-from-archive files now landed on the precious, precious hard-drives. Pain teaches, and we figured out how to access the lower tiers.
