January 2020 Archives

Lost history: service pack 1

I learned recently that one of the absolute maxims of Systems Administration has somehow gotten muddled in this SaaSy world we now live in. I refer to, of course:

Never deploy the 0 release, always wait for the first service-pack.

There are people working in Operations-like roles, and release-engineering roles, who haven't lived this. You don't see it as much in a micro-waterfall environment where you're doing full-ritual releases every 2/3/4/5/6 weeks, or if you're only shipping software to your own servers. But people forget that there are still companies shipping actual-software for installation on customer hardware.

In the interests of history, I give you the social factors for why the dot-zero release, or the P0 release, or the hotfix 0 release is as buggy as it is. Especially if it's a major version increment.

In this hypothetical company, major releases happen every two years. Point releases happen quarterly, and hotfix-releases happen once a month (or more often if needs require). This company sells to other companies running Red Hat Enterprise Linux or SUSE Linux Enterprise Server, not the latest LTS from Ubuntu, and certainly not a 'rolling release'. These target-companies expect a year's warning before dealing with a major upgrade with changed UI and possibly changed behavior.

12 months from release of version +1

Engineering has been working on this for the last year already, in-between hotfixing the old code-branch. Feature-set is mostly finalized by Product. Sales starts running some features past customers (under NDA of course) to see how they like it.

6 months from release of version +1

Feature-set is locked in by Product, Marketing starts building campaigns.

Engineering continues to work, certain features are on track and testing well. Others, not so much.

Sales or Customer Success starts having Roadmap meetings with customers, especially customers that have either shown interest in the new features, or flatly demanded the features as a condition of not churning.

3 months from release of version +1

Engineering takes a hard look at progress, and is concerned. Most of it is there, but a few of the features demanded by potential-churn customers aren't on track because they were more complex than expected.

Marketing is already submitting advertising campaigns to print-media, and is deep in development of web advertising.

Sales/Customer Success is giving monthly updates to those churn-risks to keep them reassured that we have their needs in mind, and it will be fixed real soon. Promise.

Engineering gets an extra sprint to finish the release, so the release is pushed N weeks. This is the last reprieve.

2 months from release of version +1

Engineering is pretty sure that at least two of those must-have features won't be ready by the scheduled release date, but Marketing is already queued up, and those revenue-risk customers are quite keen to get their fixes. After high-level discussion with Product, polish work on the other features in the release is put on the back-burner while more people are put into dealing with the must-have features.

1 month from release of version +1

Disaster strikes: a major engineering casualty in the shipping version draws four whole days of Engineering time away from developing the new version. An out-of-schedule hotfix is released. Engineering asks for a delay, or to bump one of the must-have features to a point-release instead. Marketing, which has already built an entire campaign around that one feature, says they're committed to it and we don't lie to customers. Engineering toys with the possibility of going into crunch-time.

The Managed Hosting division -- we install and maintain this so you don't have to -- who has been running the new version for certain 'beta' customers for a few months now, comes to Engineering to say the installer is an automation disaster and we need to fix it before we go live.

2 weeks from release of version +1

Work on a revised installer for the version starts.

Engineering is now very sure that at least two features are not production-ready, and says so. There is a lot of arguing, but Executive says ship it anyway. Engineering focuses work on getting each of those features working specifically for the use-case of the highest value customer calling for it. Meanwhile all the other features are missing a fair amount of polish. They work, but have some sharp edges. Documentation-fixes are developed.

Release day

Boom. Ring the gong, send the triumphal email, eat the cake, this thing is out.

Engineering knows they shipped crap, gets working on those last features, aiming to make them fully functional for the .1 release in a quarter. Meanwhile, polish on the rest of the features gets scheduled in for the hotfixes.

2 weeks after release

Customer Success works with those high value customers, and hears back that the features are there, true, but they're not exactly... functional. The churn risk has been pushed a quarter. Customer Success gets on Engineering to prioritize work on those features. Which Engineering was already doing, because professional pride is a real thing.

First hotfix release, 1 month after release-day

Engineering ships a lot of polish, including a more functional installer. Support breathes a sigh of relief, because they now have a fix for a lot of annoyance-tickets.

Second hotfix release, 2 months after release-day

Most of the features are now working pretty well, but the gnarly ones still have rough edges. Shipping full fixes for those is postponed to the .1 release due to scope of change.

Hotfix 3 for the .0 version and the .1 release, 3 months after release-day

Engineering ships those hard features in a way that actually works. Support cheers. Customer Success brings this to their cranky customers as proof we have their values at heart.

Social-factors the whole way. If Engineering were the sole driver of the release day, all the expectation-setting done by Marketing would be blown out of the water and the company would get a reputation for being unreliable. If Marketing were the sole driver of the release day, the quality of the release would be absolute shit. Most companies strike a balance between these two extremes, but it does mean the .1 release is the one with the finishing touches.

In a SaaS world, this can apply to major blockbuster features. Perhaps supporting more than one team on a billing-code is really hard, so you take a year to refactor a great many things. Being a frequently asked-for feature, this gets sent out to high value customers (who need multi-team billing) as a coming feature to keep them from churning to another SaaS provider. Engineering gets close, but not quite. Maybe that one customer agrees to be the beta-test, so they get the feature-flag turned on. Or maybe feature-flags aren't a thing, so this feature gets pushed before all the knobs are installed in order to stop churn, and Support ends up untangling a lot of billing problems that first billing-cycle.
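
For anyone who hasn't worked with feature-flags, here is a minimal sketch of the beta-customer case in Python. Everything in it is hypothetical and made up for illustration: the FeatureFlag class, the customer names, and the invoice_lines function do not come from any real billing system. The idea is only that the not-quite-done code path runs for the one customer who agreed to test it, and everyone else stays on the old behavior.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """Hypothetical per-customer flag; not modeled on any real flag service."""
    name: str
    enabled_for_all: bool = False
    beta_customers: set[str] = field(default_factory=set)

    def is_enabled(self, customer_id: str) -> bool:
        # On for everyone once the feature is done, or only for opted-in beta customers.
        return self.enabled_for_all or customer_id in self.beta_customers


# Assumption: one high value customer ("bigcorp") agreed to be the beta-test.
multi_team_billing = FeatureFlag("multi_team_billing", beta_customers={"bigcorp"})


def invoice_lines(customer_id: str, charges_by_team: dict[str, float]) -> list[tuple[str, float]]:
    """Return (label, amount) invoice lines for one billing-cycle."""
    if multi_team_billing.is_enabled(customer_id):
        # New path: one line per team, sorted by team name.
        return sorted(charges_by_team.items())
    # Old path: everything rolled up under a single billing-code.
    return [("all-teams", sum(charges_by_team.values()))]


charges = {"ops": 10.0, "dev": 5.0}
beta_invoice = invoice_lines("bigcorp", charges)
regular_invoice = invoice_lines("smallshop", charges)
```

Flipping enabled_for_all to True is the "ship it to everyone" moment. Without a flag like this, the only switch available is the deploy itself, which is how Support ends up holding the bag.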

Keep these factors in mind for anything that is expected to take a long time to develop.