Every time the topic of documentation comes up at work, and this has held across multiple workplaces, someone says a variant of the following:
"What we really need is markdown in a git repository. We get version control, there is a lot of tooling to make markdown work good in git, it's great."
And every time I have to grit my teeth and hope I don't cause dental damage. My core complaint is that internal documentation has fundamentally different objectives than open source software documentation repositories, and pretending they're the same problem domain means we'll be re-having the documentation discussion in 18 to 24 months.
There are many examples of OSS projects using markdown or asciidoc for their documentation repository, and it works pretty well for them. Markdown and asciidoc are markup, which allows compilers to turn the marked-up doc into rendered sites. This makes accepting contributions from the community much easier, because docs follow the same merge-request workflow as code. As most OSS projects are chronically under-staffed, anything that allows reuse of process is a win. Also, markdown and asciidoc are relatively simple formats, so you don't need expensive software like Adobe InDesign to write them.
OSS project docs focus on a handful of jobs to be done, which map to the questions readers bring:
- How to install the thing
- How to configure the thing
- How to upgrade the thing
- How to build various workflows the thing allows you to do
- Troubleshooting tips for the thing
- How often to expect releases of the thing
- How to integrate with other things, if this thing allows integration
- How to use the thing's API
- Where to find the thing's SDK for various languages
Corporate internal documentation repositories need to do all of the above, but generally for a much wider range of things and services. Cool, that's what standards are for. But "markdown in a git repo" goes a bit off the rails when you look at all the other types of documentation internal docs often cover:
- On-call rotation standards and contacts
- Pager-playbooks for the page-out alarms
- Incident Management program procedures and definitions
- Post incident review documents for each incident
- Service maturity standards for being allowed in prod
- Ownership documentation linking services to individual teams (updated or re-created after each reorg)
- Decision docs for implementing features or updating process
- Roadmap documentation going out three years (new docs generated quarterly)
- How to set up your development environment
- How to access prod, and who is allowed to access prod
- Protocols for accessing the datacenter hardware or cloud config consoles
- The entire software development lifecycle (SDLC) including how CI works, what tests are required when, how tests are selected for inclusion, which linters are included, and when it's allowed to ignore all that because of an emergency
And so on. The sneaky part here is that OSS projects have many of the above as well, but they're kept in things like Google Docs, Etherpads, wikis, Twilio, and Canvases in Slack: many places that definitely do not involve the merge-request workflow into git. All of these extra internal documentation jobs to be done greatly complicate which solutions count as viable, in large part because this huge list is actually trying to 'simplify' multiple documentation styles into a single monolithic document repository. What styles are these? Well:
- Product documentation, describing how to install, configure, and maintain the product.
- Process documentation, describing the ways various people-driven procedures are done, such as the incident management process and the number of review meetings that need to be held before a feature is released to production.
- Decision documentation, which evolves over time as people work through what an ultimate decision will look like, changing their minds along the way. Post-incident review docs are of this type.
- Responder runbooks, used by people responding to incidents to use pre-defined (and risk vetted) procedures as part of incident response.
- Maintenance runbooks, used by operators of the system to do various things, which is often based on a combination of product and process documentation, to create a grand unified procedure in one document.
All of these documentation styles need somewhat different document lifecycles, which in turn drives the need to support different workflows. A document lifecycle ensures that documentation is valid and up to date, and that old information is removed. Sometimes documentation is a key part of compliance with regulation or industry standard-setting bodies, which adds review steps.
- Product documentation probably needs multi-step reviews to ensure updates are valid. Confluence is terrible for this, git is less bad. Product docs also need regular review for freshness, and pruning of no longer relevant docs.
- Process documentation less obviously needs multi-step review. Some will, some won't. Freshness is key, since process documentation describes the how of operating the system or accessing human processes, and old docs pollute search results.
- Decision documentation definitely does not need multi-step review. It needs to be updated by anyone involved, and may be surplus to requirements once the feature is built. In fact, these docs need to allow collaborative editing, like Etherpad or Google Docs, making them fundamentally unsuited for a git-based workflow. However, having such docs still around is occasionally useful later, when someone tries to figure out "who thought this was a good idea, and why didn't they consider this obvious failure case?"
- Responder runbooks can also have compliance interactions; if so, these need multi-step review for risk management decisions. If not, they're probably a per-team free-for-all. Because responder runbooks often cover rare errors, which are nigh impossible to check for freshness, these are the least likely to be verifiably up to date.
- Maintenance runbooks run the gamut from per-team free-for-all to onerous multi-step review process, all depending on the risks of doing the thing and the nature of the business.
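To make the differences in the list above concrete, here is a rough sketch of those lifecycles as a small policy table with a freshness check. Everything named here is hypothetical: the `Lifecycle` fields, the `LIFECYCLES` table, the `needs_freshness_review` helper, and especially the day counts, which are made-up placeholders rather than recommendations. Treat it as one way to write the comparison down, not a description of any real tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Review(Enum):
    NONE = "none"              # anyone involved can edit freely
    VARIES = "varies"          # depends on team, risk, or compliance needs
    MULTI_STEP = "multi-step"  # several sign-offs before publishing


@dataclass(frozen=True)
class Lifecycle:
    review: Review
    collaborative_editing: bool    # needs live co-editing, Etherpad style
    freshness_days: Optional[int]  # placeholder interval; None = no scheduled check


# Paraphrases the list above. The intervals are assumptions for illustration.
LIFECYCLES = {
    "product": Lifecycle(Review.MULTI_STEP, False, 180),
    "process": Lifecycle(Review.VARIES, False, 90),
    # May be surplus once the feature ships, so no scheduled re-review.
    "decision": Lifecycle(Review.NONE, True, None),
    # Rare errors are hard to re-verify on a schedule, so none is modeled.
    "responder_runbook": Lifecycle(Review.VARIES, False, None),
    # Freshness is not discussed above for these, so none is modeled.
    "maintenance_runbook": Lifecycle(Review.VARIES, False, None),
}


def needs_freshness_review(style: str, last_reviewed: date, today: date) -> bool:
    """Return True when a doc of this style is past its re-review interval."""
    interval = LIFECYCLES[style].freshness_days
    if interval is None:
        return False
    return today - last_reviewed > timedelta(days=interval)
```

Even in this toy form, the table shows that no two styles share quite the same row, and only decision docs need collaborative editing, which is the requirement a merge-request workflow handles worst.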
Ideally, the high lifecycle docs like product and process documentation would be in one system, with the minimal lifecycle docs like decision review and responder runbooks in another system entirely. This would allow each system to cater to the needs of the styles within, and solve more of the business' problems. I would like a two-system solution very much.
Except.
People have spent the last 25 years being trained that how you find documentation is:
- Look in the obvious place. If you don't find it....
- Search Google. If that doesn't work, retry your terms. If after three tries you still haven't found it....
- Complain on social media.
A two doc-system solution is not well tolerated, and people will build a "universal search" engine to search both the high and low lifecycle repositories. Also, two doc systems seem like a lot of overhead. And how do you make sure the right docs go in the right system? Why not use one doc system that's sort of okay at both jobs and save money? Then, 18 to 24 months later, discontent with how bad the "sort of okay" solution is rises, and people advocate moving to a new thing, and suggest markdown in a git repo.