October 2022 Archives

When should you solve a problem by building a solution, and when should you buy one? This is an age-old problem in the software industry, with no wide consensus beyond a simple aphorism.

Build your core competencies, buy everything else. This allows you to focus your creative efforts on what you do best.

But there is a world of nuance in this seemingly simple statement. At one far end of the argument you have Google: build everything, because no one else operates at our scale. At the other far end you have new startups: make other people do everything except our core business logic, because that gives us the speed to establish ourselves in the market.

Let's take a look at the radically different incentives faced by Google and a generic new startup:

What resources do I have to build a new solution?

Google: We're one of the biggest software engineering companies in the world, so we have a lot of engineering talent.
Startup: We're five people and a big idea. Every engineer is precious.

What resources do I have to buy a new solution?

Google: So much money, but the third parties who can work at our scale are all direct competitors in different ways.
Startup: I have VC money, and may have Board members from SaaS companies who will give us discounts.

What solutions are available on the market for a company my size?

Google: Major foundations like the Cloud Native Computing Foundation and the Apache Foundation may have platforms from companies near our size we can adapt.
Startup: A whole universe of SaaS providers focused on solving my every need.

If I need to buy something from a new vendor, how hard is it?

Google: All new vendors need to go through a Vendor Security Assessment, and once that's complete, Legal needs to ensure certain standard contract clauses are in place before we can start cutting purchase orders. Maybe two quarters before it can be used at all, then another two quarters to production-scale it.
Startup: The CTO has a company credit-card for a reason. Two days to buy, maybe three sprints to get it into production.

If we need to use a new OSS platform, how hard is it?

Google: Throw 30 engineers at it to solve the scaling problems and ship code upstream. Proof of Concept in one quarter, another two to solve the production-scaling issues.
Startup: Buy our cloud vendor's managed version if at all possible, or something off their marketplace if not; one engineer works to build a POC. Maybe three sprints to production.

You begin to see how different the incentives are between one of the biggest software engineering companies on the planet and the new kids. The new kids are all buy buy buy for a lot of good reasons, where Google is build build build for equally valid reasons.

Looking at this from an order-of-magnitude point of view, a Startup has a single- to double-digit number of engineers (1-99). After a few VC rounds they may get into the triple digits (100-999 engineers). A Google has five to six digits of engineers, which makes it a radically different company in so many ways. Each engineer at the startup is represented by at least one whole team at Google, more likely a group of teams, and sometimes an entire department.

But what about the middle ground, the companies with four-figures of engineers? Where do they fit on the build/buy continuum?

It turns out that's really freaking hard. A four-digit-engineers company likely has the same barriers to onboarding new vendors that Google had above (vendors have to go through a VSA, Legal needs special things in the contract). At the same time, they have a much shallower engineering pool to draw from to build things instead. This is the most awkward spot to be in.

  • (buy) Getting new SaaS vendors onboarded enough you can give them money is a lot of painful work.
  • (build) Your engineering depth is pretty great for your core business work, but prone to availability outages for non-core work. You can build it, but once the motivated engineers move on, the built system often decays.

It turns out these 1,000 to 9,999 engineer companies are where another dynamic emerges, one that pushes the decision needle towards build before building is really a great idea: the speed of adding headcount versus the speed of paying a vendor.

Adding headcount is a fully pipelined service in a company this size. They're hiring engineers all the time, doing it in multiple legal jurisdictions, and have internal assets dedicated to making this process as frictionless as possible. Engineers are classified on a scale from 1 to 7 (or 8, or 9) which changes how they can be used. If you need to throw engineers at a problem, and have the open requisitions for it, you can have butts in seats in less than three months. Engineers are (an expensive) commodity.

Contrast this with adding a new vendor. Before any contract can be signed, the Security group needs to do a Vendor Security Assessment of the vendor to ensure they're up to your company's standards with regard to safe handling of data and a variety of other things. This VSA process can take months and is prone to backlogs if the VSA processors are themselves clogged with work. Once the VSA is done, negotiating the contract comes next. Many companies try to add clauses to contracts to clarify ambiguities or to push for different treatment, which all means legal back-and-forth that adds time. Privacy teams need to assess the vendor for what types of private data the vendor is allowed to touch, which is its own process. Then the contract gets signed. This can take up to a year end to end. Each vendor is a unique snowflake.

But what if we don't have open reqs for headcount?

Even then, there are incentives to build. In this case, you're looking at moving engineering priorities around to make room for the development work on the new thing. That's the nice thing about headcount: you can use it for more than what you initially bought it for, something a vendor solution rarely provides.

Never forget accounting, though. Costs of employees are often tracked rather differently than costs of suppliers, which masks the full cost of ownership of a homebrew solution versus a vendor-provided solution. Most costing analysis considers engineering capacity to be "affects future roadmap", whereas vendor solutions come with direct costs measured in currency and profit/loss statements. If you want to be serious about like-for-like comparisons of a homebrew vs vendor solution, factor in the salary/bonus costs of the people in charge of building and then later maintaining the solutions.
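To make that like-for-like comparison concrete, here is a minimal sketch of the arithmetic in Python. Everything in it is an assumption for illustration: the class and function names are made up, the model is deliberately crude (flat yearly costs, no discounting, no attrition), and the numbers in the example are placeholders rather than data from any real company. The one thing it is meant to show is that engineer-years get priced in currency on both sides of the ledger.

```python
from dataclasses import dataclass


@dataclass
class BuildOption:
    # All fields are hypothetical inputs you would supply yourself.
    build_engineers: float           # engineers assigned during the initial build
    build_years: float               # how long the initial build takes
    maintain_engineers: float        # engineers keeping it running afterwards
    loaded_cost_per_engineer: float  # yearly salary + bonus + benefits


@dataclass
class BuyOption:
    annual_license: float            # what the vendor charges per year
    onboarding_engineers: float      # engineers tied up in VSA support and integration
    onboarding_years: float          # time from first contact to production
    maintain_engineers: float        # engineers who babysit the integration
    loaded_cost_per_engineer: float  # yearly salary + bonus + benefits


def build_tco(opt: BuildOption, horizon_years: float) -> float:
    """People cost of building, then maintaining, over the horizon."""
    build_years = min(opt.build_years, horizon_years)
    maintain_years = horizon_years - build_years
    engineer_years = (opt.build_engineers * build_years
                      + opt.maintain_engineers * maintain_years)
    return engineer_years * opt.loaded_cost_per_engineer


def buy_tco(opt: BuyOption, horizon_years: float) -> float:
    """People cost of onboarding and upkeep, plus license fees once live."""
    onboarding_years = min(opt.onboarding_years, horizon_years)
    run_years = horizon_years - onboarding_years
    engineer_years = (opt.onboarding_engineers * onboarding_years
                      + opt.maintain_engineers * run_years)
    return (engineer_years * opt.loaded_cost_per_engineer
            + opt.annual_license * run_years)


# Placeholder numbers only, chosen to keep the arithmetic easy to follow.
build = BuildOption(build_engineers=4, build_years=1,
                    maintain_engineers=2, loaded_cost_per_engineer=250_000)
buy = BuyOption(annual_license=300_000, onboarding_engineers=1,
                onboarding_years=1, maintain_engineers=0.5,
                loaded_cost_per_engineer=250_000)

build_total = build_tco(build, horizon_years=5)
buy_total = buy_tco(buy, horizon_years=5)
print(f"build: {build_total:,.0f}  buy: {buy_total:,.0f}")
```

With these made-up inputs the build option costs 12 engineer-years and the buy option costs 3 engineer-years plus four years of license fees. Which one wins depends entirely on the numbers you plug in; the point is only that the build side stops looking free once people are priced in.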

If you're in one of these mid-size, four-figures-of-engineers companies, there are a few things you can do to try to de-skew the incentives at play and get closer to the business-focused reasons for a build or a buy decision:

  • Adopt a total-cost-of-ownership model. This model should include the salary/benefits costs associated with the people needed to keep the solution running and maintained. By adding people-costs, you fight the speed skew of how fast you can get new employees versus how long it takes to add a new vendor.
  • Consider including reputational enhancements in cost/benefit analysis. If you're considering adopting an open source product, a company of this size has enough engineering talent to contribute back to the project and the industry. This increases your company's influence on overall industry direction, is a useful source of "engineering culture" blog posts for your recruiters, increases the visibility of your Developer Relations people, and improves your company's visibility in conference talks.
  • Consider engineering onboarding impacts. If you're using an open source project, or a commonly used SaaS vendor, people are far more likely to walk in the door on day 1 already knowing how to use those systems. If you drank deeply from the Google well and built everything yourself, new people will take far longer to work at their full capacity.
  • Resource build efforts for the long haul. If you decide to build something because existing solutions are a poor fit for your needs, commit to resourcing the development and maintenance efforts over several years. Failing to do this leads to repeated build/rebuild cycles, which are bad for business (though in some companies they're a good source of Staff+ promotion projects).