Build vs buy in IT operations comes down to one question most teams skip

Most organizations facing a build versus buy decision in IT operations start by counting developers. That is the wrong place to start, and it leads to predictable failures in both directions: teams that buy when they could have built a real advantage, and teams that build when they had no business owning the result.

The build versus buy decision should be driven by whether your team has functioning CI/CD discipline, not by how many developers you employ. A three-person team with version control, automated testing, peer review, and a rehearsed rollback path can responsibly own a custom integration. A twenty-person team without those things will turn the same integration into a liability the moment the person who wrote it changes roles.

This piece lays out why the framing matters, what actually changed with the arrival of AI agents, and how the right answer shifts depending on where your operational maturity sits today. The deeper version, broken down tier by tier, lives in our companion guide on the full build versus buy decision process.

The build versus buy question is usually framed wrong

The classic framing treats build and buy as a cost comparison. You estimate the license cost of a commercial product, estimate the engineering cost of building the equivalent, and pick the cheaper number. This framing fails because it prices the wrong thing. The license cost is mostly knowable. The true cost of building is not the initial development - it is the indefinite maintenance, the on-call burden, the security patching, the dependency upgrades, and the institutional knowledge that walks out the door with every departure.

When teams underestimate a build, they almost never underestimate the first version. They underestimate year two and year three. The prototype ships, it works, everyone celebrates, and then the platform it depends on releases a breaking change, the original author has moved to another project, and nobody remaining understands the code well enough to fix it safely. That is not a development failure. It is an operational ownership failure, and it is invisible in any cost comparison that stops at the initial build.
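
To make the year-two and year-three gap concrete, here is a minimal sketch of the arithmetic in Python. Every figure is hypothetical and chosen only for illustration, and the function name is our own; substitute your own estimates.

```python
def total_cost_of_ownership(initial: int, annual: int, years: int) -> int:
    """Up-front cost plus a recurring annual cost over the ownership period."""
    return initial + annual * years


# All figures below are hypothetical placeholders, not benchmarks.
YEARS = 3
BUILD_INITIAL = 150_000   # first version of the custom build
BUILD_ANNUAL = 90_000     # maintenance, on-call, patching, upgrades
LICENSE_ANNUAL = 120_000  # commercial product subscription

# The classic comparison: initial build cost against the license bill.
naive_build = BUILD_INITIAL
buy_tco = total_cost_of_ownership(0, LICENSE_ANNUAL, YEARS)

# The ownership comparison: the build keeps costing money every year.
build_tco = total_cost_of_ownership(BUILD_INITIAL, BUILD_ANNUAL, YEARS)

print(f"naive build: {naive_build:,}  buy: {buy_tco:,}  owned build: {build_tco:,}")
```

With these placeholder numbers the build looks cheaper when only the first version is priced (150,000 against 360,000) and more expensive once three years of ownership are included (420,000 against 360,000). The point is not the specific figures but which line item the classic comparison leaves out.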

The better framing asks a different question: do we have the operational discipline to own this for its entire life, not just to build it once? That question reframes the whole decision around capability rather than cost, and capability is something you can assess honestly before you commit.

CI/CD discipline is the real capability, not headcount

Continuous integration and continuous delivery are usually discussed as developer productivity tooling. In the build versus buy context they are something more fundamental: they are the evidence that an organization can own software responsibly. The presence of a real CI/CD pipeline tells you the team has version control as a habit rather than an aspiration, that changes are tested before they reach production, that more than one person reviews what ships, and that there is a defined way to roll back when something breaks.

Industry research on software delivery performance has consistently found that these capabilities, not team size, separate organizations that ship reliably from those that do not. The research from Google's DevOps Research and Assessment (DORA) program on software delivery and operational performance is the most thorough public body of work on this, and its findings are available at dora.dev. The throughline is that delivery discipline is a learnable organizational capability, and that its absence predicts operational pain regardless of how many engineers are on the team.

This matters for build versus buy because every custom thing you build becomes software you have to deliver and operate. If you do not have the discipline to deliver and operate it well, building it is not an asset you are creating - it is a liability you are signing up for. Buying, in that situation, is not the lazy choice. It is the responsible one, because it transfers the delivery and operational burden to a vendor whose entire business is maintaining that discipline.

AI agents changed the math but not the discipline requirement

The arrival of capable AI agents and the Model Context Protocol has genuinely shifted the build versus buy calculus, and it is worth being precise about how. MCP, the open standard introduced by Anthropic for connecting AI assistants to external data and tools, is documented at modelcontextprotocol.io. What it enables is a new pattern: buy the platform that holds your authoritative data, and build lightweight agents on top of it that read and act through a maintained protocol rather than through brittle custom integration code.

That pattern lowers the cost of the build side of the equation for a specific class of work. Building an agent that allocates an IP address, reconciles a device record, or correlates a set of alerts is far cheaper and far less risky when the underlying platform exposes its data through MCP than when you have to scrape a UI or maintain a fragile API integration. We cover this in depth in our guides on source of truth and IPAM in the AI era and on the AIOps build versus buy decision.

What AI agents did not change is the discipline requirement. An agent is still software. It still needs version control, testing, review, and a rollback path. In fact, agents raise the stakes, because an agent that acts on bad data or bad logic does so confidently and at machine speed. The organizations that will get the most value from the build-the-agents pattern are precisely the ones that already have CI/CD discipline, because they can build agents responsibly. The organizations without that discipline should be even more cautious now than before, because the failure modes are faster and harder to catch.

Three capability tiers, three different right answers

The honest answer to build versus buy depends on which of three tiers an organization sits in today.

The first tier has no in-house development or CI/CD discipline. These organizations should buy outcomes, not tools. Every custom thing they build becomes an orphaned liability, and the strategic investment should go into managed services and well-supported commercial platforms rather than into a development effort the team cannot sustain. For this tier, building anything beyond vendor-supported configuration is almost always a mistake.

The second tier has emerging capability: some scripting, some automation, but no formal delivery discipline. This is the most common tier and the one where the decision is most often gotten wrong. These teams can build, which means they will build, and without discipline they accumulate technical debt quickly. The right move for this tier is to buy the platforms and institutionalize CI/CD discipline before building anything that matters. The discipline is the prerequisite, not the afterthought.

The third tier has mature platform engineering with real CI/CD discipline already in place. These organizations can build where building is genuinely differentiating and the return on investment lands inside roughly twelve months, while buying the commodity foundation underneath. They have earned the right to build because they have the discipline to maintain what they create.
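
The three tiers can be sketched as a small decision function. This is an illustration under stated assumptions, not a scoring model: the `Team` fields, the tier names, and the recommendation strings are all hypothetical, and the sketch assumes that a team with CI/CD discipline also has in-house development capability.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    NO_DISCIPLINE = 1  # no in-house development or CI/CD discipline
    EMERGING = 2       # some scripting and automation, no formal discipline
    MATURE = 3         # platform engineering with real CI/CD discipline


@dataclass
class Team:
    # Hypothetical self-assessment fields; answer them honestly.
    can_script_or_automate: bool
    has_cicd_discipline: bool


def classify(team: Team) -> Tier:
    """Map a self-assessment onto one of the three tiers."""
    if team.has_cicd_discipline:
        return Tier.MATURE
    if team.can_script_or_automate:
        return Tier.EMERGING
    return Tier.NO_DISCIPLINE


def recommend(team: Team, differentiating: bool, payback_months: int) -> str:
    """Return the tier-appropriate answer for one candidate build."""
    tier = classify(team)
    if tier is Tier.NO_DISCIPLINE:
        return "buy outcomes: managed services and supported platforms"
    if tier is Tier.EMERGING:
        return "buy the platform; institutionalize CI/CD before building"
    # Mature tier: build only where it differentiates and pays back quickly.
    if differentiating and payback_months <= 12:
        return "build the edge on a bought foundation"
    return "buy"


small_disciplined = Team(can_script_or_automate=True, has_cicd_discipline=True)
print(recommend(small_disciplined, differentiating=True, payback_months=9))
```

Note that headcount does not appear anywhere in the inputs, which is the point of the framing: the gating variable is discipline, and a mature team still buys when the work is commodity or the payback horizon is long.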

How we think about this

Here is the IVI take, and it runs against the prevailing direction of the market. The mainstream framing treats build versus buy as a technology and cost decision, and the last few years of tooling have pushed that framing hard. Kubernetes, mature open source, infrastructure as code, and now the Model Context Protocol have all made building look cheaper, more accessible, and more modern than it has ever been. The industry increasingly tells teams that building on this tooling is the capital-efficient, future-proof choice. We understand why that view is attractive, and we want to be fair to it: the tooling genuinely is better than it was five years ago, and the foundation genuinely is cheaper to build on now.

This is where we part company with that framing. Build versus buy is not, at its core, a technology or cost decision at all. It is an operational-maturity decision, and the gating variable is delivery discipline, not architecture and not headcount. An organization without CI/CD discipline that builds on the best modern tooling still ends up with a liability, because the tooling lowers the cost of building and does nothing for the cost of owning. The architecture-first framing the industry sells gets the decision backwards: it optimizes the part that is cheap - the initial build - and ignores the part that is expensive - the multi-year ownership.

The pattern we keep coming back to instead, across observability, AIOps, source of truth, and virtualization platform decisions, is buy the foundation and build the edge. Buy the platform that represents years of engineering you cannot cheaply replicate. Build the thin, high-value layer on top that is specific to your environment and genuinely differentiating, but only once the discipline to own that layer is in place. This is the same logic that runs through our work on the VMware Cloud Foundation decision, where the right answer for most customers is not to rebuild the stack but to right-size what they buy.

We should be clear about when the mainstream view is right, because it sometimes is. For organizations that genuinely have mature platform engineering and rehearsed delivery discipline already in place, the build-is-cheaper-now framing holds, and they should build at the differentiating edge where the modern tooling makes the work more rewarding than ever. The industry framing is not wrong for them. It is just dangerously incomplete for everyone who has not yet earned the discipline that the framing silently assumes, which is most organizations most of the time.

We have watched this play out in both directions. A mid-market healthcare organization with a small but disciplined infrastructure team built a set of managed agents on top of a commercial source-of-truth platform and got real operational leverage from them, because they had version control and testing in place before they wrote a line of agent code. A larger financial services firm with triple the headcount but no delivery discipline built a custom observability correlation layer that became unmaintainable within eighteen months, and ended up buying the commercial product they had originally rejected, after absorbing the cost of the failed build. The difference between those two outcomes was not talent or budget. It was discipline.

If you are weighing a build versus buy decision right now, start by assessing your real delivery capability honestly, before you price anything. The work we do in our architecture and advisory practice begins there for exactly this reason, and our network automation services are built around installing the discipline first. If you want to talk it through, start a conversation with us.

FAQ

Should team size drive a build versus buy decision?

No. Team size is a poor predictor of whether building will succeed. The better predictor is whether the team has functioning delivery discipline: version control, automated testing, peer review, and a rehearsed rollback path. A small disciplined team can own a build responsibly; a large undisciplined team usually cannot, because the cost of a build is dominated by years of maintenance rather than the initial development.

What changed about build versus buy now that AI agents exist?

AI agents and the Model Context Protocol made one pattern much cheaper: buying the platform that holds your data and building lightweight agents on top of it that act through a maintained protocol rather than custom integration code. This lowers the cost of certain builds, but it does not remove the requirement for delivery discipline. An agent is still software that needs to be tested, reviewed, and safely rolled back.

How do we know if we have enough CI/CD discipline to build?

A practical test: can your team describe, without improvising, how a change reaches production, who reviews it, what tests run automatically, and how you would roll it back at two in the morning? If the answers are concrete and rehearsed, you have the discipline to build selectively. If the answers are aspirational, install the discipline before you build anything that matters.
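
The practical test can be written down as a checklist, sketched here in Python. The question keys and function names are our own illustrative choices; the only rule taken from the test itself is that every answer must be concrete and rehearsed, so a missing answer counts the same as an aspirational one.

```python
# The four questions from the practical test, as hypothetical checklist keys.
DISCIPLINE_QUESTIONS = (
    "how a change reaches production",
    "who reviews it",
    "what tests run automatically",
    "how to roll it back at two in the morning",
)


def aspirational_answers(answers: dict) -> list:
    """List the questions that lack a concrete, rehearsed answer.

    `answers` maps a question to True only when the team can answer it
    without improvising. Unanswered questions are treated as False.
    """
    return [q for q in DISCIPLINE_QUESTIONS if not answers.get(q, False)]


def ready_to_build(answers: dict) -> bool:
    """True only when all four answers are concrete and rehearsed."""
    return not aspirational_answers(answers)


team_answers = {
    "how a change reaches production": True,
    "who reviews it": True,
    "what tests run automatically": True,
    # Rollback has never been rehearsed, so it is left unanswered.
}
print(ready_to_build(team_answers))
print(aspirational_answers(team_answers))
```

Returning the list of gaps rather than a single score is deliberate: the useful output of the exercise is which part of the discipline to install first, not a grade.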

Is buying always the safer choice?

Buying is safer when you lack the discipline to own a build, but it is not free of risk. Commercial platforms create vendor dependency, licensing exposure, and the risk of paying for capability you do not use. The point is not that buying always wins. It is that the build side of the comparison is almost always underpriced, because it omits the multi-year operational cost, and that gap is what trips up most decisions.

What is the twelve-month rule for building?

For organizations with mature delivery discipline, a useful heuristic is to build only where the return on investment lands inside roughly twelve months and where the result is genuinely differentiating. Anything with a longer horizon is usually better bought, because the market will likely produce a commercial option before the build pays back, and because long-horizon builds accumulate maintenance burden faster than they deliver value.