26 PRs, Zero Bugs. The Boring Truth.

26 PRs in six months. Zero defects. Zero reverts. Zero hotfixes. Across a SaaS platform serving 50,000+ locations.

People ask what tool I'm using. Wrong question.

It's not the AI. It's not the framework. It's not the test runner. It's the discipline around the code. The same discipline that produced zero-defect output in 2018 produces zero-defect output in 2026 — except now AI is writing most of the keystrokes, which makes the discipline matter more, not less.

Here's what actually does the work.

1. Specify before you generate

The single biggest source of defects in AI-assisted code is fuzzy specs. When the input is vague, the output is plausible-looking nonsense. Plausible-looking nonsense passes review more often than it should because it reads like correct code. It just doesn't behave like correct code.

Before I let an agent generate anything non-trivial, I write the spec down. Not a paragraph. A list of behaviors. What should this do on the happy path? What should it do on the obvious edge cases? What inputs are out of scope? What's the contract with the caller? What's the failure mode?

Half the time, writing the spec reveals that the task as stated doesn't make sense, or has a hidden assumption, or conflicts with something else in the system. That half doesn't reach an agent. It goes back to the ticket. The other half goes to the agent with a tight, testable specification that produces output you can actually evaluate.

The agent is a faster typist than me. It is not a better product manager than me.

2. Test first when the logic is non-trivial

Not TDD religion. Just the question: what should this actually do? Write the test that proves it. Then write or generate the implementation.

The reason this matters more with AI is that AI will happily write code that doesn't do what you asked. It will write code that does something adjacent to what you asked, often passing whatever tests it's been shown, sometimes inventing tests that confirm its own misreading. The only defense is a test you wrote first, that the agent didn't see, that pins the behavior to your intent rather than the agent's interpretation of your intent.

Trivial logic — formatting, mapping, pass-through — doesn't need this. Anything with branching, state, or coordination between components does.

3. Full local pipeline before pushing

Linters. Type checks. Unit tests. Integration tests. The format checker. The lint-staged hooks. The pre-push hooks. All of it. If CI catches it, I've already failed. Not because catching it is bad — because the loop is too slow. CI feedback is on the order of minutes; local feedback is on the order of seconds. Slow loops produce sloppy code.

This isn't about purism. It's about which loop the discipline lives in. Code I'm willing to push has already survived every check the project runs. If a check fails after push, it's a check that wasn't part of my local pipeline — and the fix is to add it, not to wait for CI to catch it next time.

4. Adversarial self-review

Before I open a PR, I read the diff once in normal mode and once as a hostile reviewer. "If I were trying to break this, where would I attack? What edge case isn't handled? What invariant am I assuming that the caller can violate?"

This catches a class of bug that tests never catch and lint never catches: the bugs that come from incomplete imagination. The function works for everything you thought to write a test for. It doesn't work for the null case you didn't think about, the concurrent write you didn't model, the migration that runs while the feature is shipping.

Adversarial review is uncomfortable on purpose. The point is to find what you missed, not to confirm you didn't miss anything. If I read the diff hostilely and don't find anything, I read it again — slower, with more skepticism — because the boring truth is that I miss things every single time, and the only question is whether I find them before the user does.

5. Zero tolerance for "works on my machine"

"Works on my machine" is not a status. It is a confession. The code either works in the environment it's going to run in, or it doesn't work. The intermediate state where it works on the developer's laptop and nowhere else is not a release-ready state.

Concretely: every PR I open runs in CI exactly as it would run in production. Same container. Same env vars. Same database fixtures. If the local-only path differs from the CI path, that's a bug to fix before the PR is reviewable, not after.

This is the cheapest item on the list and the most often skipped. It's also the one that lets the other four work, because without it you don't actually know what you've shipped — you know what your laptop did when it last ran.

What AI changes — and doesn't

AI doesn't replace any of the above. It accelerates the steps that were already disciplined and exposes the steps that weren't.

If your spec is tight, the agent produces good output faster. If your spec is fuzzy, the agent produces bad output faster. If your tests pin behavior, the agent's regressions surface immediately. If your tests don't, the agent's regressions ship.

Speed without quality is not speed. It's future debt at a higher interest rate. The teams measuring "PRs per week" without measuring "defects per quarter" are taking on debt and calling it productivity. The teams measuring both are the ones the next decade is going to compound for.

The uncomfortable answer

The discipline is boring. The list isn't novel. You've read a version of this in every senior-engineer post for fifteen years.

The novel part is that AI raised the cost of not doing it. The pre-AI version of skipped discipline produced bad code at human typing speed. The post-AI version of skipped discipline produces bad code at agent typing speed, which is to say at industrial scale.

The people shipping zero-defect output with AI aren't lucky. They're disciplined. The agent is fast. The judgment is what scales the agent into shipped work that doesn't burn you next quarter.

Hiring engineers who can do this is hard. If you're a team that takes this seriously and you're looking for Staff or Principal, say hello.