An Agent Platform Where Agents Can't Destroy Each Other's Work

Two AI agents edited the same file. Both were right. The merge destroyed both changes.

That was the day I built three-way diff merging for agent workspaces. The day before that, I had watched an agent retry the same failing command forty-seven times. Twelve dollars in tokens. Zero progress. The day before that, an agent ran a shell command I didn't expect and I was very glad it was in a container.

Every guardrail in the platform exists because I watched an agent break something without one. This is the architecture that came out of those incidents.

The shape of the system

The platform is a multi-agent orchestration system built on a hub-and-spoke model. An orchestrator receives a task, spawns a team of specialists, coordinates their work, and ships an integrated result. The current roster is thirteen specialists (programmer, code reviewer, test writer, architect, security auditor, database expert, migration specialist, and six others), each with a focused system prompt, a narrow tool surface, and a defined handoff contract.

Each team runs in its own ephemeral Docker container. RabbitMQ carries events back to the portal in real time. Artifacts (file diffs, logs, metrics) are collected and written to a results volume. SIGTERM triggers graceful shutdown: agents finish their current turn, save state, exit clean. The container dies; the portal has everything it needs.
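The shutdown behavior is easy to sketch. What follows is a minimal illustration, not the platform's code: `GracefulShutdown`, `run_agent`, and `save_state` are hypothetical names, and it assumes a single-threaded agent loop where each turn is a callable.

```python
import signal


class GracefulShutdown:
    """Records that SIGTERM arrived; the agent loop decides when to act on it."""

    def __init__(self):
        self.requested = False

    def install(self):
        # Must be called from the main thread of the process.
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame):
        # Only set a flag here; never interrupt a turn halfway through.
        self.requested = True


def run_agent(turns, shutdown, save_state):
    """Run turns in order, stopping between turns once shutdown is requested."""
    completed = []
    for turn in turns:
        if shutdown.requested:
            break
        completed.append(turn())  # the current turn always runs to completion
    save_state(completed)  # state is saved on normal exit and on shutdown alike
    return completed
```

The point of the flag is that the signal handler itself does almost nothing. The loop checks it only at turn boundaries, so a turn is never cut off midway.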

What makes the system actually work isn't the agent roster. It's six guardrails layered underneath.

1. Workspace isolation

Each specialist gets an isolated workspace. Their own copy of the relevant files, their own scratch space, their own state. They cannot read or write each other's working files directly. When they're done, their changes get collected as a diff against the shared base.
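A rough sketch of that copy-then-diff flow, assuming workspaces are plain directories of text files. The function names `create_workspace` and `collect_diff` are made up for illustration and are not the platform's API.

```python
import difflib
import shutil
from pathlib import Path


def create_workspace(base: Path, root: Path, specialist: str) -> Path:
    """Give one specialist a private copy of the shared base snapshot."""
    workspace = root / specialist
    shutil.copytree(base, workspace)
    return workspace


def _lines(path: Path) -> list:
    # A file missing on one side is treated as empty, so adds and deletes show up.
    return path.read_text().splitlines(keepends=True) if path.is_file() else []


def collect_diff(base: Path, workspace: Path) -> str:
    """Return a unified diff of the workspace against the shared base."""
    rel_paths = {p.relative_to(base) for p in base.rglob("*") if p.is_file()}
    rel_paths |= {p.relative_to(workspace) for p in workspace.rglob("*") if p.is_file()}
    chunks = []
    for rel in sorted(rel_paths):
        chunks.extend(
            difflib.unified_diff(
                _lines(base / rel),
                _lines(workspace / rel),
                fromfile=f"a/{rel}",
                tofile=f"b/{rel}",
            )
        )
    return "".join(chunks)
```

Each specialist writes only inside its own directory, and the base snapshot is never modified. The diff is the only thing that leaves the workspace.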

The shared-workspace alternative is chaos. I tried it once. Once is enough. Two agents working in the same directory will step on each other's edits, blow away half-finished work, and produce output that looks integrated until you try to run it.

Isolation first. Merge second. Conflict resolution last. Same problem Git solved twenty years ago. Same principle applies here.

2. Three-way diff merging

When two specialists touch the same file, the integrator computes diffs from the common base. Non-overlapping changes merge automatically. Overlapping changes flag for human review with both sides of the conflict displayed.

The trick is the base: every specialist starts from the same snapshot of the codebase. Their changes are diffs against that snapshot, not against each other. That gives you a deterministic merge surface instead of "whoever wrote last wins." Two valid edits compose. Two contradictory edits surface as a conflict instead of being silently destroyed.
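To make the base-relative merge concrete, here is a small line-level sketch built on Python's `difflib`. It is an illustration under simplifying assumptions, not the platform's integrator: files are lists of lines, `three_way_merge` is a hypothetical name, and hunks that touch or overlap are conservatively treated as conflicts.

```python
from difflib import SequenceMatcher


def edits(base, side):
    """Hunks (start, end, replacement) that turn base into side, indexed on base."""
    matcher = SequenceMatcher(a=base, b=side, autojunk=False)
    return [
        (i1, i2, side[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def three_way_merge(base, ours, theirs):
    """Return (merged_lines, []) on success or (None, conflicts) for human review."""
    a, b = edits(base, ours), edits(base, theirs)
    # Two hunks conflict if their base ranges overlap or touch and they differ.
    conflicts = [
        (x, y) for x in a for y in b
        if x != y and x[0] <= y[1] and y[0] <= x[1]
    ]
    if conflicts:
        return None, conflicts
    merged = list(base)
    # Identical hunks from both sides collapse into one.
    unique = {(start, end, tuple(lines)) for start, end, lines in a + b}
    # Apply from the end of the file so earlier base indexes stay valid.
    for start, end, lines in sorted(unique, reverse=True):
        merged[start:end] = lines
    return merged, []
```

Because both sets of hunks are indexed against the same base, the result does not depend on which specialist finished first. Treating adjacent hunks as conflicts sends a few more merges to a human, which is the safer direction to be wrong in.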

3. Loop detection

Agents get stuck. Not always in obvious ways. Sometimes they retry the same failing tool call. Sometimes they spiral through subtly different failures that all share the same trajectory.

The platform tracks both modes. Every tool call plus error gets hashed. Same hash three times in a row → warning. Five times → force stop. Separately, consecutive errors (whether identical or not) get counted. Eight in a row → warning. Twelve → force stop.
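The bookkeeping for both modes fits in one small class. This is a sketch using the thresholds above. `LoopDetector` and its fields are hypothetical names, and it assumes a successful tool call resets both counters, which the description above does not spell out.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class LoopDetector:
    same_warn: int = 3   # identical failures in a row before a warning
    same_stop: int = 5   # identical failures in a row before a force stop
    err_warn: int = 8    # consecutive errors of any kind before a warning
    err_stop: int = 12   # consecutive errors of any kind before a force stop
    last_hash: str = ""
    same_count: int = 0
    error_count: int = 0

    def record(self, tool, args, error):
        """Record one tool call; return 'ok', 'warn', or 'stop'."""
        if error is None:
            # Assumption: any success clears both streaks.
            self.last_hash, self.same_count, self.error_count = "", 0, 0
            return "ok"
        digest = hashlib.sha256(f"{tool}\0{args}\0{error}".encode()).hexdigest()
        self.same_count = self.same_count + 1 if digest == self.last_hash else 1
        self.last_hash = digest
        self.error_count += 1
        if self.same_count >= self.same_stop or self.error_count >= self.err_stop:
            return "stop"
        if self.same_count >= self.same_warn or self.error_count >= self.err_warn:
            return "warn"
        return "ok"
```

Five identical failures in a row yield ok, ok, warn, warn, stop. Twelve different failures in a row reach the same stop through the second counter.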

The cost of getting this wrong is measured in dollars. The cost of getting it right is a few hundred lines of bookkeeping. The economics are obvious once you've watched an agent burn through a budget pretending to make progress.

4. Budget-aware model downgrade

Specialists get assigned a model at spawn time based on budget consumption. While less than 70% of the budget is spent, complex work goes to Sonnet or Opus. Past 70%, the orchestrator silently downgrades subsequent specialists to Haiku.

The insight: early work matters more. Research, architecture, complex implementation — that needs the best model. Late work — cleanup, formatting, doc generation — doesn't. Front-load expensive models. Degrade gracefully as budget depletes. Highest quality where it matters most. Acceptable quality where it matters least.

It's the same principle as progressive JPEG. Important information first. Refine with what's left.
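The 70% rule reduces to a few lines. In this sketch `assign_model` is a hypothetical helper, the model names are plain labels rather than real API identifiers, and treating exactly 70% as already downgraded is an assumption.

```python
def assign_model(preferred, spent, budget, threshold=0.70, fallback="haiku"):
    """Pick the model for a specialist at spawn time.

    preferred: the model the task would get with budget to spare,
               for example "opus" or "sonnet".
    spent, budget: amounts in the same unit (dollars or tokens).
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    consumed = spent / budget
    # Assumption: the boundary itself counts as over the threshold.
    return fallback if consumed >= threshold else preferred
```

Because the check runs at spawn time, a specialist that started on an expensive model keeps it for the rest of its run. Only later spawns get the cheaper one.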

5. Blind adversarial validation

If a specialist sees the test files, it pattern-matches to pass. The code compiles, the tests pass, the work looks done. It's reward-hacking. The structural correctness you wanted is not what you got.

The platform fixes this by isolating validation. After a specialist completes, a separate container spins up with the agent's artifacts plus a hidden test suite. The agent's own test files are scrubbed. The project's actual test command runs against the artifacts. Pass = structurally correct. Fail = the specialist retries with the failure output as feedback, still blind.

Three attempts. Still failing → human review. The agent never sees the hidden suite. Adversarial by construction.
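A single validation pass might look roughly like this. It is a sketch with a temporary directory standing in for the separate container. `validate_blind`, the `test_*.py` scrub pattern, and the `hidden_tests` folder name are all assumptions made for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def validate_blind(artifacts, hidden_suite, test_cmd, scrub_glob="test_*.py"):
    """Run the hidden suite against a scrubbed copy of the agent's artifacts.

    Returns (passed, output). Only the output is fed back to the specialist.
    """
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "work"
        shutil.copytree(artifacts, work)
        # Scrub the agent's own tests so they cannot influence the result.
        for path in list(work.rglob(scrub_glob)):
            path.unlink()
        # Add the hidden suite only after scrubbing, so it survives the scrub.
        shutil.copytree(hidden_suite, work / "hidden_tests")
        proc = subprocess.run(test_cmd, cwd=work, capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout + proc.stderr
```

The specialist gets back the failure output and nothing else. The hidden suite exists only inside the throwaway directory, which is deleted when the function returns.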

6. Quality gates as shell hooks

After any specialist completes, a configurable hook runs against the output. pytest, eslint, a custom validator, a security scan. Exit 0 = pass. Exit non-zero = reject. The hook's stdout becomes feedback context for the specialist's next attempt.

Critically, the agent doesn't know about the gate. It just receives feedback and improves. The gate is enforcement, not negotiation. Three retry attempts then human review. The system fails closed by default — uncertain work goes to a human, not into main.
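The gate loop can be sketched as follows. `run_gate` and `run_with_gate` are hypothetical names, `produce` stands in for one specialist attempt, and reading "three retry attempts" as three attempts in total is an assumption.

```python
import subprocess


def run_gate(hook_cmd, workdir):
    """Run the hook; exit 0 passes, anything else rejects. Returns (passed, stdout)."""
    proc = subprocess.run(hook_cmd, cwd=workdir, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout


def run_with_gate(produce, hook_cmd, workdir, max_attempts=3):
    """Let the specialist try up to max_attempts times, then fail closed."""
    feedback = None
    for _ in range(max_attempts):
        # The specialist sees only feedback text, never the gate itself.
        produce(feedback)
        passed, feedback = run_gate(hook_cmd, workdir)
        if passed:
            return "accepted"
    return "human_review"
```

The default outcome is `human_review`. Work is accepted only when a hook run explicitly returns exit 0.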

The observability layer

You can't debug what you can't see. In single-agent systems, "check the logs" is sometimes enough. In a thirteen-agent system running in parallel across isolated containers, it isn't.

The platform emits 54 structured event types. RunStarted → DelegationStarted → TeamMemberFinished. TokensUsed → SpendingLimitReached → ModelDowngraded. TaskClaimed → QualityHookFired → AgentMessageReplied. Every event carries run_id, agent_name, agent_id, type, timestamp, and a unique event_id.
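As a sketch, the common envelope could be a small frozen dataclass. The six field names come from the list above. The ISO-8601 timestamp, the UUID event id, and the `to_json` helper are assumptions.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def _new_id() -> str:
    return str(uuid.uuid4())


@dataclass(frozen=True)
class Event:
    run_id: str
    agent_name: str
    agent_id: str
    type: str  # e.g. "RunStarted", "TokensUsed", "QualityHookFired"
    timestamp: str = field(default_factory=_now)
    event_id: str = field(default_factory=_new_id)

    def to_json(self) -> str:
        """Serialize the envelope for the message bus."""
        return json.dumps(asdict(self))
```

A unique `event_id` lets a consumer drop duplicates when the same event reaches it twice, for instance once live and once during a replay.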

WebSocket replays all events on connect. Refresh the page mid-run — you lose nothing. Large outputs (diffs, log dumps) stream separately so the event bus stays sharp.
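Replay-on-connect is simple to model without any real transport. In this sketch `EventLog` is a hypothetical in-memory stand-in and `send` represents whatever pushes a message down one client's socket.

```python
class EventLog:
    """Keeps every event for a run and replays the history to late joiners."""

    def __init__(self):
        self._events = []
        self._subscribers = []

    def publish(self, event):
        # Store first, then fan out, so a later connect always sees this event.
        self._events.append(event)
        for send in self._subscribers:
            send(event)

    def connect(self, send):
        # Replay the full history in order, then subscribe to live events.
        for event in self._events:
            send(event)
        self._subscribers.append(send)
```

A client that connects mid-run, or reconnects after a refresh, receives the same ordered sequence as one that was connected from the start.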

This isn't observability as a nice-to-have. It's the debugging strategy itself. The day a specialist does something weird, the only way to find out what happened is to replay the event sequence and see the decision points. Anything less and you're guessing.

What the platform doesn't do

It doesn't ship code to main. Final integration is a diff against the base branch, presented to a human reviewer with the full event trail and the artifacts from every specialist that contributed. A human approves. The platform's job is to make that approval cheap — to produce work that's already passed structural validation, already passed security scans, already integrated cleanly with itself — so the human review surface is small and the human review is meaningful.

It also doesn't pretend to be deterministic where it isn't. Specialists get assigned names from themed pools (Norse gods, Hindu gods, biblical figures, anime characters). "Odin reviewed your code and found three issues" reads differently from "Agent #4 found three issues." Same information, different trust. Small UX detail, significant impact on whether engineers actually use the output.

The pattern, in one sentence

Isolate execution. Merge deterministically. Detect stuck states. Validate adversarially. Gate at the boundary. Observe everything. Trust nothing the agent claims that you haven't independently verified.

That sentence is the entire architectural thesis. Everything else is implementation.

The platform is private for now. If you'd like to see it, or you're building something similar and want to compare notes, say hello.