Before You Add an Agent, Build the Harness

I don’t start with the model. I don’t start with the prompt, the framework, or whether to use LangChain versus whatever orchestration layer appeared this week. I start with a simpler question, which is also the more dangerous one: what is this agent actually allowed to decide?

The hard part of putting agents into real systems is not getting a plausible answer from a model. Models are already good at that. The hard part is deciding what the system does with that answer once it exists.

A model can classify a request. It can summarize risk. It can recommend an action and explain its reasoning in a paragraph that sounds confident. But none of that should make it the final authority. This is where a lot of agent designs collapse — they treat model output as if it were a decision. It isn’t. It’s an input.

The problem is not output. It is authority.

Consider a simple infrastructure request:

{
  "request_id": "REQ-001",
  "title": "Expose internal admin API",
  "description": "We need to expose the internal admin API temporarily for a customer demo.",
  "environment": "production",
  "requested_by": "platform-team"
}

An LLM might look at this and return a tidy recommendation: risk medium, confidence 0.82, recommended action auto_approve, reason “temporary and tied to a demo.”

That answer is not absurd. It is also not safe enough to execute. The request mentions production. It mentions an internal admin API. It mentions exposure. Those are operational boundaries, not just words. A system that treats the model recommendation as the decision has already lost control, because it has outsourced authority to a probabilistic component.

The better pattern separates the work. The LLM classifies. A schema validates what came back. Policy rules constrain what is allowed. A router makes the final call. An audit log records what actually happened. The model participates. It does not govern.

I call this boundary a harness.

Not a better prompt. Not a more elaborate system message. Not a framework that makes an agent feel more autonomous.

A harness is the set of hard rules around the model: schemas for input and output, policy rules for what is allowed, routing logic that selects the final action, escalation paths for risky or ambiguous cases, and audit logs that record what happened.

The point is not to make the model smarter. The point is to make the system less dependent on the model being right. If the model is correct, the harness lets useful work proceed. If the model is wrong, uncertain, overconfident, or incomplete, the harness prevents that output from becoming unsafe automation.

To make this concrete, I built a small TypeScript experiment called agent-harness-lab. The code is at github.com/mhernandezve/agent-harness-lab. The repo has two branches: main with the mock classifier, and with-llm with the real LLM integration.

The first version does not call a real LLM, and that was intentional. Before adding model variability, provider APIs, retries, latency, and prompt tuning, I wanted to isolate the architectural pattern. So the first version uses a mock classifier that returns LLM-shaped output, while the important parts of the system stay fixed and predictable.
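A mock like that can be a handful of keyword checks. The version below is a hypothetical sketch, not the repo's code: the keywords, the confidence values, and the mockClassify name are assumptions, chosen so that the demo request gets the same plausible but unsafe recommendation described earlier. It is deterministic, so the same request always produces the same classification.

```typescript
// Hypothetical mock classifier: deterministic, LLM-shaped output. Not the repo's actual logic.
type Action = "auto_approve" | "request_more_info" | "human_review" | "reject";
type Risk = "low" | "medium" | "high";

interface Classification {
  risk: Risk;
  confidence: number;
  recommended_action: Action;
  reason: string;
}

interface RequestText {
  title: string;
  description: string;
}

// Keyword heuristics stand in for a model. The demo case deliberately gets a
// confident auto_approve so the policy layer has something to override.
function mockClassify(req: RequestText): Classification {
  const text = `${req.title} ${req.description}`.toLowerCase();
  if (text.includes("temporar") && text.includes("demo")) {
    return { risk: "medium", confidence: 0.82, recommended_action: "auto_approve", reason: "temporary and tied to a demo" };
  }
  if (text.includes("delete") || text.includes("disable")) {
    return { risk: "high", confidence: 0.7, recommended_action: "human_review", reason: "destructive or security-relevant change" };
  }
  return { risk: "low", confidence: 0.6, recommended_action: "request_more_info", reason: "not enough detail to judge" };
}
```

The rest of the harness still treats this output as untrusted and validates it, exactly as it would for a real model.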

The CLI accepts a JSON request, validates it with Zod, runs it through the mock classifier, validates the classifier output with Zod, applies policy rules, routes to a final decision, writes an audit log, and prints the result.
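The order of those steps can be sketched as one function that only wires stages together. The Stages interface, the stage names, and the convention that validators throw on bad input are hypothetical, not taken from the repo; formatSummary assumes the five summary lines shown in the sample output. What the sketch is meant to show is structural: the model's recommendation and the final decision are separate values, and the override flag is just a comparison between them.

```typescript
type Action = "auto_approve" | "request_more_info" | "human_review" | "reject";

// Hypothetical stage contract. Validators are assumed to throw on invalid input.
interface Stages<Req, Cls> {
  validateRequest(input: unknown): Req;
  classify(req: Req): unknown; // model output is untrusted until validated
  validateClassification(raw: unknown): Cls;
  recommendedAction(cls: Cls): Action;
  applyPolicy(req: Req): { floor: Action; triggered: string[] };
  route(recommended: Action, floor: Action): Action;
  writeAudit(record: object): string; // returns where the audit record went
}

interface RunResult {
  requestId: string;
  recommended: Action;
  final: Action;
  overrode: boolean;
  auditPath: string;
}

// The orchestration contains no classification or policy logic of its own.
function runPipeline<Req extends { request_id: string }, Cls>(input: unknown, s: Stages<Req, Cls>): RunResult {
  const req = s.validateRequest(input);
  const cls = s.validateClassification(s.classify(req));
  const recommended = s.recommendedAction(cls);
  const { floor, triggered } = s.applyPolicy(req);
  const final = s.route(recommended, floor);
  const overrode = final !== recommended;
  const auditPath = s.writeAudit({
    request: req,
    classification: cls,
    triggered_rules: triggered,
    final_decision: final,
    overrode_llm: overrode,
    timestamp: new Date().toISOString(),
  });
  return { requestId: req.request_id, recommended, final, overrode, auditPath };
}

// Renders the summary lines the CLI prints.
function formatSummary(r: RunResult): string {
  return [
    `Request: ${r.requestId}`,
    `LLM recommendation: ${r.recommended}`,
    `Final decision: ${r.final}`,
    `Overrode LLM: ${r.overrode}`,
    `Audit written to: ${r.auditPath}`,
  ].join("\n");
}
```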

The interesting behavior is not classification. It is override.

You might see output like this:

Request: REQ-001
LLM recommendation: auto_approve
Final decision: human_review
Overrode LLM: true
Audit written to: audits/REQ-001.json

The model recommended auto_approve, but the final decision was human_review. The system refused to treat the recommendation as authority. That Overrode LLM: true line is the whole point.

The policy layer is kept small on purpose. It uses a severity order:

auto_approve < request_more_info < human_review < reject

Rules can raise severity but never lower it. A production request may require human review. A public exposure request may require human review. A request that disables authentication may be rejected outright.
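A minimal sketch of that layer follows. The severity order is the one above; the rule ids and the keyword matching for "public exposure" and "disables authentication" are hypothetical stand-ins, since the text does not say how the real rules detect those cases. Each rule can only name a minimum action, and the floor is folded with a stricter-of function starting from auto_approve, so adding a rule can keep the result where it is or raise it, never lower it.

```typescript
// Severity order from the text: auto_approve < request_more_info < human_review < reject.
const SEVERITY = ["auto_approve", "request_more_info", "human_review", "reject"] as const;
type Action = (typeof SEVERITY)[number];

function stricter(a: Action, b: Action): Action {
  return SEVERITY.indexOf(a) >= SEVERITY.indexOf(b) ? a : b;
}

interface PolicyInput {
  environment: string;
  title: string;
  description: string;
}

// A rule can only name a minimum action; there is no way to express "lower it".
interface PolicyRule {
  id: string;
  applies(req: PolicyInput): boolean;
  minimum: Action;
}

// Hypothetical keyword-based rules for the three examples in the text.
const RULES: PolicyRule[] = [
  { id: "production_change", applies: (r) => r.environment === "production", minimum: "human_review" },
  { id: "public_exposure", applies: (r) => /\bexpos(e|ing|ure)\b/i.test(`${r.title} ${r.description}`), minimum: "human_review" },
  { id: "disables_auth", applies: (r) => /disabl\w*\s+(the\s+)?auth/i.test(r.description), minimum: "reject" },
];

// Folds every triggered rule into a single floor, starting from the least severe action.
function applyPolicy(req: PolicyInput, rules: PolicyRule[] = RULES): { floor: Action; triggered: string[] } {
  let floor: Action = "auto_approve";
  const triggered: string[] = [];
  for (const rule of rules) {
    if (rule.applies(req)) {
      triggered.push(rule.id);
      floor = stricter(floor, rule.minimum);
    }
  }
  return { floor, triggered };
}
```

For the REQ-001 request, this sketch triggers both the production and the exposure rule and returns a human_review floor, regardless of what any classifier said.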

The exact rules are not the important part. What matters is that they are fixed and independent from the model. The model can say auto_approve. The policy layer can still say human_review. Then the router selects the stricter action. That is where the system keeps authority.
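The router itself can be a few lines. In this sketch (the route name and the result fields are assumptions), it takes the more severe of the model's recommendation and the policy floor, and records whether that differs from what the model recommended. One consequence worth noting: if the model is stricter than policy, the model's action stands, so the model can escalate a request but can never de-escalate one.

```typescript
const SEVERITY = ["auto_approve", "request_more_info", "human_review", "reject"] as const;
type Action = (typeof SEVERITY)[number];

interface RoutingResult {
  recommended: Action;
  policyFloor: Action;
  final: Action;
  overrodeLlm: boolean;
}

// Neither input is trusted alone: the final action is whichever is more severe.
function route(recommended: Action, policyFloor: Action): RoutingResult {
  const final = SEVERITY.indexOf(recommended) >= SEVERITY.indexOf(policyFloor) ? recommended : policyFloor;
  return { recommended, policyFloor, final, overrodeLlm: final !== recommended };
}
```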

Validation is part of the boundary. A model does not only produce wrong answers — it can also produce malformed ones. That is why the experiment validates both the input request and the classifier output with Zod. A simplified classifier schema defines risk, confidence, recommended_action, and reason.

This does not make the model reliable. It does something more modest and more useful: it prevents invalid model output from silently becoming system input.

Small boundary. Real one.
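Both validation steps can be sketched without dependencies. The repo uses Zod; the hand-written parsers below only approximate what two Zod object schemas would check. The allowed risk levels (low, medium, high) and the 0 to 1 bound on confidence are assumptions, since the text names the classifier fields but not their constraints. Each parser returns either a typed value or a list of errors, so nothing unvalidated reaches policy or routing.

```typescript
// Dependency-free stand-ins for the two Zod schemas (constraints partly assumed).
const ACTIONS = ["auto_approve", "request_more_info", "human_review", "reject"] as const;
const RISKS = ["low", "medium", "high"] as const; // assumption: text only shows "medium"
type Action = (typeof ACTIONS)[number];
type Risk = (typeof RISKS)[number];

interface InfraRequest {
  request_id: string;
  title: string;
  description: string;
  environment: string;
  requested_by: string;
}

interface Classification {
  risk: Risk;
  confidence: number;
  recommended_action: Action;
  reason: string;
}

type Parsed<T> = { ok: true; value: T } | { ok: false; errors: string[] };

const REQUEST_FIELDS = ["request_id", "title", "description", "environment", "requested_by"] as const;

function asObject(raw: unknown): Record<string, unknown> | null {
  return typeof raw === "object" && raw !== null && !Array.isArray(raw) ? (raw as Record<string, unknown>) : null;
}

function isOneOf<T extends string>(allowed: readonly T[], v: unknown): v is T {
  return typeof v === "string" && (allowed as readonly string[]).includes(v);
}

// Input side: every field of the example request must be a non-empty string.
function parseRequest(raw: unknown): Parsed<InfraRequest> {
  const o = asObject(raw);
  if (o === null) return { ok: false, errors: ["request must be a JSON object"] };
  const errors: string[] = [];
  for (const field of REQUEST_FIELDS) {
    const v = o[field];
    if (typeof v !== "string" || v.trim() === "") errors.push(`${field} must be a non-empty string`);
  }
  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, value: o as unknown as InfraRequest };
}

// Output side: the four classifier fields named in the text.
function parseClassification(raw: unknown): Parsed<Classification> {
  const o = asObject(raw);
  if (o === null) return { ok: false, errors: ["classification must be a JSON object"] };
  const { risk, confidence, recommended_action, reason } = o;
  const errors: string[] = [];
  if (!isOneOf(RISKS, risk)) errors.push("risk must be one of: " + RISKS.join(", "));
  if (typeof confidence !== "number" || !(confidence >= 0 && confidence <= 1)) {
    errors.push("confidence must be a number between 0 and 1");
  }
  if (!isOneOf(ACTIONS, recommended_action)) errors.push("recommended_action is not a known action");
  if (typeof reason !== "string" || reason.trim() === "") errors.push("reason must be a non-empty string");
  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    value: { risk: risk as Risk, confidence: confidence as number, recommended_action: recommended_action as Action, reason: reason as string },
  };
}
```

Unlike Zod's default object parsing, parseRequest keeps unknown keys instead of stripping them. What the harness does with a failed parse (stop the run, or escalate to human review) is a separate decision; the sketch only makes sure the failure is explicit.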

Every run writes an audit record. It includes the original request, the model classification, the triggered policy rules, the final decision, whether the model recommendation was overridden, and a timestamp.

That audit log is not just operational bookkeeping. If an agent participates in a decision, the system should be able to reconstruct what happened. What did the model recommend? Which rules fired? Who or what made the final call? Was there an override? Without that trace, the system may look automated, but it is not accountable.
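An audit record with those fields might be built and written like this. The field names, the one-file-per-request layout (modeled on the audits/REQ-001.json path in the sample output), and the helper names are assumptions. Passing the clock in as a parameter keeps records reproducible in tests, and readAuditRecord is there because the point of the log is being able to reconstruct a run later.

```typescript
import { mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical field names; the text lists what the record contains, not its schema.
interface AuditRecord {
  request: Record<string, unknown>;
  classification: Record<string, unknown>;
  triggered_rules: string[];
  final_decision: string;
  overrode_llm: boolean;
  timestamp: string;
}

function buildAuditRecord(
  request: Record<string, unknown>,
  classification: Record<string, unknown>,
  triggeredRules: string[],
  finalDecision: string,
  now: Date = new Date(),
): AuditRecord {
  return {
    request,
    classification,
    triggered_rules: [...triggeredRules],
    final_decision: finalDecision,
    // An override is any difference between the model's recommendation and the final call.
    overrode_llm: classification["recommended_action"] !== finalDecision,
    timestamp: now.toISOString(),
  };
}

// Writes one JSON file per request id (a re-run overwrites it) and returns the path.
function writeAuditRecord(record: AuditRecord, dir = "audits"): string {
  mkdirSync(dir, { recursive: true });
  const path = join(dir, `${String(record.request["request_id"])}.json`);
  writeFileSync(path, JSON.stringify(record, null, 2) + "\n", "utf8");
  return path;
}

// Reads a record back so a past decision can be reconstructed.
function readAuditRecord(path: string): AuditRecord {
  return JSON.parse(readFileSync(path, "utf8")) as AuditRecord;
}
```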

The with-llm branch replaces the mock classifier with a real LLM-backed classifier. The most important part of that integration is what should not change. The schema stays. The policy layer stays. The router stays. The audit log stays. The model still classifies and recommends. It does not decide.

Adding an LLM should change the classifier implementation, not the decision authority model. That is the architectural test. If adding a real LLM requires moving policy into the prompt, letting the model select the final action, or removing fixed validation, the system did not gain an agent. It lost a boundary.
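To make that test concrete, here is a sketch in which the classifier is the only pluggable piece. Classifier, llmClassifier, decide, and the injected complete function are hypothetical names, not the repo's API; complete stands in for a provider client, so nothing here calls a real model. Falling back to human_review when the output cannot be parsed is my assumption, and a single production rule stands in for the full policy layer.

```typescript
type Action = "auto_approve" | "request_more_info" | "human_review" | "reject";
const SEVERITY: readonly Action[] = ["auto_approve", "request_more_info", "human_review", "reject"];

interface Req {
  request_id: string;
  environment: string;
  description: string;
}

// The only thing that changes between branches: something that returns untrusted output.
interface Classifier {
  classify(req: Req): Promise<unknown>;
}

// Deterministic stand-in, like the mock classifier.
const mockClassifier: Classifier = {
  classify: async () => ({ recommended_action: "auto_approve" }),
};

// Hypothetical LLM-backed classifier; `complete` is an injected stand-in for a provider call.
function llmClassifier(complete: (prompt: string) => Promise<string>): Classifier {
  return {
    async classify(req) {
      const text = await complete(`Classify this request and reply with JSON:\n${JSON.stringify(req)}`);
      return JSON.parse(text) as unknown; // may throw or return junk; the harness copes
    },
  };
}

// Identical for every classifier: validation, policy and routing do not move.
async function decide(classifier: Classifier, req: Req): Promise<Action> {
  let recommended: Action = "human_review"; // assumed fallback when output is unusable
  try {
    const raw = await classifier.classify(req);
    // Validation reduced to the one field this sketch needs.
    const action = (raw as { recommended_action?: unknown } | null)?.recommended_action;
    if (typeof action === "string" && (SEVERITY as readonly string[]).includes(action)) {
      recommended = action as Action;
    }
  } catch {
    // Unparseable model output never becomes system input.
  }
  const floor: Action = req.environment === "production" ? "human_review" : "auto_approve";
  return SEVERITY.indexOf(recommended) >= SEVERITY.indexOf(floor) ? recommended : floor;
}
```

With this sketch, the mock, a well-behaved fake LLM, and a fake LLM that answers in prose instead of JSON all end at human_review for a production request, because the floor comes from policy rather than from the model. Only outside production does the model's recommendation decide the outcome.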

A real production system may eventually need a durable workflow engine — retries, timers, human approval, long-running state, external events, idempotent audit writes. But that comes after. The starting point is the authority model. Who classifies? Who validates? Who constrains? Who decides? Who records? Once those are clear, a workflow engine can make the process durable. Without them, it only makes the confusion more reliable.

The useful pattern is an agent inside a workflow boundary, not an agent replacing one. The model can help interpret ambiguous requests, extract structure from natural language, summarize risk, and recommend what should happen next. But the system still needs hard rules around that output. Policy should not disappear into model prose, and neither should the record of what the model recommended, which rules fired, and who made the final call.

Build the harness first. Add the agent after.

Not because agents are useless on their own — but because anything powerful enough to act on your infrastructure is powerful enough to need explicit boundaries.