AI Harness in Practice: running a company on agents

Most companies use AI as isolated tools. A chat here, a copilot there, an automation somewhere else. Tools that don't talk to each other, that don't remember what they did yesterday, that don't know what the colleague next to them is doing. That's not using AI. It's collecting AI.

What separates casual use from operational use is the harness — the infrastructure that connects, orchestrates and governs AI agents into a coherent system. It's what turns loose tools into an operating system. At Capiva, we built one. And it runs in production every day.

What is an AI Harness (and why the model alone isn't enough)

Mitchell Hashimoto — creator of Terraform and one of the most respected infrastructure engineers in the industry — formalized the concept in February 2026: Agent = Model + Harness. The model is the brain. The harness is everything else: connected tools, persistent memory, execution rules, guardrails, feedback loops, observability.

Martin Fowler and Birgitta Böckeler of Thoughtworks took it further with the guides and sensors taxonomy — guides are the rules that direct the agent (what to do, what not to do), sensors are the mechanisms that detect when something goes off the rails. Every robust harness needs both.

This isn't theory. Deloitte reports that 88% of companies use AI, but only 29% get real ROI. The gap is exactly the harness. Companies have models. They don't have the system around them.

The architecture that runs at Capiva

Each component exists for a specific operational reason. Nothing was added out of technical curiosity. We describe each piece in essence — the specific tools we use today are interchangeable, and that is precisely the hallmark of a well-designed harness: the pattern survives the replacement of any tool. A harness in your company uses YOUR stack.

Claude Code CLI as primary interface

Most people use AI in a browser chat window. That works for isolated questions. For continuous work — research, content creation, code, analysis, project management — it's insufficient.

Claude Code is a command-line interface that integrates directly with the filesystem, version control, development tools and automation. The agent isn't in a box. It's inside the work environment. It reads files, edits code, runs commands, creates artifacts. The barrier between "asking the AI" and "the AI doing" disappears.

Knowledge base as a document hierarchy

Capiva's knowledge base isn't a piece of software — it's a file system: thousands of plain-text documents (markdown), organized by domain, with structured metadata and content maps for conceptual navigation. Every document is indexable, linkable and searchable. Versioned by git. No proprietary tool in the way: the agent reads the folder directly.

This solves a problem every company has: knowledge scattered across 15 different tools with no connection between them. In a document hierarchy, everything lives in one place — decisions, meeting transcripts, specifications, learnings, project state — and any editor can read it. What matters isn't the app; it's the open format and the structure. Structured plain text is the common denominator every AI and every stack understands.
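To make "indexable, linkable and searchable" concrete, here is a minimal Python sketch. It walks a folder of markdown notes, reads front-matter metadata and collects the links between documents. The front-matter layout and the [[note-name]] link syntax are assumptions for illustration, not a documented Capiva schema. The point is that plain files plus a few conventions are enough for an agent to build its own index.

```python
import re
import tempfile
from pathlib import Path

# Assumed conventions for this sketch (not a documented Capiva schema):
# metadata sits in a front-matter block fenced by '---' lines, and
# documents link to each other with [[note-name]] or [[note-name|label]].
LINK_RE = re.compile(r"\[\[([^\]|#]+)")


def parse_doc(text: str) -> tuple[dict, str]:
    """Split a markdown document into (metadata, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # unterminated front matter: treat the file as plain body


def build_index(root: Path) -> dict[str, dict]:
    """Walk the folder and index every .md file by its file name (stem)."""
    index = {}
    for path in sorted(root.rglob("*.md")):
        meta, body = parse_doc(path.read_text(encoding="utf-8"))
        index[path.stem] = {
            "path": path.relative_to(root).as_posix(),
            "meta": meta,
            "links": sorted({m.strip() for m in LINK_RE.findall(body)}),
        }
    return index


def backlinks(index: dict[str, dict], target: str) -> list[str]:
    """Every document that links to `target`: the reverse view a content map needs."""
    return sorted(name for name, doc in index.items() if target in doc["links"])


def demo() -> dict[str, dict]:
    """Index a throwaway two-note vault."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "decisions").mkdir()
        (root / "decisions" / "pricing.md").write_text(
            "---\ntype: decision\ndomain: sales\n---\n"
            "Follows [[q3-plan]]; see [[pricing-model|the model]].\n",
            encoding="utf-8",
        )
        (root / "q3-plan.md").write_text("Plan body, no links.\n", encoding="utf-8")
        return build_index(root)
```

No database sits between the agent and the files: deleting the index and rebuilding it from the folder loses nothing.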

In June 2026, Google Cloud published the Open Knowledge Format (OKF), an open spec that formalizes exactly this pattern — the LLM-wiki pattern — as a portable format for curated agent context. Capiva's harness was already running the pattern in production before the publication. We didn't follow the framework; we converged with it — which is what happens when you design around patterns, not tools.

Hybrid search: three techniques working together

Keyword search finds what you know you're looking for. Semantic search finds what's relevant even when you don't know the right words. A mature harness combines both — and adds a third layer of judgment.

Context retrieval at Capiva combines three techniques that work together: keyword ranking (BM25) for precision, vector embeddings for semantic similarity, and LLM re-ranking to order by real relevance to the question. The result: when the agent needs context, it finds the right document among thousands in seconds.

This is applied context engineering: the agent receives the right context at the right time, without overloading its context window. All three techniques are open industry patterns that any stack can implement.
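The three stages can be sketched in Python as follows. BM25 is implemented directly. The embedding model is replaced by a character-trigram placeholder (toy_embed) so the example runs without dependencies. The LLM re-ranker is passed in as a callable that you would back with a model call. The text above doesn't say how the two first-stage rankings are merged; this sketch assumes reciprocal rank fusion (RRF), a common choice.

```python
import math
import re
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25: rewards exact query terms, discounts common terms and long docs."""
    toks = [tokenize(d) for d in docs]
    n = len(toks)
    avgdl = sum(len(t) for t in toks) / n or 1.0
    df = Counter(term for t in toks for term in set(t))
    q_terms = set(tokenize(query))
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for term in tf.keys() & q_terms:
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores


def toy_embed(text: str) -> Counter:
    """PLACEHOLDER for a real embedding model: character-trigram counts."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ranking(scores: list[float]) -> list[int]:
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def hybrid_search(query: str, docs: list[str], rerank: Callable[[str, str], float],
                  top_k: int = 3, rrf_k: int = 60) -> list[int]:
    """Return doc indices: BM25 + vectors fused by RRF, then re-ranked."""
    lexical = ranking(bm25_scores(query, docs))
    q_vec = toy_embed(query)
    semantic = ranking([cosine(q_vec, toy_embed(d)) for d in docs])
    fused: Counter = Counter()
    for order in (lexical, semantic):
        for pos, idx in enumerate(order):
            fused[idx] += 1.0 / (rrf_k + pos + 1)  # reciprocal rank fusion
    shortlist = [idx for idx, _ in fused.most_common(top_k)]
    # Stage 3: only the short list is sent to the (expensive) LLM judge.
    return sorted(shortlist, key=lambda i: -rerank(query, docs[i]))
```

Only the short list reaches the re-ranker, which keeps the expensive LLM judgment down to a handful of candidates per query.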

MCP: the integration layer

Model Context Protocol is the standard that lets the agent use external tools natively. It's not copy-paste. It's not "paste the result here". The agent calls the tool, receives the result and keeps working.

In practice, the agent connects directly to workflow automation, browser-based interface testing, meeting transcription, up-to-date technical documentation and messaging. Each MCP server adds a real capability to the harness, and since MCP is an open standard (now under the Linux Foundation), the tool behind each capability can be replaced without touching the rest of the system.

The usual analogy, used in the MCP documentation itself, is USB-C: a connection standard that lets you plug in any tool without rewriting the integration.
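At the wire level, MCP is JSON-RPC 2.0: the client discovers tools with tools/list and invokes one with tools/call, getting back a list of content blocks. The sketch below wires a client and a server together in one process. It skips the real transport (stdio or HTTP) and the initialization handshake, and the search_transcripts tool with its canned answer is hypothetical.

```python
import json
from typing import Any, Callable

# Hypothetical tool registry standing in for one MCP server's capabilities.
TOOLS: dict[str, tuple[dict, Callable[..., str]]] = {}


def tool(name: str, description: str, input_schema: dict):
    """Register a function as a tool, with the metadata tools/list returns."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = ({"name": name, "description": description,
                        "inputSchema": input_schema}, fn)
        return fn
    return register


@tool("search_transcripts", "Search meeting transcripts by keyword",
      {"type": "object", "properties": {"query": {"type": "string"}},
       "required": ["query"]})
def search_transcripts(query: str) -> str:
    return f"2 transcripts mention '{query}'"  # canned answer for the sketch


def error(req_id: Any, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": code, "message": message}})


def handle(raw: str) -> str:
    """Server side: answer one JSON-RPC 2.0 request using MCP method names."""
    req = json.loads(raw)
    if req["method"] == "tools/list":
        result: dict[str, Any] = {"tools": [spec for spec, _ in TOOLS.values()]}
    elif req["method"] == "tools/call":
        entry = TOOLS.get(req["params"]["name"])
        if entry is None:
            return error(req["id"], -32602, "Unknown tool")
        text = entry[1](**req["params"].get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return error(req["id"], -32601, "Method not found")
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


def call_tool(name: str, arguments: dict, request_id: int = 1) -> str:
    """Client side: what the harness sends, and the text the agent reads back."""
    request = {"jsonrpc": "2.0", "id": request_id, "method": "tools/call",
               "params": {"name": name, "arguments": arguments}}
    response = json.loads(handle(json.dumps(request)))
    if "error" in response:
        raise RuntimeError(response["error"]["message"])
    blocks = response["result"]["content"]
    return "".join(b["text"] for b in blocks if b["type"] == "text")
```

In a real deployment, handle sits behind a transport and the host application issues these requests itself. The message shape is what stays constant when the tool behind it changes.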

Scheduled autonomous agents

The harness doesn't depend on constant human interaction. Headless agents run in the background, on scheduled cycles, doing real work with nobody in front of the screen:

Knowledge compilation: an agent continuously processes new notes in the knowledge base, summarizing them, extracting entities and concepts, and connecting them to what already exists. It runs in a chain until done, resumes on its own after failures and rate limits, and shuts itself down when the work is finished.
Editorial synthesis: twice a day, a "newsroom" of agents reads the entire corpus and produces a briefing of what matters most to the business — not the most recent, the most central.
Self-correction: every identified error becomes a permanent rule or check in the harness itself. The system accumulates discipline instead of repeating failures.
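The editorial synthesis item ranks notes by centrality rather than recency. The text doesn't say how centrality is computed. One standard option, assumed here, is PageRank over the links between notes: a note that many others point to, directly or indirectly, ranks high even if nobody touched it this week. A dependency-free sketch:

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over note -> linked-notes edges."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            # A note with no outgoing links spreads its weight evenly.
            spread = links.get(v) or nodes
            share = damping * rank[v] / len(spread)
            for t in spread:
                nxt[t] += share
        rank = nxt
    return rank


def most_central(links: dict[str, list[str]], k: int) -> list[str]:
    """The k notes a briefing should lead with: highest centrality, ties by name."""
    scores = pagerank(links)
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]
```

With links like {"pricing": ["strategy"], "hiring": ["strategy"], "strategy": ["pricing"]}, the strategy note leads the briefing because the rest of the corpus keeps pointing back to it.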

The resilience pattern matters more than the sophistication: incremental progress logging (nothing is lost if an agent dies mid-run), watchdogs that restart stalled chains, and guards that prevent duplicate execution. The core idea: what can be automated shouldn't consume human attention. What requires human judgment gets full human attention.
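The three resilience mechanisms fit in a few dozen lines. This Python sketch shows one way to build them; it is an assumption, not Capiva's code. An append-only progress log (progress.jsonl) lets a rerun skip finished items, an atomically created lock file guards against duplicate execution, and a heartbeat file gives a watchdog something to check. File names and the demo scenario are hypothetical.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Callable, Iterable


class AlreadyRunning(RuntimeError):
    """Raised when another run still holds the lock."""


def acquire_lock(lock: Path) -> int:
    # O_CREAT | O_EXCL is atomic: exactly one process can create the file.
    try:
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyRunning(str(lock)) from None
    os.write(fd, str(os.getpid()).encode())
    return fd


def done_ids(log: Path) -> set[str]:
    if not log.exists():
        return set()
    return {json.loads(line)["id"] for line in log.read_text().splitlines() if line.strip()}


def run_batch(items: Iterable[str], work: Callable[[str], None], state: Path) -> int:
    """Process each item once; safe to kill and rerun at any point."""
    lock, log, beat = state / "run.lock", state / "progress.jsonl", state / "heartbeat"
    fd = acquire_lock(lock)  # duplicate-execution guard
    processed = 0
    try:
        finished = done_ids(log)
        for item in items:
            if item in finished:
                continue  # resume: an earlier run already did this one
            work(item)
            with log.open("a") as f:  # log each item as soon as it is done
                f.write(json.dumps({"id": item, "at": time.time()}) + "\n")
            beat.write_text(str(time.time()))  # heartbeat for the watchdog
            processed += 1
    finally:
        os.close(fd)
        lock.unlink()
    return processed


def is_stalled(state: Path, max_age_s: float) -> bool:
    """Watchdog check: the lock is held but the heartbeat has gone quiet."""
    beat = state / "heartbeat"
    if not (state / "run.lock").exists():
        return False
    return not beat.exists() or time.time() - float(beat.read_text()) > max_age_s


def demo() -> tuple[int, int, list[str], bool]:
    """Fail mid-run, resume, then show the duplicate-run guard."""
    calls: list[str] = []

    def flaky(item: str) -> None:
        calls.append(item)
        if item == "c" and calls.count("c") == 1:
            raise TimeoutError("rate limited")  # simulated first-attempt failure

    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp)
        try:
            run_batch("abcd", flaky, state)
        except TimeoutError:
            pass
        first = len(done_ids(state / "progress.jsonl"))
        second = run_batch("abcd", flaky, state)
        (state / "run.lock").write_text("4242")  # pretend another run is live
        try:
            run_batch("abcd", flaky, state)
            guarded = False
        except AlreadyRunning:
            guarded = True
    return first, second, calls, guarded
```

A watchdog on a timer would call is_stalled and, when it returns True, delete the stale lock and start run_batch again. Because of the log, the restart picks up where the dead run stopped instead of redoing finished work.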

Persistent memory across sessions

Every work session produces context. In most setups, that context is lost when the window closes. In the harness, it persists.

The system keeps memory in multiple layers: working memory (current session state), auto memory (preferences, decisions and patterns that accumulate over time) and the vault (permanent knowledge). When a new session starts, the agent knows what happened before.

It's the organizational equivalent of an employee who never forgets what was discussed in previous meetings. Context is never lost.
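Here is a minimal sketch of the three layers. It assumes auto memory is a JSON file and the vault is a folder of markdown notes; both storage choices are hypothetical. Working memory is discarded at the end of a session. A new session sees the accumulated auto memory plus an index of the vault:

```python
import json
import tempfile
from pathlib import Path


class SessionMemory:
    """Hypothetical layout: auto memory in one JSON file, vault as a folder."""

    def __init__(self, root: Path) -> None:
        self.auto_path = root / "auto_memory.json"
        self.vault = root / "vault"
        self.vault.mkdir(parents=True, exist_ok=True)
        # Auto memory is reloaded; working memory always starts empty.
        self.auto: dict[str, str] = (
            json.loads(self.auto_path.read_text()) if self.auto_path.exists() else {}
        )
        self.working: dict[str, str] = {}

    def note(self, key: str, value: str) -> None:
        self.working[key] = value  # scratch state for this session only

    def learn(self, key: str, value: str) -> None:
        self.auto[key] = value  # a preference or decision worth keeping

    def close(self) -> None:
        self.auto_path.write_text(json.dumps(self.auto, indent=2))
        self.working.clear()

    def boot_context(self) -> str:
        """What a new session is told before its first prompt."""
        prefs = "\n".join(f"- {k}: {v}" for k, v in sorted(self.auto.items()))
        notes = ", ".join(sorted(p.stem for p in self.vault.glob("*.md")))
        return f"Known decisions:\n{prefs}\nVault notes: {notes}"


def demo() -> tuple[str, dict]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        first = SessionMemory(root)
        first.note("draft", "half-written proposal")
        first.learn("tone", "direct, no jargon")
        (first.vault / "pricing.md").write_text("Pricing rationale.\n")
        first.close()
        second = SessionMemory(root)  # a brand-new session, same root
        return second.boot_context(), second.working
```

The half-written draft does not survive the session, but the decision about tone does, and so does the fact that a pricing note exists in the vault.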

Skills as replicable workflows

Instead of writing long prompts every time a recurring task comes up, skills encode the entire process: objective, steps, output template, constraints. One command triggers the whole workflow.

Skills for research, idea capture, content creation, inbox processing, transcript analysis — each one is a standardized, replicable process. The equivalent of SOPs, but executable by AI.
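A skill can be stored as data instead of as a prompt someone retypes each time. In this Python sketch, the Skill fields mirror the four parts listed above (objective, steps, output template, constraints), and run_command stands in for the single command that triggers the workflow. The research skill and its contents are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    objective: str
    steps: tuple[str, ...]
    output_template: str  # str.format placeholders filled at run time
    constraints: tuple[str, ...] = ()

    def render(self, **inputs: str) -> str:
        """Expand the skill into the full instruction the agent receives."""
        lines = [f"# {self.name}", f"Objective: {self.objective}", "Steps:"]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        lines += [f"Constraint: {c}" for c in self.constraints]
        lines += ["Output template:", self.output_template.format(**inputs)]
        return "\n".join(lines)


# Hypothetical skill library; real ones would live as files in the repo.
SKILLS = {
    "research": Skill(
        name="research",
        objective="Produce a sourced brief on a topic",
        steps=(
            "Search the knowledge base before the web",
            "Cross-check claims against two sources",
            "Write the brief using the template",
        ),
        output_template="## Research: {topic}\nSources:\nFindings:",
        constraints=("Cite every external claim",),
    ),
}


def run_command(command: str, **inputs: str) -> str:
    """'/research' style trigger: one command expands into the whole workflow."""
    skill = SKILLS.get(command.lstrip("/"))
    if skill is None:
        raise KeyError(f"unknown skill: {command}")
    return skill.render(**inputs)
```

Changing a step in SKILLS changes the process for every future run, which is the property that makes skills work like executable SOPs.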

External workflow automation

Not every process fits inside an agent session. Workflows that span multiple systems — emails, webhooks, APIs, databases — run on an external automation layer connected via MCP, letting the agent trigger, monitor and consume results. The specific tool matters less than the pattern: Capiva has already swapped the tool in this layer once without affecting any other part of the harness. The architecture survives the swap — which is exactly the point.
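The swap stays cheap only if the agent talks to an interface rather than to a specific tool. Here is a minimal Python sketch of that boundary. Two hypothetical in-memory adapters stand in for the old and new automation tools; neither models a real product's API.

```python
import itertools
import time
from typing import Protocol


class WorkflowRunner(Protocol):
    """All the agent knows about the automation layer."""

    def trigger(self, workflow: str, payload: dict) -> str: ...
    def status(self, run_id: str) -> str: ...


class ToolA:
    """Hypothetical adapter for the tool in use today (in-memory fake)."""

    def __init__(self) -> None:
        self._polls: dict[str, int] = {}
        self._ids = itertools.count(1)

    def trigger(self, workflow: str, payload: dict) -> str:
        run_id = f"a-{next(self._ids)}"
        self._polls[run_id] = 0
        return run_id

    def status(self, run_id: str) -> str:
        self._polls[run_id] += 1
        return "succeeded" if self._polls[run_id] >= 2 else "running"


class ToolB:
    """Hypothetical replacement tool with different internals."""

    def __init__(self) -> None:
        self._runs: dict[str, str] = {}

    def trigger(self, workflow: str, payload: dict) -> str:
        run_id = f"b-{workflow}-{len(self._runs)}"
        self._runs[run_id] = "succeeded" if payload else "failed"
        return run_id

    def status(self, run_id: str) -> str:
        return self._runs[run_id]


def run_workflow(runner: WorkflowRunner, workflow: str, payload: dict,
                 max_polls: int = 10, poll_s: float = 0.0) -> str:
    """Agent-side code: identical no matter which tool implements the runner."""
    run_id = runner.trigger(workflow, payload)
    for _ in range(max_polls):
        state = runner.status(run_id)
        if state in ("succeeded", "failed"):
            return state
        time.sleep(poll_s)  # a real adapter would wait a few seconds between polls
    return "timed_out"
```

Swapping tools means writing one new adapter; run_workflow, and every skill or agent that calls it, stays untouched.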

Guides and sensors: harness governance

Fowler's taxonomy applies directly.

Guides (direct behavior):

Knowledge-base operating rules (where to create, how to name, how to link)
Approval protocol (when to ask, when to execute)
Quality of thought (verify before building, challenger behavior)
Board-first (all work goes through the task board before execution)

Sensors (detect deviations):

Automated daily audit of the system itself
Task board verification (every agent that completes a task updates the board)
Quality gates between phases (propose before executing)
Feedback hooks that capture corrections and make them permanent

Hashimoto's principle in action: every error becomes a fix in the harness so it never repeats. The system improves with every cycle.
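Guides and sensors can be expressed as two lists of checks around every action. Guides run before and can block; sensors run after and report. The board-first guide and board-update sensor below mirror the two lists above. The state shape and function names are hypothetical, and a real harness would wire these checks into hooks rather than into a Python class.

```python
from typing import Callable, Optional

State = dict  # e.g. {"task": "T-1", "board": {"T-1": "doing"}}
Check = Callable[[State], Optional[str]]  # returns a problem, or None if fine


def board_first(state: State) -> Optional[str]:
    """Guide: work may only start on a task that is already on the board."""
    if state["task"] not in state["board"]:
        return f"{state['task']} is not on the board"
    return None


def board_updated(state: State) -> Optional[str]:
    """Sensor: after the work, the card must say 'done'."""
    status = state["board"].get(state["task"])
    if status != "done":
        return f"{state['task']} finished but board still says '{status}'"
    return None


class Harness:
    def __init__(self, guides: list[Check], sensors: list[Check]) -> None:
        self.guides = guides    # checked before acting; any problem blocks
        self.sensors = sensors  # checked after acting; problems are reported

    def act(self, state: State, action: Callable[[State], None]) -> list[str]:
        for guide in self.guides:
            problem = guide(state)
            if problem:
                raise PermissionError(problem)
        action(state)
        return [p for sensor in self.sensors if (p := sensor(state))]


def close_card(state: State) -> None:
    """A well-behaved action: does the work and updates the board."""
    state["board"][state["task"]] = "done"


def forgetful(state: State) -> None:
    """An action that does the work but forgets the board (the deviation)."""
```

Each sensor report is the raw material for Hashimoto's principle: the correction is encoded as a new guide or sensor, so the next occurrence of the same slip is caught mechanically.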

The operational result

A founder operating with an AI Harness produces output equivalent to a traditional consulting team. Simultaneous projects in Brazil, the US and the UK. Multiple products built and maintained in parallel. Content pipeline, diagnostic tools and client work — all running at the same time.

Not by working more. Because the system amplifies every hour of human work with automation, persistent context and autonomous execution.

In an engagement as the AI Center of Excellence for a global Fortune 500 company, this same approach compressed project cycles from 6 months to 2 weeks. That delivery velocity became an internal benchmark for other teams in the organization.

We open-sourced part of the harness: capivaOS

Everything this article describes is the harness that runs Capiva's business (knowledge, agents, memory, automation). But one layer of it solved the most universal problem for anyone developing with AI: agents write plausible code, not correct code, and engineering discipline can't be held in place by prompts alone.

That layer we extracted, generalized and published as capivaOS — a spec-driven development harness for Claude Code, open-source (MIT), installed as a plugin:

State machine with mechanical gates: spec → plan → implement → verify → deliver. Hooks block out-of-phase writes, merges without a spec, transitions without a quality gate. The discipline doesn't depend on the prompt — it's enforced by infrastructure. Guides and sensors, literally executable.
Artifact chain: every phase produces auditable outputs (spec + acceptance criteria, plan, quality reports). End-to-end traceability.
Numeric quality gates: test coverage floors (75–80%), zero new linter warnings, per-stack blueprints (.NET, Python/FastAPI, Next.js).
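To illustrate what "mechanical gates" means, here is a small Python state machine with the same five phases. It is not capivaOS's code (capivaOS enforces this through Claude Code hooks); it is a hypothetical sketch of the idea. Writes are checked against the current phase, each early phase needs its artifact before advancing, and verify passes only with coverage at or above a 75% floor and zero new linter warnings. Artifact names such as spec.md are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

PHASES = ["spec", "plan", "implement", "verify", "deliver"]
REQUIRED = {"spec": "spec.md", "plan": "plan.md"}  # hypothetical artifact names


@dataclass
class Pipeline:
    phase: str = "spec"
    artifacts: dict[str, str] = field(default_factory=dict)
    min_coverage: float = 0.75  # low end of the 75-80% floor quoted above
    max_new_warnings: int = 0

    def can_write(self, path: str) -> bool:
        """Hook-style rule: source code may only change during 'implement'."""
        return not path.startswith("src/") or self.phase == "implement"

    def write(self, path: str, content: str) -> None:
        if not self.can_write(path):
            raise PermissionError(f"{path} is out of phase ({self.phase})")
        self.artifacts[path] = content

    def blockers(self, coverage: Optional[float] = None,
                 new_warnings: Optional[int] = None) -> list[str]:
        """Reasons the current phase may not advance; empty means the gate passes."""
        problems = []
        if self.phase == PHASES[-1]:
            problems.append("already delivered")
        needed = REQUIRED.get(self.phase)
        if needed and needed not in self.artifacts:
            problems.append(f"missing {needed}")
        if self.phase == "verify":
            if coverage is None or coverage < self.min_coverage:
                problems.append(f"coverage {coverage} below floor {self.min_coverage}")
            if new_warnings is None or new_warnings > self.max_new_warnings:
                problems.append(f"new linter warnings: {new_warnings}")
        return problems

    def advance(self, **metrics) -> str:
        problems = self.blockers(**metrics)
        if problems:
            raise PermissionError("; ".join(problems))
        self.phase = PHASES[PHASES.index(self.phase) + 1]
        return self.phase
```

The agent never gets to decide whether a gate applies: advance either finds no blockers or refuses, which is the difference between discipline by prompt and discipline by infrastructure.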

/plugin marketplace add iB2/capivaOS
/capiva:init

Repository: github.com/iB2/capivaOS. Built on the same principles as this article — and used by Capiva on its own projects, every day.

What this means for your company

The complete harness, the system that runs a company, is not an off-the-shelf product. It's an architecture you build, because every operation has different tools, processes and constraints. But the development-discipline layer now is one: capivaOS is free, open and installable in 30 seconds.

The patterns are accessible: structured documents, hybrid search, MCP, scheduled agents, quality gates. The challenge isn't access to tools — it's knowing how to connect the patterns into a system that works, self-improves and scales without adding people. With the tools YOUR operation already uses.

That's the work Capiva does. We design and implement AI Harnesses for operations that want to go from "we use AI" to "AI operates our company."

How we implement this in companies

The path we use with clients follows the same logic as this article — patterns first, tools second:

Strategic diagnosis: map where the harness creates the most leverage in your operation (which processes, which knowledge, which context bottlenecks).
Innovation Sprint: prove the concept in a short scope, with a minimal harness running on a real process and measurable results before any big bet.
Embedded implementation: we work inside your team, not as external consultants. We build the system, stand up the infrastructure, enable the people and stay until it runs. How we do it is detailed in How We Work.

For software development, the starting point is free: capivaOS in your repository, today.

Want to build an AI Harness for your operation?

Capiva designs and implements custom AI Harnesses. The first step is a Strategic Diagnosis that maps where AI creates value in your operation.


Harness engineering for your company: what you need to know

Harness engineering is the discipline of designing the complete infrastructure around AI agents so they operate reliably in production. The concept was formalized by Mitchell Hashimoto in February 2026 with the formula Agent = Model + Harness. Martin Fowler and Birgitta Böckeler of Thoughtworks expanded it with the taxonomy of guides (rules that direct behavior) and sensors (mechanisms that detect deviations).

An operational harness includes: tools connected via MCP (Model Context Protocol), persistent memory across sessions, structured knowledge management, autonomous agents for operational tasks, workflows encoded as replicable skills, and feedback loops that convert errors into permanent improvements. Deloitte reports that 88% of companies use AI but only 29% get real returns. The difference is the harness. In Brazil, harness engineering adoption is practically nonexistent, representing a competitive window for companies that structure themselves first.

For software development, the open-source reference is capivaOS (github.com/iB2/capivaOS), a spec-driven harness for Claude Code published by Capiva under the MIT license, which enforces a phased pipeline with mechanical quality gates: specification, plan, implementation, verification and delivery. For the knowledge layer, the reference standard is the Open Knowledge Format (OKF), an open spec published by Google Cloud in June 2026 that formalizes the LLM-wiki pattern as a curated context format for agents; Capiva's harness has been operating this pattern in production since before the spec was published.
