Build agents extend the RL improvement loop to greenfield building. Instead of optimizing an existing metric, they build new modules from specs and iterate until every assertion passes.

The Pattern

spec → eval assertions → agent TOML → `tenet peter agent {name}` → Karpathy loop → PR

“Granularity of feedback determines speed of convergence.” A monolithic eval with 16 checks stalled at 7%. The same eval, decomposed into 6 page-level evals, hit 100% on each in one round. Same agent, same code, different gradient.

Writing a Build Eval

```typescript
// eval/build/storage-adapter.ts
import { existsSync } from "node:fs"
import { resolve } from "node:path"
// fileContains and tscPasses are project eval helpers (import path assumed).
import { fileContains, tscPasses } from "../helpers"

export async function evaluate(): Promise<number> {
  const checks = [
    { name: "interface-exists", pass: existsSync(resolve("src/lib/storage/interface.ts")) },
    { name: "has-read-method", pass: fileContains("src/lib/storage/interface.ts", "read(") },
    { name: "local-impl", pass: existsSync(resolve("src/lib/storage/local.ts")) },
    { name: "compiles", pass: tscPasses() },
  ]
  // Partial credit: the fraction of passing checks gives the agent a gradient.
  return checks.filter(c => c.pass).length / checks.length
}
```
The eval checks the `AGENT_WORKTREE` env var so it tests the agent's worktree, not the main repo.
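A minimal sketch of what that worktree resolution could look like. `AGENT_WORKTREE` comes from the text above; the `inWorktree` helper name and its fallback behavior are assumptions for illustration.

```typescript
import { existsSync } from "node:fs"
import { resolve } from "node:path"

// Resolve an eval path against the agent's worktree when AGENT_WORKTREE is
// set, falling back to the current repo when the eval runs standalone.
// (Helper name and fallback are hypothetical, not from the source.)
function inWorktree(relPath: string): string {
  const root = process.env.AGENT_WORKTREE ?? process.cwd()
  return resolve(root, relPath)
}

// Usage inside a check:
// { name: "interface-exists", pass: existsSync(inWorktree("src/lib/storage/interface.ts")) }
```

With this pattern, every filesystem check in the eval targets the agent's sandbox rather than the main checkout.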

Agent TOML Config

```toml
[agent]
name = "build-storage-adapter"
scope = "build"           # triggers build-specific behavior
metric = "spec_compliance"
direction = "maximize"
time_budget_seconds = 600

[eval]
script = "eval/build/storage-adapter.ts"
data = "eval/fixtures/build-baseline.jsonl"

[task]
description = """
Create the TenetStorage adapter with interface,
LocalStorage, and CloudStorage implementations.
Exact file paths: src/lib/storage/interface.ts, etc.
"""
```

Build vs RL Agents

| | RL Agent | Build Agent |
| --- | --- | --- |
| Goal | Improve existing metric | Build from spec |
| Baseline | Current score | Zero |
| Rounds | 5-50, small changes | 3-10, creates files |
| Worktree | From `origin/main` | From `HEAD` (inherits merged work) |
| Turns | 15 per round | 40 per round |
| Early stop | No | Yes (stops at 1.0) |

Build Supervisor

Between rounds, `checkRound()` detects patterns:

- **Stalled:** 3+ rounds at the same score → injects a hint
- **Filename mismatch:** files created but the eval can't find them → alerts
- **Repeated reverts:** same checks failing → suggests a different approach

The supervisor logs learnings to `.jfl/build-learnings.jsonl` for future sessions.
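A sketch of the stall check and the learnings log, assuming the supervisor keeps a per-agent array of round scores. The 3-round threshold and the `.jfl/build-learnings.jsonl` path come from the text; the function names and JSONL fields are hypothetical.

```typescript
import { appendFileSync } from "node:fs"

// Stalled: the last `window` rounds all produced the same score.
// (Function name and signature are assumptions, not the real supervisor API.)
function checkStalled(scores: number[], window = 3): boolean {
  if (scores.length < window) return false
  const recent = scores.slice(-window)
  return recent.every(s => s === recent[0])
}

// Append one learning per line so future sessions can replay the history.
// (Entry fields are illustrative.)
function logLearning(agent: string, note: string): void {
  const entry = { ts: new Date().toISOString(), agent, note }
  appendFileSync(".jfl/build-learnings.jsonl", JSON.stringify(entry) + "\n")
}
```

When `checkStalled` fires, the supervisor would inject a hint into the next round's prompt and record the event via `logLearning`.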

Eval Decomposition

Break complex builds into sub-evals. Instead of one frontend eval with 16 checks, create 6 page-level evals with 2-3 checks each. Each scores independently, giving the agent gradient from round 1.
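The decomposition can be sketched as a map of independent sub-evals sharing the same scoring rule as the build eval above. The page names and sample checks here are hypothetical placeholders, not the actual frontend eval.

```typescript
type Check = { name: string; pass: boolean }

// Same scoring rule as the build eval: fraction of passing checks.
function score(checks: Check[]): number {
  return checks.filter(c => c.pass).length / checks.length
}

// Each sub-eval owns 2-3 checks and reports its own score, so the agent
// sees per-page progress instead of one nearly flat global number.
// (Page names and check results are illustrative placeholders.)
const pageEvals: Record<string, () => Check[]> = {
  home: () => [
    { name: "route-exists", pass: true },
    { name: "renders", pass: true },
  ],
  settings: () => [
    { name: "route-exists", pass: true },
    { name: "form-saves", pass: false },
  ],
}

function evaluateAll(): Record<string, number> {
  return Object.fromEntries(
    Object.entries(pageEvals).map(([page, run]) => [page, score(run())])
  )
}
```

A page that is done scores 1.0 and can early-stop, while unfinished pages keep supplying gradient, which is exactly the effect the 16-check monolith lacked.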