The eval system is the reward function for the RL loop. It runs before and after an agent makes changes, and the delta determines whether the change is kept.

Eval Flow

Baseline eval (before changes)
  → Agent makes code change
  → Post-change eval (same script)
  → delta = post - baseline
  → delta > 0 → KEPT
  → delta ≤ 0 → REVERTED
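The keep/revert rule above can be sketched as follows. This is a hypothetical helper, not TENET's actual implementation:

```python
def decide(baseline: float, post: float) -> str:
    """Keep the change only if the eval delta is strictly positive."""
    delta = post - baseline
    return "KEPT" if delta > 0 else "REVERTED"

# A zero delta counts as no improvement, so the change is reverted.
print(decide(0.1276, 0.1307))  # delta = +0.0031 -> KEPT
print(decide(0.1307, 0.1307))  # delta = 0 -> REVERTED
```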

Eval Store

All eval results are stored in .jfl/eval.jsonl:
{
  "v": 1,
  "ts": "2026-03-22T21:30:00Z",
  "agent": "test-coverage",
  "run_id": "test-coverage-4bc3ff95",
  "metrics": {
    "coverage_percent": 0.1307,
    "line_pct": 13.37,
    "branch_pct": 12.27
  },
  "composite": 0.1307,
  "delta": 0.0031,
  "improved": true
}
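Because the store is plain JSONL, it is easy to process outside the CLI. A minimal sketch that filters one agent's records, assuming the record shape shown above (`load_trajectory` is a hypothetical helper, not part of `jfl`):

```python
import json
from pathlib import Path

def load_trajectory(store: Path, agent: str) -> list[dict]:
    """Return eval records for one agent, in file (chronological) order."""
    records = []
    for line in store.read_text().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        rec = json.loads(line)
        if rec.get("agent") == agent:
            records.append(rec)
    return records

# Usage: load_trajectory(Path(".jfl/eval.jsonl"), "test-coverage")
```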

Viewing Eval History

# Current eval status
jfl eval status

# Compare two snapshots
jfl eval compare

# View trajectory for an agent
jfl eval trajectory --agent test-coverage

Eval Snapshots

When an agent starts, TENET freezes the eval script into a snapshot (SHA-based). This ensures the eval doesn’t change mid-run — the same script measures baseline and post-change. Snapshots are cached at ~/.cache/jfl/eval-snapshots/<hash>/.
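Content-addressed caching of this kind can be sketched as below. This is an illustration only: the hash algorithm (sha256 over the script bytes) and the function name are assumptions, and the cache root is parameterized here for clarity:

```python
import hashlib
import shutil
from pathlib import Path

def snapshot_eval_script(script: Path, cache: Path) -> Path:
    """Copy the eval script into a content-addressed cache directory so the
    exact same bytes measure both the baseline and post-change runs."""
    digest = hashlib.sha256(script.read_bytes()).hexdigest()
    dest_dir = cache / digest
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / script.name
    if not dest.exists():  # already snapshotted: reuse the frozen copy
        shutil.copy2(script, dest)
    return dest
```

In TENET the cache root is `~/.cache/jfl/eval-snapshots/`; because the directory name is derived from the script's content, editing the script mid-run cannot affect an in-flight snapshot.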

Writing Good Evals

See Eval Scripts for the complete guide on writing eval scripts that produce real gradient. Key principles:
  1. Output JSON with a primary metric
  2. Use AGENT_WORKTREE for cross-repo agents
  3. Ensure the metric has room to improve (not at ceiling)
  4. Keep evals fast (under 30s) and deterministic
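A minimal sketch of an eval script following these principles, assuming the JSON record shape shown in the Eval Store section; the measurement here is a placeholder, and the authoritative contract lives in the Eval Scripts guide:

```python
import json
import os

def measure(root: str) -> dict:
    """Return metrics for the checkout at `root`."""
    covered, total = 1307, 10000  # placeholder numbers, not real coverage
    return {
        "metrics": {"coverage_percent": covered / total},
        "composite": covered / total,  # primary metric the harness compares
    }

# Cross-repo agents should measure the agent's worktree, not the CWD.
print(json.dumps(measure(os.environ.get("AGENT_WORKTREE", "."))))
```

Keeping the measurement deterministic and the script side-effect free is what makes the baseline/post-change delta meaningful.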