TENET agents are autonomous workers that make focused code changes, measure the results, and learn from the outcomes. They run on a reinforcement learning loop — keep what improves, revert what doesn’t.

How Agents Work

Each agent has:
  • A metric — what it’s trying to improve (test coverage, startup speed, code quality)
  • An eval script — how to measure the metric (bash or TypeScript)
  • A scope — which files it can modify
  • A time budget — how long each round gets
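Putting those four pieces together, an agent definition might look like the following TOML sketch. This is illustrative only — the field names (metric, direction, eval, scope, time_budget_s) are assumptions, not the official schema; see the Agent Configuration reference for the real format.

```toml
# Hypothetical agent config — field names are illustrative,
# not the canonical TENET schema.
[agent.test-coverage]
metric = "coverage_percent"     # what the agent tries to improve
direction = "maximize"          # keep changes that raise the score
eval = "./evals/coverage.sh"    # script that prints the metric
scope = ["src/**", "tests/**"]  # files the agent may modify
time_budget_s = 600             # seconds per round
```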
Agent starts
  → Measure baseline (run eval)
  → Make a focused change (via Claude/Sonnet)
  → Measure again (run eval)
  → Score improved? → KEEP (advance branch)
  → Score same/worse? → REVERT (git reset)
  → Write training tuple → policy head learns
  → Repeat for N rounds
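The keep/revert step above can be sketched in plain shell. This is a simplified illustration of the loop's decision logic, not TENET's implementation: run_eval stands in for the agent's eval script, the "focused change" is faked, and the git commands are shown only as comments.

```shell
#!/bin/sh
# Simplified sketch of one round: measure -> change -> measure -> keep/revert.
run_eval() { cat score.txt; }   # stand-in eval: reads a score from a file

echo "0.10" > score.txt
baseline=$(run_eval)            # measure baseline

# ...the agent would make a focused change here; we fake an improvement:
echo "0.12" > score.txt
after=$(run_eval)               # measure again

# For a "maximize" metric, keep the change only if the score went up.
if awk "BEGIN { exit !($after > $baseline) }"; then
  decision="KEEP"     # e.g. git commit — advance the branch
else
  decision="REVERT"   # e.g. git reset --hard — throw the change away
fi
echo "$decision: $baseline -> $after"
rm -f score.txt
```

The comparison goes through awk because POSIX test only compares integers; any tool that compares floats works here.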

Built-in Agents

TENET ships with 5 focused agents out of the box:
| Agent | Metric | Direction | What it does |
| --- | --- | --- | --- |
| test-coverage | coverage_percent | maximize | Adds tests for uncovered files. Uses jest --coverage. |
| code-quality | quality_score | maximize | Reduces console.logs, any types, TODOs. Adds @purpose headers. |
| cli-speed | p90_ms | minimize | Optimizes CLI startup latency. Lazy-loads, caches, reduces imports. |
| telemetry-rl | product_health | maximize | Improves real user experience: startup speed, session reliability. |
| onboarding-success | success_rate | maximize | Fixes onboarding edge cases in jfl init. |

Running an Agent

# List configured agents
jfl peter agent list

# Run a specific agent for 3 rounds
jfl peter agent test-coverage --rounds 3

# Run all agents (swarm mode)
jfl peter agent swarm --rounds 10

Agent Output

Running Scoped Agent: test-coverage (3 rounds)

Metric: coverage_percent (maximize)
Time budget: 600s per round
Baseline: 0.1276

── Round 1/3 ──────────────────────────────
Task: Add tests for uncovered files...
✓ Result: 0.1307 (+0.0031) KEPT

── Round 2/3 ──────────────────────────────
Task: Add tests for auth module...
✗ Result: 0.1305 (-0.0002) REVERTED

── Round 3/3 ──────────────────────────────
Task: Add tests for config loader...
✓ Result: 0.1342 (+0.0035) KEPT

Session Complete
  Improved: 2
  Total delta: +0.0066

The Key Insight

The eval script is everything. If the eval measures the right thing, agents improve. If it measures the wrong thing, they waste compute. We learned this the hard way: 750 rounds with a 2.5% keep rate, because our eval scripts were at ceiling (test pass rate was already 100%). The fix was eval scripts that measure metrics with real gradient.
Good eval = real gradient (12.8% coverage → 87% room to improve). Bad eval = ceiling (100% test pass rate → nowhere to go).
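As a sketch, a coverage eval with real gradient for a Jest project might look like this. The reporter flag and the shape of coverage-summary.json reflect Jest's json-summary output, but the exact paths and the parsing are assumptions — adapt them to your setup. The important property is that the script prints a single number with plenty of room to move.

```shell
#!/bin/sh
# Hypothetical coverage eval for a test-coverage agent.
# Step 1 (commented out so this sketch runs without a Jest project):
#   npx jest --coverage --coverageReporters=json-summary --silent

# Step 2: print a single number in 0..1 — the metric compared each round.
# Grabs the first "pct" field, which in Jest's json-summary is total lines.
coverage_percent() {
  awk 'match($0, /"pct":[0-9.]+/) {
         v = substr($0, RSTART + 6, RLENGTH - 6)
         printf "%.4f\n", v / 100
         exit
       }' "$1"
}

# Demo with a tiny fixture so the script is self-contained:
printf '{"total":{"lines":{"pct":12.76}}}' > /tmp/coverage-summary.json
coverage_percent /tmp/coverage-summary.json
```

A 12.76% baseline leaves ~87% of headroom, so every round has a meaningful gradient to climb — unlike a pass-rate eval pinned at 100%.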

What’s Next

  • Agent Configuration — TOML config reference: metrics, scope, time budgets.
  • Eval Scripts — Write eval scripts that produce real gradient.
  • Peter Parker — The meta-orchestrator that runs the nightly loop.
  • Creating Agents — Build custom agents for your own metrics.