TENET’s core innovation is a reinforcement learning loop for code. Agents don’t just make changes — they measure whether changes improved the codebase, keep what works, and learn from the results.

The RL Loop

+---------------------------------------------+
|                                             |
|   +----------+    +-----------+             |
|   |  State   |--->|  Action   |             |
|   | (World   |    | (Agent    |             |
|   |  Model)  |    |  makes    |             |
|   |          |    |  change)  |             |
|   +----------+    +-----+-----+             |
|        ^                |                   |
|        |          +-----v-----+             |
|        |          |  Eval     |             |
|   +----+-----+    | (Measure  |             |
|   | Training |<---|  result)  |             |
|   | Buffer   |    +-----+-----+             |
|   |          |          |                   |
|   +----+-----+    +-----v-----+             |
|        |          |  Reward   |             |
|        v          | (Keep or  |             |
|   +----------+    |  revert)  |             |
|   | Policy   |    +-----------+             |
|   | Head     |                              |
|   | (Better  |                              |
|   |  action  |                              |
|   | select)  |                              |
|   +----------+                              |
|                                             |
+---------------------------------------------+

1. State (World Model)

Before each round, TENET captures the current system state:
  • Composite eval score
  • Test pass rate and coverage
  • Build health
  • Code quality metrics
  • Agent’s trajectory (what it tried before)
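
The state snapshot above can be sketched as a single dict. This is a minimal illustration; the field names mirror the bullets but are not TENET's actual schema:

```python
# Hypothetical sketch of a world-model state snapshot. Field names are
# illustrative, not TENET's real data model.
import json

def capture_state(composite_score, pass_rate, coverage, build_ok, trajectory):
    """Bundle the world-model inputs into one state dict for the round."""
    return {
        "composite_score": composite_score,
        "test_pass_rate": pass_rate,
        "coverage": coverage,
        "build_health": "green" if build_ok else "red",
        "trajectory": trajectory,  # prior (action_type, reward_delta) pairs
    }

state = capture_state(0.1276, 1.0, 0.62, True, [("add_tests", 0.0031)])
print(json.dumps(state, indent=2))
```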

2. Action (Agent)

The agent makes a focused code change. The policy head helps select what type of change to try based on what worked in the past.
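
One simple way to picture "select based on what worked before" is an epsilon-greedy choice over historical reward deltas. This is a toy sketch, not the actual policy head (which is a learned model, described below):

```python
# Toy action-type selection: exploit the type with the best average historical
# reward, with occasional exploration. Illustrative only, not TENET's policy.
import random
from collections import defaultdict

def select_action(history, action_types, epsilon=0.2, rng=random.random):
    """history: list of (action_type, reward_delta) tuples from past rounds."""
    if rng() < epsilon or not history:
        return random.choice(action_types)      # explore an arbitrary type
    totals = defaultdict(list)
    for action, delta in history:
        totals[action].append(delta)
    # exploit: highest mean reward delta seen so far
    return max(totals, key=lambda a: sum(totals[a]) / len(totals[a]))

history = [("add_tests", 0.003), ("refactor", -0.001), ("add_tests", 0.002)]
print(select_action(history, ["add_tests", "refactor"], epsilon=0.0))
# prints "add_tests"
```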

3. Eval (Measure)

An eval script runs against the agent’s changes — not the main branch. The AGENT_WORKTREE mechanism ensures the eval tests the actual changes in an isolated git worktree.
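
The isolated-worktree pattern can be sketched with standard git commands. The AGENT_WORKTREE variable name is from the doc; the helper function, paths, and branch names below are placeholders:

```python
# Sketch of running an eval inside an isolated git worktree. Standard git
# commands; the function name and arguments are hypothetical.
import os
import subprocess

def eval_in_worktree(repo, branch, worktree, eval_cmd):
    """Check the agent's branch out into its own worktree and eval there."""
    subprocess.run(["git", "worktree", "add", worktree, branch],
                   cwd=repo, check=True)
    try:
        # Point the eval at the worktree so it tests the changes, not main.
        env = dict(os.environ, AGENT_WORKTREE=worktree)
        result = subprocess.run(eval_cmd, cwd=worktree, env=env,
                                capture_output=True, text=True)
        return result.stdout
    finally:
        subprocess.run(["git", "worktree", "remove", "--force", worktree],
                       cwd=repo, check=True)
```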

4. Reward (Keep or Revert)

  • Score improved → change is kept, merged to the session branch
  • Score stayed same or regressed → git reset --hard HEAD~1, change reverted
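
The keep-or-revert decision reduces to a few lines. The git command is the one from the bullet above; the helper name and signature are illustrative:

```python
# Minimal keep-or-revert decision. `git reset --hard HEAD~1` is the revert
# from the doc; the function itself is an illustrative sketch.
import subprocess

def apply_reward(score_before, score_after, cwd="."):
    """Keep the committed change only if the composite score improved."""
    if score_after > score_before:
        return "kept"                      # stays on the session branch
    # Same or worse: drop the agent's last commit entirely.
    subprocess.run(["git", "reset", "--hard", "HEAD~1"], cwd=cwd, check=True)
    return "reverted"
```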

5. Training Buffer

Every round — kept or reverted — writes a training tuple:
{
  "agent": "test-coverage",
  "state": { "composite_score": 0.1276, ... },
  "action": { "type": "add_tests", "description": "...", "files_affected": [...] },
  "reward": { "composite_delta": 0.0031, "improved": true }
}
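
Appending a tuple in this shape is a one-liner per round, e.g. as JSON Lines. The field names mirror the example above; the file path and helper are illustrative:

```python
# Sketch of writing one training tuple per round as JSON Lines.
# Path and helper name are placeholders, not TENET's actual code.
import json

def record_tuple(path, agent, state, action, reward):
    row = {"agent": agent, "state": state, "action": action, "reward": reward}
    with open(path, "a") as f:
        f.write(json.dumps(row) + "\n")    # one tuple per line, kept or reverted

record_tuple("buffer.jsonl", "test-coverage",
             {"composite_score": 0.1276},
             {"type": "add_tests"},
             {"composite_delta": 0.0031, "improved": True})
```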

6. Policy Head

A 14M-parameter transformer trained on the training buffer. Predicts which actions will produce positive reward given the current state. Retrained nightly when 50+ new tuples accumulate.
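
The retrain trigger can be sketched as a simple threshold check. The 50-tuple threshold is from the doc; the Policy class below is a tiny stand-in for the 14M-parameter transformer:

```python
# Illustrative nightly retrain trigger. The 50-tuple threshold is from the
# doc; Policy is a stand-in for the real transformer policy head.
class Policy:
    def __init__(self, seen=0):
        self.seen = seen
    def fit(self, tuples):                 # stand-in for transformer training
        return Policy(self.seen + len(tuples))

def maybe_retrain(policy, new_tuples, threshold=50):
    """Retrain the policy head only once enough fresh tuples accumulate."""
    if len(new_tuples) < threshold:
        return policy, False
    return policy.fit(new_tuples), True

policy, retrained = maybe_retrain(Policy(), [{"reward": 0.1}] * 50)
print(retrained)  # True once 50+ new tuples exist
```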

The Nightly Loop

Every night at 2 AM (configurable):
jfl peter daily
  +-- Mine training tuples from journals
  +-- Synthesize product context
  +-- Strategic reasoning (which agents to run?)
  +-- Run stale agents (5 rounds each)
  +-- Retrain policy head (if 50+ new tuples)
  +-- Pick up backlog issues → create PRs
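
The configurable 2 AM schedule could be wired up with an ordinary crontab entry. The `jfl peter daily` command is from the doc; the log path here is a placeholder:

```shell
# Hypothetical crontab entry: run the nightly loop at 02:00.
0 2 * * * jfl peter daily >> /var/log/jfl-daily.log 2>&1
```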

Self-Driving Pipeline

Issues flow through a kanban pipeline automatically:
Issue filed (Linear/GitHub)
  → jfl/backlog label
  → PP picks up (every 30 min)
  → Agent creates PR
  → CI runs eval
  → Score improves → auto-merge → close issue
  → Score regresses → request changes

Key Insight

The eval script is the reward function. If the eval measures the right thing, agents improve. If it doesn’t, they waste compute. We learned this the hard way — 750 rounds with 2.5% keep rate because eval scripts were at ceiling (test pass rate was already 100%). The fix: eval scripts that measure metrics with real gradient (actual coverage percentage, not just pass/fail).
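
The difference between a ceilinged reward and one with gradient can be made concrete. This is an illustrative scoring function, not TENET's actual eval script: pass/fail gates the score, and a continuous metric (coverage percentage) keeps moving even when the pass rate is pinned at 100%:

```python
# Sketch of a reward with real gradient. Illustrative only: pass/fail is a
# gate, continuous coverage differentiates between changes at ceiling.
def eval_score(tests_passed, coverage_pct):
    """Composite score: 0 on any failure, else scaled coverage."""
    if not tests_passed:
        return 0.0
    return coverage_pct / 100.0   # still moves when pass rate is already 100%

# With pass rate at ceiling, adding tests still earns positive reward:
before = eval_score(True, 62.0)
after = eval_score(True, 64.5)
print(after > before)  # True
```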