TENET implements a simplified reinforcement learning loop for code improvement. It’s not traditional RL with neural network policies playing Atari; it’s the Karpathy autoresearch pattern applied to codebases.
The Three Components
1. State (World Model)
Before each round, TENET captures the system state. That state is passed to the policy head as an RLState.
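
As a rough illustration of what a per-round state snapshot could look like, here is a minimal Python sketch. Only the name RLState comes from this page; the field names (`eval_score`, `round_index`, `recent_deltas`) and the helper `capture_state` are hypothetical and are not TENET's actual schema.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class RLState:
    """Hypothetical state snapshot; every field name here is an assumption."""
    eval_score: float                       # eval score measured before the round
    round_index: int                        # how many rounds have already run
    recent_deltas: Tuple[float, ...] = ()   # eval deltas from the latest rounds


def capture_state(eval_score: float, history_deltas: Sequence[float],
                  window: int = 5) -> RLState:
    """Build a snapshot from the current score and the last `window` deltas."""
    return RLState(
        eval_score=eval_score,
        round_index=len(history_deltas),
        recent_deltas=tuple(history_deltas[-window:]),
    )


state = capture_state(0.72, [0.0, 0.03, -0.01])
```

Making the snapshot immutable (`frozen=True`) is a design choice of this sketch, so that a state stored alongside a round's outcome cannot change afterwards.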
2. Action (Policy Head Selects)
The policy head is a 14M-parameter transformer that predicts reward for candidate actions. Action selection draws on:
- Experiment history (what worked, what didn’t)
- Policy head predictions (which action type is most promising)
- Product context (what the team is focused on)
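
The selection step can be pictured as scoring each candidate and picking the best one. The sketch below assumes a greedy choice over predicted reward, which this page does not confirm. `select_action` and the toy lookup table are hypothetical stand-ins for the transformer policy head, and the action names and numbers are made up.

```python
from typing import Callable, Sequence


def select_action(candidates: Sequence[str],
                  predict_reward: Callable[[str], float]) -> str:
    """Return the candidate with the highest predicted reward (greedy choice)."""
    if not candidates:
        raise ValueError("no candidate actions to choose from")
    return max(candidates, key=predict_reward)


# Stand-in for the policy head: a lookup table of made-up predicted rewards.
toy_predictions = {"refactor": 0.02, "add_test": 0.05, "tune_config": -0.01}

chosen = select_action(list(toy_predictions),
                       lambda action: toy_predictions.get(action, 0.0))
```

A real selector might also mix in exploration or product context; the greedy rule is used here only because it is the simplest thing that fits the description.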
3. Reward (Eval Delta)
After the agent makes changes, the eval delta determines the outcome:
- Positive delta → KEPT (change merged to session branch)
- Zero or negative delta → REVERTED (`git reset --hard HEAD~1`)
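
The keep-or-revert rule above can be sketched in a few lines of Python. The function names `decide` and `revert_last_commit` are hypothetical. The git command is the one quoted above, and running it from the agent's worktree is an assumption of this sketch.

```python
import subprocess
from typing import Tuple


def decide(score_before: float, score_after: float) -> Tuple[float, str]:
    """Compute the eval delta and map it to KEPT or REVERTED."""
    delta = score_after - score_before
    # Only a strictly positive delta is kept; zero counts as a revert.
    outcome = "KEPT" if delta > 0 else "REVERTED"
    return delta, outcome


def revert_last_commit(worktree: str) -> None:
    """Roll back the agent's last commit (assumes the change was committed)."""
    subprocess.run(["git", "reset", "--hard", "HEAD~1"], cwd=worktree, check=True)


_, improved = decide(0.70, 0.75)
_, unchanged = decide(0.70, 0.70)
```

Note that `revert_last_commit` discards uncommitted work in that worktree, so it should only ever point at a disposable agent checkout.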
Training Tuple
Every round produces a training tuple, regardless of outcome.
Why This Works
Traditional RL typically needs millions of episodes. TENET works with hundreds because:
- The action space is constrained: agents modify specific files in a focused scope
- The eval is deterministic: same code produces the same score
- The environment resets cleanly: `git reset` provides perfect rollback
- History informs action: agents see what worked and what failed in past rounds
The Karpathy Connection
This is the autoresearch pattern:
- Propose an experiment (agent generates a code change)
- Run the experiment (eval script measures the result)
- Evaluate the outcome (delta > 0?)
- Learn from the result (training tuple → policy head)
- Repeat with better-informed proposals
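
The steps above can be sketched as a small loop. This is a toy model, not TENET's implementation: the "codebase" is a single integer, `evaluate` and `propose` are hypothetical callables, and the `TrainingTuple` fields are assumptions about what a per-round record might contain.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TrainingTuple:
    """Hypothetical per-round record; field names are assumptions."""
    score_before: float
    action: str
    reward: float   # the eval delta for this round
    kept: bool


def run_session(initial: int,
                propose: Callable[[int, List[TrainingTuple]], Tuple[str, int]],
                evaluate: Callable[[int], float],
                rounds: int) -> Tuple[int, List[TrainingTuple]]:
    current, history = initial, []
    for _ in range(rounds):
        score_before = evaluate(current)
        action, candidate = propose(current, history)   # propose an experiment
        delta = evaluate(candidate) - score_before      # run and evaluate it
        kept = delta > 0
        if kept:
            current = candidate                         # keep the change
        # A tuple is recorded whether or not the change was kept.
        history.append(TrainingTuple(score_before, action, delta, kept))
    return current, history


# Toy setup: the best "codebase" is 10, and every proposal adds 3.
final, history = run_session(
    initial=4,
    propose=lambda x, past: ("add_3", x + 3),
    evaluate=lambda x: -abs(x - 10),
    rounds=4,
)
```

In this toy run the first two proposals improve the score and are kept, and the last two overshoot and are discarded, yet all four rounds leave a record that a policy head could learn from.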
Common Pitfalls
| Pitfall | Symptom | Fix |
|---|---|---|
| Eval at ceiling | 0% keep rate, delta always 0 | Measure something with a gradient (a score that can still improve) |
| Wrong metric | Agent makes good changes, still reverted | Align eval with what the agent actually changes |
| Eval tests wrong code | Agent’s worktree not evaluated | Use AGENT_WORKTREE env var |
| Scope too broad | Agent changes unrelated files | Narrow scope_files in agent config |
| Too many rounds | Diminishing returns | Cap at 5-10 rounds per session |
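
Some of these symptoms can be spotted from a session's eval deltas alone. The helper below is a hypothetical diagnostic, not part of TENET. It assumes you have the list of per-round deltas available, and its messages are only hints at which row of the table to check first.

```python
from typing import Sequence, Tuple


def diagnose_session(deltas: Sequence[float]) -> Tuple[float, str]:
    """Return (keep_rate, note) for one session's eval deltas."""
    if not deltas:
        return 0.0, "no rounds recorded"
    # A round counts as kept only when its delta is strictly positive.
    keep_rate = sum(1 for d in deltas if d > 0) / len(deltas)
    if all(d == 0 for d in deltas):
        return keep_rate, ("delta always 0: eval may be at its ceiling "
                           "or not reading the agent's worktree")
    if keep_rate == 0.0:
        return keep_rate, ("nothing kept: check that the metric matches "
                           "what the agent changes")
    return keep_rate, "ok"


flat_rate, flat_note = diagnose_session([0.0, 0.0, 0.0])
mixed = diagnose_session([0.02, -0.01, 0.0, 0.01])
```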

