The policy head is a small neural network trained on your project’s training buffer. Given the current system state, it predicts the reward each candidate action is likely to produce.

Architecture

  • Input: RLState (composite score, dimensions, trajectory)
  • Core: 4-layer transformer (512 hidden, 8 heads)
  • Output: predicted reward for each candidate action
Specs:
  • 14M parameters
  • Trained on MPS (Apple Silicon) or CPU
  • Checkpoint: .jfl/checkpoints/policy-head-v2.json
  • Weights: .jfl/checkpoints/best_policy_head.pt
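As a sanity check on the specs above, the parameter count can be roughly reconstructed from the architecture. This is a back-of-the-envelope sketch: the FFN expansion factor of 4 and the presence of an input projection from 768 to 512 dims are assumptions, not read from the checkpoint.

```python
# Rough parameter-count estimate for a 4-layer, 512-hidden, 8-head
# transformer head over 768-dim input embeddings.
def estimate_params(embed_dim: int = 768, hidden: int = 512,
                    layers: int = 4, ffn_mult: int = 4) -> int:
    """Approximate trainable parameters of a small transformer head."""
    input_proj = embed_dim * hidden + hidden           # project 768 -> 512
    attn = 4 * (hidden * hidden + hidden)              # Q, K, V, O projections
    ffn = 2 * hidden * (ffn_mult * hidden) + ffn_mult * hidden + hidden
    norms = 2 * 2 * hidden                             # two LayerNorms per layer
    per_layer = attn + ffn + norms
    head = hidden + 1                                  # scalar reward output
    return input_proj + layers * per_layer + head

print(estimate_params())  # ~13M; the checkpoint reports 14,191,628
```

The estimate lands in the same ballpark as the reported 14M; the gap would come from details (positional encodings, biases, pooling) this sketch omits.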

How It’s Used

During agent runs, the policy head scores candidate actions:
# Score a single action
jfl policy score --type fix --description "Add error handling to auth module" --scope small

# Rank multiple actions
jfl policy rank '[
  {"type": "test", "description": "Add tests for config loader", "scope": "small"},
  {"type": "refactor", "description": "Extract auth middleware", "scope": "medium"},
  {"type": "fix", "description": "Fix memory leak in hub", "scope": "large"}
]'
  Ranked Actions (predicted reward):
  1. [+0.0042] test: Add tests for config loader (small)
  2. [+0.0018] fix: Fix memory leak in hub (large)
  3. [-0.0003] refactor: Extract auth middleware (medium)
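Conceptually, `jfl policy rank` scores each candidate and sorts by predicted reward, descending. The sketch below mirrors that flow; `predict_reward` is a hypothetical stand-in for the real model, hard-coded to reproduce the example scores above.

```python
# Minimal sketch of ranking candidate actions by predicted reward.
def predict_reward(action: dict) -> float:
    # Stand-in scores mirroring the example output above (not the real model).
    demo = {"test": 0.0042, "fix": 0.0018, "refactor": -0.0003}
    return demo[action["type"]]

def rank_actions(actions: list[dict]) -> list[tuple[float, dict]]:
    scored = [(predict_reward(a), a) for a in actions]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

candidates = [
    {"type": "test", "description": "Add tests for config loader", "scope": "small"},
    {"type": "refactor", "description": "Extract auth middleware", "scope": "medium"},
    {"type": "fix", "description": "Fix memory leak in hub", "scope": "large"},
]
for reward, action in rank_actions(candidates):
    print(f"[{reward:+.4f}] {action['type']}: {action['description']} ({action['scope']})")
```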

Training

When It Trains

The nightly loop retrains the policy head automatically once 50 or more new tuples have accumulated since the last training run:
# In peter daily:
BUFFER_SIZE=$(wc -l < .jfl/training-buffer.jsonl)
LAST_TRAINED=$(jq '.trained_on' .jfl/checkpoints/policy-head-v2.json)
NEW_TUPLES=$((BUFFER_SIZE - LAST_TRAINED))
if [ "$NEW_TUPLES" -ge 50 ]; then
  jfl train transform && jfl train policy-head --force
fi
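The shell check above reduces to a simple threshold comparison, sketched here in Python for clarity:

```python
# Retrain when 50+ new tuples have accumulated since the checkpoint
# was last trained (same logic as the shell snippet above).
RETRAIN_THRESHOLD = 50

def should_retrain(buffer_size: int, trained_on: int,
                   threshold: int = RETRAIN_THRESHOLD) -> bool:
    return buffer_size - trained_on >= threshold

# E.g. with 2764 buffered tuples and a checkpoint trained on 1565:
print(should_retrain(2764, 1565))  # True: 1199 new tuples
```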

Manual Training

# Transform raw tuples into training format
jfl train transform

# Train policy head
jfl train policy-head --force

Training Data

The training buffer (.jfl/training-buffer.jsonl) contains tuples from:
  • Agent autoresearch rounds (kept and reverted)
  • Manual journal entries (mined by tuple miner)
  • Cross-service events
Current stats: 2764 tuples, 91.6% validation accuracy.
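Since the buffer is JSON Lines, it is easy to inspect which sources contributed tuples. The `source` field name below is an assumption about the record schema, used purely for illustration:

```python
# Sketch: count training-buffer tuples per source. In practice you would
# open .jfl/training-buffer.jsonl; an in-memory buffer is used here so
# the example is self-contained. The "source" field is a schema assumption.
import io
import json

buffer = io.StringIO(
    '{"source": "autoresearch", "reward": 0.01}\n'
    '{"source": "journal", "reward": -0.002}\n'
    '{"source": "autoresearch", "reward": 0.004}\n'
)

counts: dict[str, int] = {}
for line in buffer:
    tup = json.loads(line)
    counts[tup["source"]] = counts.get(tup["source"], 0) + 1

print(counts)  # {'autoresearch': 2, 'journal': 1}
```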

Checkpoint

{
  "version": 2,
  "architecture": "transformer-4layer-512h",
  "embedding_dim": 768,
  "hidden_dim": 512,
  "num_layers": 4,
  "num_heads": 8,
  "trained_on": 1565,
  "val_accuracy": 0.9164,
  "parameters": 14191628,
  "tool_to_index": {
    "fix_bug": 0,
    "refactor_code": 1,
    "add_feature": 2,
    "add_tests": 3,
    "update_config": 4,
    "run_experiment": 5
  }
}
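The checkpoint metadata above is plain JSON, so tooling can read it directly. The sketch below loads a trimmed copy and resolves an action type to its model input index via `tool_to_index`; the mapping from CLI action types (e.g. `fix`) to tool names (e.g. `fix_bug`) is an assumption for illustration.

```python
# Sketch: read checkpoint metadata and map an action type to its
# tool_to_index slot. The ACTION_TO_TOOL mapping is hypothetical.
import json

checkpoint = json.loads("""{
  "version": 2,
  "val_accuracy": 0.9164,
  "tool_to_index": {
    "fix_bug": 0, "refactor_code": 1, "add_feature": 2,
    "add_tests": 3, "update_config": 4, "run_experiment": 5
  }
}""")

ACTION_TO_TOOL = {"fix": "fix_bug", "refactor": "refactor_code",
                  "test": "add_tests"}

def tool_index(action_type: str) -> int:
    return checkpoint["tool_to_index"][ACTION_TO_TOOL[action_type]]

print(tool_index("test"))  # 3
```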

When to Use GPUs

The policy head is small (14M params). Training on Apple Silicon MPS takes ~2 minutes. You don’t need cloud GPUs unless:
  • You’re training on 10K+ tuples
  • You want to experiment with larger architectures
  • You’re running parallel training across multiple projects
For most users, MPS or CPU is sufficient.