The policy head is a neural network trained on your project’s training buffer. It predicts which actions will produce positive reward given the current system state.
Architecture
Input: RLState (composite score, dimensions, trajectory)
↓
4-layer transformer (512 hidden, 8 heads)
↓
Output: predicted reward for each candidate action
Specs:
- 14M parameters
- Trained on MPS (Apple Silicon) or CPU
- Checkpoint: .tenet/checkpoints/policy-head-v2.json
- Weights: .tenet/checkpoints/best_policy_head.pt
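As a rough sanity check on the 14M figure, the sketch below counts the parameters of a standard transformer encoder with the dimensions listed above. The feed-forward width (4 x hidden), biases on every linear layer, and a single 768-to-512 input projection are assumptions, not details confirmed by this page; the 768 is the embedding_dim value from the checkpoint metadata.

```python
# Back-of-the-envelope parameter count for a standard transformer encoder.
# ASSUMPTIONS (not confirmed by the docs): feed-forward width = 4 * hidden,
# biases on every linear layer, two LayerNorms per layer, and one linear
# projection from the 768-dim embedding into the 512-dim hidden space.

HIDDEN = 512
LAYERS = 4
EMBEDDING = 768
FEED_FORWARD = 4 * HIDDEN  # assumed, not documented


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def encoder_layer_params(hidden: int, feed_forward: int) -> int:
    """Parameters in one encoder layer (the head count does not change this)."""
    attention = 4 * linear_params(hidden, hidden)  # Q, K, V and output projections
    ffn = linear_params(hidden, feed_forward) + linear_params(feed_forward, hidden)
    layer_norms = 2 * (2 * hidden)  # scale and shift for each of two LayerNorms
    return attention + ffn + layer_norms


estimate = LAYERS * encoder_layer_params(HIDDEN, FEED_FORWARD) + linear_params(EMBEDDING, HIDDEN)
print(f"{estimate:,}")  # prints 13,003,264
```

Under these assumptions the estimate comes to about 13.0M, the same order as the reported 14,191,628. The remainder would sit in components this page does not describe.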
How It’s Used
During agent runs, the policy head scores candidate actions:
# Score a single action
tenet policy score --type fix --description "Add error handling to auth module" --scope small
# Rank multiple actions
tenet policy rank '[
  {"type": "test", "description": "Add tests for config loader", "scope": "small"},
  {"type": "refactor", "description": "Extract auth middleware", "scope": "medium"},
  {"type": "fix", "description": "Fix memory leak in hub", "scope": "large"}
]'
Ranked Actions (predicted reward):
1. [+0.0042] test: Add tests for config loader (small)
2. [+0.0018] fix: Fix memory leak in hub (large)
3. [-0.0003] refactor: Extract auth middleware (medium)
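The ranking step itself amounts to sorting candidates by predicted reward, highest first. The following is a minimal sketch of that step, not the tenet implementation: the ScoredAction class and both helper functions are hypothetical names, and the reward values are copied from the sample output above rather than produced by a model.

```python
from dataclasses import dataclass


@dataclass
class ScoredAction:
    # Hypothetical container; the field names mirror the JSON keys passed to the CLI.
    type: str
    description: str
    scope: str
    predicted_reward: float


def rank_actions(actions):
    """Return the actions sorted by predicted reward, highest first."""
    return sorted(actions, key=lambda action: action.predicted_reward, reverse=True)


def format_ranking(actions):
    """Render a ranking in the same layout as the sample output."""
    lines = ["Ranked Actions (predicted reward):"]
    for position, action in enumerate(rank_actions(actions), start=1):
        lines.append(
            f"{position}. [{action.predicted_reward:+.4f}] "
            f"{action.type}: {action.description} ({action.scope})"
        )
    return "\n".join(lines)


# Rewards taken from the sample output; a real run would get them from the model.
candidates = [
    ScoredAction("refactor", "Extract auth middleware", "medium", -0.0003),
    ScoredAction("test", "Add tests for config loader", "small", 0.0042),
    ScoredAction("fix", "Fix memory leak in hub", "large", 0.0018),
]

report = format_ranking(candidates)
print(report)
```

A negative predicted reward, as for the refactor here, means the model expects the action to lower the score, so such candidates sort to the bottom.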
Training
When It Trains
The nightly loop retrains automatically once 50 or more new tuples have accumulated since the last training run:
# In peter daily:
BUFFER_SIZE=$(wc -l < .tenet/training-buffer.jsonl)
LAST_TRAINED=$(jq '.trained_on' .tenet/checkpoints/policy-head-v2.json)
NEW_TUPLES=$((BUFFER_SIZE - LAST_TRAINED))
if [ "$NEW_TUPLES" -ge 50 ]; then
  tenet train transform && tenet train policy-head --force
fi
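The same threshold check can be sketched in Python, which is handy for testing the trigger logic without touching a real buffer. The function names and the demo tuple contents are hypothetical; only the trained_on field, the one-tuple-per-line buffer layout, and the threshold of 50 come from the script above.

```python
import json
import tempfile
from pathlib import Path


def new_tuples_since_training(buffer_path: Path, checkpoint_path: Path) -> int:
    """Count buffer lines added since the checkpoint's `trained_on` value."""
    # One tuple per line, as with `wc -l`. Unlike `wc -l`, this also counts a
    # final line that lacks a trailing newline.
    with buffer_path.open() as handle:
        buffer_size = sum(1 for _ in handle)
    trained_on = json.loads(checkpoint_path.read_text())["trained_on"]
    return buffer_size - trained_on


def should_retrain(buffer_path: Path, checkpoint_path: Path, threshold: int = 50) -> bool:
    """True once at least `threshold` new tuples have accumulated."""
    return new_tuples_since_training(buffer_path, checkpoint_path) >= threshold


# Demo against throwaway files so the sketch runs without a real .tenet directory.
# The tuple body {"reward": 0.0} is a placeholder, not the real tuple schema.
with tempfile.TemporaryDirectory() as tmp:
    buffer_file = Path(tmp) / "training-buffer.jsonl"
    checkpoint_file = Path(tmp) / "policy-head-v2.json"
    buffer_file.write_text("".join('{"reward": 0.0}\n' for _ in range(120)))

    checkpoint_file.write_text(json.dumps({"trained_on": 60}))
    retrain_with_60_new = should_retrain(buffer_file, checkpoint_file)

    checkpoint_file.write_text(json.dumps({"trained_on": 100}))
    retrain_with_20_new = should_retrain(buffer_file, checkpoint_file)

print(retrain_with_60_new, retrain_with_20_new)  # prints: True False
```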
Manual Training
# Transform raw tuples into training format
tenet train transform
# Train policy head
tenet train policy-head --force
Training Data
The training buffer (.tenet/training-buffer.jsonl) contains tuples from:
- Agent autoresearch rounds (kept and reverted)
- Manual journal entries (mined by tuple miner)
- Cross-service events
Current stats: 2764 tuples, 91.6% validation accuracy.
Checkpoint
{
  "version": 2,
  "architecture": "transformer-4layer-512h",
  "embedding_dim": 768,
  "hidden_dim": 512,
  "num_layers": 4,
  "num_heads": 8,
  "trained_on": 1565,
  "val_accuracy": 0.9164,
  "parameters": 14191628,
  "tool_to_index": {
    "fix_bug": 0,
    "refactor_code": 1,
    "add_feature": 2,
    "add_tests": 3,
    "update_config": 4,
    "run_experiment": 5
  }
}
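A short sketch of reading this metadata and running a few consistency checks on it. The JSON is embedded as a string so the example is self-contained; in a project it would be read from the checkpoint path listed under Specs. The specific checks are illustrative assumptions about what is worth validating, not something tenet is documented to perform.

```python
import json

# Copy of the checkpoint metadata shown above.
CHECKPOINT_JSON = """
{
  "version": 2,
  "architecture": "transformer-4layer-512h",
  "embedding_dim": 768,
  "hidden_dim": 512,
  "num_layers": 4,
  "num_heads": 8,
  "trained_on": 1565,
  "val_accuracy": 0.9164,
  "parameters": 14191628,
  "tool_to_index": {
    "fix_bug": 0,
    "refactor_code": 1,
    "add_feature": 2,
    "add_tests": 3,
    "update_config": 4,
    "run_experiment": 5
  }
}
"""

metadata = json.loads(CHECKPOINT_JSON)

# Invert the mapping so a predicted class index can be turned back into a tool name.
index_to_tool = {index: tool for tool, index in metadata["tool_to_index"].items()}

# Indices should be unique and contiguous from 0.
indices_are_contiguous = sorted(index_to_tool) == list(range(len(metadata["tool_to_index"])))

# In standard multi-head attention the hidden size is split evenly across heads.
hidden_divides_evenly = metadata["hidden_dim"] % metadata["num_heads"] == 0
head_dim = metadata["hidden_dim"] // metadata["num_heads"]

print(index_to_tool[3], head_dim, indices_are_contiguous, hidden_divides_evenly)
# prints: add_tests 64 True True
```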
When to Use GPUs
The policy head is small (14M params). Training on Apple Silicon MPS takes ~2 minutes. You don’t need cloud GPUs unless:
- You’re training on 10K+ tuples
- You want to experiment with larger architectures
- You’re running parallel training across multiple projects
For most users, MPS or CPU is sufficient.
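For custom experiments outside the CLI, device selection can be sketched as below. This assumes the weights are PyTorch tensors, which the .pt extension suggests but this page does not confirm, and it says nothing about how tenet itself picks a device. The function name is hypothetical.

```python
def pick_training_device() -> str:
    """Return "mps" when PyTorch reports an Apple Silicon GPU, otherwise "cpu"."""
    try:
        import torch  # assumption: PyTorch is the training framework
    except ImportError:
        return "cpu"
    # Older PyTorch builds have no torch.backends.mps, hence the getattr guard.
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return "mps"
    return "cpu"


device = pick_training_device()
print(device)
```

The sketch deliberately falls back to CPU rather than failing, matching the guidance that either device is sufficient for a model of this size.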