Architecture
- 14M parameters
- Trained on MPS (Apple Silicon) or CPU
- Checkpoint: .jfl/checkpoints/policy-head-v2.json
- Weights: .jfl/checkpoints/best_policy_head.pt
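The checkpoint metadata lives in the JSON file listed above. As a minimal sketch, it could be read like this; the path is taken from the list above, but the field names in the example are illustrative assumptions, not the actual schema:

```python
import json
from pathlib import Path

def load_checkpoint_metadata(root: str = ".") -> dict:
    """Read the policy-head checkpoint metadata JSON from .jfl/checkpoints/."""
    path = Path(root) / ".jfl" / "checkpoints" / "policy-head-v2.json"
    with path.open() as f:
        return json.load(f)
```

The `.pt` weights file alongside it would be loaded separately (e.g. with PyTorch), while this JSON carries the lightweight metadata.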
How It’s Used
During agent runs, the policy head scores candidate actions.

Training
When It Trains
The nightly loop retrains automatically when 50+ new tuples have accumulated since the last training run.

Manual Training
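Before triggering training by hand, you may want to check whether the retrain threshold is met. A minimal sketch of that check, assuming the buffer path documented below and a hypothetical `last_trained_count` bookkeeping value (not part of the documented interface):

```python
from pathlib import Path

RETRAIN_THRESHOLD = 50  # the nightly loop's "50+ new tuples" rule

def should_retrain(buffer_path: str = ".jfl/training-buffer.jsonl",
                   last_trained_count: int = 0) -> bool:
    """True if 50+ new tuples accumulated since the last training run.

    `last_trained_count` is the tuple count at the last run -- a
    hypothetical bookkeeping value for this sketch.
    """
    path = Path(buffer_path)
    if not path.exists():
        return False
    total = sum(1 for line in path.open() if line.strip())
    return total - last_trained_count >= RETRAIN_THRESHOLD
```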
Training Data
The training buffer (.jfl/training-buffer.jsonl) contains tuples from:
- Agent autoresearch rounds (kept and reverted)
- Manual journal entries (mined by tuple miner)
- Cross-service events
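Since the buffer is JSONL, each tuple is one JSON object per line, so any of the sources above can append records the same way. A minimal sketch, assuming the documented buffer path; the record fields shown are illustrative assumptions, not the real schema:

```python
import json
from pathlib import Path

def append_tuple(record: dict,
                 buffer_path: str = ".jfl/training-buffer.jsonl") -> None:
    """Append one training tuple to the JSONL buffer (one object per line)."""
    path = Path(buffer_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")

# Illustrative tuple; field names are assumptions for this sketch.
append_tuple({
    "source": "autoresearch",   # or "journal", "cross-service"
    "action": "some-candidate-action",
    "outcome": "kept",          # kept vs. reverted
})
```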
Checkpoint
When to Use GPUs
The policy head is small (14M params). Training on Apple Silicon MPS takes ~2 minutes. You don't need cloud GPUs unless:
- You're training on 10K+ tuples
- You want to experiment with larger architectures
- You’re running parallel training across multiple projects