The training buffer (.jfl/training-buffer.jsonl) captures every agent action and its outcome. This data trains the policy head and provides experiment history for future runs.

Format

Each line is a training tuple:
{
  "id": "tb_eyJjb21wb3Np",
  "v": "1",
  "ts": "2026-03-22T21:30:00Z",
  "agent": "test-coverage",
  "state": {
    "composite_score": 0.1276,
    "dimension_scores": { "test_pass_rate": 1.0, "build_health": 1.0 },
    "tests_passing": 1414,
    "tests_total": 1414,
    "trajectory_length": 3,
    "recent_deltas": [0.0031, -0.0002],
    "agent": "test-coverage"
  },
  "action": {
    "type": "test",
    "description": "Add tests for claude-md-generator.ts",
    "files_affected": ["src/utils/__tests__/claude-md-generator.test.ts"],
    "scope": "medium",
    "branch": "session/test-coverage-4bc3ff95-2026-03-22"
  },
  "reward": {
    "composite_delta": 0.0031,
    "dimension_deltas": {},
    "tests_added": 48,
    "quality_score": 0.0,
    "improved": true
  }
}
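The format above can be validated when reading the buffer. A minimal Python sketch, assuming the seven top-level fields shown in the example are required (this is inferred from the example, not an official schema):

```python
import json

# Top-level fields inferred from the example tuple above (an assumption,
# not a published schema).
REQUIRED = {"id", "v", "ts", "agent", "state", "action", "reward"}

def parse_tuple(line: str) -> dict:
    """Parse one JSONL line and check that the top-level fields are present."""
    tup = json.loads(line)
    missing = REQUIRED - tup.keys()
    if missing:
        raise ValueError(f"tuple missing fields: {sorted(missing)}")
    return tup

sample = ('{"id": "tb_x", "v": "1", "ts": "2026-03-22T21:30:00Z", '
          '"agent": "test-coverage", "state": {}, "action": {}, '
          '"reward": {"composite_delta": 0.0031, "improved": true}}')
tup = parse_tuple(sample)
```

Lines that fail to parse or lack a required field raise immediately, which keeps a corrupt line from silently skewing later aggregation.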

Data Sources

Tuples come from three sources:
| Source | When | What |
|---|---|---|
| Agent runs | Each round | State, action, reward from eval delta |
| Tuple miner | Nightly pre-flight | Extracts tuples from journal entries |
| Manual | `jfl_training_buffer` tool | Record observations during sessions |
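The manual path is normally driven through the `jfl_training_buffer` tool, but the underlying operation is just a JSONL append. A sketch of that append, with hypothetical `agent`/`action`/`reward` values and a placeholder `id` (the tool's real id scheme is not documented here):

```python
import io
import json
from datetime import datetime, timezone

def append_tuple(buf, agent, action, reward):
    """Append one manually recorded tuple to an open JSONL stream.
    Field names mirror the format shown above; the id is a placeholder."""
    tup = {
        "id": f"tb_manual_{agent}",  # placeholder, not the tool's real scheme
        "v": "1",
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "agent": agent,
        "state": {},
        "action": action,
        "reward": reward,
    }
    buf.write(json.dumps(tup) + "\n")
    return tup

buf = io.StringIO()  # stands in for .jfl/training-buffer.jsonl
t = append_tuple(
    buf,
    "code-quality",
    {"type": "refactor", "description": "example observation"},
    {"composite_delta": 0.001, "improved": True},
)
```

One tuple per line keeps the file append-only, so concurrent sessions never need to rewrite earlier entries.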

Querying the Buffer

# Total tuples
wc -l .jfl/training-buffer.jsonl

# Tuples by agent
jq -r '.agent' .jfl/training-buffer.jsonl | sort | uniq -c | sort -rn

# Recent improvements
jq 'select(.reward.improved == true)' .jfl/training-buffer.jsonl | tail -5
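The same queries work from Python when jq is unavailable. This sketch computes the per-agent positive/negative/zero split of `composite_delta` over an in-memory sample (in practice you would iterate over the lines of `.jfl/training-buffer.jsonl`):

```python
import json
from collections import Counter

def reward_split(lines):
    """Count positive / negative / zero composite deltas per agent."""
    split = {}
    for line in lines:
        tup = json.loads(line)
        delta = tup["reward"]["composite_delta"]
        key = "+" if delta > 0 else "-" if delta < 0 else "="
        split.setdefault(tup["agent"], Counter())[key] += 1
    return split

sample = [
    '{"agent": "test-coverage", "reward": {"composite_delta": 0.0031}}',
    '{"agent": "test-coverage", "reward": {"composite_delta": -0.0002}}',
    '{"agent": "code-quality", "reward": {"composite_delta": 0.0}}',
]
split = reward_split(sample)
# e.g. {'test-coverage': Counter({'+': 1, '-': 1}), 'code-quality': Counter({'=': 1})}
```

This is the same breakdown the nightly scorecard reports per agent.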

Mining Tuples

The tuple miner extracts learning data from journals, evals, and session logs:
# Mine from all sources
jfl eval mine --all

# Mine from specific source
jfl eval mine --source journals
jfl eval mine --source evals
jfl eval mine --source sessions

Buffer Health

Check the nightly scorecard for buffer stats:
bash eval/nightly-scorecard.sh
  Training Buffer
  ───────────────
    Total tuples:     2764
    Last 24h:         62 new tuples
    Reward distribution:
      test-coverage    +8 / -2 / =0  (80% positive)
      code-quality     +4 / -3 / =1  (50% positive)

When Policy Head Retrains

The policy head retrains once the buffer has accumulated 50 or more new tuples since the last training run. This happens automatically in the nightly loop, or manually:
jfl train transform && jfl train policy-head --force
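The 50-tuple trigger can be sketched as a simple count of tuples newer than the last training run. The cutoff timestamp is an assumption here (how jfl actually records its last training time is not documented on this page); ISO-8601 UTC timestamps compare correctly as strings:

```python
import json

RETRAIN_THRESHOLD = 50  # from the docs: 50+ new tuples trigger retraining

def should_retrain(lines, last_trained_ts):
    """Return True when RETRAIN_THRESHOLD or more tuples are newer than
    last_trained_ts. Lexicographic comparison is safe for ISO-8601 UTC."""
    new = sum(1 for line in lines if json.loads(line)["ts"] > last_trained_ts)
    return new >= RETRAIN_THRESHOLD

sample = ['{"ts": "2026-03-22T21:30:00Z"}'] * 60
```

With 60 tuples after an older cutoff the check fires; with a cutoff after all tuples it does not.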