The world model captures a complete snapshot of the system at a point in time. It gives agents and the policy head context about the environment they’re operating in.

WorldState

interface WorldState {
  timestamp: number
  agentId: string
  
  systemState: {
    activeAgents: string[]        // Which agents are running
    worktreeAllocation: {}        // Git worktree usage
    hubConnections: number        // Context Hub connections
    buildStatus: Record<string, string>  // Build health per service
    fileLocks: string[]           // Currently locked files
    pendingEvals: number          // Queued eval runs
  }
  
  contextState: {
    recentCommits: number         // Commits in last 24h
    openPRs: number               // Open pull requests
    failingTests: number          // Currently failing tests
    codeChurn: number             // Lines changed recently
    humanActivity: boolean        // Is a human currently working?
  }
  
  agentState: {
    lastEvalScore: number         // Most recent eval composite
    rewardEMA: number             // Exponential moving average of rewards
    actionHistory: string[]       // Recent action types taken
    consecutiveFailures: number   // Reverts in a row
  }
}
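
The agentState fields are running summaries, so they have to be updated after every round. The following is a minimal sketch of one way to do that, not the project's actual implementation. The smoothing factor `EMA_ALPHA`, the history cap `HISTORY_LIMIT`, the `RoundOutcome` shape, and the `updateAgentState` helper are all assumptions introduced for illustration, as is resetting consecutiveFailures to zero on any round that is not reverted.

```typescript
// Subset of WorldState.agentState, copied from the interface above.
interface AgentState {
  lastEvalScore: number
  rewardEMA: number
  actionHistory: string[]
  consecutiveFailures: number
}

// Assumed values: neither is specified in this page.
const EMA_ALPHA = 0.2
const HISTORY_LIMIT = 10

// Hypothetical summary of a finished round.
interface RoundOutcome {
  actionType: string
  evalScore: number
  reward: number
  reverted: boolean
}

// Returns a new AgentState; the previous one is left untouched.
function updateAgentState(prev: AgentState, outcome: RoundOutcome): AgentState {
  return {
    lastEvalScore: outcome.evalScore,
    // Standard EMA update: new = alpha * reward + (1 - alpha) * old
    rewardEMA: EMA_ALPHA * outcome.reward + (1 - EMA_ALPHA) * prev.rewardEMA,
    // Keep only the most recent HISTORY_LIMIT action types.
    actionHistory: [...prev.actionHistory, outcome.actionType].slice(-HISTORY_LIMIT),
    // Assumption: a revert extends the failure streak, anything else resets it.
    consecutiveFailures: outcome.reverted ? prev.consecutiveFailures + 1 : 0,
  }
}
```

A larger `EMA_ALPHA` makes rewardEMA react faster to recent rounds, while a smaller one smooths out noise at the cost of lag.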

RLState (Policy Head Input)

The world state is converted to a compact format for the policy head:
interface RLState {
  composite_score: number              // Current eval score
  dimension_scores: {
    test_pass_rate: number             // Test health
    build_health: number               // Build status
    code_quality: number               // Code quality metrics
    hub_health: number                 // Hub connectivity
  }
  tests_passing: number
  tests_total: number
  trajectory_length: number            // Rounds completed
  recent_deltas: number[]              // Last 5 reward deltas
  agent: string                        // Agent name
}
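
This page does not spell out how RLState is assembled, so the sketch below only illustrates the shape of the conversion. It assumes a hypothetical `EvalResult` object that carries the scores and test counts, and a hypothetical `buildRLState` helper; it also assumes one reward delta is recorded per completed round, so trajectory_length can be read from the length of that history.

```typescript
// RLState as defined above.
interface RLState {
  composite_score: number
  dimension_scores: {
    test_pass_rate: number
    build_health: number
    code_quality: number
    hub_health: number
  }
  tests_passing: number
  tests_total: number
  trajectory_length: number
  recent_deltas: number[]
  agent: string
}

// Hypothetical eval output; the real eval result shape is not shown in this page.
interface EvalResult {
  composite: number
  dimensions: RLState["dimension_scores"]
  testsPassing: number
  testsTotal: number
}

const RECENT_DELTA_WINDOW = 5 // "Last 5 reward deltas"

function buildRLState(agent: string, evalResult: EvalResult, rewardDeltas: number[]): RLState {
  return {
    composite_score: evalResult.composite,
    dimension_scores: { ...evalResult.dimensions },
    tests_passing: evalResult.testsPassing,
    tests_total: evalResult.testsTotal,
    // Assumption: one delta is recorded per completed round.
    trajectory_length: rewardDeltas.length,
    // Keep only the most recent window, oldest first.
    recent_deltas: rewardDeltas.slice(-RECENT_DELTA_WINDOW),
    agent,
  }
}
```

Capping recent_deltas at a fixed window keeps the policy head input the same size no matter how long the agent has been running.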

How It’s Used

  1. Before each round: the world state is captured as the “before” snapshot.
  2. Policy head scoring: the RLState is fed to the transformer, which ranks candidate actions.
  3. Strategic reasoning: Peter Parker uses the state to decide which agents to run.
  4. Training: the state is included in the training tuples the policy head learns from.
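
The capture, scoring, and training steps above can be sketched as a single round. This is an illustration under stated assumptions rather than the actual control loop: `RoundHooks`, `TrainingTuple`, and `runRound` are hypothetical names, the hooks stand in for capture, ranking, and execution code that this page does not show, and the strategic step of choosing which agents to run is left out.

```typescript
// Hypothetical hooks standing in for the real capture, ranking, and execution code.
interface RoundHooks<S> {
  captureState: () => S
  rankActions: (state: S) => string[] // candidate actions, best first
  execute: (action: string) => number // runs the action and returns its reward
}

// Hypothetical training tuple: state, action, reward, next state.
interface TrainingTuple<S> {
  state: S
  action: string
  reward: number
  nextState: S
}

function runRound<S>(hooks: RoundHooks<S>): TrainingTuple<S> {
  // 1. Capture the "before" snapshot.
  const state = hooks.captureState()
  // 2. Let the policy head rank candidate actions and take the top one.
  const [action] = hooks.rankActions(state)
  if (action === undefined) {
    throw new Error("no candidate actions to run")
  }
  // Run the chosen action, then capture the "after" snapshot.
  const reward = hooks.execute(action)
  const nextState = hooks.captureState()
  // 4. Package everything as a training tuple.
  return { state, action, reward, nextState }
}
```

Passing the hooks in as arguments keeps the sketch testable with stub functions and avoids guessing at the real APIs.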

State Transitions

Each agent round creates a state transition:
Prior state (before) → Action (agent change) → Posterior state (after)
Transitions are tracked in .jfl/telemetry/resource-transitions.jsonl for analysis.
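
Since the transitions file is JSONL, each transition occupies exactly one line of JSON. The schema of resource-transitions.jsonl is not documented here, so the `Transition` fields and the `toJsonlLine` and `parseJsonl` helpers below are hypothetical; the sketch only shows the one-record-per-line round trip.

```typescript
// Hypothetical record shape mirroring: prior state -> action -> posterior state.
interface Transition<S> {
  before: S
  action: string
  after: S
}

// Serialize one transition as a single JSONL line.
// JSON.stringify without an indent argument never emits raw newlines,
// so the record always stays on one line.
function toJsonlLine<S>(transition: Transition<S>): string {
  return JSON.stringify(transition) + "\n"
}

// Parse JSONL text back into transitions, skipping blank lines.
function parseJsonl<S>(text: string): Transition<S>[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Transition<S>)
}
```

In a Node.js process, each line could be appended to the file with `fs.appendFileSync`, which keeps earlier records intact and lets analysis tools read the log one line at a time.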