LiveRL

Architecture

AgentLoopWorkers

The concurrent workers that run scaffolds and capture rollouts

AgentLoopWorkers are the verl processes that actually run the trials. The trainer hands each step's batch of tasks to a pool of workers; each worker drives one scaffold through one task, start to finish, and returns the captured rollout.

What a worker does

For each trial a worker:

  1. Launches the configured scaffold inside a fresh Environment sandbox.
  2. Points the scaffold at the In-Process Proxy (overriding its base URL with a unique per-session URL).
  3. Lets the agent run for multiple turns (reason, edit, run tests) until it submits or hits the turn or timeout limit.
  4. Collects the verifier's reward and the per-token trajectory for the training update.
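The four steps above can be sketched as a single coroutine. This is a minimal illustration, not the actual worker code: `Rollout`, `FakeSandbox`, `run_trial`, the `/sessions/<id>` URL layout, and the `max_turns` default are all hypothetical names and values chosen for the example.

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class Rollout:
    """Hypothetical shape of what a trial returns to the trainer."""
    session_id: str
    reward: float
    token_ids: list = field(default_factory=list)


class FakeSandbox:
    """Stand-in for a fresh Environment sandbox; not the real API."""

    async def run_scaffold(self, task: str, base_url: str, max_turns: int) -> int:
        # Pretend the agent submits after two turns, or stops at the turn limit.
        turns = 0
        while turns < max_turns:
            await asyncio.sleep(0)  # yield control, as real sandbox I/O would
            turns += 1
            if turns == 2:
                break
        return turns

    async def verify(self) -> float:
        # A real verifier would score the submission; this one always passes.
        return 1.0


async def run_trial(task: str, proxy_base: str, max_turns: int = 8) -> Rollout:
    session_id = uuid.uuid4().hex
    sandbox = FakeSandbox()                            # 1. fresh sandbox
    base_url = f"{proxy_base}/sessions/{session_id}"   # 2. unique per-session URL (assumed layout)
    turns = await sandbox.run_scaffold(task, base_url, max_turns)  # 3. multi-turn run
    reward = await sandbox.verify()                    # 4. reward plus trajectory
    # One placeholder token per turn stands in for the per-token trajectory.
    return Rollout(session_id, reward, token_ids=list(range(turns)))
```

Calling `asyncio.run(run_trial("some-task", "http://127.0.0.1:8000"))` returns one `Rollout` whose `session_id` matches the URL the scaffold was pointed at, which is what lets a proxy attribute every request to the right trial.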

Concurrency

NUM_WORKERS sets how many trials run in parallel (16 at cold start; raise for steady state). Because a trial is dominated by sandbox/environment execution rather than GPU compute, running many workers concurrently is the main lever for keeping the inference servers — and, in synchronous mode, the trainer — busy.
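To illustrate why the worker count is the main lever, here is a small sketch of bounded concurrency. It assumes trials are I/O-bound and models them as asyncio tasks behind a semaphore; the real workers are verl processes, and `fake_trial` and `run_batch` are hypothetical helpers written only for this example.

```python
import asyncio

NUM_WORKERS = 16  # cold-start value mentioned in the text


async def fake_trial(task_id: int, running: list, peak: list) -> int:
    # Track how many trials are in flight so the cap can be observed.
    running[0] += 1
    peak[0] = max(peak[0], running[0])
    await asyncio.sleep(0.01)  # stands in for sandbox execution time
    running[0] -= 1
    return task_id


async def run_batch(num_tasks: int, num_workers: int = NUM_WORKERS):
    limit = asyncio.Semaphore(num_workers)  # at most num_workers trials at once
    running, peak = [0], [0]

    async def bounded(task_id: int) -> int:
        async with limit:
            return await fake_trial(task_id, running, peak)

    # gather returns results in submission order, regardless of finish order.
    results = await asyncio.gather(*(bounded(i) for i in range(num_tasks)))
    return results, peak[0]
```

Because each simulated trial spends its time waiting rather than computing, the batch's wall-clock time shrinks roughly in proportion to `num_workers` until some other resource becomes the bottleneck.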

The agent-loop classes (and Custom Workers)

The worker's per-scaffold logic lives in an agent-loop class:

  • BuiltinCCAgentLoop — Claude Code (Anthropic protocol).
  • BuiltinSWEAgentLoop — OpenHands / OpenCode / Terminus (OpenAI protocol).

A Custom Worker is simply a new agent-loop class: implement the loop for your scaffold, drop the adapter under src/harbor_patch/agents/, and select it with HARBOR_AGENT_IMPORT_PATH. Every worker funnels through the same proxy, so a new scaffold gets trajectory capture and routing for free.
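As a rough sketch of what a custom agent-loop class might look like, the example below defines a hypothetical base interface, a subclass for an imaginary scaffold, and a loader that resolves an import path. None of this is the project's actual API: `AgentLoopBase`, `build_command`, `MyScaffoldAgentLoop`, the `my-scaffold` CLI flags, the fallback path, and the `module:ClassName` format for HARBOR_AGENT_IMPORT_PATH are all assumptions made for illustration.

```python
import importlib
import os
from abc import ABC, abstractmethod


class AgentLoopBase(ABC):
    """Hypothetical interface; the real base class may differ."""

    protocol: str

    @abstractmethod
    def build_command(self, task: str, base_url: str) -> list:
        """Return the command that starts the scaffold for one task."""


class MyScaffoldAgentLoop(AgentLoopBase):
    """Adapter for an imaginary scaffold that speaks the OpenAI protocol."""

    protocol = "openai"

    def build_command(self, task: str, base_url: str) -> list:
        # The scaffold is pointed at the proxy purely through its base URL.
        return ["my-scaffold", "--base-url", base_url, "--task", task]


def load_agent_class(import_path: str):
    # Assumed "module:ClassName" format; check the project for the real one.
    module_name, _, attr = import_path.partition(":")
    return getattr(importlib.import_module(module_name), attr)


def configured_import_path() -> str:
    # Falls back to a made-up path when the variable is unset.
    return os.environ.get(
        "HARBOR_AGENT_IMPORT_PATH", "my_agents.my_scaffold:MyScaffoldAgentLoop"
    )
```

The point of the design is that the adapter only has to know how to start its scaffold with a given base URL; capture and routing happen in the proxy, so the class itself stays small.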
