AgentLoopWorkers
The concurrent workers that run scaffolds and capture rollouts
AgentLoopWorkers are the verl processes that actually run the trials. The trainer hands each step's batch of tasks to a pool of workers; each worker drives one scaffold through one task, start to finish, and returns the captured rollout.
What a worker does
For each trial a worker:
- Launches the configured scaffold inside a fresh Environment sandbox.
- Points the scaffold at the In-Process Proxy (overriding its base URL with a unique per-session URL).
- Lets the agent run over multiple turns (reason, edit, run tests) until it submits or hits the turn or timeout limit.
- Collects the verifier's reward and the per-token trajectory for the training update.
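The four steps above can be sketched as a single coroutine. This is a minimal illustration, not the actual worker code: `FakeSandbox`, `Rollout`, `run_trial`, and the `/session/<id>` URL layout are all hypothetical names chosen for the example, and the sandbox here only pretends to run a scaffold.

```python
import asyncio
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rollout:
    """What one trial hands back to the trainer (hypothetical shape)."""
    session_id: str
    reward: float
    token_ids: List[int] = field(default_factory=list)


class FakeSandbox:
    """Stand-in for a fresh Environment sandbox; not a real class."""

    async def run_scaffold(self, base_url: str, max_turns: int) -> List[int]:
        # A real scaffold would call the model through base_url every turn.
        # Here each "turn" just emits one fake token id.
        return list(range(max_turns))

    async def verify(self) -> float:
        # A real verifier would run the task's checks; this one always passes.
        return 1.0


async def run_trial(proxy_root: str, max_turns: int, timeout_s: float) -> Rollout:
    # Steps 1 and 2: fresh sandbox, plus a unique per-session proxy URL
    # (the URL layout is an assumption).
    session_id = uuid.uuid4().hex
    base_url = f"{proxy_root}/session/{session_id}"
    sandbox = FakeSandbox()

    # Step 3: let the agent run until it finishes or the timeout fires.
    try:
        token_ids = await asyncio.wait_for(
            sandbox.run_scaffold(base_url, max_turns), timeout=timeout_s
        )
    except asyncio.TimeoutError:
        token_ids = []

    # Step 4: collect the reward and return it with the trajectory.
    reward = await sandbox.verify()
    return Rollout(session_id=session_id, reward=reward, token_ids=token_ids)


rollout = asyncio.run(run_trial("http://127.0.0.1:8000", max_turns=3, timeout_s=5.0))
```

The per-session URL is the part worth noticing: because every trial gets its own, the proxy can attribute each model call to exactly one rollout.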
Concurrency
NUM_WORKERS sets how many trials run in parallel (16 at cold start; raise for
steady state). Because a trial is dominated by sandbox/environment execution
rather than GPU compute, running many workers concurrently is the main lever for
keeping the inference servers — and, in synchronous mode, the trainer — busy.
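As a rough illustration of why the worker count is the main lever, the sketch below caps in-flight trials with a semaphore. It uses asyncio tasks as a stand-in for worker processes, and `run_one`, `run_batch`, and the `num_workers` argument are hypothetical; how NUM_WORKERS is actually read and applied is not shown here.

```python
import asyncio


async def run_one(task_id: int) -> int:
    # Stands in for a trial that mostly waits on the sandbox, not the GPU.
    await asyncio.sleep(0.01)
    return task_id


async def run_batch(task_ids, num_workers: int):
    gate = asyncio.Semaphore(num_workers)  # at most num_workers trials at once
    in_flight = 0
    peak = 0

    async def guarded(task_id: int) -> int:
        nonlocal in_flight, peak
        async with gate:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await run_one(task_id)
            finally:
                in_flight -= 1

    # gather preserves input order, so results line up with task_ids.
    results = await asyncio.gather(*(guarded(t) for t in task_ids))
    return results, peak


results, peak = asyncio.run(run_batch(range(10), num_workers=4))
```

Since each trial spends most of its time waiting, raising `num_workers` increases the number of model requests in flight without adding GPU work per trial.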
The agent-loop classes (and Custom Workers)
The worker's per-scaffold logic lives in an agent-loop class:
- BuiltinCCAgentLoop: Claude Code (Anthropic protocol).
- BuiltinSWEAgentLoop: OpenHands / OpenCode / Terminus (OpenAI protocol).
A Custom Worker is simply a new agent-loop class: implement the loop for your
scaffold, drop the adapter under src/harbor_patch/agents/, and select it with
HARBOR_AGENT_IMPORT_PATH. Every worker funnels through the same proxy, so a new
scaffold gets trajectory capture and routing for free.
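The idea that a new scaffold inherits capture by going through the shared proxy can be sketched as follows. Everything here is hypothetical: `AgentLoopSketch`, `RecordingProxy`, and `MyScaffoldLoop` are illustrative names, and the real agent-loop interface and adapter layout may look quite different.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class RecordingProxy:
    """Toy stand-in for the In-Process Proxy: logs every request per session."""

    def __init__(self) -> None:
        self.captured: Dict[str, List[str]] = {}

    def open_session(self, session_id: str) -> None:
        self.captured.setdefault(session_id, [])

    def complete(self, session_id: str, prompt: str) -> str:
        # Capture happens here, regardless of which scaffold sent the request.
        self.captured[session_id].append(prompt)
        return f"reply to: {prompt}"


class AgentLoopSketch(ABC):
    """Hypothetical base class; not the real agent-loop interface."""

    def __init__(self, proxy: RecordingProxy) -> None:
        self.proxy = proxy

    @abstractmethod
    def next_prompt(self, turn: int) -> Optional[str]:
        """Return the scaffold's next request, or None once it submits."""

    def run(self, session_id: str, max_turns: int) -> List[str]:
        self.proxy.open_session(session_id)
        replies = []
        for turn in range(max_turns):
            prompt = self.next_prompt(turn)
            if prompt is None:  # the scaffold submitted
                break
            replies.append(self.proxy.complete(session_id, prompt))
        return replies


class MyScaffoldLoop(AgentLoopSketch):
    """A 'custom worker': only the scaffold-specific part is written."""

    def next_prompt(self, turn: int) -> Optional[str]:
        return f"step {turn}" if turn < 2 else None


proxy = RecordingProxy()
replies = MyScaffoldLoop(proxy).run("s1", max_turns=10)
```

The subclass contains no capture logic at all, yet `proxy.captured["s1"]` ends up holding both of its requests, which is the property the paragraph above describes.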