AgentLoopWorkers

AgentLoopWorkers are the verl processes that actually run the trials. The trainer hands each step's batch of tasks to a pool of workers; each worker drives one scaffold through one task, start to finish, and returns the captured rollout.

What a worker does

For each trial a worker:

Launches the configured scaffold inside a fresh Environment sandbox.
Points the scaffold at the In-Process Proxy (overriding its base URL with a unique per-session URL).
Lets the agent run for multiple turns (reasoning, editing files, running tests) until it submits a solution or hits the turn or timeout limit.
Collects the verifier's reward and the per-token trajectory for the training update.
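The four steps above can be sketched as a single coroutine. This is a minimal, self-contained illustration, not verl's implementation: PROXY_ROOT, the /sessions/<id>/v1 URL shape, fake_agent_turn, and the "reward only on submission" verifier are all hypothetical stand-ins.

```python
import asyncio
import uuid
from dataclasses import dataclass, field

# Hypothetical address of the In-Process Proxy; not taken from verl or harbor.
PROXY_ROOT = "http://127.0.0.1:8000"


@dataclass
class TrialResult:
    session_id: str
    reward: float
    turns: int
    token_ids: list[int] = field(default_factory=list)


async def fake_agent_turn(base_url: str, turn: int) -> tuple[list[int], bool]:
    """Hypothetical stand-in for one scaffold turn; returns (token ids, submitted?)."""
    await asyncio.sleep(0)  # a real scaffold would call the model at base_url here
    return [turn], turn >= 2  # pretend the agent submits on its third turn


async def run_trial(task: str, max_turns: int = 10, timeout_s: float = 60.0) -> TrialResult:
    # 1. Fresh sandbox per trial (modelled as a throwaway dict; a real worker
    #    would launch the scaffold inside an isolated environment).
    sandbox = {"task": task}
    # 2. Unique per-session URL so the proxy can attribute every call to this trial.
    session_id = uuid.uuid4().hex
    base_url = f"{PROXY_ROOT}/sessions/{session_id}/v1"
    state = {"turns": 0, "submitted": False, "tokens": []}

    async def agent_loop() -> None:
        # 3. Multi-turn loop: stop on submission or at the turn limit.
        while state["turns"] < max_turns:
            new_tokens, submitted = await fake_agent_turn(base_url, state["turns"])
            state["tokens"].extend(new_tokens)
            state["turns"] += 1
            if submitted:
                state["submitted"] = True
                return

    try:
        await asyncio.wait_for(agent_loop(), timeout=timeout_s)
    except asyncio.TimeoutError:
        pass  # a timed-out trial keeps whatever it produced so far
    # 4. Hypothetical verifier: full reward only for a submission.
    reward = 1.0 if state["submitted"] else 0.0
    return TrialResult(session_id, reward, state["turns"], state["tokens"])


result = asyncio.run(run_trial("fix failing test"))
print(result.reward, result.turns, result.token_ids)  # prints: 1.0 3 [0, 1, 2]
```

The per-session URL is the piece that matters most in this sketch: because each trial talks to the proxy through its own URL, the proxy can tell concurrent trials apart without any cooperation from the scaffold.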

Concurrency

NUM_WORKERS sets how many trials run in parallel (16 at cold start; raise it once the run reaches steady state). Because a trial's wall-clock time is dominated by sandbox and environment execution rather than GPU compute, running many workers concurrently is the main lever for keeping the inference servers busy, and in synchronous mode, the trainer as well.
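To illustrate how a worker cap bounds parallelism, here is a minimal asyncio sketch, assuming each trial is an I/O-bound coroutine and that the cap is enforced with a semaphore. Reading NUM_WORKERS from the environment, run_trial, run_batch, and the in-flight counters are all hypothetical; verl's worker pool is not necessarily built this way.

```python
import asyncio
import os

# Assumption: NUM_WORKERS comes from the environment; 16 mirrors the cold-start value.
NUM_WORKERS = int(os.environ.get("NUM_WORKERS", "16"))

in_flight = 0
peak_in_flight = 0


async def run_trial(task_id: int) -> float:
    """Hypothetical trial that mostly waits on the sandbox, not the GPU."""
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    await asyncio.sleep(0.01)  # stands in for environment/tool execution time
    in_flight -= 1
    return 1.0


async def run_batch(num_tasks: int, num_workers: int) -> list[float]:
    # The semaphore caps how many trials are in flight at once.
    gate = asyncio.Semaphore(num_workers)

    async def bounded(task_id: int) -> float:
        async with gate:
            return await run_trial(task_id)

    return await asyncio.gather(*(bounded(i) for i in range(num_tasks)))


rewards = asyncio.run(run_batch(num_tasks=64, num_workers=NUM_WORKERS))
print(len(rewards), peak_in_flight)
```

Since each trial spends most of its time awaiting the environment, raising the cap lets more of that waiting overlap; the limit is then set by sandbox capacity and inference throughput rather than by the event loop.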

The agent-loop classes (and Custom Workers)

The worker's per-scaffold logic lives in an agent-loop class:

BuiltinCCAgentLoop — Claude Code (Anthropic protocol).
BuiltinSWEAgentLoop — OpenHands / OpenCode / Terminus (OpenAI protocol).

A Custom Worker is simply a new agent-loop class: implement the loop for your scaffold, place the adapter under src/harbor_patch/agents/, and select it with HARBOR_AGENT_IMPORT_PATH. Because every worker routes its model traffic through the same proxy, a new scaffold gets trajectory capture and routing for free.
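A rough sketch of what a custom agent-loop class and its selection could look like. Everything here is hypothetical: AgentLoopBase, its run signature, MyScaffoldAgentLoop, and the assumption that HARBOR_AGENT_IMPORT_PATH holds a "package.module:ClassName" string are illustrative choices, not the actual verl or harbor interface.

```python
import importlib
from abc import ABC, abstractmethod
from collections.abc import Mapping
from typing import Optional


class AgentLoopBase(ABC):
    """Hypothetical base class; verl's actual agent-loop interface may differ."""

    @abstractmethod
    def run(self, task: str, base_url: str) -> dict:
        """Drive one scaffold through one task and return the captured rollout."""


class MyScaffoldAgentLoop(AgentLoopBase):
    """Hypothetical custom worker for a new scaffold."""

    def run(self, task: str, base_url: str) -> dict:
        # A real adapter would start the scaffold with its base URL set to base_url,
        # so every model call passes through the proxy and is recorded there.
        return {"task": task, "base_url": base_url, "reward": 0.0}


def load_class(import_path: str) -> type:
    """Resolve an assumed 'package.module:ClassName' string to a class object."""
    module_name, sep, class_name = import_path.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'module:ClassName', got {import_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def select_agent_loop(env: Mapping[str, str], default: type = MyScaffoldAgentLoop) -> type:
    """Pick the class named by HARBOR_AGENT_IMPORT_PATH, else fall back to the default."""
    path: Optional[str] = env.get("HARBOR_AGENT_IMPORT_PATH")
    return load_class(path) if path else default


# Unset: fall back to the default class.
print(select_agent_loop({}).__name__)  # MyScaffoldAgentLoop
# Set: any importable 'module:Class' resolves; a stdlib class stands in for a real adapter.
print(select_agent_loop({"HARBOR_AGENT_IMPORT_PATH": "collections:OrderedDict"}).__name__)  # OrderedDict
```

Resolving the class from a string keeps the worker code unchanged when a new scaffold is added: only the adapter module and the environment variable differ.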
