Environment
Sandboxed execution, the verifier, the scaffolds, and the backends
The Environment is where a task actually runs. Every trial gets a fresh sandbox, a coding-agent scaffold drives the model through the task inside it, and the task's own verifier decides the reward. Because that reward comes from running real tests against the agent's changes, this is the layer that grounds it in actual behavior.
Sandboxed execution + verifier
Each task ships a repository snapshot and a verifier, which is the task's own test script.
A trial proceeds as follows: the agent explores the repo, edits files, and runs
commands in the sandbox until it submits or hits the turn/timeout limit; then the
verifier runs the test suite and emits a reward (1.0 if the issue is resolved, 0.0
otherwise). Because the grader is the task's real tests, there is no learned
reward model that can drift or be exploited. See Reward.
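The trial flow described above can be sketched as a small loop. This is a minimal illustration, not Harbor's implementation: `run_trial`, `TrialResult`, `agent_step`, and `run_verifier` are hypothetical names, the agent is reduced to a callable that returns an action string, and only the turn limit is modeled (the wall-clock timeout is left out).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TrialResult:
    reward: float      # 1.0 if the verifier reports the issue resolved, else 0.0
    turns_used: int    # number of agent turns consumed
    submitted: bool    # True if the agent submitted before the turn limit


def run_trial(
    agent_step: Callable[[int], Optional[str]],  # hypothetical: one agent turn
    run_verifier: Callable[[], bool],            # hypothetical: runs the task's tests
    max_turns: int,
) -> TrialResult:
    """Drive the agent until it submits or runs out of turns, then grade once."""
    submitted = False
    turns_used = 0
    for turn in range(max_turns):
        turns_used = turn + 1
        action = agent_step(turn)
        if action == "submit":
            submitted = True
            break
    # The verifier runs whether the agent submitted or hit the limit.
    resolved = run_verifier()
    return TrialResult(
        reward=1.0 if resolved else 0.0,
        turns_used=turns_used,
        submitted=submitted,
    )
```

The point of the sketch is that the reward is computed exactly once, after the rollout ends, and depends only on the verifier's result rather than on anything the agent reports about itself.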
Scaffolds
The scaffold is the agent harness that turns model outputs into actions —
Claude Code, OpenCode, OpenHands, Terminus 2, and others under
src/harbor_patch/agents/. The scaffold determines the rollout's shape and which
runtime image Harbor launches; it is selected per run with
HARBOR_AGENT_IMPORT_PATH (default
harbor_patch.agents.image_mounted_claude_code:ClaudeCode). Whatever the
scaffold's native protocol, it talks to the policy through the
In-Process Proxy.
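The default value suggests the variable holds a `module:Class` import path. As a sketch of how such a path could be resolved, assuming that format and nothing else about Harbor's loader, the helpers below (`split_import_path`, `load_from_import_path`, and `selected_agent_path` are hypothetical names) read the variable, fall back to the documented default, and import the named attribute with the standard library.

```python
import importlib
import os
from typing import Any, Mapping, Tuple

# Default taken from the text above.
DEFAULT_AGENT = "harbor_patch.agents.image_mounted_claude_code:ClaudeCode"


def split_import_path(path: str) -> Tuple[str, str]:
    """Split 'package.module:Attr' into (module name, attribute name)."""
    module_name, sep, attr = path.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:Class', got {path!r}")
    return module_name, attr


def load_from_import_path(path: str) -> Any:
    """Import the module and return the named attribute (for example a class)."""
    module_name, attr = split_import_path(path)
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def selected_agent_path(env: Mapping[str, str] = os.environ) -> str:
    """Return the configured scaffold path, or the documented default."""
    return env.get("HARBOR_AGENT_IMPORT_PATH", DEFAULT_AGENT)
```

For example, `load_from_import_path(selected_agent_path())` would return the scaffold class when the package is installed; the split step can be checked on its own without importing anything.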
Backends
The sandbox itself runs on one of three backends, selected with
HARBOR_ENVIRONMENT_IMPORT_PATH:
| Backend | Unit of execution | Use |
|---|---|---|
| Kubernetes | one pod per trial | production default, large parallel runs |
| Local / Remote Docker | one container per trial | minimal setup, no cluster |
| EC2 / ECS service | a managed cloud service | elastic capacity |
See Backends for the concrete config and the trade-offs.
Runtime features
The Environment layer also carries the machinery that makes real sandboxes fast and trustworthy at scale:
- Nydus cold start: lazily loads container images, so a pod is usable before the whole image has been pulled.
- Agent-runtime image mounting: the scaffold/runtime is pre-baked into the agent image and mounted into the pod, rather than installed in each pod at startup. Per-pod installation fails on task pods that have no network egress.
- Image cache: shares image layers across trials to cut startup time.
- Anti-hacking: guards against reward hacking (e.g. agents re-cloning a repo's GitHub history to recover the fix), so the reward reflects a genuine solution.
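This page does not describe how the anti-hacking guard works. Purely as an illustration of the kind of check the last item names, the sketch below flags shell commands that would pull git history from a remote. The function name, the rule set, and the idea of inspecting commands at all are assumptions made for this example, and a real guard would need to cover far more cases.

```python
import shlex

# Hypothetical rule set: git subcommands that download history from a remote.
REMOTE_GIT_SUBCOMMANDS = {"clone", "fetch", "pull"}


def fetches_remote_history(command: str) -> bool:
    """Return True if the command appears to fetch git history from a remote.

    Deliberately coarse: any 'git' token followed later by a remote
    subcommand token is flagged, so false positives are possible.
    """
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: treat conservatively as suspicious.
        return True
    for index, token in enumerate(tokens):
        if token == "git":
            if any(t in REMOTE_GIT_SUBCOMMANDS for t in tokens[index + 1:]):
                return True
    return False
```

A token-level check like this is easy to evade (for example through aliases or direct HTTP downloads), which is one reason network-level isolation, such as the no-egress task pods mentioned above, is a natural complement.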