Run Training
Launch and monitor a training run
A LiveRL run boots vLLM to serve the policy, starts the in-process proxy in front of it, and lets verl drive the RL loop while a coding agent rolls out across tasks inside Harbor sandboxes. Everything is driven from a single launch script — no separate config file.
# single-node, synchronous (the minimal-cost default)
bash scripts/cleanup_before_run.sh # optional: reset stale Ray/vLLM state
bash scripts/sync_1node_cc.sh # boot vLLM + proxy, then run the PPO/GRPO/GSPO loop
You configure a run by editing the variables at the top of the script (or exporting them as env vars); see Inputs & Configuration.
Launch in the background
Training runs for hours. Wrap it in tmux or nohup setsid so it survives shell
disconnects, and tail logs/<exp>.log.
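As a rough sketch of the nohup setsid route, the helper below starts a command in its own session and sends the launcher's own stdout/stderr to a file. The function name launch_detached and the logs/launcher.out path are illustrative assumptions, not part of LiveRL, and setsid is assumed to be the util-linux tool available on most Linux hosts.

```shell
# Hypothetical helper: run a command detached from the current terminal.
# $1 is the file that receives the command's stdout/stderr; the rest is the command.
launch_detached() {
  launcher_out=$1
  shift
  mkdir -p "$(dirname "$launcher_out")"
  # nohup ignores SIGHUP, setsid moves the job into a new session,
  # and stdin is closed so the job never waits on the terminal.
  nohup setsid "$@" > "$launcher_out" 2>&1 < /dev/null &
}

# Usage, with the script path from this page (logs/launcher.out is an assumed name):
#   launch_detached logs/launcher.out bash scripts/sync_1node_cc.sh
#   tail -f logs/<exp>.log    # <exp> stands for your experiment name
```

With tmux the same effect comes from starting the script inside a named session (for example `tmux new -s liverl`) and detaching from it.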
The boot sequence
- venv: built once by scripts/setup_env.sh (or reused via VENV_PATH).
- vLLM: verl launches DP × TP replicas; CUDA-graph capture on a 30B MoE (TP=4) is ~10–20 min before the first replica registers.
- In-process proxy: started by the verl agent loop; it advertises a per-session URL per trial (no standalone LiteLLM, no fixed port).
- verl loop: generate rollouts → run trials in Harbor → reward → update → checkpoint every save_freq steps.
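To make the checkpoint cadence concrete, here is a small sketch that lists which steps would write a checkpoint. It assumes steps are counted from 1 and that a checkpoint lands on every step that is a multiple of save_freq; the function name checkpoint_steps is hypothetical, and the real loop may also save at other points (for instance on the final step).

```shell
# Hypothetical illustration: print the steps at which a checkpoint is written,
# assuming 1-based steps and a save on every multiple of save_freq.
checkpoint_steps() {
  total_steps=$1
  save_freq=$2
  step=1
  while [ "$step" -le "$total_steps" ]; do
    if [ $((step % save_freq)) -eq 0 ]; then
      echo "$step"
    fi
    step=$((step + 1))
  done
}

# checkpoint_steps 10 4 prints 4 and 8, one per line
```

Under these assumptions, a crash at step 10 with save_freq=4 would leave step 8 as the most recent checkpoint, which is worth keeping in mind when choosing save_freq for a multi-hour run.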
In this section
- Preflight — what to validate before a multi-hour run, including KV-head divisibility
- Inference Stack — the agent → in-process proxy → vLLM chain and its health checks
- Backends — Kubernetes vs Docker sandboxes
- Results & Artifacts — logs, checkpoints, trajectories
- Scaling Up — advanced: multi-node, fully-async, VeOmni/MoE, partial rollout, R3
Start single-node
The single-node path above is the minimal-cost way to run LiveRL — one 8×GPU host, one launch script. Only move to multiple machines when you outgrow it; see Scaling Up.
Stop cleanly between runs
Between runs, bash scripts/cleanup_before_run.sh stops Ray, frees ports, and
reaps stale vLLM/verl processes. Checkpoints and wandb runs are preserved; set
KEEP_TRIALS=0 to also clear the trials dir. It never kills running GPU tasks.
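A minimal sketch of a clean-then-launch cycle, built only from the two scripts and the KEEP_TRIALS variable named on this page. The wrapper name fresh_run, its --clear-trials flag, and the SCRIPTS_DIR override are assumptions introduced here for illustration.

```shell
# Hypothetical wrapper: clean up stale state, then start a new run.
# SCRIPTS_DIR (an assumed override) defaults to the repo's scripts/ directory.
fresh_run() {
  if [ "${1:-}" = "--clear-trials" ]; then
    # KEEP_TRIALS=0 additionally clears the trials dir.
    KEEP_TRIALS=0 bash "${SCRIPTS_DIR:-scripts}/cleanup_before_run.sh" || return 1
  else
    # Default cleanup: stops Ray, frees ports, reaps stale processes.
    bash "${SCRIPTS_DIR:-scripts}/cleanup_before_run.sh" || return 1
  fi
  # Only launch if cleanup succeeded.
  bash "${SCRIPTS_DIR:-scripts}/sync_1node_cc.sh"
}

# Usage:
#   fresh_run                  # keep trials from the previous run
#   fresh_run --clear-trials   # also clear the trials dir
```

Stopping on a failed cleanup is a deliberate choice in this sketch: launching on top of leftover Ray or vLLM state is the situation the cleanup script exists to prevent.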