LiveRL

Run Training

Launch and monitor a training run

A LiveRL run boots vLLM to serve the policy, starts the in-process proxy in front of it, and lets verl drive the RL loop while a coding agent rolls out across tasks inside Harbor sandboxes. Everything is driven from a single launch script — no separate config file.

# single-node, synchronous (the minimal-cost default)
bash scripts/cleanup_before_run.sh   # optional: reset stale Ray/vLLM state
bash scripts/sync_1node_cc.sh        # boot vLLM + proxy, then run the PPO/GRPO/GSPO loop

You configure a run by editing the variables at the top of the script (or exporting them as env vars) — see Inputs & Configuration.
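
A minimal sketch of how a launch script can honor both styles of configuration, using the shell's default-assignment expansion. Whether scripts/sync_1node_cc.sh is written exactly this way is an assumption. EXP_NAME and both default values are hypothetical; only VENV_PATH is named on this page.

```shell
# Sketch only: the defaults and EXP_NAME are illustrative assumptions,
# not values taken from scripts/sync_1node_cc.sh.
load_defaults() {
  # ${VAR:=default} assigns the default only when VAR is unset or empty,
  # so a value exported by the caller always wins over the in-script default.
  : "${VENV_PATH:=$PWD/.venv}"
  : "${EXP_NAME:=liverl_debug}"
}

load_defaults
echo "VENV_PATH=$VENV_PATH EXP_NAME=$EXP_NAME"
```

With a header like this, a one-off override such as VENV_PATH=/opt/venvs/liverl bash scripts/sync_1node_cc.sh changes a single setting for one run without editing the script.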

Launch in the background

Training runs for hours. Launch it inside tmux, or with nohup setsid, so it survives shell disconnects, then follow progress with tail -f logs/<exp>.log.
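
One way to do the nohup setsid variant is a small wrapper that detaches any command and captures its output. This is a sketch: launch_detached is a hypothetical helper, not part of LiveRL, and it assumes a Linux host where setsid (util-linux) is installed.

```shell
# Sketch only: launch_detached is a hypothetical helper, not a LiveRL script.
launch_detached() {
  local log="$1"
  shift
  mkdir -p "$(dirname "$log")"
  # nohup makes the job ignore SIGHUP; setsid moves it into its own session
  # so it is no longer tied to the terminal; stdin comes from /dev/null so
  # nothing ever blocks waiting on the TTY.
  nohup setsid "$@" >"$log" 2>&1 </dev/null &
}
```

For example, launch_detached logs/my_exp.log bash scripts/sync_1node_cc.sh starts the run, and tail -f logs/my_exp.log follows it (my_exp stands in for your experiment name). If the launch script already writes logs/<exp>.log itself, point the helper at a different file to avoid two writers on one log.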

The boot sequence

  1. venv — built once by scripts/setup_env.sh (or reused via VENV_PATH).
  2. vLLM — verl launches DP × TP replicas; on a 30B MoE (TP=4), expect CUDA-graph capture to take roughly 10–20 min before the first replica registers.
  3. In-process proxy — started by the verl agent loop; it advertises a per-session URL per trial (no standalone LiteLLM, no fixed port).
  4. verl loop — generate rollouts → run trials in Harbor → reward → update → checkpoint every save_freq steps.
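
Because step 2 can be silent for many minutes, it can help to block on the log until a chosen line appears instead of watching it by hand. The helper below is a sketch under assumptions: wait_for_log is hypothetical, the pattern you pass must be a string you have actually seen in your own logs (this page does not specify one), and the 1800-second default is only headroom over the 10–20 min estimate above.

```shell
# Sketch only: wait_for_log is a hypothetical helper. It polls a log file
# until a fixed string appears or the timeout (in seconds) expires.
wait_for_log() {
  local log="$1" pattern="$2" timeout="${3:-1800}" interval="${4:-10}"
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # -F treats the pattern as a literal string, -q suppresses output.
    if [ -f "$log" ] && grep -qF -- "$pattern" "$log"; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1   # pattern never appeared within the timeout
}
```

The function returns 0 as soon as the string is found and 1 on timeout, so it can gate a follow-up command or a notification.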

In this section

  • Preflight — what to validate before a multi-hour run, including KV-head divisibility
  • Inference Stack — the agent → in-process proxy → vLLM chain and its health checks
  • Backends — Kubernetes vs Docker sandboxes
  • Results & Artifacts — logs, checkpoints, trajectories
  • Scaling Up (advanced): multi-node, fully-async, VeOmni/MoE, partial rollout, R3

Start single-node

The single-node path above is the minimal-cost way to run LiveRL — one 8×GPU host, one launch script. Only move to multiple machines when you outgrow it; see Scaling Up.

Stop cleanly between runs

Between runs, bash scripts/cleanup_before_run.sh stops Ray, frees ports, and reaps stale vLLM/verl processes. Checkpoints and wandb runs are preserved; set KEEP_TRIALS=0 to also clear the trials dir. It never kills running GPU tasks.
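
The two cleanup modes can be wrapped in one helper so the destructive variant is always an explicit choice. This is a sketch: reset_between_runs, its --wipe-trials flag, and the CLEANUP_SCRIPT override are hypothetical conveniences; only the script path and KEEP_TRIALS=0 come from this page.

```shell
# Sketch only: reset_between_runs and --wipe-trials are hypothetical.
# CLEANUP_SCRIPT can be overridden, e.g. to point at a stub when testing.
CLEANUP_SCRIPT="${CLEANUP_SCRIPT:-scripts/cleanup_before_run.sh}"

reset_between_runs() {
  if [ "${1:-}" = "--wipe-trials" ]; then
    # KEEP_TRIALS=0 applies to this one invocation only and is not exported
    # into the calling shell.
    KEEP_TRIALS=0 bash "$CLEANUP_SCRIPT"
  else
    bash "$CLEANUP_SCRIPT"
  fi
}
```

Calling reset_between_runs performs the default cleanup, and reset_between_runs --wipe-trials additionally clears the trials dir.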
