LiveRL

Status & State

How to tell what is running and what a run did

LiveRL keeps no run-state file — the launch script is one-shot configuration. Whether a run is live, and what it did, is answered from the process table, the logs, and the dashboard.

Is a run live?

pgrep -af 'sync_1node_cc|fully_async|train'   # launch scripts
pgrep -af 'main_ppo'                          # verl trainer

Once nothing matches, the box is free — there is no stale flag to clean up after a crash. If a previous run left Ray/vLLM processes or ports behind, reset with bash scripts/cleanup_before_run.sh.
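For use in scripts, the two checks above can be folded into a single guard. This is a minimal sketch that reuses the same pgrep patterns. The function name run_is_live is hypothetical and not part of LiveRL. Because 'train' is a broad pattern, it can also match unrelated processes, so treat a positive result as a hint to inspect the pgrep -af output.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of LiveRL): succeed if any process's full
# command line matches one of the given extended-regex patterns.
run_is_live() {
  local pattern
  for pattern in "$@"; do
    # -f matches against the full command line, as in the checks above.
    if pgrep -f "$pattern" >/dev/null 2>&1; then
      return 0
    fi
  done
  return 1
}

if run_is_live 'sync_1node_cc|fully_async|train' 'main_ppo'; then
  echo "a run appears to be live"
else
  echo "no matching processes; the box is free"
fi
```

Save this as a file whose name contains none of the patterns. With -f, pgrep would otherwise match the shell that is running the script.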

vLLM / rollout readiness

The rollout replicas register as Ray named actors (vllm_server_*). Check them (and GPU utilization) as described in Inference Stack.
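Inference Stack remains the reference procedure. As a rough sketch of the same idea, the following counts live actors whose name contains vllm_server_ and prints per-GPU utilization. It assumes the Ray state CLI (ray list actors, installed with ray[default]) and nvidia-smi are available on the node and that a Ray cluster is running. The helper name count_vllm_actors is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count lines on stdin that mention a vllm_server_* actor.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the status clean.
count_vllm_actors() {
  grep -c 'vllm_server_' || true
}

# Assumption: the actor name appears in the listing printed by the Ray state CLI.
if command -v ray >/dev/null 2>&1; then
  ray list actors --filter "state=ALIVE" --limit 1000 | count_vllm_actors
fi

# Per-GPU utilization and memory in CSV form.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv
fi
```

A count lower than the number of rollout replicas the launch script configures suggests that some replicas have not registered yet.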

What a run did

  • Logs — logs/<exp>.log carries the per-step step:N - key:value metric lines; logs/<exp>_vllm.log is the throughput-only stream.
  • Checkpoints — the newest checkpoints/<project>/<exp>/global_step_N/ is the latest saved actor.
  • Dashboard — the dashboard reads the logs directly and infers a run's state from log file mtime/size (running vs finished).
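The three sources above can also be read from a shell. The helpers below are a minimal sketch, not LiveRL code: the names last_step, latest_checkpoint, and log_state are hypothetical. The 5-minute freshness window is an assumption for illustration, not the dashboard's actual threshold, and unlike the dashboard the sketch looks only at mtime, not size.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of LiveRL) for reading a run's artifacts.

# Print the highest N that appears as "step:N" in a log file.
last_step() {
  grep -o 'step:[0-9][0-9]*' "$1" | cut -d: -f2 | sort -n | tail -n 1
}

# Print the global_step_N directory with the largest N under a checkpoint dir.
latest_checkpoint() {
  local dir="$1" path n best=-1 best_path=""
  for path in "$dir"/global_step_*; do
    [ -d "$path" ] || continue
    n="${path##*global_step_}"
    case "$n" in ''|*[!0-9]*) continue ;; esac   # skip non-numeric suffixes
    if [ "$n" -gt "$best" ]; then
      best="$n"
      best_path="$path"
    fi
  done
  [ -n "$best_path" ] && printf '%s\n' "$best_path"
}

# Guess "running" vs "finished" from the log's mtime.
# A missing log is reported as "finished".
log_state() {
  local log="$1" minutes="${2:-5}"
  if [ -n "$(find "$log" -mmin "-$minutes" 2>/dev/null)" ]; then
    echo running
  else
    echo finished
  fi
}

# Example (paths follow the layout above; my_project and my_exp are placeholders):
#   last_step logs/my_exp.log
#   latest_checkpoint checkpoints/my_project/my_exp
#   log_state logs/my_exp.log
```

The step numbers are compared numerically because a plain lexical sort would rank global_step_9 above global_step_100.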
