Reference
Status & State
How to tell what is running and what a run did
LiveRL keeps no run-state file — the launch script is one-shot configuration. Whether a run is live, and what it did, is answered from the process table, the logs, and the dashboard.
Is a run live?
pgrep -af 'sync_1node_cc|fully_async|train' # launch scripts
pgrep -af 'main_ppo' # verl trainer
Once nothing matches, the box is free; there is no stale flag to clean up after
a crash. If a previous run left Ray/vLLM processes or ports behind, reset with
bash scripts/cleanup_before_run.sh.
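The two pgrep checks above can be folded into a single helper, which is handy before launching a new run from a script. This is a minimal sketch, not part of LiveRL: the function names run_is_live and report_box_state are hypothetical, and the patterns are copied verbatim from the commands above. Note that the 'train' alternative is broad and may match unrelated processes on a shared box.

```shell
#!/usr/bin/env bash
# Patterns copied from the pgrep commands above.
LAUNCH_PATTERN='sync_1node_cc|fully_async|train'
TRAINER_PATTERN='main_ppo'

# Hypothetical helper: succeed (exit 0) if any process command line
# matches the given extended regex, fail otherwise.
run_is_live() {
    pgrep -f "$1" >/dev/null 2>&1
}

# Hypothetical helper: print "busy" if a launch script or the trainer
# is still running, "free" if neither pattern matches.
report_box_state() {
    if run_is_live "$LAUNCH_PATTERN" || run_is_live "$TRAINER_PATTERN"; then
        echo "busy"
    else
        echo "free"
    fi
}
```

Calling report_box_state prints one word; when it prints "free" but GPUs or ports are still held, the cleanup script above is the next step.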
vLLM / rollout readiness
The rollout replicas register as Ray named actors (vllm_server_*). Check them
(and GPU utilization) as described in Inference Stack.
What a run did
- Logs: logs/<exp>.log carries the per-step step:N - key:value metric lines; logs/<exp>_vllm.log is the throughput-only stream.
- Checkpoints: the newest checkpoints/<project>/<exp>/global_step_N/ is the latest saved actor.
- Dashboard: the dashboard reads the logs directly and infers a run's state from log file mtime/size (running vs finished).
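The same three questions can be answered from a shell without the dashboard. The sketch below is illustrative only: the function names are hypothetical, "newest" is taken to mean the highest N in global_step_N, and the five-minute freshness window is an assumed threshold, not the dashboard's actual heuristic (which also considers file size).

```shell
#!/usr/bin/env bash

# Print the global_step_N directory with the highest N under the given
# checkpoint directory (e.g. checkpoints/<project>/<exp>).
latest_checkpoint() {
    find "$1" -maxdepth 1 -type d -name 'global_step_*' 2>/dev/null \
        | awk -F'global_step_' '{ print $NF "\t" $0 }' \
        | sort -n | tail -n 1 | cut -f2-
}

# Print the last per-step metric line (step:N - key:value ...) in a log.
last_step_line() {
    grep -E 'step:[0-9]+' "$1" | tail -n 1
}

# Guess a run's state from the log's mtime. The default window of
# 5 minutes is an assumption chosen for illustration.
log_state() {
    local log=$1 minutes=${2:-5}
    if [ ! -f "$log" ]; then
        echo "missing"
    elif [ -n "$(find "$log" -mmin "-$minutes")" ]; then
        echo "running"
    else
        echo "finished"
    fi
}
```

For example, log_state logs/<exp>.log 10 widens the window to ten minutes, which may suit runs whose steps are slow enough that the log goes quiet between writes.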