LiveRL


Inputs & Configuration

The launch-script variables that define a run

LiveRL has no config.yaml. A run is configured by the variables at the top of its launch script (e.g. scripts/sync_1node_cc.sh); each can be overridden by an environment variable of the same name. The script exports them, then execs verl.trainer.main_ppo with Hydra overrides built from those values, layered on top of the YAML defaults under src/verl_patch/config/.

Configuration layers

| Layer | What it is | Edit it? |
| --- | --- | --- |
| Launch-script variables | model, data, topology, batch, algorithm, backend | Yes; this is the user surface |
| Environment variables | same names, override the script defaults | Yes, for one-off runs |
| YAML defaults (src/verl_patch/config/*.yaml) | agent-loop + trainer defaults consumed by Hydra | Rarely |
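The layering above can be sketched in a few lines of shell. This is a minimal illustration, not the contents of scripts/sync_1node_cc.sh: the variable names come from this page, but the default values, the `hydra_overrides` helper, and the Hydra keys are assumptions chosen for the example.

```shell
# Sketch only. Variable names come from the tables on this page; the default
# values and the Hydra keys are illustrative assumptions, not the real script.

# Script defaults. An environment variable of the same name wins.
MODEL_PATH="${MODEL_PATH:-/path/to/policy-checkpoint}"
NNODES="${NNODES:-1}"
NGPUS_PER_NODE="${NGPUS_PER_NODE:-8}"
export MODEL_PATH NNODES NGPUS_PER_NODE

# Hypothetical helper: print one Hydra override per line, built from the
# variables above. A real script would pass these to verl.trainer.main_ppo.
hydra_overrides() {
  printf '%s\n' \
    "actor_rollout_ref.model.path=${MODEL_PATH}" \
    "trainer.nnodes=${NNODES}" \
    "trainer.n_gpus_per_node=${NGPUS_PER_NODE}"
}

hydra_overrides
```

If the real script follows this pattern, a one-off run such as `EXP_NAME=ablation-a bash scripts/sync_1node_cc.sh` (a hypothetical experiment name) would change only that value and leave every other script default in place.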

Inputs

The values you normally set in scripts/sync_1node_cc.sh:

| Variable | Description |
| --- | --- |
| MODEL_PATH | Policy checkpoint to train |
| SERVED_MODEL_NAME | vLLM model name advertised (default vllm_model) |
| TRAIN_INDEX / VAL_INDEX | Train / val Harbor task parquet indexes |
| PROJECT_NAME / EXP_NAME | wandb project + experiment name |
| NNODES / NGPUS_PER_NODE | Cluster topology |
| gen_tp | vLLM tensor-parallel degree (must divide num_key_value_heads) |
| K8S_KUBECONFIG / K8S_NAMESPACE | Kubernetes backend |
| HARBOR_AGENT_IMPORT_PATH | Agent scaffold |
| HARBOR_ENVIRONMENT_IMPORT_PATH | Sandbox backend (K8s / remote Docker) |
| NUM_WORKERS | Parallel trials per step (16 cold-start; raise for steady state) |
| WANDB_API_KEY / WANDB_MODE | wandb credentials + mode (set in your shell) |
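Because these values arrive from two places (the script and the shell environment), it can help to fail fast when one is missing. The helper below is a hypothetical sketch, not something LiveRL ships: `require_vars` is an invented name, and which variables count as required is an assumption.

```shell
# Hypothetical helper, not part of LiveRL: report every named variable that
# is unset or empty, and return non-zero if any is missing.
require_vars() {
  _missing=0
  for _name in "$@"; do
    # Look up the variable whose name is stored in _name.
    eval "_value=\${${_name}:-}"
    if [ -z "${_value}" ]; then
      echo "missing required variable: ${_name}" >&2
      _missing=1
    fi
  done
  return "${_missing}"
}

# Example call with placeholder values (assumed, for illustration only).
MODEL_PATH=/path/to/policy-checkpoint
TRAIN_INDEX=/path/to/train.parquet
require_vars MODEL_PATH TRAIN_INDEX && echo "inputs look complete"
```

Reporting every missing name before returning, rather than stopping at the first, saves a round trip when several values were forgotten.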

Hyperparameters

Also set in the launch script:

adv_estimator=grpo            # ppo | grpo
policy_loss_mode=gspo         # gspo | ...
learning_rate=1.0e-06
train_prompt_bsz=64
n_resp_per_prompt=8           # 64 × 8 = 512 trials/step
max_prompt_length=40000
max_response_length=68000     # window = prompt + response
save_freq=5
test_freq=5
temperature=1.0
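
The two derived quantities mentioned in the comments above (trials per step and the context window) follow directly from these values. A small sketch of the arithmetic, where `trials_per_step` and `context_window` are names introduced here for illustration:

```shell
# Values copied from the hyperparameter block above.
train_prompt_bsz=64
n_resp_per_prompt=8
max_prompt_length=40000
max_response_length=68000

# Derived quantities (names are illustrative, not LiveRL variables).
trials_per_step=$((train_prompt_bsz * n_resp_per_prompt))
context_window=$((max_prompt_length + max_response_length))

echo "trials/step:    ${trials_per_step}"   # 512
echo "context window: ${context_window}"    # 108000
```

Changing either batch variable scales the number of trials per step multiplicatively, so doubling n_resp_per_prompt to 16 would give 1024 trials per step at the same prompt batch size.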

KV-head divisibility

gen_tp must divide the model's num_key_value_heads, or training crashes at the first forward pass. See Preflight.
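
The divisibility rule is easy to check before launching. The function below is a minimal sketch and not the check described under Preflight: `check_gen_tp` is a hypothetical name, and the example value of 8 KV heads is an assumption. For Hugging Face style checkpoints, num_key_value_heads is typically found in the model's config.json.

```shell
# Hypothetical pre-launch check: succeed only if gen_tp evenly divides
# num_key_value_heads.
# Usage: check_gen_tp <num_key_value_heads> <gen_tp>
check_gen_tp() {
  _kv_heads=$1
  _tp=$2
  if [ "${_tp}" -le 0 ] || [ $((_kv_heads % _tp)) -ne 0 ]; then
    echo "gen_tp=${_tp} does not divide num_key_value_heads=${_kv_heads}" >&2
    return 1
  fi
}

# Example: a model with 8 KV heads (assumed value).
check_gen_tp 8 4 && echo "gen_tp=4 ok"
check_gen_tp 8 3 || echo "gen_tp=3 rejected"
```

With 8 KV heads, the valid tensor-parallel degrees are 1, 2, 4, and 8.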

Outputs

A run writes, relative to the repo root:

| Output | Path | Notes |
| --- | --- | --- |
| Checkpoints | checkpoints/<project>/<exp>/global_step_N/ | actor FSDP shards (per save_freq) |
| Training log | logs/<exp>.log | per-run training log |
| Throughput log | logs/<exp>_vllm.log | throughput-only log |
| Trajectories | harbor_trials/<project>/<exp>/ | per-trial proxy_trajectory.json |
| Curves / val resolve rate | wandb + dashboard | validation resolve rate and metric curves |
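
Given the checkpoint layout above, the most recent saved step can be found by scanning the global_step_N directories. This is a sketch under the assumption that N is a plain integer; `latest_step` is a hypothetical helper, not a LiveRL command.

```shell
# Hypothetical helper: print the highest N among <ckpt_root>/global_step_N/
# directories, or nothing if none exist.
# Usage: latest_step checkpoints/<project>/<exp>
latest_step() {
  _root=$1
  for _dir in "${_root}"/global_step_*/; do
    [ -d "${_dir}" ] || continue          # glob matched nothing
    _dir=${_dir%/}                        # drop the trailing slash
    echo "${_dir##*global_step_}"         # keep only the step number
  done | sort -n | tail -n 1
}

# Demo against a throwaway directory tree.
demo_root=$(mktemp -d)
mkdir -p "${demo_root}/global_step_5" "${demo_root}/global_step_10" "${demo_root}/global_step_15"
echo "latest step: $(latest_step "${demo_root}")"   # latest step: 15
rm -rf "${demo_root}"
```

The numeric sort matters: a plain lexicographic sort would rank global_step_5 after global_step_15.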

See Results & Artifacts for the full layout, and Config Variants for common edits.
