Inputs & Configuration

LiveRL has no config.yaml. A run is configured by the variables at the top of its launch script (e.g. `scripts/sync_1node_cc.sh`); each can be overridden by an environment variable of the same name. The script exports them, then execs `verl.trainer.main_ppo` with Hydra overrides built from those values, applied on top of the YAML defaults under `src/verl_patch/config/`.

Configuration layers

Layer	What it is	Edit it?
Launch-script variables	model, data, topology, batch, algorithm, backend	Yes — this is the user surface
Environment variables	same names, override the script defaults	Yes — for one-off runs
YAML defaults (`src/verl_patch/config/*.yaml`)	agent-loop + trainer defaults consumed by Hydra	Rarely
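
The first two layers can be pictured with the usual shell default-expansion idiom. This is a minimal sketch, not an excerpt from the launch script: how the real script assigns its defaults is an assumption. The two default values (`vllm_model`, `16`) are the ones documented on this page.

```shell
# Sketch only: assumes the launch script uses ${VAR:-default} expansion,
# which keeps an exported environment value and falls back otherwise.
SERVED_MODEL_NAME="${SERVED_MODEL_NAME:-vllm_model}"
NUM_WORKERS="${NUM_WORKERS:-16}"

# Export so the trainer process started by the script inherits them.
export SERVED_MODEL_NAME NUM_WORKERS

echo "SERVED_MODEL_NAME=$SERVED_MODEL_NAME NUM_WORKERS=$NUM_WORKERS"
```

Under that assumption, a one-off run that raises parallelism without editing the script would look like `NUM_WORKERS=32 bash scripts/sync_1node_cc.sh`.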

Inputs

The values you normally set in scripts/sync_1node_cc.sh:

Variable	Description
`MODEL_PATH`	Policy checkpoint to train
`SERVED_MODEL_NAME`	vLLM model name advertised (default `vllm_model`)
`TRAIN_INDEX` / `VAL_INDEX`	Train / val Harbor task parquet indexes
`PROJECT_NAME` / `EXP_NAME`	wandb project + experiment name
`NNODES` / `NGPUS_PER_NODE`	Cluster topology
`gen_tp`	vLLM tensor-parallel degree (must divide `num_key_value_heads`)
`K8S_KUBECONFIG` / `K8S_NAMESPACE`	Kubeconfig path and namespace for the Kubernetes backend
`HARBOR_AGENT_IMPORT_PATH`	Import path of the agent scaffold
`HARBOR_ENVIRONMENT_IMPORT_PATH`	Import path of the sandbox backend (K8s / remote Docker)
`NUM_WORKERS`	Parallel trials per step (16 cold-start; raise for steady state)
`WANDB_API_KEY` / `WANDB_MODE`	wandb credentials + mode (set in your shell)

Hyperparameters

Also set in the launch script:

adv_estimator=grpo            # ppo | grpo
policy_loss_mode=gspo         # gspo | ...
learning_rate=1.0e-06
train_prompt_bsz=64
n_resp_per_prompt=8           # 64 × 8 = 512 trials/step
max_prompt_length=40000
max_response_length=68000     # window = prompt + response
save_freq=5
test_freq=5
temperature=1.0
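
The two derived quantities mentioned in the comments above (trials per step and the context window) follow directly from these values. A small sketch of the arithmetic, using the defaults listed here; the names `trials_per_step` and `context_window` are illustrative and not variables the launch script is known to define:

```shell
train_prompt_bsz=64
n_resp_per_prompt=8
max_prompt_length=40000
max_response_length=68000

# Each prompt is sampled n_resp_per_prompt times per step.
trials_per_step=$(( train_prompt_bsz * n_resp_per_prompt ))

# The window must hold the longest prompt plus the longest response.
context_window=$(( max_prompt_length + max_response_length ))

echo "trials/step:    $trials_per_step"          # 512
echo "context window: $context_window tokens"    # 108000
```

Changing either batch value changes the number of sandboxed trials launched per step, so it is worth recomputing this before adjusting `NUM_WORKERS`.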

KV-head divisibility

gen_tp must divide the model's num_key_value_heads, or training crashes at the first forward pass. See Preflight.
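
The divisibility rule can be checked before launching. The function below is a hypothetical helper, not the project's Preflight check; it assumes you have already looked up `num_key_value_heads` yourself (for Hugging Face-format checkpoints it is usually a field in the model's `config.json`). The values 8, 4, and 3 in the usage lines are illustrative.

```shell
# check_gen_tp KV_HEADS GEN_TP
# Returns 0 if GEN_TP is positive and divides KV_HEADS, 1 otherwise.
check_gen_tp() {
  kv_heads=$1
  tp=$2
  if [ "$tp" -le 0 ] || [ $(( kv_heads % tp )) -ne 0 ]; then
    echo "gen_tp=$tp does not divide num_key_value_heads=$kv_heads" >&2
    return 1
  fi
  return 0
}

# Illustrative usage: 8 KV heads with gen_tp=4 passes, gen_tp=3 fails.
check_gen_tp 8 4 && echo "gen_tp=4 ok"
check_gen_tp 8 3 || echo "gen_tp=3 rejected"
```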

Outputs

A run writes, relative to the repo root:

Output	Path	Notes
Checkpoints	`checkpoints/<project>/<exp>/global_step_N/`	actor FSDP shards (per `save_freq`)
Training log	`logs/<exp>.log`	per-run training log
Throughput log	`logs/<exp>_vllm.log`	throughput-only log
Trajectories	`harbor_trials/<project>/<exp>/`	per-trial `proxy_trajectory.json`
Curves / val resolve rate	wandb + dashboard	validation resolve rate and metric curves
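
Given the `global_step_N` naming in the table above, the most recent checkpoint of a run can be located by comparing the numeric suffixes. This is a sketch under that naming assumption only; `latest_checkpoint` is a hypothetical helper, not something the repo is known to ship.

```shell
# latest_checkpoint RUN_DIR
# RUN_DIR is a run's checkpoint directory, e.g. checkpoints/<project>/<exp>.
# Prints the global_step_N subdirectory with the largest N; returns 1 if none.
latest_checkpoint() {
  best=-1
  for d in "$1"/global_step_*/; do
    [ -d "$d" ] || continue            # unmatched glob stays literal; skip it
    n=${d%/}                           # drop the trailing slash
    n=${n##*global_step_}              # keep only the step number
    case $n in ''|*[!0-9]*) continue ;; esac   # ignore non-numeric suffixes
    if [ "$n" -gt "$best" ]; then best=$n; fi
  done
  [ "$best" -ge 0 ] && printf '%s/global_step_%s\n' "$1" "$best"
}
```

Comparing numerically rather than sorting names avoids `global_step_10` being ordered before `global_step_5`.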

See Results & Artifacts for the full layout, and Config Variants for common edits.
