Search Agent training problem #217

@GitMonkey0

When training the search agent with a batch size of 512, I get the error below. The only fix I have found is to use a smaller batch size.

Traceback (most recent call last):
  File "rllm/rllm/router/router.py", line 48, in poll_completions_openai
    raise Exception(f"API request failed with status {response.status}: {error_text}")
Exception: API request failed with status 500: Internal Server Error

My settings are as follows:
```bash
set -x

export VLLM_ATTENTION_BACKEND=FLASH_ATTN
export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:False"
export VLLM_USE_V1=1
export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
export VLLM_ENGINE_ITERATION_TIMEOUT_S=100000000000

RLLM_DIR=$(python3 -c "import rllm; import os; print(os.path.dirname(os.path.dirname(rllm.__file__)))")

python3 -m examples.search.train_search_agent \
    algorithm.adv_estimator=loop \
    data.train_batch_size=512 \
    data.val_batch_size=256 \
    data.max_prompt_length=2048 \
    data.max_response_length=2048 \
    actor_rollout_ref.model.path=Qwen/Qwen3-4B \
    actor_rollout_ref.hybrid_engine=True \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.loss_agg_mode=seq-mean-token-sum \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.use_dynamic_bsz=True \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=24000 \
    actor_rollout_ref.actor.use_kl_loss=False \
    actor_rollout_ref.actor.clip_ratio_high=0.28 \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.ulysses_sequence_parallel_size=1 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=True \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.mode="async" \
    actor_rollout_ref.rollout.chat_scheduler=verl.schedulers.completions_scheduler.CompletionsScheduler \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.temperature=0.7 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=8 \
    actor_rollout_ref.rollout.val_kwargs.n=1 \
    actor_rollout_ref.rollout.val_kwargs.temperature=0.7 \
    actor_rollout_ref.rollout.val_kwargs.top_p=0.8 \
    actor_rollout_ref.rollout.val_kwargs.top_k=20 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=16 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
    actor_rollout_ref.actor.entropy_coeff=0 \
    algorithm.kl_ctrl.kl_coef=0.001 \
    algorithm.mask_truncated_samples=False \
    algorithm.clip_advantages=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='deepscaler-agent' \
    trainer.experiment_name='4b-loop-drgrpo-search_agent' \
    trainer.val_before_train=False \
    trainer.n_gpus_per_node=4 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=20 \
    trainer.default_hdfs_dir=null \
    agent.max_steps=10 \
    agent.async_engine=True \
    trainer.total_epochs=1
```
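For context, here is a rough sketch of the request volume these settings imply, plus one generic way to bound it. The arithmetic uses only values from the config above. Everything else is an assumption for illustration: `bounded_gather`, `make_request`, and `max_in_flight` are hypothetical names, not part of rllm or verl, and I have not confirmed that the router actually sends all requests at once or that this is the cause of the 500 error.

```python
import asyncio

# Values copied from the settings above.
TRAIN_BATCH_SIZE = 512   # data.train_batch_size
ROLLOUT_N = 8            # actor_rollout_ref.rollout.n
AGENT_MAX_STEPS = 10     # agent.max_steps

# Trajectories per training step, and an upper bound on completion
# requests per step if every trajectory uses all of its agent steps.
trajectories = TRAIN_BATCH_SIZE * ROLLOUT_N
max_requests = trajectories * AGENT_MAX_STEPS


async def bounded_gather(make_request, payloads, max_in_flight=64, retries=3):
    """Hypothetical helper: run make_request(payload) for every payload,
    keeping at most max_in_flight calls active at any time, and retrying
    a failed call with exponential backoff before giving up."""
    sem = asyncio.Semaphore(max_in_flight)

    async def one(payload):
        async with sem:
            for attempt in range(retries):
                try:
                    return await make_request(payload)
                except Exception:
                    # Re-raise once the retry budget is exhausted.
                    if attempt == retries - 1:
                        raise
                    await asyncio.sleep(0.01 * 2 ** attempt)

    # Results come back in the same order as payloads.
    return await asyncio.gather(*(one(p) for p in payloads))
```

With a batch size of 512 this gives 4096 trajectories per step, compared with 512 trajectories at a batch size of 64, which would be consistent with the error only appearing at large batch sizes if the inference server is being overloaded.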
