Official resources of "KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search". Haoran Luo, Haihong E, Yikai Guo, Qika Lin, Xiaobao Wu, Xinyu Mu, Wenhao Liu, Meina Song, Yifan Zhu, Luu Anh Tuan. ICML 2025 [paper].
```bash
conda create -n kbqao1 python=3.11
conda activate kbqao1
pip install torch==2.3.0
pip install -r requirements.txt
sudo apt install unixodbc
export PYTHONPATH=$PWD
```
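Optionally, you can sanity-check that PyTorch and the GPUs are visible before continuing (an illustrative snippet, not part of the repo):

```python
# check_env.py -- optional environment sanity check (illustrative, not part of the repo).
import torch

print("torch version:", torch.__version__)        # expect 2.3.0
print("cuda available:", torch.cuda.is_available())
print("gpu count:", torch.cuda.device_count())
```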
The steps below follow the Freebase Virtuoso Setup.

(1) Clone from dki-lab/Freebase-Setup:
```bash
git clone https://github.com/dki-lab/Freebase-Setup.git
cd Freebase-Setup
```
(2) The processed Freebase Virtuoso DB file can be downloaded from here or via wget (WARNING: 53 GB+ of disk space is needed), then unpacked:
```bash
tar -zxvf virtuoso_db.zip
```
(3) Managing the Virtuoso service. To start the service:
```bash
chmod +x virtuoso-opensource/bin/virtuoso-t
python3 virtuoso.py start 3001 -d virtuoso_db
```
And to stop a currently running service at the same port:
```bash
chmod +x virtuoso-opensource/bin/isql
python3 virtuoso.py stop 3001
```
A server with at least 100 GB of RAM is recommended.
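To verify that the service answers queries, here is a minimal probe, assuming Virtuoso's default HTTP SPARQL endpoint at `http://localhost:3001/sparql` (the script is illustrative, not part of the setup repo):

```python
# sanity_check_virtuoso.py -- minimal probe of the local Virtuoso SPARQL endpoint.
# Assumes the default HTTP endpoint at http://localhost:3001/sparql; adjust the
# port if you started the service elsewhere.
import requests

query = "SELECT ?p WHERE { ?s ?p ?o } LIMIT 1"
resp = requests.get(
    "http://localhost:3001/sparql",
    params={"query": query, "format": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["results"]["bindings"])  # one predicate if the DB loaded correctly
```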
- Download `fb_roles`, `fb_types`, and `reverse_properties` from here to `dataset/Freebase/`:
```
KBQA-o1/
└── dataset/
    └── Freebase/
        ├── fb_roles
        ├── fb_types
        └── reverse_properties
```
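As a quick spot-check of the download, the sketch below peeks at `reverse_properties`, assuming the two-column, tab-separated layout used by the GrailQA tooling these files originate from (the script itself is illustrative):

```python
# peek_freebase_meta.py -- illustrative peek at the ontology files; assumes
# reverse_properties is tab-separated with two columns (relation, reverse).
with open("dataset/Freebase/reverse_properties") as f:
    reverse = dict(line.rstrip("\n").split("\t") for line in f if line.strip())

print(len(reverse), "relation pairs")
some_rel = next(iter(reverse))
print(some_rel, "<->", reverse[some_rel])  # a relation and its recorded reverse
```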
Experiments are conducted on three classic KBQA benchmarks: WebQSP, GrailQA, and GraphQ.
- WebQSP: Download the WebQSP dataset from here and put it under `dataset/WebQSP/origin`. The dataset files should be named `WebQSP.test[train].json`.
- GrailQA: Download the GrailQA dataset here and put it under `dataset/GrailQA/origin`. The dataset files should be named `grailqa_v1.0_test_public[train,dev].json`.
- GraphQ: Download the GraphQ dataset here and put it under `dataset/GraphQ/origin`. The dataset files should be named `graphquestions_v1_fb15_test[training]_091420.json`.
```
KBQA-o1/
└── dataset/
    ├── WebQSP/
    │   └── origin/
    │       ├── WebQSP.train.json
    │       └── WebQSP.test.json
    ├── GrailQA/
    │   └── origin/
    │       ├── grailqa_v1.0_train.json
    │       ├── grailqa_v1.0_dev.json
    │       └── grailqa_v1.0_test_public.json
    └── GraphQ/
        └── origin/
            ├── graphquestions_v1_fb15_training_091420.json
            └── graphquestions_v1_fb15_test_091420.json
```
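To confirm the layout before preprocessing, here is a small illustrative check (`check_datasets.py` is hypothetical, not part of the repo; the file names come from the lists above):

```python
# check_datasets.py -- verify that the expected raw dataset files are in place
# (illustrative helper; file names taken from the layout above).
import os

EXPECTED = {
    "dataset/WebQSP/origin": ["WebQSP.train.json", "WebQSP.test.json"],
    "dataset/GrailQA/origin": [
        "grailqa_v1.0_train.json",
        "grailqa_v1.0_dev.json",
        "grailqa_v1.0_test_public.json",
    ],
    "dataset/GraphQ/origin": [
        "graphquestions_v1_fb15_training_091420.json",
        "graphquestions_v1_fb15_test_091420.json",
    ],
}

for folder, names in EXPECTED.items():
    for name in names:
        path = os.path.join(folder, name)
        status = "ok" if os.path.isfile(path) else "MISSING"
        print(f"{status:7s} {path}")
```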
Parse SPARQL queries to S-expressions and Function-lists.
- WebQSP: Run `python data_process.py --dataset WebQSP`, and the merged data files will be saved as `dataset/WebQSP/processed/WebQSP_train[test].json`.
- GrailQA: Run `python data_process.py --dataset GrailQA`, and the merged data files will be saved as `dataset/GrailQA/processed/GrailQA_train[test,test_public].json`.
- GraphQ: Run `python data_process.py --dataset GraphQ`, and the merged data files will be saved as `dataset/GraphQ/processed/GraphQ_train[test].json`.
```
KBQA-o1/
└── dataset/
    ├── WebQSP/
    │   └── processed/
    │       ├── WebQSP_train.json
    │       └── WebQSP_test.json
    ├── GrailQA/
    │   └── processed/
    │       ├── GrailQA_train.json
    │       └── GrailQA_test.json
    └── GraphQ/
        └── processed/
            ├── GraphQ_train.json
            └── GraphQ_test.json
```
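To spot-check the preprocessing output, the snippet below prints the size and first-record keys of a processed split (illustrative; it assumes each processed file is a JSON list of records, and the exact schema is defined by `data_process.py`):

```python
# peek_processed.py -- illustrative spot-check of a processed split; assumes the
# file is a JSON list of records (the exact schema comes from data_process.py).
import json

with open("dataset/WebQSP/processed/WebQSP_train.json") as f:
    data = json.load(f)

print(len(data), "examples")
print(sorted(data[0].keys()))  # field names as produced by data_process.py
```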
For WebQSP:

(1) Prepare the SFT data:
```bash
python prepare_sft_data.py --dataset WebQSP
```
(2) Run SFT training of the simulation and reward models:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft_KBQA_WebQSP_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 50.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_WebQSP_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft_KBQA_WebQSP_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 100.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_WebQSP_reward.log 2>&1 &
```
(3) Export the fine-tuned models:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/export_model/simulate --export_size 2 --export_legacy_format False

CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/export_model/reward --export_size 2 --export_legacy_format False
```
(4) Deploy the LLM APIs, starting with the simulation model:
```bash
CUDA_VISIBLE_DEVICES=0 API_PORT=8101 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_WebQSP_simulate.log 2>&1 &
```
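Once a worker is up, you can smoke-test it before launching the next one. A minimal sketch, assuming `llm_api.py` exposes an OpenAI-compatible `/v1/chat/completions` route as LLaMA-Factory's API server does (the script itself is illustrative):

```python
# probe_llm_api.py -- smoke-test a deployed model worker (hypothetical helper;
# assumes an OpenAI-compatible route as served by LLaMA-Factory-style APIs).
import requests

resp = requests.post(
    "http://localhost:8101/v1/chat/completions",  # the simulate worker's port
    json={
        "model": "simulate",
        "messages": [{"role": "user", "content": "Hello"}],
        "temperature": 0.0,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```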
Then deploy the reward model API:
```bash
CUDA_VISIBLE_DEVICES=0 API_PORT=8102 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_WebQSP_reward.log 2>&1 &
```
(5) Run MCTS exploration over the training set:
```bash
CUDA_VISIBLE_DEVICES=0 nohup python run_explore.py --llm_simulate_name 8101/simulate --llm_reward_name 8102/reward --base Llama-3.1-8B-Instruct --task explore --dataset WebQSP >> result_Llama-3.1-8B-Instruct_explore_KBQA_WebQSP_sft.log 2>&1 &
```
(6) Prepare the SFT2 data from the explored annotations, then stop the LLM APIs:
```bash
python prepare_sft2_data.py --llm_reward_name 8102/reward --base Llama-3.1-8B-Instruct --dataset WebQSP --limit "30"
bash utils/kill_llm_api_WebQSP.sh
```
(7) Run incremental SFT2 training of both models:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/export_model/simulate --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft2_KBQA_WebQSP_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 10.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_WebQSP_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/export_model/reward --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft2_KBQA_WebQSP_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 20.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_WebQSP_reward.log 2>&1 &
```
(8) Export the SFT2 models:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/export_model/simulate --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_simulate/export_model/simulate --export_size 2 --export_legacy_format False

CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/export_model/reward --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_reward/export_model/reward --export_size 2 --export_legacy_format False
```
(9) Deploy the SFT2 LLM APIs:
```bash
CUDA_VISIBLE_DEVICES=0 API_PORT=8101 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_WebQSP_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=0 API_PORT=8102 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_WebQSP_reward.log 2>&1 &
```
(10) Evaluate on the test set, then stop the APIs:
```bash
CUDA_VISIBLE_DEVICES=0 nohup python run_explore.py --llm_simulate_name 8101/simulate --llm_reward_name 8102/reward --base Llama-3.1-8B-Instruct --task test --dataset WebQSP >> result_Llama-3.1-8B-Instruct_test_KBQA_WebQSP_sft2.log 2>&1 &
bash utils/kill_llm_api_WebQSP.sh
```

For GrailQA, run the same pipeline with GrailQA data, GPU 1, and ports 8103/8104:

(1) Prepare the SFT data and run SFT training:
```bash
python prepare_sft_data.py --dataset GrailQA

CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft_KBQA_GrailQA_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 100.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_GrailQA_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft_KBQA_GrailQA_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 300.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_GrailQA_reward.log 2>&1 &
```
(2) Export the fine-tuned models:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/export_model/simulate --export_size 2 --export_legacy_format False

CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/export_model/reward --export_size 2 --export_legacy_format False
```
(3) Deploy the LLM APIs:
```bash
CUDA_VISIBLE_DEVICES=1 API_PORT=8103 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_GrailQA_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=1 API_PORT=8104 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_GrailQA_reward.log 2>&1 &
```
(4) Run MCTS exploration, prepare the SFT2 data, then stop the APIs:
```bash
CUDA_VISIBLE_DEVICES=1 nohup python run_explore.py --llm_simulate_name 8103/simulate --llm_reward_name 8104/reward --base Llama-3.1-8B-Instruct --task explore --dataset GrailQA >> result_Llama-3.1-8B-Instruct_explore_KBQA_GrailQA_sft.log 2>&1 &

python prepare_sft2_data.py --llm_reward_name 8104/reward --base Llama-3.1-8B-Instruct --dataset GrailQA --limit "-100"
bash utils/kill_llm_api_GrailQA.sh
```
(5) Run incremental SFT2 training:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/export_model/simulate --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft2_KBQA_GrailQA_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 10.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_GrailQA_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/export_model/reward --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft2_KBQA_GrailQA_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 20.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_GrailQA_reward.log 2>&1 &
```
(6) Export the SFT2 models:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/export_model/simulate --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_simulate/export_model/simulate --export_size 2 --export_legacy_format False

CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/export_model/reward --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_reward/export_model/reward --export_size 2 --export_legacy_format False
```
(7) Deploy the SFT2 APIs, evaluate on the test set, then stop the APIs:
```bash
CUDA_VISIBLE_DEVICES=1 API_PORT=8103 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_GrailQA_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=1 API_PORT=8104 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_GrailQA_reward.log 2>&1 &

CUDA_VISIBLE_DEVICES=1 nohup python run_explore.py --llm_simulate_name 8103/simulate --llm_reward_name 8104/reward --base Llama-3.1-8B-Instruct --task test --dataset GrailQA >> result_Llama-3.1-8B-Instruct_test_KBQA_GrailQA_sft2.log 2>&1 &
bash utils/kill_llm_api_GrailQA.sh
```

For GraphQ, the pipeline is identical with GraphQ data, GPU 2, and ports 8105/8106:

(1) Prepare the SFT data and run SFT training:
```bash
python prepare_sft_data.py --dataset GraphQ

CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft_KBQA_GraphQ_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 50.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_GraphQ_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft_KBQA_GraphQ_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 100.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_GraphQ_reward.log 2>&1 &
```
(2) Export the fine-tuned models:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/export_model/simulate --export_size 2 --export_legacy_format False

CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/export_model/reward --export_size 2 --export_legacy_format False
```
(3) Deploy the LLM APIs:
```bash
CUDA_VISIBLE_DEVICES=2 API_PORT=8105 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_GraphQ_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=2 API_PORT=8106 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_GraphQ_reward.log 2>&1 &
```
(4) Run MCTS exploration, prepare the SFT2 data, then stop the APIs:
```bash
CUDA_VISIBLE_DEVICES=2 nohup python run_explore.py --llm_simulate_name 8105/simulate --llm_reward_name 8106/reward --base Llama-3.1-8B-Instruct --task explore --dataset GraphQ >> result_Llama-3.1-8B-Instruct_explore_KBQA_GraphQ_sft.log 2>&1 &

python prepare_sft2_data.py --llm_reward_name 8106/reward --base Llama-3.1-8B-Instruct --dataset GraphQ --limit "-50"
bash utils/kill_llm_api_GraphQ.sh
```
(5) Run incremental SFT2 training:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/export_model/simulate --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft2_KBQA_GraphQ_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 10.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_GraphQ_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/export_model/reward --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft2_KBQA_GraphQ_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 20.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_GraphQ_reward.log 2>&1 &
```
(6) Export the SFT2 models:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/export_model/simulate --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_simulate/export_model/simulate --export_size 2 --export_legacy_format False

CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/export_model/reward --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_reward/export_model/reward --export_size 2 --export_legacy_format False
```
(7) Deploy the SFT2 APIs, evaluate on the test set, then stop the APIs:
```bash
CUDA_VISIBLE_DEVICES=2 API_PORT=8105 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_GraphQ_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=2 API_PORT=8106 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_GraphQ_reward.log 2>&1 &

CUDA_VISIBLE_DEVICES=2 nohup python run_explore.py --llm_simulate_name 8105/simulate --llm_reward_name 8106/reward --base Llama-3.1-8B-Instruct --task test --dataset GraphQ >> result_Llama-3.1-8B-Instruct_test_KBQA_GraphQ_sft2.log 2>&1 &
bash utils/kill_llm_api_GraphQ.sh
```

If you find this work helpful for your research, please cite:
```bibtex
@InProceedings{luo2025kbqao1,
  title     = {{KBQA}-o1: Agentic Knowledge Base Question Answering with {M}onte {C}arlo Tree Search},
  author    = {Luo, Haoran and E, Haihong and Guo, Yikai and Lin, Qika and Wu, Xiaobao and Mu, Xinyu and Liu, Wenhao and Song, Meina and Zhu, Yifan and Luu, Anh Tuan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {41177--41199},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/luo25d/luo25d.pdf},
  url       = {https://proceedings.mlr.press/v267/luo25d.html},
  abstract  = {Knowledge Base Question Answering (KBQA) aims to answer natural language questions with a large-scale structured knowledge base (KB). Despite advancements with large language models (LLMs), KBQA still faces challenges in weak KB awareness, imbalance between effectiveness and efficiency, and high reliance on annotated data. To address these challenges, we propose KBQA-o1, a novel agentic KBQA method with Monte Carlo Tree Search (MCTS). It introduces a ReAct-based agent process for stepwise logical form generation with KB environment exploration. Moreover, it employs MCTS, a heuristic search method driven by policy and reward models, to balance agentic exploration’s performance and search space. With heuristic exploration, KBQA-o1 generates high-quality annotations for further improvement by incremental fine-tuning. Experimental results show that KBQA-o1 outperforms previous low-resource KBQA methods with limited annotated data, boosting Llama-3.1-8B model’s GrailQA F1 performance to 78.5% compared to 48.5% of the previous sota method with GPT-3.5-turbo. Our code is publicly available.}
}
```

For further questions, please contact: [email protected].
This repo benefits from KB-Coder, LLM-Reasoners, and LLaMA-Factory. Thanks for their wonderful work.
