Official resources of "KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search". Haoran Luo, Haihong E, Yikai Guo, Qika Lin, Xiaobao Wu, Xinyu Mu, Wenhao Liu, Meina Song, Yifan Zhu, Luu Anh Tuan. ICML 2025 [paper].
```bash
conda create -n kbqao1 python=3.11
conda activate kbqao1
pip install torch==2.3.0
pip install -r requirements.txt
sudo apt install unixodbc
export PYTHONPATH=$PWD
```
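Optionally, you can sanity-check that PyTorch and the GPUs are visible before continuing (an illustrative snippet, not part of the repo):

```python
# check_env.py -- optional environment sanity check (illustrative, not part of the repo).
import torch

print("torch version:", torch.__version__)        # expect 2.3.0
print("cuda available:", torch.cuda.is_available())
print("gpu count:", torch.cuda.device_count())
```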
The steps below follow the Freebase Virtuoso Setup.

(1) Clone from dki-lab/Freebase-Setup:
```bash
git clone https://github.com/dki-lab/Freebase-Setup.git
cd Freebase-Setup
```
(2) The processed Freebase Virtuoso DB file can be downloaded from here or via wget (WARNING: 53 GB+ of disk space is needed), then unpacked:
```bash
tar -zxvf virtuoso_db.zip
```
(3) Managing the Virtuoso service. To start the service:
```bash
chmod +x virtuoso-opensource/bin/virtuoso-t
python3 virtuoso.py start 3001 -d virtuoso_db
```
And to stop a currently running service at the same port:
```bash
chmod +x virtuoso-opensource/bin/isql
python3 virtuoso.py stop 3001
```
A server with at least 100 GB of RAM is recommended.
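To verify that the service answers queries, here is a minimal probe, assuming Virtuoso's default HTTP SPARQL endpoint at `http://localhost:3001/sparql` (the script is illustrative, not part of the setup repo):

```python
# sanity_check_virtuoso.py -- minimal probe of the local Virtuoso SPARQL endpoint.
# Assumes the default HTTP endpoint at http://localhost:3001/sparql; adjust the
# port if you started the service elsewhere.
import requests

query = "SELECT ?p WHERE { ?s ?p ?o } LIMIT 1"
resp = requests.get(
    "http://localhost:3001/sparql",
    params={"query": query, "format": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["results"]["bindings"])  # one predicate if the DB loaded correctly
```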
- Download `fb_roles`, `fb_types`, and `reverse_properties` from here to `dataset/Freebase/`:
```
KBQA-o1/
└── dataset/
    └── Freebase/
        ├── fb_roles
        ├── fb_types
        └── reverse_properties
```
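As a quick spot-check of the download, the sketch below peeks at `reverse_properties`, assuming the two-column, tab-separated layout used by the GrailQA tooling these files originate from (the script itself is illustrative):

```python
# peek_freebase_meta.py -- illustrative peek at the ontology files; assumes
# reverse_properties is tab-separated with two columns (relation, reverse).
with open("dataset/Freebase/reverse_properties") as f:
    reverse = dict(line.rstrip("\n").split("\t") for line in f if line.strip())

print(len(reverse), "relation pairs")
some_rel = next(iter(reverse))
print(some_rel, "<->", reverse[some_rel])  # a relation and its recorded reverse
```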
Experiments are conducted on three classic KBQA benchmarks: WebQSP, GrailQA, and GraphQ.
- WebQSP: Download the WebQSP dataset from here and put it under `dataset/WebQSP/origin`. The dataset files should be named `WebQSP.test[train].json`.
- GrailQA: Download the GrailQA dataset here and put it under `dataset/GrailQA/origin`. The dataset files should be named `grailqa_v1.0_test_public[train,dev].json`.
- GraphQ: Download the GraphQ dataset here and put it under `dataset/GraphQ/origin`. The dataset files should be named `graphquestions_v1_fb15_test[training]_091420.json`.
```
KBQA-o1/
└── dataset/
    ├── WebQSP/
    │   └── origin/
    │       ├── WebQSP.train.json
    │       └── WebQSP.test.json
    ├── GrailQA/
    │   └── origin/
    │       ├── grailqa_v1.0_train.json
    │       ├── grailqa_v1.0_dev.json
    │       └── grailqa_v1.0_test_public.json
    └── GraphQ/
        └── origin/
            ├── graphquestions_v1_fb15_training_091420.json
            └── graphquestions_v1_fb15_test_091420.json
```
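To confirm the layout before preprocessing, here is a small illustrative check (`check_datasets.py` is hypothetical, not part of the repo; the file names come from the lists above):

```python
# check_datasets.py -- verify that the expected raw dataset files are in place
# (illustrative helper; file names taken from the layout above).
import os

EXPECTED = {
    "dataset/WebQSP/origin": ["WebQSP.train.json", "WebQSP.test.json"],
    "dataset/GrailQA/origin": [
        "grailqa_v1.0_train.json",
        "grailqa_v1.0_dev.json",
        "grailqa_v1.0_test_public.json",
    ],
    "dataset/GraphQ/origin": [
        "graphquestions_v1_fb15_training_091420.json",
        "graphquestions_v1_fb15_test_091420.json",
    ],
}

for folder, names in EXPECTED.items():
    for name in names:
        path = os.path.join(folder, name)
        status = "ok" if os.path.isfile(path) else "MISSING"
        print(f"{status:7s} {path}")
```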
Parse SPARQL queries to S-expressions and Function-lists.
- WebQSP: Run `python data_process.py --dataset WebQSP`, and the merged data files will be saved as `dataset/WebQSP/processed/WebQSP_train[test].json`.
- GrailQA: Run `python data_process.py --dataset GrailQA`, and the merged data files will be saved as `dataset/GrailQA/processed/GrailQA_train[test,test_public].json`.
- GraphQ: Run `python data_process.py --dataset GraphQ`, and the merged data files will be saved as `dataset/GraphQ/processed/GraphQ_train[test].json`.
```
KBQA-o1/
└── dataset/
    ├── WebQSP/
    │   └── processed/
    │       ├── WebQSP_train.json
    │       └── WebQSP_test.json
    ├── GrailQA/
    │   └── processed/
    │       ├── GrailQA_train.json
    │       └── GrailQA_test.json
    └── GraphQ/
        └── processed/
            ├── GraphQ_train.json
            └── GraphQ_test.json
```
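To spot-check the preprocessing output, the snippet below prints the size and first-record keys of a processed split (illustrative; it assumes each processed file is a JSON list of records, and the exact schema is defined by `data_process.py`):

```python
# peek_processed.py -- illustrative spot-check of a processed split; assumes the
# file is a JSON list of records (the exact schema comes from data_process.py).
import json

with open("dataset/WebQSP/processed/WebQSP_train.json") as f:
    data = json.load(f)

print(len(data), "examples")
print(sorted(data[0].keys()))  # field names as produced by data_process.py
```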
For WebQSP:

(1) Prepare the SFT data:
```bash
python prepare_sft_data.py --dataset WebQSP
```
(2) Run SFT training of the simulation and reward models:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft_KBQA_WebQSP_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 50.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_WebQSP_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft_KBQA_WebQSP_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 100.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_WebQSP_reward.log 2>&1 &
```
(3) Export the fine-tuned models:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/export_model/simulate --export_size 2 --export_legacy_format False

CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/export_model/reward --export_size 2 --export_legacy_format False
```
(4) Deploy the LLM APIs, starting with the simulation model:
```bash
CUDA_VISIBLE_DEVICES=0 API_PORT=8101 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_WebQSP_simulate.log 2>&1 &
```
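Once a worker is up, you can smoke-test it before launching the next one. A minimal sketch, assuming `llm_api.py` exposes an OpenAI-compatible `/v1/chat/completions` route as LLaMA-Factory's API server does (the script itself is illustrative):

```python
# probe_llm_api.py -- smoke-test a deployed model worker (hypothetical helper;
# assumes an OpenAI-compatible route as served by LLaMA-Factory-style APIs).
import requests

resp = requests.post(
    "http://localhost:8101/v1/chat/completions",  # the simulate worker's port
    json={
        "model": "simulate",
        "messages": [{"role": "user", "content": "Hello"}],
        "temperature": 0.0,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```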
Then deploy the reward model API:
```bash
CUDA_VISIBLE_DEVICES=0 API_PORT=8102 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_WebQSP_reward.log 2>&1 &
```
(5) Run MCTS exploration over the training set:
```bash
CUDA_VISIBLE_DEVICES=0 nohup python run_explore.py --llm_simulate_name 8101/simulate --llm_reward_name 8102/reward --base Llama-3.1-8B-Instruct --task explore --dataset WebQSP >> result_Llama-3.1-8B-Instruct_explore_KBQA_WebQSP_sft.log 2>&1 &
```
(6) Prepare the SFT2 data from the explored annotations, then stop the LLM APIs:
```bash
python prepare_sft2_data.py --llm_reward_name 8102/reward --base Llama-3.1-8B-Instruct --dataset WebQSP --limit "30"
bash utils/kill_llm_api_WebQSP.sh
```
(7) Run incremental SFT2 training of both models:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/export_model/simulate --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft2_KBQA_WebQSP_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 10.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_WebQSP_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/export_model/reward --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft2_KBQA_WebQSP_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 20.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_WebQSP_reward.log 2>&1 &
```
(8) Export the SFT2 models:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/export_model/simulate --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_simulate/export_model/simulate --export_size 2 --export_legacy_format False

CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/export_model/reward --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_reward/export_model/reward --export_size 2 --export_legacy_format False
```
(9) Deploy the SFT2 LLM APIs:
```bash
CUDA_VISIBLE_DEVICES=0 API_PORT=8101 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_WebQSP_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=0 API_PORT=8102 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_WebQSP_reward.log 2>&1 &
```
(10) Evaluate on the test set, then stop the APIs:
```bash
CUDA_VISIBLE_DEVICES=0 nohup python run_explore.py --llm_simulate_name 8101/simulate --llm_reward_name 8102/reward --base Llama-3.1-8B-Instruct --task test --dataset WebQSP >> result_Llama-3.1-8B-Instruct_test_KBQA_WebQSP_sft2.log 2>&1 &
bash utils/kill_llm_api_WebQSP.sh
```

For GrailQA, run the same pipeline with GrailQA data, GPU 1, and ports 8103/8104:

(1) Prepare the SFT data and run SFT training:
```bash
python prepare_sft_data.py --dataset GrailQA

CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft_KBQA_GrailQA_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 100.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_GrailQA_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft_KBQA_GrailQA_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 300.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_GrailQA_reward.log 2>&1 &
```
(2) Export the fine-tuned models:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/export_model/simulate --export_size 2 --export_legacy_format False

CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/export_model/reward --export_size 2 --export_legacy_format False
```
(3) Deploy the LLM APIs:
```bash
CUDA_VISIBLE_DEVICES=1 API_PORT=8103 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_GrailQA_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=1 API_PORT=8104 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_GrailQA_reward.log 2>&1 &
```
(4) Run MCTS exploration, prepare the SFT2 data, then stop the APIs:
```bash
CUDA_VISIBLE_DEVICES=1 nohup python run_explore.py --llm_simulate_name 8103/simulate --llm_reward_name 8104/reward --base Llama-3.1-8B-Instruct --task explore --dataset GrailQA >> result_Llama-3.1-8B-Instruct_explore_KBQA_GrailQA_sft.log 2>&1 &

python prepare_sft2_data.py --llm_reward_name 8104/reward --base Llama-3.1-8B-Instruct --dataset GrailQA --limit "-100"
bash utils/kill_llm_api_GrailQA.sh
```
(5) Run incremental SFT2 training:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/export_model/simulate --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft2_KBQA_GrailQA_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 10.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_GrailQA_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/export_model/reward --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft2_KBQA_GrailQA_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 20.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_GrailQA_reward.log 2>&1 &
```
(6) Export the SFT2 models:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/export_model/simulate --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_simulate/export_model/simulate --export_size 2 --export_legacy_format False

CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/export_model/reward --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_reward/export_model/reward --export_size 2 --export_legacy_format False
```
(7) Deploy the SFT2 APIs, evaluate on the test set, then stop the APIs:
```bash
CUDA_VISIBLE_DEVICES=1 API_PORT=8103 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_GrailQA_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=1 API_PORT=8104 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_GrailQA_reward.log 2>&1 &

CUDA_VISIBLE_DEVICES=1 nohup python run_explore.py --llm_simulate_name 8103/simulate --llm_reward_name 8104/reward --base Llama-3.1-8B-Instruct --task test --dataset GrailQA >> result_Llama-3.1-8B-Instruct_test_KBQA_GrailQA_sft2.log 2>&1 &
bash utils/kill_llm_api_GrailQA.sh
```

For GraphQ, the pipeline is identical with GraphQ data, GPU 2, and ports 8105/8106:

(1) Prepare the SFT data and run SFT training:
```bash
python prepare_sft_data.py --dataset GraphQ

CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft_KBQA_GraphQ_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 50.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_GraphQ_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft_KBQA_GraphQ_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 100.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_GraphQ_reward.log 2>&1 &
```
(2) Export the fine-tuned models:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/export_model/simulate --export_size 2 --export_legacy_format False

CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/export_model/reward --export_size 2 --export_legacy_format False
```
(3) Deploy the LLM APIs:
```bash
CUDA_VISIBLE_DEVICES=2 API_PORT=8105 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_GraphQ_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=2 API_PORT=8106 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_GraphQ_reward.log 2>&1 &
```
(4) Run MCTS exploration, prepare the SFT2 data, then stop the APIs:
```bash
CUDA_VISIBLE_DEVICES=2 nohup python run_explore.py --llm_simulate_name 8105/simulate --llm_reward_name 8106/reward --base Llama-3.1-8B-Instruct --task explore --dataset GraphQ >> result_Llama-3.1-8B-Instruct_explore_KBQA_GraphQ_sft.log 2>&1 &

python prepare_sft2_data.py --llm_reward_name 8106/reward --base Llama-3.1-8B-Instruct --dataset GraphQ --limit "-50"
bash utils/kill_llm_api_GraphQ.sh
```
(5) Run incremental SFT2 training:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/export_model/simulate --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft2_KBQA_GraphQ_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 10.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_GraphQ_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/export_model/reward --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft2_KBQA_GraphQ_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 20.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_GraphQ_reward.log 2>&1 &
```
(6) Export the SFT2 models:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/export_model/simulate --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_simulate/export_model/simulate --export_size 2 --export_legacy_format False

CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/export_model/reward --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_reward/export_model/reward --export_size 2 --export_legacy_format False
```
(7) Deploy the SFT2 APIs, evaluate on the test set, then stop the APIs:
```bash
CUDA_VISIBLE_DEVICES=2 API_PORT=8105 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_GraphQ_simulate.log 2>&1 &

CUDA_VISIBLE_DEVICES=2 API_PORT=8106 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_GraphQ_reward.log 2>&1 &

CUDA_VISIBLE_DEVICES=2 nohup python run_explore.py --llm_simulate_name 8105/simulate --llm_reward_name 8106/reward --base Llama-3.1-8B-Instruct --task test --dataset GraphQ >> result_Llama-3.1-8B-Instruct_test_KBQA_GraphQ_sft2.log 2>&1 &
bash utils/kill_llm_api_GraphQ.sh
```

If you find this work helpful for your research, please cite:
```bibtex
@InProceedings{luo2025kbqao1,
  title     = {{KBQA}-o1: Agentic Knowledge Base Question Answering with {M}onte {C}arlo Tree Search},
  author    = {Luo, Haoran and E, Haihong and Guo, Yikai and Lin, Qika and Wu, Xiaobao and Mu, Xinyu and Liu, Wenhao and Song, Meina and Zhu, Yifan and Luu, Anh Tuan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {41177--41199},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/luo25d/luo25d.pdf},
  url       = {https://proceedings.mlr.press/v267/luo25d.html},
  abstract  = {Knowledge Base Question Answering (KBQA) aims to answer natural language questions with a large-scale structured knowledge base (KB). Despite advancements with large language models (LLMs), KBQA still faces challenges in weak KB awareness, imbalance between effectiveness and efficiency, and high reliance on annotated data. To address these challenges, we propose KBQA-o1, a novel agentic KBQA method with Monte Carlo Tree Search (MCTS). It introduces a ReAct-based agent process for stepwise logical form generation with KB environment exploration. Moreover, it employs MCTS, a heuristic search method driven by policy and reward models, to balance agentic exploration’s performance and search space. With heuristic exploration, KBQA-o1 generates high-quality annotations for further improvement by incremental fine-tuning. Experimental results show that KBQA-o1 outperforms previous low-resource KBQA methods with limited annotated data, boosting Llama-3.1-8B model’s GrailQA F1 performance to 78.5% compared to 48.5% of the previous sota method with GPT-3.5-turbo. Our code is publicly available.}
}
```

For further questions, please contact: [email protected].
This repo benefits from KB-Coder, LLM-Reasoners, and LLaMA-Factory. Thanks for their wonderful work.
