This repository is the official implementation of the paper "A Reasoning Paradigm for Named Entity Recognition" (accepted by AAAI 2026).
In this project, we propose a novel reasoning paradigm for Named Entity Recognition (NER) that shifts the modeling approach from traditional implicit pattern matching to an explicit, verifiable reasoning process.
Our model, ReasoningNER, is trained through three stages: Chain-of-Thought (CoT) Generation, CoT Tuning, and Reasoning Enhancement.
Experiments show that this paradigm significantly improves the model's generalization ability and data efficiency in zero-shot, few-shot, and cross-domain scenarios.
The core idea of ReasoningNER is to inject explicit reasoning capabilities into Large Language Models (LLMs) through a three-stage process:

- CoT Generation: We first construct a high-quality NER-CoT dataset, where each entity annotation is accompanied by a detailed, step-by-step reasoning chain.
- CoT Tuning: We use the NER-CoT dataset to perform Supervised Fine-Tuning (SFT) on a base language model, teaching it to generate a coherent reasoning process before predicting the final entities.
- Reasoning Enhancement: After fine-tuning, we employ a reinforcement learning algorithm (GRPO) to further optimize the model's reasoning policy. Using a composite reward function (covering F1 score and schema compliance), we incentivize the model to generate more accurate and reliable reasoning paths; a sketch of such a reward follows this list.
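To make the composite reward concrete, here is a minimal sketch in Python. It assumes the model's final answer is a JSON list of `{"type": ..., "text": ...}` objects; the function names, the hard schema gate, and the 0.8/0.2 weighting are illustrative assumptions, not the paper's exact formulation.

```python
import json

def f1_reward(pred_entities, gold_entities):
    """Span-level micro-F1 over (type, text) pairs."""
    pred = {(e["type"], e["text"]) for e in pred_entities}
    gold = {(e["type"], e["text"]) for e in gold_entities}
    if not pred or not gold:
        return 1.0 if pred == gold else 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def schema_reward(completion, entity_types):
    """1.0 if the answer parses as a JSON list restricted to allowed types."""
    try:
        entities = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(entities, list):
        return 0.0
    ok = all(isinstance(e, dict) and e.get("type") in entity_types for e in entities)
    return float(ok)

def composite_reward(completion, gold_entities, entity_types):
    """Weighted mix of schema compliance and F1 (weights are illustrative)."""
    schema = schema_reward(completion, entity_types)
    if schema == 0.0:
        return 0.0  # malformed output earns no correctness credit
    return 0.2 * schema + 0.8 * f1_reward(json.loads(completion), gold_entities)
```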
The main dependencies are:

- torch
- transformers
- trl
- accelerate
- deepspeed
- flash-attn
- liger-kernel
- vllm (0.9.1)
Before starting training, please prepare the necessary datasets:

- NER-CoT Dataset: The complete dataset is available on Hugging Face at https://huggingface.co/datasets/HuiHuang/NER-CoT (a loading example follows this list).
- InstructUIE: The InstructUIE dataset can be downloaded from https://github.com/BeyonderXX/InstructUIE.
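To verify the download, the NER-CoT dataset should load with the Hugging Face datasets library. A minimal check (the presence of a `train` split is an assumption; consult the dataset card for the actual schema):

```python
from datasets import load_dataset

# Pull the NER-CoT dataset from the Hugging Face Hub.
dataset = load_dataset("HuiHuang/NER-CoT")
print(dataset)

# Inspect one record; the split name is an assumption.
print(dataset["train"][0])
```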
The CoT Tuning stage corresponds to Supervised Fine-Tuning. You can start training by running the following script (sft.py):

```bash
accelerate launch \
--config_file config/accelerate_config/deepspeed_zero3.yaml \
sft.py \
--config config/sft/qwen3-8b.yaml
```
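For orientation, the sketch below shows the kind of trl-based SFT loop that sft.py plausibly wraps; all hyperparameters, the `text`-field assumption, and the base model name are illustrative guesses read off the config filename, not the repository's exact settings:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Illustrative: assumes each NER-CoT example has been rendered into a single
# "text" field (prompt + reasoning chain + final entity list).
train_dataset = load_dataset("HuiHuang/NER-CoT", split="train")

training_args = SFTConfig(
    output_dir="outputs/qwen3-8b-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    num_train_epochs=2,
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-8B",  # base model, inferred from the config filename
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```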
After completing CoT tuning, use the Reasoning Enhancement (RE) script (grpo.py) to further optimize the model through reinforcement learning:

- Sample from the InstructUIE dataset:

```bash
python sample_grpo.py \
--base_path IE_INSTRUCTION/NER \
--output_path data/trl-grpo \
--num_samples 5000 \
--max_count 10000 \
--shuffle
```

- Deploy vLLM to accelerate GRPO training:

```bash
CUDA_VISIBLE_DEVICES=7 python -m trl.scripts.vllm_serve \
--model outputs/qwen3-8b-sft \
--tensor_parallel_size 1 \
--data_parallel_size 1 \
--port 5407 \
--enable_prefix_caching
```

- Start the GRPO training process:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6 accelerate launch \
--config_file config/accelerate_config/deepspeed_zero3.yaml \
--num_processes 7 \
grpo.py \
--config config/grpo/qwen3-8b.yaml
```
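For reference, this is roughly how a trl GRPO trainer is pointed at the vLLM server started above. The reward function here is a stand-in for the composite F1/schema reward, and the data file name, generation count, and other settings are assumptions, not grpo.py's actual wiring:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Stand-in reward; the real one combines span-level F1 with schema
# compliance, as sketched earlier in this README.
def dummy_reward(completions, **kwargs):
    return [float(len(c) > 0) for c in completions]

training_args = GRPOConfig(
    output_dir="outputs/qwen3-8b-grpo",
    use_vllm=True,           # roll out completions via the vLLM server
    vllm_server_port=5407,   # must match the port passed to vllm_serve
    num_generations=8,       # completions per prompt (illustrative)
)

trainer = GRPOTrainer(
    model="outputs/qwen3-8b-sft",  # start from the CoT-tuned checkpoint
    reward_funcs=dummy_reward,
    args=training_args,
    # The file name under data/trl-grpo is hypothetical.
    train_dataset=load_dataset("json", data_files="data/trl-grpo/train.json")["train"],
)
trainer.train()
```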
We later found that training for more epochs using the verl framework can further improve performance, so we have updated the code to include verl usage:
- Sample verl training data: `python sample_grpo.py --base_path IE_INSTRUCTION/NER --output_path data/verl --save_to_verl`
- verl training script: `bash verl-grpo.sh`. You can modify the parameter configurations in `verl-grpo.sh` as needed.
- Weight Conversion: To convert the verl-trained weights into Hugging Face format, you can use the official script: https://github.com/volcengine/verl/blob/main/scripts/legacy_model_merger.py
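After conversion, the checkpoint should load like any Hugging Face model; a quick sanity check (the path is the one used by the evaluation command below):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "outputs/qwen3-8b-grpo"  # converted verl checkpoint

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype="auto")
print(model.config.architectures)
```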
Finally, evaluate the trained model on the InstructUIE NER benchmark:

```bash
python evaluate.py \
--model outputs/qwen3-8b-grpo \
--base_path IE_INSTRUCTION/NER \
--result_file eval_result.json \
--template qwen3
```

If you find this work useful, please cite our paper:

```bibtex
@misc{huang2025reasoningparadigmnamedentity,
title={A Reasoning Paradigm for Named Entity Recognition},
author={Hui Huang and Yanping Chen and Ruizhang Huang and Chuan Lin and Yongbin Qin},
year={2025},
eprint={2511.11978},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2511.11978},
}
```

