Assignment for the PKU course Advanced Topics in Natural Language Processing, Spring 2021.

POS tagging for Simplified and Traditional Chinese data with BERT / RoBERTa.
```bash
conda create -n pos python=3.6
conda activate pos
conda install pytorch torchvision cudatoolkit=10.0 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
pip install -r requirements.txt
```
```bash
sh download.sh bert-base
```

Change `bert-base` to `roberta-base` or `roberta-large` as needed.
```bash
sh pos_train.sh -b bert-wwm-ext -d simplified -c 0,1,2,3
```

- `-b` denotes the path to the pre-trained model; choose from `bert-wwm-ext`, `roberta-wwm-ext`, and `roberta-wwm-ext-large`.
- `-d` denotes the data type; choose from `simplified` and `traditional`.
- `-c` denotes the indices of the CUDA devices to use.
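The `-b` checkpoints are Chinese whole-word-masking BERT/RoBERTa models, and the task is POS tagging cast as token classification. For orientation, here is a minimal sketch of that setup; the HuggingFace hub name `hfl/chinese-bert-wwm-ext` and the tag-set size are illustrative assumptions, not this repo's actual training code:

```python
# Sketch: POS tagging as token classification with a Chinese WWM checkpoint.
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

NUM_POS_TAGS = 33  # hypothetical size of the POS tag set

tokenizer = BertTokenizerFast.from_pretrained("hfl/chinese-bert-wwm-ext")
model = BertForTokenClassification.from_pretrained(
    "hfl/chinese-bert-wwm-ext", num_labels=NUM_POS_TAGS
)

# Chinese BERT tokenizes at the character level, so the model predicts
# one tag per character position.
encoding = tokenizer("北京大学在海淀区", return_tensors="pt")
with torch.no_grad():
    logits = model(**encoding).logits  # (1, seq_len, NUM_POS_TAGS)
pred_tags = logits.argmax(dim=-1)      # one tag id per input position
```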
```bash
sh pos_train_focalloss.sh -b bert-wwm-ext -d simplified -c 0,1,2,3
```
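`pos_train_focalloss.sh` trains with focal loss instead of plain cross-entropy, down-weighting tokens the model already classifies easily so that rare tags contribute more to the gradient. Below is a minimal sketch of the standard formulation (Lin et al., 2017); the function name and defaults are illustrative, not necessarily the script's exact implementation:

```python
# Sketch: focal loss for token classification, standard formulation.
import torch
import torch.nn.functional as F

def focal_loss(logits, labels, gamma=2.0, ignore_index=-100):
    # logits: (batch, seq_len, num_tags); labels: (batch, seq_len)
    logits = logits.reshape(-1, logits.size(-1))
    labels = labels.reshape(-1)
    # Per-token cross-entropy, i.e. -log p_t for the gold tag.
    ce = F.cross_entropy(logits, labels, reduction="none",
                         ignore_index=ignore_index)
    pt = torch.exp(-ce)                # probability of the gold tag
    loss = ((1.0 - pt) ** gamma) * ce  # down-weight easy tokens
    mask = labels != ignore_index      # drop padding / special tokens
    return loss[mask].mean()
```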
```bash
sh pos_train_crf.sh -b bert-wwm-ext -d simplified -c 0,1,2,3
```
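`pos_train_crf.sh` adds a CRF layer on top of the encoder, so tag transitions are scored jointly and prediction uses Viterbi decoding. A minimal sketch, assuming the `pytorch-crf` package (`pip install pytorch-crf`); the class and variable names are illustrative, not this repo's actual model code:

```python
# Sketch: BERT encoder with a CRF decoding layer (pytorch-crf).
import torch.nn as nn
from torchcrf import CRF
from transformers import BertModel

class BertCrfTagger(nn.Module):
    def __init__(self, model_name, num_tags):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.bert(input_ids,
                           attention_mask=attention_mask).last_hidden_state
        emissions = self.classifier(hidden)  # (batch, seq_len, num_tags)
        mask = attention_mask.bool()
        if labels is not None:
            # Training objective: sequence-level negative log-likelihood.
            return -self.crf(emissions, labels, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)  # Viterbi tag ids
```

Note that the CRF objective is a sequence-level negative log-likelihood rather than a per-token cross-entropy, which is presumably why the CRF rows in the tables below report a much larger loss value.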
Results on the dev set
- Simplified

| model | loss | accuracy (%) | precision (%) | recall (%) | F1 (%) |
| --- | --- | --- | --- | --- | --- |
| BERT-base | 0.18 | 95.79 | 95.24 | 95.63 | 95.44 |
| RoBERTa-large | 0.21 | 96.17 | 95.92 | 96.09 | 96.00 |
| RoBERTa-large + FocalLoss | 0.15 | 96.06 | 95.59 | 95.92 | 95.75 |
| RoBERTa-large + CRF | 7.05 | 96.28 | 96.36 | 96.29 | 96.32 |
- Traditional

| model | loss | accuracy (%) | precision (%) | recall (%) | F1 (%) |
| --- | --- | --- | --- | --- | --- |
| RoBERTa-large | 0.25 | 94.66 | 94.20 | 94.55 | 94.38 |
| RoBERTa-large + FocalLoss | 0.18 | 94.62 | 93.91 | 94.44 | 94.18 |
Results on the test set
- Simplified

| model | loss | accuracy (%) | precision (%) | recall (%) | F1 (%) |
| --- | --- | --- | --- | --- | --- |
| BERT-base | 0.18 | 95.84 | 95.29 | 95.71 | 95.50 |
| RoBERTa-large | 0.20 | 96.32 | 96.05 | 96.23 | 96.14 |
| RoBERTa-large + FocalLoss | 0.15 | 96.22 | 95.74 | 96.06 | 95.90 |
| RoBERTa-large + CRF | 7.00 | 96.39 | 96.45 | 96.39 | 96.42 |
- Traditional

| model | loss | accuracy (%) | precision (%) | recall (%) | F1 (%) |
| --- | --- | --- | --- | --- | --- |
| RoBERTa-large | 0.24 | 94.62 | 94.21 | 94.52 | 94.36 |
| RoBERTa-large + FocalLoss | 0.18 | 94.57 | 93.89 | 94.38 | 94.14 |