Persian Named Entity Recognition Models Comparison

This repository compares bert-fa-zwnj and xlm-roberta models for Persian named entity recognition (NER). Each model is fine-tuned on either ParsNer or SemEval and evaluated on the corresponding test set.
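
A fine-tuned checkpoint produced by any of these runs can be used for tagging through the standard Hugging Face `transformers` pipeline. The sketch below is only illustrative: the checkpoint path is hypothetical (this comparison does not ship model weights), and the exact output depends on the checkpoint.

```python
# Minimal inference sketch using the Hugging Face transformers pipeline.
# The model path below is hypothetical: point it at a checkpoint fine-tuned as described here.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="./xlm-roberta-large-parsner",    # hypothetical path to a fine-tuned checkpoint
    aggregation_strategy="simple",          # merge subword pieces into whole entity spans
)

# "Tehran is the capital of Iran." -- expect LOC spans if the checkpoint is ParsNer-tuned.
print(ner("تهران پایتخت ایران است."))
```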

Datasets

There were four popular datasets for the Persian NER task, and the Hooshvare team introduced another dataset, ParsNer, which is a mixture of three of them:

ParsNer tags:

  • DAT: Date
  • EVE: Event
  • FAC: Facility
  • LOC: Location
  • MON: Money
  • ORG: Organization
  • PCT: Percent
  • PER: Person
  • PRO: Product
  • TIM: Time

SemEval tags:

  • PER: Names of people
  • LOC: Location or physical facilities
  • CORP: Corporations and businesses
  • GRP: All other groups
  • PROD: Consumer products
  • CW: Titles of creative works such as movies, songs, and books
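
For token classification, each tag set above is typically expanded into an IOB2 label inventory (an `O` label plus `B-`/`I-` variants of every entity type). The sketch below shows one way to build the label mappings; it assumes the usual IOB2 scheme and is not necessarily the exact preprocessing used for these experiments.

```python
# Build IOB2 label lists and id/label mappings for either tag set.
# Assumes the standard IOB2 scheme; the actual preprocessing may differ.
PARSNER_TYPES = ["DAT", "EVE", "FAC", "LOC", "MON", "ORG", "PCT", "PER", "PRO", "TIM"]
SEMEVAL_TYPES = ["PER", "LOC", "CORP", "GRP", "PROD", "CW"]

def iob2_labels(entity_types):
    """Expand entity types into an IOB2 label list: O, then B-X and I-X per type X."""
    return ["O"] + [f"{prefix}-{t}" for t in entity_types for prefix in ("B", "I")]

labels = iob2_labels(PARSNER_TYPES)                      # 21 labels for ParsNer
id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}
# AutoModelForTokenClassification can then be loaded with
# num_labels=len(labels), id2label=id2label, label2id=label2id.
```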

Statistics:

| Dataset      | train set | validation set | test set |
|--------------|-----------|----------------|----------|
| ParsNer      | 29,133    | 5,142          | 6,049    |
| SemEval - Fa | 15,300    | 800            | 165,702  |

Comparison of results on the ParsNer dataset

These three models are fine-tuned on the ParsNer train set, and the results below are reported on the ParsNer test set.

  • Pretrained model: xlm-roberta-large

|              | precision | recall | f1-score |
|--------------|-----------|--------|----------|
| DAT          | 0.76      | 0.81   | 0.78     |
| EVE          | 0.86      | 0.97   | 0.91     |
| FAC          | 0.84      | 0.99   | 0.90     |
| LOC          | 0.95      | 0.94   | 0.95     |
| MON          | 0.92      | 0.93   | 0.93     |
| ORG          | 0.90      | 0.94   | 0.92     |
| PCT          | 0.87      | 0.89   | 0.88     |
| PER          | 0.97      | 0.98   | 0.97     |
| PRO          | 0.87      | 0.98   | 0.92     |
| TIM          | 0.82      | 0.91   | 0.86     |
| micro avg    | 0.93      | 0.95   | 0.94     |
| macro avg    | 0.88      | 0.93   | 0.90     |
| weighted avg | 0.93      | 0.95   | 0.94     |
  • Pretrained model: xlm-roberta-base

|              | precision | recall | f1-score |
|--------------|-----------|--------|----------|
| DAT          | 0.60      | 0.73   | 0.66     |
| EVE          | 0.65      | 0.87   | 0.74     |
| FAC          | 0.71      | 0.95   | 0.81     |
| LOC          | 0.90      | 0.88   | 0.89     |
| MON          | 0.85      | 0.86   | 0.85     |
| ORG          | 0.81      | 0.89   | 0.85     |
| PCT          | 0.79      | 0.87   | 0.83     |
| PER          | 0.95      | 0.96   | 0.96     |
| PRO          | 0.74      | 0.89   | 0.81     |
| TIM          | 0.35      | 0.17   | 0.23     |
| micro avg    | 0.86      | 0.90   | 0.88     |
| macro avg    | 0.73      | 0.81   | 0.76     |
| weighted avg | 0.86      | 0.90   | 0.88     |
  • Pretrained model: HooshvareLab/bert-fa-zwnj-base

|              | precision | recall | f1-score |
|--------------|-----------|--------|----------|
| DAT          | 0.71      | 0.75   | 0.73     |
| EVE          | 0.78      | 0.92   | 0.84     |
| FAC          | 0.78      | 0.91   | 0.84     |
| LOC          | 0.92      | 0.93   | 0.92     |
| MON          | 0.83      | 0.82   | 0.82     |
| ORG          | 0.87      | 0.90   | 0.88     |
| PCT          | 0.90      | 0.88   | 0.89     |
| PER          | 0.95      | 0.95   | 0.95     |
| PRO          | 0.84      | 0.95   | 0.89     |
| TIM          | 0.66      | 0.43   | 0.52     |
| micro avg    | 0.89      | 0.92   | 0.90     |
| macro avg    | 0.82      | 0.84   | 0.83     |
| weighted avg | 0.89      | 0.92   | 0.90     |
  • Entity comparison (f1-score)

| Entities | xlm-roberta-large | xlm-roberta-base | bert-fa-zwnj-base |
|----------|-------------------|------------------|-------------------|
| DAT      | 0.78              | 0.66             | 0.73              |
| EVE      | 0.91              | 0.74             | 0.84              |
| FAC      | 0.90              | 0.81             | 0.84              |
| LOC      | 0.95              | 0.89             | 0.92              |
| MON      | 0.93              | 0.85             | 0.82              |
| ORG      | 0.92              | 0.85             | 0.88              |
| PCT      | 0.88              | 0.83             | 0.89              |
| PER      | 0.97              | 0.96             | 0.95              |
| PRO      | 0.92              | 0.81             | 0.89              |
| TIM      | 0.86              | 0.23             | 0.52              |
  • Weighted avg comparison

| Model             | precision | recall | f1-score |
|-------------------|-----------|--------|----------|
| xlm-roberta-large | 0.93      | 0.95   | 0.94     |
| xlm-roberta-base  | 0.86      | 0.90   | 0.88     |
| bert-fa-zwnj-base | 0.89      | 0.92   | 0.90     |
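
The per-entity scores and the micro/macro/weighted averages in these tables are the quantities reported by `seqeval`'s classification report on the test-set predictions. Below is a minimal sketch of how such a report is produced from gold and predicted tag sequences; it uses illustrative toy data and is not necessarily the exact evaluation code behind these numbers.

```python
# Minimal evaluation sketch with seqeval (pip install seqeval).
# y_true / y_pred are per-sentence lists of IOB2 tags; the values below are toy data.
from seqeval.metrics import classification_report, f1_score

y_true = [["B-PER", "I-PER", "O", "B-LOC"], ["O", "B-ORG", "I-ORG"]]
y_pred = [["B-PER", "I-PER", "O", "B-LOC"], ["O", "B-ORG", "O"]]

# Prints per-entity precision/recall/F1 plus micro, macro and weighted averages,
# i.e. the same quantities shown in the tables above.
print(classification_report(y_true, y_pred, digits=2))
print("micro F1:", f1_score(y_true, y_pred))
```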

Comparison of results on the SemEval dataset

These two models are fine-tuned on the SemEval train set, and the results below are reported on the SemEval test set.

  • Pretrained model: xlm-roberta-large

|              | precision | recall | f1-score |
|--------------|-----------|--------|----------|
| CORP         | 0.56      | 0.56   | 0.56     |
| CW           | 0.41      | 0.54   | 0.46     |
| GRP          | 0.58      | 0.56   | 0.57     |
| LOC          | 0.65      | 0.65   | 0.65     |
| PER          | 0.70      | 0.72   | 0.71     |
| PROD         | 0.60      | 0.61   | 0.60     |
| micro avg    | 0.59      | 0.62   | 0.60     |
| macro avg    | 0.58      | 0.61   | 0.59     |
| weighted avg | 0.60      | 0.62   | 0.61     |
  • Pretrained model: bert-fa-zwnj-base

|              | precision | recall | f1-score |
|--------------|-----------|--------|----------|
| CORP         | 0.51      | 0.56   | 0.53     |
| CW           | 0.22      | 0.40   | 0.28     |
| GRP          | 0.50      | 0.47   | 0.48     |
| LOC          | 0.52      | 0.49   | 0.51     |
| PER          | 0.56      | 0.64   | 0.60     |
| PROD         | 0.48      | 0.47   | 0.47     |
| micro avg    | 0.45      | 0.51   | 0.48     |
| macro avg    | 0.46      | 0.51   | 0.48     |
| weighted avg | 0.47      | 0.51   | 0.49     |
  • Weighted avg comparison

| Model             | precision | recall | f1-score |
|-------------------|-----------|--------|----------|
| xlm-roberta-large | 0.60      | 0.62   | 0.61     |
| bert-fa-zwnj-base | 0.47      | 0.51   | 0.49     |
  • Entity comparison (f1-score)

| Entities | xlm-roberta-large | bert-fa-zwnj-base |
|----------|-------------------|-------------------|
| CORP     | 0.56              | 0.53              |
| CW       | 0.46              | 0.28              |
| GRP      | 0.57              | 0.48              |
| LOC      | 0.65              | 0.51              |
| PER      | 0.71              | 0.60              |
| PROD     | 0.60              | 0.47              |
