🐯 fix issue#7
cyk1337 committed Aug 9, 2022
1 parent 9492114 commit 8e4140d
Showing 10 changed files with 305 additions and 34 deletions.
42 changes: 32 additions & 10 deletions build/lib/eval4ner/muc.py
@@ -279,18 +279,40 @@ def evaluate_all(predictions: list, golden_labels: list, texts: list, verbose=Fa

print('\n', 'NER evaluation scores:')
for mode, res in total_results.items():
res['precision'] /= res['count']
res['recall'] /= res['count']
res['f1_score'] /= res['count']
print("{:>8s} mode, Precision={:<6.4f}, Recall={:<6.4f}, F1:{:<6.4f}"
.format(mode, res['precision'] / res['count'], res['recall'] / res['count'],
res['f1_score'] / res['count']))
.format(mode, res['precision'], res['recall'], res['f1_score']))
return total_results


if __name__ == '__main__':
grount_truth = [('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')]
prediction = [('PER', 'John Jones and Peter Peters came to York')]
text = 'John Jones and Peter Peters came to York'
# print(evaluate_one(prediction, grount_truth, text))
# print(evaluate_one(prediction, grount_truth, text))

evaluate_all([prediction] * 1, [grount_truth] * 1, [text] * 1, verbose=True)
evaluate_all([prediction] * 1, [grount_truth] * 1, [text] * 1, verbose=True)
# grount_truth = [('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')]
# prediction = [('PER', 'John Jones and Peter Peters came to York')]
# text = 'John Jones and Peter Peters came to York'
# # print(evaluate_one(prediction, grount_truth, text))
# # print(evaluate_one(prediction, grount_truth, text))

# evaluate_all([prediction] * 1, [grount_truth] * 1, [text] * 1, verbose=True)
# evaluate_all([prediction] * 1, [grount_truth] * 1, [text] * 1, verbose=True)

grount_truths = [
[('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')],
[('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')],
[('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')]
]
# NER model prediction
predictions = [
[('PER', 'John Jones and Peter Peters came to York')],
[('LOC', 'John Jones'), ('PER', 'Peters'), ('LOC', 'York')],
[('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')]
]
# input texts
texts = [
'John Jones and Peter Peters came to York',
'John Jones and Peter Peters came to York',
'John Jones and Peter Peters came to York'
]
res = evaluate_all(predictions, grount_truths * 1, texts, verbose=True)
print(res)
Binary file removed dist/eval4ner-0.0.5-py2.py3-none-any.whl
Binary file removed dist/eval4ner-0.0.5-py3-none-any.whl
Binary file removed dist/eval4ner-0.0.5.tar.gz
Binary file added dist/eval4ner-0.1.0-py3-none-any.whl
Binary file added dist/eval4ner-0.1.0.tar.gz
252 changes: 239 additions & 13 deletions eval4ner.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
Metadata-Version: 2.1
Name: eval4ner
Version: 0.0.5
Version: 0.1.0
Summary: A package for NER evaluation
Home-page: https://github.com/cyk1337/eval4ner
Author: cyk1337
@@ -12,18 +12,231 @@ Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
License-File: LICENCE
License-File: LICENSE

# NER-evaluation
# eval4ner: An All-Round Evaluation for Named Entity Recognition
![Stable version](https://img.shields.io/pypi/v/eval4ner)
![Python3](https://img.shields.io/pypi/pyversions/eval4ner)
![wheel:eval4ner](https://img.shields.io/pypi/wheel/eval4ner)
![Download](https://img.shields.io/pypi/dm/eval4ner)
![MIT License](https://img.shields.io/pypi/l/eval4ner)

This is a Python implementation of NER MUC evaluation. Refer to the blog [Evaluation Metrics of Name Entity Recognition](https://ychai.uk/notes/2018/11/21/NLP/NER/Evaluation-metrics-of-Name-Entity-Recognition-systems/#SemEval%E2%80%9813) for explanations of MUC metric.

## Installation

Table of Contents
=================

- [TL;DR](https://github.com/cyk1337/eval4ner/#tldr)
- [Preliminaries for NER Evaluation](https://github.com/cyk1337/eval4ner/#preliminaries-for-ner-evaluation)
- [User Guide](https://github.com/cyk1337/eval4ner/#user-guide)
  - [Installation](https://github.com/cyk1337/eval4ner/#installation)
  - [Usage](https://github.com/cyk1337/eval4ner/#usage)
- [Citation](https://github.com/cyk1337/eval4ner/#citation)
- [References](https://github.com/cyk1337/eval4ner/#references)

This is a Python toolkit implementing the MUC-5 evaluation metrics for Named Entity Recognition (NER) results.


## TL;DR
It considers not only strict matching, *i.e.*, extracted entities must be correct w.r.t. both boundary and type, but also partial matching, summarized in the following four modes (a toy example follows the list):
- Strict: exact match (both entity boundary and type are correct)
- Exact boundary matching: the predicted entity boundary is correct, regardless of entity type
- Partial boundary matching: the entity boundaries overlap, regardless of entity type
- Type matching: some overlap between the system-tagged entity and the gold annotation is required
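
As a toy illustration of how the four modes differ, the snippet below scores one imperfect prediction under all four modes using `evaluate_one` from the Usage section (the mode keys and the `precision`/`recall`/`f1_score` fields are assumed to match the sample outputs shown later; adjust if your installed version differs):

```python
import eval4ner.muc as muc

# gold annotation and a deliberately imperfect prediction
ground_truth = [('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')]
prediction = [('LOC', 'John Jones'), ('PER', 'Peters'), ('LOC', 'York')]
text = 'John Jones and Peter Peters came to York'

result = muc.evaluate_one(prediction, ground_truth, text)
for mode in ('strict', 'exact', 'partial', 'type'):
    scores = result[mode]
    print('{:>8s}: P={:.4f} R={:.4f} F1={:.4f}'.format(
        mode, scores['precision'], scores['recall'], scores['f1_score']))
```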


Refer to the blog [Evaluation Metrics of Name Entity Recognition](https://ychai.uk/notes/2018/11/21/NLP/NER/NER-Evaluation-Metrics/#SemEval%E2%80%9813) for an explanation of the MUC metrics.

## Preliminaries for NER Evaluation
In research and production, the following scenarios of NER system output occur frequently:

<table class="tg">
<tr>
<th class="tg-0pky">Scenario</th>
<th class="tg-c3ow" colspan="2">Golden Standard</th>
<th class="tg-c3ow" colspan="2">NER system prediction</th>
<th class="tg-c3ow" colspan="4">Measure</th>
</tr>
<tr>
<td class="tg-0pky"></td>
<td class="tg-c3ow">Entity Type</td>
<td class="tg-c3ow">Entity Boundary (Surface String)</td>
<td class="tg-0pky">Entity Type</td>
<td class="tg-0pky">Entity Boundary (Surface String)</td>
<td class="tg-0pky">Type</td>
<td class="tg-0pky">Partial</td>
<td class="tg-0pky">Exact</td>
<td class="tg-0pky">Strict</td>
</tr>
<tr>
<td class="tg-0pky">III</td>
<td class="tg-c3ow">MUSIC_NAME</td>
<td class="tg-c3ow">告白气球</td>
<td class="tg-0pky"></td>
<td class="tg-0pky"></td>
<td class="tg-0pky">MIS</td>
<td class="tg-0pky">MIS</td>
<td class="tg-0pky">MIS</td>
<td class="tg-0pky">MIS</td>
</tr>
<tr>
<td class="tg-0pky">II</td>
<td class="tg-c3ow"></td>
<td class="tg-c3ow"></td>
<td class="tg-0pky">MUSIC_NAME</td>
<td class="tg-0pky">年轮</td>
<td class="tg-0pky">SPU</td>
<td class="tg-0pky">SPU</td>
<td class="tg-0pky">SPU</td>
<td class="tg-0pky">SPU</td>
</tr>
<tr>
<td class="tg-0pky">V</td>
<td class="tg-c3ow">MUSIC_NAME</td>
<td class="tg-c3ow">告白气球</td>
<td class="tg-0pky">MUSIC_NAME</td>
<td class="tg-0pky">一首告白气球</td>
<td class="tg-0pky">COR</td>
<td class="tg-0pky">PAR</td>
<td class="tg-0pky">INC</td>
<td class="tg-0pky">INC</td>
</tr>
<tr>
<td class="tg-0pky">IV</td>
<td class="tg-c3ow">MUSIC_NAME</td>
<td class="tg-c3ow">告白气球</td>
<td class="tg-0pky">SINGER</td>
<td class="tg-0pky">告白气球</td>
<td class="tg-0pky">INC</td>
<td class="tg-0pky">COR</td>
<td class="tg-0pky">COR</td>
<td class="tg-0pky">INC</td>
</tr>
<tr>
<td class="tg-0pky">I</td>
<td class="tg-c3ow">MUSIC_NAME</td>
<td class="tg-c3ow">告白气球</td>
<td class="tg-0pky">MUSIC_NAME</td>
<td class="tg-0pky">告白气球</td>
<td class="tg-0pky">COR</td>
<td class="tg-0pky">COR</td>
<td class="tg-0pky">COR</td>
<td class="tg-0pky">COR</td>
</tr>
<tr>
<td class="tg-0pky">VI</td>
<td class="tg-c3ow">MUSIC_NAME</td>
<td class="tg-c3ow">告白气球</td>
<td class="tg-0pky">SINGER</td>
<td class="tg-0pky">一首告白气球</td>
<td class="tg-0pky">INC</td>
<td class="tg-0pky">PAR</td>
<td class="tg-0pky">INC</td>
<td class="tg-0pky">INC</td>
</tr>
</table>

Thus, MUC-5 takes all of these scenarios into account for a well-rounded evaluation.
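
The per-scenario tagging above can also be written down as code. The helper below is purely illustrative (it is not part of eval4ner, handles only a single gold/predicted entity pair, and uses substring containment as the overlap test), but it reproduces the table:

```python
def classify_pair(gold, pred):
    """Tag one (gold, prediction) pair for the Type/Partial/Exact/Strict modes.

    gold, pred: (entity_type, surface_string) tuples, or None when absent.
    Simplified sketch; overlap is tested via substring containment only.
    """
    if gold is None:                  # scenario II: spurious prediction
        return dict(type='SPU', partial='SPU', exact='SPU', strict='SPU')
    if pred is None:                  # scenario III: missed gold entity
        return dict(type='MIS', partial='MIS', exact='MIS', strict='MIS')

    same_type = gold[0] == pred[0]
    same_boundary = gold[1] == pred[1]
    overlap = gold[1] in pred[1] or pred[1] in gold[1]

    if same_boundary and same_type:   # scenario I: fully correct
        return dict(type='COR', partial='COR', exact='COR', strict='COR')
    if same_boundary:                 # scenario IV: correct boundary, wrong type
        return dict(type='INC', partial='COR', exact='COR', strict='INC')
    if overlap and same_type:         # scenario V: partial boundary, correct type
        return dict(type='COR', partial='PAR', exact='INC', strict='INC')
    if overlap:                       # scenario VI: partial boundary, wrong type
        return dict(type='INC', partial='PAR', exact='INC', strict='INC')
    # no overlap at all (not covered by the table): treat as incorrect here
    return dict(type='INC', partial='INC', exact='INC', strict='INC')


print(classify_pair(('MUSIC_NAME', '告白气球'), ('SINGER', '一首告白气球')))
# {'type': 'INC', 'partial': 'PAR', 'exact': 'INC', 'strict': 'INC'}  -- scenario VI
```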

Then we can compute:

**Number of gold-standard entities**:

<img src="https://render.githubusercontent.com/render/math?math=Possible(POS) = COR %2B INC %2B PAR %2B MIS = TP %2B FN">

**Number of predicted entities**:

<img src="https://render.githubusercontent.com/render/math?math=Actual(ACT) = COR %2B INC %2B PAR %2B SPU = TP %2B FP">

The exact-match and partial-match metrics are computed as follows:
### Exact match (i.e. Strict, Exact)
<img src="https://render.githubusercontent.com/render/math?math=Precision = \frac{COR}{ACT} = \frac{TP}{TP%2BFP}">
<img src="https://render.githubusercontent.com/render/math?math=Recall =\frac{COR}{POS}=\frac{TP}{TP%2BFN}">


### Partial match (i.e. Partial, Type)
<img src="https://render.githubusercontent.com/render/math?math=Precision = \frac{COR %2B 0.5\times PAR}{ACT}">
<img src="https://render.githubusercontent.com/render/math?math=Recall = \frac{COR %2B 0.5 \times PAR}{POS}">


### F-Measure
<img src="https://render.githubusercontent.com/render/math?math=F_\alpha = \frac{(\alpha^2 %2B 1)PR}{\alpha^2 P%2BR}">
<img src="https://render.githubusercontent.com/render/math?math=F_1 = \frac{2PR}{P%2BR}">

Therefore, for the six scenarios above we obtain the following results:
<table class="tg">
<tr>
<th class="tg-e6bt">Measure</th>
<th class="tg-23iq">Type</th>
<th class="tg-23iq">Partial</th>
<th class="tg-ww3v">Exact</th>
<th class="tg-ww3v">Strict</th>
</tr>
<tr>
<td class="tg-e6bt">Correct</td>
<td class="tg-23iq">2</td>
<td class="tg-23iq">2</td>
<td class="tg-ww3v">2</td>
<td class="tg-ww3v">1</td>
</tr>
<tr>
<td class="tg-e6bt">Incorrect</td>
<td class="tg-23iq">2</td>
<td class="tg-23iq">0</td>
<td class="tg-ww3v">2</td>
<td class="tg-ww3v">3</td>
</tr>
<tr>
<td class="tg-e6bt">Partial</td>
<td class="tg-23iq">0</td>
<td class="tg-23iq">2</td>
<td class="tg-ww3v">0</td>
<td class="tg-ww3v">0</td>
</tr>
<tr>
<td class="tg-e6bt">Missed</td>
<td class="tg-23iq">1</td>
<td class="tg-23iq">1</td>
<td class="tg-ww3v">1</td>
<td class="tg-ww3v">1</td>
</tr>
<tr>
<td class="tg-e6bt">Spurius</td>
<td class="tg-23iq">1</td>
<td class="tg-23iq">1</td>
<td class="tg-ww3v">1</td>
<td class="tg-ww3v">1</td>
</tr>
<tr>
<td class="tg-e6bt">Precision</td>
<td class="tg-23iq">0.4</td>
<td class="tg-23iq">0.6</td>
<td class="tg-ww3v">0.4</td>
<td class="tg-ww3v">0.2</td>
</tr>
<tr>
<td class="tg-e6bt">Recall</td>
<td class="tg-23iq">0.4</td>
<td class="tg-23iq">0.6</td>
<td class="tg-ww3v">0.4</td>
<td class="tg-ww3v">0.2</td>
</tr>
<tr>
<td class="tg-gx32">F1 score</td>
<td class="tg-t0np">0.4</td>
<td class="tg-t0np">0.6</td>
<td class="tg-8l38">0.4</td>
<td class="tg-8l38">0.2</td>
</tr>
</table>
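
These numbers follow directly from the formulas above: for the Type column, POS = ACT = 2 + 2 + 0 + 1 = 5 and P = R = 2/5 = 0.4, while for the Strict column P = R = 1/5 = 0.2.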

## User Guide
### Installation
```bash
pip install eval4ner
pip install [-U] eval4ner
```

## Usage
1. Evaluate single prediction
### Usage
#### 1. Evaluate single prediction
```python
import eval4ner.muc as muc
import pprint
@@ -34,7 +247,7 @@ one_result = muc.evaluate_one(prediction, grount_truth, text)
pprint.pprint(one_result)
```

Output
Output:
```bash
{'exact': {'actual': 1,
'correct': 0,
@@ -79,7 +292,7 @@ Output

```

2. Evaluate all predictions
#### 2. Evaluate all predictions
```python
import eval4ner.muc as muc
# ground truth
@@ -112,17 +325,30 @@ Output:
type mode, Precision=0.8889, Recall=0.6667, F1:0.7222
```

## Cite
This repo will be supported long-term. Contributions and pull requests are welcome.

## Citation
For attribution in academic contexts, please cite this work as:
```
@misc{eval4ner,
title={eval4ner},
author={Yekun Chai},
title={Evaluation Metrics of Named Entity Recognition},
author={Chai, Yekun},
year={2018},
howpublished={\url{https://cyk1337.github.io/notes/2018/11/21/NLP/NER/Evaluation-metrics-of-Name-Entity-Recognition-systems/}},
howpublished={\url{https://cyk1337.github.io/notes/2018/11/21/NLP/NER/NER-Evaluation-Metrics/}},
}

@misc{chai2018-ner-eval,
author = {Chai, Yekun},
title = {eval4ner: An All-Round Evaluation for Named Entity Recognition},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/cyk1337/eval4ner}}
}
```

## References
1. [Evaluation of the SemEval-2013 Task 9.1: Recognition and Classification of pharmacological substances](https://www.cs.york.ac.uk/semeval-2013/task9/data/uploads/semeval_2013-task-9_1-evaluation-metrics.pdf)
2. [MUC-5 Evaluation Metrics](https://www.aclweb.org/anthology/M93-1007.pdf)


1 change: 1 addition & 0 deletions eval4ner.egg-info/SOURCES.txt
@@ -1,4 +1,5 @@
LICENCE
LICENSE
README.md
pyproject.toml
setup.py