hmBench: A Benchmark for Historical Language Models on NER Datasets

This repository presents a benchmark for Historical Language Models, with a main focus on NER datasets such as HIPE-2022.

Models

The following Historical Language Models are currently used in benchmarks:

hmBERT: Historical Multilingual Language Models for Named Entity Recognition
hmTEAMS: Historical Multilingual TEAMS Models
hmByT5: Historical Multilingual and Monolingual ByT5 Models

Datasets

We benchmark pretrained language models on various datasets from HIPE-2020, HIPE-2022, and Europeana. The following table gives an overview of the datasets used:

Language Datasets
English AjMC - TopRes19th
German AjMC - NewsEye - HIPE-2020
French AjMC - ICDAR-Europeana - LeTemps - NewsEye - HIPE-2020
Finnish NewsEye
Swedish NewsEye
Dutch ICDAR-Europeana

Results

All results are shown in the hmLeaderboard space on the Hugging Face Hub.

Best Models

A collection of the best-performing models (grouped by backbone LM) is available on the Model Hub.

Fine-Tuning

We use Flair to fine-tune NER models on datasets from the HIPE-2022 Shared Task. Additionally, the ICDAR-Europeana dataset is used for benchmarks on Dutch and French.

We pin Flair to a tagged version to ensure reproducibility. Run the following command to install all necessary dependencies:

$ pip3 install -r requirements.txt

In order to use the hmTEAMS models, you need to authenticate with your Hugging Face account. This can be done via the CLI:

# Use access token from https://huggingface.co/settings/tokens
$ huggingface-cli login

We use a config-driven hyper-parameter search. The script flair-fine-tuner.py can be used to fine-tune NER models from our Model Zoo.
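To illustrate what a config-driven search means in practice, here is a minimal sketch of how a single JSON config can be expanded into one fine-tuning run per hyper-parameter combination. The field names and the model identifier below are assumptions for illustration only; the actual schema is defined by flair-fine-tuner.py and the files under ./configs.

```python
import itertools
import json

# Hypothetical config -- field names and model id are placeholders,
# not the repository's actual schema.
config = json.loads("""
{
  "hf_model": "example-org/example-historic-model",
  "batch_sizes": [4, 8],
  "learning_rates": [0.00015, 0.00016],
  "epochs": [10],
  "seeds": [1, 2, 3]
}
""")

def expand_grid(cfg):
    # One run per point in the Cartesian product of all searched values.
    keys = ["batch_sizes", "learning_rates", "epochs", "seeds"]
    for values in itertools.product(*(cfg[k] for k in keys)):
        run = dict(zip(["batch_size", "learning_rate", "epochs", "seed"], values))
        run["hf_model"] = cfg["hf_model"]
        yield run

runs = list(expand_grid(config))
print(len(runs))  # 2 batch sizes * 2 learning rates * 1 epoch setting * 3 seeds = 12
```

Listing every grid point up front makes it easy to launch each run as an independent job and to aggregate results per seed afterwards.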

Additionally, we provide a script that uses Hugging Face AutoTrain Advanced (Space Runner) to fine-tune models. The following snippet shows an example:

$ pip3 install git+https://github.com/huggingface/autotrain-advanced.git
$ export HF_TOKEN="" # Get token from: https://huggingface.co/settings/tokens
$ autotrain spacerunner --project-name "flair-hmbench-hmbyt5-ajmc-de" \
  --script-path $(pwd) \
  --username stefan-it \
  --token $HF_TOKEN \
  --backend spaces-t4s \
  --env "CONFIG=configs/ajmc/de/hmbyt5.json;HF_TOKEN=$HF_TOKEN;HUB_ORG_NAME=stefan-it"

The concrete implementation can be found in script.py.

Note: the AutoTrain implementation is currently under development!

All configurations for fine-tuning are located in the ./configs folder with the following naming convention: ./configs/<dataset-name>/<language>/<model-name>.json.
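The naming convention above can be resolved programmatically. The helper below is a small illustrative sketch (not part of the repository); the function name is an assumption for this example.

```python
from pathlib import Path

def config_path(dataset: str, language: str, model: str) -> Path:
    # Follows the convention ./configs/<dataset-name>/<language>/<model-name>.json
    return Path("configs") / dataset / language / f"{model}.json"

print(config_path("ajmc", "de", "hmbyt5").as_posix())  # configs/ajmc/de/hmbyt5.json
```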

Changelog

  • 17.10.2023: Over 1,200 models from the hyper-parameter search are now available on the Model Hub.
  • 05.10.2023: Initial version of this repository.

Acknowledgements

We thank Luisa März, Katharina Schmid and Erion Çano for their fruitful discussions about Historical Language Models.

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️
