Named Entity Recognition (NER)

This sample shows how to use BERT/DistilBERT-based ONNX models for Token Classification / NER in ML.NET.

Export a model to ONNX

To export a Hugging Face model to ONNX you can follow the instructions provided by Hugging Face or:

Install Python
Install these packages:

pip install optimum[exporters]

pip install accelerate

and finally use the installed Optimum CLI tool optimum-cli to export the model:

optimum-cli export onnx --model dslim/bert-base-NER bert-base-NER/

or

optimum-cli export onnx --model dmargutierrez/distilbert-base-multilingual-cased-mapa_coarse-ner distilbert-base-multilingual-cased-mapa_coarse-ner

depending on the model you want to use.

One model that seems to perform much better than the others, especially across multiple languages, is Babelscape/wikineural-multilingual-ner. It can be exported with optimum-cli as follows:

optimum-cli export onnx --model Babelscape/wikineural-multilingual-ner wikineural-multilingual-ner

Note that this model is licensed for non-commercial research purposes only.

The ONNX model, the configuration files and the vocabulary are written to a subfolder, named after the model, of the directory from which you run the CLI.

For my tests I used a multilingual cased model.

This model does not have token type ids, hence the configuration sets HasTokenTypeIds to false:

var configuration = new Configuration(modelPath, numberOfTokens: 5)
{
    HasTokenTypeIds = false
};

You can use Netron to check the shape of the inputs and outputs of your ONNX model.
Once you have loaded your model, select the input_ids node and check the model properties pane.

If your model has token_type_ids defined, simply set the configuration property to true.
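As an alternative to inspecting the model visually, the inputs can also be listed from code. The following is a minimal sketch, assuming the Microsoft.ML.OnnxRuntime package is referenced by the project; the class name ModelInputs and its methods are hypothetical helpers and not part of this sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;

// Hypothetical helper, not part of the sample project.
public static class ModelInputs
{
    // True when the list of input names contains token_type_ids.
    public static bool HasTokenTypeIds(IEnumerable<string> inputNames) =>
        inputNames.Contains("token_type_ids");

    // Prints every model input with its declared shape (dynamic dimensions
    // show up as -1) and reports whether token_type_ids is among them.
    public static bool Inspect(string onnxPath)
    {
        using var session = new InferenceSession(onnxPath);
        foreach (var input in session.InputMetadata)
        {
            var shape = string.Join(", ", input.Value.Dimensions);
            Console.WriteLine($"{input.Key}: [{shape}]");
        }
        return HasTokenTypeIds(session.InputMetadata.Keys);
    }
}
```

The value returned by ModelInputs.Inspect could then be assigned to HasTokenTypeIds in the configuration instead of hard-coding it.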

The folder where the ONNX model is exported contains several files. To run this example we need the configuration file config.json and the vocabulary vocab.txt.

The console should show the result of the NER process:

 Wolfgang=B-PERSON
 Müller=I-PERSON
 Berlin=B-ADDRESS
 ,=I-ADDRESS
 Germany=I-ADDRESS

Each identified word is associated with one of the labels supported by the model.

B- indicates the beginning of an entity.
I- indicates a token is contained inside the same entity.
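To illustrate how the B- and I- prefixes fit together, here is a minimal sketch that merges tagged words into whole entities. The class BioGrouping and its method are hypothetical and not part of this sample. Joining words with a single space and discarding an I- label that does not continue the open entity are simplifying assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the sample project.
public static class BioGrouping
{
    // Merges (token, label) pairs tagged with the B-/I- scheme into entities.
    public static List<(string Text, string Type)> GroupEntities(
        IEnumerable<(string Token, string Label)> tagged)
    {
        var entities = new List<(string Text, string Type)>();
        var words = new List<string>();
        var currentType = "";

        // Closes the entity currently being built, if there is one.
        void Flush()
        {
            if (words.Count > 0)
                entities.Add((string.Join(" ", words), currentType));
            words.Clear();
            currentType = "";
        }

        foreach (var (token, label) in tagged)
        {
            if (label.StartsWith("B-", StringComparison.Ordinal))
            {
                // A B- label always starts a new entity.
                Flush();
                currentType = label.Substring(2);
                words.Add(token);
            }
            else if (label.StartsWith("I-", StringComparison.Ordinal)
                     && words.Count > 0
                     && label.Substring(2) == currentType)
            {
                // An I- label of the same type extends the open entity.
                words.Add(token);
            }
            else
            {
                // "O", or an I- label that does not continue the open entity.
                Flush();
            }
        }

        Flush();
        return entities;
    }
}
```

Applied to the console output above, this would yield two entities: "Wolfgang Müller" of type PERSON and "Berlin , Germany" of type ADDRESS.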

If you want to find out more about the meaning of these labels, Hugging Face is a good source of information.

Different models might have different labels. The configuration file config.json, found in the folder where the model is downloaded, lists the supported labels and their names:

"id2label": {
    "0": "O",
    "1": "B-ORGANISATION",
    "2": "I-ORGANISATION",
    "3": "B-ADDRESS",
    "4": "I-ADDRESS",
    "5": "B-DATE",
    "6": "I-DATE",
    "7": "B-PERSON",
    "8": "I-PERSON",
    "9": "B-AMOUNT",
    "10": "I-AMOUNT",
    "11": "B-TIME",
    "12": "I-TIME"
  },
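The label map can also be read from code rather than copied by hand. Below is a minimal sketch using System.Text.Json; the class LabelMap is a hypothetical helper and not part of this sample, and it assumes config.json contains an id2label object shaped like the one above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of the sample project.
public static class LabelMap
{
    // Parses the "id2label" object of a config.json document
    // into a dictionary from class index to label name.
    public static Dictionary<int, string> Parse(string configJson)
    {
        using var doc = JsonDocument.Parse(configJson);
        var labels = new Dictionary<int, string>();
        foreach (var entry in doc.RootElement.GetProperty("id2label").EnumerateObject())
        {
            // Keys are stored as strings ("0", "1", ...) in the JSON file.
            labels[int.Parse(entry.Name)] = entry.Value.GetString() ?? "";
        }
        return labels;
    }
}
```

A call such as LabelMap.Parse(File.ReadAllText("config.json")) would return the mapping, where the path is an assumption about where the exported files live and File comes from System.IO.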

I have used a Bert Tokenizer found in this repo.

Leftyx/NamedEntityRecognizer

Folders and files

Latest commit

History

Repository files navigation

Named Entity Recognition (NER)

Export a model to ONNX

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages