A Natural Language Processing (NLP) Java application that detects names, organizations, and locations in text by running Hugging Face's RoBERTa NER model using ONNX Runtime and the Deep Java Library.
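To illustrate what detecting names, organizations, and locations involves after the model has run, here is a minimal sketch of turning per-token labels into entities. It is not taken from this project: the class `EntityGrouper`, the record `Entity`, and the BIO-style labels (`B-PER`, `I-PER`, `B-LOC`, `O`) are assumptions, and the sketch assumes subword tokens have already been merged into whole words.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of this project: groups BIO-tagged tokens into entities.
final class EntityGrouper {

    // One detected entity: its type (e.g. PER, ORG, LOC) and its surface text.
    record Entity(String type, String text) {}

    static List<Entity> group(List<String> tokens, List<String> labels) {
        if (tokens.size() != labels.size()) {
            throw new IllegalArgumentException("tokens and labels must have the same length");
        }
        List<Entity> entities = new ArrayList<>();
        String type = null;                    // type of the entity being built, or null
        StringBuilder text = new StringBuilder();

        for (int i = 0; i < tokens.size(); i++) {
            String label = labels.get(i);

            // An I- tag of the same type extends the current entity.
            if (type != null && label.equals("I-" + type)) {
                text.append(' ').append(tokens.get(i));
                continue;
            }
            // Anything else closes the current entity, if there is one.
            if (type != null) {
                entities.add(new Entity(type, text.toString()));
                type = null;
                text.setLength(0);
            }
            // A B- tag (or a stray I- tag, treated leniently) opens a new entity.
            if (label.startsWith("B-") || label.startsWith("I-")) {
                type = label.substring(2);
                text.append(tokens.get(i));
            }
        }
        if (type != null) {
            entities.add(new Entity(type, text.toString()));
        }
        return entities;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Ada", "Lovelace", "visited", "London");
        List<String> labels = List.of("B-PER", "I-PER", "O", "B-LOC");
        for (Entity e : group(tokens, labels)) {
            System.out.println(e.type() + ": " + e.text());
        }
    }
}
```

The label names the converted model actually emits may differ, so check the model's configuration before relying on these prefixes.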
Open the project folder in a Java IDE (recommended: IntelliJ IDEA Community) with Gradle support and build the project.
- Java Development Kit (JDK) version 17
- Gradle version 8.9
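If you are unsure which JDK your IDE or terminal is using, a small check like the sketch below can help. The class name `ToolchainCheck` is hypothetical, and treating the listed version as an exact requirement (rather than a minimum) is an assumption.

```java
// Hypothetical helper, not part of this project: reports the running Java feature version.
final class ToolchainCheck {

    // JDK version listed in the requirements above.
    static final int LISTED_JDK_VERSION = 17;

    // True when the given feature version equals the listed one.
    static boolean isListedVersion(int featureVersion) {
        return featureVersion == LISTED_JDK_VERSION;
    }

    public static void main(String[] args) {
        int feature = Runtime.version().feature(); // e.g. 17 for JDK 17.0.x
        if (isListedVersion(feature)) {
            System.out.println("Running on JDK " + feature + ", as listed.");
        } else {
            System.out.println("Running on JDK " + feature
                    + "; the project lists JDK " + LISTED_JDK_VERSION + ".");
        }
    }
}
```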
These files are required to run the project:
- ONNX model
- tokenizer.json file
To convert the Hugging Face NER model to ONNX, open this Google Colaboratory Notebook, run the code as shown in the image below, and follow all the steps.
(The code for this purpose is also saved in the Jupyter notebook file convert Huggingface model to ONNX.ipynb. You can run the code using Jupyter Notebook.)
After running the code in either notebook, your ONNX model will be saved in the onnx/ folder.
The tokenizer file tokenizer.json was taken from this Hugging Face repository. Download the tokenizer.json file from this link.
Move Files
Copy the files obtained in the steps above (the ONNX model and tokenizer.json) into the raw-files directory as shown in the image below.
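Before building, you can confirm that the files landed in the right place. The following is a minimal sketch, not part of the project: the class name `RawFilesCheck` is hypothetical, and it assumes the model file has an `.onnx` extension and that `raw-files` is resolved relative to the working directory unless a path is passed as an argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of this project: lists what is missing from raw-files.
final class RawFilesCheck {

    // Returns a description of each missing item; an empty list means everything was found.
    static List<String> missing(Path dir) throws IOException {
        List<String> problems = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            problems.add("directory " + dir);
            return problems;
        }
        if (!Files.isRegularFile(dir.resolve("tokenizer.json"))) {
            problems.add("tokenizer.json");
        }
        // Assumption: the converted model is any file ending in .onnx.
        try (Stream<Path> files = Files.list(dir)) {
            boolean hasModel = files.anyMatch(
                    p -> p.getFileName().toString().endsWith(".onnx"));
            if (!hasModel) {
                problems.add("an ONNX model (*.onnx)");
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : "raw-files");
        List<String> problems = missing(dir);
        if (problems.isEmpty()) {
            System.out.println("All required files found in " + dir);
        } else {
            problems.forEach(p -> System.out.println("Missing: " + p));
        }
    }
}
```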
Build the project using the button shown below.
Open the Main.java file and click the play button, as shown in the red box in the image below.