A little tool I made for annotating text to train spaCy's (or any) Named Entity Recognition model.
⚠️ Named Entity Annotate is generally functional; however, the frontend has issues with complex text selection. I don't have the time to resolve these issues, so I've put this project on hold indefinitely. Please do fork and open a pull request if you want to take a stab at fixing them!
- Clone or download this repo
- Copy the repo folder into your Python project
- Import modules from Named_Entity_Annotate
Named_Entity_Annotate's main function is Server.run():
from Named_Entity_Annotate import Server, Generators
source_generator = Generators.parse_json_string(Generators.from_folder('examples'))
save_callback = Generators.save_line_to_file('output.json')
Server.run(
    avalable_entitiy_labels=["PRODUCT", "ORG", "GPE", "LOC", "MONEY", "TIME"],
    next_example_generator=source_generator,
    save_example_callback=save_callback
)
- Get examples from a folder of plain text files:
source_generator = Generators.make_empty_ent_dict_with_text(Generators.from_folder('examples_folder'))
- Get examples from a file where each line is a plain text example:
source_generator = Generators.make_empty_ent_dict_with_text(Generators.from_file('examples_list_file.txt'))
- Get examples from a folder where each file is already JSON, formatted like the data format below:
source_generator = Generators.parse_json_string(Generators.from_folder('examples_folder'))
Note: Server.run() expects the source data to be a formatted dict already, so all of these will need some modifier generator.
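To illustrate what a modifier generator does, here is a minimal sketch of one that wraps raw text in a dict. It does not use Named_Entity_Annotate itself: the function name `wrap_text_as_example` and the `"text"` and `"entities"` keys are hypothetical and chosen only for illustration, so check the project's data format before relying on them.

```python
from typing import Dict, Iterable, Iterator


def wrap_text_as_example(texts: Iterable[str]) -> Iterator[Dict]:
    """Hypothetical modifier generator: turn raw strings into example dicts.

    The "text" and "entities" keys are assumed for illustration only;
    they are not taken from the Named_Entity_Annotate source.
    """
    for text in texts:
        # Strip surrounding whitespace and start with no annotated entities.
        yield {"text": text.strip(), "entities": []}


# Any iterable of strings can stand in for a source generator here.
raw_texts = ["Apple opened a store in Berlin.\n", "The ticket costs $20.\n"]
examples = list(wrap_text_as_example(raw_texts))
```

A modifier written this way takes another generator (or any iterable) as its only argument, which is the same shape as the nested calls in the recipes, such as `Generators.make_empty_ent_dict_with_text(Generators.from_file(...))`.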
- See the "Recipes for common functionality" section for some examples.
- Get examples from a folder of plain text files:
source_generator = Generators.make_empty_ent_dict_with_text(Generators.from_folder('examples_folder'))
- Save examples to a folder:
save_callback = Generators.save_as_file_in_folder('annotated_examples_folder', output_file_extension="json")
- Save each example as JSON on its own line of a file:
save_callback = Generators.save_line_to_file('annotated_examples_list.json')
Named_Entity_Annotate follows a basic pipeline model. When the WebApp requests a new example, the server pulls from the last generator modifier in the chain, which pulls from the one before it, and so on back to the source generator. The source generator returns the next example as text or JSON, each generator modifier can modify the example and pass it along, and the last generator modifier returns the fully formed JSON to the Named_Entity_Annotate server, which sends it to the WebApp:
[Annotator WebApp]--< Generator Modifier(Previous Modifier Return Value) << Generator Modifier(Source Generator Return Value) << Source Generator()
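The pull-based flow in the diagram can be sketched with plain Python generators. This is an illustration under stated assumptions, not the library's code: `source`, `parse_json`, `ensure_entities`, and the `"entities"` key are hypothetical stand-ins. The `events` list records the order in which each stage runs when a single example is requested.

```python
import json

events = []  # records which stage ran, in order


def source():
    """Stand-in source generator: yields one JSON string per example."""
    for line in ['{"text": "Berlin is in Germany."}', '{"text": "Tickets cost $20."}']:
        events.append("source")
        yield line


def parse_json(upstream):
    """First modifier: parse each JSON string into a dict."""
    for line in upstream:
        events.append("parse_json")
        yield json.loads(line)


def ensure_entities(upstream):
    """Second modifier: make sure every example has an (assumed) "entities" key."""
    for example in upstream:
        events.append("ensure_entities")
        example.setdefault("entities", [])
        yield example


# Compose the chain from the inside out, as in the diagram.
pipeline = ensure_entities(parse_json(source()))

# Requesting one example pulls exactly one item through every stage.
first_example = next(pipeline)
```

Because generators are lazy, nothing is read from the source until the outermost generator is asked for an item, so large folders or files are processed one example at a time.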
When you finish annotating an example in the browser, the WebApp sends the JSON back to the Python server, which calls your save callback with the JSON data.
[Annotator WebApp]--> Save Callback(Saved Data in the JSON format mentioned above)
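A custom save callback could look roughly like the sketch below. It assumes the callback is invoked with a single dict per annotated example; if the server passes a JSON string instead, the `json.dumps` step would need adjusting. The factory name `make_jsonl_saver` and the example payload are hypothetical.

```python
import json
import os
import tempfile


def make_jsonl_saver(path):
    """Return a callback that appends each example to `path` as one JSON line."""

    def save(example):
        # Assumption: `example` is a JSON-serializable dict.
        with open(path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(example, ensure_ascii=False) + "\n")

    return save


# Write to a temporary folder so the sketch leaves no files in the project.
output_path = os.path.join(tempfile.mkdtemp(), "annotated.jsonl")
save_callback = make_jsonl_saver(output_path)

# Simulate the server calling the callback twice (payload shape is illustrative).
save_callback({"text": "Berlin is in Germany.", "entities": [[0, 6, "GPE"]]})
save_callback({"text": "Tickets cost $20.", "entities": []})

# Read the file back to confirm one example per line.
with open(output_path, encoding="utf-8") as handle:
    saved = [json.loads(line) for line in handle]
```

Appending one JSON object per line keeps earlier annotations intact if the session is interrupted, and the resulting file can be read back line by line.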