Positive Tokenization? #13383
Unanswered
dave-richards
asked this question in
Help: Coding & Implementations
Replies: 1 comment
-
Hi! Just to be sure: are you aware that we support "Ancient Greek" with the language tag |
-
I am new to NLU and spaCy, but I have been reading the docs and doing some testing. I would like to implement a custom tokenizer for Biblical Greek. My reading of the tokenizer docs is that the customizations are "negative", i.e. a token is whatever is not a whitespace character, not a prefix, not a suffix, and not an infix; everything else is a valid token. I would like to work the other way around: define exactly what is a token, let that continue down the pipeline, and skip over what is not. Is my understanding correct, and is it possible to invert the logic to work as I would like?
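To make the "positive" idea concrete, here is a minimal sketch of what I have in mind, in plain Python. The pattern `GREEK_WORD` and the function `positive_tokenize` are hypothetical names of my own, not part of spaCy, and the character ranges are only a rough approximation of "Greek letters plus combining diacritics".

```python
import re

# Assumption: a token is a maximal run of Greek letters (basic and extended
# blocks) plus combining diacritics. The ranges are approximate and would
# need tuning for real Biblical Greek text.
GREEK_WORD = re.compile(r"[\u0386\u0388-\u03FF\u1F00-\u1FFF\u0300-\u036F]+")


def positive_tokenize(text):
    """Return (words, spaces); only spans matching GREEK_WORD become tokens.

    Everything that does not match (whitespace, punctuation, digits, Latin
    text) is skipped. spaces[i] is True when some skipped material separates
    token i from the next token, so adjacent words are not fused.
    """
    matches = list(GREEK_WORD.finditer(text))
    words = [m.group() for m in matches]
    spaces = []
    for i, m in enumerate(matches):
        if i + 1 < len(matches):
            # A gap exists if the next token does not start where this one ends.
            spaces.append(matches[i + 1].start() > m.end())
        else:
            # Nothing follows the last token.
            spaces.append(False)
    return words, spaces


if __name__ == "__main__":
    words, spaces = positive_tokenize("Ἐν ἀρχῇ ἦν ὁ λόγος, 123 καὶ")
    print(words)
    print(spaces)
```

As far as I can tell from the docs, `nlp.tokenizer` can be replaced by any callable that takes a string and returns a `Doc`, and a `Doc` can be built from such lists with `Doc(vocab, words=words, spaces=spaces)`. One consequence of this approach is that the skipped characters would not be preserved in `doc.text`.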