It would be nice if token_strings could be supported for offline tokenization, so that the online and offline behavior is identical. I'll attach a pull request for how this could be done.
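To make "identical behavior" concrete, here is a minimal sketch. It assumes that both the online and the offline paths return a dict with a `tokens` list (ids) and a `token_strings` list, and that an id-to-string map is already available from the local tokenizer. `fill_token_strings` and `same_behavior` are hypothetical helper names for illustration. They are not part of the Cohere SDK and not the contents of the pull request mentioned above.

```python
from typing import Dict, List, Mapping


def fill_token_strings(result: Dict, id_to_token: Mapping[int, str]) -> Dict:
    """Return a copy of an offline tokenize result with token_strings populated.

    Hypothetical shape: {"tokens": [int, ...], "token_strings": [str, ...]},
    mirroring the fields of the online response.
    """
    tokens: List[int] = result["tokens"]
    existing = result.get("token_strings") or []
    # Only derive strings locally when the offline path returned none.
    strings = list(existing) if existing else [id_to_token[t] for t in tokens]
    # Build a new dict so the caller's result is not mutated.
    return {**result, "token_strings": strings}


def same_behavior(online: Dict, offline: Dict) -> bool:
    """True when both results agree on token ids and token strings."""
    return (
        online["tokens"] == offline["tokens"]
        and online["token_strings"] == offline["token_strings"]
    )
```

With this shape, an offline result that comes back with `token_strings == []` fails `same_behavior` against the online result until it is passed through `fill_token_strings`.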
Yes, removing token_strings from the network call would also make things more uniform.
I have a use case that relies on token_strings. However, I can work around this issue: I can get the token strings by using the Hugging Face tokenizers library directly with the downloaded tokenizer.json files.
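With the tokenizers library itself, `Tokenizer.from_file("tokenizer.json").encode(text).tokens` returns the token strings directly. If you already have token ids from an offline tokenize call, a dependency-free alternative is to invert the vocabulary stored in tokenizer.json. The sketch below assumes the file's `model.vocab` is a dict mapping token string to id (as in BPE, WordPiece and WordLevel models, but not Unigram, whose vocab is a list). It also assumes special tokens live under `added_tokens`. The function names are illustrative. If the tokenizer uses a byte-level pre-tokenizer, the strings come back in its internal form (for example a leading space as `Ġ`), which may not match what the online endpoint returns.

```python
import json
from typing import Dict, List


def load_id_to_token(tokenizer_json: str) -> Dict[int, str]:
    """Build an id -> token-string map from the text of a tokenizer.json file."""
    data = json.loads(tokenizer_json)
    vocab = data["model"]["vocab"]
    if not isinstance(vocab, dict):
        # Unigram models store the vocab as a list of [token, score] pairs.
        raise ValueError("expected a dict vocab (e.g. a BPE model)")
    id_to_token = {idx: tok for tok, idx in vocab.items()}
    # Special/added tokens are listed separately and may have ids outside the vocab.
    for added in data.get("added_tokens", []):
        id_to_token[added["id"]] = added["content"]
    return id_to_token


def ids_to_token_strings(ids: List[int], id_to_token: Dict[int, str]) -> List[str]:
    """Map token ids (e.g. from an offline tokenize call) to their strings."""
    return [id_to_token[i] for i in ids]
```

For a real file, read it first, e.g. `load_id_to_token(open("tokenizer.json", encoding="utf-8").read())`.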
When I run the example script from this doc: https://docs.cohere.com/reference/tokenize
I get:
where token_strings is an empty array, even though the docs suggest that it should be non-empty. However, if I run:

I get the token_strings as expected: