@jchill-git and I had a call discussing what changes we'd need to make to support custom tokenizers (e.g. leveraging tools from CAMeL).
Our end goal would be to support additional tokenizers / tokenization schemes on a version-by-version basis.
Initially, @jchill-git will produce a CSV and I will be adding some "customization hooks" to scaife-viewer-atlas to use that CSV rather than the "built-in" tokenizer (which simply splits on whitespace).
I got an initial proof-of-concept working today (see screenshot below) and will keep iterating on it to support "subword" tokens and punctuation across the Scaife Viewer stack (backend / frontend).
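To make the idea concrete, here is a minimal sketch of how a CSV of pre-computed tokens might override a whitespace tokenizer. It is not the actual scaife-viewer-atlas hook: the function names, the `ref` and `value` column names, and the sample data are all hypothetical, and the real CSV layout has not been decided yet.

```python
import csv
import io


def whitespace_tokenize(text):
    """Stand-in for the "built-in" behavior: split on whitespace."""
    return text.split()


def load_custom_tokens(csv_file):
    """Group token values by text-part reference, preserving row order.

    Assumes hypothetical columns `ref` (text-part reference) and
    `value` (the token string); the real CSV layout may differ.
    """
    tokens_by_ref = {}
    for row in csv.DictReader(csv_file):
        tokens_by_ref.setdefault(row["ref"], []).append(row["value"])
    return tokens_by_ref


def tokenize(ref, text, custom_tokens=None):
    """Use CSV-supplied tokens for `ref` when present, else fall back."""
    if custom_tokens and ref in custom_tokens:
        return custom_tokens[ref]
    return whitespace_tokenize(text)


# Hypothetical sample: one word split into subword tokens plus punctuation.
SAMPLE_CSV = "ref,value\n1.1,wa\n1.1,qala\n1.1,.\n"
custom = load_custom_tokens(io.StringIO(SAMPLE_CSV))

custom_result = tokenize("1.1", "waqala.", custom)
fallback_result = tokenize("1.2", "foo bar", custom)
```

In this sketch a version that ships its own CSV gets the custom tokens, while any text part without an entry still goes through the whitespace split, which is one way the override could be applied on a version-by-version basis.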