Avoid repeatedly calling into char_indices to build ngrams #457

Open
wants to merge 1 commit into
base: main
from

Conversation

adamreichold

This allocates a single, appropriately sized deque that is shared between all words of a text, so that building the n-grams no longer restarts the UTF-8 decoding of a word for each n-gram. A single pass over char_indices suffices instead: the deque keeps the last n character offsets, each of which is the start offset of an upcoming n-gram.
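For readers who want to see the idea in isolation, here is a minimal sketch of the approach described above. It is not the code from this PR: the function names `push_ngrams` and `ngrams_of_text`, the whitespace-based word splitting, and the output type are all assumptions made for illustration.

```rust
use std::collections::VecDeque;

/// Hypothetical helper: appends all character n-grams of `word` to `out`,
/// decoding the word only once via `char_indices`.
/// `offsets` is a scratch deque owned by the caller and reused across words.
fn push_ngrams<'a>(
    word: &'a str,
    n: usize,
    offsets: &mut VecDeque<usize>,
    out: &mut Vec<&'a str>,
) {
    offsets.clear();
    if n == 0 {
        return;
    }

    // Byte offsets of every char start, followed by the end of the word,
    // so that the final n-gram also gets an end boundary.
    let boundaries = word
        .char_indices()
        .map(|(idx, _)| idx)
        .chain(std::iter::once(word.len()));

    for end in boundaries {
        // Once n start offsets are pending, the oldest one together with
        // the current boundary delimits exactly n characters.
        if offsets.len() == n {
            if let Some(start) = offsets.pop_front() {
                out.push(&word[start..end]);
            }
        }
        offsets.push_back(end);
    }
}

/// Hypothetical driver: one deque allocation shared between all words.
/// Splitting on whitespace is an assumption, not taken from the PR.
fn ngrams_of_text(text: &str, n: usize) -> Vec<&str> {
    let mut offsets = VecDeque::with_capacity(n);
    let mut out = Vec::new();
    for word in text.split_whitespace() {
        push_ngrams(word, n, &mut offsets, &mut out);
    }
    out
}
```

Because the pop happens before the push, the deque never holds more than n offsets, so the initial `with_capacity(n)` allocation should be the only one for the whole text. Words shorter than n characters simply produce no n-grams.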
