Tokenizer #31

Open · 10 tasks · dwhieb opened this issue May 26, 2024 · 0 comments
Labels
🆕 enhancement Improvements or new features 🛠️ tool Requests for new tools

dwhieb (Member) commented May 26, 2024

A tool that tokenizes a string based on a set of punctuation characters and a set of delimiters. Users should also be able to save tokenization schemas and export their data in different formats (.csv, .tsv, one token per line, JSON).
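A minimal Python sketch of the core operation, assuming that punctuation and delimiters are both sets of single characters, that delimiters are discarded, and that each punctuation character becomes a token of its own. The function name `tokenize` and this treatment of punctuation are assumptions for illustration, not a settled design.

```python
def tokenize(text: str, punctuation: set[str], delimiters: set[str]) -> list[str]:
    """Hypothetical tokenizer: split `text` on delimiter characters and
    split punctuation characters off into tokens of their own."""
    tokens: list[str] = []
    current: list[str] = []  # characters of the token being built

    def flush() -> None:
        # Emit the pending token, if any, and start a new one.
        if current:
            tokens.append("".join(current))
            current.clear()

    for ch in text:
        if ch in delimiters:
            flush()            # a delimiter ends the current token and is discarded
        elif ch in punctuation:
            flush()            # punctuation ends the current token...
            tokens.append(ch)  # ...and is kept as a separate token
        else:
            current.append(ch)
    flush()
    return tokens


print(tokenize("Hello, world! How are you?", set(",.!?"), {" "}))
```

Because delimiters are checked first, a character listed in both sets acts as a delimiter. Multi-character delimiters (e.g. `--`) and word-internal punctuation such as the apostrophe in *don't* would need extra rules.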

What features should the suggested tool have?

  • punctuation presets
  • delimiter presets
  • copy-pasting input/output
  • file upload/download for input/output
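Presets and saved schemas could be as simple as named strings of characters plus a JSON round-trip. The preset names and contents below, and the `TokenizationSchema` class, are hypothetical; characters are stored as strings rather than sets because JSON has no set type.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical presets, for illustration only.
PUNCTUATION_PRESETS: dict[str, str] = {
    "none": "",
    "basic": ".,;:!?",
    "basic+quotes": ".,;:!?\"'()",
}
DELIMITER_PRESETS: dict[str, str] = {
    "whitespace": " \t\n",
    "space": " ",
}


@dataclass
class TokenizationSchema:
    name: str
    punctuation: str  # each character in this string counts as punctuation
    delimiters: str   # each character in this string counts as a delimiter

    @classmethod
    def from_presets(cls, name: str, punctuation: str, delimiters: str) -> "TokenizationSchema":
        # Look up preset names; raises KeyError for an unknown preset.
        return cls(name, PUNCTUATION_PRESETS[punctuation], DELIMITER_PRESETS[delimiters])

    def to_json(self) -> str:
        # ensure_ascii=False keeps non-ASCII punctuation (e.g. « ») readable.
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json(cls, data: str) -> "TokenizationSchema":
        return cls(**json.loads(data))


schema = TokenizationSchema.from_presets("English prose", "basic", "whitespace")
saved = schema.to_json()
print(TokenizationSchema.from_json(saved) == schema)
```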

To Do

  • Purpose/Overview
  • Directions, hidden once dismissed
  • Underlying libraries, with links
  • Save work + settings to local storage
  • Domain redirects (e.g. transliterate.digitallinguistics.io > tools.digitallinguistics.io/transliterate)
  • Data Import/Export
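For the export formats listed in the description, one possible serializer is sketched below. The `export_tokens` name, the format keys, and the `index,token` column layout for .csv/.tsv are all assumptions.

```python
import csv
import io
import json


def export_tokens(tokens: list[str], fmt: str) -> str:
    """Hypothetical exporter: 'csv', 'tsv', 'lines' (one token per line), or 'json'."""
    if fmt in ("csv", "tsv"):
        buffer = io.StringIO()
        writer = csv.writer(
            buffer,
            delimiter="," if fmt == "csv" else "\t",
            lineterminator="\n",
        )
        writer.writerow(["index", "token"])  # hypothetical column layout
        for i, token in enumerate(tokens):
            writer.writerow([i, token])
        return buffer.getvalue()
    if fmt == "lines":
        return "".join(token + "\n" for token in tokens)
    if fmt == "json":
        return json.dumps(tokens, ensure_ascii=False)
    raise ValueError(f"unsupported format: {fmt!r}")


print(export_tokens(["Hello", ",", "world"], "csv"))
```

Using the `csv` module rather than joining strings by hand means a token such as `,` is quoted correctly in the .csv output.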