Open
Labels
P1: Priority 1 - Not a launch blocker but urgent issue.
passwordgen: Affects the Rust library implementation
Description
Problem statement
The script that I use to generate the wordlist language model data is not version controlled in this repo. Moreover, it currently assumes that you take a raw Wikipedia dump and pass it through spacy to extract words.
Change the script to use the easier-to-parse Wikipedia cirrussearch dumps instead, so that spacy is no longer required, then version control it. Add some tests too.
Acceptance criteria
- Version control the Rust tool that processes a Wikipedia dump into a wordlist data file.
- Change the Rust tool to be able to process a cirrussearch dump instead.
- Add integration tests (preferable to unit tests).
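As a starting point for discussion, here is a minimal sketch of the cirrussearch processing step using only the standard library. It assumes the dump has already been decompressed into lines in the Elasticsearch bulk layout, where each `{"index": ...}` action line is followed by one JSON document line carrying a `text` field. The names `is_action_line`, `extract_text_field` and `count_words` are hypothetical, and the hand-rolled field extraction is only a stand-in: the real tool would more likely use a proper JSON parser such as the `serde_json` crate.

```rust
use std::collections::HashMap;

/// Returns true for the bulk-format action lines (`{"index": ...}`) that
/// precede each document line. Assumption: action lines always start this way.
fn is_action_line(line: &str) -> bool {
    line.trim_start().starts_with("{\"index\"")
}

/// Naive stand-in for a JSON parser: pulls out the string value of the
/// `text` field. Assumes the key is written exactly as `"text":"` with no
/// whitespace; escapes other than `\"` and `\\` are replaced by a space.
fn extract_text_field(doc_line: &str) -> Option<String> {
    let key = "\"text\":\"";
    let start = doc_line.find(key)? + key.len();
    let mut out = String::new();
    let mut chars = doc_line[start..].chars();
    while let Some(c) = chars.next() {
        match c {
            // An unescaped quote ends the string value.
            '"' => return Some(out),
            '\\' => match chars.next()? {
                '"' => out.push('"'),
                '\\' => out.push('\\'),
                'u' => {
                    // Skip the four hex digits of a \uXXXX escape.
                    for _ in 0..4 {
                        chars.next()?;
                    }
                    out.push(' ');
                }
                // \n, \t and similar act as word separators here.
                _ => out.push(' '),
            },
            other => out.push(other),
        }
    }
    // The string value was never terminated.
    None
}

/// Counts lowercase alphabetic words across all document lines,
/// skipping action lines and lines without a `text` field.
fn count_words<'a, I>(lines: I) -> HashMap<String, u64>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut counts = HashMap::new();
    for line in lines {
        if is_action_line(line) {
            continue;
        }
        let Some(text) = extract_text_field(line) else {
            continue;
        };
        let words = text
            .split(|c: char| !c.is_alphabetic())
            .filter(|w| !w.is_empty());
        for word in words {
            *counts.entry(word.to_lowercase()).or_insert(0) += 1;
        }
    }
    counts
}
```

An integration test could feed a small fixture dump through `count_words` and compare the resulting counts against a checked-in expected wordlist, which exercises the whole pipeline without depending on spacy or a full Wikipedia download.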