divvunspell

An implementation of hfst-ospell in Rust, with added features like tokenization, case handling, and parallelisation.

No Rust?

curl https://sh.rustup.rs -sSf | sh
source $HOME/.cargo/env
rustup default stable
cargo build --release

Building and installing command-line tools

cd divvunspell-tools
cargo install --path .

This builds and installs three binaries: divvunspell, accuracy and thfst-tools.

divvunspell

Usage:

divvunspell 0.5.0
Testing frontend for the DivvunSpell library

USAGE:
    divvunspell [FLAGS] [OPTIONS] <--zhfst <ZHFST>|--bhfst <BHFST>|--acceptor <acceptor>> [WORDS]...

FLAGS:
    -S, --always-suggest    Always show suggestions even if word is correct (implies -s)
    -h, --help              Prints help information
        --json              Output results in JSON
    -s, --suggest           Show suggestions for given word(s)
    -V, --version           Prints version information

OPTIONS:
        --acceptor <acceptor>    Use the given acceptor file
    -b, --bhfst <BHFST>          Use the given BHFST file
        --errmodel <errmodel>    Use the given errmodel file
    -n, --nbest <nbest>          Maximum number of results for suggestions
    -w, --weight <weight>        Maximum weight limit for suggestions
    -z, --zhfst <ZHFST>          Use the given ZHFST file

ARGS:
    <WORDS>...    The words to be processed
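
The flags above can also be driven from another program. The following is a minimal sketch, not part of this repository, that assembles a divvunspell invocation with Rust's standard library. The helper name suggest_command is hypothetical, and the sketch assumes the divvunspell binary is installed on your PATH. It only uses flags listed in the usage text above and makes no assumption about the shape of the JSON output.

```rust
use std::process::Command;

/// Build (but do not run) a `divvunspell` invocation that asks for
/// suggestions in JSON for the given words.
///
/// `suggest_command` is a hypothetical helper for illustration only.
/// It assumes a `divvunspell` binary is available on PATH.
pub fn suggest_command(zhfst: &str, nbest: usize, words: &[&str]) -> Command {
    let mut cmd = Command::new("divvunspell");
    cmd.arg("--zhfst")
        .arg(zhfst) // speller archive to load
        .arg("--suggest") // show suggestions for the given words
        .arg("--nbest")
        .arg(nbest.to_string()) // cap the number of suggestions
        .arg("--json") // request JSON output
        .args(words); // the words to be processed
    cmd
}
```

Calling suggest_command("sma.zhfst", 5, &["word"]).output() would then run the tool and capture its standard output, provided the binary and the speller file exist.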

accuracy

Usage:

divvunspell-accuracy 1.0.0-alpha.5
Accuracy testing for DivvunSpell.

USAGE:
    accuracy [OPTIONS] [ARGS]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -c <config>             Provide JSON config file to override test defaults
    -o <JSON-OUTPUT>        The file path for the JSON report output
    -w <max-words>          Truncate typos list to max number of words specified

ARGS:
    <WORDS>    The 'input -> expected' list in tab-delimited value file (TSV)
    <ZHFST>    Use the given ZHFST file

thfst-tools

Usage:

thfst-tools 1.0.0-alpha.5
Tromsø-Helsinki Finite State Transducer toolkit.

USAGE:
    thfst-tools <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    bhfst-info         Print metadata for BHFST
    help               Prints this message or the help of the given subcommand(s)
    hfst-to-thfst      Convert an HFST file to THFST
    thfsts-to-bhfst    Convert a THFST acceptor/errmodel pair to BHFST
    zhfst-to-bhfst     Convert a ZHFST file to BHFST

Speller testing

There's a prototype-level testing tool in support/accuracy-viewer. Use it like this:

accuracy -o support/accuracy-viewer/public/report.json typos.txt sma.zhfst
cd support/accuracy-viewer
npm i && npm run dev

View the report at http://localhost:5000.

typos.txt is a TSV file with typos in the first column and the expected correction in the second. For more information, run cargo accuracy-test -- --help.
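
To make the typos.txt layout concrete, here is a small sketch of reading such a file into (typo, expected) pairs. It is an illustration only and is not how the accuracy tool itself parses its input. The names TypoCase and parse_typos are hypothetical, and skipping blank or single-column lines is an assumption made for this example.

```rust
/// One row of the typos file: a misspelling and its expected correction.
/// `TypoCase` is a hypothetical type used only for this illustration.
#[derive(Debug, PartialEq)]
pub struct TypoCase {
    pub typo: String,
    pub expected: String,
}

/// Parse tab-separated text with the typo in the first column and the
/// expected correction in the second.
///
/// Assumption for this sketch: blank lines and lines without a second
/// column are silently skipped.
pub fn parse_typos(input: &str) -> Vec<TypoCase> {
    input
        .lines()
        .filter(|line| !line.trim().is_empty())
        .filter_map(|line| {
            let mut cols = line.split('\t');
            let typo = cols.next()?.trim();
            let expected = cols.next()?.trim();
            Some(TypoCase {
                typo: typo.to_string(),
                expected: expected.to_string(),
            })
        })
        .collect()
}
```

A file read with std::fs::read_to_string("typos.txt") can be passed straight to parse_typos.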

License

The crate divvunspell is licensed under either of

at your option.

The divvunspell-tools binaries are licensed under the GPL version 3 license.