Performance issues with suggestions in French #15

Description

@ByteOfBrie

Spellchecking performance seems especially bad in French (compared to German or English), even after #14.

This was originally found by @new-years-eve; I did a bit of testing around it.

Some testing data (formatted for clarity):

Spellbook:
en_US correct word: 14.818µs
fr_FR correct word: 19.015µs
en_US incorrect word: 12.574µs
fr_FR incorrect word: 360.648µs

en_US impécable suggestions: 130.336522ms
fr_FR impécable suggestions: 1.465016974s

en_US réceptioniste suggestions: 188.750999ms
fr_FR réceptioniste suggestions: 1.662972801s

de_DE frühstrucken suggestions: 370.188776ms

(Before #14, impécable took ~30 seconds, and I killed my test for réceptioniste after 15 minutes.)

Hunspell:
en_US correct word: 4.148µs
fr_FR correct word: 38.251µs
en_US incorrect word: 11.712µs
fr_FR incorrect word: 20.358µs

en_US impécable suggestions: 59.585589ms
fr_FR impécable suggestions: 243.439059ms

en_US réceptioniste suggestions: 56.378308ms
fr_FR réceptioniste suggestions: 117.79522ms

de_DE frühstrucken suggestions: 98.131046ms
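
To put the numbers above side by side, here is a small sketch that computes the slowdown ratios from the single-run timings listed in this issue. The `slowdown` and `ratios` helpers are hypothetical names introduced only for illustration; the inputs are the measured values converted to milliseconds, so the ratios are only as reliable as those one-shot measurements.

```rust
/// Ratio between two timings expressed in the same unit (hypothetical helper).
fn slowdown(slow_ms: f64, fast_ms: f64) -> f64 {
    slow_ms / fast_ms
}

/// Slowdown ratios derived from the suggestion timings reported in this issue.
fn ratios() -> Vec<(&'static str, f64)> {
    vec![
        // Spellbook: French dictionary vs English dictionary, same input word.
        ("spellbook fr_FR/en_US impécable", slowdown(1465.016974, 130.336522)),
        ("spellbook fr_FR/en_US réceptioniste", slowdown(1662.972801, 188.750999)),
        // French dictionary: Spellbook vs Hunspell, same input word.
        ("fr_FR spellbook/hunspell impécable", slowdown(1465.016974, 243.439059)),
        ("fr_FR spellbook/hunspell réceptioniste", slowdown(1662.972801, 117.79522)),
    ]
}

/// Prints each ratio with one decimal place.
fn print_ratios() {
    for (label, ratio) in ratios() {
        println!("{label}: {ratio:.1}x");
    }
}
```

With these inputs, Spellbook's French suggestions come out roughly 9-11x slower than its English ones, and roughly 6-14x slower than Hunspell on the same French dictionary.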

Source code:
use std::time::Instant;

use hunspell_rs::{CheckResult, Hunspell};
use spellbook::Dictionary;

fn test_hunspell() {
    let fr_dict = Hunspell::new(
        "/usr/share/hunspell/fr_FR.aff",
        "/usr/share/hunspell/fr_FR.dic",
    );
    let en_dict = Hunspell::new(
        "/usr/share/hunspell/en_US.aff",
        "/usr/share/hunspell/en_US.dic",
    );
    let de_dict = Hunspell::new(
        "/home/brie/Downloads/de_DE.aff",
        "/home/brie/Downloads/de_DE.dic",
    );

    let dur = Instant::now();
    assert!(en_dict.check("test") == CheckResult::FoundInDictionary);
    println!("en_US correct word: {:?}", dur.elapsed());

    let dur = Instant::now();
    assert!(fr_dict.check("test") == CheckResult::FoundInDictionary);
    println!("fr_FR correct word: {:?}", dur.elapsed());

    let dur = Instant::now();
    assert!(en_dict.check("foobarbaz") != CheckResult::FoundInDictionary);
    println!("en_US incorrect word: {:?}", dur.elapsed());

    let dur = Instant::now();
    assert!(fr_dict.check("foobarbaz") != CheckResult::FoundInDictionary);
    println!("fr_FR incorrect word: {:?}", dur.elapsed());

    let dur = Instant::now();
    let suggestions = en_dict.suggest("impécable");
    println!("en_US impécable suggestions: {:?}", dur.elapsed());
    println!("Suggestions: {suggestions:?}");

    let dur = Instant::now();
    let suggestions = fr_dict.suggest("impécable");
    println!("fr_FR impécable suggestions: {:?}", dur.elapsed());
    println!("Suggestions: {suggestions:?}");

    let dur = Instant::now();
    let suggestions = en_dict.suggest("réceptioniste");
    println!("en_US réceptioniste suggestions: {:?}", dur.elapsed());
    println!("Suggestions: {suggestions:?}");

    let dur = Instant::now();
    let suggestions = fr_dict.suggest("réceptioniste");
    println!("fr_FR réceptioniste suggestions: {:?}", dur.elapsed());
    println!("Suggestions: {suggestions:?}");

    let dur = Instant::now();
    let suggestions = de_dict.suggest("frühstrucken");
    println!("de_DE frühstrucken suggestions: {:?}", dur.elapsed());
    println!("Suggestions: {suggestions:?}");
}

fn test_spellbook() {
    let fr_aff = std::fs::read_to_string("/usr/share/hunspell/fr_FR.aff").unwrap();
    let fr_dic = std::fs::read_to_string("/usr/share/hunspell/fr_FR.dic").unwrap();
    let fr_dict = Dictionary::new(&fr_aff, &fr_dic).unwrap();
    let en_aff = std::fs::read_to_string("/usr/share/hunspell/en_US.aff").unwrap();
    let en_dic = std::fs::read_to_string("/usr/share/hunspell/en_US.dic").unwrap();
    let en_dict = Dictionary::new(&en_aff, &en_dic).unwrap();
    let de_aff = std::fs::read_to_string("/home/brie/Downloads/de_DE.aff").unwrap();
    let de_dic = std::fs::read_to_string("/home/brie/Downloads/de_DE.dic").unwrap();
    let de_dict = Dictionary::new(&de_aff, &de_dic).unwrap();

    let dur = Instant::now();
    assert!(en_dict.check("test"));
    println!("en_US correct word: {:?}", dur.elapsed());

    let dur = Instant::now();
    assert!(fr_dict.check("test"));
    println!("fr_FR correct word: {:?}", dur.elapsed());

    let dur = Instant::now();
    assert!(!en_dict.check("foobarbaz"));
    println!("en_US incorrect word: {:?}", dur.elapsed());

    let dur = Instant::now();
    assert!(!fr_dict.check("foobarbaz"));
    println!("fr_FR incorrect word: {:?}", dur.elapsed());

    let mut suggestions = Vec::new();

    let dur = Instant::now();
    en_dict.suggest("impécable", &mut suggestions);
    println!("en_US impécable suggestions: {:?}", dur.elapsed());
    println!("Suggestions: {suggestions:?}");

    let dur = Instant::now();
    fr_dict.suggest("impécable", &mut suggestions);
    println!("fr_FR impécable suggestions: {:?}", dur.elapsed());
    println!("Suggestions: {suggestions:?}");

    let dur = Instant::now();
    en_dict.suggest("réceptioniste", &mut suggestions);
    println!("en_US réceptioniste suggestions: {:?}", dur.elapsed());
    println!("Suggestions: {suggestions:?}");

    let dur = Instant::now();
    fr_dict.suggest("réceptioniste", &mut suggestions);
    println!("fr_FR réceptioniste suggestions: {:?}", dur.elapsed());
    println!("Suggestions: {suggestions:?}");

    let dur = Instant::now();
    de_dict.suggest("frühstrucken", &mut suggestions);
    println!("de_DE frühstrucken suggestions: {:?}", dur.elapsed());
    println!("Suggestions: {suggestions:?}");
}

fn main() {
    println!("Hunspell:");
    test_hunspell();
    println!("----------");
    println!("Spellbook:");
    test_spellbook();
}
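
The benchmark above times each call exactly once, so the first calls may include cold-cache effects and the microsecond-level numbers in particular are noisy. A minimal sketch of a more stable measurement, assuming a hypothetical `median_time` helper (not part of spellbook or hunspell_rs) that warms up once and reports the median of several runs:

```rust
use std::time::{Duration, Instant};

/// Runs `f` once as an untimed warm-up, then times `runs` further calls
/// and returns the median sample (the upper median when `runs` is even).
fn median_time<F: FnMut()>(runs: usize, mut f: F) -> Duration {
    assert!(runs > 0, "need at least one timed run");

    // Warm-up call: not included in the samples.
    f();

    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();

    samples.sort();
    samples[samples.len() / 2]
}
```

In the test program above, this could wrap any of the measured calls, for example `median_time(10, || fr_dict.suggest("impécable", &mut suggestions))` in `test_spellbook`. The order-of-magnitude gap reported here is large enough that it is unlikely to be a measurement artifact, but medians would make the smaller differences easier to trust.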

I can provide dictionary files to replicate if necessary, but I'm just using French and English from the standard Fedora 43 packages (hunspell-en, hunspell-fr). The German dictionary is from the Arch Linux package, since the Fedora de_DE dictionary uses ISO-8859 instead of UTF-8 (and spellbook cannot handle non-UTF-8 input).
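
For anyone who wants to reproduce this with the Fedora German dictionary anyway, here is a rough sketch of converting the files to UTF-8 before handing them to spellbook. It assumes the files are ISO-8859-1 specifically (the exact ISO-8859 part is not confirmed here) and that the .aff file declares its encoding on a `SET` line. Both helper names are hypothetical.

```rust
/// Decodes ISO-8859-1 bytes into a UTF-8 `String`.
/// Assumption: the input really is ISO-8859-1, where every byte maps to the
/// Unicode code point with the same numeric value.
fn latin1_to_utf8(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// Replaces the `SET <encoding>` line of an already-decoded .aff file so that
/// it declares UTF-8. Lines are re-joined with `\n`; a trailing newline is
/// not preserved.
fn retarget_aff_encoding(aff: &str) -> String {
    aff.lines()
        .map(|line| {
            if line.trim_start().starts_with("SET ") {
                "SET UTF-8"
            } else {
                line
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

The intended use would be to read each file with `std::fs::read` (raw bytes rather than `read_to_string`), decode it with `latin1_to_utf8`, and pass the .aff text through `retarget_aff_encoding` before calling `Dictionary::new`. I have not checked whether the converted Fedora dictionary behaves the same as the Arch one.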


I'm well aware that the README doesn't guarantee that this will work well:

Some dictionaries which use complex compounding directives may work less well.

Still, this seemed (potentially) bad enough to be worth an issue.
