Dependencies: Upgrade Stanza to 1.7.0
BLKSerene committed Dec 30, 2023
1 parent c4a23bc commit 52b4f5b
Showing 12 changed files with 47 additions and 56 deletions.
2 changes: 1 addition & 1 deletion ACKS.md
@@ -49,7 +49,7 @@ As Wordless stands on the shoulders of giants, I hereby extend my sincere gratit
25|[simplemma](https://github.com/adbar/simplemma)|0.9.1|Adrien Barbaresi|[MIT](https://github.com/adbar/simplemma/blob/main/LICENSE)
26|[spaCy](https://spacy.io/)|3.7.2|Matthew Honnibal, Ines Montani, Sofie Van Landeghem,<br>Adriane Boyd, Paul O'Leary McCann|[MIT](https://github.com/explosion/spaCy/blob/master/LICENSE)
27|[spacy-pkuseg](https://github.com/explosion/spacy-pkuseg)|0.0.33|Ruixuan Luo (罗睿轩), Jingjing Xu (许晶晶),<br>Xuancheng Ren (任宣丞), Yi Zhang (张艺),<br>Zhiyuan Zhang (张之远), Bingzhen Wei (位冰镇),<br>Xu Sun (孙栩)<br>Adriane Boyd, Ines Montani|[MIT](https://github.com/explosion/spacy-pkuseg/blob/master/LICENSE)
28|[Stanza](https://github.com/stanfordnlp/stanza)|1.5.1|Peng Qi (齐鹏), Yuhao Zhang (张宇浩),<br>Yuhui Zhang (张钰晖), Jason Bolton,<br>Tim Dozat, John Bauer|[Apache-2.0](https://github.com/stanfordnlp/stanza/blob/main/LICENSE)
28|[Stanza](https://github.com/stanfordnlp/stanza)|1.7.0|Peng Qi (齐鹏), Yuhao Zhang (张宇浩),<br>Yuhui Zhang (张钰晖), Jason Bolton,<br>Tim Dozat, John Bauer|[Apache-2.0](https://github.com/stanfordnlp/stanza/blob/main/LICENSE)
29|[SudachiPy](https://github.com/WorksApplications/sudachi.rs)|0.6.7|Works Applications Co., Ltd.|[Apache-2.0](https://github.com/WorksApplications/sudachi.rs/blob/develop/LICENSE)
30|[Underthesea](https://undertheseanlp.com/)|6.8.0|Vu Anh|[GPL-3.0-or-later](https://github.com/undertheseanlp/underthesea/blob/main/LICENSE)
31|[wordcloud](https://github.com/amueller/word_cloud)|1.9.3|Andreas Christian Müller|[MIT](https://github.com/amueller/word_cloud/blob/main/LICENSE)
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -34,6 +34,7 @@
- Dependencies: Upgrade Sacremoses to 0.1.1
- Dependencies: Upgrade spaCy to 3.7.2
- Dependencies: Upgrade spacy-pkuseg to 0.0.33
- Dependencies: Upgrade Stanza to 1.7.0
- Dependencies: Upgrade wordcloud to 1.9.3

## [3.4.0](https://github.com/BLKSerene/Wordless/releases/tag/3.4.0) - 09/30/2023
2 changes: 1 addition & 1 deletion doc/trs/zho_cn/ACKS.md
@@ -49,7 +49,7 @@
25|[simplemma](https://github.com/adbar/simplemma)|0.9.1|Adrien Barbaresi|[MIT](https://github.com/adbar/simplemma/blob/main/LICENSE)
26|[spaCy](https://spacy.io/)|3.7.2|Matthew Honnibal, Ines Montani, Sofie Van Landeghem,<br>Adriane Boyd, Paul O'Leary McCann|[MIT](https://github.com/explosion/spaCy/blob/master/LICENSE)
27|[spacy-pkuseg](https://github.com/explosion/spacy-pkuseg)|0.0.33|罗睿轩, 许晶晶, 任宣丞, 张艺, 张之远, 位冰镇, 孙栩<br>Adriane Boyd, Ines Montani|[MIT](https://github.com/explosion/spacy-pkuseg/blob/master/LICENSE)
28|[Stanza](https://github.com/stanfordnlp/stanza)|1.5.1|齐鹏, 张宇浩, 张钰晖,<br>Jason Bolton, Tim Dozat, John Bauer|[Apache-2.0](https://github.com/stanfordnlp/stanza/blob/main/LICENSE)
28|[Stanza](https://github.com/stanfordnlp/stanza)|1.7.0|齐鹏, 张宇浩, 张钰晖,<br>Jason Bolton, Tim Dozat, John Bauer|[Apache-2.0](https://github.com/stanfordnlp/stanza/blob/main/LICENSE)
29|[SudachiPy](https://github.com/WorksApplications/sudachi.rs)|0.6.7|Works Applications Co., Ltd.|[Apache-2.0](https://github.com/WorksApplications/sudachi.rs/blob/develop/LICENSE)
30|[Underthesea](https://undertheseanlp.com/)|6.8.0|Vu Anh|[GPL-3.0-or-later](https://github.com/undertheseanlp/underthesea/blob/main/LICENSE)
31|[wordcloud](https://github.com/amueller/word_cloud)|1.9.3|Andreas Christian Müller|[MIT](https://github.com/amueller/word_cloud/blob/main/LICENSE)
2 changes: 1 addition & 1 deletion doc/trs/zho_tw/ACKS.md
@@ -49,7 +49,7 @@
25|[simplemma](https://github.com/adbar/simplemma)|0.9.1|Adrien Barbaresi|[MIT](https://github.com/adbar/simplemma/blob/main/LICENSE)
26|[spaCy](https://spacy.io/)|3.7.2|Matthew Honnibal, Ines Montani, Sofie Van Landeghem,<br>Adriane Boyd, Paul O'Leary McCann|[MIT](https://github.com/explosion/spaCy/blob/master/LICENSE)
27|[spacy-pkuseg](https://github.com/explosion/spacy-pkuseg)|0.0.33|罗睿轩, 许晶晶, 任宣丞, 张艺, 张之远, 位冰镇, 孙栩<br>Adriane Boyd, Ines Montani|[MIT](https://github.com/explosion/spacy-pkuseg/blob/master/LICENSE)
28|[Stanza](https://github.com/stanfordnlp/stanza)|1.5.1|齐鹏, 张宇浩, 张钰晖,<br>Jason Bolton, Tim Dozat, John Bauer|[Apache-2.0](https://github.com/stanfordnlp/stanza/blob/main/LICENSE)
28|[Stanza](https://github.com/stanfordnlp/stanza)|1.7.0|齐鹏, 张宇浩, 张钰晖,<br>Jason Bolton, Tim Dozat, John Bauer|[Apache-2.0](https://github.com/stanfordnlp/stanza/blob/main/LICENSE)
29|[SudachiPy](https://github.com/WorksApplications/sudachi.rs)|0.6.7|Works Applications Co., Ltd.|[Apache-2.0](https://github.com/WorksApplications/sudachi.rs/blob/develop/LICENSE)
30|[Underthesea](https://undertheseanlp.com/)|6.8.0|Vu Anh|[GPL-3.0-or-later](https://github.com/undertheseanlp/underthesea/blob/main/LICENSE)
31|[wordcloud](https://github.com/amueller/word_cloud)|1.9.3|Andreas Christian Müller|[MIT](https://github.com/amueller/word_cloud/blob/main/LICENSE)
2 changes: 1 addition & 1 deletion requirements/requirements_tests.txt
@@ -28,7 +28,7 @@ pyphen == 0.14.0
pythainlp == 4.0.2
sacremoses == 0.1.1
simplemma == 0.9.1
stanza == 1.5.1
stanza == 1.7.0
underthesea == 6.8.0

## python-mecab-ko
10 changes: 5 additions & 5 deletions tests/tests_nlp/tests_stanza/test_stanza.py
@@ -42,20 +42,20 @@ def wl_test_stanza(
else:
lang_stanza = lang

if lang_stanza in wl_nlp_utils.LANGS_STANZA_TOKENIZERS:
if lang_stanza in wl_nlp_utils.get_langs_stanza(main, util_type = 'word_tokenizers'):
wl_test_sentence_tokenize(lang, results_sentence_tokenize)
wl_test_word_tokenize(lang, results_word_tokenize)

if lang_stanza in wl_nlp_utils.LANGS_STANZA_POS_TAGGERS:
if lang_stanza in wl_nlp_utils.get_langs_stanza(main, util_type = 'pos_taggers'):
wl_test_pos_tag(lang, results_pos_tag, results_pos_tag_universal)

if lang_stanza in wl_nlp_utils.LANGS_STANZA_LEMMATIZERS:
if lang_stanza in wl_nlp_utils.get_langs_stanza(main, util_type = 'lemmatizers'):
wl_test_lemmatize(lang, results_lemmatize)

if lang_stanza in wl_nlp_utils.LANGS_STANZA_DEPENDENCY_PARSERS:
if lang_stanza in wl_nlp_utils.get_langs_stanza(main, util_type = 'dependency_parsers'):
wl_test_dependency_parse(lang, results_dependency_parse)

if lang_stanza in wl_nlp_utils.LANGS_STANZA_SENTIMENT_ANALYZERS:
if lang_stanza in wl_nlp_utils.get_langs_stanza(main, util_type = 'sentiment_analyzers'):
wl_test_sentiment_analyze(lang, results_sentiment_analayze)

def wl_test_get_lang_util(main, lang):
6 changes: 5 additions & 1 deletion tests/tests_nlp/tests_stanza/test_stanza_snd.py
@@ -19,10 +19,14 @@
from tests.tests_nlp.tests_stanza import test_stanza

def test_stanza_snd():
results_pos_tag = [('سنڌي', 'NOUN'), ('(', 'PUNCT'), ('/', 'NUM'), ('ˈsɪndi', 'PROPN'), ('/', 'NUM'), ('[6]सिन्धी,', 'PUNCT'), ('Sindhi', 'NOUN'), (')', 'PUNCT'), ('ھڪ', 'NUM'), ('ھند', 'PROPN'), ('-', 'PUNCT'), ('آريائي', 'ADJ'), ('ٻولي', 'NOUN'), ('آھي', 'AUX'), ('جيڪا', 'DET'), ('سنڌ', 'PROPN'), ('جي', 'ADP'), ('تاريخي', 'ADJ'), ('خطي', 'NOUN'), ('۾', 'ADP'), ('سنڌي', 'NOUN'), ('ماڻھن', 'NOUN'), ('پاران', 'ADP'), ('ڳالھائي', 'VERB'), ('وڃي', 'VERB'), ('ٿي', 'AUX'), ('.', 'PUNCT')]

test_stanza.wl_test_stanza(
lang = 'snd',
results_sentence_tokenize = ['سنڌي (/ˈsɪndi/[6]सिन्धी, Sindhi)ھڪ ھند-آريائي ٻولي آھي جيڪا سنڌ جي تاريخي خطي ۾ سنڌي ماڻھن پاران ڳالھائي وڃي ٿي.', 'سنڌي پاڪستان جي صوبي سنڌ جي سرڪاري ٻولي آھي.', '[7][8][9] انڊيا ۾، سنڌي وفاقي سرڪار پاران مڃتا حاصل ڪيل ٻولين يعني شيڊيولڊ ٻولين مان ھڪ آھي.', 'گھڻا سنڌي ڳالھائيندڙ پاڪستان جي صوبي سنڌ، ڀارت جي رياست گجرات جي علائقي ڪڇ ۽ مھاراشٽر جي علائقي الھاس نگر ۾ رھن ٿا.', 'ڀارت ۾ بچيل ڳالھائيندڙ سنڌي ھندو آھن جن پاڪستان جي آزادي کان بعد 1948ع ۾ ڀارت ۾ رھائش اختيار ڪئي ۽ باقي سنڌي سڄي دنيا جي مختلف علائقن ۾ رھن ٿا.', 'سنڌي ٻولي پاڪستان جي صوبن سنڌ، بلوچستان ۽ پنجاب، سان گڏوگڏ ڀارت جي رياستن راجستان، پنجاب ۽ گجرات ۾ ڳالھائي وڃي ٿي.', 'ان سان گڏوگڏ ھانگ ڪانگ، عمان، انڊونيشيا، سنگاپور، گڏيل عرب اماراتون، گڏيل بادشاھت ۽ آمريڪا ۾ لڏي ويل جماعتن پاران بہ ڳالھائي وڃي ٿي.', '[10]'],
results_word_tokenize = ['سنڌي', '(', '/', 'ˈsɪndi', '/', '[6]सिन्धी,', 'Sindhi', ')', 'ھڪ', 'ھند', '-', 'آريائي', 'ٻولي', 'آھي', 'جيڪا', 'سنڌ', 'جي', 'تاريخي', 'خطي', '۾', 'سنڌي', 'ماڻھن', 'پاران', 'ڳالھائي', 'وڃي', 'ٿي', '.']
results_word_tokenize = ['سنڌي', '(', '/', 'ˈsɪndi', '/', '[6]सिन्धी,', 'Sindhi', ')', 'ھڪ', 'ھند', '-', 'آريائي', 'ٻولي', 'آھي', 'جيڪا', 'سنڌ', 'جي', 'تاريخي', 'خطي', '۾', 'سنڌي', 'ماڻھن', 'پاران', 'ڳالھائي', 'وڃي', 'ٿي', '.'],
results_pos_tag = results_pos_tag,
results_pos_tag_universal = results_pos_tag
)

if __name__ == '__main__':
8 changes: 8 additions & 0 deletions tests/tests_settings/test_settings_global.py
@@ -28,6 +28,7 @@
import spacy_lookups_data

from tests import wl_test_init
from wordless.wl_nlp import wl_nlp_utils
from wordless.wl_settings import wl_settings_global
from wordless.wl_utils import wl_conversion

@@ -333,6 +334,13 @@ def check_settings_global(self):

self.check_missing_extra_langs(langs_supported, langs, f"Stanza's {msg_lang_util}")

assert set(langs_stanza_sentence_tokenizers) | {'other'} == wl_nlp_utils.get_langs_stanza(main, util_type = 'sentence_tokenizers')
assert set(langs_stanza_word_tokenizers) | {'other'} == wl_nlp_utils.get_langs_stanza(main, util_type = 'word_tokenizers')
assert set(langs_stanza_pos_taggers) == wl_nlp_utils.get_langs_stanza(main, util_type = 'pos_taggers')
assert set(langs_stanza_lemmatizers) == wl_nlp_utils.get_langs_stanza(main, util_type = 'lemmatizers')
assert set(langs_stanza_dependency_parsers) == wl_nlp_utils.get_langs_stanza(main, util_type = 'dependency_parsers')
assert set(langs_stanza_sentiment_analyzers) == wl_nlp_utils.get_langs_stanza(main, util_type = 'sentiment_analyzers')
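The assertions added above are a sync check between the test's hardcoded language lists and the sets derived from the utility registry; the tokenizer sets union in `{'other'}` because the registry carries an 'other' catch-all entry that has no real language code. A minimal mocked sketch of the same check (all names below are illustrative, not the project's real data):

```python
# Mocked sync check: a hardcoded expected list vs. a set derived from the
# utility registry. The 'other' catch-all exists only on the registry side,
# so it is unioned into the hardcoded set before comparing.
langs_stanza_word_tokenizers = ['eng', 'snd']      # hypothetical hardcoded list
derived_word_tokenizers = {'eng', 'snd', 'other'}  # hypothetical derived set

assert set(langs_stanza_word_tokenizers) | {'other'} == derived_word_tokenizers
print('word tokenizers in sync')
```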

# Check for missing and extra languages
settings_langs = [lang[0] for lang in settings_global['langs'].values()]
settings_langs_lang_utils = set([
2 changes: 1 addition & 1 deletion utils/wl_generate_acks.py
@@ -63,7 +63,7 @@
['simplemma', 'https://github.com/adbar/simplemma', '0.9.1', 'Adrien Barbaresi', 'MIT', 'https://github.com/adbar/simplemma/blob/main/LICENSE'],
['spaCy', 'https://spacy.io/', '3.7.2', "Matthew Honnibal, Ines Montani, Sofie Van Landeghem,<br>Adriane Boyd, Paul O'Leary McCann", 'MIT', 'https://github.com/explosion/spaCy/blob/master/LICENSE'],
['spacy-pkuseg', 'https://github.com/explosion/spacy-pkuseg', '0.0.33', 'Ruixuan Luo (罗睿轩), Jingjing Xu (许晶晶),<br>Xuancheng Ren (任宣丞), Yi Zhang (张艺),<br>Zhiyuan Zhang (张之远), Bingzhen Wei (位冰镇),<br>Xu Sun (孙栩)<br>Adriane Boyd, Ines Montani', 'MIT', 'https://github.com/explosion/spacy-pkuseg/blob/master/LICENSE'],
['Stanza', 'https://github.com/stanfordnlp/stanza', '1.5.1', 'Peng Qi (齐鹏), Yuhao Zhang (张宇浩),<br>Yuhui Zhang (张钰晖), Jason Bolton,<br>Tim Dozat, John Bauer', 'Apache-2.0', 'https://github.com/stanfordnlp/stanza/blob/main/LICENSE'],
['Stanza', 'https://github.com/stanfordnlp/stanza', '1.7.0', 'Peng Qi (齐鹏), Yuhao Zhang (张宇浩),<br>Yuhui Zhang (张钰晖), Jason Bolton,<br>Tim Dozat, John Bauer', 'Apache-2.0', 'https://github.com/stanfordnlp/stanza/blob/main/LICENSE'],
['SudachiPy', 'https://github.com/WorksApplications/sudachi.rs', '0.6.7', 'Works Applications Co., Ltd.', 'Apache-2.0', 'https://github.com/WorksApplications/sudachi.rs/blob/develop/LICENSE'],
['Underthesea', 'https://undertheseanlp.com/', '6.8.0', 'Vu Anh', 'GPL-3.0-or-later', 'https://github.com/undertheseanlp/underthesea/blob/main/LICENSE'],
['wordcloud', 'https://github.com/amueller/word_cloud', '1.9.3', 'Andreas Christian Müller', 'MIT', 'https://github.com/amueller/word_cloud/blob/main/LICENSE']
62 changes: 18 additions & 44 deletions wordless/wl_nlp/wl_nlp_utils.py
@@ -104,39 +104,14 @@ def to_lang_util_texts(main, util_type, util_codes):
'tur', 'urd'
]

LANGS_STANZA_TOKENIZERS = [
'afr', 'ara', 'hye', 'hyw', 'eus', 'bel', 'bul', 'mya', 'bxr', 'cat',
'lzh', 'zho_cn', 'zho_tw', 'chu', 'cop', 'hrv', 'ces', 'dan', 'nld', 'eng',
'myv', 'est', 'fao', 'fin', 'fra', 'fro', 'glg', 'deu', 'got', 'grc',
'ell', 'hbo', 'heb', 'hin', 'hun', 'isl', 'ind', 'gle', 'ita', 'jpn',
'kaz', 'kor', 'kmr', 'kir', 'lat', 'lav', 'lij', 'lit', 'mlt', 'glv',
'mar', 'pcm', 'nob', 'nno', 'fas', 'pol', 'qpm', 'por', 'ron', 'rus',
'orv', 'sme', 'san', 'gla', 'srp_latn', 'snd', 'slk', 'slv', 'hsb', 'spa',
'swe', 'tam', 'tel', 'tha', 'tur', 'ukr', 'urd', 'uig', 'vie', 'cym',
'wol', 'other'
]
LANGS_STANZA_POS_TAGGERS = [
'afr', 'ara', 'hye', 'hyw', 'eus', 'bel', 'bul', 'bxr', 'cat', 'lzh',
'zho_cn', 'zho_tw', 'chu', 'cop', 'hrv', 'ces', 'dan', 'nld', 'eng', 'myv',
'est', 'fao', 'fin', 'fra', 'fro', 'glg', 'deu', 'got', 'grc', 'ell',
'hbo', 'heb', 'hin', 'hun', 'isl', 'ind', 'gle', 'ita', 'jpn', 'kaz',
'kor', 'kmr', 'kir', 'lat', 'lav', 'lij', 'lit', 'mlt', 'glv', 'mar',
'pcm', 'nob', 'nno', 'fas', 'pol', 'qpm', 'por', 'ron', 'rus', 'orv',
'sme', 'san', 'gla', 'srp_latn', 'slk', 'slv', 'hsb', 'spa', 'swe', 'tam',
'tel', 'tur', 'ukr', 'urd', 'uig', 'vie', 'cym', 'wol'
]
LANGS_STANZA_LEMMATIZERS = [
'afr', 'ara', 'hye', 'hyw', 'eus', 'bel', 'bul', 'bxr', 'cat', 'lzh',
'zho_cn', 'zho_tw', 'chu', 'cop', 'hrv', 'ces', 'dan', 'nld', 'eng', 'myv',
'est', 'fin', 'fra', 'fro', 'glg', 'deu', 'got', 'grc', 'ell', 'hbo',
'heb', 'hin', 'hun', 'isl', 'ind', 'gle', 'ita', 'jpn', 'kaz', 'kor',
'kmr', 'kir', 'lat', 'lav', 'lij', 'lit', 'glv', 'mar', 'pcm', 'nob',
'nno', 'fas', 'pol', 'qpm', 'por', 'ron', 'rus', 'orv', 'sme', 'san',
'gla', 'srp_latn', 'slk', 'slv', 'hsb', 'spa', 'swe', 'tam', 'tur', 'ukr',
'urd', 'uig', 'cym', 'wol'
]
LANGS_STANZA_DEPENDENCY_PARSERS = LANGS_STANZA_POS_TAGGERS
LANGS_STANZA_SENTIMENT_ANALYZERS = ['zho_cn', 'eng', 'deu', 'mar', 'spa', 'vie']
def get_langs_stanza(main, util_type):
langs_stanza = set()

for lang_code, lang_utils in main.settings_global[util_type].items():
if any(('stanza' in lang_util for lang_util in lang_utils)):
langs_stanza.add(lang_code)

return langs_stanza
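This new helper replaces the hardcoded `LANGS_STANZA_*` lists with sets derived from the settings registry, so a language added to the registry (such as Sindhi in this commit) is picked up automatically. A standalone sketch of the same idea, with a simplified mock of the settings structure (the real helper takes the `main` object rather than a dict):

```python
# Derive the set of Stanza-capable languages for a given utility type from a
# settings registry, instead of maintaining a parallel hardcoded list.
def get_langs_stanza(settings_global, util_type):
    langs_stanza = set()
    for lang_code, lang_utils in settings_global[util_type].items():
        # A language qualifies if any of its registered utilities is Stanza-based
        if any('stanza' in lang_util for lang_util in lang_utils):
            langs_stanza.add(lang_code)
    return langs_stanza

# Mocked settings: each language code maps to its available utilities
settings_global = {
    'pos_taggers': {
        'eng': ['spacy_eng', 'stanza_eng'],
        'snd': ['stanza_snd'],
        'vie': ['underthesea_vie'],
    }
}

print(sorted(get_langs_stanza(settings_global, 'pos_taggers')))
# prints ['eng', 'snd']
```

The payoff is visible elsewhere in this diff: the test suite and the model-download code now query the same registry instead of five drift-prone constants.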

def check_models(main, langs, lang_utils = None):
def update_gui_stanza(main, err_msg):
@@ -203,10 +178,7 @@ def update_gui_stanza(main, err_msg):
else:
lang_spacy = wl_conversion.remove_lang_code_suffixes(main, lang)
elif util.startswith('stanza_'):
if lang not in ['zho_cn', 'zho_tw', 'srp_latn']:
lang_stanza = wl_conversion.remove_lang_code_suffixes(main, lang)
else:
lang_stanza = lang
lang_stanza = lang

if (
util.startswith('spacy_')
@@ -232,7 +204,7 @@ def update_gui_stanza(main, err_msg):
models_ok = False
elif (
util.startswith('stanza_')
and lang_stanza in LANGS_STANZA_TOKENIZERS
and lang_stanza in get_langs_stanza(main, util_type = 'word_tokenizers')
):
worker_download_model = Wl_Worker_Download_Model_Stanza(
main,
@@ -322,15 +294,15 @@ def run(self):

processors = []

if self.lang in LANGS_STANZA_TOKENIZERS:
if self.lang in get_langs_stanza(self.main, util_type = 'word_tokenizers'):
processors.append('tokenize')
if self.lang in LANGS_STANZA_POS_TAGGERS:
if self.lang in get_langs_stanza(self.main, util_type = 'pos_taggers'):
processors.append('pos')
if self.lang in LANGS_STANZA_LEMMATIZERS:
if self.lang in get_langs_stanza(self.main, util_type = 'lemmatizers'):
processors.append('lemma')
if self.lang in LANGS_STANZA_DEPENDENCY_PARSERS:
if self.lang in get_langs_stanza(self.main, util_type = 'dependency_parsers'):
processors.append('depparse')
if self.lang in LANGS_STANZA_SENTIMENT_ANALYZERS:
if self.lang in get_langs_stanza(self.main, util_type = 'sentiment_analyzers'):
processors.append('sentiment')

if self.lang == 'zho_cn':
@@ -347,6 +319,7 @@ def run(self):
stanza.download(
lang = lang_stanza,
model_dir = model_dir,
package = 'default',
processors = processors,
proxies = wl_misc.wl_get_proxies(self.main),
download_json = False
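The capability checks in `run` above assemble Stanza's `processors` list from the derived language sets, so a download only requests processors the language actually supports. A hedged sketch of that selection logic (`build_processors` and `caps` are illustrative names, not project code):

```python
# Build the ordered Stanza processors list for a language from capability
# sets, mirroring the if-chain in the download worker above.
def build_processors(lang, caps):
    # caps maps util_type -> set of supported language codes (mocked below)
    order = [
        ('word_tokenizers', 'tokenize'),
        ('pos_taggers', 'pos'),
        ('lemmatizers', 'lemma'),
        ('dependency_parsers', 'depparse'),
        ('sentiment_analyzers', 'sentiment'),
    ]
    return [proc for util_type, proc in order if lang in caps[util_type]]

caps = {
    'word_tokenizers': {'eng', 'snd'},
    'pos_taggers': {'eng', 'snd'},
    'lemmatizers': {'eng'},
    'dependency_parsers': {'eng'},
    'sentiment_analyzers': {'eng'},
}

print(build_processors('snd', caps))
# prints ['tokenize', 'pos']
```

Keeping the order fixed matters because later processors depend on earlier ones (e.g. `pos` requires `tokenize`).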
@@ -413,7 +386,7 @@ def init_model_stanza(main, lang, lang_util, tokenized = False):
processors = ['tokenize', 'sentiment']

if (
lang in LANGS_STANZA_TOKENIZERS
lang in get_langs_stanza(main, util_type = 'word_tokenizers')
and (
f'stanza_nlp_{lang}' not in main.__dict__
or set(processors) != set(main.__dict__[f'stanza_nlp_{lang}'].processors)
@@ -439,6 +412,7 @@ def init_model_stanza(main, lang, lang_util, tokenized = False):
main.__dict__[f'stanza_nlp_{lang}'] = stanza.Pipeline(
lang = lang_stanza,
dir = model_dir,
package = 'default',
processors = processors,
download_method = None,
tokenize_pretokenized = tokenized
1 change: 1 addition & 0 deletions wordless/wl_settings/wl_settings_default.py
@@ -1543,6 +1543,7 @@ def init_settings_default(main):
'san': 'stanza_san',
'gla': 'stanza_gla',
'srp_latn': 'stanza_srp_latn',
'snd': 'stanza_snd',
'slk': 'stanza_slk',
'slv': 'spacy_slv',
'hsb': 'stanza_hsb',
5 changes: 4 additions & 1 deletion wordless/wl_settings/wl_settings_global.py
@@ -875,6 +875,7 @@ def init_settings_global():
_tr('init_settings_global', 'Stanza - Sanskrit part-of-speech tagger'): 'stanza_san',
_tr('init_settings_global', 'Stanza - Scottish Gaelic part-of-speech tagger'): 'stanza_gla',
_tr('init_settings_global', 'Stanza - Serbian (Latin) part-of-speech tagger'): 'stanza_srp_latn',
_tr('init_settings_global', 'Stanza - Sindhi part-of-speech tagger'): 'stanza_snd',
_tr('init_settings_global', 'Stanza - Slovak part-of-speech tagger'): 'stanza_slk',
_tr('init_settings_global', 'Stanza - Slovenian part-of-speech tagger'): 'stanza_slv',
_tr('init_settings_global', 'Stanza - Sorbian (Upper) part-of-speech tagger'): 'stanza_hsb',
@@ -1724,7 +1725,8 @@ def init_settings_global():

'other': [
'nltk_punkt_eng',
'spacy_sentencizer'
'spacy_sentencizer',
'stanza_eng'
]
},

@@ -2622,6 +2624,7 @@ def init_settings_global():
'san': ['stanza_san'],
'gla': ['stanza_gla'],
'srp_latn': ['stanza_srp_latn'],
'snd': ['stanza_snd'],
'slk': ['stanza_slk'],

'slv': [
