Dependencies: Upgrade Pyphen to 0.15.0; Utils: Add Pyphen's Basque syllable tokenizer
BLKSerene committed May 18, 2024
1 parent 3669c14 commit 7d77695
Showing 9 changed files with 12 additions and 5 deletions.
2 changes: 1 addition & 1 deletion ACKS.md
@@ -37,7 +37,7 @@ As Wordless stands on the shoulders of giants, I hereby extend my sincere gratit
13|[PyInstaller](http://www.pyinstaller.org/)|6.0|Hartmut Goebel, Jasper Harrison, Bryan A. Jones,<br>Brénainn Woodsend, Rok Mandeljc|[Bootloader-exception](https://github.com/pyinstaller/pyinstaller/blob/develop/COPYING.txt)
14|[pymorphy3](https://github.com/no-plagiarism/pymorphy3)|2.0.1|Mikhail Korobov, Danylo Halaiko|[MIT](https://github.com/no-plagiarism/pymorphy3/blob/master/LICENSE.txt)
15|[pypdf](https://github.com/py-pdf/pypdf)|3.16.2|Mathieu Fenniak, Ashish Kulkarni, Steve Witham, Martin Thoma|[BSD-3-Clause](https://github.com/py-pdf/pypdf/blob/main/LICENSE)
-16|[Pyphen](https://pyphen.org/)|0.14.0|Guillaume Ayoub|[GPL-2.0-or-later/LGPL-2.1-or-later/MPL-1.1](https://github.com/Kozea/Pyphen/blob/master/LICENSE)
+16|[Pyphen](https://pyphen.org/)|0.15.0|Guillaume Ayoub|[GPL-2.0-or-later/LGPL-2.1-or-later/MPL-1.1](https://github.com/Kozea/Pyphen/blob/master/LICENSE)
17|[PyQt](https://riverbankcomputing.com/software/pyqt/)|5.15.10|Riverbank Computing|[Commercial-License/GPL-3.0-only](https://www.riverbankcomputing.com/static/Docs/PyQt5/introduction.html#license)
18|[PyThaiNLP](https://github.com/PyThaiNLP/pythainlp)|5.0.3|Wannaphong Phatthiyaphaibun (วรรณพงษ์ ภัททิยไพบูลย์)|[Apache-2.0](https://github.com/PyThaiNLP/pythainlp/blob/dev/LICENSE)
19|[python-docx](https://github.com/python-openxml/python-docx)|1.1.0|Steve Canny|[MIT](https://github.com/python-openxml/python-docx/blob/master/LICENSE)
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -22,6 +22,7 @@
### 🎉 New Features
- Settings: Add Settings - Stop Word Lists - Stop Word List Settings - Case-sensitive
- Settings: Add Settings - Tables - Dependency Parser
+- Utils: Add Pyphen's Basque syllable tokenizer
- Utils: Add PyThaiNLP's Han-solo
- Utils: Add Stanza's Sindhi part-of-speech tagger
- Utils: Add VADER's sentiment analyzers
@@ -50,6 +51,7 @@
- Dependencies: Upgrade LaoNLP to 1.1.3
- Dependencies: Upgrade Lingua to 2.0.2
- Dependencies: Upgrade pymorphy3 to 2.0.1
+- Dependencies: Upgrade Pyphen to 0.15.0
- Dependencies: Upgrade PyQt to 5.15.10
- Dependencies: Upgrade PyThaiNLP to 5.0.3
- Dependencies: Upgrade python-docx to 1.1.0
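The Dependencies entries in this hunk appear to be kept in case-insensitive alphabetical order by package name. That would explain why the Pyphen line lands between pymorphy3 and PyQt. The sketch below shows that ordering. It assumes plain `str.casefold` sorting is the rule, and the helper `insert_entry` is hypothetical, not part of Wordless.

```python
def insert_entry(entries: list[str], new_entry: str) -> list[str]:
    """Return a copy of entries with new_entry added, sorted case-insensitively.

    Hypothetical helper: assumes the section is ordered by str.casefold().
    """
    combined = entries if new_entry in entries else [*entries, new_entry]
    return sorted(combined, key=str.casefold)


# Dependencies entries from the hunk above, before the Pyphen line was added
dependencies = [
    '- Dependencies: Upgrade LaoNLP to 1.1.3',
    '- Dependencies: Upgrade Lingua to 2.0.2',
    '- Dependencies: Upgrade pymorphy3 to 2.0.1',
    '- Dependencies: Upgrade PyQt to 5.15.10',
    '- Dependencies: Upgrade PyThaiNLP to 5.0.3',
    '- Dependencies: Upgrade python-docx to 1.1.0',
]

updated = insert_entry(dependencies, '- Dependencies: Upgrade Pyphen to 0.15.0')
for entry in updated:
    print(entry)
```

Sorting with `casefold` instead of plain string order matters here. A plain sort would place every capitalized name ("LaoNLP", "PyQt") before the lower-case ones ("pymorphy3", "python-docx").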
2 changes: 1 addition & 1 deletion doc/trs/zho_cn/ACKS.md
@@ -37,7 +37,7 @@
13|[PyInstaller](http://www.pyinstaller.org/)|6.0|Hartmut Goebel, Jasper Harrison, Bryan A. Jones,<br>Brénainn Woodsend, Rok Mandeljc|[Bootloader-exception](https://github.com/pyinstaller/pyinstaller/blob/develop/COPYING.txt)
14|[pymorphy3](https://github.com/no-plagiarism/pymorphy3)|2.0.1|Mikhail Korobov, Danylo Halaiko|[MIT](https://github.com/no-plagiarism/pymorphy3/blob/master/LICENSE.txt)
15|[pypdf](https://github.com/py-pdf/pypdf)|3.16.2|Mathieu Fenniak, Ashish Kulkarni, Steve Witham, Martin Thoma|[BSD-3-Clause](https://github.com/py-pdf/pypdf/blob/main/LICENSE)
-16|[Pyphen](https://pyphen.org/)|0.14.0|Guillaume Ayoub|[GPL-2.0-or-later/LGPL-2.1-or-later/MPL-1.1](https://github.com/Kozea/Pyphen/blob/master/LICENSE)
+16|[Pyphen](https://pyphen.org/)|0.15.0|Guillaume Ayoub|[GPL-2.0-or-later/LGPL-2.1-or-later/MPL-1.1](https://github.com/Kozea/Pyphen/blob/master/LICENSE)
17|[PyQt](https://riverbankcomputing.com/software/pyqt/)|5.15.10|Riverbank Computing|[Commercial-License/GPL-3.0-only](https://www.riverbankcomputing.com/static/Docs/PyQt5/introduction.html#license)
18|[PyThaiNLP](https://github.com/PyThaiNLP/pythainlp)|5.0.3|Wannaphong Phatthiyaphaibun (วรรณพงษ์ ภัททิยไพบูลย์)|[Apache-2.0](https://github.com/PyThaiNLP/pythainlp/blob/dev/LICENSE)
19|[python-docx](https://github.com/python-openxml/python-docx)|1.1.0|Steve Canny|[MIT](https://github.com/python-openxml/python-docx/blob/master/LICENSE)
2 changes: 1 addition & 1 deletion doc/trs/zho_tw/ACKS.md
@@ -37,7 +37,7 @@
13|[PyInstaller](http://www.pyinstaller.org/)|6.0|Hartmut Goebel, Jasper Harrison, Bryan A. Jones,<br>Brénainn Woodsend, Rok Mandeljc|[Bootloader-exception](https://github.com/pyinstaller/pyinstaller/blob/develop/COPYING.txt)
14|[pymorphy3](https://github.com/no-plagiarism/pymorphy3)|2.0.1|Mikhail Korobov, Danylo Halaiko|[MIT](https://github.com/no-plagiarism/pymorphy3/blob/master/LICENSE.txt)
15|[pypdf](https://github.com/py-pdf/pypdf)|3.16.2|Mathieu Fenniak, Ashish Kulkarni, Steve Witham, Martin Thoma|[BSD-3-Clause](https://github.com/py-pdf/pypdf/blob/main/LICENSE)
-16|[Pyphen](https://pyphen.org/)|0.14.0|Guillaume Ayoub|[GPL-2.0-or-later/LGPL-2.1-or-later/MPL-1.1](https://github.com/Kozea/Pyphen/blob/master/LICENSE)
+16|[Pyphen](https://pyphen.org/)|0.15.0|Guillaume Ayoub|[GPL-2.0-or-later/LGPL-2.1-or-later/MPL-1.1](https://github.com/Kozea/Pyphen/blob/master/LICENSE)
17|[PyQt](https://riverbankcomputing.com/software/pyqt/)|5.15.10|Riverbank Computing|[Commercial-License/GPL-3.0-only](https://www.riverbankcomputing.com/static/Docs/PyQt5/introduction.html#license)
18|[PyThaiNLP](https://github.com/PyThaiNLP/pythainlp)|5.0.3|Wannaphong Phatthiyaphaibun (วรรณพงษ์ ภัททิยไพบูลย์)|[Apache-2.0](https://github.com/PyThaiNLP/pythainlp/blob/dev/LICENSE)
19|[python-docx](https://github.com/python-openxml/python-docx)|1.1.0|Steve Canny|[MIT](https://github.com/python-openxml/python-docx/blob/master/LICENSE)
2 changes: 1 addition & 1 deletion requirements/requirements_tests.txt
@@ -24,7 +24,7 @@ khmer-nltk == 1.6
laonlp == 1.1.3
lingua-language-detector == 2.0.2
nltk == 3.8.1
-pyphen == 0.14.0
+pyphen == 0.15.0
pythainlp == 5.0.3
sacremoses == 0.1.1
simplemma == 0.9.1
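This commit records the Pyphen version in several places: the test requirements pin above and the version column of the acknowledgement tables. The sketch below is a minimal consistency check between the two. It assumes exact `name == version` pins like the ones in this file, and `parse_pins` is a hypothetical helper, not a Wordless or pip API. Other requirement syntaxes (ranges, extras, markers) are simply ignored.

```python
import re

# Exact pins of the form "name == version"; other requirement syntaxes are ignored.
PIN_RE = re.compile(r'^\s*([A-Za-z0-9._-]+)\s*==\s*(\S+)\s*$')


def parse_pins(text: str) -> dict[str, str]:
    """Map lower-cased package names to pinned versions (hypothetical helper)."""
    pins = {}
    for line in text.splitlines():
        match = PIN_RE.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


# Excerpt of requirements/requirements_tests.txt after this commit
requirements = """\
laonlp == 1.1.3
lingua-language-detector == 2.0.2
nltk == 3.8.1
pyphen == 0.15.0
pythainlp == 5.0.3
"""

# Pyphen entry from utils/wl_generate_acks.py; index 2 is the version column
pyphen_ack = ['Pyphen', 'https://pyphen.org/', '0.15.0', 'Guillaume Ayoub',
              'GPL-2.0-or-later/LGPL-2.1-or-later/MPL-1.1',
              'https://github.com/Kozea/Pyphen/blob/master/LICENSE']

pins = parse_pins(requirements)
in_sync = pins[pyphen_ack[0].lower()] == pyphen_ack[2]
print(in_sync)
```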
2 changes: 2 additions & 0 deletions tests/tests_nlp/test_syl_tokenization.py
@@ -111,6 +111,8 @@ def test_syl_tokenize(lang, syl_tokenizer):
assert syls_tokens == [('Afri', 'kaans'), ('is',), ('ti', 'po', 'lo', 'gies'), ('be', 'skou'), ("'n",), ('In', 'do', 'Eu', 'ro', 'pe', 'se'), (',',), ('Wes', 'Ger', 'maan', 'se'), (',',), ('Ne', 'derfran', 'kie', 'se'), ('taal',), (',',), ('[',), ('2',), (']',), ('wat',), ('aan',), ('die',), ('suid', 'punt'), ('van',), ('Afri', 'ka'), ('on', 'der'), ('in', 'vloed'), ('van',), ('ver', 'skeie'), ('an', 'der'), ('ta', 'le'), ('en',), ('taal', 'groe', 'pe'), ('ont', 'staan'), ('het',), ('.',)]
case 'sqi':
assert syls_tokens == [('Gju', 'ha'), ('shqi', 'pe'), ('(',), ('ose',), ('thjesht',), ('shqi', 'p', 'ja'), (')',), ('ësh', 'të'), ('gju', 'hë'), ('dhe',), ('de', 'gë'), ('e',), ('ve', 'ça', 'n', 'të'), ('e',), ('fa', 'mi', 'l', 'jes'), ('in', 'do', 'e', 'v', 'ro', 'pi', 'ane'), ('që',), ('fli', 'tet'), ('nga',), ('rreth',), ('7', '10'), ('mi', 'li', 'onë'), ('nje', 'rëz'), ('në',), ('bo', 'të'), (',',), ('[',), ('1',), (']',), ('kry', 'esisht'), ('në',), ('Shqi', 'pë', 'ri'), (',',), ('Ko', 'so', 'vë'), ('dhe',), ('Ma', 'qe', 'do', 'ni', 'në'), ('e',), ('Ve', 'ri', 'ut'), (',',), ('por',), ('edhe',), ('në',), ('zo', 'na'), ('të',), ('tje', 'ra'), ('të',), ('Ev', 'ro', 'pës'), ('Ju', 'g', 'li', 'n', 'do', 're'), ('ku',), ('ka',), ('një',), ('po', 'pu', 'll', 'si'), ('shqi', 'p', 'ta', 're'), (',',), ('du', 'ke'), ('pë', 'r', 'f', 'shi', 'rë'), ('Ma', 'lin'), ('e',), ('Zi',), ('dhe',), ('Lu', 'gi', 'nën'), ('e',), ('Pre', 'she', 'vës'), ('.',)]
+case 'eus':
+assert syls_tokens == [('Eus', 'ka', 'ra'), ('Eus', 'kal'), ('He', 'rri', 'ko'), ('hiz', 'kun', 'tza'), ('da.',), ('[',), ('8',), (']',)]
case 'bel':
assert syls_tokens == [('Бе', 'ла', 'ру́с', 'кая'), ('мо́', 'ва'), ('—',), ('на', 'цы', 'я', 'на', 'ль', 'ная'), ('мо', 'ва'), ('бе', 'ла', 'ру', 'саў'), (',',), ('ува', 'хо', 'дзіць'), ('у',), ('ін', 'да', 'еў', 'ра', 'пей', 'с', 'кую'), ('моў', 'ную'), ('сям',), ("'",), ('ю',), (',',), ('сла', 'вя', 'н', 'с', 'кую'), ('гру', 'пу'), (',',), ('ус', 'хо', 'д', 'не', 'с', 'ла', 'вя', 'н', 'с', 'кую'), ('па', 'д', 'г', 'ру', 'пу'), ('.',)]
case 'bul':
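Each expected value in these test cases is a list with one tuple of syllables per token. Joining each tuple recovers the surface token, which makes the new Basque expectation easier to read. The sketch below uses the Basque data copied from the test above, and `rejoin` is a hypothetical helper.

```python
# Expected Basque output from the test case above: one tuple of syllables per token
syls_tokens = [('Eus', 'ka', 'ra'), ('Eus', 'kal'), ('He', 'rri', 'ko'),
               ('hiz', 'kun', 'tza'), ('da.',), ('[',), ('8',), (']',)]


def rejoin(syls_tokens: list[tuple[str, ...]]) -> list[str]:
    """Concatenate each token's syllables back into the original token."""
    return [''.join(syls) for syls in syls_tokens]


tokens = rejoin(syls_tokens)
syl_counts = [len(syls) for syls in syls_tokens]
print(tokens)
print(syl_counts)
```

The expected Basque output keeps the final period attached as `da.`. By contrast, the Afrikaans case ends with a separate `('.',)` token.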
2 changes: 1 addition & 1 deletion utils/wl_generate_acks.py
@@ -51,7 +51,7 @@
['PyInstaller', 'http://www.pyinstaller.org/', '6.0', 'Hartmut Goebel, Jasper Harrison, Bryan A. Jones,<br>Brénainn Woodsend, Rok Mandeljc', 'Bootloader-exception', 'https://github.com/pyinstaller/pyinstaller/blob/develop/COPYING.txt'],
['pymorphy3', 'https://github.com/no-plagiarism/pymorphy3', '2.0.1', 'Mikhail Korobov, Danylo Halaiko', 'MIT', 'https://github.com/no-plagiarism/pymorphy3/blob/master/LICENSE.txt'],
['pypdf', 'https://github.com/py-pdf/pypdf', '3.16.2', 'Mathieu Fenniak, Ashish Kulkarni, Steve Witham, Martin Thoma', 'BSD-3-Clause', 'https://github.com/py-pdf/pypdf/blob/main/LICENSE'],
-['Pyphen', 'https://pyphen.org/', '0.14.0', 'Guillaume Ayoub', 'GPL-2.0-or-later/LGPL-2.1-or-later/MPL-1.1', 'https://github.com/Kozea/Pyphen/blob/master/LICENSE'],
+['Pyphen', 'https://pyphen.org/', '0.15.0', 'Guillaume Ayoub', 'GPL-2.0-or-later/LGPL-2.1-or-later/MPL-1.1', 'https://github.com/Kozea/Pyphen/blob/master/LICENSE'],
['PyQt', 'https://riverbankcomputing.com/software/pyqt/', '5.15.10', 'Riverbank Computing', 'Commercial-License/GPL-3.0-only', 'https://www.riverbankcomputing.com/static/Docs/PyQt5/introduction.html#license'],
['PyThaiNLP', 'https://github.com/PyThaiNLP/pythainlp', '5.0.3', 'Wannaphong Phatthiyaphaibun (วรรณพงษ์ ภัททิยไพบูลย์)', 'Apache-2.0', 'https://github.com/PyThaiNLP/pythainlp/blob/dev/LICENSE'],
['python-docx', 'https://github.com/python-openxml/python-docx', '1.1.0', 'Steve Canny', 'MIT', 'https://github.com/python-openxml/python-docx/blob/master/LICENSE'],
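The entries in `wl_generate_acks.py` carry the same six fields that appear as columns in the ACKS.md tables. The sketch below shows how one entry could map to a Markdown table row. It assumes this column layout and a row number supplied by the caller. `render_ack_row` is hypothetical, and the actual script may build the table differently.

```python
def render_ack_row(index: int, row: list[str]) -> str:
    """Render one ACKS.md-style table row from a wl_generate_acks.py-style entry.

    Hypothetical helper; row = [name, homepage, version, authors, license, license_url].
    """
    name, homepage, version, authors, license_name, license_url = row
    return f'{index}|[{name}]({homepage})|{version}|{authors}|[{license_name}]({license_url})'


# Pyphen entry as updated by this commit
pyphen = ['Pyphen', 'https://pyphen.org/', '0.15.0', 'Guillaume Ayoub',
          'GPL-2.0-or-later/LGPL-2.1-or-later/MPL-1.1',
          'https://github.com/Kozea/Pyphen/blob/master/LICENSE']

print(render_ack_row(16, pyphen))
```

If the tables are generated this way, the version bump only needs to be made in one place, the entry list.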
1 change: 1 addition & 0 deletions wordless/wl_settings/wl_settings_default.py
@@ -1435,6 +1435,7 @@ def init_settings_default(main):
'syl_tokenizer_settings': {
'afr': 'pyphen_afr',
'sqi': 'pyphen_sqi',
+'eus': 'pyphen_eus',
'bel': 'pyphen_bel',
'bul': 'pyphen_bul',
'cat': 'pyphen_cat',
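The new `'eus'` key sits between `'sqi'` and `'bel'`. That matches sorting the languages by their English names (Afrikaans, Albanian, Basque, Belarusian, Bulgarian, Catalan) rather than by ISO 639-3 code. The sketch below reproduces that ordering. It assumes a hypothetical `LANG_NAMES` lookup that covers only the languages shown in this hunk.

```python
# Hypothetical lookup covering only the languages visible in this hunk
LANG_NAMES = {
    'afr': 'Afrikaans',
    'bel': 'Belarusian',
    'bul': 'Bulgarian',
    'cat': 'Catalan',
    'eus': 'Basque',
    'sqi': 'Albanian',
}


def order_by_name(codes):
    """Sort ISO 639-3 codes by the English name of the language."""
    return sorted(codes, key=lambda code: LANG_NAMES[code])


# Rebuild the default mapping in that order; dicts preserve insertion order
syl_tokenizer_settings = {code: f'pyphen_{code}' for code in order_by_name(LANG_NAMES)}
print(list(syl_tokenizer_settings))
```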
2 changes: 2 additions & 0 deletions wordless/wl_settings/wl_settings_global.py
@@ -723,6 +723,7 @@

_tr('wl_settings_global', 'Pyphen - Afrikaans syllable tokenizer'): 'pyphen_afr',
_tr('wl_settings_global', 'Pyphen - Albanian syllable tokenizer'): 'pyphen_sqi',
+_tr('wl_settings_global', 'Pyphen - Basque syllable tokenizer'): 'pyphen_eus',
_tr('wl_settings_global', 'Pyphen - Belarusian syllable tokenizer'): 'pyphen_bel',
_tr('wl_settings_global', 'Pyphen - Bulgarian syllable tokenizer'): 'pyphen_bul',
_tr('wl_settings_global', 'Pyphen - Catalan syllable tokenizer'): 'pyphen_cat',
@@ -2468,6 +2469,7 @@
'syl_tokenizers': {
'afr': ['pyphen_afr'],
'sqi': ['pyphen_sqi'],
+'eus': ['pyphen_eus'],
'bel': ['pyphen_bel'],
'bul': ['pyphen_bul'],
'cat': ['pyphen_cat'],
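In this commit, adding a syllable tokenizer touches three registrations:

- the per-language default in `wl_settings_default.py`
- the per-language list of available tokenizers in `wl_settings_global.py`
- the display name that maps to the tokenizer ID

The sketch below is a minimal cross-check of the three, assuming plain dictionaries with the shapes shown in the diff. The real display-name keys are wrapped in `_tr()`, which this sketch replaces with plain strings. `find_problems` and the constant names are hypothetical.

```python
# Excerpts of the three registrations (subset only; names are hypothetical)
DEFAULT_SYL_TOKENIZERS = {
    'afr': 'pyphen_afr', 'sqi': 'pyphen_sqi', 'eus': 'pyphen_eus', 'bel': 'pyphen_bel',
}
SUPPORTED_SYL_TOKENIZERS = {
    'afr': ['pyphen_afr'], 'sqi': ['pyphen_sqi'], 'eus': ['pyphen_eus'], 'bel': ['pyphen_bel'],
}
DISPLAY_NAMES = {
    'Pyphen - Afrikaans syllable tokenizer': 'pyphen_afr',
    'Pyphen - Albanian syllable tokenizer': 'pyphen_sqi',
    'Pyphen - Basque syllable tokenizer': 'pyphen_eus',
    'Pyphen - Belarusian syllable tokenizer': 'pyphen_bel',
}


def find_problems(defaults, supported, display_names):
    """List inconsistencies between defaults, supported lists, and display names."""
    problems = []
    known_ids = set(display_names.values())
    for lang, tokenizer in defaults.items():
        if tokenizer not in supported.get(lang, []):
            problems.append(f'{lang}: default {tokenizer!r} is not listed as supported')
        if tokenizer not in known_ids:
            problems.append(f'{lang}: {tokenizer!r} has no display name')
    # Every language with available tokenizers should also have a default
    for lang in supported:
        if lang not in defaults:
            problems.append(f'{lang}: supported but has no default')
    return problems


print(find_problems(DEFAULT_SYL_TOKENIZERS, SUPPORTED_SYL_TOKENIZERS, DISPLAY_NAMES))
```

Leaving out any one of the three `eus` lines added by this commit would make the check report a problem for Basque.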