publish on pypi #9

Open
lsmith77 opened this issue Oct 19, 2022 · 8 comments

Comments

@lsmith77

it would be awesome to get the project registered on https://pypi.org/

@DuyguA
Owner

DuyguA commented Oct 19, 2022

Thanks for your comment! I made the library quite some time ago, and I don't remember why I skipped registering it on PyPI. Although the project is Python 3 compatible, I still want to make some revisions. I'll do that when I have time, and after that I can register the new package.

@lsmith77
Author

that would be amazing. I was planning to try out this project. We are currently using https://github.com/gambolputty/german-nouns but are hoping to find a single library that can handle nouns, verbs and adjectives for German.

@lsmith77
Author

FYI our use case is our inclusive writing assistant https://www.witty.works/ and we are looking for ways to make our alternatives grammatically correct.

so we will need to align the word(s) we detected as problematic with the alternatives.

f.e. ambitionierten => engagierten

@DuyguA
Owner

DuyguA commented Oct 19, 2022

> FYI our use case is our inclusive writing assistant https://www.witty.works/ and we are looking for ways to make our alternatives grammatically correct.
>
> so we will need to align the word(s) we detected as problematic with the alternatives.
>
> f.e. ambitionierten => engagierten

Ah OK, got it: you need to match the morphological features as well. OK then, I'll update you here when I'm finished.

@lsmith77
Author

exactly. thank you so much for your work

@lsmith77
Author

another wrinkle is sicherzustellen, which has the spaCy lemma sicherstellen.

so if we have an alternative umstellen we need to transform this to umzustellen, or an alternative bewirken needs to become zu bewirken.

not sure if compound word splitting is within the scope here.

@DuyguA
Owner

DuyguA commented Jan 17, 2023

> another wrinkle is sicherzustellen, which has the spaCy lemma sicherstellen.
>
> so if we have an alternative umstellen we need to transform this to umzustellen, or an alternative bewirken needs to become zu bewirken.
>
> not sure if compound word splitting is within the scope here.

No, compound splitting is indeed not in scope. However, the case of sicherzustellen should be fairly easy. The lemma is not a substring of the surface form, and there's a zu in between. If you split the surface form at zu and join the remaining pieces, you get the lemma, so you can divide this word as sicher + zu + stellen.
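
A minimal sketch of the split described above, assuming the surface form and the lemma are already available as plain strings (for example from spaCy). The function name `split_zu_infinitive` is hypothetical and not part of this library. It looks for a `zu` whose removal turns the surface form into the lemma, and it skips a `zu` at position 0 so that the part before the infix is never empty.

```python
def split_zu_infinitive(surface, lemma, infix="zu"):
    """Return (prefix, infix, stem) if removing one `infix` from
    `surface` yields `lemma`, otherwise None."""
    # Start at index 1 so the part before the infix is never empty.
    start = surface.find(infix, 1)
    while start != -1:
        prefix = surface[:start]
        stem = surface[start + len(infix):]
        if prefix + stem == lemma:
            return prefix, infix, stem
        # Try the next occurrence of the infix, if any.
        start = surface.find(infix, start + 1)
    return None


print(split_zu_infinitive("sicherzustellen", "sicherstellen"))
# → ('sicher', 'zu', 'stellen')
```

This only detects the infix in an existing form. It does not decide where a zu belongs in a different verb.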

Actually, you can use my German corpus to generate a small model. I believe there are many zu-, um- and be-prefixed words in the corpus; you can feed those words to [Phonetisaurus](https://github.com/AdolfVonKleist/Phonetisaurus). Phonetisaurus is originally a g2p tool, so it can align sequences. You can train an efficient seq2seq model where the input sequences are words as characters and the output sequences are the surface forms you want to generate. I have a community day on 27th Jan; if you want, I can schedule a small consultation to offer some solutions (or, better, make a tool for compound analysis; I have wanted to develop one for German for some time).
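
A rough sketch of how character-level training pairs for such a model could be prepared. The tab-separated layout, the `_` placeholder for a space, and the helper name `to_training_line` are all assumptions made for illustration; the input format that Phonetisaurus actually expects should be checked against its own documentation.

```python
def to_training_line(lemma, surface, space_symbol="_"):
    """Build one training line: lemma characters, a tab, then the
    characters of the target surface form (assumed format)."""
    # Encode a literal space (as in "zu bewirken") as a visible symbol
    # so it survives the space-separated token layout.
    source = " ".join(lemma.replace(" ", space_symbol))
    target = " ".join(surface.replace(" ", space_symbol))
    return source + "\t" + target


# Hypothetical example pairs taken from the words discussed in this thread.
pairs = [("umstellen", "umzustellen"), ("bewirken", "zu bewirken")]
for lemma, surface in pairs:
    print(to_training_line(lemma, surface))
```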

@lsmith77
Author

Thank you.

As noted, it is not too hard to detect that the source word sicherzustellen, with the spaCy lemma sicherstellen, has a zu injected.

The hard part is then taking an alternative like bewirken, hochstellen or umstellen and knowing where to place the zu to align the form, i.e. zu bewirken, hochzustellen and umzustellen. Now be as a prefix is regular (always prepend zu), but um is irregular. Also, hochstellen has the form adjective + verb, but to determine that one first has to split the word.
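
To make the difficulty concrete, here is a minimal rule-based sketch. It assumes a hand-maintained list of separable first elements and an explicit exception set for lemmas where a prefix such as um is not separable. Both lists and the function name `to_zu_infinitive` are hypothetical and would need real lexical data behind them.

```python
def to_zu_infinitive(lemma,
                     separable_prefixes=("hoch", "sicher", "um"),
                     inseparable_lemmas=frozenset()):
    """Place "zu" for a German infinitive using simple prefix rules."""
    if lemma not in inseparable_lemmas:
        # Try longer prefixes first so the most specific one wins.
        for prefix in sorted(separable_prefixes, key=len, reverse=True):
            if lemma.startswith(prefix) and len(lemma) > len(prefix):
                # Separable first element: zu goes between it and the verb.
                return prefix + "zu" + lemma[len(prefix):]
    # No separable first element found: zu is a separate preceding word.
    return "zu " + lemma


for verb in ("bewirken", "hochstellen", "umstellen"):
    print(verb, "=>", to_zu_infinitive(verb))
# bewirken => zu bewirken
# hochstellen => hochzustellen
# umstellen => umzustellen
```

The exception set is what covers the irregular um case: for example, `to_zu_infinitive("umarmen", inseparable_lemmas={"umarmen"})` gives `zu umarmen`, whereas the prefix rule alone would split it incorrectly.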
