Add further tests for number expansion: ordinals #6
Since ordinals are already included in formatRules.txt, this might be the easiest place to start. @JMK-CB: could you please check them again? What should that look like?
For feminine and neuter resolution, would you have to consider the last letter of the following noun?
What about the other cases? So far we have only the nominative.
Thanks for opening this and the info! I'm writing integration tests that should make it easier to specify the desired behavior. Unfortunately, the hard-wired input/output module chain in MaryTTS is this (extracted from
There doesn't seem to be an intuitive way of handling number expansion at the [...] However, we can definitely move forward with simple things, and then reassess.
Incidentally, @JMK-CB, how should real numbers like 1,14159 be spelled out? What's the word for "comma", or is it a period instead?
I have composed a list of test sentences for ordinal numbers combined with the different cases. It is probably not realistic to solve all of those specific cases, but we could try to tackle some of them (some don't really occur very often anyway). I am also compiling a similar list for special cases with cardinal numbers.
Comma is "koma" in both Sorbian languages. As far as I can assess this, real numbers should not be a problem, because Sorbian simply reads the digits one by one without any modification for case or similar. So your example should always result in "jedyn koma jedyn štyri jedyn pjeć dźewjeć". However, Astrid has directed my attention to fractions. Those won't be as easy to handle, but I will have a look and compile a list of testable cases.
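To make the digit-by-digit reading concrete, here is a minimal Java sketch. It is not MaryTTS code: the class name `DecimalSpeller` and the method `spell` are hypothetical, and only the words for 1, 4, 5, 9 and "koma" come from the example above. The remaining Upper Sorbian digit words are filled in as an assumption and should be checked by a speaker. The sketch also reads the integer part digit by digit, which is only safe for single-digit integer parts; a real implementation would presumably hand the integer part to the existing cardinal rule.

```java
final class DecimalSpeller {
    // Upper Sorbian digit words, indexed by digit value.
    // From the thread: jedyn (1), štyri (4), pjeć (5), dźewjeć (9).
    // The other entries are assumptions and need review by a speaker.
    private static final String[] DIGITS = {
        "nula", "jedyn", "dwaj", "tři", "štyri",
        "pjeć", "šěsć", "sydom", "wosom", "dźewjeć"
    };

    /** Spells a decimal such as "1,14159" one character at a time. */
    static String spell(String number) {
        StringBuilder out = new StringBuilder();
        for (char c : number.toCharArray()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            if (c == ',') {
                out.append("koma"); // decimal separator, per the thread
            } else if (c >= '0' && c <= '9') {
                out.append(DIGITS[c - '0']);
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return out.toString();
    }
}
```

Calling `DecimalSpeller.spell("1,14159")` reproduces the expected reading given above, which makes it a convenient first integration test case.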
Those are correct.
@psibre: So decimals could be one of the "simple things"?
Now I'm unsure about the cases; Jan's list scares me ;-) And it doesn't only concern ordinals. Shall we discuss this tomorrow on Zoom?
Thanks for the details and feedback! Regarding the real numbers, that's something I expect to easily solve later today. The list of sentences with ordinal numbers is a great resource, and we can use it to investigate how to support those linguistic cases (pun intended).
I agree with Astrid: the default should be nominative masculine, because that's probably what people would expect as a technical default. Other forms could come across as errors. However, I think it would be great to at least be able to recognize the grammatical gender the numbers refer to. A wrong gender could be more confusing (because of other nouns in the context) than expecting a case ending but getting the nominative.
I hope we can get some momentum back into this topic :-) At the moment the number expansion is only done for cardinals (nominative masculine).
where
Possible regexes should be (I tried java notation):
All other cases of "\d+." would be the else variant and should be expanded according to the ordinalRule (=%spelloutspellout-ordinal-maskulinum). @JMK-CB: Plural nominative seems to behave like neuter? Maybe only add another ordinalPluralRule if the following word ends with -i? I'm not sure exactly how and where to implement these ifs and elses in the Preprocess file, so @psibre, maybe you can help?
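As a rough illustration of the if/else idea, here is a Java sketch that picks a gender for an ordinal from the last letter of the following word and falls back to masculine otherwise. The patterns are hypothetical stand-ins, not the regexes proposed in this comment, and the ending heuristic (-a for feminine, -o or -e for neuter) is an assumption that will misclassify some nouns. The class `OrdinalGenderGuesser` and its `guess` method are invented for this example and are not part of MaryTTS.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OrdinalGenderGuesser {
    enum Gender { MASCULINE, FEMININE, NEUTER }

    // Digits, a period, whitespace, then the following word (Unicode letters).
    private static final Pattern ORDINAL_BEFORE_WORD =
        Pattern.compile("(\\d+)\\.\\s+(\\p{L}+)");

    /** Guesses the gender for the first ordinal found in the text. */
    static Gender guess(String text) {
        Matcher m = ORDINAL_BEFORE_WORD.matcher(text);
        if (!m.find()) {
            return Gender.MASCULINE; // the "else" variant: default rule
        }
        String word = m.group(2);
        char last = Character.toLowerCase(word.charAt(word.length() - 1));
        if (last == 'a') {
            return Gender.FEMININE;  // assumed heuristic
        }
        if (last == 'o' || last == 'e') {
            return Gender.NEUTER;    // assumed heuristic
        }
        return Gender.MASCULINE;
    }
}
```

The returned gender could then select which ordinal rule to apply, with the masculine rule as the default. A plural check (following word ending in -i) would be one more branch of the same shape.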
@JMK-CB @aStereoID Thanks very much for your help!
I think we need to follow up on this to get more flexibility, especially regarding real numbers (in addition to natural numbers), ordinals (in addition to cardinals), and special numbers such as years.
We should do that in a new issue.
Originally posted by @psibre in https://github.com/psibre/marytts-lang-hsb/issues/2#issuecomment-792598040