Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug fix #145 #143 #135 #153 #146

Merged
merged 1 commit into from
Dec 16, 2020
Merged

bug fix #145 #143 #135 #153 #146

merged 1 commit into from
Dec 16, 2020

Conversation

JarbasAl
Copy link
Collaborator

@JarbasAl JarbasAl commented Nov 2, 2020

  • case preservation in extract_duration
  • extract_datetime number words support
  • number replacement in normalize
  • adds better support for ordinals

fixes #145
fixes #143
fixes #135
fixed #153

@JarbasAl JarbasAl added the bug Something isn't working label Nov 2, 2020
@devs-mycroft devs-mycroft added the CLA: Yes Contributor License Agreement exists (see https://github.com/MycroftAI/contributors) label Nov 2, 2020
@JarbasAl JarbasAl changed the title bug fix #145 #143 #142 bug fix #145 #143 Nov 2, 2020
Copy link
Contributor

@ChanceNCounter ChanceNCounter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the one question, and an out-of-scope observation that jumped out at me.

lingua_franca/lang/parse_en.py Outdated Show resolved Hide resolved
lingua_franca/lang/parse_en.py Show resolved Hide resolved
self.assertEqual(res[1], expected_leftover, "for=" + text)

testExtract("Set the ambush for half an hour",
"2017-06-27 13:34:00", "set ambush")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: remainder is also lowercased here, opening a new issue for this elsewhere

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

documented here #147

@@ -697,15 +739,16 @@ def test_numbers(self):
self.assertEqual(normalize("that's eighteen nineteen twenty"),
"that is 18 19 20")
self.assertEqual(normalize("that's one nineteen twenty two"),
"that is 1 19 20 2")
"that is 1 19 22")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for these unittests we might want to throw a deprecation warning in case some downstream depends on this, relevant discussion MycroftAI/mycroft-core#1962

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think we need a warning, if anything dowtreamns is depending on this behavior it is already utterly broken for n > 20 in general... we would have received bug reports already.

@JarbasAl JarbasAl changed the title bug fix #145 #143 bug fix #145 #143 #135 Nov 2, 2020
@JarbasAl JarbasAl added the en relates to english language label Nov 2, 2020
@ChanceNCounter
Copy link
Contributor

Wellp, I was just being obtuse with my only review comment =P

Tests passing, satisfies the issues, that DeprecationWarning is up to you. Looks good to me.

@JarbasAl
Copy link
Collaborator Author

JarbasAl commented Nov 2, 2020

id rather not throw the deprecation warning, i consider this a bug, you dont tell your users "we are delaying this fix until next release, work around it somehow"

only pointed that out due to the linked discussion, which should have been considered a bug since the start IMHO

@ChanceNCounter
Copy link
Contributor

We could raise a DeprecationError, so at least stuff in the wild breaks... descriptively?

@ChanceNCounter
Copy link
Contributor

Suppose that'd require you to put the correct behavior on a flag in the interim, which still isn't great...

ordinals=None), 1)

# test plurals
# NOTE plurals are never considered ordinals, but also not
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChanceNCounter @krisgesling does this note make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the plural form not considered a fraction in the tests where ordinals are True or None?

Copy link
Contributor

@ChanceNCounter ChanceNCounter Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you have to extract ordinals explicitly, and there's no such ordinal as The Three Fifth, it's probably sensible to avoid returning fractions in all cases.

On the other hand, it might make sense to call them plurals ordinals, on the assumption that extract_number("the thirds", ordinals=True) will always be an input failure somewhere upstairs, like STT mishearing "the third."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was unsure of this one, for consistency i thought it made sense that ordinals=True would not parse fractions at all, the same way ordinals=False would only parse fractions. But since we support explict ordinals (1st, 2nd, 3rd, 4th...) i'm not sure....

i guess its a matter of how explicit is this? what are the possible sources of ambiguity?

"the thirds" could very well mean 3 instead of 1/3 , so i don't think this should be considered an explicit fraction

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if when ordinals=True we should consider the plural form an integer?

test/test_parse.py Outdated Show resolved Hide resolved
test/test_parse.py Outdated Show resolved Hide resolved
@JarbasAl JarbasAl marked this pull request as draft November 4, 2020 05:54
@@ -400,7 +409,7 @@ def _extract_whole_number_with_text_en(tokens, short_scale, ordinals):
current_val = val

else:
if all([
if current_val and all([
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an unrelated minor bug fix, during testing i ran into a case where current_val was None and int comparison inside all() would throw an exception

@JarbasAl JarbasAl changed the title bug fix #145 #143 #135 bug fix #145 #143 #135 #153 Nov 4, 2020
@JarbasAl
Copy link
Collaborator Author

JarbasAl commented Nov 4, 2020

eventually ordinals flag in extract number should become an Enum, for backwards compatibility this should not be changed yet but only in next major version

ordinals=None), 1)

# test plurals
# NOTE plurals are never considered ordinals, but also not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the plural form not considered a fraction in the tests where ordinals are True or None?

test/test_parse.py Outdated Show resolved Hide resolved
Comment on lines +116 to +152
self.assertEqual(extract_number("one third of a cup",
ordinals=None), 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to make the ordinals=None behaviour quite explicit in the docs as I can see this causing headaches. But given it defaults to False they will presumably be doing it intentionally.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at some point i would like to replace this with an Enum, i did it this way for backwards compat only. but imho None should be the default in a next major release

Comment on lines 139 to 140
self.assertEqual(extract_number("sixth third"),
1 / 6 / 3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this one makes sense.
"Sixth third" would probably refer to "the sixth instance of a third" which I guess means there's 2 all up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I dunno what should be returned here, but I don't think it should be cumulative.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one has been around for ages, i just changed it to a different place so all related tests are grouped together

i dont think it makes a lot of sense either, but i'm declaring it out of scope since it isnt touched by this PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a code comment for this one

test/test_parse.py Outdated Show resolved Hide resolved
Comment on lines +769 to +797
# TODO imperfect test, maybe should return 'my favorite numbers are 20 2',
# let is pass for now since this is likely a STT issue if ever
# encountered in the wild and is somewhat ambiguous, if this was
# spoken by a human the result is what we expect, if in written form
# it is ambiguous but could mean separate numbers
self.assertEqual(normalize('my favorite numbers are twenty 2'),
'my favorite numbers are 22')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may also be worth adding in where both numbers are digits

"my favorite numbers are 20 2"

Which I would expect would remain unchanged.

Copy link
Collaborator Author

@JarbasAl JarbasAl Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a test for this, but result is also changed to a single number. This shares logic with extract_number and will be hard to fix without a bigger refactor, if you are ok with that i would like to leave it for a follow up PR

@JarbasAl JarbasAl marked this pull request as ready for review November 5, 2020 22:42
@JarbasAl
Copy link
Collaborator Author

JarbasAl commented Nov 29, 2020

whats blocking this?

some more context, here is a use case where the lack of case preservation is messing things up, in simple_NER it mutates my input text, so i'm actually mutating it before the function call so i can do a meaningful comparison, otherwise i will get false diffs between strings due to the lowercasing

@ChanceNCounter
Copy link
Contributor

Next in line after Catalan, let this block release IMO. Shouldn't be too bad. The will-be merge conflicts look like they should be click-fixes.

@ChanceNCounter
Copy link
Contributor

squashed and pushed. speak soonish or forever hold your peace 🚀

@ChanceNCounter ChanceNCounter merged commit ef111f1 into master Dec 16, 2020
@ChanceNCounter ChanceNCounter deleted the fixyfixes branch December 16, 2020 00:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working CLA: Yes Contributor License Agreement exists (see https://github.com/MycroftAI/contributors) en relates to english language
Projects
None yet
4 participants