bug fix #145 #143 #135 #153 #146

JarbasAl · 2020-11-02T21:54:49Z

case preservation in extract_duration
extract_datetime number words support
number replacement in normalize
adds better support for ordinals

fixes #145
fixes #143
fixes #135
fixes #153

ChanceNCounter

Just the one question, and an out-of-scope observation that jumped out at me.

lingua_franca/lang/parse_en.py

JarbasAl · 2020-11-02T22:23:40Z

test/test_parse.py

+            self.assertEqual(res[1], expected_leftover, "for=" + text)
+
+        testExtract("Set the ambush for half an hour",
+                    "2017-06-27 13:34:00", "set ambush")


NOTE: remainder is also lowercased here, opening a new issue for this elsewhere

documented here #147

JarbasAl · 2020-11-02T22:24:58Z

test/test_parse.py

@@ -697,15 +739,16 @@ def test_numbers(self):
        self.assertEqual(normalize("that's eighteen nineteen twenty"),
                         "that is 18 19 20")
        self.assertEqual(normalize("that's one nineteen twenty two"),
-                         "that is 1 19 20 2")
+                         "that is 1 19 22")


for these unit tests we might want to throw a deprecation warning in case something downstream depends on this; relevant discussion: MycroftAI/mycroft-core#1962

i don't think we need a warning. if anything downstream is depending on this behavior, it is already utterly broken for n > 20 in general... we would have received bug reports already.
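
For readers following the thread, the changed expectation ("twenty two" becoming 22 instead of "20 2") can be illustrated with a minimal sketch. This is not lingua_franca's implementation; `replace_number_words` and its lookup tables are hypothetical, and it only handles numbers below 100 and does not expand contractions such as "that's".

```python
# Hypothetical sketch of merging a tens word with a following unit word.
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
         "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def replace_number_words(text):
    """Replace number words with digits, joining e.g. 'twenty two' into 22."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        if word in TENS:
            value = TENS[word]
            # A unit word right after a tens word belongs to the same number.
            if i + 1 < len(words) and words[i + 1] in UNITS:
                value += UNITS[words[i + 1]]
                i += 1  # skip the unit word we just consumed
            out.append(str(value))
        elif word in TEENS:
            out.append(str(TEENS[word]))
        elif word in UNITS:
            out.append(str(UNITS[word]))
        else:
            out.append(word)
        i += 1
    return " ".join(out)
```

Teens never absorb a following unit here, which is why "one nineteen twenty two" yields three numbers rather than two.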

ChanceNCounter · 2020-11-02T23:32:52Z

Wellp, I was just being obtuse with my only review comment =P

Tests passing, satisfies the issues, that DeprecationWarning is up to you. Looks good to me.

JarbasAl · 2020-11-02T23:38:07Z

i'd rather not throw the deprecation warning. i consider this a bug; you don't tell your users "we are delaying this fix until next release, work around it somehow"

only pointed that out due to the linked discussion, which should have been considered a bug since the start IMHO

ChanceNCounter · 2020-11-03T00:10:54Z

We could raise a DeprecationError, so at least stuff in the wild breaks... descriptively?

ChanceNCounter · 2020-11-03T00:11:32Z

Suppose that'd require you to put the correct behavior on a flag in the interim, which still isn't great...
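
A rough sketch of the trade-off being discussed, assuming the old output were kept behind an opt-in flag. The function `join_number_tokens` and its `legacy` parameter are hypothetical and exist only for illustration. Note that Python ships `DeprecationWarning` but no built-in `DeprecationError`, so raising instead of warning would need a custom exception class.

```python
import warnings


def join_number_tokens(tens, unit, legacy=False):
    """Hypothetical helper: render a tens value and a unit value as text."""
    if legacy:
        # Old split output kept behind an opt-in flag, with a warning.
        warnings.warn(
            "legacy=True keeps the old split output and is slated for removal",
            DeprecationWarning,
            stacklevel=2,
        )
        return f"{tens} {unit}"
    # Fixed behavior: the two tokens form a single number.
    return str(tens + unit)
```

The cost is the one raised above: the correct behavior and the old behavior have to coexist until the flag is removed.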

JarbasAl · 2020-11-04T05:51:11Z

test/test_parse.py

+                                        ordinals=None), 1)
+
+        # test plurals
+        # NOTE plurals are never considered ordinals, but also not


@ChanceNCounter @krisgesling does this note make sense?

Why is the plural form not considered a fraction in the tests where ordinals are True or None?

Since you have to extract ordinals explicitly, and there's no such ordinal as The Three Fifth, it's probably sensible to avoid returning fractions in all cases.

On the other hand, it might make sense to call ~~them~~ plurals ordinals, on the assumption that extract_number("the thirds", ordinals=True) will always be an input failure somewhere upstairs, like STT mishearing "the third."

i was unsure of this one. for consistency i thought it made sense that ordinals=True would not parse fractions at all, the same way ordinals=False would only parse fractions. But since we support explicit ordinals (1st, 2nd, 3rd, 4th...) i'm not sure....

i guess its a matter of how explicit is this? what are the possible sources of ambiguity?

"the thirds" could very well mean 3 instead of 1/3 , so i don't think this should be considered an explicit fraction

i wonder if when ordinals=True we should consider the plural form an integer?


JarbasAl · 2020-11-04T06:22:31Z

lingua_franca/lang/parse_en.py

@@ -400,7 +409,7 @@ def _extract_whole_number_with_text_en(tokens, short_scale, ordinals):
                current_val = val

        else:
-            if all([
+            if current_val and all([


this is an unrelated minor bug fix, during testing i ran into a case where current_val was None and int comparison inside all() would throw an exception
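
The failure mode can be reproduced in isolation. The sketch below is an assumption-laden illustration: the conditions inside `all([...])` are made up, not the ones in `parse_en.py`. The point is that the list is built before `all()` runs, so a `None` operand raises `TypeError` unless the check is short-circuited first.

```python
def unguarded(current_val, val):
    # The list is evaluated eagerly, so `None >= 100` raises TypeError
    # before all() is even called.
    return all([current_val >= 100, current_val > val])


def guarded(current_val, val):
    # `current_val and ...` short-circuits on None, so the comparisons
    # are never evaluated in that case.
    return bool(current_val and all([current_val >= 100, current_val > val]))
```

One side effect worth keeping in mind: a truthiness guard also skips the branch when `current_val` is `0`, not only when it is `None`.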

JarbasAl · 2020-11-04T16:58:22Z

eventually the ordinals flag in extract_number should become an Enum; for backwards compatibility this should not change yet, only in the next major version
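
One possible shape for that migration, as a sketch only. `OrdinalMode`, its member names and `coerce_ordinals` are hypothetical; nothing like this exists in the PR. Each member simply stands in for one of the three values the flag accepts today, without redefining what those values mean.

```python
from enum import Enum


class OrdinalMode(Enum):
    """Hypothetical replacement for the tri-state `ordinals` flag."""
    OFF = "off"      # stands in for ordinals=False (the current default)
    ON = "on"        # stands in for ordinals=True
    UNSET = "unset"  # stands in for ordinals=None


def coerce_ordinals(value):
    """Accept legacy True/False/None as well as OrdinalMode members."""
    if isinstance(value, OrdinalMode):
        return value
    if value is None:
        return OrdinalMode.UNSET
    return OrdinalMode.ON if value else OrdinalMode.OFF
```

A shim like `coerce_ordinals` would let existing callers keep passing booleans until the major version that drops them.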

krisgesling · 2020-11-04T06:33:52Z

test/test_parse.py

+                                        ordinals=None), 1)
+
+        # test plurals
+        # NOTE plurals are never considered ordinals, but also not


Why is the plural form not considered a fraction in the tests where ordinals are True or None?


krisgesling · 2020-11-04T06:40:38Z

test/test_parse.py

+        self.assertEqual(extract_number("one third of a cup",
+                                        ordinals=None), 1)


We'll need to make the ordinals=None behaviour quite explicit in the docs as I can see this causing headaches. But given it defaults to False they will presumably be doing it intentionally.

at some point i would like to replace this with an Enum, i did it this way for backwards compat only. but imho None should be the default in a next major release

krisgesling · 2020-11-04T21:08:09Z

test/test_parse.py

+        self.assertEqual(extract_number("sixth third"),
+                         1 / 6 / 3)


I don't think this one makes sense.
"Sixth third" would probably refer to "the sixth instance of a third" which I guess means there's 2 all up.

Agreed. I dunno what should be returned here, but I don't think it should be cumulative.

this one has been around for ages, i just moved it so all related tests are grouped together

i don't think it makes a lot of sense either, but i'm declaring it out of scope since it isn't touched by this PR

added a code comment for this one


krisgesling · 2020-11-04T21:41:14Z

test/test_parse.py

+        # TODO imperfect test, maybe should return 'my favorite numbers are 20 2',
+        #  let is pass for now since this is likely a STT issue if ever
+        #  encountered in the wild and is somewhat ambiguous, if this was
+        #  spoken by a human the result is what we expect, if in written form
+        #  it is ambiguous but could mean separate numbers
+        self.assertEqual(normalize('my favorite numbers are twenty 2'),
+                         'my favorite numbers are 22')


may also be worth adding in where both numbers are digits

"my favorite numbers are 20 2"

Which I would expect would remain unchanged.

added a test for this, but result is also changed to a single number. This shares logic with extract_number and will be hard to fix without a bigger refactor, if you are ok with that i would like to leave it for a follow up PR

JarbasAl · 2020-11-29T17:49:06Z

what's blocking this?

some more context: here is a use case where the lack of case preservation is messing things up. in simple_NER it mutates my input text, so i'm actually mutating the text myself before the function call so i can do a meaningful comparison; otherwise i get false diffs between strings due to the lowercasing
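
To make the workaround concrete, here is a minimal sketch of mapping a lowercased remainder back onto the original casing. `restore_case` is a hypothetical helper, not part of simple_NER or lingua_franca, and it assumes whitespace tokenization and that the remainder tokens appear in the original text.

```python
def restore_case(original, remainder):
    """Map lowercased remainder tokens back to their original casing."""
    # Group original tokens by their case-folded form, in order of appearance.
    pool = {}
    for token in original.split():
        pool.setdefault(token.casefold(), []).append(token)

    restored = []
    for token in remainder.split():
        candidates = pool.get(token.casefold())
        # Fall back to the token as-is when no original form is left.
        restored.append(candidates.pop(0) if candidates else token)
    return " ".join(restored)
```

With case preserved by the parser itself, a helper like this becomes unnecessary and string diffs can be taken directly.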

ChanceNCounter · 2020-12-10T01:01:31Z

Next in line after Catalan, let this block release IMO. Shouldn't be too bad. The will-be merge conflicts look like they should be click-fixes.

ChanceNCounter · 2020-12-15T02:16:31Z

squashed and pushed. speak soonish or forever hold your peace 🚀

JarbasAl added the bug (Something isn't working) label Nov 2, 2020

JarbasAl requested a review from ChanceNCounter November 2, 2020 21:54

devs-mycroft added the CLA: Yes (Contributor License Agreement exists, see https://github.com/MycroftAI/contributors) label Nov 2, 2020

JarbasAl changed the title ~~bug fix #145 #143 #142~~ bug fix #145 #143 Nov 2, 2020

ChanceNCounter reviewed Nov 2, 2020

JarbasAl commented Nov 2, 2020

JarbasAl mentioned this pull request Nov 2, 2020: bug - extract datetime lowercases remainder #147 (Open)

JarbasAl changed the title ~~bug fix #145 #143~~ bug fix #145 #143 #135 Nov 2, 2020

JarbasAl added the en (relates to english language) label Nov 2, 2020

JarbasAl requested a review from krisgesling November 2, 2020 23:41

JarbasAl mentioned this pull request Nov 3, 2020: normalize mishandles multi-word numbers #135 (Closed)

JarbasAl commented Nov 4, 2020

JarbasAl commented Nov 4, 2020

JarbasAl marked this pull request as draft November 4, 2020 05:54

JarbasAl commented Nov 4, 2020

JarbasAl changed the title ~~bug fix #145 #143 #135~~ bug fix #145 #143 #135 #153 Nov 4, 2020

krisgesling reviewed Nov 4, 2020

JarbasAl commented Nov 4, 2020

krisgesling reviewed Nov 4, 2020

JarbasAl marked this pull request as ready for review November 5, 2020 22:42

fix #145 #143 #135 #153 #146

0fe0ecf

ChanceNCounter force-pushed the fixyfixes branch from 360fda9 to 0fe0ecf Compare December 15, 2020 01:34

ChanceNCounter merged commit ef111f1 into master Dec 16, 2020

ChanceNCounter deleted the fixyfixes branch December 16, 2020 00:08
