-
-
Notifications
You must be signed in to change notification settings - Fork 17
Correctly parse and place verse text in verse 0 #344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #344 +/- ##
=======================================
Coverage 72.33% 72.34%
=======================================
Files 418 418
Lines 35599 35611 +12
Branches 4919 4923 +4
=======================================
+ Hits 25750 25762 +12
Misses 8752 8752
Partials 1097 1097 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
I'm taking a look at this Peter and stewing on whether there's a cleaner way to do it. Also, confirming that this plays nicely with the updates being made in the getting-rows-by-ref PR. |
@Enkidu93 I totally understand. Probably the cleanest way I can think of would involve a custom |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's a little awkward, but all the alternatives I could think of felt hacky. I'll leave this to Damien and see if he has any alternative ideas. Every time we think the parser is stable, there's another corner case 😅.
The sorting of the rows in
ParallelTextCorpus
also added to some of the kludge - verse 0 is the only verse where the verse number appears mid-verse.
Could you give me more details? I tried running someCompareTo()
s with verse-0 refs and didn't notice anything odd, but if they aren't being compared properly, that's definitely a bug.
@Enkidu93 reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit)
src/SIL.Machine/Corpora/UpdateUsfmParserHandler.cs
line 423 at r1 (raw file):
while (_rowIndex < _rows.Count && sourceIndex < segScrRefs.Count) { // Handle the special case of verse 0, which although first in the rows, will be retrieved after other segments
After we merge in the the other PR, this will not be the case, so the extra code in this function shouldn't be needed I don't think.
Does the test you wrote pass if this is removed (since the rows are in USFM-, not sort-order)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could always treat verse rows differently than non-verse rows, so that they can match at any point in the USFM. Let me play around with it a bit.
@ddaspit reviewed 1 of 6 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)
src/SIL.Machine/Corpora/UpdateUsfmParserHandler.cs
line 423 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
After we merge in the the other PR, this will not be the case, so the extra code in this function shouldn't be needed I don't think.
Does the test you wrote pass if this is removed (since the rows are in USFM-, not sort-order)?
PlaceMarkersUsfmUpdateBlockHandlerTests.UpdateUsfm_SupportVerseZero()
fails if this logic is removed and this PR is rebased onto the get_update_rows_by_ref
branch, but does so in a way that does not even pass through verse 0 to this function. I'll need to look into this after that PR is merged.
The PlaceMarkersUsfmUpdateBlockHandlerTests.UpdateUsfm_SupportVerseZero()
test matches the behavior I saw when I was manually running UsfmManualTests.ParseParallelCorpusAsync()
on a corpus that contained the problematic USFM.
d6a8a3f
to
9f433b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 6 files reviewed, 1 unresolved discussion (waiting on @Enkidu93)
src/SIL.Machine/Corpora/UpdateUsfmParserHandler.cs
line 423 at r1 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
PlaceMarkersUsfmUpdateBlockHandlerTests.UpdateUsfm_SupportVerseZero()
fails if this logic is removed and this PR is rebased onto theget_update_rows_by_ref
branch, but does so in a way that does not even pass through verse 0 to this function. I'll need to look into this after that PR is merged.The
PlaceMarkersUsfmUpdateBlockHandlerTests.UpdateUsfm_SupportVerseZero()
test matches the behavior I saw when I was manually runningUsfmManualTests.ParseParallelCorpusAsync()
on a corpus that contained the problematic USFM.
@Enkidu93 Your PR meant I could make my change a lot neater here. Thank you!
Fixes sillsdev/serval#749
This change is