Skip to content

Conversation

pmachapman
Copy link
Collaborator

@pmachapman pmachapman commented Sep 29, 2025

Fixes sillsdev/serval#749


This change is Reviewable

@codecov-commenter
Copy link

codecov-commenter commented Sep 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.34%. Comparing base (15919fb) to head (9f433b6).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #344   +/-   ##
=======================================
  Coverage   72.33%   72.34%           
=======================================
  Files         418      418           
  Lines       35599    35611   +12     
  Branches     4919     4923    +4     
=======================================
+ Hits        25750    25762   +12     
  Misses       8752     8752           
  Partials     1097     1097           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Enkidu93
Copy link
Collaborator

Enkidu93 commented Sep 30, 2025

I'm taking a look at this Peter and stewing on whether there's a cleaner way to do it. Also, confirming that this plays nicely with the updates being made in the getting-rows-by-ref PR.

@pmachapman
Copy link
Collaborator Author

pmachapman commented Sep 30, 2025

I'm taking a look at this Peter and stewing on whether there's a cleaner way to do it. Also, confirming that this plays nicely with the updates being made in the getting-rows-by-ref PR.

@Enkidu93 I totally understand. Probably the cleanest way I can think of would involve a custom record replacing VerseRef, as VerseRef doesn't like verse numbers less than 0 (technically it can have -1 when initialized, but that was quite hacky). The sorting of the rows in ParallelTextCorpus also added to some of the kludge - verse 0 is the only verse where the verse number appears mid-verse.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's a little awkward, but all the alternatives I could think of felt hacky. I'll leave this to Damien and see if he has any alternative ideas. Every time we think the parser is stable, there's another corner case 😅.

The sorting of the rows in ParallelTextCorpus also added to some of the kludge - verse 0 is the only verse where the verse number appears mid-verse.
Could you give me more details? I tried running some CompareTo()s with verse-0 refs and didn't notice anything odd, but if they aren't being compared properly, that's definitely a bug.

@Enkidu93 reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit)


src/SIL.Machine/Corpora/UpdateUsfmParserHandler.cs line 423 at r1 (raw file):

            while (_rowIndex < _rows.Count && sourceIndex < segScrRefs.Count)
            {
                // Handle the special case of verse 0, which although first in the rows, will be retrieved after other segments

After we merge in the the other PR, this will not be the case, so the extra code in this function shouldn't be needed I don't think.

Does the test you wrote pass if this is removed (since the rows are in USFM-, not sort-order)?

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could always treat verse rows differently than non-verse rows, so that they can match at any point in the USFM. Let me play around with it a bit.

@ddaspit reviewed 1 of 6 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman)

Copy link
Collaborator Author

@pmachapman pmachapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)


src/SIL.Machine/Corpora/UpdateUsfmParserHandler.cs line 423 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

After we merge in the the other PR, this will not be the case, so the extra code in this function shouldn't be needed I don't think.

Does the test you wrote pass if this is removed (since the rows are in USFM-, not sort-order)?

PlaceMarkersUsfmUpdateBlockHandlerTests.UpdateUsfm_SupportVerseZero() fails if this logic is removed and this PR is rebased onto the get_update_rows_by_ref branch, but does so in a way that does not even pass through verse 0 to this function. I'll need to look into this after that PR is merged.

The PlaceMarkersUsfmUpdateBlockHandlerTests.UpdateUsfm_SupportVerseZero() test matches the behavior I saw when I was manually running UsfmManualTests.ParseParallelCorpusAsync() on a corpus that contained the problematic USFM.

@pmachapman pmachapman force-pushed the support_verse_zero_text branch from d6a8a3f to 9f433b6 Compare October 1, 2025 23:00
Copy link
Collaborator Author

@pmachapman pmachapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 6 files reviewed, 1 unresolved discussion (waiting on @Enkidu93)


src/SIL.Machine/Corpora/UpdateUsfmParserHandler.cs line 423 at r1 (raw file):

Previously, pmachapman (Peter Chapman) wrote…

PlaceMarkersUsfmUpdateBlockHandlerTests.UpdateUsfm_SupportVerseZero() fails if this logic is removed and this PR is rebased onto the get_update_rows_by_ref branch, but does so in a way that does not even pass through verse 0 to this function. I'll need to look into this after that PR is merged.

The PlaceMarkersUsfmUpdateBlockHandlerTests.UpdateUsfm_SupportVerseZero() test matches the behavior I saw when I was manually running UsfmManualTests.ParseParallelCorpusAsync() on a corpus that contained the problematic USFM.

@Enkidu93 Your PR meant I could make my change a lot neater here. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

When pretranslating a chapter with verse 0, text is placed in the wrong segment
4 participants