fix annotations that are reported as missing after tokenization #16

Merged: 4 commits into main from fix_tokenize_document_missed_annotations, Dec 11, 2023

Conversation

ArneBinder
Owner

@ArneBinder ArneBinder commented Dec 11, 2023

This PR fixes which annotations are reported as missing after tokenization. When tokenize_document() returns multiple documents because of partitioning or windowing, the annotations are distributed across the result documents. This means that each result document may lack some annotations that are still covered by other result documents. With this PR, we report as missing only those annotations that were not added to any result document (e.g. spans dropped because of alignment errors, or relations whose arguments ended up in different result documents).

This requires: ArneBinder/pytorch-ie#390 (i.e. pytorch-ie>=0.29.4)
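
To illustrate the idea, here is a minimal sketch in plain Python. It is not the code from this PR and does not use the pytorch-ie API: the function name `collect_missed_annotations`, the layer names, and the representation of annotations as hashable tuples are all assumptions made for the example. An annotation counts as missed only if none of the result documents received it.

```python
from typing import Dict, Hashable, Iterable, Set

# Hypothetical type alias: layer name -> set of annotations in that layer.
Layers = Dict[str, Set[Hashable]]


def collect_missed_annotations(
    original: Layers, added_per_result: Iterable[Layers]
) -> Layers:
    """Return, per layer, the annotations that no result document received."""
    # Union of everything that was added to at least one result document.
    covered: Layers = {layer: set() for layer in original}
    for added in added_per_result:
        for layer, annotations in added.items():
            covered.setdefault(layer, set()).update(annotations)

    # An annotation is missed only if it is absent from ALL result documents.
    missed = {layer: anns - covered[layer] for layer, anns in original.items()}
    # Drop layers where nothing is missing.
    return {layer: anns for layer, anns in missed.items() if anns}


# Illustrative data: spans are (start, end) tuples, relations are (head, tail).
original = {
    "entities": {(0, 5), (10, 15), (20, 25)},
    "relations": {((0, 5), (20, 25))},
}
# Two windows: each gets some entities; the relation's arguments are split
# across the windows, so neither window can hold the relation.
window_1 = {"entities": {(0, 5), (10, 15)}, "relations": set()}
window_2 = {"entities": {(20, 25)}, "relations": set()}

missed = collect_missed_annotations(original, [window_1, window_2])
```

In this example every entity lands in some window, so only the relation is reported. Checking each window in isolation would have flagged entities that are in fact covered elsewhere.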

@ArneBinder ArneBinder added the bug Something isn't working label Dec 11, 2023
@ArneBinder ArneBinder changed the title fix annotations reported as missed durign tokenization fix annotations that are reported as missing after tokenization Dec 11, 2023

codecov-commenter commented Dec 11, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (9ea864d) 96.28% compared to head (9ef8330) 96.33%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #16      +/-   ##
==========================================
+ Coverage   96.28%   96.33%   +0.05%     
==========================================
  Files          23       23              
  Lines        1694     1718      +24     
==========================================
+ Hits         1631     1655      +24     
  Misses         63       63              


@ArneBinder ArneBinder linked an issue Dec 11, 2023 that may be closed by this pull request
@ArneBinder ArneBinder merged commit 8d0dbca into main Dec 11, 2023
@ArneBinder ArneBinder deleted the fix_tokenize_document_missed_annotations branch December 11, 2023 16:51
Development

Successfully merging this pull request may close these issues.

collect and warn about removed annotations in tokenize_document()