Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add handle_relations_with_same_arguments parameter to RETextClassificationWithIndicesTaskModule #155

Merged
merged 16 commits into from
Feb 17, 2025

Conversation

RainbowRivey
Copy link
Collaborator

@RainbowRivey RainbowRivey commented Feb 10, 2025

If there are multiple relations with same pair of arguments, handle_relations_with_same_arguments defines if we remove both of them (keep_none) or keep the first one (keep_first). Full duplicates (if relation label is also the same) are not affected by this, one relation will be kept and a warning shown.

Note that if collect_statistics=True, final statistics do not include such "full duplicates" either as 'available' nor as 'skipped'. Also, warnings about elements collected in the statistics, e.g. skipped relations with same arguments, will not be shown if collect_statistics=True.

This PR changes the default behavior to keep_none, that's why is labeled as breaking (previously it was implicitly keep_first). Doing so prevents the model from learning conflicting relations and gaining biases for arbitrary 'first' relation occurred.

TODO:

  • Test new behaviour on a downstream task (e.g. runs with 3 seeds at drugprot)

Note: keep_none reduces train instances count of drugprot by 199 (1,16%) from 17058 to 16859.

metric baseline with keep_none
quality (micro f1) 0.7737 (±0.0042) 0.7653 (±0.0022)
Inference time 13.5 (±0.3) 11.2 (±0.5)

@RainbowRivey RainbowRivey changed the title Add relations_with_same_arguments parameter Add relations_with_same_arguments parameter to RETextClassificationWithIndicesTaskModule Feb 10, 2025
@RainbowRivey RainbowRivey changed the title Add relations_with_same_arguments parameter to RETextClassificationWithIndicesTaskModule add relations_with_same_arguments parameter to RETextClassificationWithIndicesTaskModule Feb 10, 2025
@codecov-commenter
Copy link

codecov-commenter commented Feb 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.53%. Comparing base (7ad87d8) to head (aba5a1e).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #155      +/-   ##
==========================================
+ Coverage   95.51%   95.53%   +0.02%     
==========================================
  Files          61       61              
  Lines        5212     5237      +25     
==========================================
+ Hits         4978     5003      +25     
  Misses        234      234              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@RainbowRivey RainbowRivey added the breaking Breaking Changes label Feb 10, 2025
Copy link
Owner

@ArneBinder ArneBinder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, some comments below

EDIT: please also add some documentation about the new parameter to the docstring

@ArneBinder
Copy link
Owner

ArneBinder commented Feb 10, 2025

can you add a simple test for the not yet test-covered line? (from codecov report)

@RainbowRivey RainbowRivey changed the title add relations_with_same_arguments parameter to RETextClassificationWithIndicesTaskModule add handle_relations_with_same_arguments parameter to RETextClassificationWithIndicesTaskModule Feb 11, 2025
@RainbowRivey RainbowRivey added the enhancement New feature or request label Feb 11, 2025
Copy link
Owner

@ArneBinder ArneBinder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feedback below

@ArneBinder ArneBinder force-pushed the skipped_same_arguments_skip_both branch from 5872369 to f2a2746 Compare February 11, 2025 19:05
@ArneBinder ArneBinder merged commit f808d85 into main Feb 17, 2025
2 checks passed
@ArneBinder ArneBinder deleted the skipped_same_arguments_skip_both branch February 17, 2025 10:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Breaking Changes enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants