add handle_relations_with_same_arguments parameter to RETextClassificationWithIndicesTaskModule #155
Codecov Report: All modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##             main     #155      +/-   ##
==========================================
+ Coverage   95.51%   95.53%   +0.02%
==========================================
  Files          61       61
  Lines        5212     5237      +25
==========================================
+ Hits         4978     5003      +25
  Misses        234      234
Thanks, some comments below.
EDIT: Please also document the new parameter in the docstring.
src/pie_modules/taskmodules/re_text_classification_with_indices.py
Can you add a simple test for the line that is not yet covered by tests (see the Codecov report)?
Feedback below.
src/pie_modules/taskmodules/re_text_classification_with_indices.py
Branch updated from 5872369 to f2a2746.
If there are multiple relations with the same pair of arguments, handle_relations_with_same_arguments defines whether we remove all of them (keep_none) or keep the first one (keep_first). Full duplicates (relations whose label is also the same) are not affected by this: one relation is kept and a warning is shown.

Note that if collect_statistics=True, the final statistics count such "full duplicates" neither as 'available' nor as 'skipped'. Also, warnings about elements that are collected in the statistics (e.g. skipped relations with the same arguments) are not shown if collect_statistics=True.

This PR changes the default behavior to keep_none (previously it was implicitly keep_first), which is why it is labeled as breaking. This prevents the model from learning conflicting relations and from becoming biased towards whichever relation happens to occur first.

TODO:
Note: keep_none reduces the number of drugprot train instances by 199 (about 1.17%), from 17058 to 16859.
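The selection rule described in this PR can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in re_text_classification_with_indices.py: the Relation class and the handle_relations_with_same_arguments function below are hypothetical, argument spans are reduced to plain strings, and "same pair of arguments" is assumed to mean the same ordered (head, tail) pair.

```python
import warnings
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Relation:
    # Hypothetical minimal relation; real argument spans are reduced to strings.
    head: str
    tail: str
    label: str


def handle_relations_with_same_arguments(
    relations: List[Relation], mode: str = "keep_none"
) -> Tuple[List[Relation], List[Relation]]:
    """Hypothetical sketch: return (kept, skipped) relations.

    mode="keep_none": skip every relation whose (head, tail) pair carries
    more than one distinct label. mode="keep_first": keep the first one.
    Full duplicates (same head, tail and label) are collapsed to a single
    relation with a warning and are reported neither as kept nor as skipped.
    """
    if mode not in ("keep_none", "keep_first"):
        raise ValueError(f"unknown mode: {mode}")

    # Group relations by argument pair; dicts preserve insertion order,
    # so "first" means first in the input list.
    groups: Dict[Tuple[str, str], List[Relation]] = {}
    for rel in relations:
        group = groups.setdefault((rel.head, rel.tail), [])
        if rel in group:
            # Full duplicate: keep only the earlier copy and warn.
            warnings.warn(f"duplicate relation ignored: {rel}")
            continue
        group.append(rel)

    kept: List[Relation] = []
    skipped: List[Relation] = []
    for group in groups.values():
        if len(group) == 1 or mode == "keep_first":
            kept.append(group[0])
            skipped.extend(group[1:])
        else:
            # keep_none with conflicting labels: drop the whole group.
            skipped.extend(group)
    return kept, skipped
```

For example, with the input [Relation("a", "b", "inhibits"), Relation("a", "b", "activates"), Relation("c", "d", "binds"), Relation("c", "d", "binds")], keep_none skips both (a, b) relations, while keep_first keeps "inhibits" and skips only "activates". In both modes the repeated (c, d, binds) relation is collapsed to one with a warning and appears in neither list, mirroring how full duplicates are left out of the 'available' and 'skipped' statistics.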