`enforce_same_original_doc_id` parameter & improved text pair metadata #121
This changes `construct_text_pair_coref_documents_from_partitions_via_relations()` to add the following to metadata:

- `original_doc_id`: doc id of the original document where the `text` is from
- `original_doc_id_pair`: doc id of the original document where the `text_pair` is from
- `original_doc_span`: a dict with `start` and `end` keys indicating where the `text` is from with respect to the original doc
- `original_doc_span_pair`: a dict with `start` and `end` keys indicating where the `text_pair` is from with respect to the original doc

Also, this gets correctly propagated in `add_negative_coref_relations()`, when available.

Finally, this adds the parameter `enforce_same_original_doc_id` to `add_negative_coref_relations()`. If enabled, only negatives where both `text` and `text_pair` are from the same original document get created.
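As a rough illustration of the metadata keys and the `enforce_same_original_doc_id` behaviour described above, here is a minimal sketch in plain Python. It does not use the code from this PR: the helper `candidate_negative_metadata`, the `(doc_id, start, end)` partition tuples, and the pairing via `itertools.combinations` are all assumptions made for the example. Only the four metadata key names and the parameter name come from the description.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical partition representation: (original doc id, start offset, end offset).
Partition = Tuple[str, int, int]


def candidate_negative_metadata(
    partitions: List[Partition],
    enforce_same_original_doc_id: bool = False,
) -> List[Dict[str, object]]:
    """Build the metadata of candidate negative text pairs (illustrative only).

    Each unordered pair of partitions yields one candidate. When
    enforce_same_original_doc_id is True, pairs whose partitions come from
    different original documents are skipped.
    """
    result = []
    for (doc_id, start, end), (doc_id_pair, start_pair, end_pair) in combinations(partitions, 2):
        if enforce_same_original_doc_id and doc_id != doc_id_pair:
            continue  # text and text_pair stem from different original documents
        result.append(
            {
                "original_doc_id": doc_id,
                "original_doc_id_pair": doc_id_pair,
                "original_doc_span": {"start": start, "end": end},
                "original_doc_span_pair": {"start": start_pair, "end": end_pair},
            }
        )
    return result


# Two partitions from "doc_a" and one from "doc_b" (made-up values).
partitions = [("doc_a", 0, 120), ("doc_a", 120, 260), ("doc_b", 0, 90)]
all_candidates = candidate_negative_metadata(partitions)
same_doc_only = candidate_negative_metadata(partitions, enforce_same_original_doc_id=True)
```

With the flag disabled, all three partition pairs are kept. With it enabled, only the pair built from the two `doc_a` partitions remains.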