Hi there,
thank you for gathering all the data used in the competitions in one place.
I was looking at the wmt23 test data, and it looks like the en-de/de-en data, being the only directions at document level, should be organised under the directories wmt23 (for paragraph-level data) and wmt23.sent/ (for sentence-level data). However, this is true only for en-de, with 557 and 1950 segments respectively; the sentence-level data for de-en is missing to date.
Could you please update the archive to include the original tokenised data for de-en?