Missing de-en sentence-level data #15

LeonardoEmili · 2024-04-09T13:43:20Z

Hi there,

thank you for gathering all the data used in the competitions in a single place.

I was looking at the wmt23 test data. Since en-de and de-en are the only directions at document level, their data should be organised under the directories wmt23/ (for paragraph-level data) and wmt23.sent/ (for sentence-level data). However, this is true only for en-de, with 557 and 1950 segments respectively; the sentence-level data for de-en is still missing.

Could you please update the archive to include the original tokenised data for de-en?
