The issue here is that the max fluency score in the config was set too high, so the filter doesn't save any of the lines. Fluency values are much lower for Chinese than the configured score. I think the fix here would be to compute a suitable fluency score in the config generator rather than relying on a fixed value.
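To illustrate what "compute a good fluency score" could look like, here is a minimal sketch that derives the cutoff from a sample of scores instead of hard-coding it. It assumes that higher scores mean more fluent text and that a line is kept when its score is at or above the cutoff. The names `pick_fluency_threshold`, `kept_lines`, and `keep_fraction`, as well as the sample numbers, are made up for illustration and are not taken from the config generator or from HPLT.

```python
import math


def pick_fluency_threshold(scores, keep_fraction=0.5):
    """Return a cutoff so that about keep_fraction of the sampled lines pass.

    Hypothetical helper: assumes higher score = more fluent, and that a line
    is kept when score >= cutoff.
    """
    if not scores:
        raise ValueError("need at least one sample score")
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(scores, reverse=True)
    # Index of the lowest-scoring line that should still be kept.
    cutoff_index = math.ceil(len(ranked) * keep_fraction) - 1
    return ranked[cutoff_index]


def kept_lines(scores, threshold):
    """Count how many sampled lines would survive a given cutoff."""
    return sum(1 for score in scores if score >= threshold)


# Made-up sample scores for a language whose fluency values run low.
zh_sample = [0.2, 0.3, 0.35, 0.4, 0.5]

# A fixed cutoff tuned on another language drops every line.
print(kept_lines(zh_sample, 0.8))

# A cutoff derived from the sample keeps about half of the lines.
threshold = pick_fluency_threshold(zh_sample, keep_fraction=0.5)
print(threshold, kept_lines(zh_sample, threshold))
```

Whether a sample-based percentile is the right approach is an open question; it only shows that a per-language cutoff avoids the "zero lines saved" failure that a single fixed value can cause.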
gregtatum changed the title from "dataset-hplt-mono_v1_2-zh failed" to "dataset-hplt-mono_v1_2-zh failed due to a too large fluency score in the config" on Jan 2, 2025
Interesting, thanks for investigating! So, it depends on the language... Maybe we should switch to HPLT 2.0 and see what approach they use there for cleaning.
https://firefox-ci-tc.services.mozilla.com/tasks/TZWvhasASyWFpk4KnAfNmw/runs/0/logs/public/logs/live.log