Task547_spl_translation_entk_en and task560_alt_translation_en_entk #589

Open
yeganehkordi opened this issue Nov 10, 2021 · 7 comments
@yeganehkordi
Contributor

The instruction is incomplete, and it seems the task is to remove the space before punctuation marks. @PhaniRohithaKaza Can you explain the tokens?

@PhaniRohithaKaza
Contributor

@yeganehkordi The task is sentence translation: it converts a given English sentence into its tokens. But since the tokens are also in English, the input and output don't differ. I'm confused by this task too. @swarooprm, can you please look into it and help us?

@yeganehkordi changed the title from "Task547_spl_translation_entk_en" to "Task547_spl_translation_entk_en and task560_alt_translation_en_entk" on Nov 10, 2021
@yeganehkordi
Contributor Author

Yeah, except for the punctuation, they seem to be the same.

@Palipoor
Contributor

Palipoor commented Feb 8, 2022

Yeah, "English tokens" is one of the translated versions in their data. I think we can drop these two tasks. @danyaljj
Also, some of the other tasks from this dataset have two "Domains". I can fix these in a PR.

@danyaljj
Contributor

danyaljj commented Feb 9, 2022

Sounds good, thank you!

@danyaljj
Contributor

Moving some of the comments from #709 to here:

@yeganehkordi's comment:

I'm in favor of not dropping the en_entk and entk_en tasks. We already have tasks that are simpler than these. I think we can change their definitions and keep them.

@Palipoor's comment:

I think those tasks being simple (just adding a space before punctuation marks or removing it) is one thing, but the other thing is that models are probably going to process these at the token level. So both the input and the output are likely to end up encoded the same way, which makes the task pointless.
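
To make the token-level concern concrete, here is a minimal sketch. The `simple_tokens` function is a hypothetical stand-in for a whitespace-insensitive tokenizer, not the tokenizer of any particular model; real subword tokenizers may or may not preserve the space in their encoding.

```python
import re

def simple_tokens(text):
    # Hypothetical tokenizer: words and single punctuation marks,
    # with all whitespace discarded.
    return re.findall(r"\w+|[^\w\s]", text)

detokenized = "Hello, world!"
tokenized = "Hello , world !"

# Under this assumption both forms yield the same token sequence.
print(simple_tokens(detokenized) == simple_tokens(tokenized))
```

If a tokenizer behaves like this, the two sides of the task are indistinguishable on the input side.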

@swarooprm
Contributor

In my opinion, we could keep en_entk and entk_en tasks and mention in the definition that the task is to remove/add space before punctuation. This is a simple task, but not an invalid task.

Let's not worry about how models are going to process it. E.g. even if the encoder ignores the space, the decoder has to decode it back with space.

My opinion is not very strong, and I am also fine if we decide to delete them. However, we should note that creating a task requires significant effort, while deleting is easy. So we should avoid deleting as much as possible and think about repairing instead.
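
For reference, the add/remove-space definition suggested in this comment could be sketched roughly as follows. The function names and the punctuation set are assumptions for illustration; the actual task data may treat quotes, brackets, or decimals differently.

```python
import re

# Assumed punctuation set; the dataset's real tokenization rules may differ.
PUNCT = r"[.,!?;:]"

def add_space_before_punct(text):
    # en -> entk: insert a space before punctuation that follows a non-space.
    return re.sub(rf"(?<=\S)({PUNCT})", r" \1", text)

def remove_space_before_punct(text):
    # entk -> en: drop any whitespace directly before punctuation.
    return re.sub(rf"\s+({PUNCT})", r"\1", text)

print(add_space_before_punct("Hello, world!"))
print(remove_space_before_punct("Hello , world !"))
```

Note that this naive rule would also split decimals such as "3.5", so a real definition would need to say how such cases are handled.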

@yeganehkordi
Contributor Author

yeganehkordi commented Feb 13, 2022

I think in the worst-case scenario, we can shuffle the tokens in the input and change these tasks to "order generation" tasks.
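
A minimal sketch of that conversion, assuming the tokenized sentences are whitespace-separated; the function name, the `input`/`output` keys, and the fixed seed are illustrative choices, not part of the existing task files.

```python
import random

def make_order_generation_example(tokenized_sentence, seed=0):
    # Split the tokenized sentence on whitespace.
    tokens = tokenized_sentence.split()
    shuffled = tokens[:]
    # A seeded generator keeps the shuffle reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    return {
        "input": " ".join(shuffled),   # tokens in random order
        "output": " ".join(tokens),    # tokens in the original order
    }

example = make_order_generation_example("The cat sat on the mat .")
print(example["input"])
print(example["output"])
```

For very short sentences the shuffle can return the original order, so a real conversion might want to reshuffle or skip those instances.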


5 participants