Sango to English translation #130

Open
micuentadecasa opened this issue Oct 19, 2020 · 5 comments

Comments

@micuentadecasa

Hi, I'm trying to use your script in https://github.com/masakhane-io/masakhane-mt/blob/master/starter_notebook.ipynb to create a translator for the Sango language, but it fails when trying to download the global test set, which doesn't exist for "test.en-any.en". I think this happens because my src is "sg" and my target is "en"; I tested the inverse (English to Sango) and it worked.

Regards.

@juliakreutzer
Collaborator

Hi @micuentadecasa! Yes, you're right, it's because of the language direction: the default in the starter notebook is to translate from English. However, you may use the same data for English to Sango for testing the reverse model as well - just set the paths accordingly so that your source paths point to the Sango portion and the target to the English portion.

@micuentadecasa
Author

Hi Julia,

Of course, I changed the code as you suggested, but the files that it tries to download are not at your URL.

May I suggest some ideas that could help other people:

  1. In the code, instead of using test.en-$trg.en, it should be test.$src-$trg.en
  2. Since we now have other files, those files should be on your site at https://raw.githubusercontent.com/juliakreutzer/masakhane/master/jw300_utils/test/ , or
  3. Establish how we can create these files in our own drives so that we don't have to download them from yours

Regards.

@juliakreutzer
Collaborator

Dear @micuentadecasa,

thanks for your ideas! I agree that the starter notebook is not adequately equipped for handling other directions. We should at least make that explicit in the notebook. The links to my GitHub are also not needed anymore, since the test sets are available in the Masakhane GitHub repo as well. If you have concrete suggestions, please pack them in a PR and we can review and merge them.

Please submit a pull request for adding your new test set to https://github.com/masakhane-io/masakhane-mt/tree/master/jw300_utils/test.

@micuentadecasa
Author

micuentadecasa commented Oct 21, 2020

Thanks Julia,

I need a bit more of your help to continue.

It seems there is no corpus for Sango to English: "JW300_latest_xml_sg-en.xml.gz not found"

In addition, how can I create the test.sg-en.en file, or the test.sg-en.sg file? Are these the files that need to be created for translating from Sango to English?

I'll be happy to create a PR, but as you can see I will need some help from you.

Regards.

@juliakreutzer
Collaborator

You can use the same data (English to Sango) and just swap the sides (rename en-sg.en for example to sg-en.en) 🙂
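
The swap described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `swap_direction`, the `test` prefix, and the directory layout are hypothetical, and the sketch copies the files rather than renaming them so that the original English to Sango files stay in place.

```python
import shutil
from pathlib import Path


def swap_direction(data_dir, src="sg", trg="en", prefix="test"):
    """Copy <prefix>.<trg>-<src>.<lang> to <prefix>.<src>-<trg>.<lang>.

    Hypothetical helper: the file naming and the "test" prefix are
    assumptions based on the names mentioned in this issue.
    """
    data_dir = Path(data_dir)
    created = []
    for lang in (src, trg):
        original = data_dir / f"{prefix}.{trg}-{src}.{lang}"  # e.g. test.en-sg.en
        swapped = data_dir / f"{prefix}.{src}-{trg}.{lang}"   # e.g. test.sg-en.en
        # Copy instead of rename so the original direction remains usable.
        shutil.copyfile(original, swapped)
        created.append(swapped)
    return created
```

The sentence pairs stay aligned line by line, so only the file names change; the source and target paths in the notebook would then point at the `.sg` and `.en` files respectively.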
