Fula Pulaar <-> English Resource (Sentence Pairs) #146

Open
nikisix opened this issue Mar 23, 2021 · 2 comments
Comments

@nikisix

nikisix commented Mar 23, 2021

Hi, I would like to contribute a Pulaar translation model, but I need to be pointed to the sentence pairs. Can anyone help me out?

@juliakreutzer
Collaborator

Hi @nikisix! It looks like JW300, which we used as the source for other languages, does not include Pulaar. On the OPUS website you can look for other corpora: https://opus.nlpl.eu/ -- It lists CCAligned, Wikimedia, Ubuntu and QED for Fula, but I'm not sure whether any of it is Pulaar. The CCAligned corpus was previously found (https://arxiv.org/abs/2103.12028) to contain mostly noise for Fula, so I would not recommend using it. Perhaps Wikimedia, Ubuntu or QED? These might be quite domain-specific though.
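
Since noise is a concern with some of these corpora, a quick sanity filter may help before training. The snippet below is only a minimal sketch, assuming the corpus was downloaded as two line-aligned plain-text files (one sentence per line, as in OPUS's Moses-format downloads). The function name `filter_pairs` and the file names in the comments are hypothetical, and the checks are deliberately simple; they are not the method used in the paper linked above.

```python
from typing import Iterable, Iterator, Tuple


def filter_pairs(
    src_lines: Iterable[str], tgt_lines: Iterable[str]
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned inputs.

    Drops pairs where either side is empty or where both sides are
    identical (often an untranslated copy rather than a translation).
    """
    for src_line, tgt_line in zip(src_lines, tgt_lines):
        src, tgt = src_line.strip(), tgt_line.strip()
        if not src or not tgt:
            continue  # one side is missing
        if src == tgt:
            continue  # likely copied text, not a translation
        yield src, tgt


# Usage with files (hypothetical file names):
# with open("corpus.en", encoding="utf-8") as en, \
#         open("corpus.ff", encoding="utf-8") as ff:
#     pairs = list(filter_pairs(en, ff))
```

A filter like this only catches the most obvious problems; checking a random sample by hand is still worthwhile for a low-resource language.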

@nikisix
Author

nikisix commented Mar 30, 2021

I haven't used those last sources you mention before. I did notice that JW300 has the code 'fub' defined for Pular, but unfortunately there are no supporting data files.
