Contributing docs #973

Open
wants to merge 4 commits into base: main


eu9ene (Collaborator) commented Jan 6, 2025

Here's a v1. We can add more details later if there is interest.

Also, I will follow up on updating the model training guide and development docs.

closes #387

eu9ene requested review from gregtatum and marco-c January 6, 2025 23:56
eu9ene requested a review from a team as a code owner January 6, 2025 23:56
gregtatum (Member) left a comment

Looks good to me.

Comment on lines 46 to 47
The main tool for this is [OpusCleaner](https://github.com/hplt-project/OpusCleaner).
If the dataset looks too noisy, create a PR with exclusion rules in the [config-generator](https://github.com/mozilla/translations/blob/main/utils/config_generator.py).
Collaborator

Do we have examples of exclusion rules?

Collaborator Author

Yes, skip_datasets is at the beginning of the file. It's not per language, though. I can clarify that.
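
For readers wondering what a global exclusion rule could look like, here is a minimal sketch. It assumes the skip list is a plain list of name fragments matched by substring against dataset names. The constant `SKIP_DATASETS`, the helper `filter_datasets`, and the dataset names are all hypothetical and are not copied from `config_generator.py`, which may match names differently.

```python
# Hypothetical sketch: the names and entries here are illustrative only,
# not taken from utils/config_generator.py.
SKIP_DATASETS = ["example_noisy_corpus", "example_misaligned_corpus"]


def filter_datasets(dataset_names, skip_datasets=SKIP_DATASETS):
    """Return the dataset names that contain no skip-list entry as a substring.

    The rule is global: it applies to every language pair alike.
    """
    return [
        name
        for name in dataset_names
        if not any(skipped in name for skipped in skip_datasets)
    ]


candidates = ["opus_example_noisy_corpus/v1", "opus_clean_corpus/v2"]
kept = filter_datasets(candidates)
```

Under these assumptions, a PR that excludes a noisy dataset would only add one entry to the list, and the exclusion would then apply to all language pairs.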

Successfully merging this pull request may close these issues.

Add community contribution guidelines
3 participants