Add Brazilian Portuguese translation #6199

Merged: 5 commits, Jul 16, 2024

rffontenelle
Contributor

This is the result of the task force by me, Leonardo Fontenelle and Italo Santos to translate data.table.

Besides providing these PO files, I tracked other possible changes I could include in this pull request, but I didn't add them because I wanted to check with you first. These are:

  • Add updated POT files (which I used to generate these PO files)
  • Update .dev/CRAN-Release.cmd with mention to pt_BR as well
  • Update .github/workflows/R-CMD-check-occasional.yaml to include pt_BR

Please let me know if I should apply those changes as well.

@MichaelChirico
Member

Add updated POT files (which I used to generate these PO files)

Can you clarify here? Ideally all translations (languages) are based off the same .pot file. We only update that .pot file once per release cycle.

Update .dev/CRAN-Release.cmd with mention to pt_BR as well

Yes, please do that here

Update .github/workflows/R-CMD-check-occasional.yaml to include pt_BR

I think no need to do that -- IMO there's no need to have a CI job for each covered language. Currently we include zh_CN because it's a non-Latin script which can bubble up some encoding issues; I would be surprised if pt_BR turned up anything not already covered. The only thing we might want to do is add an RTL language if we ever get RTL translations.

@rffontenelle
Contributor Author

rffontenelle commented Jun 23, 2024

Can you clarify here? Ideally all translations (languages) are based off the same .pot file. We only update that .pot file once per release cycle.

Sure. To deliver the result of our work, we generated POT files with potools::po_extract() because we found some strings not in the current POT and 1.16.0 is to be released soon.

However, if you prefer, I can msgmerge it against the POT files currently available in the repository, and the translation messages that might become obsolete (because they don't exist in such POT files) could be reused afterwards once the new POT are generated for the new release.

@MichaelChirico
Member

we generated POT files with potools::po_extract() because we found some strings not in the current POT

Got it. No sense doubling effort to try and undo this, though if we're making a playbook for adding other translations later, let's please note that we should work directly with the existing .pot file on first commit.

This time, let's do the following:

  1. In a separate PR, check in the .pot file that was used to translate pt_BR. We'll merge that first
  2. I'll merge here
  3. (simultaneous) I'll ping the zh_CN crew that the new .pot file is ready to go

Co-authored-by: Leonardo Ferreira Fontenelle
Co-authored-by: Ítalo Santos
@rffontenelle
Contributor Author

PR for updating the .pot files opened as #6202. I regenerated the files and new strings came up, so the PO files have been updated.

@MichaelChirico
Member

.po files LGTM! My understanding of Portugese is quite limited, so I can only say:

  • I checked there are no missing msgstr, i.e. the translations are complete. Nice!
  • I checked tools::checkPoFile() for both .po files
  • I checked the .po metadata, esp. to ensure that they are marked with UTF-8 encoding.
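The three checks listed above can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the PO content is invented for the example (it is not taken from the real pt_BR.po), and `find_untranslated()` and `declares_utf8()` are hypothetical helpers that handle simple cases only. `tools::checkPoFile()` is the function mentioned in the comment.

```r
# Illustrative PO content, invented for this sketch.
po_lines <- c(
  'msgid ""',
  'msgstr ""',
  '"Content-Type: text/plain; charset=UTF-8\\n"',
  '',
  'msgid "Column %d is missing"',
  'msgstr "Coluna %d ausente"',
  '',
  'msgid "Done"',
  'msgstr ""'
)

# Hypothetical helper: line numbers of msgstr entries that are empty.
# An empty msgstr followed by a quoted continuation line (as in the
# header entry) is treated as non-empty.
find_untranslated <- function(lines) {
  empty <- grepl('^msgstr(\\[[0-9]+\\])? ""[[:space:]]*$', lines)
  next_line <- c(lines[-1L], "")
  which(empty & !startsWith(next_line, '"'))
}

# Hypothetical helper: does the header declare UTF-8?
declares_utf8 <- function(lines) {
  any(grepl("charset=UTF-8", lines, fixed = TRUE))
}

untranslated <- find_untranslated(po_lines)  # line(s) still needing a translation
utf8_ok <- declares_utf8(po_lines)

# Format-specifier check on the file itself, using the tools package.
po_file <- tempfile(fileext = ".po")
writeLines(po_lines, po_file)
format_report <- tools::checkPoFile(po_file)
print(format_report)
```

In this toy file the "Done" entry is deliberately left untranslated so that the first helper has something to report; on a complete translation it would return an empty vector.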

Once the other administrative requests discussed above are included we can go ahead and merge. Thank you to all translators!!

@MichaelChirico added the "translation" label (issues/PRs related to message translation projects) on Jul 16, 2024
@MichaelChirico
Member

Thanks all!

@MichaelChirico merged commit d02907d into master on Jul 16, 2024
4 checks passed
@MichaelChirico deleted the rffontenelle-add-pt_BR branch on July 16, 2024 18:48
@rffontenelle
Contributor Author

@MichaelChirico s/Portugese/Portuguese/ in NEWS.md

@MichaelChirico
Member

Wow this typo even has its own Wiktionary entry 🤓

https://en.wiktionary.org/wiki/Portugese

@rffontenelle
Contributor Author

Impressive. Might be a very common typo.

@MichaelChirico
Member

Found this one in particular worth a chuckle

image

@rikivillalba
Contributor

Hello

Please take into account that there is at least one line in [.data.table that runs a regex on error messages, assuming they are in English!

if (grepl(":=.*defined for use in j.*only", e$message))

stopf('Check that is.data.table(DT) == TRUE. Otherwise, :=, `:=`(...) and let(...) are defined for use in j, once only and in particular ways. See help(":=").')

That is likely to leave := in i undetected.

PS: I have been working on Spanish translations; if I have some time I will complete and share them. Is there a branch to commit translations to, or something similar?
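To make the concern concrete, here is a minimal sketch of why matching on message text stops working once the message is translated, and of one possible alternative (matching on a condition class). Everything here is an assumption for illustration: the shortened English message, the Spanish-looking string, the `assign_misuse_error` class and the helper names are hypothetical, and this is not presented as how data.table handles or fixed the issue.

```r
# Shortened English message and an invented translated counterpart.
english_msg <- "Check that is.data.table(DT) == TRUE. Otherwise, := is defined for use in j, once only."
translated_msg <- "Compruebe que is.data.table(DT) == TRUE. De lo contrario, := se define para su uso en j, una sola vez."

# The pattern quoted above, which assumes English text.
pattern <- ":=.*defined for use in j.*only"

# Hypothetical: signal the error with a dedicated condition class.
signal_misuse <- function(msg) {
  stop(structure(
    class = c("assign_misuse_error", "error", "condition"),
    list(message = msg, call = NULL)
  ))
}

# Detection that depends on the message language.
detect_by_regex <- function(msg) {
  tryCatch(signal_misuse(msg), error = function(e) grepl(pattern, e$message))
}

# Detection that depends only on the condition class.
detect_by_class <- function(msg) {
  tryCatch(
    signal_misuse(msg),
    assign_misuse_error = function(e) TRUE,
    error = function(e) FALSE
  )
}

regex_english    <- detect_by_regex(english_msg)     # regex matches English text
regex_translated <- detect_by_regex(translated_msg)  # regex misses translated text
class_english    <- detect_by_class(english_msg)
class_translated <- detect_by_class(translated_msg)
```

The class-based handler gives the same answer for both messages, because it never inspects the (translatable) text.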

@tdhock
Member

tdhock commented Jul 30, 2024

Hi @Rdatatable/brazil
I am impressed by the speed with which you created this translation.
Can you please explain what your workflow was? What software tools did you use?
I think it would be beneficial to share your knowledge, for the benefit of people who are translating data.table to other languages.
If possible please write what you did on the wiki, https://github.com/Rdatatable/data.table/wiki/Translations#software-tools

@leofontenelle
Contributor

leofontenelle commented Jul 30, 2024

Rafael will explain the technical bits better, but in short we used a Web translation platform (Transifex), which was interacting with a custom "datatable-pt_BR" git repository with multiple pt_BR.po files. I had split both the pt_BR and the R-pt_BR files by source code file so that (1) we could have more manageable chunks, (2) the same person would translate messages from the same file on the same day, and (3) translate from the corresponding R and C files in tandem. We had an online spreadsheet marking which file was to be translated by whom and reviewed by whom, and whether each was already done. Moreover, we have a WhatsApp group, and everybody was already briefed when we started translating.
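As a rough idea of what "splitting by source code file" can look like, the sketch below groups PO entries by the file named in their `#:` reference comment. It is a simplified illustration, not the split.R script the team used: `split_po_by_source()` is a hypothetical helper, the sample entries are invented, and it assumes entries are separated by blank lines and carry a `#: file:line` reference (entries without one, such as the header, land in an "unknown" chunk).

```r
# Invented sample entries with source references.
po_text <- c(
  '#: assign.c:10',
  'msgid "a"',
  'msgstr ""',
  '',
  '#: fread.c:20',
  'msgid "b"',
  'msgstr ""',
  '',
  '#: assign.c:30',
  'msgid "c"',
  'msgstr ""'
)

# Hypothetical helper: returns a named list, one character vector of
# PO lines per source file.
split_po_by_source <- function(lines) {
  # Blank lines separate entries; number the entries accordingly.
  entry_id <- cumsum(!nzchar(lines))
  entries <- lapply(split(lines, entry_id), function(e) e[nzchar(e)])
  entries <- Filter(length, entries)

  # Use the file name in the first "#: file:line" reference of each entry.
  source_of <- vapply(entries, function(e) {
    ref <- grep("^#: ", e, value = TRUE)
    if (length(ref) == 0L) return("unknown")
    sub("^#: ([^: ]+).*$", "\\1", ref[1L])
  }, character(1L))

  # Re-join the entries of each file, separated by one blank line.
  lapply(split(entries, source_of), function(group) {
    out <- unlist(lapply(group, function(e) c(e, "")), use.names = FALSE)
    out[-length(out)]
  })
}

chunks <- split_po_by_source(po_text)
# Each chunk could then be written out, e.g. with writeLines(), as its own file.
```

A real split would also need to copy the header entry into every chunk so that each file remains a valid PO file on its own.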

@leofontenelle
Contributor

I would love to see R packages have access to the Weblate. I'll see if I can write something about splitting the PO files, but most of the time working with split files would be easier with every translator having offline translation tools installed on their computers.

@MichaelChirico
Member

I would love to see R packages have access to the Weblate.

cc @daroczig, it's something we discussed briefly, but don't think we're quite ready to support yet. for now we prioritize getting the base+Recommended packages translations in a good workflow.

@ChristianWia

.po files work well. I have successfully used .po on vignettes too. I'm now investigating possibilities for translating cheat sheets.

@rffontenelle
Contributor Author

rffontenelle commented Jul 30, 2024

If the issue is that R Weblate is overwhelmed for some reason, I wonder: have you considered using Hosted Weblate? Weblate hosts free/libre software in a "libre plan" that is gratis (zero price) up to 160K strings for the whole project (source language + all translations). The C and R PO files sum to roughly 1200 strings. See https://weblate.org/en/hosting/

@MichaelChirico
Member

I don't think it's a server traffic bottleneck, but rather a human maintainer bottleneck :)

@rffontenelle
Contributor Author

I have some experience as a maintainer of Fedora Weblate (adding projects, improving project's custom placeholders, handling push/merge/etc. alerts). Let me know if you need a hand on this matter.

@daroczig

I would love to see R packages have access to the Weblate.

cc @daroczig, it's something we discussed briefly, but don't think we're quite ready to support yet. for now we prioritize getting the base+Recommended packages translations in a good workflow.

yeah, there's no formal process to add non R Core/Recommended pkgs to the current weblate server, but I don't see any good reason why we couldn't change that right now :)

if there's interest, I'm happy to set up a new project and import the PO file(s) as components -- or feel free to do so yourself. Or let's jump on a call at your convenience to discuss?

@rikivillalba
Contributor

Dear data.table contributors:

I have recently been working on Spanish translations for R and data.table. I've now finished checking my translations and also added the most recently committed strings (some messages related to in-place droplevels() are not translated because they are not yet in the template). These are in https://github.com/rikivillalba/data.table/tree/Spanish-translations

Some comments:

  • I've built and run the checks. Only 1702.3 fails because I'm in Buenos Aires 🤷‍♂️
  • I'm from Argentina. While I don't think there is much dialectal variation in the language, I use, for example, "archivo" rather than "fichero", the former being more common in LatAm, the latter in Spain. I use "Usted", the formal 2nd person. Base R has no dialectal variants of Spanish.
  • I've been working alone and don't know whether other people are involved in data.table Spanish translations; perhaps you know.
  • data.table looks pretty "verbose" in the number and size of messages: each .mo file is about 100~150kb (uncompressed), so each new language will add to the ~5MB gzip package. Can they be downloaded on request?

I hope these translations are useful. If so, how should I proceed?

Thanks!

@rikivillalba
Contributor

rikivillalba commented Jul 31, 2024

@rffontenelle I think this is meant to say "português" rather than "inglês". What do you think?

@tdhock
Member

tdhock commented Jul 31, 2024

Hi @rikivillalba thanks for volunteering.
Please consult https://contributor.r-project.org/translations/Conventions_for_Languages/Spanish-specific-translations.html to see what the base R translators have done.
Also please contact @rivaquiroga who is planning to apply for awards to support a Spanish translation project, in response to our call https://rdatatable-community.github.io/The-Raft/posts/2023-10-17-translation_announcement-toby_hocking/ (probably best way for you to get an award is to join forces with Riva's team, instead of submitting your own/separate application)

@rffontenelle
Contributor Author

@rffontenelle I think this is meant to say "português" rather than "inglês". What do you think?

@MichaelChirico can you confirm that translators should change the first occurrence of "English" to the target language name in the sentence below?

data.table/R/onAttach.R

Lines 27 to 28 in b8d5f83

if (gettext("TRANSLATION CHECK") != "TRANSLATION CHECK")
packageStartupMessagef("**********\nRunning data.table in English; package support is available in English only. When searching for online help, be sure to also check for the English error message. This can be obtained by looking at the po/R-<locale>.po and po/<locale>.po files in the package source, where the native language and English error messages can be found side-by-side\n**********")
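The check in the snippet above relies on a sentinel string: if looking up "TRANSLATION CHECK" returns anything other than the string itself, a translation catalog is active. The sketch below imitates that logic with a plain named vector standing in for the catalog. It is an assumption-laden illustration: `lookup()` and `is_translated()` are hypothetical stand-ins for what `gettext()` does, and the catalog contents are invented.

```r
# Hypothetical stand-in for gettext(): return the translation if the
# catalog has one, otherwise return the msgid unchanged.
lookup <- function(msgid, catalog) {
  translated <- catalog[msgid]
  if (is.na(translated)) msgid else unname(translated)
}

# No catalog loaded (English session) vs. an invented pt_BR catalog.
catalog_none <- character()
catalog_pt_BR <- c("TRANSLATION CHECK" = "translated")

# Same comparison as in onAttach.R, applied to the stand-in lookup.
is_translated <- function(catalog) {
  lookup("TRANSLATION CHECK", catalog) != "TRANSLATION CHECK"
}

english_session <- is_translated(catalog_none)
pt_BR_session <- is_translated(catalog_pt_BR)
```

Under this scheme the sentinel only needs to be translated to something different from the original; the start-up message itself is then shown in the target language, which is why the language name inside it matters.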

@rikivillalba
Contributor

Relevant thread: #3935

I think it will also be helpful to add a note to the start-up message about how to find help when going from non-English to English. One way is to provide a link to the .po files, explaining how to convert the error message to English before proceeding. Or perhaps numbering errors? I'm not sure of the best way here.

@MichaelChirico
Member

@rffontenelle I think this is meant to say "português" rather than "inglês". What do you think?

yes please!

I'm from Argentina.

Please just use es domain for now, if we get volunteers that want to differentiate other dialects later, we can add dialectal variants.

(similarly maybe pt_BR should actually be just pt? I don't know enough about how different pt_PT would be; cc @rffontenelle)

Can they be downloaded on request?

Good point! I was operating under the assumption that the package po/ directory is not bundled up in the tar for CRAN, but I checked and indeed it is!

download.file("https://cran.r-project.org/src/contrib/data.table_1.15.4.tar.gz", tmp<-tempfile())
writeLines(grep("data.table/po/.", untar(tmp, list=TRUE), value=TRUE))
# data.table/po/R-zh_CN.po
# data.table/po/data.table.pot
# data.table/po/zh_CN.po
# data.table/po/R-data.table.pot

Filed #6331 to remove these -- all the installed package needs is the compressed .mo files in inst/po, which are O(100K) per language. I don't think we need to consider excluding the .mo files from the CRAN installation, for now.

@rikivillalba
Contributor

rikivillalba commented Aug 1, 2024

Hi @rikivillalba thanks for volunteering.

You're welcome @tdhock. I was not aware of last October's call.
The translations I uploaded to the fork are complete and up to date from my point of view.
I went into the code when in doubt, made several passes, checked with tools::checkPoFile(), passed the msgfmt builds and installed my own data.table in Spanish. It is, however, my work alone, not reviewed by anyone else. So I'm willing to join efforts with @rivaquiroga and other people who have been working on translations.
Thanks.

@MichaelChirico
Member

please open a PR when you're ready! and immense thanks!

@Nj221102
Contributor

Nj221102 commented Aug 1, 2024

please open a PR when you're ready! and immense thanks!

Hi, since a lot of translation work is going on right now, I was wondering if there is a Hindi translation team? I would like to help out if there is one; if there is none, I might start one on my own as well. WDYT? @MichaelChirico @tdhock

@rffontenelle
Contributor Author

(similarly maybe pt_BR should actually be just pt? I don't know enough about how different pt_PT would be; cc @rffontenelle)

I can't say for sure. Even though they are close languages and the reader from one language might be able to understand the other, the differences make it a little bit weird or unnatural.

My assumption is that the user of data.table and R is not new to English (so many books and man pages are in English!), hence seeing pt_BR when expecting pt_PT (or the original English) might be a little bit annoying.

Can this question be made public somewhere to try to reach the target audience from Portugal and other Portuguese-speaking countries?

@tdhock
Member

tdhock commented Aug 1, 2024

@Nj221102 you should ask @SaranjeetKaur about joining the Hindi translation team.

@leofontenelle
Contributor

If possible please write what you did on the wiki, https://github.com/Rdatatable/data.table/wiki/Translations#software-tools

Just did that. After making split.R more general and creating a combine.R (actually we used a Python tool), I tested whether they would reproduce the combined R-pt_BR.po and pt_BR.po files, but the next users should use the scripts with care. The wiki did not allow me to attach the scripts, so I added them at the bottom of the page.
