language on releases #96

Open
HughP opened this issue Apr 21, 2022 · 3 comments


HughP commented Apr 21, 2022

Greetings

Currently, the documentation says the following about the language of a release:

language (string, slug): the primary language used in this particular release of the work. Only a single language can be specified; additional languages can be stored in "extra" metadata (TODO: which field?). This field should be a valid RFC1766/ISO639 language code (two letters). AKA, a controlled vocabulary, not a free-form name of the language.

  1. For ISO 639, if you want two-letter codes, ISO 639-1 should be specified. ISO 639 has six parts; the two-letter codes make up part one.

  2. Referencing RFC 1766 is outdated. RFC 1766 was obsoleted by RFC 3066, which was obsoleted by RFC 4646 (with RFC 4647 covering tag matching), which was in turn obsoleted by RFC 5646. The stable way to reference this chain is to cite BCP 47.

  3. Is there a downstream technical reason to limit this field to two characters instead of supporting BCP 47?

@bnewbold
Contributor

For 1 and 2, do you want to send a PR with preferred language? Or I can write something.

For 3, I didn't research this decision particularly deeply. One of the goals for this field was to be able to collect metadata from multiple sources (aka, other catalogs) and store it in a consistent format, even if that means discarding some information from some sources. Another was to be able to aggregate (analytics) and query (search filters) simply. It would probably be possible to have more general-purpose fields and then synthesize them into, e.g., an ISO 639-1 field for querying and analytics. These were different design priorities compared to a more authoritative/complete system like Wikidata or MARC, which are flexible enough to capture as much information as possible about each individual work.
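
As a rough illustration of that "synthesize" idea (a sketch only, not this project's actual code), a general-purpose tag could be stored as-is and a two-letter value derived from it for search filters. The function names and the tiny `ISO639_3_TO_1` table below are hypothetical; a real table would be generated from the ISO 639 registry.

```python
from typing import Optional

# Hypothetical, deliberately tiny lookup table from three-letter to
# two-letter codes; a real one would come from the ISO 639 registry.
ISO639_3_TO_1 = {"eng": "en", "fra": "fr", "deu": "de", "spa": "es"}


def primary_subtag(tag: str) -> str:
    """Return the lowercased first subtag of a BCP 47-style tag."""
    # Some sources write "pt_BR" instead of "pt-BR"; treat both alike.
    return tag.replace("_", "-").split("-")[0].lower()


def to_iso639_1(tag: str) -> Optional[str]:
    """Best-effort two-letter code for querying; None when there is none."""
    primary = primary_subtag(tag)
    # Assumption: any two-letter alphabetic subtag is taken at face value;
    # this sketch does not check it against the real ISO 639-1 list.
    if len(primary) == 2 and primary.isalpha():
        return primary
    return ISO639_3_TO_1.get(primary)
```

With this shape, a tag such as `pt-BR` or `zh-Hant-TW` keeps its full form in the general-purpose field, while the derived field only holds the two-letter part; languages without a two-letter code (Hawaiian, `haw`, for example) end up with no value in the derived field.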

@HughP
Author

HughP commented Apr 21, 2022

BCP 47 uses the two-letter ISO 639-1 code for languages that have one, and falls back to three-letter ISO 639-2 or ISO 639-3 codes for languages that do not. However, if the database field only expects two characters, a three-letter ISO 639-2 or 639-3 code will be rejected as having an unexpected length. So issue number 3 is important, because it affects the design requirements for the infrastructure.
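
To make the length problem concrete, here is a minimal sketch. The two patterns are assumptions standing in for whatever validation the field really uses; they only check the shape of a bare language code, not whether the code is actually registered.

```python
import re

# Assumed stand-in for a field limited to two-letter ISO 639-1 codes.
TWO_LETTER = re.compile(r"[a-z]{2}")
# Assumed stand-in for a field that also admits three-letter codes.
TWO_OR_THREE_LETTER = re.compile(r"[a-z]{2,3}")


def accepts(pattern: re.Pattern, code: str) -> bool:
    """True when the whole code matches the pattern."""
    return pattern.fullmatch(code) is not None


for code in ("en", "haw"):
    print(code, accepts(TWO_LETTER, code), accepts(TWO_OR_THREE_LETTER, code))
```

Hawaiian (`haw`) has no two-letter code, so it fails the two-letter check and passes the wider one, while `en` passes both.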

@HughP
Author

HughP commented Apr 21, 2022

I'm happy to video chat for further clarification.
