Update Languages for "add new instrument name" feature #157

Open
kunfang98927 opened this issue Sep 16, 2024 · 6 comments · May be fixed by #158

@kunfang98927
Contributor

Before I complete the "add new instrument name" feature, I have a question about updating the language list in UMIL.

Based on my design, users can add new names to an instrument in a modal like this:

image

After the "publish" button is clicked, the instrument's "wikidata_id" together with "language_code", "name", and "source" will be saved to the database. A new instrument name is created by the following code in views/instrument_list.py:

InstrumentName.objects.create(
    instrument=instrument,
    language=language,
    name=name,
    source_name=source,
)

According to our model design in models/instrument_name.py,

class InstrumentName(models.Model):
    instrument = models.ForeignKey("Instrument", on_delete=models.CASCADE)
    language = models.ForeignKey("Language", on_delete=models.PROTECT)
    name = models.CharField(max_length=100, blank=False)
    source_name = models.CharField(
        max_length=50, blank=False, help_text="Who or what called the instrument this?"
    )  # Stand-in for source data; format TBD

we should always choose an "instrument" and a "language" from our database when publishing new instrument names. However, we currently have only two languages in our database, "English" and "French". So should we synchronize as many languages as we can with Wikidata, or should we maintain our own language list in UMIL (possibly a subset of Wikidata's language list) that users can choose from when adding new names for an instrument? @fujinaga @dchiller
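As a rough illustration of the constraint described above, here is a minimal, framework-free sketch of the checks a submission would need to pass before the `InstrumentName.objects.create(...)` call. The `NameSubmission` class, the `validate_submission` helper, and the idea of passing in a set of known language codes are all hypothetical; only the length limits (100 and 50) and the non-blank requirement come from the model shown above.

```python
from __future__ import annotations

from dataclasses import dataclass

NAME_MAX_LENGTH = 100    # mirrors InstrumentName.name (max_length=100)
SOURCE_MAX_LENGTH = 50   # mirrors InstrumentName.source_name (max_length=50)


@dataclass(frozen=True)
class NameSubmission:
    """Hypothetical container for one entry submitted from the modal."""
    wikidata_id: str
    language_code: str
    name: str
    source: str


def validate_submission(sub: NameSubmission, known_codes: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the entry can be saved."""
    errors: list[str] = []

    # The language must already exist in our own language list.
    if sub.language_code not in known_codes:
        errors.append(f"unknown language code: {sub.language_code!r}")

    # Both text fields are declared with blank=False in the model.
    if not sub.name.strip():
        errors.append("name must not be blank")
    elif len(sub.name) > NAME_MAX_LENGTH:
        errors.append(f"name exceeds {NAME_MAX_LENGTH} characters")

    if not sub.source.strip():
        errors.append("source must not be blank")
    elif len(sub.source) > SOURCE_MAX_LENGTH:
        errors.append(f"source exceeds {SOURCE_MAX_LENGTH} characters")

    return errors
```

In a real view, `known_codes` would presumably come from the Language table, which is exactly why the size of that table matters for this feature.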

@dchiller
Contributor

dchiller commented Sep 16, 2024

A few comments:

  1. English and French were just chosen as initial options, so I think we should feel free to add more now that we are adding functionality to add more names.

  2. There is a set list of languages that can be used for the "names" of Q objects in Wikidata, so we could just add that set list to the database and periodically update if/as new options are added to Wikidata.

  3. Do we want people to be able to add languages not in Wikidata?

It seems that at the very least all languages in Wikidata should be supported.
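A minimal sketch of what the "periodically update" step in point 2 could look like, assuming both our stored languages and the freshly fetched list are available as plain `code -> name` dictionaries. The function name `plan_language_sync` and the dictionary shape are assumptions for illustration, not existing UMIL code.

```python
from __future__ import annotations


def plan_language_sync(
    existing: dict[str, str],
    fetched: dict[str, str],
) -> tuple[dict[str, str], dict[str, str], list[str]]:
    """Compare stored languages with a freshly fetched list.

    Both arguments map a language code to a display name. Returns:
      - to_add: codes present upstream but not stored yet
      - to_rename: stored codes whose upstream name has changed
      - missing_upstream: stored codes that no longer appear upstream
    """
    to_add = {
        code: name for code, name in fetched.items() if code not in existing
    }
    to_rename = {
        code: name
        for code, name in fetched.items()
        if code in existing and existing[code] != name
    }
    # Only reported, never deleted automatically: InstrumentName.language
    # uses on_delete=PROTECT, so a language that is still referenced
    # cannot simply be removed.
    missing_upstream = sorted(set(existing) - set(fetched))
    return to_add, to_rename, missing_upstream
```

The function only computes a plan; applying it (for example inside a management command) is left out on purpose so the comparison logic stays easy to test.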

@kunfang98927
Contributor Author

Thank you for the comments.

3. Do we want people to be able to add languages not in Wikidata?

No. I think using Wikidata's language list is enough for us.

kunfang98927 added a commit that referenced this issue Sep 17, 2024
@kunfang98927 kunfang98927 linked a pull request Sep 17, 2024 that will close this issue
@kunfang98927
Contributor Author

@dchiller @fujinaga For the question "which languages are supported for adding item name to Wikidata", I haven't figured out the exact answer. Here are some other possible ways to get a language list.

  1. I found a language table at https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all
    This table contains 714 unique QIDs (the third column). One confusing point is which column holds the "language code" we should use. The first column seems the most likely, but its codes do not always correspond one-to-one with languages. For example:
    image
    It seems that both "als" and "gsw" are language codes for "Alemannic", but in different code systems:
    image

So I think if we are going to use this language table, we can just copy it and clean the data as we want.

  2. We can use the Wikidata API to get a language list. One possible approach, suggested by ChatGPT, is: https://www.wikidata.org/w/api.php?action=query&meta=siteinfo&siprop=languages&format=json. This returns a list of languages (600 items) with their respective language codes.

Of these two methods, I prefer the first: creating our own clean language table.
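For comparison, here is a small sketch of how the response from the second method could be turned into a `code -> name` mapping. It assumes the JSON has the shape `query.languages[...]`, with each entry carrying a `code` and the language name under `"*"` (the legacy JSON format) or under `"name"` (which I believe is what `formatversion=2` uses). The function name is hypothetical, and the network request itself is deliberately left out.

```python
from __future__ import annotations

import json


def parse_siteinfo_languages(payload: str) -> dict[str, str]:
    """Extract {language_code: language_name} from a siteinfo JSON response."""
    data = json.loads(payload)
    languages = data.get("query", {}).get("languages", [])

    result: dict[str, str] = {}
    for entry in languages:
        code = entry.get("code")
        # Assumption: the name is stored under "*" in the legacy format
        # and under "name" when formatversion=2 is requested.
        name = entry.get("name") or entry.get("*")
        if code and name:
            result[code] = name  # entries lacking either field are skipped
    return result
```

Note that this list describes the languages known to the MediaWiki installation, which is not necessarily the same set as the languages accepted for item labels, so it would still need to be checked against the table from the first method.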

@fujinaga
Member

Method 1 is fine.
We should always use ISO language codes.
Have you looked at this? https://www.mediawiki.org/wiki/Extension:UniversalLanguageSelector
When asking for the name (should be called "label" as in Wikidata, so "Name/Label"(?)), make sure you ask for the Description and optionally "Also known as".

@dchiller
Contributor

For the question "which languages are supported for adding item name to Wikidata", I haven't figured out the exact answer

I also found this incredibly difficult to determine definitively. The results of my research are in #27 -- you found some of the same ways I did.

It seems that both "als" and "gsw" are language codes for "Alemannic" but within different code system.

I actually think this is a case of Wikidata being wrong and therefore maybe a point against this table (because it relies on the contents of Wikidata to populate). In the Universal Language Selector, als looks like it refers to a dialect of Albanian.
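If the table is copied and cleaned by hand, cases like "als"/"gsw" could at least be surfaced automatically for review. The sketch below groups codes by QID and reports every QID that has more than one code; the row format (pairs of code and QID) and the function name are assumptions about how the copied table might be represented.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def find_ambiguous_qids(rows: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Return {qid: sorted codes} for every QID listed under several codes.

    `rows` is an iterable of (language_code, qid) pairs, e.g. the first and
    third columns of the copied language table.
    """
    codes_by_qid: defaultdict[str, set[str]] = defaultdict(set)
    for code, qid in rows:
        codes_by_qid[qid].add(code)

    # Keep only the QIDs that need a human decision about which code to use.
    return {
        qid: sorted(codes)
        for qid, codes in codes_by_qid.items()
        if len(codes) > 1
    }
```

Each flagged QID can then be checked against an ISO-based source before deciding which code, if any, goes into our table.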

@dchiller
Contributor

This is also maybe a useful tool (there's a link to the codebase which we could potentially pull from): https://codelookup.toolforge.org/
