@FredericBlum
Collaborator

No description provided.

@FredericBlum
Collaborator Author

@xrotwang @LinguList Here is the PR. However, I am not 100% happy. It correctly fixes missing or wrong macroareas. In other cases, however, it removes the macroarea, because the associated glottocode is at the family level rather than the language level and has no macroarea in glottolog-cldf. How do we handle this?

@LinguList
Contributor

The fix would be easy, but of course it risks causing problems later on:

"Macroarea": macroarea[0].name if macroarea else language.macroarea,

How many languages suffer from empty macroareas now? This is also important. We exclude so far:

  • languages without coordinates
  • languages without glottocode

Should we extend this?

@LinguList
Contributor

Here's a quick hack:

macroarea = languoids[language.glottocode].macroareas
if not macroarea:
    # No macroarea on the languoid itself (e.g. a family-level glottocode):
    # take the most frequent macroarea among its descendants.
    macs = [l.macroareas[0].name for l in languoids[language.glottocode].iter_descendants() if l.macroareas]
    macroarea = sorted(set(macs), key=lambda x: macs.count(x), reverse=True)[0]
else:
    macroarea = macroarea[0].name
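
A self-contained sketch of the same fallback, for trying the idea outside the dataset code. `Node` and `majority_macroarea` are hypothetical stand-ins invented for illustration, not pyglottolog objects, and macroareas are plain strings here. Unlike the quick hack, it returns `None` when no descendant has a macroarea and breaks ties by name so the result is deterministic.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a languoid; macroareas are plain strings here."""
    macroareas: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    def iter_descendants(self):
        # Depth-first walk over all languoids below this one.
        for child in self.children:
            yield child
            yield from child.iter_descendants()


def majority_macroarea(node: Node) -> Optional[str]:
    """Return the node's own macroarea, else the most frequent one among descendants."""
    if node.macroareas:
        return node.macroareas[0]
    counts = Counter(d.macroareas[0] for d in node.iter_descendants() if d.macroareas)
    if not counts:
        return None  # nothing to infer from; the caller decides how to handle this
    # Sort by descending count, then by name, so that ties resolve deterministically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0][0]


# A family-level node without its own macroarea and three daughter languages.
family = Node(children=[Node(["Eurasia"]), Node(["Eurasia"]), Node(["Africa"])])
print(majority_macroarea(family))
```
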

@LinguList
Contributor

Alternative: determine valid macroareas after having defined all languoids:

        languoids = self.glottolog.cached_languoids
        valid_macs = {l.macroareas[0].name for l in languoids.values() if l.macroareas}

In the function _add_language then:

"Macroarea": language.macroarea if language.macroarea in valid_macs else languoids[language.glottocode].macroareas[0].name,

But this may still yield an error if a macroarea is invalid and the glottocode represents a language that has no macroarea.
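
To make that failure case explicit rather than an `IndexError`, here is a minimal sketch under simplifying assumptions: the catalog is modelled as a plain mapping from glottocode to a list of macroarea names, and `valid_macroareas`, `pick_macroarea`, and the glottocodes are hypothetical names for illustration only.

```python
from typing import Mapping, Optional, Sequence, Set


def valid_macroareas(languoid_areas: Mapping[str, Sequence[str]]) -> Set[str]:
    """Collect every macroarea name attested for at least one languoid."""
    return {areas[0] for areas in languoid_areas.values() if areas}


def pick_macroarea(given: Optional[str], glottocode: str,
                   languoid_areas: Mapping[str, Sequence[str]]) -> str:
    """Keep a valid given value, else fall back to the catalog, else fail loudly."""
    valid = valid_macroareas(languoid_areas)
    if given in valid:
        return given
    areas = languoid_areas.get(glottocode, ())
    if not areas:
        # Invalid given value and no catalog macroarea: report which entry is affected.
        raise ValueError(f"no macroarea for {glottocode!r} (given value: {given!r})")
    return areas[0]


# Hypothetical glottocodes mapped to their catalog macroareas.
catalog = {"aaaa1234": ["Eurasia"], "bbbb1234": ["Africa"], "cccc1234": []}
print(pick_macroarea("Europe", "bbbb1234", catalog))
```

The error message names the glottocode, so an affected entry can be traced and corrected in the source data.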

@FredericBlum
Collaborator Author

Given that we only have 6 valid macro-areas, we might as well hardcode them to avoid the error, and only replace the given option if the current value deviates from the set of valid areas.
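
A sketch of what the hardcoded variant could look like. The six names are Glottolog's macroareas; `VALID_MACROAREAS` and `checked_macroarea` are hypothetical names, and the fallback value is assumed to come from the Glottolog lookup discussed above.

```python
from typing import Optional

# The six macroarea names used by Glottolog.
VALID_MACROAREAS = frozenset(
    {"Africa", "Australia", "Eurasia", "North America", "Papunesia", "South America"}
)


def checked_macroarea(given: Optional[str], fallback: Optional[str]) -> Optional[str]:
    """Keep `given` when valid; otherwise use `fallback` if that is valid, else None."""
    if given in VALID_MACROAREAS:
        return given
    return fallback if fallback in VALID_MACROAREAS else None


print(checked_macroarea("Europe", "Eurasia"))
```

A `None` result marks an entry where neither value is usable, which is the case that would need fixing upstream.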

@FredericBlum
Collaborator Author

But this may still yield an error if a macroarea is invalid and the glottocode represents a language that has no macroarea.

Well, at least we identify the problem then and can solve it upstream, right? So I think this is a fine solution. If we find an error, we can fix it.

@LinguList
Contributor

Yes. We could in theory -- but that would lead too far -- even identify macroarea by geolocation (would be more exact).

@xrotwang
Contributor

FWIW: glottolog/glottolog-cldf#29

@LinguList
Contributor

Ah, nice to see that the discussion contributes to Glottolog :-)

@LinguList
Contributor

I am now rerunning the code with the proposed solution (the one that iterates over descendants). We only have 42 cases anyway, so I hope we can finally make release 2.1 afterwards.

@LinguList
Contributor

@FredericBlum, I'll make another branch now, where I provide the fix.

@LinguList LinguList merged commit 309ebff into main Mar 27, 2025
4 checks passed
@FredericBlum FredericBlum deleted the fix-areas branch April 9, 2025 09:50