Remove gender bias (metaissue) #29

jonorthwash · 2021-04-24T14:41:28Z

Gender bias permeates Apertium language modules and translation pairs. The following are some examples:

* In languages with gendered pronouns, articles, etc., the masculine form is usually used as the lemma;
* In translation from a language without gendered pronouns to one with, the masculine pronoun is usually chosen as the default translation, to the exclusion of non-gendered options (such as singular "they" in English);
* In translating from a non-gendered noun in one language to a "natural-gendered" noun in another language, the masculine member of the pair is usually selected;
* In languages where a "masculine plural" form is used to refer to mixed-gender groups (e.g., Spanish "los abuelos"), this is usually tagged as `<m><pl>` instead of `<mf><pl>`;
* Some pronouns used to refer to non-binary individuals are not included, or the non-binary analyses are not among those available and are not used in translation pairs (such as English singular "they").

This likely just scratches the surface. I'd like to use this issue as a place to gather general observations, carry out general discussion, and link to individual issues that address these problems in specific language modules and translation pairs.


jonorthwash · 2021-04-24T15:01:34Z

The following PR is a proof-of-concept that addresses part of the non-binary pronoun issue in English: apertium/apertium-eng#37.

unhammer · 2021-04-24T18:41:06Z

An example from the "gisting pair" sme-nob:

Northern Sámi uses the same pronoun son (gen/acc su) for all genders[0], while Norwegian has to choose between hun/han/hen (<f>/<m>/<mfntlgbtq_> resp.).

In sme→nob, the first version pretty much just used han. Then we got some naïve t2x anaphora resolution that chose between han and hun based on proper nouns, which was wrong 30% of the time. When people started using it on news articles, Giellatekno was contacted by people who were upset that the translator had used the wrong pronoun. So we changed the output to h_n to signal that we don't know any better, and later to hun/han, which, though a bit verbose, people seemed to like better. (Unlike in Swedish, we can't really use hen for "we don't know"; if you use hen in Norwegian for a named individual, it signals that they explicitly prefer that term over han/hun.)

It's not a good solution for quality pairs used for post-editing, but then we don't have any of those going into English anyway …
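To make the fallback concrete, here is a minimal Python sketch of the idea described above: emit a specific pronoun only when the referent's gender is actually known, and otherwise emit the explicit hun/han form. The function name, the gender keys, and the lookup table are hypothetical and only illustrate the decision; this is not how the sme-nob transfer rules are written.

```python
from typing import Optional

# Hypothetical lookup: only used when the referent's gender is known.
# "hen" is reserved for people who explicitly prefer it, per the note above.
KNOWN_PRONOUNS = {"f": "hun", "m": "han", "prefers_hen": "hen"}

# Output used when nothing is known about the referent.
UNKNOWN_FALLBACK = "hun/han"


def third_person_pronoun(gender: Optional[str]) -> str:
    """Return a Norwegian pronoun, or the explicit fallback if gender is unknown."""
    if gender is None:
        return UNKNOWN_FALLBACK
    return KNOWN_PRONOUNS[gender]


print(third_person_pronoun(None))
print(third_person_pronoun("f"))
```

The point of the design is that guessing is never silent: an unknown gender always produces the visibly ambiguous form rather than a default.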

[0] so now my Twitter bio says son/su (if you want to refer to me in the third person, you have to speak Sámi)

unhammer · 2021-04-24T18:44:26Z

Oh, another item for your list:

* Use gender-unspecific job titles etc. where possible and natural (e.g. in Norwegian, we say _ombudsperson_ now, not _ombudsmann_)

jonorthwash · 2021-04-24T19:06:34Z

This paper has some thoughts about this.

xavivars · 2021-04-24T22:13:23Z

On top of the actual changes to the output of the translator, something we've done at Softcatalà for the neural version of eng-cat is to try to detect words that have no associated gender in English but do in Catalan, and to show a "warning" in the UI.

Probably, this information could be sent through the pipeline using blanks, and then optionally shown in the UI or on the command line.

Do you think that's a good idea, and something doable?

ftyers · 2021-04-24T22:17:21Z

@xavivars I like this solution. It would also be cool to have a switch (like with Unhammer's thing) to prefer m/f or neutral terms.

mr-martian · 2021-04-24T23:56:14Z

The implementation that comes to mind for what @xavivars describes is to have some way of specifying in the bidix that a particular word-bound blank should be attached to a particular entry.
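As a rough illustration of that idea, the Python sketch below attaches an optional note to individual dictionary entries and collects the notes during lookup so that a UI or command-line tool could display them. Everything here is an assumption made for illustration: the `Entry` class, the toy `BIDIX` table, and the `translate` function are hypothetical, and the sketch does not use the real bidix format or the real word-bound blank syntax.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    target: str
    note: str = ""  # hypothetical per-entry annotation, empty if none


# Toy eng→cat dictionary; the entries are illustrative only.
BIDIX = {
    "teacher": Entry("professor", note="gender-ambiguous: professor/professora"),
    "house": Entry("casa"),
}


def translate(words):
    """Translate word by word, returning the output text and any attached notes."""
    output, warnings = [], []
    for word in words:
        entry = BIDIX.get(word)
        if entry is None:
            output.append("*" + word)  # mark unknown words
            continue
        output.append(entry.target)
        if entry.note:
            warnings.append((word, entry.note))
    return " ".join(output), warnings


text, warnings = translate(["teacher", "house"])
print(text)
for word, note in warnings:
    print(f"warning: {word}: {note}")
```

Keeping the notes separate from the output text means a consumer that does not care about them can ignore them entirely.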

flammie · 2021-04-25T00:45:02Z

For languages like German, at least, gender selection is pretty much required for all human nouns or doer nouns, not just jobs. I initially just translated the female versions to feminine derivations in deu-fin, but that can get quite silly, so it would be nice to have a solution with user preferences, probably defaulting to neutral forms. How would this be defined, in bidix per entry?
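One possible shape for a per-entry definition, sketched in Python rather than in bidix syntax: each entry lists its available variants, and a user preference picks one, defaulting to the neutral form. The `VARIANTS` table, the preference keys, and `choose_form` are hypothetical names used only for this illustration.

```python
# Hypothetical per-entry variant table for an eng→deu pair.
VARIANTS = {
    "teacher": {"m": "Lehrer", "f": "Lehrerin", "neutral": "Lehrkraft"},
}


def choose_form(lemma: str, preference: str = "neutral") -> str:
    """Pick the preferred variant, falling back to neutral, then to any variant."""
    forms = VARIANTS[lemma]
    if preference in forms:
        return forms[preference]
    if "neutral" in forms:
        return forms["neutral"]
    # No neutral form listed: fall back to the first variant in the entry.
    return next(iter(forms.values()))


print(choose_form("teacher"))
print(choose_form("teacher", preference="f"))
```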

jonorthwash · 2021-04-25T01:30:08Z

This conversation mostly seems to be relevant to issue #31.

hectoralos · 2021-04-25T05:22:49Z

* Use gender-unspecific job titles etc. where possible and natural (e.g. in Norwegian, we say _ombudsperson_ now, not _ombudsmann_)

I agree, but this causes problems if we try to choose collective forms that are singular instead of plural. For example, in Spanish, to avoid the words "alumnos" and "profesores" (`<n><m><pl>`), which a certain number of people perceive as gendered, the collective nouns "alumnado" and "profesorado" (`<n><m><sg>`) are often used. The problem is that they often require "long-distance" changes to agreeing verbs and other words, which is not trivial, e.g.:

The students in Mr Smith's class are tall.
Los alumnos de la clase del señor Smith son altos.
El alumnado de la clase del señor Smith es alto.
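The agreement problem in the example above can be shown with a toy Python sketch: the number of the subject noun decides the form of the verb and adjective at the far end of the sentence, so swapping the noun alone is not enough. The tables and the `build_sentence` function are hypothetical and cover only the words in this example; real transfer rules would have to find the agreeing words across an arbitrary distance.

```python
# Toy data for the example sentence only.
SUBJECTS = {
    "alumnos": ("Los alumnos", "pl"),
    "alumnado": ("El alumnado", "sg"),
}
COPULA = {"sg": "es", "pl": "son"}
ADJECTIVE = {"sg": "alto", "pl": "altos"}


def build_sentence(noun: str) -> str:
    """Build the sentence so that verb and adjective agree with the subject's number."""
    noun_phrase, number = SUBJECTS[noun]
    return (
        f"{noun_phrase} de la clase del señor Smith "
        f"{COPULA[number]} {ADJECTIVE[number]}."
    )


print(build_sentence("alumnos"))
print(build_sentence("alumnado"))
```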

jonorthwash mentioned this issue Apr 24, 2021

Audit all pairs for gender-specificity in job titles #31

Open
