Remove gender bias (metaissue) #29

Open
jonorthwash opened this issue Apr 24, 2021 · 10 comments

Comments

@jonorthwash
Member

jonorthwash commented Apr 24, 2021

Gender bias permeates Apertium language modules and translation pairs. The following are some examples:

  • In languages with gendered pronouns, articles, etc., the masculine form is usually used as the lemma;
  • In translation from a language without gendered pronouns to one with, the masculine pronoun is usually chosen as the default translation, to the exclusion of non-gendered options (such as singular "they" in English);
  • In translating from a non-gendered noun in one language to a "natural-gendered" noun in another language, the masculine member of the pair is usually selected;
  • In languages where a "masculine plural" form is used to refer to mixed-gender groups (e.g., Spanish "los abuelos"), this is usually tagged as <m><pl> instead of <mf><pl>;
  • Some pronouns used to refer to non-binary individuals are missing entirely, or their non-binary analyses are unavailable and therefore unused in translation pairs (such as English singular "they").

This likely just scratches the surface. I'd like to use this issue as a place to gather general observations, carry out general discussion, and link to individual issues that address these problems in specific language modules and translation pairs.

@jonorthwash
Member Author

The following PR is a proof-of-concept that addresses part of the non-binary pronoun issue in English: apertium/apertium-eng#37 .

@unhammer
Member

unhammer commented Apr 24, 2021

An example from the "gisting pair" sme-nob:

Northern Sámi uses the same pronoun son (gen/acc su) for all genders[0], while Norwegian has to choose between hun/han/hen (<f>/<m>/<mfntlgbtq_> resp.).

In sme→nob, the first version pretty much just used han. Then we got some naïve t2x anaphora resolution that chose between han/hun based on proper nouns, which was wrong 30% of the time, and when people started using it on news articles, Giellatekno would be contacted by people who were upset because the translator used the wrong pronoun. So we changed it to h_n to signal that we don't know any better, and later to hun/han which, though a bit verbose, people seemed to like better. (Unlike in Swedish, we can't really use hen for "we don't know"; if you use hen in Norwegian for a named individual it signals that they explicitly prefer that term over han/hun.)

It's not a good solution for quality pairs used for post-editing, but then we don't have any of those going into English anyway …
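
For illustration only, the "say hun/han when we don't know" fallback could be sketched in Python roughly as below. This is not how the sme-nob pair is actually implemented; the function name, the gender labels, and the lookup table are all hypothetical.

```python
from typing import Optional

# Hypothetical gender labels mapped to Norwegian third-person pronouns.
# "hen" is only chosen when the referent is known to prefer it.
NOB_PRONOUNS = {"f": "hun", "m": "han", "nb": "hen"}

# Explicit "we don't know" output, as described in the comment above.
UNKNOWN_FALLBACK = "hun/han"


def translate_son(known_gender: Optional[str] = None) -> str:
    """Pick a Norwegian pronoun for Northern Sámi 'son'.

    known_gender is None when nothing reliable in the context tells us
    which pronoun the referent uses. In that case we return the
    verbose fallback instead of guessing.
    """
    if known_gender is None:
        return UNKNOWN_FALLBACK
    # Unrecognised labels also fall back rather than defaulting to "han".
    return NOB_PRONOUNS.get(known_gender, UNKNOWN_FALLBACK)
```

The point of the design is that a wrong guess is worse than a verbose answer, so every uncertain path ends in the fallback.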


[0] so now my Twitter bio says son/su (if you want to refer to me in the third person, you have to speak Sámi)

@unhammer
Member

unhammer commented Apr 24, 2021

Oh, another item for your list:

  • Use gender-unspecific job titles etc. where possible and natural (e.g. in Norwegian, we say ombudsperson now, not ombudsmann)

@jonorthwash
Member Author

This paper has some thoughts about this.

@xavivars
Member

On top of the actual changes to the output of the translator, something we've done at Softcatalà for the neural version of eng-cat is to detect words that have no associated gender in English but do in Catalan, and show a "warning" in the UI.

image

Probably, this information could be sent through the pipeline using blanks, and then optionally shown in the UI or on the command line.
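
As a rough sketch of the detection step (not Softcatalà's actual code, and not using blanks), something like the following Python would do. The toy lexicon, the function name, and the warning text are all assumptions made up for the example. A word is flagged only when it has more than one gendered target form, so nouns with purely grammatical gender do not trigger a warning.

```python
from typing import Dict, List

# Hypothetical toy lexicon: English lemma -> Catalan forms keyed by gender.
ENG_CAT: Dict[str, Dict[str, str]] = {
    "teacher": {"m": "professor", "f": "professora"},
    "doctor": {"m": "metge", "f": "metgessa"},
    "house": {"f": "casa"},  # grammatical gender only, no alternative form
}


def gender_warnings(source_words: List[str]) -> List[str]:
    """Return one warning per source word with several gendered targets."""
    warnings = []
    for word in source_words:
        forms = ENG_CAT.get(word.lower(), {})
        if len(forms) > 1:
            # Sort the gender keys so the output order is deterministic.
            options = "/".join(forms[g] for g in sorted(forms))
            warnings.append(
                f"'{word}' has no gender in English; Catalan options: {options}"
            )
    return warnings


if __name__ == "__main__":
    for line in gender_warnings(["The", "teacher", "bought", "a", "house"]):
        print(line)
```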

Do you think that's a good idea, and something doable?

@ftyers
Member

ftyers commented Apr 24, 2021

@xavivars I like this solution, it would also be cool to have a switch (like with Unhammer's thing) to prefer m/f or neutral terms.

@mr-martian
Contributor

The implementation that comes to mind for what @xavivars describes is to have some way of specifying in the bidix that a particular word-bound blank should be attached to a particular entry.

@flammie
Member

flammie commented Apr 25, 2021

For languages like German, gender selection is pretty much required for all human nouns, or doer nouns at least, not just jobs. I initially just translated the female versions to feminine derivations in deu-fin, but that can get quite silly, so it would be nice to have a solution with user preferences, probably defaulting to neutral forms. How would this be defined, in bidix per entry?
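
To make the question concrete, here is a minimal Python sketch of what "per-entry candidates plus a user preference defaulting to neutral" might look like. It says nothing about actual bidix syntax; the dictionary layout, the labels `neutral` and `f`, and the function name are hypothetical, and the Finnish forms are only illustrative.

```python
from typing import Dict

# Hypothetical per-entry data: German lemma -> Finnish candidates by label.
# Every entry is assumed to carry a "neutral" candidate.
DEU_FIN: Dict[str, Dict[str, str]] = {
    "Lehrerin": {"neutral": "opettaja", "f": "opettajatar"},
    "Lehrer": {"neutral": "opettaja"},
}


def choose_translation(lemma: str, preference: str = "neutral") -> str:
    """Return the candidate matching the preference, else the neutral one."""
    candidates = DEU_FIN[lemma]
    if preference in candidates:
        return candidates[preference]
    # The entry has no form for this preference: fall back to neutral.
    return candidates["neutral"]
```

With no preference set, `choose_translation("Lehrerin")` gives the neutral form; the feminine derivation only appears when the user asks for it and the entry provides one.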

@jonorthwash
Member Author

This conversation mostly seems to be relevant to issue #31.

@hectoralos
Member

hectoralos commented Apr 25, 2021

> Use gender-unspecific job titles etc. where possible and natural (e.g. in Norwegian, we say ombudsperson now, not ombudsmann)

I agree, but this causes problems if we try to choose collective forms which are singular instead of plural. For example, in Spanish, to avoid the words "alumnos" and "profesores" (<n><m><pl>), which are perceived by a certain number of people as gendered, the collective nouns "alumnado" and "profesorado" (<n><m><sg>) are often used. The problem is that they often require "long-distance" changes to agreeing verbs and other words, which is not trivial, e.g.:

The students in Mr Smith's class are tall.
Los alumnos de la clase del señor Smith son altos.
El alumnado de la clase del señor Smith es alto.
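
A toy Python sketch of why this is non-local: swapping the subject noun changes its number, and that number has to be propagated to the article, the verb, and the predicate adjective, even though a prepositional phrase sits in between. The inflection table, the sentence template, and the function name are invented for this one example and have nothing to do with how Apertium transfer rules are written.

```python
from typing import Dict, List, Tuple

# Hypothetical toy inflection table: (lemma, number) -> surface form.
FORMS: Dict[Tuple[str, str], str] = {
    ("el", "pl"): "los", ("el", "sg"): "el",
    ("ser", "pl"): "son", ("ser", "sg"): "es",
    ("alto", "pl"): "altos", ("alto", "sg"): "alto",
}


def realise(subject: str, number: str) -> str:
    """Build the example sentence with everything agreeing with the subject.

    Each template item is (text, agrees); items marked True are lemmas
    that must be inflected for the subject's number.
    """
    template: List[Tuple[str, bool]] = [
        ("el", True),
        (subject, False),
        ("de la clase del señor Smith", False),
        ("ser", True),
        ("alto", True),
    ]
    words = [FORMS[(text, number)] if agrees else text
             for text, agrees in template]
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


if __name__ == "__main__":
    print(realise("alumnos", "pl"))
    print(realise("alumnado", "sg"))
```

Here three of the five template slots depend on the choice of noun, which is the part a purely lexical substitution in the bidix would miss.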


7 participants