The pronoun "they" can also be singular in English #37

Open: wants to merge 1 commit into base: main


jonorthwash
Member

No description provided.

@ftyers
Member

ftyers commented Apr 24, 2021

  • Have you tested this with existing pairs?
  • Will this require retraining the tagger?
  • What are the regressions?

@jonorthwash
Member Author

> • Have you tested this with existing pairs?

Not yet.

> • Will this require retraining the tagger?

Probably.

> • What are the regressions?

Not yet known; it's a proof of concept.

@ftyers
Member

ftyers commented Apr 24, 2021

Adding in ambiguity usually means having to do something with it, otherwise it will likely make things worse. I have no disagreement with the principle, but this requires more thought than just adding an analysis.

@jonorthwash
Member Author

Yes, as with most commits, this needs additional work downstream.

@ftyers
Member

ftyers commented Apr 24, 2021

> Yes, as with most commits, this needs additional work downstream.

I encourage you to do them and attach them to this PR.

@jonorthwash
Member Author

>> Yes, as with most commits, this needs additional work downstream.
>
> I encourage you to do them and attach them to this PR.

I am not a dev for most of the pairs this will affect. Community buy-in here is a must.

@ftyers
Member

ftyers commented Apr 24, 2021

No-one is a dev for most of the pairs with English.

@jonorthwash
Member Author

According to @mr-martian, pairs with nor, fin, gle, hye, kmr, and gl mention 3rd person pronouns in their dix files, but don't hardcode <pl>.

@jonorthwash
Member Author

None of these are production-level pairs.

@xavivars
Member

xavivars commented Apr 24, 2021

At least @MarcRiera, who actively contributes to eng-cat, should take a look.

@MarcRiera
Member

apertium-eng-cat has all pronouns in the bidix with gender and number tags, so adding this won't break anything. The pair will simply ignore the existence of singular "they".

I'm for the change, but I'd rather add it with <sp> as the number instead of <sg>. This is what we already do with "you": we only analyse it as <sp> and allow generation via <sp>, <sg> and <pl>. Considering that "they" still behaves morphologically as plural, I'm not sure there'd be any significant advantage in having two forms and needing to choose one explicitly during disambiguation. Except in very specific cases where antecedents make it clear it's singular, we'd default to plural.
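A rough sketch of what the <sp> suggestion might look like in the English monodix, modelled on the "you" entries in the diff excerpt further down. It assumes the same `prpers` lemma and tag order as those entries and shows only the `f` gender; since the existing entries enumerate genders separately, these would presumably be repeated per gender tag. An entry with no `r` attribute is used for both analysis and generation, while `r="RL"` restricts an entry to generation. A `<sg>` generation entry is deliberately left out here, because `<p3><f><sg>` already generates "she" and a second entry would make that request ambiguous.

```xml
<!-- Hypothetical sketch: "they" with number-unspecified <sp>, analogous to "you" -->
<!-- No r attribute: used for both analysis and generation -->
<e><p><l>they</l> <r>prpers<s n="prn"/><s n="subj"/><s n="p3"/><s n="f"/><s n="sp"/></r></p></e>
<!-- r="RL": generation only, so transfer output asking for <pl> still produces "they" -->
<e r="RL"><p><l>they</l> <r>prpers<s n="prn"/><s n="subj"/><s n="p3"/><s n="f"/><s n="pl"/></r></p></e>
```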

Besides the disambiguation aspects, this requires transfer modifications in each pair to recognise the singular pronoun and translate it accordingly (in English-Catalan, where explicit pronouns are usually removed, it should be simple).

"They/them/their/theirs" also appear in a few paradigms not covered by the initial commit, but there's time to change them once we agree what to do.

@jonorthwash
Member Author

I like the idea of using <sp>. This will affect pairs though.

It's worth noting that in at least some varieties of English (including mine), one place where <sg> and <pl> forms differ is "themself"/"themselves".

@MarcRiera
Member

> I like the idea of using <sp>. This will affect pairs though.
>
> It's worth noting that in at least some varieties of English (including mine), one place where <sg> and <pl> forms differ is "themself"/"themselves".

Sorry for the late reply.

The most affected pairs (with testvoc errors) would be the ones already mentioned that reference <p3> but don't have <pl> hardcoded. So either way they will require changes, both in the bidix and in transfer rules. Changes would also be necessary for pairs hardcoding <pl> in analysis, to avoid marking "them" as an unknown word.

Regarding "themself"/"themselves", there are currently two entries, one for each. The same happens for "yourself"/"yourselves". I don't see any problem there, other than a possible simplification by combining them into a single entry.

@flammie
Member

flammie commented May 11, 2021

> According to @mr-martian, pairs with nor, fin, gle, hye, kmr, and gl mention 3rd person pronouns in their dix files, but don't hardcode <pl>.

fwiw fin-eng is not production/stable and can be updated as necessary: it was mostly experiments for WMT, and since WMT doesn't do fin-eng anymore, it's even less important in a way. +1 for defaulting to gender-neutral pronouns in the absence of evidence otherwise.

<e r="RL"><p><l>you</l> <r>prpers<s n="prn"/><s n="subj"/><s n="p2"/><s n="m"/><s n="sp"/></r></p></e>
<e r="RL"><p><l>I</l> <r>prpers<s n="prn"/><s n="subj"/><s n="p1"/><s n="f"/><s n="sg"/></r></p></e>
<e r="RL"><p><l>you</l> <r>prpers<s n="prn"/><s n="subj"/><s n="p2"/><s n="f"/><s n="sg"/></r></p></e>
<e r="RL"><p><l>we</l> <r>prpers<s n="prn"/><s n="subj"/><s n="p1"/><s n="f"/><s n="pl"/></r></p></e>
<e r="RL"><p><l>you</l> <r>prpers<s n="prn"/><s n="subj"/><s n="p2"/><s n="f"/><s n="pl"/></r></p></e>
<e r="RL"><p><l>they</l> <r>prpers<s n="prn"/><s n="subj"/><s n="p3"/><s n="f"/><s n="pl"/></r></p></e>
<e r="RL"><p><l>they</l> <r>prpers<s n="prn"/><s n="subj"/><s n="p3"/><s n="f"/><s n="sg"/></r></p></e>
Contributor


This will result in ^prpers<prn><subj><p3><f><sg>$ generating she/they and similarly for line 881.
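One possible way to address this, offered only as a suggestion and not as what the PR does: keep the singular reading of "they" but restrict the entry to analysis with `r="LR"`, so that `^prpers<prn><subj><p3><f><sg>$` generates only "she". The sketch copies the tags from the entry under review; the analysis of "they" would still be ambiguous between <sg> and <pl>, which is the downstream disambiguation work raised earlier in the thread.

```xml
<!-- r="LR": analysis only; "they" can be read as <p3><f><sg>, -->
<!-- but the generator no longer outputs "they" for <p3><f><sg> -->
<e r="LR"><p><l>they</l> <r>prpers<s n="prn"/><s n="subj"/><s n="p3"/><s n="f"/><s n="sg"/></r></p></e>
```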

Labels: None yet
Projects: None yet
Development (linked issues): None yet

6 participants