Disambiguate two Felix Schneider: KIT vs. Uni Jena #6531

weissenh · 2025-11-19T15:35:52Z

cf. #4369 Felix Schneider (graduated from KIT, now at Zoom) asked for disambiguation

Status quo

There is a single page, https://aclanthology.org/people/felix-schneider/, listing 14 papers. The issue submitter asked for disambiguation because two of these papers are not theirs.

Changes

Introduced two new explicit ids (felix-schneider-kit and felix-schneider-fsujena) together with ORCID information. The first person received 12 of the 14 papers, the other one the two remaining ones. There is no catch-all left because all papers could be easily attributed to one of the two.

❓ Should I change felix-schneider-kit to felix-schneider? This would increase the likelihood of new papers again being wrongly matched to his name for as long as the new author system is not implemented. If that implementation is not too far away, I can make the change. It would also mean the user doesn't have to change the ACL Anthology link on their OpenReview profile.

I noticed one paper was missing two authors in metadata compared to the paper, so I added them and added missing hyphens to two other co-author names for the same paper. Compare:

Metadata: https://aclanthology.org/2021.iwslt-1.13
PDF: https://aclanthology.org/2021.iwslt-1.13.pdf
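
The comparison between metadata and PDF can be sketched roughly as follows. This is a minimal illustration, not the tooling used for this PR: the `<paper>` wrapper, the author names, and the helper functions are all hypothetical placeholders; only the `<author><first>…</first><last>…</last></author>` shape is taken from the diff.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata fragment and PDF author list (placeholder names).
PAPER_XML = """
<paper id="1">
  <author><first>Ada</first><last>Example</last></author>
  <author><first>Bob</first><last>Placeholder</last></author>
</paper>
"""
PDF_AUTHORS = ["Ada Example", "Bob Placeholder", "Carol Sample"]


def metadata_authors(paper_xml):
    """Return the author names found in the XML, in document order."""
    root = ET.fromstring(paper_xml.strip())
    names = []
    for author in root.findall("author"):
        first = author.findtext("first", default="")
        last = author.findtext("last", default="")
        names.append(f"{first} {last}".strip())
    return names


def missing_from_metadata(paper_xml, pdf_authors):
    """Return PDF authors that do not appear in the metadata."""
    present = set(metadata_authors(paper_xml))
    return [name for name in pdf_authors if name not in present]


print(missing_from_metadata(PAPER_XML, PDF_AUTHORS))
```

An exact string comparison like this would also flag spelling differences such as a missing hyphen, because the two spellings are treated as different names.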

❗ I noticed that the user originally raised this in a (still open) metadata correction issue, #4345, which we might want to close: he tried to add an affiliation to one of his namesake's papers, hoping to disambiguate that way. Or do we ever ingest affiliations using metadata corrections?

Note: The issue submitter (kit) might need to change the Anthology link displayed on their OpenReview profile unless we decide the id should omit the -kit part.
Both should be encouraged to provide their ORCID when submitting papers to conferences in the future, to aid disambiguation.

Collecting evidence for my changes/verifying information

The issue submitter didn't provide ORCID or degree institution or exact list of his papers. The namesake didn't open any issue.
However, information for these two persons was easy to find.
The XML data, on the other hand, did not include any ORCID information for this name for any of the 14 papers.

Felix Schneider KIT

Google Scholar profile mentioned in the issue: https://scholar.google.com/citations?hl=en&user=8IF9cNUAAAAJ
Matching profile found on orcid.org: https://orcid.org/0009-0006-5226-3023. It mentions employment (KIT, now Zoom), a master's degree from KIT, and one paper also found in the Anthology: 2020.iwslt-1.28
OpenReview profile: https://openreview.net/profile?id=~Felix_Schneider5 with a Google Scholar link, ORCID link, Anthology link (needs to be changed after the PR is merged!), career information (KIT, now Zoom), and 15 publications listed (I haven't checked all of them but spotted a few matching titles)

Felix Schneider Uni Jena

https://orcid.org/0009-0008-9953-6695 mentioning 7 works, among them the two present in the Anthology and explicitly listed by the issue submitter not to be theirs
https://openreview.net/profile?id=~Felix_Schneider4 listing Google Scholar, career path (just Uni Jena)
https://scholar.google.com/citations?hl=en&user=gvUnZhUAAAAJ
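
As a quick sanity check before adding ORCIDs to the name variants, the check character of an ORCID iD can be verified. The sketch below assumes the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers; the function names are my own. Note that this only shows an iD is well-formed, not that it belongs to the right person.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdecimal():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


# The two iDs referenced in this PR.
for orcid in ("0009-0006-5226-3023", "0009-0008-9953-6695"):
    print(orcid, is_valid_orcid(orcid))
```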

I went through the 14 papers and found consistent information

11 papers were published with a KIT affiliation, the same email address, a consistent topic (translation), and frequent coauthors (Waibel, Williams). One additional paper was published by the person now at Zoom (consistent with their GitHub user name/profile and the career info on OpenReview/ORCID).
2 papers (the ones the issue submitter explicitly listed as not theirs) had the same affiliation and email address at Uni Jena and weren't about translation.
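
The manual check above amounts to grouping papers by the evidence they carry and setting aside anything that needs a closer look. A rough sketch, assuming hypothetical records: the paper labels, email domains, and the `attribute` helper are placeholders and not the real values I collected.

```python
from collections import defaultdict

# Hypothetical evidence records; labels and domains are placeholders.
PAPERS = [
    {"paper": "paper-01", "email_domain": "kit.example"},
    {"paper": "paper-02", "email_domain": "kit.example"},
    {"paper": "paper-03", "email_domain": "jena.example"},
    {"paper": "paper-04", "email_domain": None},
]

# Assumed mapping from evidence to the two explicit ids introduced in this PR.
DOMAIN_TO_ID = {
    "kit.example": "felix-schneider-kit",
    "jena.example": "felix-schneider-fsujena",
}


def attribute(papers, domain_to_id):
    """Group papers by the id their email domain points to; collect the rest."""
    assigned = defaultdict(list)
    unresolved = []
    for record in papers:
        person_id = domain_to_id.get(record["email_domain"])
        if person_id is None:
            unresolved.append(record["paper"])
        else:
            assigned[person_id].append(record["paper"])
    return dict(assigned), unresolved


assigned, unresolved = attribute(PAPERS, DOMAIN_TO_ID)
print(assigned)
print(unresolved)  # papers that need other evidence, e.g. career information
```

A paper without a matching email address ends up in `unresolved`, which corresponds to the one paper here that I attributed via profile and career information instead.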

weissenh · 2025-11-19T15:41:39Z

data/xml/2021.iwslt.xml

+      <author><first>Alexander</first><last>Waibel</last></author>
      <pages>125–130</pages>
      <abstract>This paper describes KIT’submission to the IWSLT 2021 Offline Speech Translation Task. We describe a system in both cascaded condition and end-to-end condition. In the cascaded condition, we investigated different end-to-end architectures for the speech recognition module. For the text segmentation module, we trained a small transformer-based model on high-quality monolingual data. For the translation module, our last year’s neural machine translation model was reused. In the end-to-end condition, we improved our Speech Relative Transformer architecture to reach or even surpass the result of the cascade system.</abstract>
      <url hash="dac417e1">2021.iwslt-1.13</url>


I noticed one paper was missing two authors in metadata compared to the paper, so I added them and added missing hyphens to two other co-author names for the same paper. Compare:

Metadata: https://aclanthology.org/2021.iwslt-1.13

PDF: https://aclanthology.org/2021.iwslt-1.13.pdf

weissenh · 2025-11-19T15:45:35Z

data/yaml/name_variants.yaml

+  id: felix-schneider-kit
+  orcid: 0009-0006-5226-3023
+  degree: Karlsruhe Institute of Technology
+  comment: KIT


Should this person have id: felix-schneider without -kit?

Pro:

* user doesn't have to change the link on their OpenReview profile
* most papers belong to this user (12/14)
* he is the issue submitter, so if we use "first come, first served" he could reserve the "default" id.

Con:

* new papers (before new author system is live) will likely get added to this person - someone who has already complained about papers showing up under their name that are not theirs...

mbollmann · 2025-11-19

> Should this person have id: felix-schneider without -kit?

I’d say “probably yes”, but I don’t fully remember whether we ever made a final decision on our future ID policy; it doesn’t appear to be written down in the wiki, at least. @mjpost

> Con:
>
> * new papers (before new author system is live) will likely get added to this person - someone who has already complained about papers showing up under their name that are not theirs...

I don’t think that’s correct: as soon as the name is ambiguous, manual intervention is required during ingestion, so there’s no “default association” in that case.

I don’t think we should base any decisions on how the old system works.

weissenh · 2025-11-19

> but I don’t fully remember if we ever made a final decision on our future ID policy, it doesn’t appear to be written down in the wiki at least?

From the author page plan: https://github.com/acl-org/acl-anthology/wiki/Author-Page-Plan#disambiguation (last sentence before the next section)

> This means that the first person to have an explicit ID created for their name will "lock in" that ID (e.g. yang-liu) to themselves, while other persons with the same name will need a disambiguator appended to it.

So I thought maybe since the KIT person was the first to ask, he can reserve this ID for himself? Normally when dealing with author page requests right now, I need to reserve the simplest id for the catch-all "May refer to several persons" case, because I can't always fully disambiguate the name and can only single out one author from "the rest". So right now, the first person to ask often gets a more complicated ID, unless I can assign each paper to a specific person, like in this case.

> I don’t think that’s correct — as soon as the name is ambiguous, manual intervention is required during ingestion, there’s no “default association” in that case.

Interesting. There have been some author page requests recently where, for an ambiguous name, a new paper got assigned to the catch-all ("May refer to several persons") rather than to an existing, more specific ID (with degree institution as suffix) because ORCID matching isn't enabled yet. However, I didn't check when the new paper got ingested or what the ingestion script looked like at that point in time.
So I assumed that if there is a new "Felix Schneider" paper and there is a felix-schneider id, that paper will get mapped to this id, even when there is another "Felix Schneider" in name variants. I agree that one shouldn't rely too much on the old system logic when a new system is under way.
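
To make the "lock in" rule quoted from the Author Page Plan concrete, here is a minimal sketch. It is only an illustration of the rule as worded in the wiki, under my own assumptions: `slugify` and `reserve_id` are hypothetical helpers, not functions from the Anthology codebase, and the real slug rules may differ.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Lowercase, strip accents, and join the remaining words with hyphens."""
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def reserve_id(name: str, disambiguator: str, taken: set) -> str:
    """First request locks in the plain slug; later ones get a suffix."""
    base = slugify(name)
    candidate = base if base not in taken else f"{base}-{slugify(disambiguator)}"
    if candidate in taken:
        raise ValueError(f"id {candidate!r} is already taken")
    taken.add(candidate)
    return candidate


taken_ids = set()
first = reserve_id("Felix Schneider", "kit", taken_ids)
second = reserve_id("Felix Schneider", "fsujena", taken_ids)
print(first, second)
```

Under this reading, the first requester would receive felix-schneider and the namesake felix-schneider-fsujena, which is the alternative to the two suffixed ids currently in this PR.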

mbollmann · 2025-11-19

> So I thought maybe since the KIT person was the first to ask, he can reserve this ID for himself?

There’s definitely lots of discussion on this exact topic buried in the new-author-system mega-thread, which is why I pinged @mjpost in the hope that he remembers whether we made a decision on that :) (I don’t have time to dig it up right now)

> Interesting, there are some author page requests recently where for an ambiguous name a new paper got assigned to the catch-all ("May refer to several persons") rather than an existing, more specific ID (with degree institution as suffix) because ORCID-matching isn't enabled yet.

I don’t know the ingestion scripts super well either, but what I meant is that under the old system, IDs do not need to be written to the XML (by default) except in ambiguous cases, so when there’s ambiguity, a decision needs to be made about which ID to choose. It may be that we used to default to the "catch-all" ID when there was no time to disambiguate manually. In any case, that’s the old system; let’s move on with the assumption that the new system will be in place for the next major ingestion.
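
The two behaviours discussed in this thread can be contrasted in a small sketch. This is not the actual ingestion script, whose logic neither of us checked; `resolve_author_id`, `AmbiguousNameError`, and the data layout are hypothetical and only model the distinction between requiring manual intervention and falling back to a catch-all ID.

```python
from typing import Dict, List, Optional


class AmbiguousNameError(Exception):
    """Raised when a name maps to several explicit ids and nobody has chosen one."""


def resolve_author_id(
    name: str,
    ids_by_name: Dict[str, List[str]],
    catch_all: Optional[str] = None,
) -> Optional[str]:
    """Return the id to write to the XML, or None if no explicit id is needed."""
    candidates = ids_by_name.get(name, [])
    if len(candidates) <= 1:
        # Unambiguous name: no explicit id has to be written.
        return None
    if catch_all is not None:
        # Fallback when there is no time to disambiguate manually.
        return catch_all
    # Otherwise a person has to decide which id applies.
    raise AmbiguousNameError(f"{name!r} could be any of {candidates}")


# Hypothetical registry reflecting the two ids introduced in this PR.
IDS = {"Felix Schneider": ["felix-schneider-kit", "felix-schneider-fsujena"]}

print(resolve_author_id("Unique Author", IDS))
print(resolve_author_id("Felix Schneider", IDS, catch_all="felix-schneider"))
```

Without a `catch_all`, the call for "Felix Schneider" raises `AmbiguousNameError`, which corresponds to the "manual intervention is required" reading.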

weissenh · 2025-11-19T16:18:20Z

data/xml/2022.mwe.xml

+      <author id="felix-schneider-fsujena"><first>Felix</first><last>Schneider</last></author>
      <author><first>Sven</first><last>Sickert</last></author>
      <author><first>Phillip</first><last>Brandes</last></author>
      <author><first>Sophie</first><last>Marshall</last></author>


See issue #4345: the user originally raised this in a (still open) metadata correction issue, where he tried to add an affiliation to one of his namesake's papers, hoping to disambiguate that way. Should we close that issue, or do we ever ingest affiliations using metadata corrections?

mbollmann · 2025-11-19

I’m a bit on the fence on this one. I think the reason we record affiliations is that we sometimes get this data in ingestion materials anyway. However, we don’t currently use it for anything or plan to use it for anything, and we definitely don’t want to encourage users to submit metadata requests for this reason. So actually, I guess I’m tending towards “no”. :)

Disambiguate two Felix Schneider: KIT vs. Uni Jena

17425a1

cf. #4369 Felix Schneider (graduated from KIT, now at Zoom) asked for disambiguation

weissenh added this to the Author page backlog milestone Nov 19, 2025

weissenh self-assigned this Nov 19, 2025

weissenh linked an issue Nov 19, 2025 that may be closed by this pull request

Author Metadata: Felix Schneider #4369

Open

3 tasks

weissenh requested a review from Azax4 November 19, 2025 15:37

weissenh marked this pull request as ready for review November 19, 2025 16:18

3 participants
