
Contributor

@weissenh commented Nov 19, 2025


cf. #4369: Felix Schneider (graduated from KIT, now at Zoom) asked for disambiguation.

Status quo

One page (https://aclanthology.org/people/felix-schneider/) with 14 papers; the issue submitter asked for disambiguation because two of these papers are not theirs.

Changes

Introduced two new explicit IDs (felix-schneider-kit and felix-schneider-fsujena), together with ORCID information. The first person received 12 of the 14 papers and the other received the remaining two. No catch-all ID is left, because every paper could easily be attributed to one of the two.

Should I change felix-schneider-kit to felix-schneider? This would increase the likelihood of new papers again being wrongly matched to his name until the new author system is implemented. If that implementation is not too far away, I can do it. It would also mean the user doesn't have to change the ACL Anthology link on their OpenReview profile.

I noticed that one paper was missing two authors in the metadata compared to the paper itself, so I added them, and I also added missing hyphens to the names of two other co-authors of the same paper. Compare:

❗ I noticed that the user originally asked about this in a (still open) metadata correction issue, #4345, which we might want to close: he tried to add an affiliation to one of his namesake's papers, hoping to disambiguate that way. Or do we ever ingest affiliations via metadata corrections?

Note: The issue submitter (KIT) might need to change the Anthology link displayed on their OpenReview profile, unless we decide the ID should omit the -kit suffix.
Both should be encouraged to provide their ORCID when submitting papers to conferences in the future, to aid disambiguation.

Collecting evidence for my changes/verifying information

The issue submitter didn't provide an ORCID, a degree institution, or an exact list of his papers. The namesake didn't open any issue.
However, information on these two persons was easy to find.
The XML data, on the other hand, did not include any ORCID information for this name on any of the 14 papers.

Felix Schneider KIT

Felix Schneider Uni Jena

I went through the 14 papers and found consistent information:

  • 11 papers were published with the KIT affiliation and the same email address, on a consistent topic (translation) and with frequent co-authors (Waibel, Williams); one additional paper was published by the person now at Zoom (consistent with their GitHub username/profile and the career info on OpenReview/ORCID)
  • 2 papers (the ones the issue submitter explicitly listed as not theirs) had the same Uni Jena affiliation and email address and weren't about translation

@weissenh added this to the Author page backlog milestone Nov 19, 2025
@weissenh self-assigned this Nov 19, 2025
@weissenh linked an issue Nov 19, 2025 that may be closed by this pull request
@weissenh requested a review from Azax4 November 19, 2025 15:37
<author><first>Alexander</first><last>Waibel</last></author>
<pages>125–130</pages>
<abstract>This paper describes KIT’submission to the IWSLT 2021 Offline Speech Translation Task. We describe a system in both cascaded condition and end-to-end condition. In the cascaded condition, we investigated different end-to-end architectures for the speech recognition module. For the text segmentation module, we trained a small transformer-based model on high-quality monolingual data. For the translation module, our last year’s neural machine translation model was reused. In the end-to-end condition, we improved our Speech Relative Transformer architecture to reach or even surpass the result of the cascade system.</abstract>
<url hash="dac417e1">2021.iwslt-1.13</url>
Contributor Author

I noticed that one paper was missing two authors in the metadata compared to the paper itself, so I added them, and I also added missing hyphens to the names of two other co-authors of the same paper. Compare:

id: felix-schneider-kit
orcid: 0009-0006-5226-3023
degree: Karlsruhe Institute of Technology
comment: KIT
Contributor Author

@weissenh Nov 19, 2025

Should this person have id: felix-schneider without -kit?

Pro:

  • the user doesn't have to change the link on their OpenReview profile
  • most papers belong to this user (12/14)
  • he is the issue submitter, so under "first come, first served" he could reserve the "default" ID.

Con:

  • new papers (before the new author system is live) will likely get added to this person, someone who has already complained about papers showing up under their name that are not theirs...

Member

Should this person have id: felix-schneider without -kit?

I’d say “probably yes”, but I don’t fully remember if we ever made a final decision on our future ID policy; it doesn’t appear to be written down in the wiki, at least? @mjpost

Con:

* new papers (before the new author system is live) will likely get added to this person, someone who has already complained about papers showing up under their name that are not theirs...
  • I don’t think that’s correct: as soon as the name is ambiguous, manual intervention is required during ingestion; there’s no “default association” in that case.
  • I don’t think we should base any decisions on how the old system works.

Contributor Author

but I don’t fully remember if we ever made a final decision on our future ID policy; it doesn’t appear to be written down in the wiki, at least?

From the author page plan: https://github.com/acl-org/acl-anthology/wiki/Author-Page-Plan#disambiguation (last sentence before next section)

This means that the first person to have an explicit ID created for their name will "lock in" that ID (e.g. yang-liu) to themselves, while other persons with the same name will need a disambiguator appended to it.

So I thought that, since the KIT person was the first to ask, maybe he could reserve this ID for himself? Normally, when handling author page requests, I have to reserve the simplest ID for the catch-all "May refer to several persons" case, because I often can't fully disambiguate the name and can only single out one author from "the rest". So currently the first person to ask often gets a more complicated ID, unless I can assign every paper to a specific person, as in this case.
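To make the "first come, first served" rule from the wiki quote concrete, here is a minimal Python sketch. It is only an illustration under assumptions: `slugify` and `allocate_id` are hypothetical helpers, not the Anthology's actual code, and the real ID scheme may normalize names differently. It shows the option discussed here, where the KIT Felix Schneider gets the bare ID because his explicit ID is created first.

```python
import re
import unicodedata


def slugify(first: str, last: str) -> str:
    """Build a lowercase, hyphen-separated ID base from a name (hypothetical helper)."""
    text = f"{first} {last}"
    # Decompose accented characters and drop the combining marks, e.g. "é" -> "e".
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def allocate_id(first: str, last: str, disambiguator: str, taken: set[str]) -> str:
    """Allocate an explicit person ID, first come, first served (hypothetical).

    The first person whose explicit ID is created for a name "locks in" the bare
    slug; later namesakes get the slug plus a disambiguator suffix.
    """
    base = slugify(first, last)
    if base not in taken:
        taken.add(base)
        return base
    candidate = f"{base}-{disambiguator}"
    if candidate in taken:
        raise ValueError(f"{candidate!r} is already taken; choose another disambiguator")
    taken.add(candidate)
    return candidate


# Example: the KIT author's ID is created first, the Jena namesake's second.
taken_ids: set[str] = set()
print(allocate_id("Felix", "Schneider", "kit", taken_ids))      # felix-schneider
print(allocate_id("Felix", "Schneider", "fsujena", taken_ids))  # felix-schneider-fsujena
```

If the catch-all ID were created first (the usual case when a name can't be fully disambiguated), it would take the bare slug and every specific person would receive a suffixed ID, which is why the first person to ask often ends up with a longer ID.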

I don’t think that’s correct: as soon as the name is ambiguous, manual intervention is required during ingestion; there’s no “default association” in that case.

Interesting. In some recent author page requests, a new paper for an ambiguous name was assigned to the catch-all ("May refer to several persons") rather than to an existing, more specific ID (with the degree institution as suffix), because ORCID matching isn't enabled yet. However, I didn't check when the new paper was ingested or what the ingestion script looked like at that time.
So I assumed that if there is a new "Felix Schneider" paper and a felix-schneider ID exists, the paper would be mapped to this ID, even if another "Felix Schneider" appears among the name variants. I agree that we shouldn't rely too much on the old system's logic when a new system is underway.

Member

So I thought that, since the KIT person was the first to ask, maybe he could reserve this ID for himself?

There’s definitely a lot of discussion on this exact topic buried in the new-author-system mega-thread, which is why I pinged @mjpost in the hope that he remembers whether we made a decision on that :) (I don’t have time to dig it up right now.)

I don’t think that’s correct: as soon as the name is ambiguous, manual intervention is required during ingestion; there’s no “default association” in that case.

Interesting. In some recent author page requests, a new paper for an ambiguous name was assigned to the catch-all ("May refer to several persons") rather than to an existing, more specific ID (with the degree institution as suffix), because ORCID matching isn't enabled yet.

I don’t know the ingestion scripts super well either, but what I meant is that under the old system, IDs don’t need to be written to the XML (by default) except in ambiguous cases, so when there’s ambiguity, someone has to decide which ID to choose. It may be that we used to default to the "catch-all" ID when there was no time to disambiguate manually. In any case, that’s the old system; let’s move on with the assumption that the new system will be in place for the next major ingestion.
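The point that an ambiguous name needs manual intervention, rather than a default association, can be sketched as follows. This is a hypothetical Python illustration of the behaviour described in this thread, not the Anthology's actual ingestion script: `resolve_author`, `name_index`, and `AmbiguousNameError` are made-up names, and the index of name variants is assumed to map each name string to the list of person IDs that use it.

```python
from typing import Dict, List, Optional


class AmbiguousNameError(Exception):
    """Raised when a name matches several person IDs and no explicit ID is given."""


def resolve_author(name: str, explicit_id: Optional[str],
                   name_index: Dict[str, List[str]]) -> Optional[str]:
    """Pick the person ID for an author of a newly ingested paper (hypothetical).

    - An explicit ID (as in <author id="...">) is always used as-is.
    - A name with exactly one known ID resolves implicitly, so nothing has to be
      written to the XML.
    - A name with several known IDs stops the process for manual intervention;
      there is no default association.
    - An unknown name returns None (a new person would have to be created).
    """
    if explicit_id is not None:
        return explicit_id
    candidates = name_index.get(name, [])
    if len(candidates) == 1:
        return candidates[0]
    if len(candidates) > 1:
        raise AmbiguousNameError(
            f"{name!r} matches {sorted(candidates)}; add an explicit id by hand"
        )
    return None


# Hypothetical index; "jane-doe" is a made-up, unambiguous person.
index = {
    "Jane Doe": ["jane-doe"],
    "Felix Schneider": ["felix-schneider-kit", "felix-schneider-fsujena"],
}
print(resolve_author("Jane Doe", None, index))  # jane-doe
print(resolve_author("Felix Schneider", "felix-schneider-fsujena", index))
```

Under this assumed design, creating felix-schneider-kit and felix-schneider-fsujena makes "Felix Schneider" ambiguous, so a new paper under that name would be flagged for a human decision instead of being silently attached to either person.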

<author id="felix-schneider-fsujena"><first>Felix</first><last>Schneider</last></author>
<author><first>Sven</first><last>Sickert</last></author>
<author><first>Phillip</first><last>Brandes</last></author>
<author><first>Sophie</first><last>Marshall</last></author>
Contributor Author

See issue #4345: the user originally asked about this in a (still open) metadata correction issue, where he tried to add an affiliation to one of his namesake's papers in the hope of disambiguating that way. Should we close that metadata correction issue, or do we ever ingest affiliations via metadata corrections?

Member

I’m a bit on the fence about this one. I think we record affiliations because we sometimes get this data in ingestion materials anyway. However, we don’t currently use it for anything or plan to, and we definitely don’t want to encourage users to submit metadata requests for this reason. So actually, I guess I’m tending towards “no”. :)

@weissenh marked this pull request as ready for review November 19, 2025 16:18

Development

Successfully merging this pull request may close these issues.

Author Metadata: Felix Schneider

3 participants