Disambiguate two Felix Schneider: KIT vs. Uni Jena #6531
cf. #4369 Felix Schneider (graduated from KIT, now at Zoom) asked for disambiguation
| <author><first>Alexander</first><last>Waibel</last></author> | ||
| <pages>125–130</pages> | ||
| <abstract>This paper describes KIT’submission to the IWSLT 2021 Offline Speech Translation Task. We describe a system in both cascaded condition and end-to-end condition. In the cascaded condition, we investigated different end-to-end architectures for the speech recognition module. For the text segmentation module, we trained a small transformer-based model on high-quality monolingual data. For the translation module, our last year’s neural machine translation model was reused. In the end-to-end condition, we improved our Speech Relative Transformer architecture to reach or even surpass the result of the cascade system.</abstract> | ||
| <url hash="dac417e1">2021.iwslt-1.13</url> |
I noticed one paper was missing two authors in metadata compared to the paper, so I added them and added missing hyphens to two other co-author names for the same paper. Compare:
| id: felix-schneider-kit | ||
| orcid: 0009-0006-5226-3023 | ||
| degree: Karlsruhe Institute of Technology | ||
| comment: KIT |
Should this person have id: felix-schneider without -kit?
Pro:
- user doesn't have to change the link on their openReview profile
- most papers belong to this user (12/14)
- he is the issue submitter, so if we use "first come, first served" he could reserve the "default" id.
Con:
- new papers (before new author system is live) will likely get added to this person - someone who has already complained about papers showing up under their name that are not theirs...
Should this person have id: felix-schneider without -kit?
I’d say “probably yes”, but I don’t fully remember if we ever made a final decision on our future ID policy, it doesn’t appear to be written down in the wiki at least? @mjpost
Con:
* new papers (before new author system is live) will likely get added to this person - someone who has already complained about papers showing up under their name that are not theirs...
- I don’t think that’s correct — as soon as the name is ambiguous, manual intervention is required during ingestion, there’s no “default association” in that case.
- I don’t think we should base any decisions on how the old system works.
but I don’t fully remember if we ever made a final decision on our future ID policy, it doesn’t appear to be written down in the wiki at least?
From the author page plan: https://github.com/acl-org/acl-anthology/wiki/Author-Page-Plan#disambiguation (last sentence before next section)
This means that the first person to have an explicit ID created for their name will "lock in" that ID (e.g. yang-liu) to themselves, while other persons with the same name will need a disambiguator appended to it.
So I thought maybe since the KIT person was the first to ask, he can reserve this ID for himself? Normally when dealing with author page requests right now, I need to reserve the simplest id for the catch-all "May refer to several persons" case because I can't always fully disambiguate the name, but can only single out one author from "the rest". So right now, the first person to ask often gets a more complicated ID - unless I can assign each paper to a specific person, like in this case.
I don’t think that’s correct — as soon as the name is ambiguous, manual intervention is required during ingestion, there’s no “default association” in that case.
Interesting, there are some author page requests recently where for an ambiguous name a new paper got assigned to the catch-all ("May refer to several persons") rather than an existing, more specific ID (with degree institution as suffix) because ORCID-matching isn't enabled yet. However, I didn't check when the new paper got ingested and how the ingestion script looked at that point in time.
So I assumed that if there is a new "Felix Schneider" paper and there is a felix-schneider id, that paper will get mapped to this id, even when there is another "Felix Schneider" in name variants. I agree that one shouldn't rely too much on the old system logic when a new system is under way.
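The "lock in" rule quoted from the Author Page Plan can be illustrated with a minimal sketch. This is not the Anthology's actual code: the function names `slugify` and `assign_id` and the `taken` set are hypothetical, and the only assumption is that the first request for a name gets the plain ID while later namesakes get a disambiguator appended.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Lowercase, strip accents, and join the name parts with hyphens."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def assign_id(name: str, disambiguator: str, taken: set[str]) -> str:
    """Hypothetical policy: the plain slug goes to whoever asks first;
    later namesakes get the slug plus a disambiguator."""
    base = slugify(name)
    candidate = base if base not in taken else f"{base}-{slugify(disambiguator)}"
    if candidate in taken:
        raise ValueError(f"ID {candidate!r} is already taken")
    taken.add(candidate)
    return candidate


taken: set[str] = set()
first = assign_id("Felix Schneider", "kit", taken)       # first to ask
second = assign_id("Felix Schneider", "fsujena", taken)  # namesake
```

Under this reading, the KIT author would end up with the plain ID and only the namesake would carry a suffix, which is the alternative to the `-kit` suffix discussed in this thread.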
So I thought maybe since the KIT person was the first to ask, he can reserve this ID for himself?
There’s definitely lots of discussion on this exact topic buried in the new-author-system mega-thread, which is why I pinged @mjpost in the hopes that he remembers if we took a decision on that :) (don’t have time to dig it up right now)
I don’t think that’s correct — as soon as the name is ambiguous, manual intervention is required during ingestion, there’s no “default association” in that case.
Interesting, there are some author page requests recently where for an ambiguous name a new paper got assigned to the catch-all ("May refer to several persons") rather than an existing, more specific ID (with degree institution as suffix) because ORCID-matching isn't enabled yet.
I don’t know the ingestion scripts super well either, but what I meant is that under the old system, IDs do not need to get written to the XML (by default) except in ambiguous cases, so when there’s ambiguity, a decision needs to be taken about which ID to choose. It may be that we used to default to the "catch-all" ID when there’s no time to disambiguate manually. In any case, that’s the old system; let’s move on with the assumption that the new system will be in place for the next major ingestion.
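A rough sketch of the behaviour described here, under stated assumptions: the `IDS_BY_NAME` index and the `author_element` helper are hypothetical and are not taken from the ingestion scripts. The sketch only shows the idea that an `id` attribute is written when a name is shared by several explicit IDs, and that a choice has to be made in that case.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical name index: each name maps to the explicit IDs that share it.
IDS_BY_NAME = {
    ("Felix", "Schneider"): ["felix-schneider-kit", "felix-schneider-fsujena"],
}


def author_element(first: str, last: str, chosen_id: Optional[str] = None) -> ET.Element:
    """Build an <author> element; write an explicit id only for ambiguous names."""
    candidates = IDS_BY_NAME.get((first, last), [])
    attrib = {}
    if len(candidates) > 1:
        # Ambiguous name: someone has to decide which ID this paper belongs to.
        if chosen_id not in candidates:
            raise LookupError(f"{first} {last} is ambiguous; pick one of {candidates}")
        attrib["id"] = chosen_id
    author = ET.Element("author", attrib)
    ET.SubElement(author, "first").text = first
    ET.SubElement(author, "last").text = last
    return author


ambiguous_xml = ET.tostring(
    author_element("Felix", "Schneider", "felix-schneider-fsujena"), encoding="unicode"
)
plain_xml = ET.tostring(author_element("Sven", "Sickert"), encoding="unicode")
```

Calling `author_element("Felix", "Schneider")` without an ID raises `LookupError` in this sketch, which mirrors the point that an ambiguous name cannot be ingested without a decision.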
| <author id="felix-schneider-fsujena"><first>Felix</first><last>Schneider</last></author> | ||
| <author><first>Sven</first><last>Sickert</last></author> | ||
| <author><first>Phillip</first><last>Brandes</last></author> | ||
| <author><first>Sophie</first><last>Marshall</last></author> |
See issue #4345: the user originally asked about this in a (still open) metadata correction issue that we might want to close. He tried to add an affiliation to one of his namesake's papers, hoping to disambiguate that way. Should we close the open issue on this metadata correction, or do we ever ingest affiliations using metadata corrections?
I’m a bit on the fence on this one. I think the reason we record affiliations is that we sometimes get this data in ingestion materials anyway. However, we don’t currently use it for anything or plan to use it for anything, and we definitely don’t want to encourage users to submit metadata requests for this reason. So actually, I guess I’m tending towards “no”. :)
Status quo
There is one page, https://aclanthology.org/people/felix-schneider/, with 14 papers. The issue submitter asked for disambiguation because two of these are not theirs.
Changes
Introduced two new explicit ids (felix-schneider-kit and felix-schneider-fsujena) together with ORCID information. The first person received 12 of the 14 papers, the other one the two remaining ones. There is no catch-all left because all papers could be easily attributed to one of the two.
❓ Should I change felix-schneider-kit to felix-schneider? This will increase the likelihood of new papers being again wrongly matched to his name as long as the new author system is not implemented. If that implementation is not too far away, I can do that. This would also mean the user doesn't have to change the ACL Anthology link on their OpenReview profile.
I noticed one paper was missing two authors in metadata compared to the paper, so I added them and added missing hyphens to two other co-author names for the same paper. Compare:
❗ Noticed that the user originally asked about this in a (still open) metadata correction issue, #4345, that we might want to close: he tried to add an affiliation to one of his namesake's papers, hoping to disambiguate that way. Or do we ever ingest affiliations using metadata corrections?
Note: The issue submitter (KIT) might need to change the Anthology link displayed on their OpenReview profile unless we decide the id should omit the -kit part.
Both should be encouraged to submit their ORCID in the future when submitting papers to conferences to aid disambiguation.
Collecting evidence for my changes/verifying information
The issue submitter didn't provide ORCID or degree institution or exact list of his papers. The namesake didn't open any issue.
However, information for these two persons was easy to find.
The XML data, on the other hand, did not include any ORCID information for this name for any of the 14 papers.
Felix Schneider KIT
Felix Schneider Uni Jena
I went through the 14 papers and found consistent information