Coverage of PB-VN mapping is not a strict subset of VerbNet-derived mapping #7
Comments
I see, yes, there are a couple of issues here. First, for SemLink, we are also using an external file of PB-VN mappings that was generated for a separate project. It isn't taken directly from either resource, which causes some of the disjointness. The other, more pressing issue is that PB and VN each seem to have their own ideas about what they map to. For SemLink, we trust PB: the mappings come from what PB says, and from the external file. It's likely that VN has more valid mappings that we could include. Unfortunately, it's probably also likely that they conflict in some places. We'll have to do a little study to find where VN's mappings to PB conflict with PB's mappings to VN, where the disjointness is, and how we can expand coverage. @ghamzak is this something CU could look into? For now, I think all I can say is that we trust SemLink (and thus PB) with respect to mappings: anything that looked suspicious was removed in the automated process, and the PB mapping should then be valid. |
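One way the proposed study could start is by reducing both directions to sets of (roleset, class) pairs and comparing them. The sketch below is only an illustration under assumptions: the dictionaries and their contents are toy data invented here, not taken from PB, VN, or SemLink.

```python
def to_pairs(mapping):
    """Flatten {key: [values]} into a set of (key, value) pairs."""
    return {(key, value) for key, values in mapping.items() for value in values}

# Toy data, invented for illustration only.
pb_to_vn = {"give.01": ["13.1"], "run.02": ["51.3.2"]}
vn_to_pb = {"13.1": ["give.01"], "51.3.2": ["run.01"]}

# Express both directions as (roleset, class) pairs.
pb_pairs = to_pairs(pb_to_vn)
vn_pairs = {(roleset, cls) for cls, roleset in to_pairs(vn_to_pb)}

agreed = pb_pairs & vn_pairs    # links asserted by both resources
only_pb = pb_pairs - vn_pairs   # links only PB asserts
only_vn = vn_pairs - pb_pairs   # links only VN asserts
```

Pairs in `only_pb` or `only_vn` are not necessarily conflicts; they are candidates for manual review or for expanding coverage.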
Thanks for the quick reply. Is there any information on how the PB-VN mapping linked above was generated? We are using these mappings in an analysis for a paper, and while it's straightforward to just point to VN3.4 for the mappings that can be derived from it, I'm a bit worried about using the above without being able to cite their provenance. I'm assuming they're not strictly from PB, since PB only contains PB-VN3.2 mappings and some of the classes have been renamed or split in VN3.4 (the original reason I contacted @ghamzak back in March: I had extracted PB-VN3.2 from PB, and was looking for a mapping from VN3.2 to VN3.4 to compose with it). |
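The composition mentioned here (PB-VN3.2 followed by VN3.2-to-VN3.4) could look roughly like the sketch below. The function name, the data, and the assumption that a VN3.2 class may map to several VN3.4 classes (because of splits) are all illustrative, not part of any released resource.

```python
def compose(pb_to_vn32, vn32_to_vn34):
    """Compose roleset -> VN3.2 classes with VN3.2 class -> VN3.4 classes.

    Returns (composed, unmapped), where `unmapped` records VN3.2 classes
    that have no known VN3.4 counterpart.
    """
    composed, unmapped = {}, {}
    for roleset, classes in pb_to_vn32.items():
        for cls in classes:
            targets = vn32_to_vn34.get(cls)
            if targets:
                composed.setdefault(roleset, set()).update(targets)
            else:
                unmapped.setdefault(roleset, set()).add(cls)
    return composed, unmapped

# Toy data, invented for illustration only.
pb_to_vn32 = {"give.01": ["13.1"], "run.02": ["51.3.2", "99.9"]}
vn32_to_vn34 = {"13.1": ["13.1-1"], "51.3.2": ["51.3.2"]}

composed, unmapped = compose(pb_to_vn32, vn32_to_vn34)
```

Keeping the unmapped classes separate makes it easy to see which rolesets still need hand correction.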
We finished a manual update of all the PB-VN3.4 mappings about 2 years ago and are in the process of incorporating it into a planned new release of PB which is taking quite a bit pinger than we had anticipated. I believe that is where those mappings came from.
Martha
|
Thanks for the quick reply, @MarthaSPalmer. |
pinger -> longer!
Whew - maybe too fast!
Martha
|
Even with the updated mapping, I am running into mismatches. I believe it should be the case that if there is a semlink mapping from a PB roleset to a VN class in
import json
from verbnet import VerbNetParser
verbnet = VerbNetParser(version="3.4")
with open('semlink/instances/pb-vn2.json') as f:
    semlink = json.load(f)
# numeric identifiers of all VN3.4 classes
verbnet_classes = set(verbnet.verb_classes_numerical_dict)
# every VN class that some PB roleset is mapped to
semlink_classes = {vncls for pbroleset, vnclasses in semlink.items() for vncls in vnclasses}
...and then calculate
At least some of these (e.g. 10.6, 72, 105; maybe all of them) are classes and subclasses that are only found in VN3.2. Indeed, in the analysis I originally wanted to use this for, these classes were exactly the mismatching ones that triggered my initial request for a VN3.2 to VN3.4 mapping. At that point, I just went through and hand-corrected the mappings on a by-predicate basis as best I could, but it would be really nice to have a canonical mapping that maps only to VN3.4, since my hand-corrected mapping could be wrong in places. |
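For anyone reproducing this check without the parser at hand, the same set-difference idea can be sketched on plain dictionaries. The data below is toy data invented for illustration, and listing the offending rolesets is an addition that is not part of the check described above.

```python
# Toy stand-ins for the loaded resources (invented for illustration).
semlink = {
    "give.01": ["13.1"],
    "fall.01": ["10.6", "45.6"],
    "bulge.01": ["47.5.3"],
}
verbnet_classes = {"13.1", "45.6"}

# Classes that semlink maps to but that the VerbNet release does not contain.
semlink_classes = {cls for classes in semlink.values() for cls in classes}
missing = semlink_classes - verbnet_classes

# Rolesets that point at one or more missing classes.
offenders = {
    roleset: sorted(set(classes) - verbnet_classes)
    for roleset, classes in semlink.items()
    if set(classes) - verbnet_classes
}
```

Listing rolesets alongside the missing classes makes by-predicate hand correction easier to audit.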
It's my understanding that
with open('semlink/other_resources/external_vn2pb.json') as f:
    external_vn2pb = json.load(f)
# get the numeric identifier for each class
external_vn2pb_classes = {'-'.join(c.split('-')[1:]) for c in external_vn2pb}
external_vn2pb_classes - verbnet_classes
I get 12 classes found in
These all appear to be instances where the base class exists in VN3.4 but the subclass doesn't. Maybe these were cases for which an early version of VN3.4 subclassed an existing class but where that subclass was deleted or promoted to its own class? The above would explain some of the mismatches mentioned in the above post, but there still remain 40 classes in |
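The observation that the base class exists while the subclass does not can be checked mechanically. The sketch below assumes class names have the form `name-NUMBER[-SUBCLASS...]` with no hyphen inside the name part; the helper names and data are hypothetical.

```python
def numeric_id(class_name):
    """'give-13.1-1' -> '13.1-1' (drops the leading verb name)."""
    return "-".join(class_name.split("-")[1:])

def base_class(class_id):
    """'13.1-1' -> '13.1' (drops any subclass suffix)."""
    return class_id.split("-")[0]

# Toy data, invented for illustration only.
verbnet_classes = {"13.1", "13.1-1", "9.7"}
external = ["give-13.1-1", "give-13.1-2", "spray-9.7-3", "foo-77.7-1"]

missing = {numeric_id(name) for name in external} - verbnet_classes
# Missing subclasses whose base class is still present in VerbNet.
base_present = {cls for cls in missing if base_class(cls) in verbnet_classes}
# Missing classes whose base class is absent as well.
base_absent = missing - base_present
```

If `base_absent` is empty on the real data, that would be consistent with the deleted-or-promoted-subclass explanation, though it would not prove it.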
Hi, I'm running into the same issues as highlighted above, i.e., classes linked to in |
Sorry for the delay, but I'm looking into it now. One thing is that the current version is based on VN3.3, rather than 3.4. I don't know if that accounts for all of the mismatches though. I can say the external_vn2pb.json was built separately, and not even linked to 3.3, but the incorrect classes should be filtered out when semlink is generated. I'll update when I've found out more.
UPDATE: It's pulling 3.4, so that shouldn't be an issue. But it looks like there was a bug where it wasn't correctly filtering/updating incorrect PB mappings. Fixing and rerunning now.
UPDATE: It appears that was, in fact, the issue. I implemented your test @aaronstevenwhite, and it now returns 0. This test is now included so we can check if these errors are popping up in the future. Note that the external |
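A filtering step plus regression check along these lines might look like the following. This is a guess at the general shape only, not the actual SemLink generation code; the function names and data are hypothetical.

```python
def filter_mappings(mapping, valid_classes):
    """Drop mapped classes that are not in valid_classes; drop emptied rolesets."""
    filtered = {}
    for roleset, classes in mapping.items():
        kept = [cls for cls in classes if cls in valid_classes]
        if kept:
            filtered[roleset] = kept
    return filtered

def count_dangling(mapping, valid_classes):
    """Number of distinct mapped classes that are missing from valid_classes."""
    mapped = {cls for classes in mapping.values() for cls in classes}
    return len(mapped - valid_classes)

# Toy data, invented for illustration only.
valid = {"13.1", "51.3.2"}
raw = {"give.01": ["13.1", "99.9"], "run.02": ["51.3.2"], "old.01": ["72"]}

clean = filter_mappings(raw, valid)
# Regression check: after filtering, no dangling classes should remain.
assert count_dangling(clean, valid) == 0
```

Running the count on every regenerated mapping would catch a regression of the filtering bug early.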
When comparing pb-vn2.json to a roleset-class mapping derived from VerbNet3.4 itself, I find that the PropBank rolesets in the domain of each mapping are not in a subset relation with each other, as might be expected. To derive the mapping from VerbNet3.4, I use:
When compared to pb-vn2.json, I observe the following counts:
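The original derivation code and counts are not reproduced here. As a rough sketch of how such a domain comparison can be computed, assuming both mappings are dictionaries keyed by roleset (toy data, invented for illustration):

```python
def compare_domains(first, second):
    """Count rolesets shared by, or exclusive to, each mapping's domain."""
    first_keys, second_keys = set(first), set(second)
    return {
        "both": len(first_keys & second_keys),
        "only_first": len(first_keys - second_keys),
        "only_second": len(second_keys - first_keys),
    }

# Toy data, invented for illustration only.
semlink = {"give.01": ["13.1"], "run.02": ["51.3.2"], "fall.01": ["45.6"]}
vn_derived = {"give.01": ["13.1"], "run.02": ["51.3.2"], "eat.01": ["39.1"]}

counts = compare_domains(semlink, vn_derived)
# A strict-subset expectation would require set(semlink) <= set(vn_derived).
is_subset = set(semlink) <= set(vn_derived)
```

Non-zero values for both `only_first` and `only_second` correspond to the situation reported here, where neither domain is contained in the other.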