How to confirm sense distinctions #243

Open
jmccrae opened this issue Jan 2, 2020 · 10 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@jmccrae
Member

jmccrae commented Jan 2, 2020

There are many subtle sense distinctions in WordNet that may not be routinely made by English speakers, especially in cases of systematic polysemy or metonymy, where an object is referred to by a related term.

This issue is to capture ideas about how we can make a principled distinction here.

I have two suggestions:

  1. Collocations: if the collocations of two candidate senses are distinct, then we can claim that the senses are distinct. This could be measured using sense-tagged corpora.
  2. Other dictionaries: if we could establish a set of relevant dictionaries that we trust, we could simply examine all of them and follow the majority view of other lexicographers.

Any other suggestions?

@jmccrae jmccrae added enhancement New feature or request help wanted Extra attention is needed labels Jan 2, 2020
@jmccrae jmccrae added this to the 2021 Release milestone Jan 2, 2020
@arademaker
Member

Can you give an example for (1)?

For corpus tagging, I believe we need to pay special attention to improving the glosses. Making hard senses appear in the glosses can create an opportunity for future annotation of these glosses.

@arademaker
Member

I believe we will also have to discuss whether we want to keep all of PWN's original motivations and linguistic decisions or whether we are willing to adopt different strategies.

For instance, some systematic polysemy is expected and accepted as a consequence of the PWN structure described in the original 5 papers. On the other hand, we now have experience from other wordnets, such as the German and Polish ones. Maybe other relations and models are possible. The German wordnet does not follow the cluster model for adjectives, for example.

@jmccrae
Member Author

jmccrae commented Jan 2, 2020

For (1), a simple example would be that one sense of "bank" may collocate with "river", "stream", while another sense may collocate with "merchant", "statement", "account". You can then detect two distinct clusters using metrics such as PMI.
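As a rough illustration of the collocation idea, the sketch below computes PMI between sense labels and context words on a toy sense-tagged sample, then measures how much the collocate sets of two senses overlap. The sense labels (`bank%river`, `bank%finance`), the toy contexts, the function names, and the use of Jaccard overlap as the comparison are all assumptions made for illustration; they are not part of any procedure adopted in this issue.

```python
import math
from collections import Counter


def pmi_table(tagged_contexts):
    """Compute PMI(sense, word) from (sense, context_words) pairs.

    Each (sense, word) co-occurrence is counted once per context.
    """
    sense_counts = Counter()
    word_counts = Counter()
    pair_counts = Counter()
    total = 0
    for sense, words in tagged_contexts:
        for word in set(words):
            sense_counts[sense] += 1
            word_counts[word] += 1
            pair_counts[(sense, word)] += 1
            total += 1
    # PMI = log2( p(s, w) / (p(s) * p(w)) ), with all probabilities
    # estimated over the same set of (sense, word) events.
    return {
        (s, w): math.log2((c * total) / (sense_counts[s] * word_counts[w]))
        for (s, w), c in pair_counts.items()
    }


def collocates(pmi, sense, threshold=0.0):
    """Words whose PMI with the given sense exceeds the threshold."""
    return {w for (s, w), value in pmi.items() if s == sense and value > threshold}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data (hypothetical): contexts already tagged with a sense label.
contexts = [
    ("bank%river", ["river", "stream"]),
    ("bank%river", ["river", "water"]),
    ("bank%finance", ["account", "statement"]),
    ("bank%finance", ["merchant", "account"]),
]

pmi = pmi_table(contexts)
overlap = jaccard(collocates(pmi, "bank%river"), collocates(pmi, "bank%finance"))
print(f"collocate overlap: {overlap:.2f}")
```

A low overlap would count as evidence for keeping the senses apart. Where to set the threshold, and how much tagged data is needed for the estimate to be reliable, are open questions this sketch does not address.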

I don't think we should fully diverge from PWN unless we have strong evidence that PWN's approach is poor (e.g., "satellite adjectives" are not a category that fits well with the literature) or PWN doesn't have a fixed principle to follow (which I think is the case for systematic polysemy).

@dcillessen

Could we look at other WN projects for instances of polysemy that may have migrated to English WN? Perhaps we could also find relevant information using translation software, or dictionaries geared towards describing English as a foreign language.

@rwingerter55

[Off topic] Learning from other wordnets does not necessarily mean we have to diverge from PWN. EuroWordNet's top ontology is an enhancement to the PWN semantic fields and is fully compatible with it.

@rwingerter55

@jmccrae, in Issue #445 I followed your suggestion to consult dictionaries. It worked well. This way I could identify a sense of "event" that was present in most dictionaries but not in EWN.

@arademaker
Member

arademaker commented Nov 30, 2020

Hi @rwingerter55, the problem is that dictionaries can differ. Which dictionary will have priority? If we adopt the majority approach, do we need a fixed list of dictionaries? Will we need to define what makes a dictionary a valid source? I am just thinking about how hard it can be to adopt this criterion on a large scale.

@jmccrae jmccrae modified the milestones: 2021 Release, 2022 Release Jul 14, 2021
@jmccrae
Member Author

jmccrae commented Jul 14, 2021

I am writing a paper on this issue... so there may be some more concrete procedures for the project here

@jmccrae jmccrae removed this from the 2022 Release milestone Aug 16, 2022
@jmccrae
Member Author

jmccrae commented May 24, 2023

Note the paper I refer to was published here: https://www.frontiersin.org/articles/10.3389/frai.2022.745626/full

Not sure it solves the issues above though in the end (see next message)

@jmccrae
Member Author

jmccrae commented May 24, 2023

I have a proposal for making sense distinctions here: https://github.com/globalwordnet/english-wordnet/blob/issue-243/SYNSET_MERGING.md

Merging and creating new synsets

This document describes procedures in Open English WordNet for merging synsets and for
deducing if there is a need to create a new synset, for a new sense of a word.

Synsets that share a lemma

In the case that we are considering merging two synsets that share a lemma or for the
case of introducing a novel synset, the principal method of inferring if there is a novel
synset is based on graph positions. The graph position is defined by the characteristic
links of the synset, which are as follows

Two synsets with different positions in the graph should not be merged. For example,
similar definitions but clearly distinct hypernyms would not be merged.
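One way to read the graph-position rule is as a comparison of each synset's outgoing characteristic links. The sketch below assumes that links are stored as `(relation, target)` pairs and, because the list of characteristic links is not reproduced here, treats only `hypernym` as characteristic. The synset identifiers and function names are hypothetical, and strict set equality is one possible interpretation of "same position", not a rule stated in the proposal.

```python
def graph_position(links, characteristic=("hypernym",)):
    """Reduce a synset's links to the subset that defines its graph position.

    `links` is an iterable of (relation, target_synset_id) pairs.
    """
    return frozenset(
        (relation, target)
        for relation, target in links
        if relation in characteristic
    )


def same_position(links_a, links_b, characteristic=("hypernym",)):
    """True when both synsets have identical characteristic links."""
    return graph_position(links_a, characteristic) == graph_position(
        links_b, characteristic
    )


# Hypothetical synsets: similar definitions, but clearly distinct hypernyms.
synset_a = [("hypernym", "synset-slope"), ("domain_topic", "synset-geology")]
synset_b = [("hypernym", "synset-institution")]
# Hypothetical synsets that differ only in a non-characteristic link.
synset_c = [("hypernym", "synset-slope")]

print(same_position(synset_a, synset_b))  # different hypernyms: not merge candidates
print(same_position(synset_a, synset_c))  # same hypernym: candidates for manual review
```

Passing this check would only make two synsets candidates for merging. The decision itself still rests on the definitions and on editorial judgment.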

An example of a merge based on these properties is given by Issue #911

If it is decided that no merge is necessary, we should normally update definitions
or the characteristic links to make the sense distinction clearer.

Synsets that don't share a lemma

In the case that the synsets don't share a lemma, we are also claiming that there
is synonymy between all the words of the synset. The steps we take to verify this
are as follows

  1. Verify that the synsets would have the same characteristic links (see above)
  2. Collect at least 3 examples for each of these synsets. This can be done by
    using the COCA corpus and finding
    the first 3 matching examples that fit with this sense
  3. Check that all lemmas can be substituted in all cases without substantial change in meaning

For example Issue #750

An example of 'self-serving' was found in the corpus

the self-serving and greedy Daffy Duck

We substitute with the candidate merge lemma:

the selfish and greedy Daffy Duck

This does not seem to substantially change the meaning, so we merged these synsets.
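The mechanical part of the substitution check can be sketched as below: for each collected example, generate the variant with the candidate lemma swapped in, so that a person can judge whether the meaning changes. The function names are hypothetical, and the sketch does only whole-word string replacement; it does not handle inflected forms, and the judgment of meaning stays manual.

```python
import re


def substitute(example, lemma, candidate):
    """Replace whole-word occurrences of `lemma` in `example` with `candidate`."""
    pattern = re.compile(r"\b" + re.escape(lemma) + r"\b", flags=re.IGNORECASE)
    # A function replacement avoids any backslash interpretation in `candidate`.
    return pattern.sub(lambda match: candidate, example)


def substitution_pairs(examples, lemma, candidates):
    """Return (original, substituted) pairs for manual review."""
    return [
        (example, substitute(example, lemma, candidate))
        for example in examples
        for candidate in candidates
    ]


# The corpus example quoted above, with the candidate merge lemma.
pairs = substitution_pairs(
    ["the self-serving and greedy Daffy Duck"],
    lemma="self-serving",
    candidates=["selfish"],
)
for original, substituted in pairs:
    print(original, "->", substituted)
```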

@jmccrae jmccrae mentioned this issue Aug 22, 2023

4 participants