search for words which have the feature/misc name with any value
jheinecke committed Jul 20, 2023
1 parent 2b0c982 commit df15c72
Showing 12 changed files with 1,410 additions and 25 deletions.
4 changes: 4 additions & 0 deletions CHANGES.md
@@ -1,5 +1,9 @@
# Changes

## Version 2.22.4
* `Feat:VerbForm:` or `Misc:SpaceAfter:` (no value given) searches for words which have the feature/misc name with any value (use `not Feat:VerbForm:` to look for words which do not have the given feature)
* new tests

## Version 2.22.3
* bug concerning --rootdir corrected

6 changes: 3 additions & 3 deletions README.md
@@ -26,7 +26,7 @@ The editor provides the following functionalities:
* adding Translit= values to the MISC column (transliterating the FORM column) see section [Transliteration](#transliteration)
* finding similar or identical sentence in a list of CoNLL-U files, see section [Find Similar Sentences](#find-similar-sentences)

Current version: 2.22.3 (see [change history](CHANGES.md))
Current version: 2.22.4 (see [change history](CHANGES.md))

ConlluEditor can also be used as front-end to display the results of dependency parsing in the same way as the editor.
* dependency tree/dependency hedge
@@ -327,7 +327,7 @@ In order to create a multiword token, use the `compose <wordid> <length>`
command. Click on the multiword token bar (at the bottom of the dependency
tree/graph) to open a dialogue which allows editing or deleting the token (i.e. the `n-m` line).

All operations which change the tokenisation of the sentence will create an `incoherent # text and forms` warning. This is because the `# text = ....`
All operations which change the tokenisation of the sentence will create an `incoherent # text and forms` warning. This is because the `# text = ....`
metadata must be coherent with the concatenation of forms (taking into account the `SpacesAfter`/`SpaceAfter` fields in the MISC column).
Unlike earlier versions, the `# text ...` is no longer updated automatically, but must be adapted manually using the `edit metadata` button.

@@ -498,7 +498,7 @@ see [Mass Editing](doc/mass_editing.md)

## Metadata editing

The CoNLL-U format provides some special comment lines to indicate whether the current sentence is the beginning of a new document, new paragraph,
The CoNLL-U format provides some special comment lines to indicate whether the current sentence is the beginning of a new document, new paragraph,
the sentence itself, as well as its sentence id, translations (mostly into English) or transliterations.
Clicking on `edit metadata` opens the Metadata dialogue.
For translations, the translations must be prefixed with the language code as shown in the screen shot.
8 changes: 6 additions & 2 deletions doc/mass_editing.md
@@ -32,9 +32,13 @@ Examples:
* `IsEmpty` (no value, true if the current node is empty)
* `IsMWT` (no value, true if the current node is a MWT)

`Form:`, `Lemma:` and `Xpos:` can contain simple regular expression (only the character ')' cannot be used
`Form:`, `Lemma:` and `Xpos:` can contain simple regular expressions (only the character ')' cannot be used).

To check for any Feat or Misc value, leave the value empty:
* `Feat:Gender:` true if the current word has the feature `Gender` with any value

In order to check for the absence of a given Featurename in the Feature or Misc column, use the following:
* `Feat:Gender:` true if the cyurrent word has no feature `Gender`
* `not Feat:Gender:` true if the current word has no feature `Gender`

`EUD` cannot deal (yet) with empty word ids (`n.m`)

15 changes: 9 additions & 6 deletions gui/index.html
@@ -3,7 +3,7 @@
<!--
This library is under the 3-Clause BSD License
Copyright (c) 2018-2022, Orange S.A.
Copyright (c) 2018-2023, Orange S.A.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
@@ -31,7 +31,7 @@
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
author Johannes Heinecke
version 2.20.0 as of 13th December 2022
version 2.22.4 as of 20th July 2023
-->

<head>
@@ -225,7 +225,7 @@
<button class="editbuttons mybutton" id="replacego">search & replace</button>
</div>
</td></tr>

<tr id="grewmatchsearchandreplace"><td>
<div>
<input class="inputfield" placeholder="pattern {N [upos=NOUN]} without {V -[nsubj]-> N}" title="grewmatch pattern" size="70" type="text" id="grewmatchsearchexpression" value="">
@@ -513,9 +513,12 @@ <h3>Complex search</h3>
</ul>
In order to check for the absence of a given Featurename in the Feature or Misc column, use the following:
<ul>
<li><span class="userinputdoc">Feat:Gender:</span> true if the cyurrent word has no feature <span class="userinputdoc">Gender</span>
<li><span class="userinputdoc">not Feat:Gender:</span> true if the current word has no feature <span class="userinputdoc">Gender</span>.
</ul>
In order to check for any Feature (or Misc) value, do not specify the value:
<ul>
<li><span class="userinputdoc">Feat:Gender:</span> true if the current word has feature <span class="userinputdoc">Gender</span> with any value.
</ul>

In addition to the keys listed above, four functions are available to take the context of the token into account:
<ul>
<li> <span class="userinputdoc">child()</span> child of current token</li>
@@ -546,7 +549,7 @@ <h3>Complex search</h3>
<li> <span class="userinputdoc">@Deprel=prec(@Deprel)</span>: true, if the current word and the preceding word have the same <span class="userinputdoc">deprel</span> value
<li> <span class="userinputdoc">@Xpos=head(head(@Feat:Featname))</span> true if the `XPOS` of the current word has the same value as the feature <span class="userinputdoc">Featname</span> of the head of its head.
<li> <span class="userinputdoc">@Feat:Gender=head(@Feat:Gender) and not Upos:DET</span> true if the head and the current word have the same value for the feature <span class="userinputdoc">Gender</span> and the current word is not a <span class="userinputdoc">DET</span>

</ul>

<h3>Search and replace</h3>

4 changes: 2 additions & 2 deletions pom.xml
@@ -32,13 +32,13 @@
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
author Johannes Heinecke
version 2.22.3 as of 4th June 2023
version 2.22.4 as of 20th July 2023
-->

<modelVersion>4.0.0</modelVersion>
<groupId>com.orange.labs</groupId>
<artifactId>ConlluEditor</artifactId>
<version>2.22.3</version>
<version>2.22.4</version>
<packaging>jar</packaging>

<properties>
32 changes: 21 additions & 11 deletions src/main/java/com/orange/labs/conllparser/CEvalVisitor.java
@@ -28,7 +28,7 @@
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@author Johannes Heinecke
@version 2.18.1 as of 29th October 2022
@version 2.22.4 as of 20th July 2023
*/
package com.orange.labs.conllparser;

@@ -305,10 +305,15 @@ public Boolean visitCheckFeat(ConditionsParser.CheckFeatContext ctx) {
}
boolean rtc ;
if (fv.length == 2) {
rtc = use.matchesFeatureValue(fv[0], fv[1]);
if ("!".equals(fv[1])) {
// feature must not be in word
rtc = !use.getFeatures().containsKey(fv[0]);
} else {
rtc = use.matchesFeatureValue(fv[0], fv[1]);
}
} else {
// feature must not be in word
rtc = !use.getFeatures().containsKey(fv[0]);
// feature must be in word with any value
rtc = use.getFeatures().containsKey(fv[0]);
}
return rtc;
}
@@ -323,10 +328,15 @@ public Boolean visitCheckMisc(ConditionsParser.CheckMiscContext ctx) {
}
boolean rtc;
if (fv.length == 2) {
rtc = use.matchesMiscValue(fv[0], fv[1]);
if ("!".equals(fv[1])) {
// misc must not be in word
rtc = !use.getMisc().containsKey(fv[0]);
} else {
rtc = use.matchesMiscValue(fv[0], fv[1]);
}
} else {
// feature must not be in word
rtc = !use.getMisc().containsKey(fv[0]);
// misc must be in word with any value
rtc = use.getMisc().containsKey(fv[0]);
}
return rtc;
}
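
The three-way branch shared by `visitCheckFeat` and `visitCheckMisc` can be sketched in isolation. The following is a minimal illustration, not the project's code: `FeatCheckSketch` and `check` are hypothetical names, the feature column is modelled as a plain `Map`, and value matching is assumed to be plain string equality (the real `matchesFeatureValue` may do more).

```java
import java.util.Map;

// Hypothetical stand-in for the presence/value check; not the ConllWord API.
final class FeatCheckSketch {

    // fv holds {name} or {name, value}; "!" as value is the marker tested in the diff.
    static boolean check(String[] fv, Map<String, String> features) {
        if (fv.length == 2) {
            if ("!".equals(fv[1])) {
                // the name must be absent from the word
                return !features.containsKey(fv[0]);
            }
            // assumption: exact equality stands in for matchesFeatureValue
            return fv[1].equals(features.get(fv[0]));
        }
        // no value given: the name must be present, with any value
        return features.containsKey(fv[0]);
    }

    public static void main(String[] args) {
        Map<String, String> feats = Map.of("Gender", "Masc", "Number", "Sing");
        System.out.println(check(new String[] {"Gender", "Masc"}, feats)); // true
        System.out.println(check(new String[] {"Gender"}, feats));         // true
        System.out.println(check(new String[] {"VerbForm"}, feats));       // false
        System.out.println(check(new String[] {"VerbForm", "!"}, feats));  // true
    }
}
```

Before this commit the one-element case returned the negated result, so a bare name meant "absent"; it now means "present with any value", and absence is expressed with `not` or the `!` marker.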
@@ -459,7 +469,7 @@ public Boolean visitOder(ConditionsParser.OderContext ctx) {
@Override
public Boolean visitValcompare(ConditionsParser.ValcompareContext ctx) {
CGetVisitor getvisitor = new CGetVisitor(cword, wordlists);

//System.err.println("GET VALUES FOR COMPARISON");
//String left = getvisitor.visit(ctx.columnname(0)); // get value of left columnname
//String right = getvisitor.visit(ctx.columnname(1)); // get value of right columnname
@@ -480,12 +490,12 @@ public Boolean visitValcompare(ConditionsParser.ValcompareContext ctx) {
}
return false;
}


@Override
public Boolean visitValcompatible(ConditionsParser.ValcompatibleContext ctx) {
CGetVisitor getvisitor = new CGetVisitor(cword, wordlists);

//System.err.println("GET VALUES FOR COMPATIBILITY");
//String left = getvisitor.visit(ctx.columnname(0)); // get value of left columnname
//String right = getvisitor.visit(ctx.columnname(1)); // get value of right columnname
2 changes: 1 addition & 1 deletion src/main/java/com/orange/labs/conllparser/ConllWord.java
@@ -1547,7 +1547,7 @@ public boolean hasFeature(String name) {
}
return features.containsKey(name);
}

// check whether feature with value is present
public boolean hasFeature(String name, String val) {
if (features.isEmpty()) {
24 changes: 24 additions & 0 deletions src/test/java/TestConllFile.java
@@ -453,6 +453,30 @@ public void test20repl07() throws IOException, ConllException {
applyRule("Upos:VERB", "feat:\"InlfClass=\"+this(Feat_Number) feat:\"Number=\"", "rule26.conllu");
}

@Test
public void test20repl08() throws IOException, ConllException {
name("repl 08");
applyRule("Feat:VerbForm:Inf", "misc:\"VF=INF\"", "rule27.conllu");
}

@Test
public void test20repl09() throws IOException, ConllException {
name("repl 09");
applyRule("Feat:VerbForm:", "misc:\"VF=ANY\"", "rule28.conllu");
}

@Test
public void test20repl10() throws IOException, ConllException {
name("repl 10");
applyRule("not Feat:Number:", "misc:\"NoNumber=True\"", "rule30.conllu");
}

@Test
public void test20repl11() throws IOException, ConllException {
name("repl 11");
applyRule("Misc:SpaceAfter:", "feat:\"SPACEAFTER=ANY\"", "rule29.conllu");
}
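
Each of the new tests pairs a condition with a replacement. As a rough sketch of the intended effect of two of them (`Feat:VerbForm:` and `not Feat:Number:`), the snippet below uses a hypothetical `Word` class with two maps instead of `ConllWord`, and assumes that a `misc:"Key=Value"` replacement simply sets that key in the MISC column of every matching word. `RuleSketch` and its methods are illustrative names only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a rule application; not the project's applyRule().
final class RuleSketch {

    // Minimal word: only the FEATS and MISC columns, as key/value maps.
    static final class Word {
        final Map<String, String> feats = new LinkedHashMap<>();
        final Map<String, String> misc = new LinkedHashMap<>();
    }

    // Mirrors the rule "Feat:VerbForm:" with replacement misc:"VF=ANY".
    static void markAnyVerbForm(Word w) {
        if (w.feats.containsKey("VerbForm")) {
            w.misc.put("VF", "ANY");
        }
    }

    // Mirrors the rule "not Feat:Number:" with replacement misc:"NoNumber=True".
    static void markMissingNumber(Word w) {
        if (!w.feats.containsKey("Number")) {
            w.misc.put("NoNumber", "True");
        }
    }

    public static void main(String[] args) {
        Word verb = new Word();
        verb.feats.put("VerbForm", "Inf");
        markAnyVerbForm(verb);
        markMissingNumber(verb);
        System.out.println(verb.misc); // {VF=ANY, NoNumber=True}
    }
}
```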

@Test
public void test21value01() throws IOException, ConllException {
name("value 01");