check for absent features/misc-keys
Johannes Heinecke committed Oct 29, 2022
1 parent 0f091c4 commit d631879
Showing 6 changed files with 31 additions and 13 deletions.
1 change: 1 addition & 0 deletions CHANGES.md
@@ -4,6 +4,7 @@
* more tests for value access
* added compatibility test `~` in addition to strict comparison with `=`
* validator shortcut changed from `=` to `!`
+* check for absent features/misc-keys

## Version 2.18.0
* extension to mass-edit/complex search & replace: possibility to search heads/children etc. with the same Feature value or the same UPOS etc.
7 changes: 4 additions & 3 deletions doc/mass_editing.md
@@ -30,17 +30,18 @@ Examples:
* `Id:` (Values: integer)
* `MWT:` (Values: length of the multi-word token `[2-9]`)
* `IsEmpty` (no value, true if the current node is empty)
-* `IsMWT` (no value, true if the current node is empty)
+* `IsMWT` (no value, true if the current node is a MWT)

`Form:`, `Lemma:` and `Xpos:` can contain a simple regular expression (only the character ')' cannot be used).
+In order to check for the absence of a given feature name in the Feature or Misc column, use the following:
+* `Feat:Gender:` true if the current word has no feature `Gender`

`EUD` cannot deal (yet) with empty word ids (`n.m`)

-`Lemma` and `Form` can have either a regex as argument or a filename of a file which contains a list of forms or lemmas:
+`Lemma` and `Form` can have either a regex as argument or a filename of a file which contains a list of forms or lemmas:
* `Lemma:sing.* > misc:"Value=Sing"`
* `Lemma:#mylemmas.txt > misc:"Value=Sing"` (if the file `mylemmas.txt` does not exist, the condition is false)


In addition to the keys listed above, four functions are available to take the context of the token into account:
* `child()` child of current token
* `head()` head of current token
6 changes: 5 additions & 1 deletion gui/index.html
@@ -489,7 +489,11 @@ <h3>Complex search</h3>
<li> <span class="userinputdoc">IsEmpty</span> (no value, true if the current node is empty)</li>
<li> <span class="userinputdoc">IsMWT</span> (no value, true if the current node is a multi token word)</li>
</ul>

+In order to check for the absence of a given feature name in the Feature or Misc column, use the following:
+<ul>
+<li><span class="userinputdoc">Feat:Gender:</span> true if the current word has no feature <span class="userinputdoc">Gender</span></li>
+</ul>

In addition to the keys listed above, four functions are available to take the context of the token into account:
<ul>
<li> <span class="userinputdoc">child()</span> child of current token</li>
8 changes: 4 additions & 4 deletions src/main/antlr4/com/orange/labs/conllparser/Conditions.g4
@@ -85,22 +85,22 @@ field



-UPOS : 'Upos:' [A-Z]+ ;
+UPOS : 'Upos:' [A-Z_]+ ;
LEMMA : 'Lemma:' ~[ \n\t)&|]+ ;
FORM : 'Form:' ~[ \n\t)&|]+ ;
XPOS : 'Xpos:' ~[ \n\t)&|]+ ;
//DEPREL : 'Deprel:' [a-z]+( ':' ~[ \n\t)&|]+)? ;
DEPREL : 'Deprel:' ~[ \n\t)&|]+( ':' ~[ \n\t)&|]+)? ;
-FEAT : 'Feat:' [A-Za-z_[\]]+ [:=] [A-Za-z0-9]+ ;
-MISC : 'Misc:' [A-Za-z_]+ [:=] ~[ \n\t)&|]+ ;
+FEAT : 'Feat:' [A-Za-z_[\]]+ [:=] [A-Za-z0-9]* ;
+MISC : 'Misc:' [A-Za-z_]+ [:=] ~[ \n\t)&|]* ;
ID : 'Id:' [1-9][0-9]* ; // no "n.m" nor "n-m" yet
MWT : 'MWT:' [2-9] ; // length of a MWT in tokens
HEADID : 'HeadId:' [+-]?[0-9]+ ;
RELEUD : 'EUD:' ([+-][0-9]+) [:=] [a-z]+( ':' ~[ \n\t)&|]+)? ;
ABSEUD : 'EUD:' ([0-9]+|'*') [:=] [a-z]+( ':' ~[ \n\t)&|]+)? ;

ISEMPTY : 'IsEmpty' ; // emptyword
-ISMWT : 'IsMWT' ; // multi word tokenemptyword
+ISMWT : 'IsMWT' ; // multi word token

AND : 'and' | '&&' ;
OR : 'or' | '||' ;
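
The effect of relaxing the value part of `FEAT` from `+` to `*` can be illustrated with plain Java regular expressions. This is only an approximation: `java.util.regex` stands in for the ANTLR lexer rule, the class and method names are hypothetical, and whole-string matching is assumed rather than ANTLR's longest-match tokenisation.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: regex equivalents of the FEAT lexer rule before and after this commit.
class FeatTokenSketch {
    // Before: at least one character is required after the ':' or '=' separator.
    static final Pattern OLD_FEAT = Pattern.compile("Feat:[A-Za-z_\\[\\]]+[:=][A-Za-z0-9]+");
    // After: the value may be empty, which is what makes "Feat:Gender:" a valid token.
    static final Pattern NEW_FEAT = Pattern.compile("Feat:[A-Za-z_\\[\\]]+[:=][A-Za-z0-9]*");

    static boolean acceptedBefore(String condition) {
        return OLD_FEAT.matcher(condition).matches();
    }

    static boolean acceptedAfter(String condition) {
        return NEW_FEAT.matcher(condition).matches();
    }

    public static void main(String[] args) {
        String[] conditions = {"Feat:Gender=Masc", "Feat:Gender:", "Feat:Number[psor]:"};
        for (String c : conditions) {
            System.out.println(c + " before=" + acceptedBefore(c) + " after=" + acceptedAfter(c));
        }
    }
}
```

Conditions with a value are accepted by both patterns; only the value-less form depends on the new `*` quantifier.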
18 changes: 15 additions & 3 deletions src/main/java/com/orange/labs/conllparser/CEvalVisitor.java
@@ -303,19 +303,31 @@ public Boolean visitCheckFeat(ConditionsParser.CheckFeatContext ctx) {
         if (use == null) {
             return false;
         }
-        boolean rtc = use.matchesFeatureValue(fv[0], fv[1]);
+        boolean rtc;
+        if (fv.length == 2) {
+            rtc = use.matchesFeatureValue(fv[0], fv[1]);
+        } else {
+            // feature must not be in word
+            rtc = !use.getFeatures().containsKey(fv[0]);
+        }
         return rtc;
     }

     @Override
     public Boolean visitCheckMisc(ConditionsParser.CheckMiscContext ctx) {
         String text = ctx.MISC().getText();
-        String[] fv = text.substring(5).split("=");
+        String[] fv = text.substring(5).split("[:=]");
         ConllWord use = getCW();
         if (use == null) {
             return false;
         }
-        boolean rtc = use.matchesMiscValue(fv[0], fv[1]);
+        boolean rtc;
+        if (fv.length == 2) {
+            rtc = use.matchesMiscValue(fv[0], fv[1]);
+        } else {
+            // misc key must not be in word
+            rtc = !use.getMisc().containsKey(fv[0]);
+        }
         return rtc;
     }
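
The `fv.length == 2` test works because `String.split` without a limit drops trailing empty strings, so `Gender:` yields a one-element array. The same call yields more than two parts when a value itself contains `:` or `=`, which the `MISC` token appears to allow, and such a condition would then fall into the absence branch. A possible alternative is to split with a limit of 2 and test for an empty value. The sketch below is an assumption-laden illustration, not the project's code: the class and method names are hypothetical, a plain `Map` stands in for the word's Misc column, and string equality stands in for `matchesMiscValue`, whose implementation is not shown in this diff.

```java
import java.util.Map;

// Hypothetical sketch of a presence/absence check that tolerates ':' or '=' inside values.
class AbsenceCheckSketch {
    // Splits "Key=Value", "Key:Value" or "Key:" at the first separator only.
    // With a limit of 2 the trailing empty string is kept, so "Key:" gives {"Key", ""}.
    static String[] splitKeyValue(String keyValue) {
        return keyValue.split("[:=]", 2);
    }

    // Returns true if the key has the given value, or, for an empty value, if the key is absent.
    // Assumption: plain equality replaces the real value-matching logic.
    static boolean check(Map<String, String> column, String keyValue) {
        String[] kv = splitKeyValue(keyValue);
        if (kv.length == 2 && !kv[1].isEmpty()) {
            return kv[1].equals(column.get(kv[0]));
        }
        // No value given: the key must not be in the word.
        return !column.containsKey(kv[0]);
    }

    public static void main(String[] args) {
        Map<String, String> misc = Map.of("SpaceAfter", "No", "Gloss", "a:b");
        System.out.println(check(misc, "SpaceAfter=No"));
        System.out.println(check(misc, "Gloss=a:b"));
        System.out.println(check(misc, "Lang:"));
        System.out.println(check(misc, "Gloss:"));
    }
}
```

With the limit, `Gloss=a:b` is compared as key `Gloss` and value `a:b` instead of being split into three parts.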

4 changes: 2 additions & 2 deletions src/main/java/com/orange/labs/conllparser/ConllWord.java
@@ -1441,7 +1441,7 @@ public boolean matchCondition(String condition, Map<String, Set<String>> wordlis
             //System.err.println("\n\nEVAL " + this);
             return CheckConditions.evaluate(condition, wordlists, this, false); // debug: show tokenisation of condition
         } catch (Exception e) {
-            //e.printStackTrace();
+            e.printStackTrace();
             throw new ConllException(e.getMessage());
         }
     }
@@ -1539,7 +1539,7 @@ public boolean anyFeatures() {
         return !features.isEmpty();
     }

-    // check whether featyre with value is present
+    // check whether feature with value is present
     public boolean hasFeature(String name, String val) {
         if (features.isEmpty()) {
             return false;