diff --git a/CHANGES.md b/CHANGES.md index 052dcd4..76b1303 100644 --- a/CHANGES.md +++ b/CHANGES.md @@ -1,5 +1,9 @@ # Changes +## Version 2.22.4 +* `Feat:VerbForm=` or `Misc:SpaceAfter=` (no value given) searches for words which have the feature/misc name with any value (use `not Feat:VerbForm:` to look for words which do not have the given feature) +* new tests + ## Version 2.22.3 * bug concerning --rootdir corrected diff --git a/README.md b/README.md index 972bf5f..5c85ad1 100644 --- a/README.md +++ b/README.md @@ -26,7 +26,7 @@ The editor provides the following functionalities: * adding Translit= values to the MISC column (transliterating the FORM column) see section [Transliteration](#transliteration) * finding similar or identical sentence in a list of CoNLL-U files, see section [Find Similar Sentences](#find-similar-sentences) -Current version: 2.22.3 (see [change history](CHANGES.md)) +Current version: 2.22.4 (see [change history](CHANGES.md)) ConlluEditor can also be used as front-end to display the results of dependency parsing in the same way as the editor. * dependency tree/dependency hedge @@ -327,7 +327,7 @@ In order to create a multiword token, use the `compose ` command. Click on the multiword token bar (at the bottom of the dependency tree/graph to open a dialogue which allows to edit or delete the token (i.e. the `n-m` line). -All operations which change the tokenisation of the sentence will create a `incoherent # text and forms` warning. This is because the è# text = ....` +All operations which change the tokenisation of the sentence will create a `incoherent # text and forms` warning. This is because the è# text = ....` metadata must be coherent with the concatenation of forms (taken into account `SpacesAfter`/`SpaceAfter` fields in the MISC column. Unless earlier versions, the `# text ...` is no longer updated automatically, but must be adapted manually using the `edit metadata` button. 
@@ -498,7 +498,7 @@ see [Mass Editing](doc/mass_editing.md) ## Metadata editing -The CoNLL-U format provides some special comment lines to indicate whether the current sentence is the beginning of a new document, new paragraph, +The CoNLL-U format provides some special comment lines to indicate whether the current sentence is the beginning of a new document, new paragraph, the sentence itself, as well as its sentence id, translations (mostly into English) or transliterations. Clicking on `edit metadata` opens the Metadata dialogue. For translations, the translations must be prefixed with the language code as shown in the screen shot. diff --git a/doc/mass_editing.md b/doc/mass_editing.md index 2137640..1827523 100644 --- a/doc/mass_editing.md +++ b/doc/mass_editing.md @@ -32,9 +32,13 @@ Examples: * `IsEmpty` (no value, true if the current node is empty) * `IsMWT` (no value, true if the current node is a MWT) -`Form:`, `Lemma:` and `Xpos:` can contain simple regular expression (only the character ')' cannot be used +`Form:`, `Lemma:` and `Xpos:` can contain simple regular expression (only the character ')' cannot be used. + +To check for any Feat or Misc value, leave the value empty: + * `Feat:Gender:` true if the current word has the feature `Gender` with any value + In order to check for the absence of a given Featurename in the Feature or Misc column, use the following: - * `Feat:Gender:` true if the cyurrent word has no feature `Gender` + * `not Feat:Gender:` true if the current word has no feature `Gender` `EUD` cannot deal (yet) with empty word ids (`n.m`) diff --git a/gui/index.html b/gui/index.html index c31571b..d079012 100644 --- a/gui/index.html +++ b/gui/index.html @@ -3,7 +3,7 @@ @@ -225,7 +225,7 @@ - +
@@ -513,9 +513,12 @@

Complex search

In order to check for the absence of a given Featurename in the Feature or Misc column, use the following:
    -
  • Feat:Gender: true if the cyurrent word has no feature Gender +
  • not Feat:Gender: true if the current word has no feature Gender. +
+ In order to check for any Feature (or Misc) value, do not specify the value:
    +
  • Feat:Gender: true if the current word has feature Gender with any value.
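The three cases described in this help text (a value is given, the value is left empty, the condition is negated with `not`) can be sketched as follows. This is a minimal stand-alone illustration, not the `CEvalVisitor` code from this change: the class `FeatConditionSketch`, the `Mode` enum and the plain string comparison are assumptions made for the example (the real matcher may accept regular expressions).

```java
import java.util.Map;

// Hypothetical sketch of the Feat:/Misc: condition semantics; all names are illustrative.
final class FeatConditionSketch {
    /** How the value part of a condition such as Feat:Gender:Fem is interpreted. */
    enum Mode { EXACT, ANY_VALUE, ABSENT }

    static boolean matches(Map<String, String> features, String name, String value, Mode mode) {
        switch (mode) {
            case ANY_VALUE:
                // "Feat:Gender:" -> the feature is present, whatever its value
                return features.containsKey(name);
            case ABSENT:
                // "not Feat:Gender:" -> the feature is not present at all
                return !features.containsKey(name);
            default:
                // "Feat:Gender:Fem" -> the feature is present with exactly this value
                return value != null && value.equals(features.get(name));
        }
    }

    public static void main(String[] args) {
        Map<String, String> feats = Map.of("Gender", "Masc", "Number", "Sing");
        System.out.println(matches(feats, "Gender", null, Mode.ANY_VALUE)); // true
        System.out.println(matches(feats, "Case", null, Mode.ABSENT));      // true
        System.out.println(matches(feats, "Gender", "Fem", Mode.EXACT));    // false
    }
}
```

The same three-way distinction applies to the MISC column when a map of Misc key/value pairs is passed instead of the features.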
- In addition to the keys listed above, four functions are available to take the context of the token into account:
  • child() child of current token
  • @@ -546,7 +549,7 @@

    Complex search

  • @Deprel=prec(@Deprel): true, if the current word and the preceding word have the same deprel value
  • @Xpos=head(head(@Feat:Featname)) true if the `XPOS` of the current word has the same value as the feature Featname of the head of its head.
  • @Feat:Gender=head(@Feat:Gender) and not Upos:DET true if the head and the current word have the same value for the feature Gender and the current word is not a DET - +
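The last example above, `@Feat:Gender=head(@Feat:Gender) and not Upos:DET`, compares a value of the current word with the same value on its head. A rough sketch of that evaluation is shown below. The `Word` class and the method `sameFeatAsHead` are hypothetical and are not part of ConlluEditor; the sketch also assumes that a missing feature or a missing head makes the comparison false, which the help text does not state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a value comparison between a word and its head.
final class HeadCompareSketch {
    static final class Word {
        final String upos;
        final Map<String, String> feats = new HashMap<>();
        Word head; // null for the root of the sentence

        Word(String upos) {
            this.upos = upos;
        }
    }

    /** Mirrors "@Feat:name=head(@Feat:name) and not Upos:excludedUpos". */
    static boolean sameFeatAsHead(Word word, String name, String excludedUpos) {
        if (word.head == null) {
            return false; // assumption: no head means nothing to compare with
        }
        String own = word.feats.get(name);
        String ofHead = word.head.feats.get(name);
        boolean sameValue = own != null && own.equals(ofHead);
        return sameValue && !excludedUpos.equals(word.upos);
    }

    public static void main(String[] args) {
        Word noun = new Word("NOUN");
        noun.feats.put("Gender", "Fem");
        Word adj = new Word("ADJ");
        adj.feats.put("Gender", "Fem");
        adj.head = noun;
        Word det = new Word("DET");
        det.feats.put("Gender", "Fem");
        det.head = noun;
        System.out.println(sameFeatAsHead(adj, "Gender", "DET")); // true
        System.out.println(sameFeatAsHead(det, "Gender", "DET")); // false
    }
}
```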

Search and replace

diff --git a/pom.xml b/pom.xml index 3464233..c55b671 100644 --- a/pom.xml +++ b/pom.xml @@ -32,13 +32,13 @@ THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. author Johannes Heinecke - version 2.22.3 as of 4th June 2023 + version 2.22.4 as of 20th July 2023 --> 4.0.0 com.orange.labs ConlluEditor - 2.22.3 + 2.22.4 jar diff --git a/src/main/java/com/orange/labs/conllparser/CEvalVisitor.java b/src/main/java/com/orange/labs/conllparser/CEvalVisitor.java index ae2d2b6..4fc4308 100644 --- a/src/main/java/com/orange/labs/conllparser/CEvalVisitor.java +++ b/src/main/java/com/orange/labs/conllparser/CEvalVisitor.java @@ -28,7 +28,7 @@ THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. @author Johannes Heinecke - @version 2.18.1 as of 29th October 2022 + @version 2.22.4 as of 20th July 2023 */ package com.orange.labs.conllparser; @@ -305,10 +305,15 @@ public Boolean visitCheckFeat(ConditionsParser.CheckFeatContext ctx) { } boolean rtc ; if (fv.length == 2) { - rtc = use.matchesFeatureValue(fv[0], fv[1]); + if ("!".equals(fv[1])) { + // feature must not be in word + rtc = !use.getFeatures().containsKey(fv[0]); + } else { + rtc = use.matchesFeatureValue(fv[0], fv[1]); + } } else { - // feature must not be in word - rtc = !use.getFeatures().containsKey(fv[0]); + // feature must be in word with any value + rtc = use.getFeatures().containsKey(fv[0]); } return rtc; } @@ -323,10 +328,15 @@ public Boolean visitCheckMisc(ConditionsParser.CheckMiscContext ctx) { } boolean rtc; if (fv.length == 2) { - rtc = use.matchesMiscValue(fv[0], fv[1]); + if ("!".equals(fv[1])) { + // misc must not be in word + rtc = !use.getMisc().containsKey(fv[0]); + } else { + rtc = use.matchesMiscValue(fv[0], fv[1]); + } } else { - // feature must not be in word - rtc = !use.getMisc().containsKey(fv[0]); + // misc must be in word with any value + rtc = use.getMisc().containsKey(fv[0]); } return rtc; } @@ -459,7 +469,7 @@ public Boolean visitOder(ConditionsParser.OderContext ctx) { 
@Override public Boolean visitValcompare(ConditionsParser.ValcompareContext ctx) { CGetVisitor getvisitor = new CGetVisitor(cword, wordlists); - + //System.err.println("GET VALUES FOR COMPARISON"); //String left = getvisitor.visit(ctx.columnname(0)); // get value of left columnname //String right = getvisitor.visit(ctx.columnname(1)); // get value of right columnname @@ -480,12 +490,12 @@ public Boolean visitValcompare(ConditionsParser.ValcompareContext ctx) { } return false; } - - + + @Override public Boolean visitValcompatible(ConditionsParser.ValcompatibleContext ctx) { CGetVisitor getvisitor = new CGetVisitor(cword, wordlists); - + //System.err.println("GET VALUES FOR COMPATIBILITY"); //String left = getvisitor.visit(ctx.columnname(0)); // get value of left columnname //String right = getvisitor.visit(ctx.columnname(1)); // get value of right columnname diff --git a/src/main/java/com/orange/labs/conllparser/ConllWord.java b/src/main/java/com/orange/labs/conllparser/ConllWord.java index 80ce92a..dd004d8 100644 --- a/src/main/java/com/orange/labs/conllparser/ConllWord.java +++ b/src/main/java/com/orange/labs/conllparser/ConllWord.java @@ -1547,7 +1547,7 @@ public boolean hasFeature(String name) { } return features.containsKey(name); } - + // check whether feature with value is present public boolean hasFeature(String name, String val) { if (features.isEmpty()) { diff --git a/src/test/java/TestConllFile.java b/src/test/java/TestConllFile.java index b2aac22..d519872 100644 --- a/src/test/java/TestConllFile.java +++ b/src/test/java/TestConllFile.java @@ -453,6 +453,30 @@ public void test20repl07() throws IOException, ConllException { applyRule("Upos:VERB", "feat:\"InlfClass=\"+this(Feat_Number) feat:\"Number=\"", "rule26.conllu"); } + @Test + public void test20repl08() throws IOException, ConllException { + name("repl 08"); + applyRule("Feat:VerbForm:Inf", "misc:\"VF=INF\"", "rule27.conllu"); + } + + @Test + public void test20repl09() throws IOException, 
ConllException { + name("repl 09"); + applyRule("Feat:VerbForm:", "misc:\"VF=ANY\"", "rule28.conllu"); + } + + @Test + public void test20repl10() throws IOException, ConllException { + name("repl 10"); + applyRule("not Feat:Number:", "misc:\"NoNumber=True\"", "rule30.conllu"); + } + + @Test + public void test20repl11() throws IOException, ConllException { + name("repl 11"); + applyRule("Misc:SpaceAfter:", "feat:\"SPACEAFTER=ANY\"", "rule29.conllu"); + } + @Test public void test21value01() throws IOException, ConllException { name("value 01"); diff --git a/src/test/resources/rule27.conllu b/src/test/resources/rule27.conllu new file mode 100644 index 0000000..47af389 --- /dev/null +++ b/src/test/resources/rule27.conllu @@ -0,0 +1,335 @@ +# sent_id = fr-ud-dev_00001 +# text = Aviator, un film sur la vie de Howard Hughes. +# sentence 0 +1 Aviator Aviator PROPN _ _ 0 root _ SpaceAfter=No +2 , , PUNCT _ _ 1 punct _ _ +3 un un DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 4 det _ _ +4 film film NOUN _ Gender=Masc|Number=Sing 1 appos _ _ +5 sur sur ADP _ _ 7 case _ _ +6 la le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 7 det _ _ +7 vie vie NOUN _ Gender=Fem|Number=Sing 4 nmod _ _ +8 de de ADP _ _ 9 case _ _ +9 Howard Howard PROPN _ _ 7 nmod _ _ +10 Hughes Hughes PROPN _ _ 9 flat:name _ SpaceAfter=No +11 . . PUNCT _ _ 1 punct _ _ + +# sent_id = fr-ud-dev_00002 +# text = Les études durent six ans mais leur contenu diffère donc selon les Facultés. 
+# sentence 1 +1 Les le DET _ Definite=Def|Gender=Fem|Number=Plur|PronType=Art 2 det _ _ +2 études étude NOUN _ Gender=Fem|Number=Plur 3 nsubj _ _ +3 durent durer VERB _ Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 0 root _ _ +4 six six NUM _ _ 5 nummod _ _ +5 ans an NOUN _ Gender=Masc|Number=Plur 3 obj _ _ +6 mais mais CCONJ _ _ 9 cc _ _ +7 leur son DET _ Gender=Masc|Number=Sing|PronType=Prs 8 nmod:poss _ _ +8 contenu contenu NOUN _ Gender=Masc|Number=Sing 9 nsubj _ _ +9 diffère différer VERB _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 3 conj _ _ +10 donc donc ADV _ _ 9 advmod _ _ +11 selon selon ADP _ _ 13 case _ _ +12 les le DET _ Definite=Def|Number=Plur|PronType=Art 13 det _ _ +13 Facultés Facultés PROPN _ _ 9 obl _ SpaceAfter=No +14 . . PUNCT _ _ 3 punct _ _ + +# sent_id = fr-ud-dev_00003 +# text = Mais comment faire dans un contexte structurellement raciste ? +# sentence 2 +1 Mais mais CCONJ _ _ 3 cc _ _ +2 comment comment ADV _ _ 3 advmod _ _ +3 faire faire VERB _ VerbForm=Inf 0 root _ VF=INF +4 dans dans ADP _ _ 6 case _ _ +5 un un DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 6 det _ _ +6 contexte contexte NOUN _ Gender=Masc|Number=Sing 3 obl _ _ +7 structurellement structurellement ADV _ _ 8 advmod _ _ +8 raciste raciste ADJ _ Gender=Masc|Number=Sing 6 amod _ _ +9 ? ? PUNCT _ _ 3 punct _ _ + +# sent_id = fr-ud-dev_00004 +# text = L'« oasis de vie », dans un milieu où règne l'obscurité totale et une pression hydrostatique importante, est riche et varié : les chercheurs y découvrent de nouvelles espèces de bivalves, de poissons, de crustacés, de poulpes dans des zones pensées jusqu'alors désertiques. 
+# sentence 3 +1 L' le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 3 det _ SpaceAfter=No +2 « « PUNCT _ _ 3 punct _ _ +3 oasis oasis NOUN _ Gender=Fem|Number=Sing 23 nsubj _ _ +4 de de ADP _ _ 5 case _ _ +5 vie vie NOUN _ Gender=Fem|Number=Sing 3 nmod _ _ +6 » » PUNCT _ _ 3 punct _ SpaceAfter=No +7 , , PUNCT _ _ 3 punct _ _ +8 dans dans ADP _ _ 10 case _ _ +9 un un DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 10 det _ _ +10 milieu milieu NOUN _ Gender=Masc|Number=Sing 23 obl _ _ +11 où où PRON _ PronType=Rel 12 obl _ _ +12 règne régner VERB _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 10 acl:relcl _ _ +13 l' le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 14 det _ SpaceAfter=No +14 obscurité obscurité NOUN _ Gender=Fem|Number=Sing 12 nsubj _ _ +15 totale total ADJ _ Gender=Fem|Number=Sing 14 amod _ _ +16 et et CCONJ _ _ 18 cc _ _ +17 une un DET _ Definite=Ind|Gender=Fem|Number=Sing|PronType=Art 18 det _ _ +18 pression pression NOUN _ Gender=Fem|Number=Sing 14 conj _ _ +19 hydrostatique hydrostatique ADJ _ Gender=Fem|Number=Sing 18 amod _ _ +20 importante important ADJ _ Gender=Fem|Number=Sing 18 amod _ SpaceAfter=No +21 , , PUNCT _ _ 23 punct _ _ +22 est être AUX _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 23 cop _ _ +23 riche riche ADJ _ Gender=Fem|Number=Sing 0 root _ _ +24 et et CCONJ _ _ 25 cc _ _ +25 varié varié ADJ _ Gender=Masc|Number=Sing 23 conj _ _ +26 : : PUNCT _ _ 23 punct _ _ +27 les le DET _ Definite=Def|Gender=Masc|Number=Plur|PronType=Art 28 det _ _ +28 chercheurs chercheur NOUN _ Gender=Masc|Number=Plur 30 nsubj _ _ +29 y y PRON _ _ 30 advmod _ _ +30 découvrent découvrir VERB _ Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 23 parataxis _ _ +31 de un DET _ Definite=Ind|Gender=Fem|Number=Plur|PronType=Art 33 det _ _ +32 nouvelles nouveau ADJ _ Gender=Fem|Number=Plur 33 amod _ _ +33 espèces espèce NOUN _ Gender=Fem|Number=Plur 30 obj _ _ +34 de de ADP _ _ 35 case _ _ +35 bivalves bivalve 
NOUN _ Gender=Masc|Number=Plur 33 nmod _ SpaceAfter=No +36 , , PUNCT _ _ 38 punct _ _ +37 de de ADP _ _ 38 case _ _ +38 poissons poisson NOUN _ Gender=Masc|Number=Plur 35 conj _ SpaceAfter=No +39 , , PUNCT _ _ 41 punct _ _ +40 de de ADP _ _ 41 case _ _ +41 crustacés crustacé NOUN _ Gender=Masc|Number=Plur 35 conj _ SpaceAfter=No +42 , , PUNCT _ _ 44 punct _ _ +43 de de ADP _ _ 44 case _ _ +44 poulpes poulpe NOUN _ Gender=Masc|Number=Plur 35 conj _ _ +45 dans dans ADP _ _ 47 case _ _ +46 des un DET _ Definite=Ind|Gender=Fem|Number=Plur|PronType=Art 47 det _ _ +47 zones zone NOUN _ Gender=Fem|Number=Plur 30 obl _ _ +48 pensées penser VERB _ Gender=Fem|Number=Plur|Tense=Past|VerbForm=Part 47 acl _ _ +49 jusqu' jusque ADP _ _ 50 case _ SpaceAfter=No +50 alors alors ADV _ _ 48 advmod _ _ +51 désertiques désertique ADJ _ Gender=Fem|Number=Plur 48 amod _ SpaceAfter=No +52 . . PUNCT _ _ 23 punct _ _ + +# sent_id = fr-ud-train_00002 +# text = L'œuvre est située dans la galerie des batailles, dans le château de Versailles. 
+# sentence 4 +1 L' le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 2 det _ SpaceAfter=No +2 œuvre œuvre NOUN _ Gender=Fem|Number=Sing 4 nsubj _ _ +3 est être AUX _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 4 aux _ _ +4 située situer VERB _ Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part 0 root _ SpaceAfter=\s\t\s +5 dans dans ADP _ _ 7 case _ _ +6 la le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 7 det _ SpaceAfter=  +7 galerie galerie NOUN _ Gender=Fem|Number=Sing 4 obl _ _ +8-9 des _ _ _ _ _ _ _ _ +8 de de ADP _ _ 10 case _ _ +9 les le DET _ Definite=Def|Gender=Fem|Number=Plur|PronType=Art 10 det _ _ +10 batailles bataille NOUN _ Gender=Fem|Number=Plur 7 nmod _ SpaceAfter=No +11 , , PUNCT _ _ 4 punct _ _ +12 dans dans ADP _ _ 14 case _ _ +13 le le DET _ Definite=Def|Gender=Masc|Number=Sing|PronType=Art 14 det _ _ +14 château château NOUN _ Gender=Masc|Number=Sing 4 obl _ _ +15 de de ADP _ _ 16 case _ _ +16 Versailles Versailles PROPN _ _ 14 nmod _ SpaceAfter=No +17 . . PUNCT _ _ 4 punct _ _ + +# sent_id = fr-ud-train_00024 +# text = Les experts sont unanimes pour dater ce manuscrit du VIe siècle. +# sentence 5 +1 Les le DET _ Definite=Def|Gender=Masc|Number=Plur|PronType=Art 2 det _ _ +2 experts expert NOUN _ Gender=Masc|Number=Plur 3 nsubj _ _ +3 sont être VERB _ Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 0 root _ _ +4 unanimes unanime ADJ _ Gender=Masc|Number=Plur 3 amod _ _ +5 pour pour ADP _ _ 6 mark _ _ +6 dater dater VERB _ VerbForm=Inf 3 advcl _ VF=INF +7 ce ce DET _ Gender=Masc|Number=Sing|PronType=Dem 8 det _ _ +8 manuscrit manuscrit NOUN _ Gender=Masc|Number=Sing 6 obj _ _ +9-10 du _ _ _ _ _ _ _ _ +9 de de ADP _ _ 12 case _ _ +10 le le DET _ Definite=Def|Gender=Masc|Number=Sing|PronType=Art 12 det _ _ +11 VIe VIe ADJ _ Gender=Masc|Number=Sing|NumType=Ord 12 amod _ _ +12 siècle siècle NOUN _ Gender=Masc|Number=Sing 8 nmod _ SpaceAfter=No +13 . . 
PUNCT _ _ 3 punct _ _ + +# sent_id = conlueditor-test-6 +# text = ils ont visité le Musée du Louvre. +# sentence 6 +1 ils il PRON PERS_NOM Gender=Masc|Number=Plur|Person=3|PronType=Prs 3 nsubj _ _ +2 ont avoir AUX AUXA Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 3 aux _ _ +3 visité visiter VERB PARTP Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part 0 root _ _ +4 le le DET ART Definite=Def|Gender=Masc|Number=Sing|PronType=Art 5 det _ _ +5 Musée musée NOUN NOUN Gender=Masc|Number=Sing 3 obj _ _ +6-7 du _ _ _ _ _ _ _ _ +6 de de ADP ADP _ 8 case _ _ +7 le le DET ART Definite=Def|Gender=Masc|Number=Sing|PronType=Art 8 det _ _ +8 Louvre Louvre PROPN PROPN _ 5 nmod _ SpaceAfter=No +9 . . PUNCT PUNCT _ 3 punct _ _ + +# sent_id = conlueditor-test-7 +# text = la souris a mangé le fromage qui pue. +# sentence 7 +1 la le DET ART Definite=Def|Gender=Fem|Number=Sing|PronType=Art 2 det _ _ +2 souris souris NOUN NOUN Gender=Fem|Number=Sing 4 nsubj _ _ +3 a avoir AUX AUXA Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 4 aux _ _ +4 mangé manger VERB PARTP Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part 0 root _ _ +5 le le DET ART Definite=Def|Gender=Masc|Number=Sing|PronType=Art 6 det _ _ +6 fromage fromage NOUN NOUN Gender=Masc|Number=Sing 4 obj _ _ +7 qui qui PRON REL PronType=Rel 8 nsubj _ _ +8 pue puer VERB VERB Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin 6 acl:relcl _ SpaceAfter=No +9 . . PUNCT PUNCT _ 4 punct _ _ + +# sent_id = conlueditor-test-8 +# text = il habite à Los Angeles. +# sentence 8 +1 il il PRON PERS_NOM Gender=Masc|Number=Sing|Person=3|PronType=Prs 2 nsubj _ _ +2 habite habiter VERB VERB Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 0 root _ _ +3 à à ADP ADP _ 4 case _ _ +4 Los Los PROPN PROPN _ 2 obl _ _ +5 Angeles Angeles PROPN PROPN _ 4 flat:name _ SpaceAfter=No +6 . . PUNCT PUNCT _ 2 punct _ _ + +# sent_id = conlueditor-test-9 +# text = Il aime bien le Miroir aux Alouettes. 
+# sentence 9 +1 Il il PRON PERS_NOM Gender=Masc|Number=Sing|Person=3|PronType=Prs 2 nsubj _ _ +2 aime aimer VERB VERB Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 0 root _ _ +3 bien bien ADV ADV _ 2 advmod _ _ +4 le le DET ART Definite=Def|Gender=Masc|Number=Sing|PronType=Art 5 det _ _ +5 Miroir miroir NOUN NOUN Gender=Masc|Number=Sing 2 obj _ _ +6-7 aux _ _ _ _ _ _ _ _ +6 à à ADP ADP _ 8 case _ _ +7 les le DET ART Definite=Def|Gender=Fem|Number=Plur|PronType=Art 8 det _ _ +8 Alouettes alouette NOUN NOUN Gender=Fem|Number=Plur 2 obl _ SpaceAfter=No +9 . . PUNCT PUNCT _ 2 punct _ _ + +# sent_id = conlueditor-test-10-only-tokens +# text = Une exposition au musée du Louvre et un voyage au village catalan de Gosol +# sentence 10 +1 Une _ _ _ _ _ _ _ _ +2 exposition _ _ _ _ _ _ _ _ +3-4 au _ _ _ _ _ _ _ _ +3 à _ _ _ _ _ _ _ _ +4 le _ _ _ _ _ _ _ _ +5 musée _ _ _ _ _ _ _ _ +6-7 du _ _ _ _ _ _ _ _ +6 de _ _ _ _ _ _ _ _ +7 le _ _ _ _ _ _ _ _ +8 Louvre _ _ _ _ _ _ _ _ +9 et _ _ _ _ _ _ _ _ +10 un _ _ _ _ _ _ _ _ +11 voyage _ _ _ _ _ _ _ _ +12-13 au _ _ _ _ _ _ _ _ +12 à _ _ _ _ _ _ _ _ +13 le _ _ _ _ _ _ _ _ +14 village _ _ _ _ _ _ _ _ +15 catalan _ _ _ _ _ _ _ _ +16 de _ _ _ _ _ _ _ _ +17 Gosol _ _ _ _ _ _ _ _ + +# sent_id = conlueditor-test-11-eud +# text = Sam bought and prepared dinner +# sentence 11 +1 Sam Sam PROPN _ _ 2 nsubj 2:nsubj|4:nsubj _ +2 bought buy VERB _ _ 0 root _ _ +3 and and CCONJ _ _ 4 cc _ _ +4 prepared prepare VERB _ _ 2 conj _ _ +5 dinner dinner NOUN _ _ 2 obj 2:obj|4:obj _ + +# sent_id = conlueditor-test-12-eud-bad-ids +# text = Sam persuaded Kim to fix dinner +# sentence 12 +1 Sam Sam PROPN _ _ 2 nsubj _ _ +2 persuaded persuade VERB _ _ 0 root _ _ +3 Kim Kim PROPN _ _ 2 obj 2:obj|5:nsubj _ +4 to to ADP _ _ 5 mark _ _ +5 fix fix VERB _ _ 2 xcomp _ _ +6 dinner dinner NOUN _ _ 5 obj _ _ + +# sent_id = ellipsis1 +# text = Sam fixed lunch and Kim dinner +# sentence 13 +1 Sam Sam PROPN NNP Number=Sing 2 nsubj 2:nsubj _ +2 fixed fix VERB VBD 
Mood=Ind|Tense=Past|VerbForm=Fin 0 root 0:root _ +3 lunch lunch NOUN NN Number=Sing 2 obj 2:obj _ +4 and and CCONJ CC _ 5 cc 5:cc|5.1:cc _ +5 Kim Kim PROPN NNP Number=Sing 2 conj 2:conj|5.1:nsubj _ +5.1 fixed fix VERB VBD Mood=Ind|Tense=Past|VerbForm=Fin _ _ 2:conj _ +6 dinner dinner NOUN NN Number=Sing 5 orphan 5.1:obj _ + +# sent_id = ellipsis2 +# text = Mary organised bred and John beer +# sentence 14 +1 Mary Mary PROPN _ _ 2 nsubj 2:nsubj _ +2 organised organise VERB _ _ 0 root 0:root _ +3 bred bred NOUN _ _ 2 obj 2:obj _ +4 and and CCONJ _ _ 5 cc 5:cc|5.1:cc _ +5 John John PROPN _ _ 2 conj 2:conj|5.1:nsubj _ +5.1 organised _ _ _ _ _ _ 2:conj _ +6 beer beer NOUN _ _ 5 orphan 5.1:obj _ + +# sent_id = unannotated +# text = Une exposition au musée du Louvre +# sentence 15 +1 Une _ _ _ _ _ _ _ _ +2 exposition _ _ _ _ _ _ _ _ +3-4 au _ _ _ _ _ _ _ _ +3 à _ _ _ _ _ _ _ _ +4 le _ _ _ _ _ _ _ _ +5 musée _ _ _ _ _ _ _ _ +6-7 du _ _ _ _ _ _ _ _ +6 de _ _ _ _ _ _ _ _ +7 le _ _ _ _ _ _ _ _ +8 Louvre _ _ _ _ _ _ _ _ + +# sent_id = sv-ud-train-767 +# text = Om du köper en halv liter mjölk för 0:78, en limpa för 2:57 och ett halvt kilo margarin för 2:87 gör detta sammanlagt 6:22. 
+# sentence 16 +1 Om om SCONJ SN _ 3 mark 3:mark _ +2 du du PRON PN|UTR|SIN|DEF|SUB Case=Nom|Definite=Def|Gender=Com|Number=Sing|PronType=Prs 3 nsubj 3:nsubj|10.1:nsubj|15.1:nsubj _ +3 köper köpa VERB VB|PRS|AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act 22 advcl 22:advcl:om _ +4 en en DET DT|UTR|SIN|IND Definite=Ind|Gender=Com|Number=Sing|PronType=Art 6 det 6:det _ +5 halv halv ADJ JJ|POS|UTR|SIN|IND|NOM Case=Nom|Definite=Ind|Degree=Pos|Gender=Com|Number=Sing 4 fixed 4:fixed _ +6 liter liter NOUN NN|UTR|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Com|Number=Sing 7 nmod 7:nmod _ +7 mjölk mjölk NOUN NN|UTR|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Com|Number=Sing 3 obj 3:obj _ +8 för för ADP PP _ 9 case 9:case _ +9 0:78 0:78 NUM RG|NOM Case=Nom|NumType=Card 3 obl 3:obl:för SpaceAfter=No +10 , , PUNCT MID _ 12 punct 12:punct _ +10.1 köper köpa VERB VB|PRS|AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act _ _ 3:conj|22:advcl:om _ +11 en en DET DT|UTR|SIN|IND Definite=Ind|Gender=Com|Number=Sing|PronType=Art 12 det 12:det _ +12 limpa limpa NOUN NN|UTR|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Com|Number=Sing 3 conj 10.1:obj Enhanced=obj +13 för för ADP PP _ 14 case 14:case _ +14 2:57 2:57 NUM RG|NOM Case=Nom|NumType=Card 12 orphan 10.1:obl:för Enhanced=obl +15 och och CCONJ KN _ 19 cc 15.1:cc _ +15.1 köper köpa VERB VB|PRS|AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act _ _ 3:conj:och _ +16 ett en DET DT|NEU|SIN|IND Definite=Ind|Gender=Neut|Number=Sing|PronType=Art 18 det 18:det _ +17 halvt halv ADJ JJ|POS|NEU|SIN|IND|NOM Case=Nom|Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing 16 fixed 16:fixed _ +18 kilo kilo NOUN NN|NEU|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Neut|Number=Sing 19 nmod 19:nmod _ +19 margarin margarin NOUN NN|NEU|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Neut|Number=Sing 12 conj 15.1:obj Enhanced=obj +20 för för ADP PP _ 21 case 21:case _ +21 2:87 2:87 NUM RG|NOM Case=Nom|NumType=Card 19 orphan 15.1:obl:för Enhanced=obl +22 gör göra VERB VB|PRS|AKT 
Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act 0 root 0:root _ +23 detta denna PRON PN|NEU|SIN|DEF|SUB/OBJ Definite=Def|Gender=Neut|Number=Sing|PronType=Dem 22 nsubj 22:nsubj _ +24 sammanlagt sammanlagd ADV AB _ 22 advmod 22:advmod _ +25 6:22 6:22 NUM RG|NOM Case=Nom|NumType=Card 22 nummod 22:nummod SpaceAfter=No +26 . . PUNCT MAD _ 22 punct 22:punct _ + +# sent_id = gl_ctg-ud-dev.conllu 625 +# text = Dáselle nova redacción a o punto 2 +# sentence 17 +# used to test adding MWT 1-3 and 6-7 +1 Dá dar VERB VMIP3S0 _ 0 root _ Treeler=sentence|SpaceAfter=No +2 se se PRON PP3CN000 _ 1 nsubj _ Treeler=suj|SpaceAfter=No +3 lle lle PRON PP3CSD00 _ 1 nsubj _ Treeler=suj +4 nova novo ADJ AQ0FS0 _ 5 amod _ Treeler=s.a +5 redacción redacción NOUN NCFS000 _ 1 obj _ Treeler=cd +6 a a ADP SPS00 _ 1 iobj _ Treeler=ci +7 o o DET DA0MS0 _ 8 det _ Treeler=spec +8 punto punto NOUN NCMS000 _ 6 nmod _ ToDo=nmod|Treeler=sn +9 2 2 NUM Z _ 8 nmod _ Treeler=sn + +# sent_id = fi_tdt-ud-train.conllu b712.11 +# text = Sekaan 1 purkillinen kermaviiliä ja ja sitten jaoin perustahnan 4 lautaselle. +# sentence 18 +0.1 Laitoin laittaa VERB _ Mood=Ind|Number=Sing|Person=1|Tense=Past|VerbForm=Fin|Voice=Act _ _ 0:root _ +1 Sekaan sekaan ADV Adv _ 3 orphan 0.1:advmod _ +2 1 1 NUM Num NumType=Card 3 nummod 3:nummod _ +3 purkillinen purkillinen NOUN N Case=Nom|Number=Sing 0 root 0:root|0.1:obj _ +4 kermaviiliä kerma#viili NOUN N Case=Par|Number=Sing 3 nmod 3:nmod:par _ +5 ja ja CCONJ C _ 8 cc 8:cc _ +6 ja ja CCONJ C _ 8 cc 8:cc _ +7 sitten sitten ADV Adv _ 8 advmod 8:advmod _ +8 jaoin jakaa VERB V Mood=Ind|Number=Sing|Person=1|Tense=Past|VerbForm=Fin|Voice=Act 3 conj 0.1:conj|3:conj _ +9 perustahnan perus#tahna NOUN N Case=Gen|Number=Sing 8 obj 8:obj _ +10 4 4 NUM Num NumType=Card 11 nummod 11:nummod _ +11 lautaselle lautanen NOUN N Case=All|Number=Sing 8 obl 8:obl:all SpaceAfter=No +12 . . 
PUNCT Punct _ 3 punct 0.1:punct|3:punct _ + diff --git a/src/test/resources/rule28.conllu b/src/test/resources/rule28.conllu new file mode 100644 index 0000000..c2c1789 --- /dev/null +++ b/src/test/resources/rule28.conllu @@ -0,0 +1,335 @@ +# sent_id = fr-ud-dev_00001 +# text = Aviator, un film sur la vie de Howard Hughes. +# sentence 0 +1 Aviator Aviator PROPN _ _ 0 root _ SpaceAfter=No +2 , , PUNCT _ _ 1 punct _ _ +3 un un DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 4 det _ _ +4 film film NOUN _ Gender=Masc|Number=Sing 1 appos _ _ +5 sur sur ADP _ _ 7 case _ _ +6 la le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 7 det _ _ +7 vie vie NOUN _ Gender=Fem|Number=Sing 4 nmod _ _ +8 de de ADP _ _ 9 case _ _ +9 Howard Howard PROPN _ _ 7 nmod _ _ +10 Hughes Hughes PROPN _ _ 9 flat:name _ SpaceAfter=No +11 . . PUNCT _ _ 1 punct _ _ + +# sent_id = fr-ud-dev_00002 +# text = Les études durent six ans mais leur contenu diffère donc selon les Facultés. +# sentence 1 +1 Les le DET _ Definite=Def|Gender=Fem|Number=Plur|PronType=Art 2 det _ _ +2 études étude NOUN _ Gender=Fem|Number=Plur 3 nsubj _ _ +3 durent durer VERB _ Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 0 root _ VF=ANY +4 six six NUM _ _ 5 nummod _ _ +5 ans an NOUN _ Gender=Masc|Number=Plur 3 obj _ _ +6 mais mais CCONJ _ _ 9 cc _ _ +7 leur son DET _ Gender=Masc|Number=Sing|PronType=Prs 8 nmod:poss _ _ +8 contenu contenu NOUN _ Gender=Masc|Number=Sing 9 nsubj _ _ +9 diffère différer VERB _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 3 conj _ VF=ANY +10 donc donc ADV _ _ 9 advmod _ _ +11 selon selon ADP _ _ 13 case _ _ +12 les le DET _ Definite=Def|Number=Plur|PronType=Art 13 det _ _ +13 Facultés Facultés PROPN _ _ 9 obl _ SpaceAfter=No +14 . . PUNCT _ _ 3 punct _ _ + +# sent_id = fr-ud-dev_00003 +# text = Mais comment faire dans un contexte structurellement raciste ? 
+# sentence 2 +1 Mais mais CCONJ _ _ 3 cc _ _ +2 comment comment ADV _ _ 3 advmod _ _ +3 faire faire VERB _ VerbForm=Inf 0 root _ VF=ANY +4 dans dans ADP _ _ 6 case _ _ +5 un un DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 6 det _ _ +6 contexte contexte NOUN _ Gender=Masc|Number=Sing 3 obl _ _ +7 structurellement structurellement ADV _ _ 8 advmod _ _ +8 raciste raciste ADJ _ Gender=Masc|Number=Sing 6 amod _ _ +9 ? ? PUNCT _ _ 3 punct _ _ + +# sent_id = fr-ud-dev_00004 +# text = L'« oasis de vie », dans un milieu où règne l'obscurité totale et une pression hydrostatique importante, est riche et varié : les chercheurs y découvrent de nouvelles espèces de bivalves, de poissons, de crustacés, de poulpes dans des zones pensées jusqu'alors désertiques. +# sentence 3 +1 L' le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 3 det _ SpaceAfter=No +2 « « PUNCT _ _ 3 punct _ _ +3 oasis oasis NOUN _ Gender=Fem|Number=Sing 23 nsubj _ _ +4 de de ADP _ _ 5 case _ _ +5 vie vie NOUN _ Gender=Fem|Number=Sing 3 nmod _ _ +6 » » PUNCT _ _ 3 punct _ SpaceAfter=No +7 , , PUNCT _ _ 3 punct _ _ +8 dans dans ADP _ _ 10 case _ _ +9 un un DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 10 det _ _ +10 milieu milieu NOUN _ Gender=Masc|Number=Sing 23 obl _ _ +11 où où PRON _ PronType=Rel 12 obl _ _ +12 règne régner VERB _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 10 acl:relcl _ VF=ANY +13 l' le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 14 det _ SpaceAfter=No +14 obscurité obscurité NOUN _ Gender=Fem|Number=Sing 12 nsubj _ _ +15 totale total ADJ _ Gender=Fem|Number=Sing 14 amod _ _ +16 et et CCONJ _ _ 18 cc _ _ +17 une un DET _ Definite=Ind|Gender=Fem|Number=Sing|PronType=Art 18 det _ _ +18 pression pression NOUN _ Gender=Fem|Number=Sing 14 conj _ _ +19 hydrostatique hydrostatique ADJ _ Gender=Fem|Number=Sing 18 amod _ _ +20 importante important ADJ _ Gender=Fem|Number=Sing 18 amod _ SpaceAfter=No +21 , , PUNCT _ _ 23 punct _ _ +22 est être AUX _ 
Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 23 cop _ VF=ANY +23 riche riche ADJ _ Gender=Fem|Number=Sing 0 root _ _ +24 et et CCONJ _ _ 25 cc _ _ +25 varié varié ADJ _ Gender=Masc|Number=Sing 23 conj _ _ +26 : : PUNCT _ _ 23 punct _ _ +27 les le DET _ Definite=Def|Gender=Masc|Number=Plur|PronType=Art 28 det _ _ +28 chercheurs chercheur NOUN _ Gender=Masc|Number=Plur 30 nsubj _ _ +29 y y PRON _ _ 30 advmod _ _ +30 découvrent découvrir VERB _ Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 23 parataxis _ VF=ANY +31 de un DET _ Definite=Ind|Gender=Fem|Number=Plur|PronType=Art 33 det _ _ +32 nouvelles nouveau ADJ _ Gender=Fem|Number=Plur 33 amod _ _ +33 espèces espèce NOUN _ Gender=Fem|Number=Plur 30 obj _ _ +34 de de ADP _ _ 35 case _ _ +35 bivalves bivalve NOUN _ Gender=Masc|Number=Plur 33 nmod _ SpaceAfter=No +36 , , PUNCT _ _ 38 punct _ _ +37 de de ADP _ _ 38 case _ _ +38 poissons poisson NOUN _ Gender=Masc|Number=Plur 35 conj _ SpaceAfter=No +39 , , PUNCT _ _ 41 punct _ _ +40 de de ADP _ _ 41 case _ _ +41 crustacés crustacé NOUN _ Gender=Masc|Number=Plur 35 conj _ SpaceAfter=No +42 , , PUNCT _ _ 44 punct _ _ +43 de de ADP _ _ 44 case _ _ +44 poulpes poulpe NOUN _ Gender=Masc|Number=Plur 35 conj _ _ +45 dans dans ADP _ _ 47 case _ _ +46 des un DET _ Definite=Ind|Gender=Fem|Number=Plur|PronType=Art 47 det _ _ +47 zones zone NOUN _ Gender=Fem|Number=Plur 30 obl _ _ +48 pensées penser VERB _ Gender=Fem|Number=Plur|Tense=Past|VerbForm=Part 47 acl _ VF=ANY +49 jusqu' jusque ADP _ _ 50 case _ SpaceAfter=No +50 alors alors ADV _ _ 48 advmod _ _ +51 désertiques désertique ADJ _ Gender=Fem|Number=Plur 48 amod _ SpaceAfter=No +52 . . PUNCT _ _ 23 punct _ _ + +# sent_id = fr-ud-train_00002 +# text = L'œuvre est située dans la galerie des batailles, dans le château de Versailles. 
+# sentence 4 +1 L' le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 2 det _ SpaceAfter=No +2 œuvre œuvre NOUN _ Gender=Fem|Number=Sing 4 nsubj _ _ +3 est être AUX _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 4 aux _ VF=ANY +4 située situer VERB _ Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part 0 root _ SpaceAfter=\s\t\s|VF=ANY +5 dans dans ADP _ _ 7 case _ _ +6 la le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 7 det _ SpaceAfter=  +7 galerie galerie NOUN _ Gender=Fem|Number=Sing 4 obl _ _ +8-9 des _ _ _ _ _ _ _ _ +8 de de ADP _ _ 10 case _ _ +9 les le DET _ Definite=Def|Gender=Fem|Number=Plur|PronType=Art 10 det _ _ +10 batailles bataille NOUN _ Gender=Fem|Number=Plur 7 nmod _ SpaceAfter=No +11 , , PUNCT _ _ 4 punct _ _ +12 dans dans ADP _ _ 14 case _ _ +13 le le DET _ Definite=Def|Gender=Masc|Number=Sing|PronType=Art 14 det _ _ +14 château château NOUN _ Gender=Masc|Number=Sing 4 obl _ _ +15 de de ADP _ _ 16 case _ _ +16 Versailles Versailles PROPN _ _ 14 nmod _ SpaceAfter=No +17 . . PUNCT _ _ 4 punct _ _ + +# sent_id = fr-ud-train_00024 +# text = Les experts sont unanimes pour dater ce manuscrit du VIe siècle. +# sentence 5 +1 Les le DET _ Definite=Def|Gender=Masc|Number=Plur|PronType=Art 2 det _ _ +2 experts expert NOUN _ Gender=Masc|Number=Plur 3 nsubj _ _ +3 sont être VERB _ Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 0 root _ VF=ANY +4 unanimes unanime ADJ _ Gender=Masc|Number=Plur 3 amod _ _ +5 pour pour ADP _ _ 6 mark _ _ +6 dater dater VERB _ VerbForm=Inf 3 advcl _ VF=ANY +7 ce ce DET _ Gender=Masc|Number=Sing|PronType=Dem 8 det _ _ +8 manuscrit manuscrit NOUN _ Gender=Masc|Number=Sing 6 obj _ _ +9-10 du _ _ _ _ _ _ _ _ +9 de de ADP _ _ 12 case _ _ +10 le le DET _ Definite=Def|Gender=Masc|Number=Sing|PronType=Art 12 det _ _ +11 VIe VIe ADJ _ Gender=Masc|Number=Sing|NumType=Ord 12 amod _ _ +12 siècle siècle NOUN _ Gender=Masc|Number=Sing 8 nmod _ SpaceAfter=No +13 . . 
PUNCT _ _ 3 punct _ _ + +# sent_id = conlueditor-test-6 +# text = ils ont visité le Musée du Louvre. +# sentence 6 +1 ils il PRON PERS_NOM Gender=Masc|Number=Plur|Person=3|PronType=Prs 3 nsubj _ _ +2 ont avoir AUX AUXA Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 3 aux _ VF=ANY +3 visité visiter VERB PARTP Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part 0 root _ VF=ANY +4 le le DET ART Definite=Def|Gender=Masc|Number=Sing|PronType=Art 5 det _ _ +5 Musée musée NOUN NOUN Gender=Masc|Number=Sing 3 obj _ _ +6-7 du _ _ _ _ _ _ _ _ +6 de de ADP ADP _ 8 case _ _ +7 le le DET ART Definite=Def|Gender=Masc|Number=Sing|PronType=Art 8 det _ _ +8 Louvre Louvre PROPN PROPN _ 5 nmod _ SpaceAfter=No +9 . . PUNCT PUNCT _ 3 punct _ _ + +# sent_id = conlueditor-test-7 +# text = la souris a mangé le fromage qui pue. +# sentence 7 +1 la le DET ART Definite=Def|Gender=Fem|Number=Sing|PronType=Art 2 det _ _ +2 souris souris NOUN NOUN Gender=Fem|Number=Sing 4 nsubj _ _ +3 a avoir AUX AUXA Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 4 aux _ VF=ANY +4 mangé manger VERB PARTP Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part 0 root _ VF=ANY +5 le le DET ART Definite=Def|Gender=Masc|Number=Sing|PronType=Art 6 det _ _ +6 fromage fromage NOUN NOUN Gender=Masc|Number=Sing 4 obj _ _ +7 qui qui PRON REL PronType=Rel 8 nsubj _ _ +8 pue puer VERB VERB Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin 6 acl:relcl _ SpaceAfter=No|VF=ANY +9 . . PUNCT PUNCT _ 4 punct _ _ + +# sent_id = conlueditor-test-8 +# text = il habite à Los Angeles. +# sentence 8 +1 il il PRON PERS_NOM Gender=Masc|Number=Sing|Person=3|PronType=Prs 2 nsubj _ _ +2 habite habiter VERB VERB Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 0 root _ VF=ANY +3 à à ADP ADP _ 4 case _ _ +4 Los Los PROPN PROPN _ 2 obl _ _ +5 Angeles Angeles PROPN PROPN _ 4 flat:name _ SpaceAfter=No +6 . . PUNCT PUNCT _ 2 punct _ _ + +# sent_id = conlueditor-test-9 +# text = Il aime bien le Miroir aux Alouettes. 
+# sentence 9 +1 Il il PRON PERS_NOM Gender=Masc|Number=Sing|Person=3|PronType=Prs 2 nsubj _ _ +2 aime aimer VERB VERB Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 0 root _ VF=ANY +3 bien bien ADV ADV _ 2 advmod _ _ +4 le le DET ART Definite=Def|Gender=Masc|Number=Sing|PronType=Art 5 det _ _ +5 Miroir miroir NOUN NOUN Gender=Masc|Number=Sing 2 obj _ _ +6-7 aux _ _ _ _ _ _ _ _ +6 à à ADP ADP _ 8 case _ _ +7 les le DET ART Definite=Def|Gender=Fem|Number=Plur|PronType=Art 8 det _ _ +8 Alouettes alouette NOUN NOUN Gender=Fem|Number=Plur 2 obl _ SpaceAfter=No +9 . . PUNCT PUNCT _ 2 punct _ _ + +# sent_id = conlueditor-test-10-only-tokens +# text = Une exposition au musée du Louvre et un voyage au village catalan de Gosol +# sentence 10 +1 Une _ _ _ _ _ _ _ _ +2 exposition _ _ _ _ _ _ _ _ +3-4 au _ _ _ _ _ _ _ _ +3 à _ _ _ _ _ _ _ _ +4 le _ _ _ _ _ _ _ _ +5 musée _ _ _ _ _ _ _ _ +6-7 du _ _ _ _ _ _ _ _ +6 de _ _ _ _ _ _ _ _ +7 le _ _ _ _ _ _ _ _ +8 Louvre _ _ _ _ _ _ _ _ +9 et _ _ _ _ _ _ _ _ +10 un _ _ _ _ _ _ _ _ +11 voyage _ _ _ _ _ _ _ _ +12-13 au _ _ _ _ _ _ _ _ +12 à _ _ _ _ _ _ _ _ +13 le _ _ _ _ _ _ _ _ +14 village _ _ _ _ _ _ _ _ +15 catalan _ _ _ _ _ _ _ _ +16 de _ _ _ _ _ _ _ _ +17 Gosol _ _ _ _ _ _ _ _ + +# sent_id = conlueditor-test-11-eud +# text = Sam bought and prepared dinner +# sentence 11 +1 Sam Sam PROPN _ _ 2 nsubj 2:nsubj|4:nsubj _ +2 bought buy VERB _ _ 0 root _ _ +3 and and CCONJ _ _ 4 cc _ _ +4 prepared prepare VERB _ _ 2 conj _ _ +5 dinner dinner NOUN _ _ 2 obj 2:obj|4:obj _ + +# sent_id = conlueditor-test-12-eud-bad-ids +# text = Sam persuaded Kim to fix dinner +# sentence 12 +1 Sam Sam PROPN _ _ 2 nsubj _ _ +2 persuaded persuade VERB _ _ 0 root _ _ +3 Kim Kim PROPN _ _ 2 obj 2:obj|5:nsubj _ +4 to to ADP _ _ 5 mark _ _ +5 fix fix VERB _ _ 2 xcomp _ _ +6 dinner dinner NOUN _ _ 5 obj _ _ + +# sent_id = ellipsis1 +# text = Sam fixed lunch and Kim dinner +# sentence 13 +1 Sam Sam PROPN NNP Number=Sing 2 nsubj 2:nsubj _ +2 fixed fix VERB 
VBD Mood=Ind|Tense=Past|VerbForm=Fin 0 root 0:root VF=ANY +3 lunch lunch NOUN NN Number=Sing 2 obj 2:obj _ +4 and and CCONJ CC _ 5 cc 5:cc|5.1:cc _ +5 Kim Kim PROPN NNP Number=Sing 2 conj 2:conj|5.1:nsubj _ +5.1 fixed fix VERB VBD Mood=Ind|Tense=Past|VerbForm=Fin _ _ 2:conj VF=ANY +6 dinner dinner NOUN NN Number=Sing 5 orphan 5.1:obj _ + +# sent_id = ellipsis2 +# text = Mary organised bred and John beer +# sentence 14 +1 Mary Mary PROPN _ _ 2 nsubj 2:nsubj _ +2 organised organise VERB _ _ 0 root 0:root _ +3 bred bred NOUN _ _ 2 obj 2:obj _ +4 and and CCONJ _ _ 5 cc 5:cc|5.1:cc _ +5 John John PROPN _ _ 2 conj 2:conj|5.1:nsubj _ +5.1 organised _ _ _ _ _ _ 2:conj _ +6 beer beer NOUN _ _ 5 orphan 5.1:obj _ + +# sent_id = unannotated +# text = Une exposition au musée du Louvre +# sentence 15 +1 Une _ _ _ _ _ _ _ _ +2 exposition _ _ _ _ _ _ _ _ +3-4 au _ _ _ _ _ _ _ _ +3 à _ _ _ _ _ _ _ _ +4 le _ _ _ _ _ _ _ _ +5 musée _ _ _ _ _ _ _ _ +6-7 du _ _ _ _ _ _ _ _ +6 de _ _ _ _ _ _ _ _ +7 le _ _ _ _ _ _ _ _ +8 Louvre _ _ _ _ _ _ _ _ + +# sent_id = sv-ud-train-767 +# text = Om du köper en halv liter mjölk för 0:78, en limpa för 2:57 och ett halvt kilo margarin för 2:87 gör detta sammanlagt 6:22. 
+# sentence 16 +1 Om om SCONJ SN _ 3 mark 3:mark _ +2 du du PRON PN|UTR|SIN|DEF|SUB Case=Nom|Definite=Def|Gender=Com|Number=Sing|PronType=Prs 3 nsubj 3:nsubj|10.1:nsubj|15.1:nsubj _ +3 köper köpa VERB VB|PRS|AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act 22 advcl 22:advcl:om VF=ANY +4 en en DET DT|UTR|SIN|IND Definite=Ind|Gender=Com|Number=Sing|PronType=Art 6 det 6:det _ +5 halv halv ADJ JJ|POS|UTR|SIN|IND|NOM Case=Nom|Definite=Ind|Degree=Pos|Gender=Com|Number=Sing 4 fixed 4:fixed _ +6 liter liter NOUN NN|UTR|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Com|Number=Sing 7 nmod 7:nmod _ +7 mjölk mjölk NOUN NN|UTR|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Com|Number=Sing 3 obj 3:obj _ +8 för för ADP PP _ 9 case 9:case _ +9 0:78 0:78 NUM RG|NOM Case=Nom|NumType=Card 3 obl 3:obl:för SpaceAfter=No +10 , , PUNCT MID _ 12 punct 12:punct _ +10.1 köper köpa VERB VB|PRS|AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act _ _ 3:conj|22:advcl:om VF=ANY +11 en en DET DT|UTR|SIN|IND Definite=Ind|Gender=Com|Number=Sing|PronType=Art 12 det 12:det _ +12 limpa limpa NOUN NN|UTR|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Com|Number=Sing 3 conj 10.1:obj Enhanced=obj +13 för för ADP PP _ 14 case 14:case _ +14 2:57 2:57 NUM RG|NOM Case=Nom|NumType=Card 12 orphan 10.1:obl:för Enhanced=obl +15 och och CCONJ KN _ 19 cc 15.1:cc _ +15.1 köper köpa VERB VB|PRS|AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act _ _ 3:conj:och VF=ANY +16 ett en DET DT|NEU|SIN|IND Definite=Ind|Gender=Neut|Number=Sing|PronType=Art 18 det 18:det _ +17 halvt halv ADJ JJ|POS|NEU|SIN|IND|NOM Case=Nom|Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing 16 fixed 16:fixed _ +18 kilo kilo NOUN NN|NEU|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Neut|Number=Sing 19 nmod 19:nmod _ +19 margarin margarin NOUN NN|NEU|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Neut|Number=Sing 12 conj 15.1:obj Enhanced=obj +20 för för ADP PP _ 21 case 21:case _ +21 2:87 2:87 NUM RG|NOM Case=Nom|NumType=Card 19 orphan 15.1:obl:för Enhanced=obl +22 gör göra VERB 
VB|PRS|AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act 0 root 0:root VF=ANY +23 detta denna PRON PN|NEU|SIN|DEF|SUB/OBJ Definite=Def|Gender=Neut|Number=Sing|PronType=Dem 22 nsubj 22:nsubj _ +24 sammanlagt sammanlagd ADV AB _ 22 advmod 22:advmod _ +25 6:22 6:22 NUM RG|NOM Case=Nom|NumType=Card 22 nummod 22:nummod SpaceAfter=No +26 . . PUNCT MAD _ 22 punct 22:punct _ + +# sent_id = gl_ctg-ud-dev.conllu 625 +# text = Dáselle nova redacción a o punto 2 +# sentence 17 +# used to test adding MWT 1-3 and 6-7 +1 Dá dar VERB VMIP3S0 _ 0 root _ Treeler=sentence|SpaceAfter=No +2 se se PRON PP3CN000 _ 1 nsubj _ Treeler=suj|SpaceAfter=No +3 lle lle PRON PP3CSD00 _ 1 nsubj _ Treeler=suj +4 nova novo ADJ AQ0FS0 _ 5 amod _ Treeler=s.a +5 redacción redacción NOUN NCFS000 _ 1 obj _ Treeler=cd +6 a a ADP SPS00 _ 1 iobj _ Treeler=ci +7 o o DET DA0MS0 _ 8 det _ Treeler=spec +8 punto punto NOUN NCMS000 _ 6 nmod _ ToDo=nmod|Treeler=sn +9 2 2 NUM Z _ 8 nmod _ Treeler=sn + +# sent_id = fi_tdt-ud-train.conllu b712.11 +# text = Sekaan 1 purkillinen kermaviiliä ja ja sitten jaoin perustahnan 4 lautaselle. +# sentence 18 +0.1 Laitoin laittaa VERB _ Mood=Ind|Number=Sing|Person=1|Tense=Past|VerbForm=Fin|Voice=Act _ _ 0:root VF=ANY +1 Sekaan sekaan ADV Adv _ 3 orphan 0.1:advmod _ +2 1 1 NUM Num NumType=Card 3 nummod 3:nummod _ +3 purkillinen purkillinen NOUN N Case=Nom|Number=Sing 0 root 0:root|0.1:obj _ +4 kermaviiliä kerma#viili NOUN N Case=Par|Number=Sing 3 nmod 3:nmod:par _ +5 ja ja CCONJ C _ 8 cc 8:cc _ +6 ja ja CCONJ C _ 8 cc 8:cc _ +7 sitten sitten ADV Adv _ 8 advmod 8:advmod _ +8 jaoin jakaa VERB V Mood=Ind|Number=Sing|Person=1|Tense=Past|VerbForm=Fin|Voice=Act 3 conj 0.1:conj|3:conj VF=ANY +9 perustahnan perus#tahna NOUN N Case=Gen|Number=Sing 8 obj 8:obj _ +10 4 4 NUM Num NumType=Card 11 nummod 11:nummod _ +11 lautaselle lautanen NOUN N Case=All|Number=Sing 8 obl 8:obl:all SpaceAfter=No +12 . . 
PUNCT Punct _ 3 punct 0.1:punct|3:punct _ + diff --git a/src/test/resources/rule29.conllu b/src/test/resources/rule29.conllu new file mode 100644 index 0000000..a29c61a --- /dev/null +++ b/src/test/resources/rule29.conllu @@ -0,0 +1,335 @@ +# sent_id = fr-ud-dev_00001 +# text = Aviator, un film sur la vie de Howard Hughes. +# sentence 0 +1 Aviator Aviator PROPN _ SPACEAFTER=ANY 0 root _ SpaceAfter=No +2 , , PUNCT _ _ 1 punct _ _ +3 un un DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 4 det _ _ +4 film film NOUN _ Gender=Masc|Number=Sing 1 appos _ _ +5 sur sur ADP _ _ 7 case _ _ +6 la le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 7 det _ _ +7 vie vie NOUN _ Gender=Fem|Number=Sing 4 nmod _ _ +8 de de ADP _ _ 9 case _ _ +9 Howard Howard PROPN _ _ 7 nmod _ _ +10 Hughes Hughes PROPN _ SPACEAFTER=ANY 9 flat:name _ SpaceAfter=No +11 . . PUNCT _ _ 1 punct _ _ + +# sent_id = fr-ud-dev_00002 +# text = Les études durent six ans mais leur contenu diffère donc selon les Facultés. +# sentence 1 +1 Les le DET _ Definite=Def|Gender=Fem|Number=Plur|PronType=Art 2 det _ _ +2 études étude NOUN _ Gender=Fem|Number=Plur 3 nsubj _ _ +3 durent durer VERB _ Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 0 root _ _ +4 six six NUM _ _ 5 nummod _ _ +5 ans an NOUN _ Gender=Masc|Number=Plur 3 obj _ _ +6 mais mais CCONJ _ _ 9 cc _ _ +7 leur son DET _ Gender=Masc|Number=Sing|PronType=Prs 8 nmod:poss _ _ +8 contenu contenu NOUN _ Gender=Masc|Number=Sing 9 nsubj _ _ +9 diffère différer VERB _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 3 conj _ _ +10 donc donc ADV _ _ 9 advmod _ _ +11 selon selon ADP _ _ 13 case _ _ +12 les le DET _ Definite=Def|Number=Plur|PronType=Art 13 det _ _ +13 Facultés Facultés PROPN _ SPACEAFTER=ANY 9 obl _ SpaceAfter=No +14 . . PUNCT _ _ 3 punct _ _ + +# sent_id = fr-ud-dev_00003 +# text = Mais comment faire dans un contexte structurellement raciste ? 
+# sentence 2 +1 Mais mais CCONJ _ _ 3 cc _ _ +2 comment comment ADV _ _ 3 advmod _ _ +3 faire faire VERB _ VerbForm=Inf 0 root _ _ +4 dans dans ADP _ _ 6 case _ _ +5 un un DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 6 det _ _ +6 contexte contexte NOUN _ Gender=Masc|Number=Sing 3 obl _ _ +7 structurellement structurellement ADV _ _ 8 advmod _ _ +8 raciste raciste ADJ _ Gender=Masc|Number=Sing 6 amod _ _ +9 ? ? PUNCT _ _ 3 punct _ _ + +# sent_id = fr-ud-dev_00004 +# text = L'« oasis de vie », dans un milieu où règne l'obscurité totale et une pression hydrostatique importante, est riche et varié : les chercheurs y découvrent de nouvelles espèces de bivalves, de poissons, de crustacés, de poulpes dans des zones pensées jusqu'alors désertiques. +# sentence 3 +1 L' le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art|SPACEAFTER=ANY 3 det _ SpaceAfter=No +2 « « PUNCT _ _ 3 punct _ _ +3 oasis oasis NOUN _ Gender=Fem|Number=Sing 23 nsubj _ _ +4 de de ADP _ _ 5 case _ _ +5 vie vie NOUN _ Gender=Fem|Number=Sing 3 nmod _ _ +6 » » PUNCT _ SPACEAFTER=ANY 3 punct _ SpaceAfter=No +7 , , PUNCT _ _ 3 punct _ _ +8 dans dans ADP _ _ 10 case _ _ +9 un un DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 10 det _ _ +10 milieu milieu NOUN _ Gender=Masc|Number=Sing 23 obl _ _ +11 où où PRON _ PronType=Rel 12 obl _ _ +12 règne régner VERB _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 10 acl:relcl _ _ +13 l' le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art|SPACEAFTER=ANY 14 det _ SpaceAfter=No +14 obscurité obscurité NOUN _ Gender=Fem|Number=Sing 12 nsubj _ _ +15 totale total ADJ _ Gender=Fem|Number=Sing 14 amod _ _ +16 et et CCONJ _ _ 18 cc _ _ +17 une un DET _ Definite=Ind|Gender=Fem|Number=Sing|PronType=Art 18 det _ _ +18 pression pression NOUN _ Gender=Fem|Number=Sing 14 conj _ _ +19 hydrostatique hydrostatique ADJ _ Gender=Fem|Number=Sing 18 amod _ _ +20 importante important ADJ _ Gender=Fem|Number=Sing|SPACEAFTER=ANY 18 amod _ SpaceAfter=No 
+21 , , PUNCT _ _ 23 punct _ _ +22 est être AUX _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 23 cop _ _ +23 riche riche ADJ _ Gender=Fem|Number=Sing 0 root _ _ +24 et et CCONJ _ _ 25 cc _ _ +25 varié varié ADJ _ Gender=Masc|Number=Sing 23 conj _ _ +26 : : PUNCT _ _ 23 punct _ _ +27 les le DET _ Definite=Def|Gender=Masc|Number=Plur|PronType=Art 28 det _ _ +28 chercheurs chercheur NOUN _ Gender=Masc|Number=Plur 30 nsubj _ _ +29 y y PRON _ _ 30 advmod _ _ +30 découvrent découvrir VERB _ Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 23 parataxis _ _ +31 de un DET _ Definite=Ind|Gender=Fem|Number=Plur|PronType=Art 33 det _ _ +32 nouvelles nouveau ADJ _ Gender=Fem|Number=Plur 33 amod _ _ +33 espèces espèce NOUN _ Gender=Fem|Number=Plur 30 obj _ _ +34 de de ADP _ _ 35 case _ _ +35 bivalves bivalve NOUN _ Gender=Masc|Number=Plur|SPACEAFTER=ANY 33 nmod _ SpaceAfter=No +36 , , PUNCT _ _ 38 punct _ _ +37 de de ADP _ _ 38 case _ _ +38 poissons poisson NOUN _ Gender=Masc|Number=Plur|SPACEAFTER=ANY 35 conj _ SpaceAfter=No +39 , , PUNCT _ _ 41 punct _ _ +40 de de ADP _ _ 41 case _ _ +41 crustacés crustacé NOUN _ Gender=Masc|Number=Plur|SPACEAFTER=ANY 35 conj _ SpaceAfter=No +42 , , PUNCT _ _ 44 punct _ _ +43 de de ADP _ _ 44 case _ _ +44 poulpes poulpe NOUN _ Gender=Masc|Number=Plur 35 conj _ _ +45 dans dans ADP _ _ 47 case _ _ +46 des un DET _ Definite=Ind|Gender=Fem|Number=Plur|PronType=Art 47 det _ _ +47 zones zone NOUN _ Gender=Fem|Number=Plur 30 obl _ _ +48 pensées penser VERB _ Gender=Fem|Number=Plur|Tense=Past|VerbForm=Part 47 acl _ _ +49 jusqu' jusque ADP _ SPACEAFTER=ANY 50 case _ SpaceAfter=No +50 alors alors ADV _ _ 48 advmod _ _ +51 désertiques désertique ADJ _ Gender=Fem|Number=Plur|SPACEAFTER=ANY 48 amod _ SpaceAfter=No +52 . . PUNCT _ _ 23 punct _ _ + +# sent_id = fr-ud-train_00002 +# text = L'œuvre est située dans la galerie des batailles, dans le château de Versailles. 
+# sentence 4 +1 L' le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art|SPACEAFTER=ANY 2 det _ SpaceAfter=No +2 œuvre œuvre NOUN _ Gender=Fem|Number=Sing 4 nsubj _ _ +3 est être AUX _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 4 aux _ _ +4 située situer VERB _ Gender=Fem|Number=Sing|SPACEAFTER=ANY|Tense=Past|VerbForm=Part 0 root _ SpaceAfter=\s\t\s +5 dans dans ADP _ _ 7 case _ _ +6 la le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art|SPACEAFTER=ANY 7 det _ SpaceAfter=  +7 galerie galerie NOUN _ Gender=Fem|Number=Sing 4 obl _ _ +8-9 des _ _ _ _ _ _ _ _ +8 de de ADP _ _ 10 case _ _ +9 les le DET _ Definite=Def|Gender=Fem|Number=Plur|PronType=Art 10 det _ _ +10 batailles bataille NOUN _ Gender=Fem|Number=Plur|SPACEAFTER=ANY 7 nmod _ SpaceAfter=No +11 , , PUNCT _ _ 4 punct _ _ +12 dans dans ADP _ _ 14 case _ _ +13 le le DET _ Definite=Def|Gender=Masc|Number=Sing|PronType=Art 14 det _ _ +14 château château NOUN _ Gender=Masc|Number=Sing 4 obl _ _ +15 de de ADP _ _ 16 case _ _ +16 Versailles Versailles PROPN _ SPACEAFTER=ANY 14 nmod _ SpaceAfter=No +17 . . PUNCT _ _ 4 punct _ _ + +# sent_id = fr-ud-train_00024 +# text = Les experts sont unanimes pour dater ce manuscrit du VIe siècle. 
+# sentence 5 +1 Les le DET _ Definite=Def|Gender=Masc|Number=Plur|PronType=Art 2 det _ _ +2 experts expert NOUN _ Gender=Masc|Number=Plur 3 nsubj _ _ +3 sont être VERB _ Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 0 root _ _ +4 unanimes unanime ADJ _ Gender=Masc|Number=Plur 3 amod _ _ +5 pour pour ADP _ _ 6 mark _ _ +6 dater dater VERB _ VerbForm=Inf 3 advcl _ _ +7 ce ce DET _ Gender=Masc|Number=Sing|PronType=Dem 8 det _ _ +8 manuscrit manuscrit NOUN _ Gender=Masc|Number=Sing 6 obj _ _ +9-10 du _ _ _ _ _ _ _ _ +9 de de ADP _ _ 12 case _ _ +10 le le DET _ Definite=Def|Gender=Masc|Number=Sing|PronType=Art 12 det _ _ +11 VIe VIe ADJ _ Gender=Masc|Number=Sing|NumType=Ord 12 amod _ _ +12 siècle siècle NOUN _ Gender=Masc|Number=Sing|SPACEAFTER=ANY 8 nmod _ SpaceAfter=No +13 . . PUNCT _ _ 3 punct _ _ + +# sent_id = conlueditor-test-6 +# text = ils ont visité le Musée du Louvre. +# sentence 6 +1 ils il PRON PERS_NOM Gender=Masc|Number=Plur|Person=3|PronType=Prs 3 nsubj _ _ +2 ont avoir AUX AUXA Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 3 aux _ _ +3 visité visiter VERB PARTP Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part 0 root _ _ +4 le le DET ART Definite=Def|Gender=Masc|Number=Sing|PronType=Art 5 det _ _ +5 Musée musée NOUN NOUN Gender=Masc|Number=Sing 3 obj _ _ +6-7 du _ _ _ _ _ _ _ _ +6 de de ADP ADP _ 8 case _ _ +7 le le DET ART Definite=Def|Gender=Masc|Number=Sing|PronType=Art 8 det _ _ +8 Louvre Louvre PROPN PROPN SPACEAFTER=ANY 5 nmod _ SpaceAfter=No +9 . . PUNCT PUNCT _ 3 punct _ _ + +# sent_id = conlueditor-test-7 +# text = la souris a mangé le fromage qui pue. 
+# sentence 7 +1 la le DET ART Definite=Def|Gender=Fem|Number=Sing|PronType=Art 2 det _ _ +2 souris souris NOUN NOUN Gender=Fem|Number=Sing 4 nsubj _ _ +3 a avoir AUX AUXA Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 4 aux _ _ +4 mangé manger VERB PARTP Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part 0 root _ _ +5 le le DET ART Definite=Def|Gender=Masc|Number=Sing|PronType=Art 6 det _ _ +6 fromage fromage NOUN NOUN Gender=Masc|Number=Sing 4 obj _ _ +7 qui qui PRON REL PronType=Rel 8 nsubj _ _ +8 pue puer VERB VERB Mood=Ind|Number=Sing|Person=1|SPACEAFTER=ANY|Tense=Pres|VerbForm=Fin 6 acl:relcl _ SpaceAfter=No +9 . . PUNCT PUNCT _ 4 punct _ _ + +# sent_id = conlueditor-test-8 +# text = il habite à Los Angeles. +# sentence 8 +1 il il PRON PERS_NOM Gender=Masc|Number=Sing|Person=3|PronType=Prs 2 nsubj _ _ +2 habite habiter VERB VERB Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 0 root _ _ +3 à à ADP ADP _ 4 case _ _ +4 Los Los PROPN PROPN _ 2 obl _ _ +5 Angeles Angeles PROPN PROPN SPACEAFTER=ANY 4 flat:name _ SpaceAfter=No +6 . . PUNCT PUNCT _ 2 punct _ _ + +# sent_id = conlueditor-test-9 +# text = Il aime bien le Miroir aux Alouettes. +# sentence 9 +1 Il il PRON PERS_NOM Gender=Masc|Number=Sing|Person=3|PronType=Prs 2 nsubj _ _ +2 aime aimer VERB VERB Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 0 root _ _ +3 bien bien ADV ADV _ 2 advmod _ _ +4 le le DET ART Definite=Def|Gender=Masc|Number=Sing|PronType=Art 5 det _ _ +5 Miroir miroir NOUN NOUN Gender=Masc|Number=Sing 2 obj _ _ +6-7 aux _ _ _ _ _ _ _ _ +6 à à ADP ADP _ 8 case _ _ +7 les le DET ART Definite=Def|Gender=Fem|Number=Plur|PronType=Art 8 det _ _ +8 Alouettes alouette NOUN NOUN Gender=Fem|Number=Plur|SPACEAFTER=ANY 2 obl _ SpaceAfter=No +9 . . 
PUNCT PUNCT _ 2 punct _ _ + +# sent_id = conlueditor-test-10-only-tokens +# text = Une exposition au musée du Louvre et un voyage au village catalan de Gosol +# sentence 10 +1 Une _ _ _ _ _ _ _ _ +2 exposition _ _ _ _ _ _ _ _ +3-4 au _ _ _ _ _ _ _ _ +3 à _ _ _ _ _ _ _ _ +4 le _ _ _ _ _ _ _ _ +5 musée _ _ _ _ _ _ _ _ +6-7 du _ _ _ _ _ _ _ _ +6 de _ _ _ _ _ _ _ _ +7 le _ _ _ _ _ _ _ _ +8 Louvre _ _ _ _ _ _ _ _ +9 et _ _ _ _ _ _ _ _ +10 un _ _ _ _ _ _ _ _ +11 voyage _ _ _ _ _ _ _ _ +12-13 au _ _ _ _ _ _ _ _ +12 à _ _ _ _ _ _ _ _ +13 le _ _ _ _ _ _ _ _ +14 village _ _ _ _ _ _ _ _ +15 catalan _ _ _ _ _ _ _ _ +16 de _ _ _ _ _ _ _ _ +17 Gosol _ _ _ _ _ _ _ _ + +# sent_id = conlueditor-test-11-eud +# text = Sam bought and prepared dinner +# sentence 11 +1 Sam Sam PROPN _ _ 2 nsubj 2:nsubj|4:nsubj _ +2 bought buy VERB _ _ 0 root _ _ +3 and and CCONJ _ _ 4 cc _ _ +4 prepared prepare VERB _ _ 2 conj _ _ +5 dinner dinner NOUN _ _ 2 obj 2:obj|4:obj _ + +# sent_id = conlueditor-test-12-eud-bad-ids +# text = Sam persuaded Kim to fix dinner +# sentence 12 +1 Sam Sam PROPN _ _ 2 nsubj _ _ +2 persuaded persuade VERB _ _ 0 root _ _ +3 Kim Kim PROPN _ _ 2 obj 2:obj|5:nsubj _ +4 to to ADP _ _ 5 mark _ _ +5 fix fix VERB _ _ 2 xcomp _ _ +6 dinner dinner NOUN _ _ 5 obj _ _ + +# sent_id = ellipsis1 +# text = Sam fixed lunch and Kim dinner +# sentence 13 +1 Sam Sam PROPN NNP Number=Sing 2 nsubj 2:nsubj _ +2 fixed fix VERB VBD Mood=Ind|Tense=Past|VerbForm=Fin 0 root 0:root _ +3 lunch lunch NOUN NN Number=Sing 2 obj 2:obj _ +4 and and CCONJ CC _ 5 cc 5:cc|5.1:cc _ +5 Kim Kim PROPN NNP Number=Sing 2 conj 2:conj|5.1:nsubj _ +5.1 fixed fix VERB VBD Mood=Ind|Tense=Past|VerbForm=Fin _ _ 2:conj _ +6 dinner dinner NOUN NN Number=Sing 5 orphan 5.1:obj _ + +# sent_id = ellipsis2 +# text = Mary organised bred and John beer +# sentence 14 +1 Mary Mary PROPN _ _ 2 nsubj 2:nsubj _ +2 organised organise VERB _ _ 0 root 0:root _ +3 bred bred NOUN _ _ 2 obj 2:obj _ +4 and and CCONJ _ _ 5 cc 5:cc|5.1:cc _ +5 
John John PROPN _ _ 2 conj 2:conj|5.1:nsubj _ +5.1 organised _ _ _ _ _ _ 2:conj _ +6 beer beer NOUN _ _ 5 orphan 5.1:obj _ + +# sent_id = unannotated +# text = Une exposition au musée du Louvre +# sentence 15 +1 Une _ _ _ _ _ _ _ _ +2 exposition _ _ _ _ _ _ _ _ +3-4 au _ _ _ _ _ _ _ _ +3 à _ _ _ _ _ _ _ _ +4 le _ _ _ _ _ _ _ _ +5 musée _ _ _ _ _ _ _ _ +6-7 du _ _ _ _ _ _ _ _ +6 de _ _ _ _ _ _ _ _ +7 le _ _ _ _ _ _ _ _ +8 Louvre _ _ _ _ _ _ _ _ + +# sent_id = sv-ud-train-767 +# text = Om du köper en halv liter mjölk för 0:78, en limpa för 2:57 och ett halvt kilo margarin för 2:87 gör detta sammanlagt 6:22. +# sentence 16 +1 Om om SCONJ SN _ 3 mark 3:mark _ +2 du du PRON PN|UTR|SIN|DEF|SUB Case=Nom|Definite=Def|Gender=Com|Number=Sing|PronType=Prs 3 nsubj 3:nsubj|10.1:nsubj|15.1:nsubj _ +3 köper köpa VERB VB|PRS|AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act 22 advcl 22:advcl:om _ +4 en en DET DT|UTR|SIN|IND Definite=Ind|Gender=Com|Number=Sing|PronType=Art 6 det 6:det _ +5 halv halv ADJ JJ|POS|UTR|SIN|IND|NOM Case=Nom|Definite=Ind|Degree=Pos|Gender=Com|Number=Sing 4 fixed 4:fixed _ +6 liter liter NOUN NN|UTR|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Com|Number=Sing 7 nmod 7:nmod _ +7 mjölk mjölk NOUN NN|UTR|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Com|Number=Sing 3 obj 3:obj _ +8 för för ADP PP _ 9 case 9:case _ +9 0:78 0:78 NUM RG|NOM Case=Nom|NumType=Card|SPACEAFTER=ANY 3 obl 3:obl:för SpaceAfter=No +10 , , PUNCT MID _ 12 punct 12:punct _ +10.1 köper köpa VERB VB|PRS|AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act _ _ 3:conj|22:advcl:om _ +11 en en DET DT|UTR|SIN|IND Definite=Ind|Gender=Com|Number=Sing|PronType=Art 12 det 12:det _ +12 limpa limpa NOUN NN|UTR|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Com|Number=Sing 3 conj 10.1:obj Enhanced=obj +13 för för ADP PP _ 14 case 14:case _ +14 2:57 2:57 NUM RG|NOM Case=Nom|NumType=Card 12 orphan 10.1:obl:för Enhanced=obl +15 och och CCONJ KN _ 19 cc 15.1:cc _ +15.1 köper köpa VERB VB|PRS|AKT 
Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act _ _ 3:conj:och _ +16 ett en DET DT|NEU|SIN|IND Definite=Ind|Gender=Neut|Number=Sing|PronType=Art 18 det 18:det _ +17 halvt halv ADJ JJ|POS|NEU|SIN|IND|NOM Case=Nom|Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing 16 fixed 16:fixed _ +18 kilo kilo NOUN NN|NEU|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Neut|Number=Sing 19 nmod 19:nmod _ +19 margarin margarin NOUN NN|NEU|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Neut|Number=Sing 12 conj 15.1:obj Enhanced=obj +20 för för ADP PP _ 21 case 21:case _ +21 2:87 2:87 NUM RG|NOM Case=Nom|NumType=Card 19 orphan 15.1:obl:för Enhanced=obl +22 gör göra VERB VB|PRS|AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act 0 root 0:root _ +23 detta denna PRON PN|NEU|SIN|DEF|SUB/OBJ Definite=Def|Gender=Neut|Number=Sing|PronType=Dem 22 nsubj 22:nsubj _ +24 sammanlagt sammanlagd ADV AB _ 22 advmod 22:advmod _ +25 6:22 6:22 NUM RG|NOM Case=Nom|NumType=Card|SPACEAFTER=ANY 22 nummod 22:nummod SpaceAfter=No +26 . . PUNCT MAD _ 22 punct 22:punct _ + +# sent_id = gl_ctg-ud-dev.conllu 625 +# text = Dáselle nova redacción a o punto 2 +# sentence 17 +# used to test adding MWT 1-3 and 6-7 +1 Dá dar VERB VMIP3S0 SPACEAFTER=ANY 0 root _ Treeler=sentence|SpaceAfter=No +2 se se PRON PP3CN000 SPACEAFTER=ANY 1 nsubj _ Treeler=suj|SpaceAfter=No +3 lle lle PRON PP3CSD00 _ 1 nsubj _ Treeler=suj +4 nova novo ADJ AQ0FS0 _ 5 amod _ Treeler=s.a +5 redacción redacción NOUN NCFS000 _ 1 obj _ Treeler=cd +6 a a ADP SPS00 _ 1 iobj _ Treeler=ci +7 o o DET DA0MS0 _ 8 det _ Treeler=spec +8 punto punto NOUN NCMS000 _ 6 nmod _ ToDo=nmod|Treeler=sn +9 2 2 NUM Z _ 8 nmod _ Treeler=sn + +# sent_id = fi_tdt-ud-train.conllu b712.11 +# text = Sekaan 1 purkillinen kermaviiliä ja ja sitten jaoin perustahnan 4 lautaselle. 
+# sentence 18 +0.1 Laitoin laittaa VERB _ Mood=Ind|Number=Sing|Person=1|Tense=Past|VerbForm=Fin|Voice=Act _ _ 0:root _ +1 Sekaan sekaan ADV Adv _ 3 orphan 0.1:advmod _ +2 1 1 NUM Num NumType=Card 3 nummod 3:nummod _ +3 purkillinen purkillinen NOUN N Case=Nom|Number=Sing 0 root 0:root|0.1:obj _ +4 kermaviiliä kerma#viili NOUN N Case=Par|Number=Sing 3 nmod 3:nmod:par _ +5 ja ja CCONJ C _ 8 cc 8:cc _ +6 ja ja CCONJ C _ 8 cc 8:cc _ +7 sitten sitten ADV Adv _ 8 advmod 8:advmod _ +8 jaoin jakaa VERB V Mood=Ind|Number=Sing|Person=1|Tense=Past|VerbForm=Fin|Voice=Act 3 conj 0.1:conj|3:conj _ +9 perustahnan perus#tahna NOUN N Case=Gen|Number=Sing 8 obj 8:obj _ +10 4 4 NUM Num NumType=Card 11 nummod 11:nummod _ +11 lautaselle lautanen NOUN N Case=All|Number=Sing|SPACEAFTER=ANY 8 obl 8:obl:all SpaceAfter=No +12 . . PUNCT Punct _ 3 punct 0.1:punct|3:punct _ + diff --git a/src/test/resources/rule30.conllu b/src/test/resources/rule30.conllu new file mode 100644 index 0000000..ec8f317 --- /dev/null +++ b/src/test/resources/rule30.conllu @@ -0,0 +1,335 @@ +# sent_id = fr-ud-dev_00001 +# text = Aviator, un film sur la vie de Howard Hughes. +# sentence 0 +1 Aviator Aviator PROPN _ _ 0 root _ SpaceAfter=No|NoNumber=True +2 , , PUNCT _ _ 1 punct _ NoNumber=True +3 un un DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 4 det _ _ +4 film film NOUN _ Gender=Masc|Number=Sing 1 appos _ _ +5 sur sur ADP _ _ 7 case _ NoNumber=True +6 la le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 7 det _ _ +7 vie vie NOUN _ Gender=Fem|Number=Sing 4 nmod _ _ +8 de de ADP _ _ 9 case _ NoNumber=True +9 Howard Howard PROPN _ _ 7 nmod _ NoNumber=True +10 Hughes Hughes PROPN _ _ 9 flat:name _ SpaceAfter=No|NoNumber=True +11 . . PUNCT _ _ 1 punct _ NoNumber=True + +# sent_id = fr-ud-dev_00002 +# text = Les études durent six ans mais leur contenu diffère donc selon les Facultés. 
+# sentence 1 +1 Les le DET _ Definite=Def|Gender=Fem|Number=Plur|PronType=Art 2 det _ _ +2 études étude NOUN _ Gender=Fem|Number=Plur 3 nsubj _ _ +3 durent durer VERB _ Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 0 root _ _ +4 six six NUM _ _ 5 nummod _ NoNumber=True +5 ans an NOUN _ Gender=Masc|Number=Plur 3 obj _ _ +6 mais mais CCONJ _ _ 9 cc _ NoNumber=True +7 leur son DET _ Gender=Masc|Number=Sing|PronType=Prs 8 nmod:poss _ _ +8 contenu contenu NOUN _ Gender=Masc|Number=Sing 9 nsubj _ _ +9 diffère différer VERB _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 3 conj _ _ +10 donc donc ADV _ _ 9 advmod _ NoNumber=True +11 selon selon ADP _ _ 13 case _ NoNumber=True +12 les le DET _ Definite=Def|Number=Plur|PronType=Art 13 det _ _ +13 Facultés Facultés PROPN _ _ 9 obl _ SpaceAfter=No|NoNumber=True +14 . . PUNCT _ _ 3 punct _ NoNumber=True + +# sent_id = fr-ud-dev_00003 +# text = Mais comment faire dans un contexte structurellement raciste ? +# sentence 2 +1 Mais mais CCONJ _ _ 3 cc _ NoNumber=True +2 comment comment ADV _ _ 3 advmod _ NoNumber=True +3 faire faire VERB _ VerbForm=Inf 0 root _ NoNumber=True +4 dans dans ADP _ _ 6 case _ NoNumber=True +5 un un DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 6 det _ _ +6 contexte contexte NOUN _ Gender=Masc|Number=Sing 3 obl _ _ +7 structurellement structurellement ADV _ _ 8 advmod _ NoNumber=True +8 raciste raciste ADJ _ Gender=Masc|Number=Sing 6 amod _ _ +9 ? ? PUNCT _ _ 3 punct _ NoNumber=True + +# sent_id = fr-ud-dev_00004 +# text = L'« oasis de vie », dans un milieu où règne l'obscurité totale et une pression hydrostatique importante, est riche et varié : les chercheurs y découvrent de nouvelles espèces de bivalves, de poissons, de crustacés, de poulpes dans des zones pensées jusqu'alors désertiques. 
+# sentence 3 +1 L' le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 3 det _ SpaceAfter=No +2 « « PUNCT _ _ 3 punct _ NoNumber=True +3 oasis oasis NOUN _ Gender=Fem|Number=Sing 23 nsubj _ _ +4 de de ADP _ _ 5 case _ NoNumber=True +5 vie vie NOUN _ Gender=Fem|Number=Sing 3 nmod _ _ +6 » » PUNCT _ _ 3 punct _ SpaceAfter=No|NoNumber=True +7 , , PUNCT _ _ 3 punct _ NoNumber=True +8 dans dans ADP _ _ 10 case _ NoNumber=True +9 un un DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 10 det _ _ +10 milieu milieu NOUN _ Gender=Masc|Number=Sing 23 obl _ _ +11 où où PRON _ PronType=Rel 12 obl _ NoNumber=True +12 règne régner VERB _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 10 acl:relcl _ _ +13 l' le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 14 det _ SpaceAfter=No +14 obscurité obscurité NOUN _ Gender=Fem|Number=Sing 12 nsubj _ _ +15 totale total ADJ _ Gender=Fem|Number=Sing 14 amod _ _ +16 et et CCONJ _ _ 18 cc _ NoNumber=True +17 une un DET _ Definite=Ind|Gender=Fem|Number=Sing|PronType=Art 18 det _ _ +18 pression pression NOUN _ Gender=Fem|Number=Sing 14 conj _ _ +19 hydrostatique hydrostatique ADJ _ Gender=Fem|Number=Sing 18 amod _ _ +20 importante important ADJ _ Gender=Fem|Number=Sing 18 amod _ SpaceAfter=No +21 , , PUNCT _ _ 23 punct _ NoNumber=True +22 est être AUX _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 23 cop _ _ +23 riche riche ADJ _ Gender=Fem|Number=Sing 0 root _ _ +24 et et CCONJ _ _ 25 cc _ NoNumber=True +25 varié varié ADJ _ Gender=Masc|Number=Sing 23 conj _ _ +26 : : PUNCT _ _ 23 punct _ NoNumber=True +27 les le DET _ Definite=Def|Gender=Masc|Number=Plur|PronType=Art 28 det _ _ +28 chercheurs chercheur NOUN _ Gender=Masc|Number=Plur 30 nsubj _ _ +29 y y PRON _ _ 30 advmod _ NoNumber=True +30 découvrent découvrir VERB _ Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 23 parataxis _ _ +31 de un DET _ Definite=Ind|Gender=Fem|Number=Plur|PronType=Art 33 det _ _ +32 nouvelles nouveau ADJ _ 
Gender=Fem|Number=Plur 33 amod _ _ +33 espèces espèce NOUN _ Gender=Fem|Number=Plur 30 obj _ _ +34 de de ADP _ _ 35 case _ NoNumber=True +35 bivalves bivalve NOUN _ Gender=Masc|Number=Plur 33 nmod _ SpaceAfter=No +36 , , PUNCT _ _ 38 punct _ NoNumber=True +37 de de ADP _ _ 38 case _ NoNumber=True +38 poissons poisson NOUN _ Gender=Masc|Number=Plur 35 conj _ SpaceAfter=No +39 , , PUNCT _ _ 41 punct _ NoNumber=True +40 de de ADP _ _ 41 case _ NoNumber=True +41 crustacés crustacé NOUN _ Gender=Masc|Number=Plur 35 conj _ SpaceAfter=No +42 , , PUNCT _ _ 44 punct _ NoNumber=True +43 de de ADP _ _ 44 case _ NoNumber=True +44 poulpes poulpe NOUN _ Gender=Masc|Number=Plur 35 conj _ _ +45 dans dans ADP _ _ 47 case _ NoNumber=True +46 des un DET _ Definite=Ind|Gender=Fem|Number=Plur|PronType=Art 47 det _ _ +47 zones zone NOUN _ Gender=Fem|Number=Plur 30 obl _ _ +48 pensées penser VERB _ Gender=Fem|Number=Plur|Tense=Past|VerbForm=Part 47 acl _ _ +49 jusqu' jusque ADP _ _ 50 case _ SpaceAfter=No|NoNumber=True +50 alors alors ADV _ _ 48 advmod _ NoNumber=True +51 désertiques désertique ADJ _ Gender=Fem|Number=Plur 48 amod _ SpaceAfter=No +52 . . PUNCT _ _ 23 punct _ NoNumber=True + +# sent_id = fr-ud-train_00002 +# text = L'œuvre est située dans la galerie des batailles, dans le château de Versailles. 
+# sentence 4 +1 L' le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 2 det _ SpaceAfter=No +2 œuvre œuvre NOUN _ Gender=Fem|Number=Sing 4 nsubj _ _ +3 est être AUX _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 4 aux _ _ +4 située situer VERB _ Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part 0 root _ SpaceAfter=\s\t\s +5 dans dans ADP _ _ 7 case _ NoNumber=True +6 la le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 7 det _ SpaceAfter=  +7 galerie galerie NOUN _ Gender=Fem|Number=Sing 4 obl _ _ +8-9 des _ _ _ _ _ _ _ NoNumber=True +8 de de ADP _ _ 10 case _ NoNumber=True +9 les le DET _ Definite=Def|Gender=Fem|Number=Plur|PronType=Art 10 det _ _ +10 batailles bataille NOUN _ Gender=Fem|Number=Plur 7 nmod _ SpaceAfter=No +11 , , PUNCT _ _ 4 punct _ NoNumber=True +12 dans dans ADP _ _ 14 case _ NoNumber=True +13 le le DET _ Definite=Def|Gender=Masc|Number=Sing|PronType=Art 14 det _ _ +14 château château NOUN _ Gender=Masc|Number=Sing 4 obl _ _ +15 de de ADP _ _ 16 case _ NoNumber=True +16 Versailles Versailles PROPN _ _ 14 nmod _ SpaceAfter=No|NoNumber=True +17 . . PUNCT _ _ 4 punct _ NoNumber=True + +# sent_id = fr-ud-train_00024 +# text = Les experts sont unanimes pour dater ce manuscrit du VIe siècle. 
+# sentence 5 +1 Les le DET _ Definite=Def|Gender=Masc|Number=Plur|PronType=Art 2 det _ _ +2 experts expert NOUN _ Gender=Masc|Number=Plur 3 nsubj _ _ +3 sont être VERB _ Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 0 root _ _ +4 unanimes unanime ADJ _ Gender=Masc|Number=Plur 3 amod _ _ +5 pour pour ADP _ _ 6 mark _ NoNumber=True +6 dater dater VERB _ VerbForm=Inf 3 advcl _ NoNumber=True +7 ce ce DET _ Gender=Masc|Number=Sing|PronType=Dem 8 det _ _ +8 manuscrit manuscrit NOUN _ Gender=Masc|Number=Sing 6 obj _ _ +9-10 du _ _ _ _ _ _ _ NoNumber=True +9 de de ADP _ _ 12 case _ NoNumber=True +10 le le DET _ Definite=Def|Gender=Masc|Number=Sing|PronType=Art 12 det _ _ +11 VIe VIe ADJ _ Gender=Masc|Number=Sing|NumType=Ord 12 amod _ _ +12 siècle siècle NOUN _ Gender=Masc|Number=Sing 8 nmod _ SpaceAfter=No +13 . . PUNCT _ _ 3 punct _ NoNumber=True + +# sent_id = conlueditor-test-6 +# text = ils ont visité le Musée du Louvre. +# sentence 6 +1 ils il PRON PERS_NOM Gender=Masc|Number=Plur|Person=3|PronType=Prs 3 nsubj _ _ +2 ont avoir AUX AUXA Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 3 aux _ _ +3 visité visiter VERB PARTP Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part 0 root _ _ +4 le le DET ART Definite=Def|Gender=Masc|Number=Sing|PronType=Art 5 det _ _ +5 Musée musée NOUN NOUN Gender=Masc|Number=Sing 3 obj _ _ +6-7 du _ _ _ _ _ _ _ NoNumber=True +6 de de ADP ADP _ 8 case _ NoNumber=True +7 le le DET ART Definite=Def|Gender=Masc|Number=Sing|PronType=Art 8 det _ _ +8 Louvre Louvre PROPN PROPN _ 5 nmod _ SpaceAfter=No|NoNumber=True +9 . . PUNCT PUNCT _ 3 punct _ NoNumber=True + +# sent_id = conlueditor-test-7 +# text = la souris a mangé le fromage qui pue. 
+# sentence 7 +1 la le DET ART Definite=Def|Gender=Fem|Number=Sing|PronType=Art 2 det _ _ +2 souris souris NOUN NOUN Gender=Fem|Number=Sing 4 nsubj _ _ +3 a avoir AUX AUXA Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 4 aux _ _ +4 mangé manger VERB PARTP Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part 0 root _ _ +5 le le DET ART Definite=Def|Gender=Masc|Number=Sing|PronType=Art 6 det _ _ +6 fromage fromage NOUN NOUN Gender=Masc|Number=Sing 4 obj _ _ +7 qui qui PRON REL PronType=Rel 8 nsubj _ NoNumber=True +8 pue puer VERB VERB Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin 6 acl:relcl _ SpaceAfter=No +9 . . PUNCT PUNCT _ 4 punct _ NoNumber=True + +# sent_id = conlueditor-test-8 +# text = il habite à Los Angeles. +# sentence 8 +1 il il PRON PERS_NOM Gender=Masc|Number=Sing|Person=3|PronType=Prs 2 nsubj _ _ +2 habite habiter VERB VERB Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 0 root _ _ +3 à à ADP ADP _ 4 case _ NoNumber=True +4 Los Los PROPN PROPN _ 2 obl _ NoNumber=True +5 Angeles Angeles PROPN PROPN _ 4 flat:name _ SpaceAfter=No|NoNumber=True +6 . . PUNCT PUNCT _ 2 punct _ NoNumber=True + +# sent_id = conlueditor-test-9 +# text = Il aime bien le Miroir aux Alouettes. +# sentence 9 +1 Il il PRON PERS_NOM Gender=Masc|Number=Sing|Person=3|PronType=Prs 2 nsubj _ _ +2 aime aimer VERB VERB Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 0 root _ _ +3 bien bien ADV ADV _ 2 advmod _ NoNumber=True +4 le le DET ART Definite=Def|Gender=Masc|Number=Sing|PronType=Art 5 det _ _ +5 Miroir miroir NOUN NOUN Gender=Masc|Number=Sing 2 obj _ _ +6-7 aux _ _ _ _ _ _ _ NoNumber=True +6 à à ADP ADP _ 8 case _ NoNumber=True +7 les le DET ART Definite=Def|Gender=Fem|Number=Plur|PronType=Art 8 det _ _ +8 Alouettes alouette NOUN NOUN Gender=Fem|Number=Plur 2 obl _ SpaceAfter=No +9 . . 
PUNCT PUNCT _ 2 punct _ NoNumber=True + +# sent_id = conlueditor-test-10-only-tokens +# text = Une exposition au musée du Louvre et un voyage au village catalan de Gosol +# sentence 10 +1 Une _ _ _ _ _ _ _ NoNumber=True +2 exposition _ _ _ _ _ _ _ NoNumber=True +3-4 au _ _ _ _ _ _ _ NoNumber=True +3 à _ _ _ _ _ _ _ NoNumber=True +4 le _ _ _ _ _ _ _ NoNumber=True +5 musée _ _ _ _ _ _ _ NoNumber=True +6-7 du _ _ _ _ _ _ _ NoNumber=True +6 de _ _ _ _ _ _ _ NoNumber=True +7 le _ _ _ _ _ _ _ NoNumber=True +8 Louvre _ _ _ _ _ _ _ NoNumber=True +9 et _ _ _ _ _ _ _ NoNumber=True +10 un _ _ _ _ _ _ _ NoNumber=True +11 voyage _ _ _ _ _ _ _ NoNumber=True +12-13 au _ _ _ _ _ _ _ NoNumber=True +12 à _ _ _ _ _ _ _ NoNumber=True +13 le _ _ _ _ _ _ _ NoNumber=True +14 village _ _ _ _ _ _ _ NoNumber=True +15 catalan _ _ _ _ _ _ _ NoNumber=True +16 de _ _ _ _ _ _ _ NoNumber=True +17 Gosol _ _ _ _ _ _ _ NoNumber=True + +# sent_id = conlueditor-test-11-eud +# text = Sam bought and prepared dinner +# sentence 11 +1 Sam Sam PROPN _ _ 2 nsubj 2:nsubj|4:nsubj NoNumber=True +2 bought buy VERB _ _ 0 root _ NoNumber=True +3 and and CCONJ _ _ 4 cc _ NoNumber=True +4 prepared prepare VERB _ _ 2 conj _ NoNumber=True +5 dinner dinner NOUN _ _ 2 obj 2:obj|4:obj NoNumber=True + +# sent_id = conlueditor-test-12-eud-bad-ids +# text = Sam persuaded Kim to fix dinner +# sentence 12 +1 Sam Sam PROPN _ _ 2 nsubj _ NoNumber=True +2 persuaded persuade VERB _ _ 0 root _ NoNumber=True +3 Kim Kim PROPN _ _ 2 obj 2:obj|5:nsubj NoNumber=True +4 to to ADP _ _ 5 mark _ NoNumber=True +5 fix fix VERB _ _ 2 xcomp _ NoNumber=True +6 dinner dinner NOUN _ _ 5 obj _ NoNumber=True + +# sent_id = ellipsis1 +# text = Sam fixed lunch and Kim dinner +# sentence 13 +1 Sam Sam PROPN NNP Number=Sing 2 nsubj 2:nsubj _ +2 fixed fix VERB VBD Mood=Ind|Tense=Past|VerbForm=Fin 0 root 0:root NoNumber=True +3 lunch lunch NOUN NN Number=Sing 2 obj 2:obj _ +4 and and CCONJ CC _ 5 cc 5:cc|5.1:cc NoNumber=True +5 Kim Kim PROPN NNP 
Number=Sing 2 conj 2:conj|5.1:nsubj _ +5.1 fixed fix VERB VBD Mood=Ind|Tense=Past|VerbForm=Fin _ _ 2:conj NoNumber=True +6 dinner dinner NOUN NN Number=Sing 5 orphan 5.1:obj _ + +# sent_id = ellipsis2 +# text = Mary organised bred and John beer +# sentence 14 +1 Mary Mary PROPN _ _ 2 nsubj 2:nsubj NoNumber=True +2 organised organise VERB _ _ 0 root 0:root NoNumber=True +3 bred bred NOUN _ _ 2 obj 2:obj NoNumber=True +4 and and CCONJ _ _ 5 cc 5:cc|5.1:cc NoNumber=True +5 John John PROPN _ _ 2 conj 2:conj|5.1:nsubj NoNumber=True +5.1 organised _ _ _ _ _ _ 2:conj NoNumber=True +6 beer beer NOUN _ _ 5 orphan 5.1:obj NoNumber=True + +# sent_id = unannotated +# text = Une exposition au musée du Louvre +# sentence 15 +1 Une _ _ _ _ _ _ _ NoNumber=True +2 exposition _ _ _ _ _ _ _ NoNumber=True +3-4 au _ _ _ _ _ _ _ NoNumber=True +3 à _ _ _ _ _ _ _ NoNumber=True +4 le _ _ _ _ _ _ _ NoNumber=True +5 musée _ _ _ _ _ _ _ NoNumber=True +6-7 du _ _ _ _ _ _ _ NoNumber=True +6 de _ _ _ _ _ _ _ NoNumber=True +7 le _ _ _ _ _ _ _ NoNumber=True +8 Louvre _ _ _ _ _ _ _ NoNumber=True + +# sent_id = sv-ud-train-767 +# text = Om du köper en halv liter mjölk för 0:78, en limpa för 2:57 och ett halvt kilo margarin för 2:87 gör detta sammanlagt 6:22. 
+# sentence 16 +1 Om om SCONJ SN _ 3 mark 3:mark NoNumber=True +2 du du PRON PN|UTR|SIN|DEF|SUB Case=Nom|Definite=Def|Gender=Com|Number=Sing|PronType=Prs 3 nsubj 3:nsubj|10.1:nsubj|15.1:nsubj _ +3 köper köpa VERB VB|PRS|AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act 22 advcl 22:advcl:om NoNumber=True +4 en en DET DT|UTR|SIN|IND Definite=Ind|Gender=Com|Number=Sing|PronType=Art 6 det 6:det _ +5 halv halv ADJ JJ|POS|UTR|SIN|IND|NOM Case=Nom|Definite=Ind|Degree=Pos|Gender=Com|Number=Sing 4 fixed 4:fixed _ +6 liter liter NOUN NN|UTR|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Com|Number=Sing 7 nmod 7:nmod _ +7 mjölk mjölk NOUN NN|UTR|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Com|Number=Sing 3 obj 3:obj _ +8 för för ADP PP _ 9 case 9:case NoNumber=True +9 0:78 0:78 NUM RG|NOM Case=Nom|NumType=Card 3 obl 3:obl:för SpaceAfter=No|NoNumber=True +10 , , PUNCT MID _ 12 punct 12:punct NoNumber=True +10.1 köper köpa VERB VB|PRS|AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act _ _ 3:conj|22:advcl:om NoNumber=True +11 en en DET DT|UTR|SIN|IND Definite=Ind|Gender=Com|Number=Sing|PronType=Art 12 det 12:det _ +12 limpa limpa NOUN NN|UTR|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Com|Number=Sing 3 conj 10.1:obj Enhanced=obj +13 för för ADP PP _ 14 case 14:case NoNumber=True +14 2:57 2:57 NUM RG|NOM Case=Nom|NumType=Card 12 orphan 10.1:obl:för Enhanced=obl|NoNumber=True +15 och och CCONJ KN _ 19 cc 15.1:cc NoNumber=True +15.1 köper köpa VERB VB|PRS|AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act _ _ 3:conj:och NoNumber=True +16 ett en DET DT|NEU|SIN|IND Definite=Ind|Gender=Neut|Number=Sing|PronType=Art 18 det 18:det _ +17 halvt halv ADJ JJ|POS|NEU|SIN|IND|NOM Case=Nom|Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing 16 fixed 16:fixed _ +18 kilo kilo NOUN NN|NEU|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Neut|Number=Sing 19 nmod 19:nmod _ +19 margarin margarin NOUN NN|NEU|SIN|IND|NOM Case=Nom|Definite=Ind|Gender=Neut|Number=Sing 12 conj 15.1:obj Enhanced=obj +20 för för ADP PP _ 21 case 21:case 
NoNumber=True +21 2:87 2:87 NUM RG|NOM Case=Nom|NumType=Card 19 orphan 15.1:obl:för Enhanced=obl|NoNumber=True +22 gör göra VERB VB|PRS|AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act 0 root 0:root NoNumber=True +23 detta denna PRON PN|NEU|SIN|DEF|SUB/OBJ Definite=Def|Gender=Neut|Number=Sing|PronType=Dem 22 nsubj 22:nsubj _ +24 sammanlagt sammanlagd ADV AB _ 22 advmod 22:advmod NoNumber=True +25 6:22 6:22 NUM RG|NOM Case=Nom|NumType=Card 22 nummod 22:nummod SpaceAfter=No|NoNumber=True +26 . . PUNCT MAD _ 22 punct 22:punct NoNumber=True + +# sent_id = gl_ctg-ud-dev.conllu 625 +# text = Dáselle nova redacción a o punto 2 +# sentence 17 +# used to test adding MWT 1-3 and 6-7 +1 Dá dar VERB VMIP3S0 _ 0 root _ Treeler=sentence|SpaceAfter=No|NoNumber=True +2 se se PRON PP3CN000 _ 1 nsubj _ Treeler=suj|SpaceAfter=No|NoNumber=True +3 lle lle PRON PP3CSD00 _ 1 nsubj _ Treeler=suj|NoNumber=True +4 nova novo ADJ AQ0FS0 _ 5 amod _ Treeler=s.a|NoNumber=True +5 redacción redacción NOUN NCFS000 _ 1 obj _ Treeler=cd|NoNumber=True +6 a a ADP SPS00 _ 1 iobj _ Treeler=ci|NoNumber=True +7 o o DET DA0MS0 _ 8 det _ Treeler=spec|NoNumber=True +8 punto punto NOUN NCMS000 _ 6 nmod _ ToDo=nmod|Treeler=sn|NoNumber=True +9 2 2 NUM Z _ 8 nmod _ Treeler=sn|NoNumber=True + +# sent_id = fi_tdt-ud-train.conllu b712.11 +# text = Sekaan 1 purkillinen kermaviiliä ja ja sitten jaoin perustahnan 4 lautaselle. 
+# sentence 18 +0.1 Laitoin laittaa VERB _ Mood=Ind|Number=Sing|Person=1|Tense=Past|VerbForm=Fin|Voice=Act _ _ 0:root _ +1 Sekaan sekaan ADV Adv _ 3 orphan 0.1:advmod NoNumber=True +2 1 1 NUM Num NumType=Card 3 nummod 3:nummod NoNumber=True +3 purkillinen purkillinen NOUN N Case=Nom|Number=Sing 0 root 0:root|0.1:obj _ +4 kermaviiliä kerma#viili NOUN N Case=Par|Number=Sing 3 nmod 3:nmod:par _ +5 ja ja CCONJ C _ 8 cc 8:cc NoNumber=True +6 ja ja CCONJ C _ 8 cc 8:cc NoNumber=True +7 sitten sitten ADV Adv _ 8 advmod 8:advmod NoNumber=True +8 jaoin jakaa VERB V Mood=Ind|Number=Sing|Person=1|Tense=Past|VerbForm=Fin|Voice=Act 3 conj 0.1:conj|3:conj _ +9 perustahnan perus#tahna NOUN N Case=Gen|Number=Sing 8 obj 8:obj _ +10 4 4 NUM Num NumType=Card 11 nummod 11:nummod NoNumber=True +11 lautaselle lautanen NOUN N Case=All|Number=Sing 8 obl 8:obl:all SpaceAfter=No +12 . . PUNCT Punct _ 3 punct 0.1:punct|3:punct NoNumber=True +