Commit

Update accuracy reports, plots, release notes
pemistahl committed Jan 8, 2023
1 parent f38c8d1 commit a688e6b
Showing 160 changed files with 514 additions and 503 deletions.
96 changes: 48 additions & 48 deletions README.md

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions RELEASE_NOTES.md
@@ -1,3 +1,11 @@
## Lingua 1.3.1 (released on 08 Jan 2023)

### Bug Fixes

- For long input texts, an error occurred while computing the confidence
values due to numerical underflow when converting probabilities.
This has been fixed.

## Lingua 1.3.0 (released on 01 Jan 2023)

### Improvements
48 changes: 24 additions & 24 deletions cmd/accuracy-reports/aggregated-accuracy-values.csv
@@ -3,51 +3,51 @@ Afrikaans,51,21,39,92,55,22,46,98,64,38,62,93,79,58,81,97
Albanian,NaN,NaN,NaN,NaN,55,18,48,98,80,54,86,99,88,69,95,100
Arabic,89,77,91,99,90,79,92,100,94,88,96,99,98,96,99,100
Armenian,NaN,NaN,NaN,NaN,99,100,100,97,100,100,100,100,100,100,100,100
Azerbaijani,65,45,58,91,81,62,82,99,82,71,78,96,90,77,92,99
Basque,NaN,NaN,NaN,NaN,62,33,62,92,74,56,76,91,84,71,87,93
Azerbaijani,64,45,58,91,81,62,82,99,82,71,78,96,90,77,92,99
Basque,NaN,NaN,NaN,NaN,62,33,62,92,75,56,76,92,84,71,87,93
Belarusian,81,64,80,98,84,67,86,100,92,80,95,100,97,92,99,100
Bengali,100,100,100,100,99,98,99,99,100,100,100,100,100,100,100,100
Bokmal,34,15,29,60,NaN,NaN,NaN,NaN,49,27,47,74,58,39,59,75
Bosnian,NaN,NaN,NaN,NaN,33,19,28,52,29,23,29,36,35,29,35,40
Bokmal,34,15,28,60,NaN,NaN,NaN,NaN,50,27,47,75,58,39,59,77
Bosnian,NaN,NaN,NaN,NaN,33,19,28,52,29,23,29,36,35,29,35,41
Bulgarian,61,37,57,89,70,45,66,98,78,56,81,96,87,70,91,99
Catalan,NaN,NaN,NaN,NaN,48,19,42,84,58,33,60,81,70,51,74,86
Catalan,NaN,NaN,NaN,NaN,48,19,42,84,58,33,60,82,70,51,74,87
Chinese,100,100,100,100,92,92,83,100,100,100,100,100,100,100,100,100
Croatian,55,28,44,91,42,26,42,58,60,36,57,85,72,53,74,90
Croatian,55,28,44,91,42,26,42,58,60,36,57,86,73,53,74,90
Czech,50,31,46,71,64,39,65,88,71,54,72,87,80,66,84,91
Danish,47,24,38,79,58,26,54,95,70,45,70,95,81,61,84,98
Dutch,47,22,36,82,58,29,47,97,64,36,61,94,77,55,81,96
English,49,17,36,94,54,22,44,97,62,29,62,96,81,55,89,99
Esperanto,52,25,45,88,57,22,51,98,66,44,61,92,84,67,85,98
English,49,17,35,94,54,22,44,97,63,29,62,97,81,55,89,99
Esperanto,52,25,45,88,57,22,51,98,66,44,61,93,84,67,85,98
Estonian,61,36,53,94,70,41,69,99,83,62,88,99,92,80,96,100
Finnish,71,45,70,98,80,58,84,99,91,77,95,100,96,90,98,100
French,64,37,59,97,55,22,49,94,77,52,83,97,89,74,94,99
French,64,37,59,97,55,22,49,94,77,52,83,98,89,74,94,99
Ganda,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,84,65,87,100,91,79,95,100
Georgian,100,100,100,100,98,99,100,96,100,100,100,100,100,100,100,100
German,65,38,60,97,66,40,62,98,80,57,84,99,89,74,94,100
Greek,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100
Gujarati,100,100,100,100,100,99,100,100,100,100,100,100,100,100,100,100
Hebrew,90,76,94,99,NaN,NaN,NaN,NaN,100,100,100,100,100,100,100,100
Hindi,52,27,40,88,58,34,45,95,33,11,20,67,73,61,64,93
Hungarian,62,38,53,95,76,53,76,99,90,77,94,100,95,87,98,100
Hindi,52,27,40,88,58,34,45,95,33,11,20,67,73,61,64,95
Hungarian,62,37,53,95,76,53,76,99,90,77,94,100,95,87,98,100
Icelandic,NaN,NaN,NaN,NaN,71,42,70,99,88,72,92,99,93,83,97,100
Indonesian,67,39,66,95,46,26,45,66,48,25,46,72,60,39,61,81
Indonesian,67,39,66,95,46,26,45,66,47,25,46,71,61,39,61,83
Irish,NaN,NaN,NaN,NaN,67,42,66,94,85,70,90,95,91,82,94,96
Italian,56,25,47,96,62,31,57,98,71,42,74,98,87,69,92,100
Japanese,99,100,100,97,98,97,96,100,100,100,100,100,100,100,100,100
Kazakh,NaN,NaN,NaN,NaN,82,62,83,99,90,78,93,99,92,80,96,99
Korean,100,100,100,100,99,100,100,98,100,100,100,100,100,100,100,100
Latin,NaN,NaN,NaN,NaN,62,44,58,83,73,49,76,93,87,72,93,97
Latin,NaN,NaN,NaN,NaN,62,44,58,83,73,49,76,94,87,72,93,97
Latvian,59,36,54,87,75,51,77,98,87,75,90,97,93,85,97,99
Lithuanian,62,38,56,92,72,42,75,99,87,76,89,98,95,86,98,100
Macedonian,62,39,55,94,60,30,54,97,72,52,70,95,84,66,86,99
Malay,NaN,NaN,NaN,NaN,22,11,22,34,31,22,36,36,31,26,38,30
Malay,NaN,NaN,NaN,NaN,22,11,22,34,31,22,36,35,31,26,38,28
Maori,NaN,NaN,NaN,NaN,52,22,43,91,82,62,87,98,91,82,92,99
Marathi,73,52,74,93,84,69,84,98,41,20,30,72,85,74,85,96
Mongolian,NaN,NaN,NaN,NaN,83,63,87,99,96,89,98,99,97,93,99,99
Nynorsk,34,10,24,69,NaN,NaN,NaN,NaN,52,25,49,81,66,41,66,90
Marathi,73,52,74,93,84,69,84,98,39,16,30,72,85,74,85,96
Mongolian,NaN,NaN,NaN,NaN,83,63,87,99,95,89,98,99,97,93,99,99
Nynorsk,34,10,24,69,NaN,NaN,NaN,NaN,52,25,49,81,66,41,66,91
Persian,70,46,66,99,76,57,70,99,80,62,80,98,90,78,94,100
Polish,66,45,59,94,77,51,80,99,90,77,93,99,95,85,98,100
Portuguese,57,26,48,96,53,21,40,97,69,42,70,95,81,59,85,98
Portuguese,57,26,48,96,53,21,40,97,69,42,70,95,81,59,85,99
Punjabi,100,100,100,100,100,99,100,100,100,100,100,100,100,100,100,100
Romanian,59,34,52,90,53,24,48,88,72,49,74,94,87,69,92,99
Russian,53,40,52,68,71,48,72,93,78,59,84,92,90,76,95,98
@@ -56,21 +56,21 @@ Shona,68,44,65,95,76,51,79,99,81,56,86,100,91,78,96,100
Slovak,NaN,NaN,NaN,NaN,63,32,61,96,75,49,78,97,84,64,90,99
Slovene,48,25,38,81,63,29,60,99,67,39,68,93,82,61,87,99
Somali,68,38,66,99,69,38,70,100,85,64,90,100,92,82,96,100
Sotho,NaN,NaN,NaN,NaN,49,15,33,98,72,43,75,97,85,67,90,99
Sotho,NaN,NaN,NaN,NaN,49,15,33,98,72,43,75,97,86,67,90,100
Spanish,48,19,33,93,48,16,32,96,56,26,49,94,70,44,69,97
Swahili,NaN,NaN,NaN,NaN,57,25,49,98,70,43,68,97,81,60,84,98
Swedish,49,24,39,83,61,30,56,96,72,46,76,95,84,64,88,99
Tagalog,52,23,43,90,NaN,NaN,NaN,NaN,66,36,67,96,78,52,83,99
Swedish,49,24,40,83,61,30,56,96,72,46,76,94,84,64,88,99
Tagalog,52,23,43,90,NaN,NaN,NaN,NaN,66,36,67,96,78,52,83,98
Tamil,100,100,100,100,100,100,100,99,100,100,100,100,100,100,100,100
Telugu,100,100,100,100,99,99,100,99,100,100,100,100,100,100,100,100
Thai,100,100,100,99,99,100,100,98,99,100,100,98,99,100,100,98
Thai,100,100,100,99,99,100,100,98,100,100,100,100,100,100,100,100
Tsonga,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,72,46,73,97,84,66,89,98
Tswana,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,71,44,73,96,84,65,88,99
Turkish,54,26,44,92,69,41,70,97,87,71,91,99,94,84,98,100
Turkish,54,26,44,92,69,41,70,97,87,71,91,100,94,84,98,100
Ukrainian,72,53,71,93,81,62,83,98,86,75,92,93,92,84,97,95
Urdu,57,31,46,94,61,39,53,92,80,65,78,96,91,80,94,98
Vietnamese,73,36,85,97,66,26,74,99,87,76,87,98,91,79,94,99
Welsh,NaN,NaN,NaN,NaN,69,43,66,98,82,61,87,99,91,78,96,99
Xhosa,NaN,NaN,NaN,NaN,66,40,65,92,69,45,67,94,82,64,85,98
Yoruba,22,11,14,41,15,5,11,28,62,33,61,93,75,50,77,97
Yoruba,22,11,14,41,15,5,11,28,62,33,61,92,74,50,77,96
Zulu,70,44,68,98,63,35,63,92,70,45,72,94,81,62,83,97
6 changes: 3 additions & 3 deletions cmd/accuracy-reports/lingua-high-accuracy/Afrikaans.txt
@@ -1,6 +1,6 @@
##### Afrikaans #####

>>> Accuracy on average: 78.73%
>>> Accuracy on average: 78.77%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 58.50%
@@ -11,6 +11,6 @@ Accuracy: 80.90%
Erroneously classified as Dutch: 11.00%, English: 1.30%, German: 1.10%, Latin: 0.80%, Danish: 0.70%, Bokmal: 0.40%, Estonian: 0.40%, Nynorsk: 0.30%, Sotho: 0.30%, Yoruba: 0.30%, Finnish: 0.20%, Ganda: 0.20%, Italian: 0.20%, Swedish: 0.20%, Tsonga: 0.20%, Welsh: 0.20%, Bosnian: 0.10%, Catalan: 0.10%, Esperanto: 0.10%, French: 0.10%, Hungarian: 0.10%, Malay: 0.10%, Portuguese: 0.10%, Shona: 0.10%, Spanish: 0.10%, Swahili: 0.10%, Tagalog: 0.10%, Tswana: 0.10%, Turkish: 0.10%

>> Detection of 1000 sentences (average length: 102 chars)
Accuracy: 96.80%
Erroneously classified as Dutch: 2.60%, German: 0.20%, Danish: 0.10%, English: 0.10%, Latin: 0.10%, Sotho: 0.10%
Accuracy: 96.90%
Erroneously classified as Dutch: 2.50%, German: 0.20%, Danish: 0.10%, English: 0.10%, Latin: 0.10%, Sotho: 0.10%

6 changes: 3 additions & 3 deletions cmd/accuracy-reports/lingua-high-accuracy/Albanian.txt
@@ -1,6 +1,6 @@
##### Albanian #####

>>> Accuracy on average: 87.67%
>>> Accuracy on average: 87.73%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 68.70%
@@ -11,6 +11,6 @@ Accuracy: 94.80%
Erroneously classified as English: 0.60%, Italian: 0.60%, Latin: 0.60%, Esperanto: 0.50%, Basque: 0.30%, French: 0.30%, Swahili: 0.30%, Catalan: 0.20%, Indonesian: 0.20%, Tsonga: 0.20%, Tswana: 0.20%, Bokmal: 0.10%, Croatian: 0.10%, Estonian: 0.10%, German: 0.10%, Nynorsk: 0.10%, Romanian: 0.10%, Shona: 0.10%, Slovak: 0.10%, Slovene: 0.10%, Swedish: 0.10%, Xhosa: 0.10%, Yoruba: 0.10%

>> Detection of 1000 sentences (average length: 118 chars)
Accuracy: 99.50%
Erroneously classified as Latin: 0.20%, Esperanto: 0.10%, French: 0.10%, Tsonga: 0.10%
Accuracy: 99.70%
Erroneously classified as Esperanto: 0.10%, Latin: 0.10%, Swahili: 0.10%

6 changes: 3 additions & 3 deletions cmd/accuracy-reports/lingua-high-accuracy/Azerbaijani.txt
@@ -1,6 +1,6 @@
##### Azerbaijani #####

>>> Accuracy on average: 89.57%
>>> Accuracy on average: 89.70%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 77.40%
@@ -11,6 +11,6 @@ Accuracy: 92.30%
Erroneously classified as Turkish: 4.70%, Italian: 0.30%, Albanian: 0.20%, Basque: 0.20%, Esperanto: 0.20%, Indonesian: 0.20%, Latin: 0.20%, Shona: 0.20%, Swahili: 0.20%, Bosnian: 0.10%, Danish: 0.10%, Dutch: 0.10%, German: 0.10%, Latvian: 0.10%, Malay: 0.10%, Polish: 0.10%, Somali: 0.10%, Swedish: 0.10%, Tagalog: 0.10%, Tswana: 0.10%, Xhosa: 0.10%, Zulu: 0.10%

>> Detection of 1000 sentences (average length: 107 chars)
Accuracy: 99.00%
Erroneously classified as Turkish: 0.80%, Sotho: 0.10%, Tagalog: 0.10%
Accuracy: 99.40%
Erroneously classified as Turkish: 0.50%, Tagalog: 0.10%

6 changes: 3 additions & 3 deletions cmd/accuracy-reports/lingua-high-accuracy/Basque.txt
@@ -1,6 +1,6 @@
##### Basque #####

>>> Accuracy on average: 83.67%
>>> Accuracy on average: 83.73%

>> Detection of 1000 single words (average length: 9 chars)
Accuracy: 71.00%
@@ -11,6 +11,6 @@ Accuracy: 87.40%
Erroneously classified as Latin: 2.70%, Yoruba: 1.30%, Esperanto: 0.90%, Spanish: 0.90%, Swahili: 0.70%, French: 0.60%, English: 0.50%, Catalan: 0.40%, Italian: 0.40%, Portuguese: 0.40%, Albanian: 0.30%, Nynorsk: 0.30%, Slovak: 0.30%, Tsonga: 0.30%, Welsh: 0.30%, Croatian: 0.20%, Dutch: 0.20%, German: 0.20%, Icelandic: 0.20%, Malay: 0.20%, Afrikaans: 0.10%, Bosnian: 0.10%, Estonian: 0.10%, Finnish: 0.10%, Indonesian: 0.10%, Irish: 0.10%, Lithuanian: 0.10%, Polish: 0.10%, Shona: 0.10%, Swedish: 0.10%, Tagalog: 0.10%, Tswana: 0.10%, Xhosa: 0.10%

>> Detection of 1000 sentences (average length: 102 chars)
Accuracy: 92.60%
Erroneously classified as Latin: 6.00%, Esperanto: 0.20%, Italian: 0.20%, Lithuanian: 0.20%, Tagalog: 0.20%, Yoruba: 0.20%, Bokmal: 0.10%, English: 0.10%, German: 0.10%, Spanish: 0.10%
Accuracy: 92.80%
Erroneously classified as Latin: 5.90%, Esperanto: 0.20%, Italian: 0.20%, Lithuanian: 0.20%, Yoruba: 0.20%, Bokmal: 0.10%, English: 0.10%, German: 0.10%, Spanish: 0.10%, Tagalog: 0.10%

6 changes: 3 additions & 3 deletions cmd/accuracy-reports/lingua-high-accuracy/Bokmal.txt
@@ -1,6 +1,6 @@
##### Bokmal #####

>>> Accuracy on average: 57.63%
>>> Accuracy on average: 58.03%

>> Detection of 1000 single words (average length: 9 chars)
Accuracy: 38.80%
@@ -11,6 +11,6 @@ Accuracy: 58.70%
Erroneously classified as Nynorsk: 23.70%, Danish: 12.70%, Swedish: 1.40%, German: 0.60%, English: 0.50%, Esperanto: 0.40%, French: 0.40%, Dutch: 0.30%, Latin: 0.20%, Tagalog: 0.20%, Basque: 0.10%, Finnish: 0.10%, Icelandic: 0.10%, Italian: 0.10%, Portuguese: 0.10%, Sotho: 0.10%, Swahili: 0.10%, Tswana: 0.10%, Xhosa: 0.10%

>> Detection of 1000 sentences (average length: 98 chars)
Accuracy: 75.40%
Erroneously classified as Nynorsk: 22.10%, Danish: 2.20%, Dutch: 0.10%, English: 0.10%, Swedish: 0.10%
Accuracy: 76.60%
Erroneously classified as Nynorsk: 21.00%, Danish: 2.00%, Swedish: 0.20%, Dutch: 0.10%, English: 0.10%

6 changes: 3 additions & 3 deletions cmd/accuracy-reports/lingua-high-accuracy/Bosnian.txt
@@ -1,6 +1,6 @@
##### Bosnian #####

>>> Accuracy on average: 34.57%
>>> Accuracy on average: 34.87%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 29.00%
@@ -11,6 +11,6 @@ Accuracy: 34.70%
Erroneously classified as Croatian: 53.70%, Slovene: 4.60%, Yoruba: 0.90%, English: 0.70%, Esperanto: 0.70%, Albanian: 0.40%, Latin: 0.40%, German: 0.30%, Afrikaans: 0.20%, Basque: 0.20%, Czech: 0.20%, Indonesian: 0.20%, Lithuanian: 0.20%, Malay: 0.20%, Nynorsk: 0.20%, Polish: 0.20%, Shona: 0.20%, Swahili: 0.20%, Swedish: 0.20%, Tagalog: 0.20%, Bokmal: 0.10%, Danish: 0.10%, Estonian: 0.10%, Ganda: 0.10%, Italian: 0.10%, Romanian: 0.10%, Slovak: 0.10%, Sotho: 0.10%, Tsonga: 0.10%, Tswana: 0.10%, Turkish: 0.10%, Xhosa: 0.10%

>> Detection of 1000 sentences (average length: 105 chars)
Accuracy: 40.00%
Erroneously classified as Croatian: 58.50%, Slovene: 0.40%, Welsh: 0.20%, Czech: 0.10%, Esperanto: 0.10%, French: 0.10%, Latin: 0.10%, Malay: 0.10%, Shona: 0.10%, Slovak: 0.10%, Swahili: 0.10%, Zulu: 0.10%
Accuracy: 40.90%
Erroneously classified as Croatian: 57.90%, Slovene: 0.30%, Czech: 0.10%, Esperanto: 0.10%, French: 0.10%, Latin: 0.10%, Malay: 0.10%, Shona: 0.10%, Slovak: 0.10%, Welsh: 0.10%, Zulu: 0.10%

2 changes: 1 addition & 1 deletion cmd/accuracy-reports/lingua-high-accuracy/Bulgarian.txt
@@ -12,5 +12,5 @@ Erroneously classified as Macedonian: 4.00%, Russian: 3.50%, Serbian: 1.00%, Ukr

>> Detection of 1000 sentences (average length: 89 chars)
Accuracy: 98.80%
Erroneously classified as Russian: 0.50%, Macedonian: 0.30%, Serbian: 0.20%, English: 0.10%, Ukrainian: 0.10%
Erroneously classified as Russian: 0.50%, Macedonian: 0.20%, Serbian: 0.20%, Afrikaans: 0.10%, English: 0.10%, Ukrainian: 0.10%

6 changes: 3 additions & 3 deletions cmd/accuracy-reports/lingua-high-accuracy/Catalan.txt
@@ -1,6 +1,6 @@
##### Catalan #####

>>> Accuracy on average: 70.17%
>>> Accuracy on average: 70.37%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 50.60%
@@ -11,6 +11,6 @@ Accuracy: 73.90%
Erroneously classified as Spanish: 8.90%, Portuguese: 3.70%, French: 2.90%, Yoruba: 2.20%, Italian: 1.90%, Latin: 1.90%, English: 1.60%, Romanian: 0.40%, Swahili: 0.30%, Basque: 0.20%, Dutch: 0.20%, Esperanto: 0.20%, Irish: 0.20%, Slovak: 0.20%, Tagalog: 0.20%, Welsh: 0.20%, Afrikaans: 0.10%, Albanian: 0.10%, Finnish: 0.10%, German: 0.10%, Hungarian: 0.10%, Lithuanian: 0.10%, Nynorsk: 0.10%, Tsonga: 0.10%, Vietnamese: 0.10%

>> Detection of 1000 sentences (average length: 103 chars)
Accuracy: 86.00%
Erroneously classified as Spanish: 6.60%, English: 1.80%, Latin: 1.40%, French: 1.00%, Yoruba: 0.60%, Portuguese: 0.40%, Basque: 0.30%, Italian: 0.30%, Romanian: 0.20%, Swahili: 0.20%, Tagalog: 0.20%, Vietnamese: 0.20%, Danish: 0.10%, Esperanto: 0.10%, Finnish: 0.10%, German: 0.10%, Malay: 0.10%, Slovene: 0.10%, Tswana: 0.10%, Xhosa: 0.10%
Accuracy: 86.60%
Erroneously classified as Spanish: 6.60%, English: 1.80%, French: 0.90%, Latin: 0.80%, Yoruba: 0.60%, Portuguese: 0.50%, Italian: 0.40%, Basque: 0.30%, Finnish: 0.20%, German: 0.20%, Swahili: 0.20%, Tagalog: 0.20%, Vietnamese: 0.20%, Danish: 0.10%, Malay: 0.10%, Romanian: 0.10%, Slovene: 0.10%, Welsh: 0.10%

6 changes: 3 additions & 3 deletions cmd/accuracy-reports/lingua-high-accuracy/Croatian.txt
@@ -1,6 +1,6 @@
##### Croatian #####

>>> Accuracy on average: 72.40%
>>> Accuracy on average: 72.70%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 53.40%
@@ -11,6 +11,6 @@ Accuracy: 74.30%
Erroneously classified as Bosnian: 19.00%, Slovene: 3.50%, Slovak: 0.70%, English: 0.50%, Basque: 0.20%, Latin: 0.20%, Lithuanian: 0.20%, Polish: 0.20%, Swahili: 0.20%, Turkish: 0.20%, Afrikaans: 0.10%, Albanian: 0.10%, Czech: 0.10%, Esperanto: 0.10%, Italian: 0.10%, Nynorsk: 0.10%, Portuguese: 0.10%, Romanian: 0.10%

>> Detection of 1000 sentences (average length: 127 chars)
Accuracy: 89.50%
Erroneously classified as Bosnian: 10.30%, Latin: 0.10%, Shona: 0.10%
Accuracy: 90.40%
Erroneously classified as Bosnian: 9.50%, Shona: 0.10%

6 changes: 3 additions & 3 deletions cmd/accuracy-reports/lingua-high-accuracy/Czech.txt
@@ -1,6 +1,6 @@
##### Czech #####

>>> Accuracy on average: 80.23%
>>> Accuracy on average: 80.37%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 65.50%
@@ -11,6 +11,6 @@ Accuracy: 84.50%
Erroneously classified as Slovak: 8.30%, Polish: 1.10%, Bosnian: 1.00%, Slovene: 0.60%, Latin: 0.50%, Hungarian: 0.40%, Croatian: 0.30%, Esperanto: 0.30%, Malay: 0.30%, Basque: 0.20%, Dutch: 0.20%, English: 0.20%, Estonian: 0.20%, Lithuanian: 0.20%, Romanian: 0.20%, Tsonga: 0.20%, Danish: 0.10%, Finnish: 0.10%, Indonesian: 0.10%, Nynorsk: 0.10%, Portuguese: 0.10%, Shona: 0.10%, Sotho: 0.10%, Spanish: 0.10%, Swedish: 0.10%, Tswana: 0.10%, Turkish: 0.10%, Xhosa: 0.10%, Yoruba: 0.10%

>> Detection of 1000 sentences (average length: 93 chars)
Accuracy: 90.70%
Erroneously classified as Slovak: 4.10%, Bosnian: 0.80%, Latin: 0.60%, English: 0.50%, Croatian: 0.30%, Romanian: 0.30%, Slovene: 0.30%, Catalan: 0.20%, Danish: 0.20%, German: 0.20%, Polish: 0.20%, Sotho: 0.20%, Swahili: 0.20%, Swedish: 0.20%, Yoruba: 0.20%, Afrikaans: 0.10%, Finnish: 0.10%, French: 0.10%, Hungarian: 0.10%, Nynorsk: 0.10%, Tsonga: 0.10%, Tswana: 0.10%, Welsh: 0.10%
Accuracy: 91.10%
Erroneously classified as Slovak: 4.00%, Bosnian: 0.70%, Latin: 0.70%, English: 0.40%, Croatian: 0.30%, Slovene: 0.30%, Swahili: 0.30%, Catalan: 0.20%, Danish: 0.20%, Finnish: 0.20%, German: 0.20%, Polish: 0.20%, Romanian: 0.20%, Swedish: 0.20%, Tswana: 0.20%, French: 0.10%, Hungarian: 0.10%, Nynorsk: 0.10%, Sotho: 0.10%, Welsh: 0.10%, Yoruba: 0.10%

6 changes: 3 additions & 3 deletions cmd/accuracy-reports/lingua-high-accuracy/Danish.txt
@@ -1,6 +1,6 @@
##### Danish #####

>>> Accuracy on average: 80.97%
>>> Accuracy on average: 81.00%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 61.20%
@@ -11,6 +11,6 @@ Accuracy: 83.90%
Erroneously classified as Bokmal: 6.70%, Nynorsk: 2.50%, Swedish: 1.00%, German: 0.90%, Latin: 0.90%, English: 0.60%, Afrikaans: 0.50%, Dutch: 0.40%, Esperanto: 0.40%, French: 0.40%, Basque: 0.30%, Italian: 0.30%, Shona: 0.20%, Finnish: 0.10%, Indonesian: 0.10%, Malay: 0.10%, Romanian: 0.10%, Slovak: 0.10%, Swahili: 0.10%, Tagalog: 0.10%, Tswana: 0.10%, Welsh: 0.10%, Zulu: 0.10%

>> Detection of 1000 sentences (average length: 112 chars)
Accuracy: 97.80%
Erroneously classified as Nynorsk: 1.20%, Bokmal: 0.90%, Malay: 0.10%
Accuracy: 97.90%
Erroneously classified as Nynorsk: 1.10%, Bokmal: 1.00%

6 changes: 3 additions & 3 deletions cmd/accuracy-reports/lingua-high-accuracy/Dutch.txt
@@ -1,6 +1,6 @@
##### Dutch #####

>>> Accuracy on average: 77.33%
>>> Accuracy on average: 77.40%

>> Detection of 1000 single words (average length: 9 chars)
Accuracy: 55.10%
@@ -11,6 +11,6 @@ Accuracy: 80.70%
Erroneously classified as Afrikaans: 5.80%, Latin: 2.00%, German: 1.90%, English: 1.60%, French: 1.10%, Bokmal: 1.00%, Welsh: 0.70%, Romanian: 0.50%, Shona: 0.50%, Tagalog: 0.50%, Italian: 0.40%, Spanish: 0.40%, Danish: 0.30%, Swedish: 0.30%, Albanian: 0.20%, Esperanto: 0.20%, Nynorsk: 0.20%, Portuguese: 0.20%, Swahili: 0.20%, Tsonga: 0.20%, Tswana: 0.20%, Yoruba: 0.20%, Catalan: 0.10%, Croatian: 0.10%, Estonian: 0.10%, Ganda: 0.10%, Hungarian: 0.10%, Icelandic: 0.10%, Zulu: 0.10%

>> Detection of 1000 sentences (average length: 107 chars)
Accuracy: 96.20%
Erroneously classified as Afrikaans: 1.60%, Latin: 1.10%, Welsh: 0.40%, German: 0.20%, Italian: 0.20%, Bokmal: 0.10%, English: 0.10%, French: 0.10%
Accuracy: 96.40%
Erroneously classified as Afrikaans: 1.50%, Latin: 1.10%, German: 0.30%, Italian: 0.20%, Welsh: 0.20%, Bokmal: 0.10%, English: 0.10%, French: 0.10%

6 changes: 3 additions & 3 deletions cmd/accuracy-reports/lingua-high-accuracy/English.txt
@@ -1,6 +1,6 @@
##### English #####

>>> Accuracy on average: 80.80%
>>> Accuracy on average: 80.87%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 54.70%
@@ -11,6 +11,6 @@ Accuracy: 88.60%
Erroneously classified as Latin: 2.90%, French: 2.30%, Catalan: 0.70%, Dutch: 0.40%, German: 0.40%, Tsonga: 0.40%, Welsh: 0.40%, Danish: 0.30%, Ganda: 0.30%, Italian: 0.30%, Romanian: 0.30%, Bokmal: 0.20%, Esperanto: 0.20%, Nynorsk: 0.20%, Portuguese: 0.20%, Shona: 0.20%, Swedish: 0.20%, Tagalog: 0.20%, Yoruba: 0.20%, Zulu: 0.20%, Afrikaans: 0.10%, Estonian: 0.10%, Indonesian: 0.10%, Irish: 0.10%, Maori: 0.10%, Somali: 0.10%, Sotho: 0.10%, Spanish: 0.10%, Xhosa: 0.10%

>> Detection of 1000 sentences (average length: 108 chars)
Accuracy: 99.10%
Erroneously classified as Dutch: 0.40%, Latin: 0.20%, Nynorsk: 0.20%, Basque: 0.10%
Accuracy: 99.30%
Erroneously classified as Dutch: 0.40%, Basque: 0.10%, Latin: 0.10%, Nynorsk: 0.10%
