diff --git a/CHANGELOG.md b/CHANGELOG.md index 49c1a8845..c13c41d55 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -26,7 +26,7 @@ - Utils: Add spaCy's Korean sentence recognizer, word tokenizer, part-of-speech tagger, lemmatizer, and dependency parser - Utils: Add spaCy's Malay word tokenizer - Utils: Add spaCy's Slovenian sentence recognizer, part-of-speech tagger, lemmatizer, and dependency parser -- Work Area: Add Profiler - Readability - Bormuth's Cloze Mean / Bormuth's Grade Placement / Coleman's Readability Formula +- Work Area: Add Profiler - Readability - Bormuth's Cloze Mean / Bormuth's Grade Placement / Coleman's Readability Formula / Dale-Chall Readability Formula (New) ### ✨ Improvements - Utils: Update Wordless's sentence and sentence segment splitters diff --git a/doc/doc_eng.md b/doc/doc_eng.md index 03a17bbae..0d4578b47 100644 --- a/doc/doc_eng.md +++ b/doc/doc_eng.md @@ -320,8 +320,6 @@ All statistics are grouped into 5 tables for better readability: Readability, Co - **3.1.5.6 Count of n-character-long Tokens %**
The percentage of the number of n-character-long tokens in each file out of the total number of n-character-long tokens in all files, where n = 1, 2, 3, etc. -![Profiler - Table](/doc/work_area/profiler_table.png) - ### [3.2 Concordancer](#doc) In *Concordancer*, you can search for tokens in different files and generate concordance lines. You can adjust settings for data generation via **Generation Settings**. @@ -369,9 +367,6 @@ You can generate concordance plots for all search terms. You can modify the sett - **3.2.13 File**
The name of the file where the **Node** is found. -![Concordancer - Table](/doc/work_area/concordancer_table.png) -![Concordancer - Figure](/doc/work_area/concordancer_fig.png) - ### [3.3 Parallel Concordancer](#doc) **Notes:** @@ -391,8 +386,6 @@ After the parallel concordance lines are generated and displayed in the table, y - **3.3.3 Parallel Units**
The parallel unit (paragraph) where the search term is found in each file. -![Parallel Concordancer - Table](/doc/work_area/concordancer_parallel_table.png) - ### [3.4 Dependency Parser](#doc) **Note:** Added in *Wordless* 3.0.0 @@ -465,10 +458,6 @@ You can generate line charts or word clouds for wordlists using any statistics. - **3.5.8 Number of Files Found %**
The percentage of the number of files in which the token appears at least once out of the total number of files that are currently selected. -![Wordlist Generator - Table](/doc/work_area/wordlist_generator_table.png) -![Wordlist Generator - Figure - Line Chart](/doc/work_area/wordlist_generator_fig_line_chart.png) -![Wordlist Generator - Figure - Word Cloud](/doc/work_area/wordlist_generator_fig_word_cloud.png) - ### [3.6 N-gram Generator](#doc) **Note:** Renamed from **N-gram** to **N-gram Generator** in *Wordless* 2.2.0 @@ -500,8 +489,6 @@ You can further filter the results as you see fit by clicking **Filter Results** - **3.6.7 Number of Files Found %**
The percentage of the number of files in which the n-gram appears at least once out of the total number of files that are currently selected. -![N-gram Generator - Table](/doc/work_area/ngram_generator_table.png) - ### [3.7 Collocation Extractor](#doc) **Note:** Renamed from **Collocation** to **Collocation Extractor** in *Wordless* 2.2.0 @@ -547,9 +534,6 @@ You can further filter the results as you see fit by clicking **Filter Results** - **3.7.11 Number of Files Found %**
The percentage of the number of files in which the node and the collocating token co-occur at least once out of the total number of files that are currently selected. -![Collocation Extractor - Table](/doc/work_area/collocation_extractor_table.png) -![Collocation Extractor - Figure - Network Graph](/doc/work_area/collocation_extractor_fig_network_graph.png) - ### [3.8 Colligation Extractor](#doc) **Note:** Renamed from **Colligation** to **Colligation Extractor** in *Wordless* 2.2.0 @@ -597,8 +581,6 @@ You can further filter the results as you see fit by clicking **Filter Results** - **3.8.11 Number of Files Found %**
The percentage of the number of files in which the node and the collocating part of speech co-occur at least once out of the total number of files that are currently selected. -![Colligation Extractor - Table](/doc/work_area/colligation_extractor_table.png) - ### [3.9 Keyword Extractor](#doc) **Note:** This module was originally named **Keyword** before *Wordless* 2.2 @@ -641,8 +623,6 @@ You can further filter the results as you see fit by clicking **Filter Results** - **3.9.10 Number of Files Found %**
The percentage of the number of files in which the keyword appears at least once out of the total number of files that are currently selected. -![Keyword Extractor - Table](/doc/work_area/keyword_extractor_table.png) - ## [4 Appendixes](#doc) @@ -905,8 +885,8 @@ It should be noted that some readability measures are **language-specific**, or These variables are used in the following formulas:
**NumSentences**: Number of sentences in the text or sample
**NumWords**: Number of words in the text or sample
-**NumWordsDale₇₆₉**: Number of words outside the Dale list of 769 easy words ([Dale, 1931](#ref-dale-1931))
-**NumWordsDale₃₀₀₀**: Number of words outside the Dale list of 3000 easy words ([Dale & Chall, 1948b](#ref-dale-chall-1948b))
+**NumWordsDale₇₆₉**: Number of words outside the Dale list of 769 easy words ([Dale, 1931](#ref-dale-1931))
+**NumWordsDale₃₀₀₀**: Number of words outside the Dale list of 3000 easy words ([Dale & Chall, 1948b](#ref-dale-chall-1948b))
**NumWords1Syl**: Number of monosyllabic words
**NumWords3PlusSyls**: Number of words with 3 or more syllables
**NumSyls**: Number of syllables in the text or sample
@@ -943,8 +923,11 @@ Coleman's Readability Formula: Cloze \; %_4 &= 1.04 \times \left(\frac{NumWords1Syl}{NumWords} \times 100\right) + 1.06 \times \left(\frac{NumSentences}{NumWords} \times 100\right) + 0.56 \times \left(\frac{NumProns}{NumWords} \times 100\right) - 0.36 \times \left(\frac{NumPreps}{NumWords} \times 100\right) - 26.01 \end{align*} -Dale-Chall Readability Score: - X_{c50} = 0.1579 \times \frac{NumWordsDale_{3000}}{NumWords} + 0.0496 \times \frac{NumWords}{NumSentences} + 3.6365 +Dale-Chall Readability Formula: + {X_{c50} = 0.1579 \times \left(\frac{NumWordsDale_{3000}}{NumWords} \times 100\right) + 0.0496 \times \frac{NumWords}{NumSentences} + 3.6365} + +Dale-Chall Readability Formula (New): + X_{c50} = 64 - 0.95 \times \left(\frac{NumWordsDale_{3000}}{NumWords} \times 100\right) - 0.69 \times \frac{NumWords}{NumSentences} Devereux Readability Index: Grade \; Placement = 1.56 \times \frac{NumCharsAll}{NumWords} + 0.19 \times \frac{NumWords}{NumSentences} - 6.49 @@ -1026,7 +1009,8 @@ Measure of Readability|Formula Bormuth's Cloze Mean & Grade Placement
([Bormuth, 1969](#ref-bormuth-1969))|![Formula](/doc/measures/readability/bormuths_cloze_mean_gp.svg)
where **C** is the cloze criterion score, whose value could be changed via **Menu → Preferences → Settings → Measures → Readability → Bormuth's Grade Placement - Cloze criterion score**

* This measure applies only to **English texts**. Coleman-Liau Index
([Coleman & Liau, 1975](#ref-coleman-liau-1975))|![Formula](/doc/measures/readability/coleman_liau_index.svg) Coleman's Readability Formula¹
([Coleman et al., 1976](#ref-coleman-et-al-1976))|![Formula](/doc/measures/readability/colemans_readability_formula.svg)
where **NumProns** is the number of pronouns and **NumPreps** is the number of prepositions

* This measure applies only to **English texts**.
* This measure has 4 variants, which you could select via **Menu → Preferences → Settings → Measures → Readability → Coleman's Readability Formula → Variant**. -Dale-Chall Readability Score
([Dale & Chall, 1948a](#ref-dale-chall-1948a))|![Formula](/doc/measures/readability/x_c50.svg)

* This measure applies only to **English texts**. +Dale-Chall Readability Formula
([Dale & Chall, 1948a](#ref-dale-chall-1948a); [Dale & Chall, 1948b](#ref-dale-chall-1948b))|![Formula](/doc/measures/readability/x_c50.svg)

* This measure applies only to **English texts**. +Dale-Chall Readability Formula (New)
([Chall & Dale, 1995](#ref-chall-dale-1995))|![Formula](/doc/measures/readability/x_c50_new.svg)

* This measure applies only to **English texts**. Devereux Readability Index
([Smith, 1961](#ref-smith-1961))|![Formula](/doc/measures/readability/devereux_readability_index.svg) Flesch-Kincaid Grade Level¹
([Kincaid et al., 1975](#ref-kincaid-et-al-1975))|![Formula](/doc/measures/readability/flesch_kincaid_grade_level.svg) Flesch Reading Ease¹
([Flesch, 1948](#ref-flesch-1948)
Dutch: [Douma, 1960](#ref-douma-1960); [Brouwer, 1963](#ref-brouwer-1963)
French: [Kandel & Moles, 1958](#ref-kandel-moles-1958)
German: [Amstad, 1978](#ref-amstad-1978)
Italian: [Franchina & Vacca, 1986](#ref-franchina-vacca-1986)
Russian: [Oborneva, 2006](#ref-oborneva-2006)
Spanish: [Fernández Huerta, 1959](#ref-fernandez-huerta-1959); [Szigriszt Pazos, 1993](#ref-szigrisze-pazos-1993))|![Formula](/doc/measures/readability/re.svg)

* This measure has multiple variants for some languages, which you could select via **Menu → Preferences → Settings → Measures → Readability → Flesch Reading Ease**. @@ -1393,129 +1377,131 @@ Cubic Association Ratio
([Daille, 1994](#ref-daille-1994), [1995](#ref-daille [9] [**^**](#ref-carrolls-d2)[**^**](#ref-carrolls-um) Carroll, J. B. (1970). An alternative to Juilland’s usage coefficient for lexical frequencies and a proposal for a standard frequency index. *Computer Studies in the Humanities and Verbal Behaviour*, *3*(2), 61–65. https://doi.org/10.1002/j.2333-8504.1970.tb00778.x
[10] [**^**](#ref-rgl) Caylor, J. S., Sticht, T. G., Fox, L. C., & Ford, J. P. (1973). *Methodologies for determining reading requirements of military occupational specialties*. Human Resource Research Organization. https://files.eric.ed.gov/fulltext/ED074343.pdf
+ +[11] [**^**](#ref-dale-chall-readability-formula-new) Chall, J. S., & Dale, E. (1995). *Readability revisited: The new Dale-Chall readability formula*. Brookline Books.
-[11] [**^**](#ref-squared-phi-coeff) Church, K. W., & Gale, W. A. (1991, September 29–October 1). Concordances for parallel text [Paper presentation]. Using Corpora: Seventh Annual Conference of the UW Centre for the New OED and Text Research, St. Catherine's College, Oxford, United Kingdom.
+[12] [**^**](#ref-squared-phi-coeff) Church, K. W., & Gale, W. A. (1991, September 29–October 1). Concordances for parallel text [Paper presentation]. Using Corpora: Seventh Annual Conference of the UW Centre for the New OED and Text Research, St. Catherine's College, Oxford, United Kingdom.
-[12] [**^**](#ref-students-t-test-1-sample) Church, K., Gale, W., Hanks, P., & Hindle, D. (1991). Using statistics in lexical analysis. In U. Zernik (Ed.), *Lexical acquisition: Exploiting on-line resources to build a lexicon* (pp. 115–164). Psychology Press.
+[13] [**^**](#ref-students-t-test-1-sample) Church, K., Gale, W., Hanks, P., & Hindle, D. (1991). Using statistics in lexical analysis. In U. Zernik (Ed.), *Lexical acquisition: Exploiting on-line resources to build a lexicon* (pp. 115–164). Psychology Press.
-[13] [**^**](#ref-pmi) Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and lexicography. *Computational Linguistics*, *16*(1), 22–29.
+[14] [**^**](#ref-pmi) Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and lexicography. *Computational Linguistics*, *16*(1), 22–29.
-[14] [**^**](#ref-coleman-liau-index) Coleman, M., & Liau, T. L. (1975). A computer readability formula designed for machine scoring. *Journal of Applied Psychology*, *60*(2), 283–284. https://doi.org/10.1037/h0076540
+[15] [**^**](#ref-coleman-liau-index) Coleman, M., & Liau, T. L. (1975). A computer readability formula designed for machine scoring. *Journal of Applied Psychology*, *60*(2), 283–284. https://doi.org/10.1037/h0076540
-[15] [**^**](#ref-formula-de-crawford) Crawford, A. N. (1985). Fórmula y gráfico para determinar la comprensibilidad de textos de nivel primario en castellano. *Lectura y Vida*, *6*(4). http://www.lecturayvida.fahce.unlp.edu.ar/numeros/a6n4/06_04_Crawford.pdf
+[16] [**^**](#ref-formula-de-crawford) Crawford, A. N. (1985). Fórmula y gráfico para determinar la comprensibilidad de textos de nivel primario en castellano. *Lectura y Vida*, *6*(4). http://www.lecturayvida.fahce.unlp.edu.ar/numeros/a6n4/06_04_Crawford.pdf
-[16] [**^**](#ref-im3) Daille, B. (1994). *Approche mixte pour l'extraction automatique de terminologie: statistiques lexicales et filtres linguistiques* [Doctoral thesis, Paris Diderot University]. Béatrice Daille. http://www.bdaille.com/index.php?option=com_docman&task=doc_download&gid=8&Itemid=
+[17] [**^**](#ref-im3) Daille, B. (1994). *Approche mixte pour l'extraction automatique de terminologie: statistiques lexicales et filtres linguistiques* [Doctoral thesis, Paris Diderot University]. Béatrice Daille. http://www.bdaille.com/index.php?option=com_docman&task=doc_download&gid=8&Itemid=
-[17] [**^**](#ref-im3) Daille, B. (1995). Combined approach for terminology extraction: Lexical statistics and linguistic filtering. *UCREL technical papers* (Vol. 5). Lancaster University.
+[18] [**^**](#ref-im3) Daille, B. (1995). Combined approach for terminology extraction: Lexical statistics and linguistic filtering. *UCREL technical papers* (Vol. 5). Lancaster University.
-[18] [**^**](#ref-spache-grade-level) Dale, E. (1931). A comparison of two word lists. *Educational Research Bulletin*, *10*(18), 484–489.
+[19] [**^**](#ref-num-words-769) [**^**](#ref-spache-grade-level) Dale, E. (1931). A comparison of two word lists. *Educational Research Bulletin*, *10*(18), 484–489.
-[19] [**^**](#ref-dale-chall-readability-score) Dale, E., & Chall, J. S. (1948a). A formula for predicting readability. *Educational Research Bulletin*, *27*(1), 11–20, 28.
+[20] [**^**](#ref-dale-chall-readability-formula) Dale, E., & Chall, J. S. (1948a). A formula for predicting readability. *Educational Research Bulletin*, *27*(1), 11–20, 28.
-[20] [**^**](#ref-dale-chall-readability-score) Dale, E., & Chall, J. S. (1948b). A formula for predicting readability: Instructions. *Educational Research Bulletin*, *27*(2), 37–54.
+[21] [**^**](#ref-num-words-3000) [**^**](#ref-dale-chall-readability-formula) Dale, E., & Chall, J. S. (1948b). A formula for predicting readability: Instructions. *Educational Research Bulletin*, *27*(2), 37–54.
-[21] [**^**](#ref-z-score) Dennis, S. F. (1964). The construction of a thesaurus automatically from a sample of text. In M. E. Stevens, V. E. Giuliano, & L. B. Heilprin (Eds.), *Proceedings of the symposium on statistical association methods for mechanized documentation* (pp. 61–148). National Bureau of Standards.
+[22] [**^**](#ref-z-score) Dennis, S. F. (1964). The construction of a thesaurus automatically from a sample of text. In M. E. Stevens, V. E. Giuliano, & L. B. Heilprin (Eds.), *Proceedings of the symposium on statistical association methods for mechanized documentation* (pp. 61–148). National Bureau of Standards.
-[22] [**^**](#ref-me) Dias, G., Guilloré, S., & Pereira Lopes, J. G. (1999). Language independent automatic acquisition of rigid multiword units from unrestricted text corpora. In A. Condamines, C. Fabre, & M. Péry-Woodley (Eds.), *TALN'99: 6ème Conférence Annuelle Sur le Traitement Automatique des Langues Naturelles* (pp. 333–339). TALN.
+[23] [**^**](#ref-me) Dias, G., Guilloré, S., & Pereira Lopes, J. G. (1999). Language independent automatic acquisition of rigid multiword units from unrestricted text corpora. In A. Condamines, C. Fabre, & M. Péry-Woodley (Eds.), *TALN'99: 6ème Conférence Annuelle Sur le Traitement Automatique des Langues Naturelles* (pp. 333–339). TALN.
-[23] [**^**](#ref-re) Douma, W. H. (1960). *De leesbaarheid van landbouwbladen: Een onderzoek naar en een toepassing van leesbaarheidsformules* [Readability of Dutch farm papers: A discussion and application of readability-formulas]. Afdeling sociologie en sociografie van de Landbouwhogeschool Wageningen. https://edepot.wur.nl/276323 +[24] [**^**](#ref-re) Douma, W. H. (1960). *De leesbaarheid van landbouwbladen: Een onderzoek naar en een toepassing van leesbaarheidsformules* [Readability of Dutch farm papers: A discussion and application of readability-formulas]. Afdeling sociologie en sociografie van de Landbouwhogeschool Wageningen. https://edepot.wur.nl/276323 -[24] [**^**](#ref-log-likehood-ratio-test) Dunning, T. E. (1993). Accurate methods for the statistics of surprise and coincidence. *Computational Linguistics*, *19*(1), 61–74.
+[25] [**^**](#ref-log-likehood-ratio-test) Dunning, T. E. (1993). Accurate methods for the statistics of surprise and coincidence. *Computational Linguistics*, *19*(1), 61–74.
-[25] [**^**](#ref-jaccard-index)[**^**](#ref-mi) Dunning, T. E. (1998). *Finding structure in text, genome and other symbolic sequences* [Doctoral dissertation, University of Sheffield]. arXiv. arxiv.org/pdf/1207.1847.pdf
+[26] [**^**](#ref-jaccard-index)[**^**](#ref-mi) Dunning, T. E. (1998). *Finding structure in text, genome and other symbolic sequences* [Doctoral dissertation, University of Sheffield]. arXiv. arxiv.org/pdf/1207.1847.pdf
-[26] [**^**](#ref-osman) El-Haj, M., & Rayson, P. (2016). OSMAN: A novel Arabic readability metric. In N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), *Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)* (pp. 250–255). European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2016/index.html
+[27] [**^**](#ref-osman) El-Haj, M., & Rayson, P. (2016). OSMAN: A novel Arabic readability metric. In N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), *Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)* (pp. 250–255). European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2016/index.html
-[27] [**^**](#ref-engwalls-fm) Engwall, G. (1974). *Fréquence et distribution du vocabulaire dans un choix de romans français* [Unpublished doctoral dissertation]. Stockholm University.
+[28] [**^**](#ref-engwalls-fm) Engwall, G. (1974). *Fréquence et distribution du vocabulaire dans un choix de romans français* [Unpublished doctoral dissertation]. Stockholm University.
-[28] [**^**](#ref-re-simplified) Farr, J. N., Jenkins, J. J., & Paterson, D. G. (1951). Simplification of Flesch reading ease formula. *Journal of Applied Psychology*, *35*(5), 333–337. https://doi.org/10.1037/h0062427
+[29] [**^**](#ref-re-simplified) Farr, J. N., Jenkins, J. J., & Paterson, D. G. (1951). Simplification of Flesch reading ease formula. *Journal of Applied Psychology*, *35*(5), 333–337. https://doi.org/10.1037/h0062427
-[29] [**^**](#ref-re) Fernández Huerta, J. (1959). Medidas sencillas de lecturabilidad. *Consigna*, *214*, 29–32.
+[30] [**^**](#ref-re) Fernández Huerta, J. (1959). Medidas sencillas de lecturabilidad. *Consigna*, *214*, 29–32.
-[30] [**^**](#ref-re) Flesch, R. (1948). A new readability yardstick. *Journal of Applied Psychology*, *32*(3), 221–233. https://doi.org/10.1037/h0057532
+[31] [**^**](#ref-re) Flesch, R. (1948). A new readability yardstick. *Journal of Applied Psychology*, *32*(3), 221–233. https://doi.org/10.1037/h0057532
-[31] [**^**](#ref-re) Franchina, V., & Vacca, R. (1986). Adaptation of Flesh readability index on a bilingual text written by the same author both in Italian and English languages. *Linguaggi*, *3*, 47–49.
+[32] [**^**](#ref-re) Franchina, V., & Vacca, R. (1986). Adaptation of Flesh readability index on a bilingual text written by the same author both in Italian and English languages. *Linguaggi*, *3*, 47–49.
-[32] [**^**](#ref-diff-coeff) Gabrielatos, C. (2018). Keyness analysis: Nature, metrics and techniques. In C. Taylor & A. Marchi (Eds.), *Corpus approaches to discourse: A critical review* (pp. 225–258). Routledge.
+[33] [**^**](#ref-diff-coeff) Gabrielatos, C. (2018). Keyness analysis: Nature, metrics and techniques. In C. Taylor & A. Marchi (Eds.), *Corpus approaches to discourse: A critical review* (pp. 225–258). Routledge.
-[33] [**^**](#ref-pct-diff) Gabrielatos, C., & Marchi, A. (2012, September 13–14). *Keyness: Appropriate metrics and practical issues* [Conference session]. CADS International Conference 2012, University of Bologna, Italy.
+[34] [**^**](#ref-pct-diff) Gabrielatos, C., & Marchi, A. (2012, September 13–14). *Keyness: Appropriate metrics and practical issues* [Conference session]. CADS International Conference 2012, University of Bologna, Italy.
-[34] [**^**](#ref-griess-dp) Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. *International Journal of Corpus Linguistics*, *13*(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri
+[35] [**^**](#ref-griess-dp) Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. *International Journal of Corpus Linguistics*, *13*(4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri
-[35] [**^**](#ref-fog-index) Gunning, R. (1968). *The technique of clear writing* (revised ed.). McGraw-Hill Book Company.
+[36] [**^**](#ref-fog-index) Gunning, R. (1968). *The technique of clear writing* (revised ed.). McGraw-Hill Book Company.
-[36] [**^**](#ref-formula-de-comprensibilidad-de-gutierrez-de-polini) Gutiérrez de Polini, L. E. (1972). *Investigación sobre lectura en Venezuela* [Paper presentation]. Primeras Jornadas de Educación Primaria, Ministerio de Educación, Caracas, Venezuela.
+[37] [**^**](#ref-formula-de-comprensibilidad-de-gutierrez-de-polini) Gutiérrez de Polini, L. E. (1972). *Investigación sobre lectura en Venezuela* [Paper presentation]. Primeras Jornadas de Educación Primaria, Ministerio de Educación, Caracas, Venezuela.
-[37] [**^**](#ref-log-ratio) Hardie, A. (2014, April 28). *Log ratio: An informal introduction*. ESRC Centre for Corpus Approaches to Social Science (CASS). http://cass.lancs.ac.uk/log-ratio-an-informal-introduction/.
+[38] [**^**](#ref-log-ratio) Hardie, A. (2014, April 28). *Log ratio: An informal introduction*. ESRC Centre for Corpus Approaches to Social Science (CASS). http://cass.lancs.ac.uk/log-ratio-an-informal-introduction/.
-[38] [**^**](#ref-pearsons-chi-squared-test)[**^**](#ref-diff-coeff) Hofland, K., & Johanson, S. (1982). *Word frequencies in British and American English*. Norwegian Computing Centre for the Humanities.
+[39] [**^**](#ref-pearsons-chi-squared-test)[**^**](#ref-diff-coeff) Hofland, K., & Johanson, S. (1982). *Word frequencies in British and American English*. Norwegian Computing Centre for the Humanities.
-[39] [**^**](#ref-juillands-d)[**^**](#ref-juillands-u) Juilland, A., & Chang-Rodriguez, E. (1964). *Frequency dictionary of Spanish words*. Mouton.
+[40] [**^**](#ref-juillands-d)[**^**](#ref-juillands-u) Juilland, A., & Chang-Rodriguez, E. (1964). *Frequency dictionary of Spanish words*. Mouton.
-[40] [**^**](#ref-re) Kandel, L., & Moles A. (1958). Application de l’indice de flesch la langue francaise [applying flesch index to french language]. *The Journal of Educational Research*, *21*, 283–287.
+[41] [**^**](#ref-re) Kandel, L., & Moles A. (1958). Application de l’indice de flesch la langue francaise [applying flesch index to french language]. *The Journal of Educational Research*, *21*, 283–287.
-[41] [**^**](#ref-mann-whiteney-u-test) Kilgarriff, A. (2001). Comparing corpora. *International Journal of Corpus Linguistics*, *6*(1), 232–263. https://doi.org/10.1075/ijcl.6.1.05kil
+[42] [**^**](#ref-mann-whiteney-u-test) Kilgarriff, A. (2001). Comparing corpora. *International Journal of Corpus Linguistics*, *6*(1), 232–263. https://doi.org/10.1075/ijcl.6.1.05kil
-[42] [**^**](#ref-kilgarriffs-ratio) Kilgarriff, A. (2009). Simple maths for keywords. In M. Mahlberg, V. González-Díaz, & C. Smith (Eds.), *Proceedings of the Corpus Linguistics Conference 2009* (p. 171). University of Liverpool.
+[43] [**^**](#ref-kilgarriffs-ratio) Kilgarriff, A. (2009). Simple maths for keywords. In M. Mahlberg, V. González-Díaz, & C. Smith (Eds.), *Proceedings of the Corpus Linguistics Conference 2009* (p. 171). University of Liverpool.
-[43] [**^**](#ref-mi-log-f) Kilgarriff, A., & Tugwell, D. (2002). WASP-bench: An MT lexicographers' workstation supporting state-of-the-art lexical disambiguation. In *Proceedings of the 8th Machine Translation Summit* (pp. 187–190). European Association for Machine Translation.
+[44] [**^**](#ref-mi-log-f) Kilgarriff, A., & Tugwell, D. (2002). WASP-bench: An MT lexicographers' workstation supporting state-of-the-art lexical disambiguation. In *Proceedings of the 8th Machine Translation Summit* (pp. 187–190). European Association for Machine Translation.
-[44] [**^**](#ref-flesch-kincaid-grade-level) Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). *Derivation of new readability formulas (automated readability index, fog count, and Flesch reading ease formula) for navy enlisted personnel*. Naval Air Station Memphis.
+[45] [**^**](#ref-flesch-kincaid-grade-level) Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). *Derivation of new readability formulas (automated readability index, fog count, and Flesch reading ease formula) for navy enlisted personnel*. Naval Air Station Memphis.
-[45] [**^**](#ref-kromers-ur) Kromer, V. (2003). A usage measure based on psychophysical relations. *Journal of Quantitative Linguistics*, *10*(2), 177–186. https://doi.org/10.1076/jqul.10.2.177.16718
+[46] [**^**](#ref-kromers-ur) Kromer, V. (2003). A usage measure based on psychophysical relations. *Journal of Quantitative Linguistics*, *10*(2), 177–186. https://doi.org/10.1076/jqul.10.2.177.16718
-[46] [**^**](#ref-mi-log-f) Lexical Computing. (2015, July 8). *Statistics used in Sketch Engine*. Sketch Engine. https://www.sketchengine.eu/documentation/statistics-used-in-sketch-engine/
+[47] [**^**](#ref-mi-log-f) Lexical Computing. (2015, July 8). *Statistics used in Sketch Engine*. Sketch Engine. https://www.sketchengine.eu/documentation/statistics-used-in-sketch-engine/
-[47] [**^**](#ref-colemans-readability-formula) Liau, T. L., Bassin, C. B., Martin, C. J., & Coleman, E. B. (1976). Modification of the Coleman readability formulas. *Journal of Reading Behavior*, *8*(4), 381–386. https://journals.sagepub.com/doi/pdf/10.1080/10862967609547193
+[48] [**^**](#ref-colemans-readability-formula) Liau, T. L., Bassin, C. B., Martin, C. J., & Coleman, E. B. (1976). Modification of the Coleman readability formulas. *Journal of Reading Behavior*, *8*(4), 381–386. https://journals.sagepub.com/doi/pdf/10.1080/10862967609547193
-[48] [**^**](#ref-griess-dp-norm) Lijffijt, J., & Gries, S. T. (2012). Correction to Stefan Th. Gries’ “dispersions and adjusted frequencies in corpora”. *International Journal of Corpus Linguistics*, *17*(1), 147–149. https://doi.org/10.1075/ijcl.17.1.08lij
+[49] [**^**](#ref-griess-dp-norm) Lijffijt, J., & Gries, S. T. (2012). Correction to Stefan Th. Gries’ “dispersions and adjusted frequencies in corpora”. *International Journal of Corpus Linguistics*, *17*(1), 147–149. https://doi.org/10.1075/ijcl.17.1.08lij
-[49] [**^**](#ref-gulpease-index) Lucisano, P., & Emanuela Piemontese, M. (1988). GULPEASE: A formula for the prediction of the difficulty of texts in Italian. *Scuola e Città*, *39*(3), pp. 110–124.
+[50] [**^**](#ref-gulpease-index) Lucisano, P., & Emanuela Piemontese, M. (1988). GULPEASE: A formula for the prediction of the difficulty of texts in Italian. *Scuola e Città*, *39*(3), pp. 110–124.
-[50] [**^**](#ref-lynes-d3) Lyne, A. A. (1985). Dispersion. In *The vocabulary of French business correspondence: Word frequencies, collocations, and problems of lexicometric method* (pp. 101–124). Slatkine/Champion.
+[51] [**^**](#ref-lynes-d3) Lyne, A. A. (1985). Dispersion. In *The vocabulary of French business correspondence: Word frequencies, collocations, and problems of lexicometric method* (pp. 101–124). Slatkine/Champion.
-[51] [**^**](#ref-smog-grade) McLaughlin, G. H. (1969). SMOG grading: A new readability formula. *Journal of Reading*, *12*(8), pp. 639–646.
+[52] [**^**](#ref-smog-grade) McLaughlin, G. H. (1969). SMOG grading: A new readability formula. *Journal of Reading*, *12*(8), pp. 639–646.
-[52] [**^**](#ref-legibilidad-mu) Muñoz Baquedano, M. (2006). Legibilidad y variabilidad de los textos. *Boletín de Investigación Educacional, Pontificia Universidad Católica de Chile*, *21*(2), 13–26.
+[53] [**^**](#ref-legibilidad-mu) Muñoz Baquedano, M. (2006). Legibilidad y variabilidad de los textos. *Boletín de Investigación Educacional, Pontificia Universidad Católica de Chile*, *21*(2), 13–26.
-[53] [**^**](#ref-eflaw) Nirmaldasan. (2009, April 30). *McAlpine EFLAW readability score*. Readability Monitor. Retrieved November 15, 2022, from https://strainindex.wordpress.com/2009/04/30/mcalpine-eflaw-readability-score/
+[54] [**^**](#ref-eflaw) Nirmaldasan. (2009, April 30). *McAlpine EFLAW readability score*. Readability Monitor. Retrieved November 15, 2022, from https://strainindex.wordpress.com/2009/04/30/mcalpine-eflaw-readability-score/
-[54] [**^**](#ref-pearsons-chi-squared-test) Oakes, M. P. (1998). *Statistics for Corpus Linguistics*. Edinburgh University Press.
+[55] [**^**](#ref-pearsons-chi-squared-test) Oakes, M. P. (1998). *Statistics for Corpus Linguistics*. Edinburgh University Press.
-[55] [**^**](#ref-re) Oborneva, I. V. (2006). *Автоматизированная оценка сложности учебных текстов на основе статистических параметров* [Doctoral dissertation, Institute for Strategy of Education Development of the Russian Academy of Education]. Freereferats.ru. https://static.freereferats.ru/_avtoreferats/01002881899.pdf?ver=3
+[56] [**^**](#ref-re) Oborneva, I. V. (2006). *Автоматизированная оценка сложности учебных текстов на основе статистических параметров* [Doctoral dissertation, Institute for Strategy of Education Development of the Russian Academy of Education]. Freereferats.ru. https://static.freereferats.ru/_avtoreferats/01002881899.pdf?ver=3
-[56] [**^**](#ref-lensear-write) O’Hayre, J. (1966). *Gobbledygook has gotta go*. U.S. Government Printing Office. https://www.governmentattic.org/15docs/Gobbledygook_Has_Gotta_Go_1966.pdf
+[57] [**^**](#ref-lensear-write) O’Hayre, J. (1966). *Gobbledygook has gotta go*. U.S. Government Printing Office. https://www.governmentattic.org/15docs/Gobbledygook_Has_Gotta_Go_1966.pdf
-[57] [**^**](#ref-students-t-test-2-sample) Paquot, M., & Bestgen, Y. (2009). Distinctive words in academic writing: A comparison of three statistical tests for keyword extraction. *Language and Computers*, *68*, 247–269.
+[58] [**^**](#ref-students-t-test-2-sample) Paquot, M., & Bestgen, Y. (2009). Distinctive words in academic writing: A comparison of three statistical tests for keyword extraction. *Language and Computers*, *68*, 247–269.
-[58] [**^**](#ref-fishers-exact-test) Pedersen, T. (1996). Fishing for exactness. In T. Winn (Ed.), *Proceedings of the Sixth Annual South-Central Regional SAS Users' Group Conference* (pp. 188–200). The South–Central Regional SAS Users' Group.
+[59] [**^**](#ref-fishers-exact-test) Pedersen, T. (1996). Fishing for exactness. In T. Winn (Ed.), *Proceedings of the Sixth Annual South-Central Regional SAS Users' Group Conference* (pp. 188–200). The South–Central Regional SAS Users' Group.
-[59] [**^**](#ref-min-sensitivity) Pedersen, T. (1998). Dependent bigram identification. In *Proceedings of the Fifteenth National Conference on Artificial Intelligence* (p. 1197). AAAI Press.
+[60] [**^**](#ref-min-sensitivity) Pedersen, T. (1998). Dependent bigram identification. In *Proceedings of the Fifteenth National Conference on Artificial Intelligence* (p. 1197). AAAI Press.
-[60] [**^**](#ref-fog-index) Pisarek, W. (1969). Jak mierzyć zrozumiałość tekstu? *Zeszyty Prasoznawcze*, *4*(42), 35–48.
+[61] [**^**](#ref-fog-index) Pisarek, W. (1969). Jak mierzyć zrozumiałość tekstu? *Zeszyty Prasoznawcze*, *4*(42), 35–48.
-[61] [**^**](#ref-odds-ratio) Pojanapunya, P., & Todd, R. W. (2016). Log-likelihood and odds ratio keyness statistics for different purposes of keyword analysis. *Corpus Linguistics and Linguistic Theory*, *15*(1), 133–167. https://doi.org/10.1515/cllt-2015-0030
+[62] [**^**](#ref-odds-ratio) Pojanapunya, P., & Todd, R. W. (2016). Log-likelihood and odds ratio keyness statistics for different purposes of keyword analysis. *Corpus Linguistics and Linguistic Theory*, *15*(1), 133–167. https://doi.org/10.1515/cllt-2015-0030
-[62] [**^**](#ref-poisson-collocation-measure) Quasthoff, U., & Wolff, C. (2002). The Poisson collocation measure and its applications. *Proceedings of 2nd International Workshop on Computational Approaches to Collocations*. IEEE.
+[63] [**^**](#ref-poisson-collocation-measure) Quasthoff, U., & Wolff, C. (2002). The Poisson collocation measure and its applications. *Proceedings of 2nd International Workshop on Computational Approaches to Collocations*. IEEE.
-[63] [**^**](#ref-rosengrens-s)[**^**](#ref-rosengrens-kf) Rosengren, I. (1971). The quantitative concept of language and its relation to the structure of frequency dictionaries. *Études de linguistique appliquée*, *1*, 103–127.
+[64] [**^**](#ref-rosengrens-s)[**^**](#ref-rosengrens-kf) Rosengren, I. (1971). The quantitative concept of language and its relation to the structure of frequency dictionaries. *Études de linguistique appliquée*, *1*, 103–127.
-[64] [**^**](#ref-log-dice) Rychlý, P. (2008). A lexicographer-friendly association score. In P. Sojka & A. Horák (Eds.), *Proceedings of Second Workshop on Recent Advances in Slavonic Natural Languages Processing*. Masaryk University.
+[65] [**^**](#ref-log-dice) Rychlý, P. (2008). A lexicographer-friendly association score. In P. Sojka & A. Horák (Eds.), *Proceedings of Second Workshop on Recent Advances in Slavonic Natural Languages Processing*. Masaryk University.
-[65] [**^**](#ref-ald) [**^**](#ref-fald) [**^**](#ref-arf) [**^**](#ref-farf) [**^**](#ref-awt) [**^**](#ref-fawt) Savický, P., & Hlaváčová, J. (2002). Measures of word commonness. *Journal of Quantitative Linguistics*, *9*(3), 215–231. https://doi.org/10.1076/jqul.9.3.215.14124
+[66] [**^**](#ref-ald) [**^**](#ref-fald) [**^**](#ref-arf) [**^**](#ref-farf) [**^**](#ref-awt) [**^**](#ref-fawt) Savický, P., & Hlaváčová, J. (2002). Measures of word commonness. *Journal of Quantitative Linguistics*, *9*(3), 215–231. https://doi.org/10.1076/jqul.9.3.215.14124
-[66] [**^**](#ref-dices-coeff) Smadja, F., McKeown, K. R., & Hatzivassiloglou, V. (1996). Translating collocations for bilingual lexicons: A statistical approach. *Computational Linguistics*, *22*(1), 1–38.
+[67] [**^**](#ref-dices-coeff) Smadja, F., McKeown, K. R., & Hatzivassiloglou, V. (1996). Translating collocations for bilingual lexicons: A statistical approach. *Computational Linguistics*, *22*(1), 1–38.
-[67] [**^**](#ref-devereux-readability-index) Smith, E. A. (1961). Devereaux readability index. *Journal of Educational Research*, *54*(8), 298–303. https://doi.org/10.1080/00220671.1961.10882728
+[68] [**^**](#ref-devereux-readability-index) Smith, E. A. (1961). Devereaux readability index. *Journal of Educational Research*, *54*(8), 298–303. https://doi.org/10.1080/00220671.1961.10882728
-[68] [**^**](#ref-ari) Smith, E. A., & Senter, R. J. (1967). *Automated readability index*. Aerospace Medical Research Laboratories. https://apps.dtic.mil/sti/pdfs/AD0667273.pdf
+[69] [**^**](#ref-ari) Smith, E. A., & Senter, R. J. (1967). *Automated readability index*. Aerospace Medical Research Laboratories. https://apps.dtic.mil/sti/pdfs/AD0667273.pdf
-[69] [**^**](#ref-spache-grade-level) Spache, G. (1953). A new readability formula for primary-grade reading materials. *Elementary School Journal*, *53*(7), 410–413. https://doi.org/10.1086/458513
+[70] [**^**](#ref-spache-grade-level) Spache, G. (1953). A new readability formula for primary-grade reading materials. *Elementary School Journal*, *53*(7), 410–413. https://doi.org/10.1086/458513
-[70] [**^**](#ref-re) Szigriszt Pazos, F. (1993). *Sistemas predictivos de legibilidad del mensaje escrito: Formula de perspicuidad* [Doctoral dissertation, Complutense University of Madrid]. Biblos-e Archivo. https://repositorio.uam.es/bitstream/handle/10486/2488/3907_barrio_cantalejo_ines_maria.pdf?sequence=1&isAllowed=y
+[71] [**^**](#ref-re) Szigriszt Pazos, F. (1993). *Sistemas predictivos de legibilidad del mensaje escrito: Formula de perspicuidad* [Doctoral dissertation, Complutense University of Madrid]. Biblos-e Archivo. https://repositorio.uam.es/bitstream/handle/10486/2488/3907_barrio_cantalejo_ines_maria.pdf?sequence=1&isAllowed=y
-[71] [**^**](#ref-lfmd)[**^**](#ref-md) Thanopoulos, A., Fakotakis, N., & Kokkinakis, G. (2002). Comparative evaluation of collocation extraction metrics. In M. G. González & C. P. S. Araujo (Eds.), *Proceedings of the Third International Conference on Language Resources and Evaluation* (pp. 620–625). European Language Resources Association.
+[72] [**^**](#ref-lfmd)[**^**](#ref-md) Thanopoulos, A., Fakotakis, N., & Kokkinakis, G. (2002). Comparative evaluation of collocation extraction metrics. In M. G. González & C. P. S. Araujo (Eds.), *Proceedings of the Third International Conference on Language Resources and Evaluation* (pp. 620–625). European Language Resources Association.
-[72] [**^**](#ref-log-likehood-ratio-test-bayes-factor)[**^**](#ref-students-t-test-2-sample-bayes-factor) Wilson, A. (2013). Embracing Bayes Factors for key item analysis in corpus linguistics. In M. Bieswanger & A. Koll-Stobbe (Eds.), *New Approaches to the Study of Linguistic Variability* (pp. 3–11). Peter Lang.
+[73] [**^**](#ref-log-likehood-ratio-test-bayes-factor)[**^**](#ref-students-t-test-2-sample-bayes-factor) Wilson, A. (2013). Embracing Bayes Factors for key item analysis in corpus linguistics. In M. Bieswanger & A. Koll-Stobbe (Eds.), *New Approaches to the Study of Linguistic Variability* (pp. 3–11). Peter Lang.
-[73] [**^**](#ref-zhangs-distributional-consistency) Zhang, H., Huang, C., & Yu, S. (2004). Distributional consistency: As a general method for defining a core lexicon. In M. T. Lino, M. F. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), *Proceedings of Fourth International Conference on Language Resources and Evaluation* (pp. 1119–1122). European Language Resources Association.
+[74] [**^**](#ref-zhangs-distributional-consistency) Zhang, H., Huang, C., & Yu, S. (2004). Distributional consistency: As a general method for defining a core lexicon. In M. T. Lino, M. F. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), *Proceedings of Fourth International Conference on Language Resources and Evaluation* (pp. 1119–1122). European Language Resources Association.
diff --git a/doc/measures/readability/x_c50.svg b/doc/measures/readability/x_c50.svg index 7b578fd3c..8d1658e93 100644 --- a/doc/measures/readability/x_c50.svg +++ b/doc/measures/readability/x_c50.svg @@ -1,114 +1,122 @@ [SVG markup of the formula image not preserved] \ No newline at end of file diff --git a/doc/measures/readability/x_c50_new.svg b/doc/measures/readability/x_c50_new.svg new file mode 100644 index 000000000..a51e186a9 --- /dev/null +++ b/doc/measures/readability/x_c50_new.svg @@ -0,0 +1,112 @@ [SVG markup of the formula image not preserved] \ No newline at end of file diff --git a/doc/work_area/colligation_extractor_table.png b/doc/work_area/colligation_extractor_table.png deleted file mode 100644 index 30c14b2ff..000000000 Binary files a/doc/work_area/colligation_extractor_table.png and /dev/null differ diff --git a/doc/work_area/collocation_extractor_fig_network_graph.png b/doc/work_area/collocation_extractor_fig_network_graph.png deleted file mode 100644 index 3bf629131..000000000 Binary files a/doc/work_area/collocation_extractor_fig_network_graph.png and /dev/null differ diff --git a/doc/work_area/collocation_extractor_table.png b/doc/work_area/collocation_extractor_table.png deleted file mode 100644 index fb3f03b93..000000000 Binary files a/doc/work_area/collocation_extractor_table.png and /dev/null differ diff --git a/doc/work_area/concordancer_fig.png 
b/doc/work_area/concordancer_fig.png deleted file mode 100644 index a95e47e10..000000000 Binary files a/doc/work_area/concordancer_fig.png and /dev/null differ diff --git a/doc/work_area/concordancer_parallel_table.png b/doc/work_area/concordancer_parallel_table.png deleted file mode 100644 index 8c5a7ea6e..000000000 Binary files a/doc/work_area/concordancer_parallel_table.png and /dev/null differ diff --git a/doc/work_area/concordancer_table.png b/doc/work_area/concordancer_table.png deleted file mode 100644 index 6a0c7cbae..000000000 Binary files a/doc/work_area/concordancer_table.png and /dev/null differ diff --git a/doc/work_area/keyword_extractor_table.png b/doc/work_area/keyword_extractor_table.png deleted file mode 100644 index fa39e9750..000000000 Binary files a/doc/work_area/keyword_extractor_table.png and /dev/null differ diff --git a/doc/work_area/ngram_generator_table.png b/doc/work_area/ngram_generator_table.png deleted file mode 100644 index 6d0b2918a..000000000 Binary files a/doc/work_area/ngram_generator_table.png and /dev/null differ diff --git a/doc/work_area/profiler_table.png b/doc/work_area/profiler_table.png deleted file mode 100644 index 1d3f1a9f0..000000000 Binary files a/doc/work_area/profiler_table.png and /dev/null differ diff --git a/doc/work_area/wordlist_generator_fig_line_chart.png b/doc/work_area/wordlist_generator_fig_line_chart.png deleted file mode 100644 index 7a8391fc5..000000000 Binary files a/doc/work_area/wordlist_generator_fig_line_chart.png and /dev/null differ diff --git a/doc/work_area/wordlist_generator_fig_word_cloud.png b/doc/work_area/wordlist_generator_fig_word_cloud.png deleted file mode 100644 index 61a7bafeb..000000000 Binary files a/doc/work_area/wordlist_generator_fig_word_cloud.png and /dev/null differ diff --git a/doc/work_area/wordlist_generator_table.png b/doc/work_area/wordlist_generator_table.png deleted file mode 100644 index b5c9c0e5c..000000000 Binary files a/doc/work_area/wordlist_generator_table.png 
and /dev/null differ diff --git a/tests/wl_tests_measures/test_measures_readability.py b/tests/wl_tests_measures/test_measures_readability.py index 68d644e65..7066f65a7 100644 --- a/tests/wl_tests_measures/test_measures_readability.py +++ b/tests/wl_tests_measures/test_measures_readability.py @@ -178,18 +178,32 @@ def test_colemans_readability_formula(): assert cloze_pct_eng_12_4 == 1.04 * (9 / 12 * 100) + 1.06 * (3 / 12 * 100) + 0.56 * (0 / 12 * 100) - 0.36 * (0 / 12) - 26.01 assert cloze_pct_other_12 == 'no_support' -def test_dale_chall_readability_score(): - x_c50_eng_0 = wl_measures_readability.dale_chall_readability_score(main, test_text_eng_0) - x_c50_eng_12 = wl_measures_readability.dale_chall_readability_score(main, test_text_eng_12) - x_c50_spa_12 = wl_measures_readability.dale_chall_readability_score(main, test_text_spa_12) +def test_dale_chall_readability_formula(): + x_c50_eng_0 = wl_measures_readability.dale_chall_readability_formula(main, test_text_eng_0) + x_c50_eng_12 = wl_measures_readability.dale_chall_readability_formula(main, test_text_eng_12) + x_c50_spa_12 = wl_measures_readability.dale_chall_readability_formula(main, test_text_spa_12) - print('Dale-Chall Readibility Score:') + print('Dale-Chall Readability Formula:') print(f'\teng/0: {x_c50_eng_0}') print(f'\teng/12: {x_c50_eng_12}') print(f'\tspa/12: {x_c50_spa_12}') assert x_c50_eng_0 == 'text_too_short' - assert x_c50_eng_12 == 0.1579 * (1 / 12) + 0.0496 * (12 / 3) + 3.6365 + assert x_c50_eng_12 == 0.1579 * (1 / 12 * 100) + 0.0496 * (12 / 3) + 3.6365 + assert x_c50_spa_12 == 'no_support' + +def test_dale_chall_readability_formula_new(): + x_c50_eng_0 = wl_measures_readability.dale_chall_readability_formula_new(main, test_text_eng_0) + x_c50_eng_12 = wl_measures_readability.dale_chall_readability_formula_new(main, test_text_eng_12) + x_c50_spa_12 = wl_measures_readability.dale_chall_readability_formula_new(main, test_text_spa_12) + + print('Dale-Chall Readability Formula (New):') + 
print(f'\teng/0: {x_c50_eng_0}') + print(f'\teng/12: {x_c50_eng_12}') + print(f'\tspa/12: {x_c50_spa_12}') + + assert x_c50_eng_0 == 'text_too_short' + assert x_c50_eng_12 == 64 - 0.95 * (1 / 12 * 100) - 0.69 * (12 / 3) assert x_c50_spa_12 == 'no_support' def test_devereux_readability_index(): @@ -308,7 +322,7 @@ def test_formula_de_comprensibilidad_de_gutierrez_de_polini(): cp_spa_12 = wl_measures_readability.formula_de_comprensibilidad_de_gutierrez_de_polini(main, test_text_spa_12) cp_eng_12 = wl_measures_readability.formula_de_comprensibilidad_de_gutierrez_de_polini(main, test_text_eng_12) - print('Fórmula de comprensibilidad de Gutiérrez de Polini:') + print('Fórmula de Comprensibilidad de Gutiérrez de Polini:') print(f'\tspa/0: {cp_spa_0}') print(f'\tspa/12: {cp_spa_12}') print(f'\teng/12: {cp_eng_12}') @@ -518,7 +532,8 @@ def test_wiener_sachtextformel(): test_bormuths_gp() test_coleman_liau_index() test_colemans_readability_formula() - test_dale_chall_readability_score() + test_dale_chall_readability_formula() + test_dale_chall_readability_formula_new() test_devereux_readability_index() test_flesch_kincaid_grade_level() test_flesch_reading_ease() diff --git a/tests/wl_tests_work_area/test_profiler.py b/tests/wl_tests_work_area/test_profiler.py index 4cdbc1d29..8d70ff5ed 100644 --- a/tests/wl_tests_work_area/test_profiler.py +++ b/tests/wl_tests_work_area/test_profiler.py @@ -95,7 +95,7 @@ def update_gui(err_msg, texts_stats_files): count_tokens_lens_syls.append(collections.Counter(len_tokens_syls)) count_tokens_lens_chars.append(collections.Counter(len_tokens_chars)) - assert len(readability_statistics) == 25 + assert len(readability_statistics) == 26 # Counts assert count_paras diff --git a/wordless/wl_measures/wl_measures_readability.py b/wordless/wl_measures/wl_measures_readability.py index 8c7888562..84a57d5b6 100644 --- a/wordless/wl_measures/wl_measures_readability.py +++ b/wordless/wl_measures/wl_measures_readability.py @@ -170,12 +170,12 @@ def 
bormuths_cloze_mean(main, text): if text.count_sentences and text.count_words: ddl = get_count_words_dale(text.words_flat, 3000) m = ( - 0.886593 - - 0.083640 * (text.count_chars_alphabetic / text.count_words) + - 0.161911 * ((ddl / text.count_words)**3) - - 0.021401 * (text.count_words / text.count_sentences) + - 0.000577 * ((text.count_words / text.count_sentences)**2) - - 0.000005 * ((text.count_words / text.count_sentences)**3) + 0.886593 + - 0.083640 * (text.count_chars_alphabetic / text.count_words) + + 0.161911 * ((ddl / text.count_words)**3) + - 0.021401 * (text.count_words / text.count_sentences) + + 0.000577 * ((text.count_words / text.count_sentences)**2) + - 0.000005 * ((text.count_words / text.count_sentences)**3) ) else: m = 'text_too_short' @@ -193,9 +193,9 @@ def bormuths_gp(main, text): gp = m else: gp = ( - 4.275 + 12.881 * m - 34.934 * (m**2) + 20.388 * (m**3) + - 26.194 * c - 2.046 * (c**2) - 11.767 * (c**3) - - 44.285 * (m * c) + 97.620 * ((m * c)**2) - 59.538 * ((m * c)**3) + 4.275 + 12.881 * m - 34.934 * (m**2) + 20.388 * (m**3) + + 26.194 * c - 2.046 * (c**2) - 11.767 * (c**3) + - 44.285 * (m * c) + 97.620 * ((m * c)**2) - 59.538 * ((m * c)**3) ) else: gp = 'no_support' @@ -238,29 +238,29 @@ def colemans_readability_formula(main, text): if variant == '1': cloze_pct = ( - 1.29 * (count_words_1_syl / text.count_words * 100) - - 38.45 + 1.29 * (count_words_1_syl / text.count_words * 100) + - 38.45 ) elif variant == '2': cloze_pct = ( - 1.16 * (count_words_1_syl / text.count_words * 100) + - 1.48 * (text.count_sentences / text.count_words * 100) - - 37.95 + 1.16 * (count_words_1_syl / text.count_words * 100) + + 1.48 * (text.count_sentences / text.count_words * 100) + - 37.95 ) elif variant == '3': cloze_pct = ( - 1.07 * (count_words_1_syl / text.count_words * 100) + - 1.18 * (text.count_sentences / text.count_words * 100) + - 0.76 * (count_prons / text.count_words * 100) - - 34.02 + 1.07 * (count_words_1_syl / text.count_words * 100) + + 1.18 * 
(text.count_sentences / text.count_words * 100) + + 0.76 * (count_prons / text.count_words * 100) + - 34.02 ) elif variant == '4': cloze_pct = ( - 1.04 * (count_words_1_syl / text.count_words * 100) + - 1.06 * (text.count_sentences / text.count_words * 100) + - 0.56 * (count_prons / text.count_words * 100) - - 0.36 * (count_preps / text.count_words) - - 26.01 + 1.04 * (count_words_1_syl / text.count_words * 100) + + 1.06 * (text.count_sentences / text.count_words * 100) + + 0.56 * (count_prons / text.count_words * 100) + - 0.36 * (count_preps / text.count_words) + - 26.01 ) else: cloze_pct = 'text_too_short' @@ -269,18 +269,18 @@ def colemans_readability_formula(main, text): return cloze_pct -# Dale-Chall Readability Score +# Dale-Chall Readability Formula # References: # Dale, E., & Chall, J. S. (1948a). A formula for predicting readability. Educational Research Bulletin, 27(1), 11–20, 28. # Dale, E., & Chall, J. S. (1948b). A formula for predicting readability: Instructions. Educational Research Bulletin, 27(2), 37–54. -def dale_chall_readability_score(main, text): +def dale_chall_readability_formula(main, text): if text.lang.startswith('eng_'): text = get_counts(main, text) if text.count_words and text.count_sentences: count_difficult_words = get_count_words_dale(text.words_flat, 3000) x_c50 = ( - 0.1579 * (count_difficult_words / text.count_words) + 0.1579 * (count_difficult_words / text.count_words * 100) + 0.0496 * (text.count_words / text.count_sentences) + 3.6365 ) @@ -291,6 +291,26 @@ def dale_chall_readability_score(main, text): return x_c50 +# Dale-Chall Readability Formula (New) +# Reference: Chall, J. S., & Dale, E. (1995). Readability revisited: The new Dale-Chall readability formula. Brookline Books. 
+def dale_chall_readability_formula_new(main, text): + if text.lang.startswith('eng_'): + text = get_counts(main, text) + + if text.count_words and text.count_sentences: + count_difficult_words = get_count_words_dale(text.words_flat, 3000) + x_c50 = ( + 64 + - 0.95 * (count_difficult_words / text.count_words * 100) + - 0.69 * (text.count_words / text.count_sentences) + ) + else: + x_c50 = 'text_too_short' + else: + x_c50 = 'no_support' + + return x_c50 + # Devereux Readability Index # Reference: Smith, E. A. (1961). Devereaux readability index. Journal of Educational Research, 54(8), 298–303. https://doi.org/10.1080/00220671.1961.10882728 def devereux_readability_index(main, text): @@ -457,7 +477,7 @@ def forcast_grade_level(main, text): return rgl -# Fórmula de comprensibilidad de Gutiérrez de Polini +# Fórmula de Comprensibilidad de Gutiérrez de Polini # References: # Gutiérrez de Polini, L. E. (1972). Investigación sobre lectura en Venezuela [Paper presentation]. Primeras Jornadas de Educación Primaria, Ministerio de Educación, Caracas, Venezuela. # Rodríguez Trujillo, N. (1980). Determinación de la comprensibilidad de materiales de lectura por medio de variables lingüísticas. Lectura y Vida, 1(1). 
http://www.lecturayvida.fahce.unlp.edu.ar/numeros/a1n1/01_01_Rodriguez.pdf @@ -506,7 +526,10 @@ def gulpease_index(main, text): text = get_counts(main, text) if text.count_words: - gulpease_index = 89 + (300 * text.count_sentences - 10 * text.count_chars_alphabetic) / text.count_words + gulpease_index = ( + 89 + + (300 * text.count_sentences - 10 * text.count_chars_alphabetic) / text.count_words + ) else: gulpease_index = 'text_too_short' else: @@ -543,7 +566,10 @@ def gunning_fog_index(main, text): if len(syls) >= 4: count_hard_words += 1 - fog_index = 0.4 * (text.count_words / text.count_sentences + count_hard_words / text.count_words * 100) + fog_index = ( + 0.4 + * (text.count_words / text.count_sentences + count_hard_words / text.count_words * 100) + ) else: fog_index = 'text_too_short' else: diff --git a/wordless/wl_profiler.py b/wordless/wl_profiler.py index 23fc0543a..9273fe600 100644 --- a/wordless/wl_profiler.py +++ b/wordless/wl_profiler.py @@ -363,13 +363,14 @@ def __init__(self, parent): _tr('wl_profiler', "Bormuth's Grade Placement"), _tr('wl_profiler', 'Coleman-Liau Index'), _tr('wl_profiler', "Coleman's Readability Formula"), - _tr('wl_profiler', 'Dale-Chall Readability Score'), + _tr('wl_profiler', 'Dale-Chall Readability Formula'), + _tr('wl_profiler', 'Dale-Chall Readability Formula (New)'), _tr('wl_profiler', 'Devereaux Readability Index'), _tr('wl_profiler', 'Flesch-Kincaid Grade Level'), _tr('wl_profiler', 'Flesch Reading Ease'), _tr('wl_profiler', 'Flesch Reading Ease (Simplified)'), _tr('wl_profiler', 'FORCAST Grade Level'), - _tr('wl_profiler', 'Fórmula de comprensibilidad de Gutiérrez de Polini'), + _tr('wl_profiler', 'Fórmula de Comprensibilidad de Gutiérrez de Polini'), _tr('wl_profiler', 'Fórmula de Crawford'), _tr('wl_profiler', 'Gulpease Index'), _tr('wl_profiler', 'Gunning Fog Index'), @@ -1180,7 +1181,8 @@ def run(self): wl_measures_readability.bormuths_gp(self.main, text), wl_measures_readability.coleman_liau_index(self.main, 
text), wl_measures_readability.colemans_readability_formula(self.main, text), - wl_measures_readability.dale_chall_readability_score(self.main, text), + wl_measures_readability.dale_chall_readability_formula(self.main, text), + wl_measures_readability.dale_chall_readability_formula_new(self.main, text), wl_measures_readability.devereux_readability_index(self.main, text), wl_measures_readability.flesch_kincaid_grade_level(self.main, text), wl_measures_readability.flesch_reading_ease(self.main, text),
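
For reviewers who want to check the arithmetic outside Wordless, the two Dale-Chall computations touched by this change can be sketched as a standalone snippet. This is only an illustration under stated assumptions: the function names and count-based signatures below are hypothetical (the real functions take `main` and a text object), the difficult-word count is assumed to be supplied by the caller rather than derived from the Dale word list, and the `'no_support'` language check is left out. The coefficients and the `'text_too_short'` return value are copied from the diff.

```python
# Standalone sketch of the two formulas in this diff.
# NOTE: names and signatures are hypothetical, not the Wordless API;
# the caller supplies the counts directly.

def dale_chall_formula(count_difficult_words, count_words, count_sentences):
    """Dale-Chall Readability Formula (1948), using the coefficients from the diff."""
    if not (count_words and count_sentences):
        return 'text_too_short'

    # Percentage of words outside the Dale list (the "* 100" is the fix in this diff)
    pct_difficult = count_difficult_words / count_words * 100
    avg_sentence_len = count_words / count_sentences

    return 0.1579 * pct_difficult + 0.0496 * avg_sentence_len + 3.6365


def dale_chall_formula_new(count_difficult_words, count_words, count_sentences):
    """Dale-Chall Readability Formula (New), using the coefficients from the diff."""
    if not (count_words and count_sentences):
        return 'text_too_short'

    pct_difficult = count_difficult_words / count_words * 100
    avg_sentence_len = count_words / count_sentences

    return 64 - 0.95 * pct_difficult - 0.69 * avg_sentence_len


if __name__ == '__main__':
    # Same counts as the eng/12 test case: 1 difficult word, 12 words, 3 sentences
    print(dale_chall_formula(1, 12, 3))
    print(dale_chall_formula_new(1, 12, 3))
```

With the counts from the `eng/12` test case, both functions evaluate the same expressions that the updated assertions in `test_measures_readability.py` compare against.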