paper: Interpretation of TLO etc. ref #31
chainsawriot committed Oct 15, 2020
1 parent 0a78277 commit e7b11a8
Showing 4 changed files with 20 additions and 10 deletions.
24 changes: 14 additions & 10 deletions paper/paper.md
@@ -117,7 +117,7 @@ oolong_test
## An oolong test object with k = 20, 20 coded.
## 95% precision
## With 25 cases of topic intrusion test. 25 coded.
-## TLO: -0.129
+## TLO: -0.135
```

The suggested workflow is to have at least two human raters complete the same set of tests. A test object can be cloned so that multiple raters can take the test. More than one test object can be studied together using the function `summarize_oolong()`.
@@ -145,25 +145,29 @@ oolong_test_rater2$lock()
Get a summary of the two objects.




```r
summarize_oolong(oolong_test_rater1, oolong_test_rater2)
```

```
-## Mean model precision: 0.45
-## Quantiles of model precision: 0.25, 0.35, 0.45, 0.55, 0.65
+## Mean model precision: 0.3
+## Quantiles of model precision: 0.25, 0.275, 0.3, 0.325, 0.35
## P-value of the model precision
-## (H0: Model precision is not better than random guess): 0
-## Krippendorff's alpha: 0.015
+## (H0: Model precision is not better than random guess): 0.0494
+## Krippendorff's alpha: 0.071
## K Precision:
-## 0, 0.5, 1, 0, 1, 0.5, 1, 0, 0.5, 1, 0.5, 0, 0, 0.5, 0.5, 0.5, 0.5, 0.5, 0, 0.5
-## Mean TLO: -2.18
-## Median TLO: -2.31
-## Quantiles of TLO: -4.85, -3.63, -2.31, -0.13, 0
+## 0, 0, 0, 0, 0, 0.5, 1, 0, 0.5, 0, 0.5, 0, 0, 0.5, 0.5, 0, 0.5, 0.5, 0.5, 1
+## Mean TLO: -1.9
+## Median TLO: -1.54
+## Quantiles of TLO: -6.05, -3.56, -1.54, 0, 0
## P-Value of the median TLO
-## (H0: Median TLO is not better than random guess): 0.3047
+## (H0: Median TLO is not better than random guess): 0.014
```

Two key indicators of semantic validity are mean model precision and median TLO. Please interpret the magnitude of the two values [see @chang2009reading] rather than the two statistical tests. The two statistical tests only check whether the raters did better than random guessing. Therefore, rejection of the null hypothesis is just the bare minimum of topic interpretability, *not* an indicator of adequate semantic validity of the topic model. In addition, please use a very conservative significance level, e.g. alpha < 0.001.
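
To make the distinction between magnitude and significance concrete, here is a minimal base R sketch of a "better than random guess" test for model precision. It is an illustration under stated assumptions, not necessarily what `summarize_oolong()` does internally: the per-rater counts (`correct`), the chance level of 1/6 (one intruder among six displayed words), and the use of Fisher's method to combine raters are all assumptions made for this example.

```r
# Sketch only: counts, chance level, and the combination method are assumptions.
k <- 20                # word intrusion cases per rater
correct <- c(5, 7)     # hypothetical number of correctly identified intruders, two raters
chance <- 1 / 6        # assumed probability of a correct random guess

# One-sided binomial test per rater (H0: precision is not better than chance)
p_values <- vapply(
  correct,
  function(x) binom.test(x, k, p = chance, alternative = "greater")$p.value,
  numeric(1)
)

# Combine the per-rater p-values with Fisher's method
fisher_stat <- -2 * sum(log(p_values))
combined_p <- pchisq(fisher_stat, df = 2 * length(p_values), lower.tail = FALSE)

alpha <- 0.001         # conservative significance level suggested in the text
mean_precision <- mean(correct / k)

cat("Mean model precision:", mean_precision, "\n")
cat("Combined p-value:", signif(combined_p, 3), "\n")
cat("Reject H0 at alpha =", alpha, ":", combined_p < alpha, "\n")
```

Even if such a test rejected the null hypothesis, a mean precision of this size would still describe raters who miss most intruders, which is why the magnitude should be read first.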

# Semantic validation of dictionary-based methods

Dictionary-based methods such as AFINN [@nielsen2011new] can be validated by creating a gold standard dataset [@song2020validations]. Oolong provides a workflow for generating such a gold standard dataset.
Binary file modified paper/paper.pdf
Binary file not shown.
6 changes: 6 additions & 0 deletions paper/paper.rmd
@@ -140,10 +140,16 @@ oolong_test_rater2$lock()

Get a summary of the two objects.

```{r, include = FALSE}
set.seed(46709394)
```

```{r, step3}
summarize_oolong(oolong_test_rater1, oolong_test_rater2)
```

Two key indicators of semantic validity are mean model precision and median TLO. Please interpret the magnitude of the two values [see @chang2009reading] rather than the two statistical tests. The two statistical tests only check whether the raters did better than random guessing. Therefore, rejection of the null hypothesis is just the bare minimum of topic interpretability, *not* an indicator of adequate semantic validity of the topic model. In addition, please use a very conservative significance level, e.g. alpha < 0.001.

# Semantic validation of dictionary-based methods

Dictionary-based methods such as AFINN [@nielsen2011new] can be validated by creating a gold standard dataset [@song2020validations]. Oolong provides a workflow for generating such a gold standard dataset.
Binary file modified paper/paper_files/figure-latex/diagplot-1.pdf
Binary file not shown.