-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathreport.Rmd
141 lines (124 loc) · 6.85 KB
/
report.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
---
title: "Bayesian Estimates of Anti-Bayesian Psychometric Curves"
author: "John Pearson"
date: "12/28/2021"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
require("rstan")
rstan_options(auto_write=TRUE)
require("tidyverse")
# require("latex2exp")
require(booktabs)
require(kable)
```
```{r}
dat <- read_csv("data/discrimination.csv")
# layer_names = levels(as.factor(dat$Layer))
# protein_names = levels(as.factor(dat$Immunolabeled_Protein))
```
# Model specification:
As detailed in the notes, given an _objective_ target movement $x$, the probability of reporting a jump is
\begin{equation}
D(x) = \int_{|\tilde{x}| > \tilde{x}_c} p(\tilde{x}|x)\, dx
= \Phi\left(-\frac{\tilde{x}_c + x}{\sigma_t} \right) + 1 - \Phi\left(\frac{\tilde{x}_c - x}{\sigma_t} \right) \label{norm_cdf_D}
\end{equation}
where $\sigma_t$ is ``target noise'' that represents the uncertainty due to target size and
\begin{equation}
\tilde{x}_c^2 \equiv \frac{2 \log \frac{p(\neg J)}{p(J)} + \log \frac{\sigma_t^2 + \sigma_J^2}{\sigma_t^2 + \sigma_{\neg J}^2}}{\frac{1}{\sigma_t^2 + \sigma_{\neg J}^2} - \frac{1}{\sigma_t^2 + \sigma_J^2}},
\end{equation}
where $p(J)$ and $p(\neg J)$ are the prior probability of jump and no-jump, respectively, and $\sigma^2_J$ and $\sigma^2_{\neg J}$ are the widths of the jump and no-jump distributions.
<!-- For modeling the data, we take $\sigma^2_{\neg J} = 0$, so that this simplifies to -->
<!-- \begin{equation} -->
<!-- \frac{\tilde{x}_c^2}{\sigma_t^2} \equiv \left[2 \log \frac{p(\neg J)}{p(J)} + \log \left(1 + \frac{\sigma_J^2}{\sigma_t^2}\right) \right]\left(1 + \left(\frac{\sigma^2_J}{\sigma^2_t} \right)^{-1} \right), \label{x_critical} -->
<!-- \end{equation} -->
Thus the psychometric curve is a function of a width parameter $\sigma_t$ and a threshold that depends on both the prior log odds and the widths of the jump and no-jump distributions. In theory, all of these parameters are experimentally determined, but we want to fit them empirically.
```{r, fig.width=4, fig.height=3}
xc <- function(st, sJ, snJ, logodds){
numerator <- 2 * logodds + log(st^2 + sJ^2) - log(st^2 + snJ^2)
denominator <- 1/(st^2 + snJ^2) - 1/(st^2 + sJ^2)
return(sqrt(numerator/denominator))
}
plot_data <- expand_grid(st=c(0.2, 0.4, 0.6), sJ=c(2.5), snJ=c(0.2, 0.4),
logodds=seq(from=-log(5), to=log(5), by=0.01)) %>%
mutate(xc=xc(st, sJ, snJ, logodds), snJ=as.factor(snJ), st=as.factor(st),
id=stringr::str_c("st=", st, "; snJ=", snJ))
plot_data %>% ggplot(mapping=aes(x=logodds, y=xc, color=st, linetype=snJ)) + geom_path()
```
```{r}
conditions <- dat %>% filter(Monkey==1) %>%
select(TargetWidth, isSaccade, Prior) %>%
distinct()
```
The primary difficulty in doing this is a lack of identifiability between changes in the log prior odds, $\log \frac{p(\neg J)}{p(J)}$ and the width of the jump distribution $\sigma^2_J$ and non-jump distribution $\sigma^2_{\neg J}$. In the data, these take on discrete values $\log \frac{p(\neg J)}{p(J)} \in \lbrace 0, \pm \log 4 \rbrace$, $\sigma_J = 2.5$, $\sigma_{\neg J} = 0.2$, $\sigma_t \in \lbrace 0.5, 1.25, 1.75, 2 \rbrace$ (only one of the two largest of these per monkey). It is also assumed that changes in motor noise due to saccadic responses will result in a subjective change in $\sigma^2_{\neg J}$. If we wish to fit all these parameters empirically, this requires $3 + 1 + 3 + 2 = 9$ free parameters, whereas the existing conditions (for Monkey 1) are:
```{r}
knitr::kable(conditions, row.names = TRUE)
```
which makes a unique identification of parameters possible in theory.\footnote{It is also worth noting that the \emph{width} of the psychometric curve gives an independent estimate of $\sigma_t$, so that we have one fewer free parameter to estimate from $\tilde{x}_c$ than it might appear.}
Now, we assume in addition that the subjective estimates of quantities like $\sigma_t$ increase monotonically with their true underlying values, which leads us to the following strategy: First, we convert these parameters to factors. Second, we use a difference coding scheme between factor levels, as summarized below:
\begin{table}[h]
\centering
\begin{tabular}{l l}
Parameter & Model \\
\midrule
$\sigma^2_J$ & \texttt{sJ} \\
$\sigma^2_{\neg J}$ (\texttt{isSaccade == 0}) & \texttt{snJ} \\
$\sigma^2_{\neg J}$ (\texttt{isSaccade == 1}) & \texttt{snJ + dsnJ} \\
$\sigma^2_t$ (\texttt{TargetWidth == 0.5}) & \texttt{st} \\
$\sigma^2_t$ (\texttt{TargetWidth == 1.25}) & \texttt{st + dst1} \\
$\sigma^2_t$ (\texttt{TargetWidth == 1.75} or \texttt{2.0}) & \texttt{st + dst1 + dst2} \\
$\log \frac{p(\neg J)}{p(J)}$ (\texttt{Prior == 0.5}) & \texttt{pr\_odds} \\
$\log \frac{p(\neg J)}{p(J)}$ (\texttt{Prior == 0.2}) & \texttt{pr\_odds + dlpu} \\
$\log \frac{p(\neg J)}{p(J)}$ (\texttt{Prior == 0.8}) & \texttt{pr\_odds - dlpl}
\end{tabular}
\end{table}
Note that, in this scheme, all the \texttt{d*} variables are restricted to be positive to enforce the monotonicity described above.
# Running the model
```{r, echo=FALSE, warning=FALSE, message=FALSE, error=FALSE, results='hide'}
stan_dat <- filter(dat, Monkey==1) %>% slice_sample(prop=1)
TargetWidth <- as.integer(as.factor(dat$TargetWidth))
Prior <- as.integer(as.factor(dat$Prior))
rstan_data <- list(Nrows=dim(dat)[1],
response=dat$Response,
jump=dat$`Jump Size`,
isSaccade=as.integer(dat$isSaccade),
midwidth=as.integer(TargetWidth==2),
maxwidth=as.integer(TargetWidth==3),
minprior=as.integer(Prior==1),
maxprior=as.integer(Prior==3)
)
fit <- stan(
file = "basic_model.stan", # Stan program
data = rstan_data, # named list of data
chains = 4, # number of Markov chains
warmup = 1000, # number of warmup iterations per chain
iter = 2000, # total number of iterations per chain
cores = 4, # number of cores (could use one per chain)
refresh = 100 # no progress shown
)
```
```{r}
traceplot(fit, c("sJ", "snJ", "dsnJ", "st", "dst1", "dst2", "pr_odds", "dlpl", "dlpu", "theta"))
```
```{r}
fit_summary <- summary(fit, pars=c("sJ", "snJ", "dsnJ", "st", "dst1", "dst2", "pr_odds", "dlpl", "dlpu", "theta"))
print(fit_summary$summary)
```
```{r}
pars <- rstan::extract(fit, c("sJ", "snJ", "dsnJ",
"st", "dst1", "dst2",
"pr_odds", "dlpl", "dlpu",
"theta[1]", "theta[2]", "theta[3]")) %>%
as_tibble
```
```{r}
# calculate summary statistics
pars_stats <- pars %>% pivot_longer(cols=everything()) %>%
group_by(name) %>%
summarize(low=quantile(value, 0.05), med=median(value), high=quantile(value, 0.95))
p <- ggplot(pars_stats) +
geom_pointrange(aes(ymin=low, ymax=high, x=name, y=med)) +
ylab("variable")
p
```