Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ideas for the far future #23

Open
lindeloev opened this issue Oct 13, 2019 · 7 comments
Open

Ideas for the far future #23

lindeloev opened this issue Oct 13, 2019 · 7 comments
Labels
enhancement New feature or request

Comments

@lindeloev
Copy link
Owner

lindeloev commented Oct 13, 2019

Here are some ideas which could use some discussion and careful consideration. It extends the current model specification: https://lindeloev.github.io/mcp/articles/formulas.html

In the order from "soonish" (top) to "in your dreams":

Survival models

Survival models are relatively simple and we should support them, including censoring too. The API for the model itself would be something like brms:

model = list(
    eventtime | cens(status) ~ 0,
    ~ 0 + x
)

It should support both exponential decay and Cox proportional hazards. This would probably be specified via the mcp(..., family = ) argument, but I'm unsure what would be the best.

Slope on change points

If there are multiple (piecewise) lines over a single change point, and each line is associated with a different parameter x, we can use that to predict the change point. For example, we could assess subitizing in participants of varying age, and it would be reasonable to expect the subitizing range (location of first change point) to increase with age in childhood and decrease with age in adulthood.

How to implement this in a formula, I'm unsure. Maybe it has to be in the random effect: (1 + x | age) since that specifies the grouping of multiple lines. This would also ensure that the parameter names stay intact. cp_i, cp_i_age, and then probably cp_i_x.

Multivariate regression

Multiple response variables predicted from single change points. Something like

model = list(
    c(y1, y2) ~ 1 + x,  # segment 1
    ~ 0  # seg2: joined intercept
)

This could be merged with "Variance change (Specify y)" to specify a change in one response but not the other:

model = list(
    c(y1, y2) ~ 1 + x,  # segment 1
    ~ 0  # seg2: joined intercept on y1 and y2, segment 2
    y1 ~ 1 ~ x  # seg3: slope change in y1 but not y2
)

ARIMA

mcp currently supports AR(N) models. It should go more general. Take a look at:

@lindeloev lindeloev added the enhancement New feature or request label Oct 29, 2019
@lindeloev lindeloev mentioned this issue Nov 5, 2019
@josswright
Copy link

I've just come across this package whilst coding up some changepoint models directly. The specific application I've been working in is word/letter frequencies for stylometry. I'm interested to know if categorical/multinomial models could fit into this framework?

(This package looks amazing, by the way!)

@lindeloev
Copy link
Owner Author

@josswright Yes, mathematically and in JAGS it's straightforward to extend mcp to include multinomial models, as long as there is a continuous predictor. Just need mcp to count the number of categories in the dependent variable (N), to set up a logistic model for each of them (dbern), and to set up a proper parameterization of the priors on the regression coefficients (probably a dirichlet). The change point will then define a location on x where the N-1 regression coefficients change.

I hope to get time to do this after adding survival analysis, but any pull requests are welcome :-)

@josswright
Copy link

@lindeloev Fantastic! I'm more familiar with Stan than JAGS, but if I find the time then I'll definitely take a look!

@jpzhangvincent
Copy link

I'm working on abrupt time series forecasting problem and interested in change point/anomaly detection integrated with forecasting algorithms. This package is very promising. Just curious, are there particular reasons to choose JAGS over Stan as the underlying Bayesian software? IMO, stan gains more popularity and extensibility in the community, like "prophet", "tidybayes" and "shinystan". I'm wondering whether it would be considered to include an option to choose "stan" as a backend so we can get a stan model output to integrate with other "stan" based packages.

@lindeloev
Copy link
Owner Author

@jpzhangvincent I chose JAGS because:

  1. Ease of installation. I have had many students struggle installing RTools. I myself recently tried installing RTools4 without success.
  2. Total speed. Many problems are solved in ~2-5 seconds in JAGS. But will take 30+ seconds because of compilation with stan. I did see that stan was working on some improvements that will bring down compilation time down considerably, though (?),
  3. I'm more familiar with JAGS :-)

If mcp is beginning to get more widely adopted, I would definitely add stan support for the reasons your state! I anticipated this early on so it would not require a major re-write. Only a few hundred of the ~2000 lines of code in mcp are JAGS-specific and the others should be quite sampler-agnostic.

I'd be happy to hear your thoughts here!

@jpzhangvincent
Copy link

@jpzhangvincent I chose JAGS because:

  1. Ease of installation. I have had many students struggle installing RTools. I myself recently tried installing RTools4 without success.
  2. Total speed. Many problems are solved in ~2-5 seconds in JAGS. But will take 30+ seconds because of compilation with stan. I did see that stan was working on some improvements that will bring down compilation time down considerably, though (?),
  3. I'm more familiar with JAGS :-)

If mcp is beginning to get more widely adopted, I would definitely add stan support for the reasons your state! I anticipated this early on so it would not require a major re-write. Only a few hundred of the ~2000 lines of code in mcp are JAGS-specific and the others should be quite sampler-agnostic.

I'd be happy to hear your thoughts here!

Thanks for your response! I'd like to deep dive into the codebase and see anything I can contribute to. Can you give me some pointers where I can better understand how the model is specified in JAGS?

@lindeloev
Copy link
Owner Author

lindeloev commented May 18, 2020

@jpzhangvincent Yes, definitely!

fit = mcp(...) returns an mcpfit object and you can see the JAGS code using cat(fit$jags_code).

This code is mostly generated in R/get_jagscode.R and R/get_formula.R using the abstract representation (the "Segment table") which is also saved to fit$.other$ST - one row per segment. ST is not a fixed API but will be in flux as new features are added and old features are refined.

Adding support for stan would be the business of making a get_stancode.R and add a function get_formula_str_stan to the get_formula.R file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants