Error in if (inputs$clusterID == inputs$panelID) { : argument is of length zero #58

Open
jonah-allen opened this issue Nov 4, 2024 · 6 comments

@jonah-allen

First of all, thank you for developing a fantastic package! I am having an issue implementing clusters. Could you please explain why this error occurs or suggest any adjustments to handle clustering properly? Thank you for your support!

Description

I encountered an issue when running the logitr() function with the clusterID parameter specified. While the model runs successfully without clusterID, adding it results in an error. The error message is:

Error in if (inputs$clusterID == inputs$panelID) { :
  argument is of length zero

I created a clusterID column specifically to ensure the data type in the version column was not the issue. (The scenario columns are not currently in use but may be in the future).

Reproducible Example

Here is a sample structure of my dataset (wtp_risk):

# A tibble: 180 × 12
   version  risk loss_value id         original_choice scenario_a scenario_b scenario_c alt   choice obsID clusterID
   <fct>   <dbl>      <dbl> <chr>      <fct>                <dbl>      <dbl>      <dbl> <fct>  <dbl> <int>    <int>
 1 CO1         8         0  R_bf8TOspc                        2          3          1 a          0    53        1
 2 CO1         4       113. R_bf8TOspc                        2          3          1 b          0    53        1
 3 CO1         4       113. R_bf8TOspc                        2          3          1 c          1    53        1
 4 CO1         8         0  R_2tnw7mpa                        1          2          2 a          1    27        2
 5 CO1         4       113. R_2tnw7mpa                        1          2          2 b          0    27        2
 6 CO1         4       113. R_2tnw7mpa                        1          2          2 c          0    27        2

Code to Reproduce

# Run without clusterID (successful)
mnl_pref <- logitr(
  data = wtp_risk,
  outcome = "choice",
  obsID = "obsID",
  pars = c("loss_value", "risk")
)

# Run with clusterID (error)
mnl_pref <- logitr(
  data = wtp_risk,
  outcome = "choice",
  obsID = "obsID",
  pars = c("loss_value", "risk"),
  clusterID = "clusterID"
)

Observations

  • The wtp_risk data frame does not contain NA values in obsID, clusterID, or version.
  • The clusterID was created to ensure correct data type handling.
  • The error seems related to how panelID is internally processed in the logitr function, even when panelID is not specified. (This is not panel data).
  • I have tried setting panelID = NULL specifically and the error still occurs.
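
The message "argument is of length zero" is consistent with one side of the comparison being NULL. The following is a minimal base R sketch of that failure mode. The `inputs` list here is a hypothetical stand-in built only from the expression shown in the error message; it is not the package's actual internal code.

```r
# Hypothetical stand-in for the internal `inputs` list (assumption for
# illustration; not logitr's actual code).
inputs <- list(clusterID = "clusterID", panelID = NULL)

# Comparing a string with NULL yields a zero-length logical vector.
cmp <- inputs$clusterID == inputs$panelID
length(cmp)

# `if` needs exactly one TRUE/FALSE value, so a zero-length condition
# raises an error; tryCatch() captures its message instead of stopping.
msg <- tryCatch(
  {
    if (cmp) "same" else "different"
  },
  error = function(e) conditionMessage(e)
)
msg
```

If the package compares the two IDs this way, then leaving panelID unset or passing panelID = NULL explicitly would both reach the same zero-length condition, which would match the behavior described above.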

Environment

  • R version: 4.2.3
  • logitr version: 1.1.2
  • macOS: 15.0.1
@jhelvy
Owner

jhelvy commented Nov 4, 2024

Could you send a small portion of the data so I can replicate this error? Would only need the relevant variables used in the example: "choice", "obsID", "loss_value", "risk", "clusterID"

@jonah-allen
Author

sample_wtp_risk_data.csv

@jonah-allen
Author

I just realized that the sample only includes cluster groups 1 and 2; there are 25 cluster groups in my full data. I think you can manually change those for replication, but let me know if you need a different sample!

@jhelvy
Owner

jhelvy commented Nov 4, 2024

Okay I just ran this and I can replicate the error. It is perhaps a bug in the code, but I'm not sure if it should occur because I'm questioning the use of clusters here. Usually clustering is suggested when you have panel data. In your case, you have different versions. Is that just different versions of a choice experiment? If so then I'm not sure why you would want to cluster your errors around the version. Basically, I don't think clustering is needed.

If you do want to use clusters, then as a workaround you can also set panelID = "clusterID" and it will work. With an MNL model there is no difference in the calculation of the log-likelihood with or without a panelID specified, so this will get you what you want without the error. You just need to specify both clusterID and panelID like this:

m2 <- logitr(
  data      = wtp_risk,
  outcome   = "choice",
  obsID     = "obsID",
  pars      = c("loss_value", "risk"),
  clusterID = "clusterID",
  panelID   = "clusterID"
)

@jonah-allen
Author

Thanks very much, that fixed the issue!

Interesting -- my understanding was that clustering by survey version is best practice because I have significant variation across survey versions: the parameters (percent risk and percent profit loss) vary across the three options, and the "source of risk" varies across the surveys (half are framed as "climate" and the other half as "policy", without going into too much detail). I know that description is very general, but any resources you could share on clustering in this case would be much appreciated!

@jhelvy
Owner

jhelvy commented Nov 5, 2024

I suppose that's a reasonable assumption. In general, though, this is an issue I'll have to deal with, because you should be able to use clusters without defining a panelID. The workaround will do for now, but I'll patch this.
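
One way such a comparison can be made safe when panelID is unset is to use `identical()`, which always returns a single TRUE or FALSE. This is only a sketch of the general base R technique; the helper name `same_id` is hypothetical and this is not necessarily how the package will be patched.

```r
# Hypothetical helper (name and logic are illustrative assumptions, not the
# package's fix): compare two ID column names without failing on NULL.
same_id <- function(clusterID, panelID) {
  # identical() returns exactly one TRUE/FALSE, even if an argument is NULL.
  identical(clusterID, panelID)
}

same_id("clusterID", NULL)         # FALSE
same_id("clusterID", "clusterID")  # TRUE
```

Unlike `==`, this never produces a zero-length result, so it can be used directly as an `if` condition.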
