Error in if (inputs$clusterID == inputs$panelID) { : argument is of length zero #58

jonah-allen · 2024-11-04T10:20:15Z

First of all, thank you for developing a fantastic package! I am having an issue implementing clusters. Could you please provide insight into why this error occurs or suggest any adjustments to handle clustering properly? Thank you for your support!

Description

I encountered an issue when running the logitr() function with the clusterID parameter specified. While the model runs successfully without clusterID, adding it results in an error. The error message is:

Error in if (inputs$clusterID == inputs$panelID) { :
  argument is of length zero

I created a clusterID column specifically to ensure the data type in the version column was not the issue. (The scenario columns are not currently in use but may be in the future).

Reproducible Example

Here is a sample structure of my dataset (wtp_risk):

# A tibble: 180 × 12
   version  risk loss_value id         original_choice scenario_a scenario_b scenario_c alt   choice obsID clusterID
   <fct>   <dbl>      <dbl> <chr>      <fct>                <dbl>      <dbl>      <dbl> <fct>  <dbl> <int>    <int>
 1 CO1         8         0  R_bf8TOsp… c                        2          3          1 a          0    53        1
 2 CO1         4       113. R_bf8TOsp… c                        2          3          1 b          0    53        1
 3 CO1         4       113. R_bf8TOsp… c                        2          3          1 c          1    53        1
 4 CO1         8         0  R_2tnw7mp… a                        1          2          2 a          1    27        2
 5 CO1         4       113. R_2tnw7mp… a                        1          2          2 b          0    27        2
 6 CO1         4       113. R_2tnw7mp… a                        1          2          2 c          0    27        2

Code to Reproduce

# Run without clusterID (successful)
mnl_pref <- logitr(
  data = wtp_risk,
  outcome = "choice",
  obsID = "obsID",
  pars = c("loss_value", "risk")
)

# Run with clusterID (error)
mnl_pref <- logitr(
  data = wtp_risk,
  outcome = "choice",
  obsID = "obsID",
  pars = c("loss_value", "risk"),
  clusterID = "clusterID"
)

Observations

The wtp_risk data frame does not contain NA values in obsID, clusterID, or version.
The clusterID was created to ensure correct data type handling.
The error seems related to how panelID is internally processed in the logitr function, even when panelID is not specified. (This is not panel data).
I have tried setting panelID = NULL specifically and the error still occurs.

Environment

R version: 4.2.3
logitr version: 1.1.2
macOS: 15.0.1


jhelvy · 2024-11-04T10:29:38Z

Could you send a small portion of the data so I can replicate this error? Would only need the relevant variables used in the example: "choice", "obsID", "loss_value", "risk", "clusterID"

jonah-allen · 2024-11-04T10:39:53Z

sample_wtp_risk_data.csv

jonah-allen · 2024-11-04T10:41:58Z

I just realized that the sample only includes cluster groups 1 and 2; there are 25 cluster groups in my data. I think you can manually change those for replication, but let me know if you need a different sample!

jhelvy · 2024-11-04T11:38:27Z

Okay, I just ran this and I can replicate the error. It is perhaps a bug in the code, though I'm also questioning whether clusters should be used here. Usually clustering is suggested when you have panel data. In your case, you have different versions. Is that just different versions of a choice experiment? If so, then I'm not sure why you would want to cluster your errors around the version. Basically, I don't think clustering is needed.

If you do want to use clusters, then as a workaround you can also set panelID = "clusterID" and it will work. With an MNL model there is no difference in the calculation of the log-likelihood with or without a panelID specified, so this will get you what you want without error. You just need to specify both clusterID and panelID like this:

m2 <- logitr(
  data      = wtp_risk,
  outcome   = "choice",
  obsID     = "obsID",
  pars      = c("loss_value", "risk"),
  clusterID = "clusterID",
  panelID   = "clusterID"
)

jonah-allen · 2024-11-05T11:24:52Z

Thanks very much, that fixed the issue!

Interesting -- my understanding was that clustering by survey version is best practice because I have significant variation across survey versions: parameters (percent risk and percent profit loss) vary across three options, and the "source of risk" varies across the surveys (half are viewed as "climate" and the other half as "policy", without going into too much detail). I know that description is very general, but any resources you might be able to share on clustering in this case would be much appreciated!

jhelvy · 2024-11-05T16:59:24Z

I suppose that's a reasonable assumption. In general, though, this is an issue that I'll have to fix, because you should be able to use clusters without defining a panelID. Setting panelID is a workaround for now, but I'll patch this.
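
For anyone landing here from the error message: the message itself is standard base R behavior when an `if` condition has length zero, which is what a comparison against `NULL` produces. The sketch below reproduces it with a hypothetical stand-in for the `inputs` list (only the two fields named in the error are modeled; this is not the package's actual object). The `same_id()` helper is purely illustrative, one possible NULL-safe comparison, and is not necessarily how the patch is or will be implemented.

```r
# Hypothetical stand-in for the internal `inputs` list; only the two fields
# that appear in the error message are modeled here.
inputs <- list(clusterID = "clusterID", panelID = NULL)

# Comparing a string with NULL yields a zero-length logical vector.
cmp <- inputs$clusterID == inputs$panelID
length(cmp)  # 0

# `if` needs a length-one condition, so this reproduces the reported message.
msg <- tryCatch(
  if (inputs$clusterID == inputs$panelID) "same" else "different",
  error = function(e) conditionMessage(e)
)
msg  # "argument is of length zero"

# Illustrative NULL-safe comparison (an assumption, not logitr code):
# it always returns a single TRUE or FALSE, even when either side is NULL.
same_id <- function(a, b) !is.null(a) && !is.null(b) && identical(a, b)

same_id(inputs$clusterID, inputs$panelID)  # FALSE
same_id("clusterID", "clusterID")          # TRUE
```

This also suggests why setting panelID = "clusterID" avoids the error: both sides of the comparison are then length-one strings, so the condition evaluates to a single TRUE.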
