First of all, thank you for creating selbal and for the support you provide to the community.
I am using the selbal.cv function with a dichotomous response (n ≈ 50 samples). To avoid the warning about species that appear in fewer than 20% of the samples, I pre-filtered my data, which left a dataframe with 43 species.
This is how I call the function:
```r
AUC_max <- selbal.cv(x = df_X, y = Y,
                     n.iter = 30,
                     n.fold = 5,
                     maxV = 30,
                     opt.cri = "max")
```
I have a few questions regarding the selbal.cv output and its internal logic.
1. How to extract data for a non-optimal number of variables?
The selbal.cv function identifies an "optimal" number of variables (2 in my case). However, I am interested in obtaining balances that include more variables than the suggested optimum. I cannot get the user_numVar argument to work.
The accuracy.nvar plot shows performance over a wider range of variable counts than the optimal one. Since the algorithm already computes these values to draw the plot, is there a way to extract the specific balances, or the underlying data, for these other points?
2. How does the method handle "ties" in pairwise balances?
When I used the published method to compute all pairwise balances manually, I found multiple pairs of species whose balances predict equally well (identical AUC).
In these cases, how does the method select the "best" balance among several candidates with the same performance? Does it follow the original order of the variables in the dataframe, or is there an additional internal criterion?
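To make the tie question concrete, here is a minimal base-R sketch (not selbal's internal code) that scores every two-part balance by AUC and lists all pairs sharing the top value. `auc_mw` and `pairwise_auc` are hypothetical helpers; the sketch assumes zeros have already been replaced (the log ratio is undefined otherwise) and that the response has exactly two levels. It also illustrates that base R's `which.max()` keeps only the first maximum in row order, which is one way a tie could be broken silently; whether selbal does something like this is exactly the open question.

```r
# Hypothetical helper: rank-based (Mann-Whitney) AUC of a numeric score
# against a two-level factor. Not part of selbal.
auc_mw <- function(score, y) {
  y <- factor(y)
  pos <- score[y == levels(y)[2]]
  neg <- score[y == levels(y)[1]]
  r <- rank(c(pos, neg))                 # average ranks handle tied scores
  n1 <- length(pos); n0 <- length(neg)
  (sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical helper: AUC of every two-part balance sqrt(1/2) * log(x_i / x_j).
# Assumes x is strictly positive (zeros already replaced).
pairwise_auc <- function(x, y) {
  x <- as.matrix(x)
  pairs <- t(combn(ncol(x), 2))
  auc <- apply(pairs, 1, function(p) {
    b <- sqrt(1 / 2) * log(x[, p[1]] / x[, p[2]])
    a <- auc_mw(b, y)
    max(a, 1 - a)  # swapping numerator and denominator turns AUC into 1 - AUC
  })
  data.frame(num = colnames(x)[pairs[, 1]],
             den = colnames(x)[pairs[, 2]],
             auc = auc,
             stringsAsFactors = FALSE)
}

# Toy data: sp3 is an exact multiple of sp1, which forces identical AUCs.
set.seed(1)
x <- matrix(rexp(200) + 0.1, nrow = 50,
            dimnames = list(NULL, paste0("sp", 1:4)))
x[, "sp3"] <- 2 * x[, "sp1"]
y <- factor(rep(c("A", "B"), each = 25))

res <- pairwise_auc(x, y)
tied <- which(abs(res$auc - max(res$auc)) < 1e-12)  # every pair at the top AUC
first <- res[which.max(res$auc), ]                   # which.max returns only the first
print(res[tied, ])
```

Listing `tied` rather than relying on `which.max()` makes any tie visible instead of resolving it by column order.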
Thank you in advance!