Extracting non-optimal balances and tie-breaking criteria #40

@Dalcon2704

First of all, thank you for creating selbal and for the support you provide to the community.

I am using the selbal.cv function with a dichotomous response (n ≈ 50 samples). To avoid the warning about species that appear in fewer than 20% of samples, I pre-filtered my data, which leaves a dataframe with 43 species.
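For reference, the kind of prevalence filter meant here can be sketched in base R as follows. This is only an illustration on made-up toy data: the object names (`toy_X`, `prevalence`, `keep`) are hypothetical, and the 20% cut-off is simply the threshold mentioned in the warning.

```r
# Toy count table: 10 samples (rows) x 4 species (columns); values are made up.
toy_X <- data.frame(
  sp_common = c(5, 3, 0, 8, 2, 7, 1, 4, 6, 9),   # non-zero in 9 of 10 samples
  sp_medium = c(0, 0, 3, 0, 1, 0, 2, 0, 0, 4),   # non-zero in 4 of 10 samples
  sp_rare   = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 2),   # non-zero in 1 of 10 samples
  sp_low    = c(1, 0, 0, 0, 2, 0, 0, 0, 0, 3)    # non-zero in 3 of 10 samples
)

# Prevalence = fraction of samples in which a species has a non-zero count.
prevalence <- colMeans(toy_X > 0)

# Keep only species present in at least 20% of samples.
keep <- prevalence >= 0.20
toy_X_filtered <- toy_X[, keep, drop = FALSE]

print(prevalence)
print(colnames(toy_X_filtered))
```

In this toy table only `sp_rare` falls below the threshold and is dropped.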

This is how I call the function:

AUC_max <- selbal.cv(x = df_X, y = Y,
                     n.iter = 30,
                     n.fold = 5,
                     maxV = 30,
                     opt.cri = "max")

I have a few questions regarding the selbal.cv output and its internal logic.

1. How to extract data for a non-optimal number of variables?

The selbal.cv function identifies an "optimal" number of variables (2 in my case). However, I am interested in obtaining balances that include more variables than the suggested optimum. I cannot get the user_numVar argument to work.

The plot in the accuracy.nvar object shows performance for a wider range of variable counts than the optimum alone. Since the algorithm already computes these values to draw the plot, is there a way to extract the corresponding balances, or the raw data, for those other points?

2. How does the method handle "ties" in pairwise balances?

When I used the published method to compute all the pairwise balances manually, I found multiple pairs of species whose balances predict the response equally well (identical AUC).

In these cases, how does the method select the "best" balance among several candidates with the same performance? Does it follow the original order of the variables in the dataframe, or is there an additional internal criterion?
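To make the situation concrete, here is a minimal base-R sketch on made-up toy data. It is not selbal's internal code: the helper `auc` and the objects `toy_X`, `res`, `tied` and `first` are hypothetical names, and `which.max()` is used only to show what a "first maximum wins" rule would do. Whether selbal actually resolves ties this way is exactly what I am asking.

```r
# Rank-based AUC (Mann-Whitney form); hypothetical helper, not from selbal.
auc <- function(score, y) {
  r  <- rank(score)                 # average ranks for tied scores
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy strictly positive abundances: 6 samples x 3 species (made-up values).
toy_X <- data.frame(
  A = c(1, 2, 1, 8, 9, 8),
  B = c(4, 4, 4, 4, 4, 4),
  C = c(2, 4, 2, 16, 18, 16)
)
y <- c(0, 0, 0, 1, 1, 1)

# All pairwise balances log(x_i / x_j) with i < j, each scored by AUC.
pairs <- t(combn(colnames(toy_X), 2))
res <- data.frame(num = pairs[, 1], den = pairs[, 2])
res$auc <- apply(pairs, 1, function(p) {
  a <- auc(log(toy_X[[p[1]]] / toy_X[[p[2]]]), y)
  max(a, 1 - a)   # swapping numerator and denominator turns AUC into 1 - AUC
})

print(res)

tied  <- res[res$auc == max(res$auc), ]  # every pair that reaches the best AUC
first <- res[which.max(res$auc), ]       # which.max() returns the first maximum only

print(tied)
print(first)
```

In this toy example the pairs (A, B) and (B, C) both reach the best AUC, and a first-maximum rule picks (A, B) purely because of column order. Reordering the columns would change the selected pair without changing the performance.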
Thank you in advance!
