It is not possible make equal discretization #136

Open
arilwan opened this issue Jun 25, 2024 · 0 comments
Labels
bug Something isn't working

Comments

arilwan commented Jun 25, 2024

I keep receiving the following warning when computing meta-features for my dataset:

site-packages/pymfe/_internal.py:1568: UserWarning: It is not possible make equal discretization
  warnings.warn("It is not possible make equal discretization")

At first, I thought this happened because my dataset exhibits class imbalance, so I generated pseudo-data with class proportions matching my dataset, like so:

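from sklearn.datasets import make_classification
from pymfe.mfe import MFE

# Four weights for five classes: scikit-learn infers the weight of the last class.
X, y = make_classification(n_samples=10_000, n_features=80, n_informative=60,
                           n_classes=5, weights=[0.23, 0.03, 0.16, 0.52],
                           random_state=42)

extractor = MFE(features=["t1"], groups=["complexity"],
                summary=["min", "max", "mean", "sd"])
extractor.fit(X, y)
extractor.extract()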

This works fine, without that warning.

I understand this has to do with _equal_freq_discretization(), specifically:

def _equal_freq_discretization(
    data: np.ndarray, num_bins: int, tol: float = 1e-8
) -> np.ndarray:
    """Discretize a 1-D numeric array into an equal-frequency histogram."""
    hist_divs = np.quantile(data, np.linspace(0, 1, num_bins + 1)[1:])

    # Sometimes the 'hist_divs' is not appropriated.
    # For example when all values are constants. It implies in 'hist_divs'
    # repetitive values.
    # To avoid partitions with the same value, we check if all partitions are
    # different. Unfortunately, it leads to a non-equal frequency
    # discretization.
    prev_size = hist_divs.size

    hist_divs = hist_divs[np.append(True, np.diff(hist_divs) > tol)]

    if prev_size != hist_divs.size:
        warnings.warn("It is not possible make equal discretization")

    hist_divs = np.unique(hist_divs)

    return np.digitize(x=data, bins=hist_divs, right=True)

But I cannot understand what is causing the warning in my data, or whether I can avoid it. If I cannot, how does it affect the final result?
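
For reference, here is a minimal sketch that reproduces the duplicate-edge check outside pymfe, using only NumPy. It suggests the warning fires when a feature has so many tied values that adjacent quantile edges coincide, in which case the resulting bins no longer hold equal counts. The helper name `quantile_edges`, the toy arrays, and `num_bins=4` are my own illustration and assumptions, not part of pymfe, and I have not checked which bin count pymfe actually passes in.

```python
import numpy as np


def quantile_edges(data, num_bins, tol=1e-8):
    """Return the raw quantile edges and the edges left after dropping near-duplicates.

    Mirrors the filtering step of the quoted function; hypothetical helper.
    """
    raw = np.quantile(data, np.linspace(0, 1, num_bins + 1)[1:])
    kept = raw[np.append(True, np.diff(raw) > tol)]
    return raw, kept


# Feature without ties: 1000 distinct values.
smooth = np.arange(1000, dtype=float)
raw_smooth, kept_smooth = quantile_edges(smooth, num_bins=4)
bins_smooth = np.bincount(np.digitize(smooth, bins=kept_smooth, right=True))

# Heavily tied feature: 900 zeros followed by the values 1..100.
tied = np.concatenate([np.zeros(900), np.arange(1, 101, dtype=float)])
raw_tied, kept_tied = quantile_edges(tied, num_bins=4)
bins_tied = np.bincount(np.digitize(tied, bins=kept_tied, right=True))

# A size mismatch between raw and kept edges is what triggers the warning.
print(raw_smooth.size, kept_smooth.size, bins_smooth)  # 4 4 [250 250 250 250]
print(raw_tied.size, kept_tied.size, bins_tied)        # 4 2 [900 100]
```

In the tied case three of the four quantile edges equal 0, so only two edges survive and the feature ends up in two bins of very different size. Running the same helper over each column of a real dataset (for example `quantile_edges(X[:, j], num_bins)`) may help locate the columns responsible.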

@arilwan arilwan added the bug Something isn't working label Jun 25, 2024