KMN: Results generated by CDF function does not sum to 1 #17

Open
antemooo opened this issue Feb 11, 2021 · 0 comments
Dear all,
For a conditional density estimation (CDE) task, I am trying to use your package to estimate the error distribution of a dependent variable y given a set of features X. My first attempt used the KMN model, created and trained with the default parameters.
Creating, fitting, and the other functions all work without errors. However, when I look at the results returned by the CDF function, they do not approach 1 at the upper end of the range (the cumulative values never converge to 1), no matter what I try. I have generated the CDF with different resolutions and range limits (the number of instances to condition on, and the start and end of the linspace), but the issue persists. Below are three screenshots generated at different resolutions. You can also see the PDF results, which actually match what I expect the conditional distribution to look like. The part of the code related to this issue is included at the end.

By the way, I have also checked the mixture parameters, and the weights do sum to 1 (I don't know whether that is relevant).

Two questions here:

  • The KMN model you are using seems to differ from what is described in the original KMN paper, especially regarding the number of Gaussians used in the output. Is that correct?
  • Did you ever come across this problem while testing or evaluating the model?

Remarks not related to the issue:

  • predict_density() does not exist, although it appears in the generated documentation.
  • What is the effect of n_centers? Isn't the output computed as a mixture of the kernels/Gaussians placed using k-means? Can one still build that output from a specific subset of these Gaussians, e.g. only 1 or 2?

CDF Results

[Three screenshots of the CDF at different resolutions, taken 2021-02-11 at 15:54:39, 15:57:58, and 16:49:25]

PDF Results

[Screenshot of the PDF, taken 2021-02-11 at 17:00:33]

The model and the function used to generate the CDF/PDF results:

import numpy as np
from cde.density_estimator import KernelMixtureNetwork  # import path assumed for the cde package

model = KernelMixtureNetwork("KDE_1", ndim_x=21, ndim_y=1, n_centers=50,
                             x_noise_std=0.2, y_noise_std=0.1, random_seed=22)
model.fit(X_train, y_train, eval_set=(X_test, y_test))

def get_instance_to_draw(instance, lower_limit=-14, upper_limit=14, resolution=1000):
    # Repeat the conditioning instance once per grid point and sweep y over the range.
    x_dist = np.array([instance for _ in range(resolution)])
    y_dist = np.linspace(lower_limit, upper_limit, num=resolution)
    pred_dist = model.pdf(x_dist, y_dist)  # model.cdf(x_dist, y_dist) is called the same way for the CDF plots
    mean, std = model.mean_std(x_dist[0].reshape(1, -1))
    return x_dist, y_dist, pred_dist, mean, std
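For reference, the property in question can be sanity-checked independently of the model: for any Gaussian mixture whose weights sum to 1, the PDF should integrate to roughly 1 over a wide enough range, and the CDF should approach 1 at the upper limit. A minimal numpy-only sketch (the mixture weights, means, and stds below are made up purely for illustration, not taken from the fitted model):

```python
import numpy as np
from math import erf, sqrt

# Hypothetical mixture parameters, chosen only for this sketch.
weights = np.array([0.3, 0.5, 0.2])   # mixture weights, sum to 1
means   = np.array([-2.0, 0.0, 3.0])
stds    = np.array([0.5, 1.0, 0.8])

y = np.linspace(-14, 14, 1000)
z = (y[None, :] - means[:, None]) / stds[:, None]

# Mixture PDF: weighted sum of Gaussian densities.
pdf = np.sum(weights[:, None] * np.exp(-0.5 * z**2)
             / (stds[:, None] * np.sqrt(2 * np.pi)), axis=0)

# Mixture CDF: weighted sum of Gaussian CDFs (via the error function).
phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))))
cdf = np.sum(weights[:, None] * phi(z), axis=0)

# Trapezoidal integral of the PDF over the grid.
area = np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(y))
print(area, cdf[-1])  # both should be very close to 1
```

If the model's weights sum to 1 but its CDF still plateaus well below 1 over a range this wide, that would point to either kernel scales far larger than the queried y-range or a normalization issue in the CDF computation.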