KMN: Results generated by CDF function does not sum to 1 #17

Open
antemooo opened this issue Feb 11, 2021 · 0 comments
Dear all,
For a conditional density estimation (CDE) task, I am trying to use your package to estimate the error distribution of a dependent variable y given a set of features X. My first attempt used the KMN model, created and trained with the default parameters.
Creating, fitting, and the other functions all work without errors. However, when I look at the results returned by the CDF function, they do not approach 1 at the upper end of the range (the cumulative values never converge to 1), no matter what I try. I have generated the CDF with different resolutions and range limits (the number of instances to condition on, and the start and end of the linspace), but the issue persists. Below are three screenshots generated at different resolutions. You can also see the PDF results, which actually match what I expect the conditional distribution to look like. The part of the code related to this issue is included at the end.

By the way, I have also checked the mixture parameters, and the weights do sum to 1 (I don't know whether that is relevant).

Two questions here:

  • The KMN model you are using seems to differ from what is described in the original KMN paper, especially regarding the number of Gaussians used in the output. Is that correct?
  • Did you ever come across this problem while testing or evaluating the model?

Remarks not related to the issue:

  • predict_density() does not exist, although it appears in the generated documentation.
  • What is the effect of n_centers? Isn't the output computed as a mixture of the kernels/Gaussians placed using k-means? Can one still build that output from a specific subset of these Gaussians, e.g. only 1 or 2?

CDF Results

[Three screenshots of the CDF at different resolutions, taken 2021-02-11 at 15:54:39, 15:57:58, and 16:49:25]

PDF Results

[Screenshot of the PDF, taken 2021-02-11 at 17:00:33]

The model and the function used to generate the CDF/PDF results:

import numpy as np
from cde.density_estimator import KernelMixtureNetwork  # import path assumed for the cde package

model = KernelMixtureNetwork("KDE_1", ndim_x=21, ndim_y=1, n_centers=50,
                             x_noise_std=0.2, y_noise_std=0.1, random_seed=22)
model.fit(X_train, y_train, eval_set=(X_test, y_test))

def get_instance_to_draw(instance, lower_limit=-14, upper_limit=14, resolution=1000):
    # Repeat the conditioning instance once per grid point and sweep y over the range.
    x_dist = np.array([instance for _ in range(resolution)])
    y_dist = np.linspace(lower_limit, upper_limit, num=resolution)
    pred_dist = model.pdf(x_dist, y_dist)  # model.cdf(x_dist, y_dist) is called the same way for the CDF plots
    mean, std = model.mean_std(x_dist[0].reshape(1, -1))
    return x_dist, y_dist, pred_dist, mean, std
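For reference, the property in question can be sanity-checked independently of the model: for any Gaussian mixture whose weights sum to 1, the PDF should integrate to roughly 1 over a wide enough range, and the CDF should approach 1 at the upper limit. A minimal numpy-only sketch (the mixture weights, means, and stds below are made up purely for illustration, not taken from the fitted model):

```python
import numpy as np
from math import erf, sqrt

# Hypothetical mixture parameters, chosen only for this sketch.
weights = np.array([0.3, 0.5, 0.2])   # mixture weights, sum to 1
means   = np.array([-2.0, 0.0, 3.0])
stds    = np.array([0.5, 1.0, 0.8])

y = np.linspace(-14, 14, 1000)
z = (y[None, :] - means[:, None]) / stds[:, None]

# Mixture PDF: weighted sum of Gaussian densities.
pdf = np.sum(weights[:, None] * np.exp(-0.5 * z**2)
             / (stds[:, None] * np.sqrt(2 * np.pi)), axis=0)

# Mixture CDF: weighted sum of Gaussian CDFs (via the error function).
phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))))
cdf = np.sum(weights[:, None] * phi(z), axis=0)

# Trapezoidal integral of the PDF over the grid.
area = np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(y))
print(area, cdf[-1])  # both should be very close to 1
```

If the model's weights sum to 1 but its CDF still plateaus well below 1 over a range this wide, that would point to either kernel scales far larger than the queried y-range or a normalization issue in the CDF computation.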