Parametric UMAP saves multiple copies of full input data #1118

Open
bnelsj opened this issue May 2, 2024 · 2 comments
Comments

@bnelsj

bnelsj commented May 2, 2024

Parametric UMAP stores multiple copies of the full input data, which are not needed to transform new data points. After deleting `self._raw_data` and `self._knn_search_index._raw_data` from my Parametric UMAP model object, the saved model shrank from 90 GB to 300 MB (the input data is a distance matrix with 80K locations). This might not work for models that need additional training, but perhaps it should be an option when model size is an issue?
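
A minimal sketch of the workaround described above. The helper name `strip_attrs` and the stand-in object are hypothetical and only illustrate the idea; the two attribute paths are the ones named in this issue, and whether other attributes also hold input data is not covered here. The stand-in replaces a real fitted model so the snippet runs without umap-learn installed.

```python
import pickle
from types import SimpleNamespace

# Attribute paths named in this issue; other internals are not touched.
RAW_DATA_PATHS = ["_raw_data", "_knn_search_index._raw_data"]


def strip_attrs(model, dotted_paths):
    """Delete each attribute given as a dotted path; skip paths that are absent.

    Returns the list of paths that were actually removed.
    """
    removed = []
    for path in dotted_paths:
        *parents, leaf = path.split(".")
        obj = model
        for name in parents:
            obj = getattr(obj, name, None)
            if obj is None:
                break
        if obj is not None and hasattr(obj, leaf):
            delattr(obj, leaf)
            removed.append(path)
    return removed


# Hypothetical stand-in for a fitted model (NOT a real ParametricUMAP object).
stand_in = SimpleNamespace(
    _raw_data=[[0.0] * 100 for _ in range(100)],
    _knn_search_index=SimpleNamespace(
        _raw_data=[[0.0] * 100 for _ in range(100)]
    ),
    encoder_weights=[0.1, 0.2],
)

size_before = len(pickle.dumps(stand_in))
removed = strip_attrs(stand_in, RAW_DATA_PATHS)
size_after = len(pickle.dumps(stand_in))
print(f"removed {removed}; pickled size {size_before} -> {size_after} bytes")
```

On a real model the same call would be made just before saving. As noted above, a model stripped this way may no longer support additional training.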

@timsainb
Collaborator

timsainb commented May 3, 2024 via email

@bartbroere

I'm having the same issue: I want the trained model to be as small as possible, because the inference machine has less memory than the training machine. I'll link a PR in which I added a parameter to the save method that removes the raw data.
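
One way an opt-in flag like this could behave, as a rough sketch only: it is not the code from the PR, and `dump_without_raw_data` and `strip_raw_data` are hypothetical names. It pickles a shallow copy with the raw data removed, so the in-memory model keeps its data for further training. The real Parametric UMAP save method also has to handle the Keras networks, which this sketch does not attempt; it is demonstrated on a plain stand-in object.

```python
import copy
import io
import pickle
from types import SimpleNamespace


def dump_without_raw_data(model, file, strip_raw_data=True):
    """Pickle `model` to an open binary file, optionally omitting raw data.

    The in-memory `model` is left unchanged: only shallow copies are edited.
    """
    to_save = model
    if strip_raw_data:
        # Shallow copy: new attribute dict, attribute values are shared.
        to_save = copy.copy(model)
        to_save.__dict__.pop("_raw_data", None)
        index = getattr(model, "_knn_search_index", None)
        if index is not None:
            slim_index = copy.copy(index)
            slim_index.__dict__.pop("_raw_data", None)
            to_save._knn_search_index = slim_index
    pickle.dump(to_save, file)


# Hypothetical stand-in for a fitted model (NOT a real ParametricUMAP object).
demo = SimpleNamespace(
    _raw_data=[[1.0] * 50 for _ in range(50)],
    _knn_search_index=SimpleNamespace(_raw_data=[[1.0] * 50 for _ in range(50)]),
    encoder_weights=[0.1, 0.2],
)

full, slim = io.BytesIO(), io.BytesIO()
dump_without_raw_data(demo, full, strip_raw_data=False)
dump_without_raw_data(demo, slim, strip_raw_data=True)
loaded = pickle.loads(slim.getvalue())
print(len(full.getvalue()), "bytes with raw data,", len(slim.getvalue()), "without")
```

Copying shallowly avoids duplicating the large arrays in memory just to drop them, which matters when the raw data is tens of gigabytes.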
