A question about the running time of UMAP #975
LaJollaClustering
started this conversation in
General
Replies: 2 comments 6 replies
-
Hi LaJollaClustering,
The actual SpectralEmbedding we are doing in UMAP is on the centroids of
the connected components of the UMAP complex (rescaled k nearest neighbour
graph). Its main advantage is that it initializes these disconnected
components into half reasonable locations within our space. Given that
there are typically very few connected components the computational cost is
minimal. I hope that helps.
…On Mon, Feb 27, 2023 at 5:22 PM LaJollaClustering ***@***.***> wrote:
Hi there:
I am trying to read the UMAP algorithm carefully because the idea is
really great!
I have quick question about the running time of UMAP. In this paper,
https://arxiv.org/pdf/1802.03426.pdf, it claims that the empirical
running time is N^{1.14} which is due to the construction of kNN graph.
However, in the Algorithm 4 (Spectral embedding), it has to calculate the
eigenvectors of a N x N matrix. I assume that both A,D,L are N x N
matrices. Did I make any mistakes?
—
Reply to this email directly, view it on GitHub
<#975>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AC3IUWXMAWGKORBB4SIXUJLWZUSIVANCNFSM6AAAAAAVJ7BAFU>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Beta Was this translation helpful? Give feedback.
1 reply
-
The initialization via a spectral approach uses a sparse matrix -- and the expected number of non-zeros is roughly O(kN), so the complexity is lower. It is also worth noting that, in practical terms, the spectral initialization is almost free in comparison to the nearest neighbor and layout optimization phases. |
Beta Was this translation helpful? Give feedback.
5 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Hi there:
I am trying to read the UMAP algorithm carefully because the idea is really great!
I have quick question about the running time of UMAP. In this paper, https://arxiv.org/pdf/1802.03426.pdf, it claims that the empirical running time is N^{1.14} which is due to the construction of kNN graph.
However, in the Algorithm 4 (Spectral embedding), it has to calculate the eigenvectors of a N x N matrix. I assume that both A,D,L are N x N matrices. Did I make any mistakes? If A, D, L are N x N matrices, then the running time would be N^2, right?
Beta Was this translation helpful? Give feedback.
All reactions