An end user API should have at least the following points:
- Patch constructor for each day
- Node features for each patch
- Node renumbering: nodes should be numbered $0 \ldots |V|-1$, where $V$ is the full vertex set, but also $0 \ldots |P_k|-1$, where $P_k \subset V$ is a patch. There should be a way to recover both a node's index in the full vertex set and its label.
- Generate embeddings for each patch using a method such as VGAE. Methods could be abstract classes or functions following a `typing.Protocol`.
- Align embeddings using an `EmbeddingMethod` (could be local2global, or the new method being developed).
- Further processing of embeddings, such as using them for classification, is out of scope for this issue.
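To make the renumbering requirement concrete, here is a minimal sketch of the global ↔ local index mapping; all names (`Patch`, `to_global`, `to_local`) are hypothetical, not part of any existing l2g API:

```python
class Patch:
    """Minimal sketch of local/global node numbering for a patch.

    global_ids[i] is the full-graph index of local node i, and
    local_index inverts that mapping. Names are illustrative only.
    """

    def __init__(self, global_ids: list[int], labels: dict[int, str]):
        self.global_ids = list(global_ids)  # local index -> global index
        self.local_index = {g: i for i, g in enumerate(global_ids)}  # global -> local
        self.labels = labels  # global index -> node label

    def to_global(self, local: int) -> int:
        return self.global_ids[local]

    def to_local(self, global_id: int) -> int:
        return self.local_index[global_id]

    def label(self, local: int) -> str:
        return self.labels[self.global_ids[local]]


# A patch containing global nodes 4, 7, 9 of the full graph
p = Patch([4, 7, 9], {4: "a", 7: "b", 9: "c"})
assert p.to_global(0) == 4
assert p.to_local(9) == 2
assert p.label(1) == "b"
```

Keeping both directions of the mapping on the patch object makes it cheap to move between patch-local embeddings and full-graph coordinates during alignment.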
```python
from l2g import make_patch_graph, DataLoader, make_embedding, align_embeddings
# TODO: see what other graph embedding libraries use and try to be compatible
# L2Gv2 should be able to work with any embedding
from l2g.embeddings import VGAEEmbedding
# Local2Global is the old algorithm, ManifoldOptimizer the new one
from l2g.align import Local2Global, ManifoldOptimizer

import numpy as np

# Load data
ds = DataLoader('l2gv2/nas')  # loads from web (HuggingFace?)
# patch_identifier: str | Callable[[V], str]
P = make_patch_graph(ds, patch_identifier)

vgae = VGAEEmbedding(**kwargs)

# Create embeddings; can use trivial parallelism here (multiprocessing.Pool)
embs: dict[str, np.ndarray] = make_embedding(vgae, P)  # calls emb.fit_transform(P[i]) for patch node i
# ^ do node and edge embeddings need to be disambiguated?

# Alignment
aligner = ManifoldOptimizer()
# .fit() could generate the alignment criteria (scaling, orthogonal transformations
# and translation), whereas .fit_transform() applies them. Not clear whether keeping
# them separate makes sense.
X = aligner.fit_transform(embs)  # X is an xarray with node labels
```
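The `EmbeddingMethod` abstraction mentioned above could be expressed as a `typing.Protocol`, so any object with a compatible `fit_transform` satisfies it structurally, without inheritance. This is only a sketch; the protocol name and signature are assumptions:

```python
from typing import Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class EmbeddingMethod(Protocol):
    """Anything that can embed a patch graph into a node-coordinate matrix."""

    def fit_transform(self, patch) -> np.ndarray: ...


class RandomEmbedding:
    """Dummy implementation used only to show the protocol is structural."""

    def __init__(self, dim: int = 2, seed: int = 0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)

    def fit_transform(self, patch) -> np.ndarray:
        n_nodes = len(patch)  # assume the patch exposes its node count
        return self.rng.standard_normal((n_nodes, self.dim))


emb = RandomEmbedding(dim=4)
assert isinstance(emb, EmbeddingMethod)  # satisfied structurally, no subclassing
X = emb.fit_transform(range(10))         # stand-in "patch" with 10 nodes
assert X.shape == (10, 4)
```

A `Protocol` keeps the library open to third-party embedding methods (the "work with any embedding" goal), since users only need to match the signature, not import a base class.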
We need to consider how much of this is portable to large graphs (perhaps by using dask and xarray). Should the use of multiprocessing / GPU / cluster resources be transparent to the user, which adds implementation complexity, or should we handle it ourselves (e.g. using the CPU for toy datasets) while allowing the user to override as necessary?
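The per-patch embedding step is embarrassingly parallel, so a first cut could hide the parallelism inside `make_embedding` behind a plain pool. The sketch below uses a thread pool to stay self-contained; a process pool, GPU dispatch, or dask would be the scalable variants, and all names here are illustrative:

```python
from multiprocessing.pool import ThreadPool

import numpy as np


def embed_patch(patch: list[int]) -> np.ndarray:
    """Stand-in for emb.fit_transform(patch): one 2-d coordinate per node."""
    return np.zeros((len(patch), 2))


def make_embedding(patches: dict[str, list[int]], n_workers: int = 4) -> dict:
    """Embed every patch in parallel; returns {patch_id: coordinates}."""
    with ThreadPool(n_workers) as pool:
        results = pool.map(embed_patch, patches.values())
    return dict(zip(patches.keys(), results))


patches = {"day1": [0, 1, 2], "day2": [1, 2, 3, 4]}
embs = make_embedding(patches)
assert embs["day1"].shape == (3, 2)
assert embs["day2"].shape == (4, 2)
```

Because the pool lives inside `make_embedding`, the same call signature could later dispatch to a process pool or a dask cluster without changing user code, with `n_workers`-style parameters as the override hook.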