API: proof of concept #29

abhidg · 2024-11-21T12:06:35Z

An end-user API should provide at least the following:

Patch constructor for each day
Node features for each patch
Node renumbering: nodes should be numbered $0, \ldots, |V|-1$, where $V$ is the full vertex set, and also $0, \ldots, |P_k|-1$ within each patch $P_k \subset V$. There should be a way to get back both the numbering in the full vertex set and the node label from the patch numbering.
Generate embeddings for each patch using a method such as VGAE. Methods could be abstract classes or functions following a typing.Protocol.
Align embeddings using an EmbeddingMethod (could be local2global, or the new method being developed).
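
The renumbering requirement could be sketched roughly as follows. `NodeIndex` and `PatchIndex` are hypothetical names used only for illustration (they are not part of l2g), and node labels are assumed to be unique strings.

```python
from dataclasses import dataclass, field


@dataclass
class NodeIndex:
    """Numbering 0..|V|-1 for the full vertex set (hypothetical helper)."""

    labels: list[str]                        # position i holds the label of node i
    ids: dict[str, int] = field(init=False)  # reverse lookup: label -> global id

    def __post_init__(self) -> None:
        self.ids = {label: i for i, label in enumerate(self.labels)}
        if len(self.ids) != len(self.labels):
            raise ValueError("node labels must be unique")


@dataclass
class PatchIndex:
    """Numbering 0..|P_k|-1 for one patch, with a route back to V."""

    full: NodeIndex
    global_ids: list[int]                    # position j holds the global id of local node j

    @classmethod
    def from_labels(cls, full: NodeIndex, patch_labels: list[str]) -> "PatchIndex":
        # Sort so the local numbering does not depend on the input order
        return cls(full, sorted(full.ids[label] for label in patch_labels))

    def to_global(self, local_id: int) -> int:
        return self.global_ids[local_id]

    def to_label(self, local_id: int) -> str:
        return self.full.labels[self.global_ids[local_id]]


V = NodeIndex(["alice", "bob", "carol", "dave"])
P_k = PatchIndex.from_labels(V, ["dave", "bob"])
print(P_k.global_ids)    # [1, 3]
print(P_k.to_label(1))   # dave
```

Keeping the global ids in a list indexed by local id makes both lookups constant time, and the same list can index directly into a feature matrix for the full graph.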

Further processing of embeddings, such as using them for classification, is out of scope for this issue.

abhidg · 2024-11-25T12:40:46Z

Sketch

import numpy as np

from l2g import make_patch_graph, DataLoader, make_embedding, align_embeddings

# TODO: see what other graph embedding libraries use and try to be compatible
# L2Gv2 should be able to work with any embedding
from l2g.embeddings import VGAEEmbedding

# Local2Global is the old algorithm, ManifoldOptimizer the new one
from l2g.align import Local2Global, ManifoldOptimizer

# Load data
ds = DataLoader('l2gv2/nas')  # loads from web (HuggingFace?)

# patch_identifier is either a str or a callable V -> str
P = make_patch_graph(ds, patch_identifier=...)
vgae = VGAEEmbedding(**kwargs)

# Create embeddings, can use trivial parallelism here (multiprocessing.Pool)
embs: dict[str, np.ndarray] = make_embedding(vgae, P)  # calls emb.fit_transform(P[i]) for patch node i
# Open question: do node and edge embeddings need to be disambiguated?

# Alignment
aligner = ManifoldOptimizer()

# .fit() could generate the alignment criteria (scaling, orthogonal transformations and translation),
# whereas .fit_transform() applies them. It is not clear whether keeping the two separate makes sense.
X = aligner.fit_transform(embs)  # X is an xarray object with node labels
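
To make the typing.Protocol idea and the .fit() / .fit_transform() question more concrete, here is a minimal stdlib-only sketch. Everything in it is an assumption for illustration: `PatchEmbedder`, `DegreeEmbedder` and `CentroidAligner` are hypothetical names, nested lists stand in for NumPy arrays, patches are plain edge lists, and the toy aligner only translates each patch to its centroid. It is not VGAE, Local2Global or the manifold optimiser.

```python
from typing import Protocol

Matrix = list[list[float]]        # stand-in for np.ndarray, one row per node
Patch = list[tuple[int, int]]     # hypothetical patch type: edge list over local node ids


class PatchEmbedder(Protocol):
    """Any object with a matching fit_transform qualifies; no base class needed."""

    def fit_transform(self, patch: Patch) -> Matrix: ...


class DegreeEmbedder:
    """Toy embedder (not VGAE): one coordinate per node, equal to its degree."""

    def fit_transform(self, patch: Patch) -> Matrix:
        n = 1 + max(max(u, v) for u, v in patch)
        degree = [0.0] * n
        for u, v in patch:
            degree[u] += 1.0
            degree[v] += 1.0
        return [[d] for d in degree]


def make_embedding(method: PatchEmbedder, patches: dict[str, Patch]) -> dict[str, Matrix]:
    # Patches are independent, so this loop could be handed to multiprocessing.Pool
    return {name: method.fit_transform(patch) for name, patch in patches.items()}


class CentroidAligner:
    """Toy translation-only aligner: moves each patch centroid to the origin."""

    def fit(self, embs: dict[str, Matrix]) -> "CentroidAligner":
        # Compute and store one translation vector per patch; nothing is applied yet
        self.shift_ = {
            name: [-sum(col) / len(rows) for col in zip(*rows)]
            for name, rows in embs.items()
        }
        return self

    def transform(self, embs: dict[str, Matrix]) -> dict[str, Matrix]:
        # Apply the stored translations
        return {
            name: [[x + s for x, s in zip(row, self.shift_[name])] for row in rows]
            for name, rows in embs.items()
        }

    def fit_transform(self, embs: dict[str, Matrix]) -> dict[str, Matrix]:
        return self.fit(embs).transform(embs)


patches = {"day1": [(0, 1), (1, 2)], "day2": [(0, 1)]}
embs = make_embedding(DegreeEmbedder(), patches)
aligner = CentroidAligner().fit(embs)
print(aligner.shift_["day2"])            # [-1.0]
print(aligner.transform(embs)["day2"])   # [[0.0], [0.0]]
```

One argument for keeping the two methods separate is visible here: after .fit() the stored transformations can be inspected or reused, while .fit_transform() remains a one-line convenience built on top of them.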

We need to consider how much of this is portable to large graphs (perhaps by using dask and xarray). Should the use of multiple processors, a GPU or a cluster be transparent to the user, which adds complexity? Or should we handle that ourselves (such as using the CPU for toy datasets), allowing the user to override as necessary?
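
One possible shape for the "sensible default, user can override" option is sketched below using only the standard library. `map_patches` is a hypothetical helper, not an existing l2g function: it runs serially by default and accepts any `concurrent.futures.Executor` when the caller wants parallelism.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_patches(
    fn: Callable[[T], R],
    patches: dict[str, T],
    executor: Optional[Executor] = None,
) -> dict[str, R]:
    """Apply fn to every patch; serial unless the caller supplies an executor."""
    names = list(patches)
    if executor is None:
        # Library default: a plain loop, adequate for toy datasets
        results = [fn(patches[name]) for name in names]
    else:
        # User override: any concurrent.futures.Executor (threads, processes, ...)
        results = list(executor.map(fn, (patches[name] for name in names)))
    return dict(zip(names, results))


patches = {"day1": [1, 2, 3], "day2": [4, 5]}
serial = map_patches(sum, patches)
with ThreadPoolExecutor(max_workers=2) as pool:
    threaded = map_patches(sum, patches, executor=pool)
print(serial)              # {'day1': 6, 'day2': 9}
print(serial == threaded)  # True
```

With this design the library never has to guess about hardware; the cost is that users who want a GPU or cluster backend need to supply a suitable executor themselves.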

abhidg self-assigned this Nov 21, 2024

abhidg assigned lotzma and mihaeladuta Nov 28, 2024

abhidg changed the title ~~Start developing end-user API for temporal graphs~~ API: proof of concept Jan 10, 2025

abhidg added a commit that referenced this issue Jan 10, 2025

datasets: add node renumbering code #29

d37375d

abhidg mentioned this issue Jan 15, 2025

i29 api #39

Open
