API: proof of concept #29

Open
5 tasks
abhidg opened this issue Nov 21, 2024 · 1 comment

abhidg commented Nov 21, 2024

An end-user API should provide at least the following:

  • Patch constructor for each day
  • Node features for each patch
  • Node renumbering: nodes should be numbered $0, \ldots, |V|-1$, where $V$ is the full vertex set, but also $0, \ldots, |P_k|-1$ within each patch $P_k \subset V$. There should be a way to recover both the node numbering in the full vertex set and the node label
  • Generate embeddings for each patch using a method such as VGAE. Methods could be abstract classes or functions following a typing.Protocol
  • Align embeddings using an EmbeddingMethod (could be local2global, or the new method being developed).
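
The node renumbering point could be sketched as below. This is only an illustration under assumptions: `NodeIndex` and its methods are hypothetical names, not part of l2g, and node labels are assumed to be hashable. The idea is that a patch is stored as an array of global ids, so a position in that array is the patch-local id and the array itself is the local-to-global map.

```python
import numpy as np


class NodeIndex:
    """Hypothetical helper (not an l2g class): labels <-> contiguous ids."""

    def __init__(self, labels):
        # Global ids 0..|V|-1 follow the order in which labels are given.
        self.labels = list(labels)
        self.global_id = {label: i for i, label in enumerate(self.labels)}

    def patch(self, patch_labels):
        # Entry k holds the global id of the node with local id k.
        return np.array([self.global_id[lab] for lab in patch_labels], dtype=np.int64)

    def to_labels(self, global_ids):
        # Recover the original node labels from global ids.
        return [self.labels[i] for i in global_ids]


index = NodeIndex(["a", "b", "c", "d", "e"])        # full vertex set V
local_to_global = index.patch(["d", "b", "e"])      # patch P_k, local ids 0, 1, 2
global_to_local = {int(g): k for k, g in enumerate(local_to_global)}
patch_labels = index.to_labels(local_to_global)
```

Keeping the patch as a plain integer array means per-patch node features can be sliced out of a global feature matrix with `features[local_to_global]`.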

Further processing of embeddings, such as using them for classification, is out of scope for this issue.

abhidg self-assigned this on Nov 21, 2024

abhidg commented Nov 25, 2024

Sketch

import numpy as np

from l2g import make_patch_graph, DataLoader, make_embedding, align_embeddings

# TODO: see what other graph embedding libraries use and try to be compatible
# L2Gv2 should be able to work with any embedding
from l2g.embeddings import VGAEEmbedding

# Local2Global is the old algorithm, ManifoldOptimizer the new one
from l2g.align import Local2Global, ManifoldOptimizer

# Load data
ds = DataLoader('l2gv2/nas')  # loads from web (HuggingFace?)

# patch_identifier is either a str or a callable V -> str
P = make_patch_graph(ds, patch_identifier)
vgae = VGAEEmbedding(**kwargs)

# Create embeddings, can use trivial parallelism here (multiprocessing.Pool)
embs: dict[str, np.ndarray] = make_embedding(vgae, P)  # calls emb.fit_transform(P[i]) for each patch node i
# ^ Do node and edge embeddings need to be disambiguated?

# Alignment
aligner = ManifoldOptimizer()

# .fit() could generate the alignment criteria (scaling, orthogonal transformations and translation)
# whereas .fit_transform() applies it. Not clear whether keeping them separate makes sense.
X = aligner.fit_transform(embs)  # X is an xarray array with node labels
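
On the `.fit()` versus `.fit_transform()` question, here is a minimal sketch of what a split could look like. It is not the Local2Global or ManifoldOptimizer algorithm: it swaps in a plain similarity Procrustes fit between one patch and a reference, and `SimilarityAligner` is a hypothetical name. `fit` estimates scale, orthogonal transformation and translation from the nodes two patches share, and `transform` applies them to every node of the patch.

```python
import numpy as np


class SimilarityAligner:
    """Hypothetical aligner for illustration; not the l2g API."""

    def fit(self, source, target):
        # source, target: (n, d) coordinates of the same n nodes in two patches.
        self.mu_source_ = source.mean(axis=0)
        self.mu_target_ = target.mean(axis=0)
        src = source - self.mu_source_
        tgt = target - self.mu_target_
        # Orthogonal Procrustes: the R minimising ||src @ R - tgt|| is U @ Vt.
        u, s, vt = np.linalg.svd(src.T @ tgt)
        self.rotation_ = u @ vt
        self.scale_ = s.sum() / (src ** 2).sum()
        return self

    def transform(self, x):
        # Apply the fitted scaling, orthogonal map and translation.
        return self.scale_ * (x - self.mu_source_) @ self.rotation_ + self.mu_target_

    def fit_transform(self, source, target):
        return self.fit(source, target).transform(source)


# Synthetic check: the reference is a scaled, rotated, shifted copy of the patch.
rng = np.random.default_rng(0)
patch = rng.normal(size=(20, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
reference = 2.5 * patch @ q + np.array([1.0, -2.0, 0.5])

overlap = np.arange(8)                        # nodes shared by both patches
aligner = SimilarityAligner().fit(patch[overlap], reference[overlap])
aligned = aligner.transform(patch)            # applied to all 20 nodes
```

In this toy setting the split is useful because the fit only sees the overlap while the transform is applied to the whole patch. Whether that carries over to an aligner that optimises all patches jointly is a separate question.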

Need to consider how much of this is portable to large graphs (perhaps by using dask and xarray). Should the choice of multiprocessing / GPU / cluster be exposed to the user, which adds complexity, or should we handle that ourselves (such as using the CPU for toy datasets), allowing the user to override as necessary?
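
One possible shape for the "we handle it, the user can override" option is sketched below, together with a `typing.Protocol` for embedding methods. Everything here is an assumption for illustration: the `make_embedding` signature, the `max_workers` parameter, the default thresholds, and the toy `RandomProjection` method are hypothetical, patches are stood in for by feature matrices, and a thread pool replaces `multiprocessing.Pool` to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Mapping, Optional, Protocol

import numpy as np


class EmbeddingMethod(Protocol):
    """Anything with a fit_transform(patch) method qualifies structurally."""

    def fit_transform(self, patch: np.ndarray) -> np.ndarray: ...


class RandomProjection:
    """Toy stand-in for a real method such as VGAE."""

    def __init__(self, dim: int, seed: int = 0):
        self.dim = dim
        self.seed = seed

    def fit_transform(self, patch: np.ndarray) -> np.ndarray:
        rng = np.random.default_rng(self.seed)
        return patch @ rng.normal(size=(patch.shape[1], self.dim))


def make_embedding(
    method: EmbeddingMethod,
    patches: Mapping[str, np.ndarray],
    max_workers: Optional[int] = None,
) -> dict[str, np.ndarray]:
    # Library-chosen default (arbitrary thresholds): serial for few patches.
    if max_workers is None:
        max_workers = 1 if len(patches) < 8 else 4
    keys = list(patches)
    if max_workers == 1:
        results = [method.fit_transform(patches[k]) for k in keys]
    else:
        # Patches are independent, so they can be embedded concurrently.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results = list(pool.map(method.fit_transform, [patches[k] for k in keys]))
    return dict(zip(keys, results))


patches = {"p0": np.ones((5, 4)), "p1": np.ones((3, 4))}
serial = make_embedding(RandomProjection(dim=2), patches)                  # default
threaded = make_embedding(RandomProjection(dim=2), patches, max_workers=2)  # override
```

The user only sees one optional argument, and the same pattern could later select a device or a dask cluster behind that default.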

abhidg changed the title from "Start developing end-user API for temporal graphs" to "API: proof of concept" on Jan 10, 2025
abhidg added a commit that referenced this issue on Jan 10, 2025
abhidg mentioned this issue on Jan 15, 2025