
Conversation

@mattjhayes3 (Owner) commented Dec 5, 2024

Add label_usage.py to torch_geometric.nn.models and a unit test.

Part of #4 (TODO update this) for our final project for the Stanford CS224W course, this PR implements Label Usage as described in “Bag of Tricks for Node Classification with Graph Neural Networks”.

Description of Label Usage

  • Label usage takes true labels as additional input features and learns to predict the labels of the remaining nodes.
  • Within label usage, label reuse recycles the soft labels predicted in previous iterations and feeds them back as input.
  • During training, label usage splits the training nodes into two sets: the labels of one set are used as features, and the model predicts the other set along with all unlabeled nodes.
  • During evaluation, no split is performed: the true labels of all training nodes are used as features, and the model predicts on the validation and test nodes.
  • The base model should have an input dimension of num_features + num_classes to accommodate the concatenation of labels with features (see the sketch below).
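To make the mechanics concrete, here is a minimal, hypothetical sketch of the label usage forward pass with label reuse. The function name and signature are illustrative, not the API added in this PR; it only assumes a base_model whose input dimension is num_features + num_classes.

```python
import torch
import torch.nn.functional as F

def label_usage_forward(base_model, x, y, edge_index, label_idx,
                        unlabeled_idx, num_classes, num_recycles=0):
    # One-hot encode the true labels of nodes in label_idx; all other
    # nodes start with an all-zero label vector.
    onehot = torch.zeros(x.size(0), num_classes, device=x.device)
    onehot[label_idx] = F.one_hot(y[label_idx], num_classes).float()

    # Concatenate labels with node features; the base model expects
    # inputs of size num_features + num_classes.
    out = base_model(torch.cat([x, onehot], dim=-1), edge_index)

    # Label reuse: recycle predicted soft labels as input features.
    for _ in range(num_recycles):
        onehot[unlabeled_idx] = F.softmax(out[unlabeled_idx], dim=-1)
        out = base_model(torch.cat([x, onehot], dim=-1), edge_index)
    return out
```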

Benchmark

Run on the Arxiv (ogbn-arxiv) dataset for 100 epochs, with 10 recycling iterations and a split ratio of 0.6, using a GAT model.

| Dataset | Val Accuracy (%) | Test Accuracy (%) |
| ------- | ---------------- | ----------------- |
| Arxiv   | 69.32            | 68.53             |

"""
# re-assign train_idx to be nodes in mini-batch
if batch is not None:
train_idx = batch
mattjhayes3 (Owner, Author) commented:

batch is never used again, so I'm not sure it makes sense to have both batch and train_idx parameters.
@liuvince, is the current impl what you had in mind for minibatches? In the case of minibatching, I think x and y would just not be the whole dataset.

Collaborator commented:

I can remove batch and document that train_idx can be used for batch indices, since the batches passed in contain the global indices. That seems more intuitive now that I've read your comment and looked through the code again.

Collaborator commented:

I propose the following logic:

In training mode (when self.training == True):

  • when mask is None, we perform the split in the forward pass.
  • when mask is not None, we assume it corresponds to the nodes whose labels are used as node features.

In test mode:

  • when mask is None, we use the whole input as unlabeled data.
  • when mask is not None, we assume it corresponds to the nodes whose labels are used as node features, i.e. the whole or a sampled training set.

Does that make sense?
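A minimal sketch of that proposed branching, assuming hypothetical names (resolve_label_idx, train_idx, split_ratio) that are not fixed by this thread:

```python
import torch

def resolve_label_idx(self, mask, train_idx, split_ratio=0.5):
    """Hypothetical helper illustrating the proposed mask semantics:
    returns the nodes whose true labels are fed in as features."""
    if self.training:
        if mask is None:
            # No mask: split the training nodes inside the forward pass.
            perm = torch.randperm(train_idx.numel(), device=train_idx.device)
            k = int(split_ratio * train_idx.numel())
            return train_idx[perm[:k]]
        # Mask given: these nodes' labels are used as node features.
        return mask
    if mask is None:
        # Test mode without a mask: the whole input is unlabeled.
        return train_idx.new_empty(0)
    # Mask given: typically the whole (or a sampled) training set.
    return mask
```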

@chriskynguyen (Collaborator) commented Dec 5, 2024:

Yeah, I can rename it to mask to keep with convention. Would mask ever be None? I assumed mask would always contain the indices to be trained/tested on. It would make sense not to split the indices and to keep them unlabeled during testing.

mattjhayes3 (Owner, Author) commented:

Also, I'm not sure it makes sense to allow None for mask. In both training and evaluation we need to know which set of nodes it's OK to use true labels for.

Can we update the mask description to be more similar to the examples Vincent linked?
Something like: "A mask or index tensor denoting the nodes whose true labels can be used."

As is, it's a little confusing because it's called "mask" but we describe it as an index tensor. A mask is generally a bool tensor of shape (N,), while an index tensor lists the indices where you would put True in the mask, so it has shape (k,) with k <= N. We can skip this detail on the expected size for this parameter since it is a common concept. I think the code should work for both cases without any changes.
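For illustration, a quick standalone demonstration of that equivalence (not code from the PR):

```python
import torch

x = torch.randn(6, 3)
# Boolean mask of shape (N,): True where a node's label may be used.
bool_mask = torch.tensor([True, False, True, False, False, True])
# Equivalent index tensor of shape (k,), k <= N.
index_tensor = bool_mask.nonzero(as_tuple=False).view(-1)  # tensor([0, 2, 5])

# Advanced indexing accepts both forms and selects the same rows,
# so the same code typically handles either without changes.
assert torch.equal(x[bool_mask], x[index_tensor])
```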


```python
# add labels to features for train_labels_idx nodes,
# zero values for nodes in train_pred_idx
onehot = torch.zeros([x.shape[0], self.num_classes]).to(x.device)
```
mattjhayes3 (Owner, Author) commented Dec 5, 2024:

We might consider making this initialization user-configurable. I'd be interested to benchmark whether using the mean label works better when the number of reuse iterations is zero or low.

Something like:

```python
init: Union[Tensor, float, Literal['mean']] = 0.0
# How to initialize unlabeled examples. 'mean' computes the mean label
# over train_idx. If a tensor is passed, it must be one-dimensional of
# length num_classes.
```
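A hypothetical sketch of how that option might be handled (illustrative only; not code from this PR):

```python
import torch
import torch.nn.functional as F

def init_unlabeled_labels(x, y, train_idx, num_classes, init=0.0):
    # Build the label-feature matrix for all nodes, initializing the
    # unlabeled rows according to `init`.
    if isinstance(init, torch.Tensor):   # fixed (num_classes,) vector
        onehot = init.to(x.device).expand(x.size(0), num_classes).clone()
    elif init == 'mean':                 # mean label of the train_idx nodes
        mean = F.one_hot(y[train_idx], num_classes).float().mean(dim=0)
        onehot = mean.to(x.device).expand(x.size(0), num_classes).clone()
    else:                                # constant fill value
        onehot = torch.full((x.size(0), num_classes), float(init),
                            device=x.device)
    # True labels then overwrite the initialization for train_idx nodes.
    onehot[train_idx] = F.one_hot(y[train_idx], num_classes).float()
    return onehot
```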

@liuvince (Collaborator) changed the title from "Label usage pr for internal review" to "CS224W - Bag of Tricks for Node Classification with GNN - Label Usage" on Dec 8, 2024.
mattjhayes3 (Owner, Author) commented:

Code LGTM modulo the note that there's no need for an explicit training parameter. Let's add a description and send it off!
