
Conversation

@mattjhayes3 (Owner) commented Dec 5, 2024

Add label_usage.py to torch_geometric.nn.models and a unit test.

Part of #4 (TODO update this) for our final project for the Stanford CS224W course, this PR implements Label Usage as described in “Bag of Tricks for Node Classification with Graph Neural Networks”.

Description of Label Usage

  • Label usage takes true labels as additional input features and learns to predict the labels of the remaining nodes.
  • Within label usage, label reuse recycles the soft labels predicted in previous iterations and feeds them back as input.
  • During training, label usage splits the training nodes into two sets: the labels of one set are used as features, and the model predicts the other set along with all unlabeled nodes.
  • During evaluation, no split is performed: the true labels of all training nodes are used as features, and the model predicts on the validation and test nodes.
  • The base model should have an input dimension of num_features + num_classes to accommodate the concatenation of labels with features (see the sketch below).
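To make the mechanics concrete, here is a minimal, hypothetical sketch of the label usage forward pass with label reuse. The function name and signature are illustrative, not the API added in this PR; it only assumes a base_model whose input dimension is num_features + num_classes.

```python
import torch
import torch.nn.functional as F

def label_usage_forward(base_model, x, y, edge_index, label_idx,
                        unlabeled_idx, num_classes, num_recycles=0):
    # One-hot encode the true labels of nodes in label_idx; all other
    # nodes start with an all-zero label vector.
    onehot = torch.zeros(x.size(0), num_classes, device=x.device)
    onehot[label_idx] = F.one_hot(y[label_idx], num_classes).float()

    # Concatenate labels with node features; the base model expects
    # inputs of size num_features + num_classes.
    out = base_model(torch.cat([x, onehot], dim=-1), edge_index)

    # Label reuse: recycle predicted soft labels as input features.
    for _ in range(num_recycles):
        onehot[unlabeled_idx] = F.softmax(out[unlabeled_idx], dim=-1)
        out = base_model(torch.cat([x, onehot], dim=-1), edge_index)
    return out
```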

Benchmark

Run on the Arxiv (ogbn-arxiv) dataset for 100 epochs, with 10 recycling iterations and a split ratio of 0.6, using a GAT model.

| Dataset | Val Accuracy (%) | Test Accuracy (%) |
| ------- | ---------------- | ----------------- |
| Arxiv   | 69.32            | 68.53             |

"""
# re-assign train_idx to be nodes in mini-batch
if batch is not None:
train_idx = batch
mattjhayes3 (Owner, Author) commented:

batch is never used again, so I'm not sure it makes sense to have both batch and train_idx parameters.
@liuvince, is the current impl what you had in mind for minibatches? In the case of minibatching, I think x and y would just not be the whole dataset.

Collaborator commented:

I can remove batch and document that train_idx can be used for batch indices, since the batches passed in contain the global indices. That seems more intuitive now that I've read your comment and looked through the code again.

Collaborator commented:

I propose the following logic:

In training mode (when self.training == True):

  • when mask is None, we perform the split in the forward pass.
  • when mask is not None, we assume it corresponds to the nodes whose labels are used as node features.

In test mode:

  • when mask is None, we use the whole input as unlabeled data.
  • when mask is not None, we assume it corresponds to the nodes whose labels are used as node features, i.e. the whole or a sampled training set.

Does that make sense?
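A minimal sketch of that proposed branching, assuming hypothetical names (resolve_label_idx, train_idx, split_ratio) that are not fixed by this thread:

```python
import torch

def resolve_label_idx(self, mask, train_idx, split_ratio=0.5):
    """Hypothetical helper illustrating the proposed mask semantics:
    returns the nodes whose true labels are fed in as features."""
    if self.training:
        if mask is None:
            # No mask: split the training nodes inside the forward pass.
            perm = torch.randperm(train_idx.numel(), device=train_idx.device)
            k = int(split_ratio * train_idx.numel())
            return train_idx[perm[:k]]
        # Mask given: these nodes' labels are used as node features.
        return mask
    if mask is None:
        # Test mode without a mask: the whole input is unlabeled.
        return train_idx.new_empty(0)
    # Mask given: typically the whole (or a sampled) training set.
    return mask
```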

@chriskynguyen (Collaborator) commented Dec 5, 2024:

Yeah, I can rename it to mask to keep with convention. Would mask ever be None? I assumed mask would always contain the indices to be trained/tested on. It would make sense not to split the indices and to keep them unlabeled during testing.

mattjhayes3 (Owner, Author) commented:

Also, I'm not sure it makes sense to allow None for mask. In both training and evaluation we need to know which set of nodes it's OK to use true labels for.

Can we update the mask description to be more similar to the examples Vincent linked?
Something like: "A mask or index tensor denoting the nodes whose true labels can be used."

As is, it's a little confusing because it's called "mask" but we describe it as an index tensor. A mask is generally a bool tensor of shape (N,), while an index tensor lists the indices where you would put True in the mask, so it has shape (k,) with k <= N. We can skip this detail on the expected size for this parameter since it is a common concept. I think the code should work for both cases without any changes.
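For illustration, a quick standalone demonstration of that equivalence (not code from the PR):

```python
import torch

x = torch.randn(6, 3)
# Boolean mask of shape (N,): True where a node's label may be used.
bool_mask = torch.tensor([True, False, True, False, False, True])
# Equivalent index tensor of shape (k,), k <= N.
index_tensor = bool_mask.nonzero(as_tuple=False).view(-1)  # tensor([0, 2, 5])

# Advanced indexing accepts both forms and selects the same rows,
# so the same code typically handles either without changes.
assert torch.equal(x[bool_mask], x[index_tensor])
```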


```python
# add labels to features for train_labels_idx nodes,
# zero values for nodes in train_pred_idx
onehot = torch.zeros([x.shape[0], self.num_classes]).to(x.device)
```
mattjhayes3 (Owner, Author) commented Dec 5, 2024:

We might consider making this initialization user-configurable. I'd be interested to benchmark whether using the mean label works better when the number of reuse iterations is zero or low.

Something like:

```python
init: Union[Tensor, float, Literal['mean']] = 0.0
# How to initialize unlabeled examples. 'mean' computes the mean label
# over train_idx. If a tensor is passed, it must be one-dimensional of
# length num_classes.
```
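A hypothetical sketch of how that option might be handled (illustrative only; not code from this PR):

```python
import torch
import torch.nn.functional as F

def init_unlabeled_labels(x, y, train_idx, num_classes, init=0.0):
    # Build the label-feature matrix for all nodes, initializing the
    # unlabeled rows according to `init`.
    if isinstance(init, torch.Tensor):   # fixed (num_classes,) vector
        onehot = init.to(x.device).expand(x.size(0), num_classes).clone()
    elif init == 'mean':                 # mean label of the train_idx nodes
        mean = F.one_hot(y[train_idx], num_classes).float().mean(dim=0)
        onehot = mean.to(x.device).expand(x.size(0), num_classes).clone()
    else:                                # constant fill value
        onehot = torch.full((x.size(0), num_classes), float(init),
                            device=x.device)
    # True labels then overwrite the initialization for train_idx nodes.
    onehot[train_idx] = F.one_hot(y[train_idx], num_classes).float()
    return onehot
```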

@liuvince (Collaborator) changed the title from "Label usage pr for internal review" to "CS224W - Bag of Tricks for Node Classification with GNN - Label Usage" on Dec 8, 2024.
mattjhayes3 (Owner, Author) commented:

Code LGTM modulo the note that there's no need for an explicit training parameter. Let's add a description and send it off!
