ExpertGate Nan loss using multi-head classifier #1639
Btw, I wanted to ask on your Slack initially, but the invite link is no longer working.
Heya, happy to try and help you out with this. I wrote this strategy for Avalanche a while ago, and I'd like to make sure it's usable for you. Hopefully we can squash any bugs, if there is something wrong in the implementation. I might be a bit slow to respond, so please bear with me.

It seems like you're using a custom dataset. Were you able to repro this issue with a non-custom dataset? If so, could you please share a minimal example? I want to figure out whether this is a root issue with the implementation or whether there is something specific to your use case. In your current setup, what optimizer and lr are you using?
Hey @niniack, thanks for your contributions. Here is my data processing code:

```python
import torch
from PIL import Image
from torch.utils.data import Dataset


class ImageDataset(Dataset):
    def __init__(self):
        self.data_path = ""
        self.data_name = ""
        self.num_classes = 0
        self.train_transform = None
        self.train_csv_path = ""
        self.image_paths = []
        self.labels = []
        self.class_map = {}  # index -> class name, populated when the CSV is read

    def get_num_classes(self):
        return self.num_classes

    def __getitem__(self, index):
        img_path = self.image_paths[index]
        label = self.labels[index]
        img = Image.open(img_path).convert("RGB")
        if self.train_transform:
            img = self.train_transform(img)
        return img, label

    def __len__(self):
        return len(self.image_paths)

    @property
    def label_dict(self):
        return {i: self.class_map[i] for i in range(self.num_classes)}

    def __repr__(self):
        # len(self), not the bound method self.__len__, must be used here
        return f"ImageDataset({self.data_name}) with {len(self)} instances"


def get_dataloader(dataset, batch_size):
    split_size = int(0.8 * len(dataset))
    train_dataset, val_dataset = torch.utils.data.random_split(
        dataset, [split_size, len(dataset) - split_size]
    )
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=batch_size, shuffle=True,
        drop_last=True, pin_memory=True,
    )
    val_loader = torch.utils.data.DataLoader(
        val_dataset, batch_size=batch_size, shuffle=False,
        drop_last=True, pin_memory=True,
    )
    return train_loader, val_loader, dataset.get_num_classes()
```
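For reference, here is a minimal, self-contained sketch of how those helpers behave, using a toy stand-in dataset of random tensors instead of real images (the `ToyDataset` class, the 32×32 image size, and the batch size of 16 are all my own assumptions for illustration, not part of the original code):

```python
import torch
from torch.utils.data import Dataset, DataLoader, random_split


class ToyDataset(Dataset):
    """Stand-in for ImageDataset: random tensors instead of images on disk."""
    def __init__(self, n=100, num_classes=5):
        # real images would be (3, 224, 224); 32x32 keeps the sketch fast
        self.data = torch.randn(n, 3, 32, 32)
        self.labels = torch.randint(0, num_classes, (n,))
        self.num_classes = num_classes

    def get_num_classes(self):
        return self.num_classes

    def __getitem__(self, index):
        return self.data[index], self.labels[index]

    def __len__(self):
        return len(self.data)


def get_dataloader(dataset, batch_size):
    # same 80/20 split and loader flags as the snippet above
    split_size = int(0.8 * len(dataset))
    train_ds, val_ds = random_split(dataset, [split_size, len(dataset) - split_size])
    train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True, drop_last=True)
    val_loader = DataLoader(val_ds, batch_size=batch_size, shuffle=False, drop_last=True)
    return train_loader, val_loader, dataset.get_num_classes()


train_loader, val_loader, n_classes = get_dataloader(ToyDataset(), batch_size=16)
x, y = next(iter(train_loader))
print(x.shape, n_classes)  # torch.Size([16, 3, 32, 32]) 5
```

One thing worth noting about this setup: with `drop_last=True` on the validation loader, any validation split smaller than `batch_size` yields zero batches, which can silently skip evaluation on small datasets.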
This is the optimizer I am using. I picked the hyperparameters from the continual-learning-baseline repo:

```python
model = ExpertGate(shape=(3, 224, 224), device=device)
optimizer = SGD(
    model.expert.parameters(), lr=0.1, momentum=0.9, weight_decay=0.0005
)
```
In my current experiments, I am trying to set up different datasets (with different numbers of classes) as separate tasks/experiences. While training with ExpertGate, I am not sure whether I am doing the data processing correctly, as I repeatedly get a NaN loss.

Here's my data processing code:

This is how I am initializing the model and strategy:

Here's a sample (ongoing) training log:

Would love to hear any pointers on this. In general, what is the best way to set up dataloaders for my particular setting?