Using Laplace approximation for LSTMs as base models #143


Open
keyvan-amiri opened this issue Feb 2, 2024 · 9 comments

@keyvan-amiri

keyvan-amiri commented Feb 2, 2024

Hi everyone, and thanks for all the effort you have put into this library. I am Keyvan, and I am working on a regression task in which we use an existing pre-trained model consisting of LSTM (+ dropout, batch normalization) layers to get a deterministic point estimate, and then try to apply post-hoc prior precision tuning through the Laplace library.
Even though I am following what is written in the documentation of the Laplace library, I get different errors when Laplace.fit is called. For instance, using the default values of the Laplace class I encounter this error: “ValueError: Only 2D inputs are currently supported for MSELoss”, and if I change some of the parameters, this part of the execution, self.H += H_batch (within the fit method of ParametricLaplace), leads to a shape mismatch error. A few days ago I wrote an email to Alexander, and he told me the best way is to create a new issue here to ask for your kind support. I guess you have not yet tried the library with LSTMs, but it might be possible to do it with the different backends now. More details about my implementation are as follows:
Loading the pretrained model and calling the post-hoc Laplace method:

model.load_state_dict(checkpoint['model_state_dict'])
post_hoc_laplace(arguments.net, model, train_loader, device=device,
                 approx_type=approx_type, link_approx=link_approx,
                 pred_type=pred_type, hessian_structure=hessian_structure,
                 subset_of_weights=subset_of_weights,
                 last_layer_name=last_layer_name,
                 prior_precision=prior_precision,
                 prior_structure=prior_structure,
                 optimize_prior_precision=optimize_prior_precision,
                 temperature=temperature, nr_components=nr_components,
                 n_samples=n_samples, backend=backend)

The relevant method is:

from laplace import Laplace

def post_hoc_laplace(net, model, train_loader, device=None, approx_type=None,
                     link_approx=None, pred_type=None, hessian_structure=None,
                     subset_of_weights=None, last_layer_name=None,
                     prior_precision=None, prior_structure=None,
                     optimize_prior_precision=None, temperature=None,
                     nr_components=None, n_samples=None, backend=None):
    Backend = get_backend(backend, approx_type)  # get the backend class for Hessian computations
    model = model.to(device)
    model.train()
    optional_args = dict()  # empty dict for optional args
    if subset_of_weights == 'last_layer':
        optional_args['last_layer_name'] = last_layer_name
    la = Laplace(model, likelihood='regression',
                 subset_of_weights=subset_of_weights,
                 hessian_structure=hessian_structure,
                 prior_precision=prior_precision,
                 temperature=temperature, backend=Backend,
                 **optional_args)
    la.fit(train_loader)

and the error is as follows:
File "/home/kamiriel/miniconda3/envs/laplace/lib/python3.8/site-packages/laplace/baselaplace.py", line 379, in fit
self.H += H_batch
RuntimeError: The size of tensor a (6) must match the size of tensor b (192) at non-singleton dimension 1
while the pretrained model is defined as follows:

class DALSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, n_layers, max_len, linear_hidden_size, dropout=True, p_fix=0.2):
        '''
        ARGUMENTS:
        input_size: number of features
        hidden_size: number of neurons in LSTM layers
        n_layers: number of LSTM layers
        max_len: maximum length for prefixes in the dataset
        linear_hidden_size: number of neurons in the first linear layer
        dropout: apply dropout if "True", otherwise no dropout
        p_fix: dropout probability
        '''
        super(DALSTMModel, self).__init__()
        
        self.n_layers = n_layers 
        self.dropout = dropout
        self.lstm1 = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.lstm2 = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.dropout_layer = nn.Dropout(p=p_fix)
        self.batch_norm1 = nn.BatchNorm1d(max_len)
        self.batch_norm2 = nn.BatchNorm1d(linear_hidden_size)
        self.linear1 = nn.Linear(hidden_size, linear_hidden_size) 
        self.linear2 = nn.Linear(linear_hidden_size, 1)  
        
    def forward(self, x):
        
        x, (hidden_state,cell_state) = self.lstm1(x)
        if self.dropout:
            x = self.dropout_layer(x)
        x = self.batch_norm1(x)
        if self.n_layers > 1:
            for i in range(self.n_layers - 1):
                x, (hidden_state,cell_state) = self.lstm2(x, (hidden_state,cell_state))
                if self.dropout:
                    x = self.dropout_layer(x)
                x = self.batch_norm1(x)
        x = self.linear1(x[:, -1, :]) # only the last one in the sequence 
        x = nn.functional.relu(x)
        x = self.batch_norm2(x)
        yhat = self.linear2(x)

        return yhat.squeeze(dim=1)

Some additional context:
The input of the model has shape batch_size × sequence_dim × feature_dim (in this example 32 × 144 × 53), while the output shape of the model is batch_size (1D), since we have a regression task at hand. The last layer of the model is a linear layer of size linear_hidden_size (in this example 5).
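
For a quick sanity check of these shapes, here is a minimal sketch (hidden_size=100 and n_layers=2 are placeholders; input_size=53, max_len=144, and linear_hidden_size=5 match the example above):

import torch

# Hypothetical smoke test of the shapes described above.
model = DALSTMModel(input_size=53, hidden_size=100, n_layers=2,
                    max_len=144, linear_hidden_size=5)
x = torch.randn(32, 144, 53)  # (batch_size, sequence_dim, feature_dim)
print(model(x).shape)         # torch.Size([32]) because of .squeeze(dim=1)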
The relevant parameters are set through a config file as follows:

laplace:
    approx_type: ggn 
    link_approx: mc 
    pred_type: glm 
    hessian_structure: full 
    subset_of_weights: last_layer 
    last_layer_name: linear2 
    prior_precision: 1.0 
    prior_structure: scalar 
    optimize_prior_precision: marglik 
    nr_components: 1 
    n_samples: 30 

and the get_backend helper is this simple function:

from laplace.curvature import AsdlGGN, AsdlEF, BackPackGGN, BackPackEF

def get_backend(backend, approx_type):
    if backend == 'kazuki':
        if approx_type == 'ggn':
            return AsdlGGN
        else:
            return AsdlEF
    elif backend == 'backpack':
        if approx_type == 'ggn':
            return BackPackGGN
        else:
            return BackPackEF
    else:
        raise ValueError(f'Unknown backend: {backend}')

Your support is highly appreciated.
Please let me know if I should provide more details.
Regards,
Keyvan

@runame
Collaborator

runame commented Feb 6, 2024

Hi Keyvan, can you change the last line of the forward() method of your model from return yhat.squeeze(dim=1) to return yhat and then try the default settings for Laplace again? This will fix the shape to conform to the requirements of the backpack backend.
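
In other words, something like this (a sketch; only the last line of forward() changes relative to the model definition above):

def forward(self, x):
    # ... body unchanged from the model definition above ...
    yhat = self.linear2(x)
    return yhat  # shape (batch_size, 1) rather than (batch_size,)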

@keyvan-amiri
Author

keyvan-amiri commented Feb 14, 2024

Hi Runa, I changed the shape of the output as you suggested and tried the default settings for Laplace, but I got another error:

Traceback (most recent call last):
  File "main.py", line 251, in <module>
    main()
  File "main.py", line 213, in main
    post_hoc_laplace(arguments.net, model, train_loader, device=device,
  File "/work/kamiriel/PPM-UQ/training_models.py", line 719, in post_hoc_laplace
    la.fit(train_loader)
  File "/home/kamiriel/miniconda3/envs/laplace/lib/python3.8/site-packages/laplace/lllaplace.py", line 114, in fit
    super().fit(train_loader, override=override)
  File "/home/kamiriel/miniconda3/envs/laplace/lib/python3.8/site-packages/laplace/baselaplace.py", line 797, in fit
    super().fit(train_loader, override=override)
  File "/home/kamiriel/miniconda3/envs/laplace/lib/python3.8/site-packages/laplace/baselaplace.py", line 377, in fit
    loss_batch, H_batch = self._curv_closure(X, y, N)
  File "/home/kamiriel/miniconda3/envs/laplace/lib/python3.8/site-packages/laplace/baselaplace.py", line 777, in _curv_closure
    return self.backend.kron(X, y, N=N)
  File "/home/kamiriel/miniconda3/envs/laplace/lib/python3.8/site-packages/laplace/curvature/backpack.py", line 134, in kron
    loss.backward()
  File "/home/kamiriel/miniconda3/envs/laplace/lib/python3.8/site-packages/torch/_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "/home/kamiriel/miniconda3/envs/laplace/lib/python3.8/site-packages/torch/autograd/__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: cudnn RNN backward can only be called in training mode

It would be perfect if you could advise me on this error. I am calling model.train() before fitting Laplace, so the model should be in training mode, but setting the model to train mode does not seem to take effect: if I remove the call, I get the same error message. Thanks a lot for your support.

@runame
Collaborator

runame commented Feb 14, 2024

We call model.eval() within the Laplace implementation, which is why you keep seeing this error. Can you try the following setting?

torch.backends.cudnn.enabled = False

This might result in a slow-down, but should make it work.
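
For example (a sketch; set the flag once before fitting):

import torch

# Laplace calls model.eval() internally, so the cuDNN RNN backward
# restriction applies; disabling cuDNN avoids it at some speed cost.
torch.backends.cudnn.enabled = False
la.fit(train_loader)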

@keyvan-amiri
Author

Hi Runa, torch.backends.cudnn.enabled = False works for me, and now I am able to fit Laplace without any error. Thanks for your prompt support.

I have another issue, which might not be relevant to the overall topic of this issue: the default setting is to use only the last layer for the approximation. I tried to use the subnetwork option (both ModuleNameSubnetMask and LargestMagnitudeSubnetMask). However, while the indices of the model are extracted correctly and the conditions isinstance(subnetwork_indices, torch.cuda.LongTensor), subnetwork_indices.numel() > 0, and len(subnetwork_indices.shape) == 1 are all satisfied, I encounter this error: ValueError: Subnetwork indices must be non-empty 1-dimensional torch.LongTensor. And when I try the full network, I get this error: NotImplementedError: Extension saving to kflr does not have an extension for Module <class 'training_models.DALSTMModel'>. It would be perfect if you could advise me on this issue.

@runame
Collaborator

runame commented Feb 15, 2024

Glad it works now.

Can you double-check that the conditions actually hold? This error is only raised when at least one of these conditions does not hold. Regarding applying Laplace to the full network, the issue is that not all modules used in your model are supported by our two backends, ASDL and BackPACK. Unless you extend these backends, there is no way of applying Laplace to the full NN.

@keyvan-amiri
Author

keyvan-amiri commented Feb 15, 2024

Thanks for your advice. I see your point about the full network, but regarding the subnetwork, I am quite sure that everything should work. I am using this code to check the conditions:

import torch
from laplace import Laplace
from laplace.utils import LargestMagnitudeSubnetMask

subnetwork_mask = LargestMagnitudeSubnetMask(model, n_params_subnet=16)
subnetwork_indices = subnetwork_mask.select()
print(isinstance(subnetwork_indices, torch.cuda.LongTensor))
print(subnetwork_indices.numel())
print(len(subnetwork_indices.shape))
print(subnetwork_indices)
la = Laplace(model, likelihood='regression',
             subset_of_weights=subset_of_weights,
             subnetwork_indices=subnetwork_indices,
             hessian_structure=hessian_structure,
             backend=BackPackGGN)

and this is the result:

True
16
1
tensor([544200, 544201, 544203, 544204, 544205, 544207, 544208, 544210, 544211,
        544212, 544213, 544228, 544229, 544230, 544232, 544993],  device='cuda:0')
ValueError: Subnetwork indices must be non-empty 1-dimensional torch.LongTensor.

Once again, thanks for your prompt support.

@runame
Collaborator

runame commented Feb 15, 2024

Did you install the package via pip install laplace-torch? If yes, can you try to install it from the current main branch of the repo, i.e. clone the repo and install via pip install -e .? I just noticed that in the last release we only checked isinstance(subnetwork_indices, torch.LongTensor) and not for torch.cuda.LongTensor, but this is fixed on main.
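
For context, a small illustration of why the released check failed (assuming a CUDA device is available):

import torch

# A CUDA long tensor is an instance of torch.cuda.LongTensor,
# not torch.LongTensor, so the old isinstance check returned False.
t = torch.tensor([0, 1], device='cuda')
print(isinstance(t, torch.LongTensor))       # False
print(isinstance(t, torch.cuda.LongTensor))  # True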

@keyvan-amiri
Author

Yeah, that's correct, and installing from the repo resolves the problem. Unfortunately, I am still not able to use this option, since I guess the batch normalization layer in my model is not supported by the Laplace library and its backends. Anyway, thanks a lot for your support.

@wiseodd
Collaborator

wiseodd commented Mar 12, 2024

@keyvan-amiri could you try this with our new backend? Install the main branch of laplace-torch:

pip uninstall laplace-torch
pip install git+https://[email protected]/aleximmer/laplace

Then use the diagonal GGN/EF from curvlinops (I don't think curvlinops KFAC supports BatchNorm):

from laplace import Laplace
from laplace.curvature import CurvlinopsGGN, CurvlinopsEF

model = ...
la = Laplace(model, likelihood, hessian_structure='diag', backend=CurvlinopsEF)
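
A possible continuation (a sketch, assuming train_loader and X_test are your data; the prior precision tuning mirrors the marglik option from your config):

la.fit(train_loader)
la.optimize_prior_precision(method='marglik')  # post-hoc prior precision tuning
f_mu, f_var = la(X_test)  # predictive mean and variance for regression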
