Using Laplace approximation for LSTMs as base models #143


Open
keyvan-amiri opened this issue Feb 2, 2024 · 9 comments

@keyvan-amiri

keyvan-amiri commented Feb 2, 2024

Hi everyone, and thanks for all the effort you have put into this library. I am Keyvan, and I am working on a regression task in which we use an existing pre-trained model consisting of LSTM (+ dropout, batch normalization) layers to get a deterministic point estimate, and then try to apply post-hoc prior precision tuning through the Laplace library.
Even though I am following what is written in the documentation of the Laplace library, I get different errors when Laplace.fit is called. For instance, using the default values of the Laplace class I encounter this error: “ValueError: Only 2D inputs are currently supported for MSELoss”, and if I change some of the parameters, this part of the execution, self.H += H_batch (within the fit method of ParametricLaplace), leads to a shape mismatch error. A few days ago I wrote an email to Alexander, and he told me the best way is to create a new issue here to ask for your kind support. I guess you have not yet tried the library with LSTMs, but it might be possible to do it with the different backends now. More details about my implementation are as follows:
Loading the pretrained model and calling the post-hoc Laplace method:

model.load_state_dict(checkpoint['model_state_dict'])
post_hoc_laplace(arguments.net, model, train_loader, device=device,
                 approx_type=approx_type, link_approx=link_approx,
                 pred_type=pred_type, hessian_structure=hessian_structure,
                 subset_of_weights=subset_of_weights,
                 last_layer_name=last_layer_name,
                 prior_precision=prior_precision,
                 prior_structure=prior_structure,
                 optimize_prior_precision=optimize_prior_precision,
                 temperature=temperature, nr_components=nr_components,
                 n_samples=n_samples, backend=backend)

The relevant method is:

from laplace import Laplace

def post_hoc_laplace(net, model, train_loader, device=None, approx_type=None,
                     link_approx=None, pred_type=None, hessian_structure=None,
                     subset_of_weights=None, last_layer_name=None,
                     prior_precision=None, prior_structure=None,
                     optimize_prior_precision=None, temperature=None,
                     nr_components=None, n_samples=None, backend=None):
    Backend = get_backend(backend, approx_type)  # get the backend class for Hessian computations
    model = model.to(device)
    model.train()
    optional_args = dict()  # empty dict for optional args
    if subset_of_weights == 'last_layer':
        optional_args['last_layer_name'] = last_layer_name
    la = Laplace(model, likelihood='regression',
                 subset_of_weights=subset_of_weights,
                 hessian_structure=hessian_structure,
                 prior_precision=prior_precision,
                 temperature=temperature, backend=Backend,
                 **optional_args)
    la.fit(train_loader)

and the error is as follows:
File "/home/kamiriel/miniconda3/envs/laplace/lib/python3.8/site-packages/laplace/baselaplace.py", line 379, in fit
self.H += H_batch
RuntimeError: The size of tensor a (6) must match the size of tensor b (192) at non-singleton dimension 1
while the pretrained model is defined as follows:

class DALSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, n_layers, max_len, linear_hidden_size, dropout=True, p_fix=0.2):
        '''
        ARGUMENTS:
        input_size: number of features
        hidden_size: number of neurons in LSTM layers
        n_layers: number of LSTM layers
        max_len: maximum length for prefixes in the dataset
        linear_hidden_size: number of neurons in the first linear layer
        dropout: apply dropout if "True", otherwise no dropout
        p_fix: dropout probability
        '''
        super(DALSTMModel, self).__init__()
        
        self.n_layers = n_layers 
        self.dropout = dropout
        self.lstm1 = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.lstm2 = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.dropout_layer = nn.Dropout(p=p_fix)
        self.batch_norm1 = nn.BatchNorm1d(max_len)
        self.batch_norm2 = nn.BatchNorm1d(linear_hidden_size)
        self.linear1 = nn.Linear(hidden_size, linear_hidden_size) 
        self.linear2 = nn.Linear(linear_hidden_size, 1)  
        
    def forward(self, x):
        
        x, (hidden_state,cell_state) = self.lstm1(x)
        if self.dropout:
            x = self.dropout_layer(x)
        x = self.batch_norm1(x)
        if self.n_layers > 1:
            for i in range(self.n_layers - 1):
                x, (hidden_state,cell_state) = self.lstm2(x, (hidden_state,cell_state))
                if self.dropout:
                    x = self.dropout_layer(x)
                x = self.batch_norm1(x)
        x = self.linear1(x[:, -1, :]) # only the last one in the sequence 
        x = nn.functional.relu(x)
        x = self.batch_norm2(x)
        yhat = self.linear2(x)

        return yhat.squeeze(dim=1)

Some additional context:
The input of the model has shape batch_size × sequence_dim × feature_dim (in this example 32 × 144 × 53), while the output shape of the model is batch_size (1D), since we have a regression task at hand. The last layer of the model is a linear layer of size linear_hidden_size (in this example 5).
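
For a quick sanity check of these shapes, here is a minimal sketch (hidden_size=100 and n_layers=2 are placeholders; input_size=53, max_len=144, and linear_hidden_size=5 match the example above):

import torch

# Hypothetical smoke test of the shapes described above.
model = DALSTMModel(input_size=53, hidden_size=100, n_layers=2,
                    max_len=144, linear_hidden_size=5)
x = torch.randn(32, 144, 53)  # (batch_size, sequence_dim, feature_dim)
print(model(x).shape)         # torch.Size([32]) because of .squeeze(dim=1)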
The relevant parameters are set through a config file as follows:

laplace:
    approx_type: ggn 
    link_approx: mc 
    pred_type: glm 
    hessian_structure: full 
    subset_of_weights: last_layer 
    last_layer_name: linear2 
    prior_precision: 1.0 
    prior_structure: scalar 
    optimize_prior_precision: marglik 
    nr_components: 1 
    n_samples: 30 

and the get_backend helper is this simple function:

from laplace.curvature import AsdlGGN, AsdlEF, BackPackGGN, BackPackEF

def get_backend(backend, approx_type):
    if backend == 'kazuki':
        if approx_type == 'ggn':
            return AsdlGGN
        else:
            return AsdlEF
    elif backend == 'backpack':
        if approx_type == 'ggn':
            return BackPackGGN
        else:
            return BackPackEF
    else:
        raise ValueError(f'Unknown backend: {backend}')

Your support is highly appreciated.
Please let me know if I should provide more details.
Regards,
Keyvan

@runame
Collaborator

runame commented Feb 6, 2024

Hi Keyvan, can you change the last line of the forward() method of your model from return yhat.squeeze(dim=1) to return yhat and then try the default settings for Laplace again? This will fix the shape to conform to the requirements of the backpack backend.
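
In other words, something like this (a sketch; only the last line of forward() changes relative to the model definition above):

def forward(self, x):
    # ... body unchanged from the model definition above ...
    yhat = self.linear2(x)
    return yhat  # shape (batch_size, 1) rather than (batch_size,)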

@keyvan-amiri
Author

keyvan-amiri commented Feb 14, 2024

Hi Runa, I changed the shape of the output as you suggested and tried the default settings for Laplace, but I got another error:

Traceback (most recent call last):
  File "main.py", line 251, in <module>
    main()
  File "main.py", line 213, in main
    post_hoc_laplace(arguments.net, model, train_loader, device=device,
  File "/work/kamiriel/PPM-UQ/training_models.py", line 719, in post_hoc_laplace
    la.fit(train_loader)
  File "/home/kamiriel/miniconda3/envs/laplace/lib/python3.8/site-packages/laplace/lllaplace.py", line 114, in fit
    super().fit(train_loader, override=override)
  File "/home/kamiriel/miniconda3/envs/laplace/lib/python3.8/site-packages/laplace/baselaplace.py", line 797, in fit
    super().fit(train_loader, override=override)
  File "/home/kamiriel/miniconda3/envs/laplace/lib/python3.8/site-packages/laplace/baselaplace.py", line 377, in fit
    loss_batch, H_batch = self._curv_closure(X, y, N)
  File "/home/kamiriel/miniconda3/envs/laplace/lib/python3.8/site-packages/laplace/baselaplace.py", line 777, in _curv_closure
    return self.backend.kron(X, y, N=N)
  File "/home/kamiriel/miniconda3/envs/laplace/lib/python3.8/site-packages/laplace/curvature/backpack.py", line 134, in kron
    loss.backward()
  File "/home/kamiriel/miniconda3/envs/laplace/lib/python3.8/site-packages/torch/_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "/home/kamiriel/miniconda3/envs/laplace/lib/python3.8/site-packages/torch/autograd/__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: cudnn RNN backward can only be called in training mode

It would be perfect if you could advise me on this error. I am calling model.train() before fitting Laplace, so the model should be in training mode, but setting the model to train mode does not seem to take effect: if I remove the call, I get the same error message. Thanks a lot for your support.

@runame
Collaborator

runame commented Feb 14, 2024

We call model.eval() within the Laplace implementation, which is why you keep seeing this error. Can you try the following setting?

torch.backends.cudnn.enabled = False

This might result in a slow-down, but should make it work.
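
For example (a sketch; set the flag once before fitting):

import torch

# Laplace calls model.eval() internally, so the cuDNN RNN backward
# restriction applies; disabling cuDNN avoids it at some speed cost.
torch.backends.cudnn.enabled = False
la.fit(train_loader)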

@keyvan-amiri
Author

Hi Runa, torch.backends.cudnn.enabled = False works for me, and now I am able to fit Laplace without any error. Thanks for your prompt support.

I have another issue, which might not be relevant to the overall topic of this issue: the default setting is to use only the last layer for the approximation. I tried to use the subnetwork option (both ModuleNameSubnetMask and LargestMagnitudeSubnetMask). However, while the indices of the model are extracted correctly and the conditions isinstance(subnetwork_indices, torch.cuda.LongTensor), subnetwork_indices.numel() > 0, and len(subnetwork_indices.shape) == 1 are all satisfied, I encounter this error: ValueError: Subnetwork indices must be non-empty 1-dimensional torch.LongTensor. And when I try the full network, I get this error: NotImplementedError: Extension saving to kflr does not have an extension for Module <class 'training_models.DALSTMModel'>. It would be perfect if you could advise me on this issue.

@runame
Collaborator

runame commented Feb 15, 2024

Glad it works now.

Can you double-check that the conditions actually hold? This error is only raised when at least one of these conditions does not hold. Regarding applying Laplace to the full network, the issue is that not all modules used in your model are supported by our two backends, ASDL and BackPACK. Unless you extend these backends, there is no way of applying Laplace to the full NN.

@keyvan-amiri
Author

keyvan-amiri commented Feb 15, 2024

Thanks for your advice. I see your point about the full network, but regarding the subnetwork, I am quite sure that everything should work. I am using this code to check the conditions:

import torch
from laplace import Laplace
from laplace.utils import LargestMagnitudeSubnetMask

subnetwork_mask = LargestMagnitudeSubnetMask(model, n_params_subnet=16)
subnetwork_indices = subnetwork_mask.select()
print(isinstance(subnetwork_indices, torch.cuda.LongTensor))
print(subnetwork_indices.numel())
print(len(subnetwork_indices.shape))
print(subnetwork_indices)
la = Laplace(model, likelihood='regression',
             subset_of_weights=subset_of_weights,
             subnetwork_indices=subnetwork_indices,
             hessian_structure=hessian_structure,
             backend=BackPackGGN)

and this is the result:

True
16
1
tensor([544200, 544201, 544203, 544204, 544205, 544207, 544208, 544210, 544211,
        544212, 544213, 544228, 544229, 544230, 544232, 544993],  device='cuda:0')
ValueError: Subnetwork indices must be non-empty 1-dimensional torch.LongTensor.

Once again, thanks for your prompt support.

@runame
Collaborator

runame commented Feb 15, 2024

Did you install the package via pip install laplace-torch? If yes, can you try to install it from the current main branch of the repo, i.e. clone the repo and install via pip install -e .? I just noticed that in the last release we only checked isinstance(subnetwork_indices, torch.LongTensor) and not for torch.cuda.LongTensor, but this is fixed on main.
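
For context, a small illustration of why the released check failed (assuming a CUDA device is available):

import torch

# A CUDA long tensor is an instance of torch.cuda.LongTensor,
# not torch.LongTensor, so the old isinstance check returned False.
t = torch.tensor([0, 1], device='cuda')
print(isinstance(t, torch.LongTensor))       # False
print(isinstance(t, torch.cuda.LongTensor))  # True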

@keyvan-amiri
Author

Yeah, that's correct, and installing from the repo resolves the problem. Unfortunately, I am still not able to use this option, since I guess the batch normalization layer in my model is not supported by the Laplace library and its backends. Anyway, thanks a lot for your support.

@wiseodd
Collaborator

wiseodd commented Mar 12, 2024

@keyvan-amiri could you try this with our new backend? Install the main branch of laplace-torch:

pip uninstall laplace-torch
pip install git+https://[email protected]/aleximmer/laplace

Then use the diagonal GGN/EF from curvlinops (I don't think curvlinops KFAC supports BatchNorm):

from laplace import Laplace
from laplace.curvature import CurvlinopsGGN, CurvlinopsEF

model = ...
la = Laplace(model, likelihood, hessian_structure='diag', backend=CurvlinopsEF)
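
A possible continuation (a sketch, assuming train_loader and X_test are your data; the prior precision tuning mirrors the marglik option from your config):

la.fit(train_loader)
la.optimize_prior_precision(method='marglik')  # post-hoc prior precision tuning
f_mu, f_var = la(X_test)  # predictive mean and variance for regression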
