Student's t-distribution as base distribution #31
-
Hello guys,

I am currently working on implementing the techniques proposed in this paper (https://arxiv.org/pdf/1907.04481.pdf), which focuses on improving flow results in the tails of distributions and can be very useful in many physics analyses. The authors suggest using a Student's t-distribution instead of a multivariate diagonal Gaussian as the base distribution in the loss. Additionally, they introduce the number of degrees of freedom of the t-distribution as a learnable parameter.

I think I have successfully implemented the Student's t-distribution in place of the Gaussian. The corresponding code can be found here:

```python
class MultiStudentT(Independent):
    r"""Creates a multivariate Student's t-distribution parametrized by the
    degrees of freedom :math:`\nu`, mean :math:`\mu` and standard deviation
    :math:`\sigma`, but assumes no correlation between the variables.

    Arguments:
        df: The number of degrees of freedom of the distribution.
        loc: The mean :math:`\mu` of the variables.
        scale: The standard deviation :math:`\sigma` of the variables.
        ndims: The number of batch dimensions to interpret as event dimensions.

    Example:
        >>> d = MultiStudentT(torch.tensor(2.5), torch.zeros(3), torch.ones(3))
        >>> d.event_shape
        torch.Size([3])
        >>> d.sample()
        tensor([-0.9570,  1.0004,  0.4297])
    """

    def __init__(self, df: Tensor, loc: Tensor, scale: Tensor, ndims: int = 1):
        super().__init__(
            torch.distributions.studentT.StudentT(
                torch.as_tensor(df), torch.as_tensor(loc), torch.as_tensor(scale)
            ),
            ndims,
        )

    def __repr__(self) -> str:
        return 'Diag' + repr(self.base_dist)

    def expand(self, batch_shape: Size, new: Distribution = None) -> Distribution:
        new = self._get_checked_instance(MultiStudentT, new)
        return super().expand(batch_shape, new)
```

But I was still not able to make the degrees of freedom learnable. What I have tried so far is:

```python
flow = zuko.flows.NSF(
    self.training_inputs.size()[1],
    self.training_conditions.size()[1],
    transforms=5,
    bins=8,
    hidden_features=[256] * 3,
)

self.t_degress_of_fredom = nn.Parameter(torch.tensor(5.0), requires_grad=True)

d = Unconditional(MultiStudentT, df=self.t_degress_of_fredom, loc=mu, scale=sigma, buffer=True)
flow = zuko.flows.Flow(flow.transform, base=d)

optimizer = torch.optim.Adam(flow.parameters(), 1e-3)
```

But the degrees of freedom parameter does not show up in the `flow.parameters()` list, and it does not change during training. Do you have any leads on how I could implement this extra learnable parameter?

Cheers,
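For reference, the underlying PyTorch behavior can be sketched without zuko (the class and variable names below are illustrative, not from the original code): an `nn.Parameter` only shows up in `parameters()` when it is assigned as an attribute of an `nn.Module`, so a parameter that is merely passed around, e.g. as a keyword argument, stays invisible to the optimizer.

```python
import torch
import torch.nn as nn

# Minimal sketch (plain PyTorch, no zuko): a Parameter is only returned by
# module.parameters() if it is registered as an attribute of an nn.Module.
class Base(nn.Module):
    def __init__(self):
        super().__init__()
        self.log_df = nn.Parameter(torch.tensor(1.6))  # registered on the module

module = Base()
loose = nn.Parameter(torch.tensor(5.0))  # never attached to any module

params = list(module.parameters())
assert len(params) == 1                      # log_df is found
assert all(p is not loose for p in params)   # the loose parameter is invisible
```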
-
Hi,

An update: I think one can solve this by simply adding the additional parameter to the optimizer like this:

```python
optimizer = torch.optim.Adam(
    itertools.chain(flow.parameters(), (self.t_degress_of_fredom,)), 1e-3
)
```

In the end the question was not related to Zuko itself, I am sorry for the noise! But perhaps this might be useful to others who also want to try this.

Best,
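As a sanity check, the chaining trick can be exercised with a stand-in model instead of a full zuko flow (the linear model and toy loss below are placeholders, not the original training setup): any `nn.Parameter` handed to the optimizer, whether or not it belongs to a module, receives gradients and gets updated.

```python
import itertools
import torch
import torch.nn as nn

# Stand-in for the flow: any small module works for this check.
model = nn.Linear(3, 3)
df = nn.Parameter(torch.tensor(5.0))  # extra parameter, not part of the model

# Chain the extra parameter into the optimizer, as in the workaround above.
optimizer = torch.optim.Adam(itertools.chain(model.parameters(), (df,)), lr=1e-3)

x = torch.randn(8, 3)
loss = model(x).pow(2).mean() + (df - 2.0).pow(2)  # toy loss that touches df
optimizer.zero_grad()
loss.backward()
optimizer.step()

assert df.grad is not None   # df received a gradient
assert df.item() != 5.0      # and was updated by the optimizer step
```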
-
Hello @CaioDaumann, thank you for your question!

There are several ways to create a custom base distribution, but it must be a `LazyDistribution`, that is, a module that returns a distribution when called.

The first way is to create a function (or a class constructor) that returns a `Distribution` when called and wrap it inside `Unconditional`:

```python
def student_t(log_df: Tensor) -> Distribution:
    return Independent(StudentT(df=log_df.exp()), 1)

base = Unconditional(student_t, torch.randn(5))
```

There are a few subtleties with `Unconditional`. First, keyword arguments are not considered as parameters (or buffers); they will be passed unmodified to the function during the forward pass. Note that parameters should always be "unconstrained", meaning that they can take any value in ℝ, which is why the degrees of freedom are parametrized by their logarithm here.

The second way is to subclass `LazyDistribution` directly:

```python
class LazyStudentT(zuko.flows.LazyDistribution):
    def __init__(self, features: int):
        super().__init__()
        self.log_df = torch.nn.Parameter(torch.randn(features))

    def forward(self, c: Tensor = None) -> Distribution:
        return Independent(StudentT(df=self.log_df.exp()), 1)

base = LazyStudentT(features=5)
```

Now that your base is a proper module, its parameters are registered, and you can use it to replace the standard Gaussian of an existing flow:

```python
flow = zuko.flows.NSF(features=5, ...)
flow.base = base  # replace Gaussian with StudentT
```

I hope this helps!
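A quick check that this setup actually trains the degrees of freedom, using plain torch distributions (zuko is not needed for the check itself, and the batch below is a stand-in): the Student-t log-density is differentiable with respect to `df`, so gradients reach `log_df` through the negative log-likelihood.

```python
import torch
from torch.distributions import Independent, StudentT

# log_df is unconstrained; df = exp(log_df) is guaranteed positive.
log_df = torch.nn.Parameter(torch.randn(5))
base = Independent(
    StudentT(df=log_df.exp(), loc=torch.zeros(5), scale=torch.ones(5)), 1
)

x = torch.randn(16, 5)           # stand-in batch of samples
loss = -base.log_prob(x).mean()  # negative log-likelihood
loss.backward()

assert log_df.grad is not None          # gradients flow through df
assert tuple(log_df.grad.shape) == (5,) # one gradient entry per feature
```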