Various module improvements #165
Conversation
…2 parameters in image case)
…-clamps the width of the spline to (-domain_clamping, domain_clamping). Since the total width before clamping is always > 0, the domain is effectively clamped to (0, domain_clamping) (see the sketch after these commit notes).
…s to be in line with the original actnorm implementation of glow. This full shape was already used when actnorm is initialized, but the inconsistency in the constructor prevented loading of >1D actnorm models.
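The domain clamping described above could look like the following minimal sketch, assuming a tanh-based soft clamp (clamp_domain and its signature are illustrative, not the actual FrEIA code):

```python
import torch

def clamp_domain(total_width: torch.Tensor, domain_clamping: float) -> torch.Tensor:
    # tanh maps R to (-1, 1), so scaling by domain_clamping soft-clamps the
    # width to (-domain_clamping, domain_clamping); since total_width > 0,
    # the result effectively lies in (0, domain_clamping).
    return domain_clamping * torch.tanh(total_width / domain_clamping)
```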
Looks good, only minor comments :)
FrEIA/modules/all_in_one_block.py
Outdated
```python
self.w_perm = nn.Parameter(torch.FloatTensor(w).view(channels, channels, *([1] * self.input_rank)),
                           requires_grad=False)
self.w_perm_inv = nn.Parameter(torch.FloatTensor(w.T).view(channels, channels, *([1] * self.input_rank)),
                               requires_grad=False)
```
While we're here: isn't torch.from_numpy preferred over tensor(array)?
Also a bit strange that the parameter is being saved as a view. A reshape or view().contiguous() seems more appropriate, unless the view has a specific purpose?
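For illustration, the difference shows up on non-contiguous tensors (standard PyTorch behavior):

```python
import torch

a = torch.arange(6).view(2, 3)
b = a.t()                   # transpose is a non-contiguous view of a

b.reshape(6)                # works: copies under the hood if needed
b.contiguous().view(6)      # works: explicit copy, then view
# b.view(6)                 # RuntimeError: view size is not compatible with
#                           # input tensor's size and stride
```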
I tested the suggested changes, and they don't break any of my local or PR pipeline tests. torch.from_numpy loses the explicit typecast to float and instead uses double (the scipy default). I personally think this change is unnecessary, and if you agree I propose reverting to torch.FloatTensor. As for view().contiguous(), I see no reason it should not be included.

And it looks like one of the tests is not passing – can you run it on your machine to see if this is caused by your changes?
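For reference, a quick illustration of the dtype difference discussed above (standard NumPy/PyTorch behavior; the orthogonal w here is just a stand-in):

```python
import numpy as np
import torch

w = np.linalg.qr(np.random.randn(4, 4))[0]  # float64, the numpy/scipy default

torch.FloatTensor(w).dtype         # torch.float32 (explicit cast)
torch.from_numpy(w).dtype          # torch.float64 (dtype preserved)
torch.from_numpy(w).float().dtype  # torch.float32 (cast made explicit)
```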
…it docstring for BinnedSplineBase domain_clamping
… into VariousModuleImprovements
…ant assertion in rational quadratic splines runtime error, raise runtime error in allinone block if x does not match dims
….from_numpy(numpy) and tensor.view() to tensor.view().contiguous()
Yes, the AIO block failed to throw an exception if the wrong input shape was given with hard permutes.
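A hypothetical sketch of such a guard (check_input_shape, input_rank, and channels are illustrative names, not the exact FrEIA code):

```python
import torch

def check_input_shape(x: torch.Tensor, input_rank: int, channels: int) -> None:
    # input_rank counts the spatial dims (0 for vectors, 2 for images), so a
    # block built for images expects inputs of shape (batch, channels, H, W).
    if x.dim() != input_rank + 2 or x.shape[1] != channels:
        raise RuntimeError(
            f"Expected input of rank {input_rank + 2} with {channels} "
            f"channels, got shape {tuple(x.shape)}"
        )
```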
The new ActNorm implementation does not match the official implementation.
In what respect do they differ? Note that the "new" implementation does not change how ActNorm behaves functionally in FrEIA; it simply fixes a bug that prevented non-vector-valued models using ActNorm from being loaded.
In the latest pull request, the ActNorm layer was modified to add parameters per channel, height, and width. However, the original paper states: "We propose an actnorm layer (for activation normalization), that performs an affine transformation of the activations using a scale and bias parameter per channel, similar to batch normalization". This can also be checked in the official code. To sum up, the previous implementation looks correct to me.
I verified this as true.
This is not correct. The later dimensions should be initialized to be of unit size. I think something like this code should solve the issue:

```python
from copy import copy

dims = list(dims_in)
param_dims = copy(dims)
param_dims[1:] = [1] * (len(param_dims) - 1)  # unit size for all non-channel dims
scale = torch.empty(1, *param_dims)
```
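For an image input with dims_in = (3, 32, 32), this yields a scale of shape (1, 3, 1, 1), which broadcasts over height and width: one parameter per channel, as in the original actnorm. (Illustrative numbers, assuming the snippet above.)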
I do not understand you with

I see it perfectly well.
The original code from Lynton Ardizzone, while functional, was unreadable and not maintainable. I refactored this block to reflect its intended use in f1e73d1 and e0287a2, but my implementation relies on implicit tensor broadcasting. I am not sure when this is a problem, but @RussellALA later approached me stating that it is. In any case, for the initialization, Lynton's code should work fine, and so should my proposal above. I will push a solution on Monday if @RussellALA has not by then; with that, I consider this issue solved.
Well, although this is a good change, ActNorm should normalize along the channels only... am I wrong?
I think there has been some confusion about what the actual change introduced in this PR was and how ActNorm should work.
This is incorrect. The ActNorm layer in FrEIA does indeed act per c x h x w, but it was already modified to do so in f1e73d1, not in this PR.
This is correct. ActNorm was not faithful to the original implementation in FrEIA and still is not. This should be fixed (in another PR ;) ). The reason for the confusion might be that the parameters for actnorm were overwritten during initialization with the first batch. So even though we define

```python
dim = next(iter(dims_in))[0]
self.log_scale = nn.Parameter(torch.empty(1, dim))
self.loc = nn.Parameter(torch.empty(1, dim))
```

in

```python
def initialize(self, batch: torch.Tensor):
    self.is_initialized.data = torch.tensor(True)
    self.log_scale.data = torch.log(torch.std(batch, dim=0, keepdim=True))
    self.loc.data = torch.mean(batch, dim=0, keepdim=True)
```

the tensor data is overwritten with a different shape than it was initialized with.
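To make the shape mismatch concrete, a minimal illustration (hypothetical shapes; a per-channel fix would reduce over all non-channel dims):

```python
import torch

batch = torch.randn(64, 3, 32, 32)  # (N, C, H, W) image batch

# What initialize() above computes: shape (1, 3, 32, 32), i.e. one value per
# channel *and* pixel, silently replacing the (1, 3)-shaped parameter data.
per_pixel = torch.std(batch, dim=0, keepdim=True)

# Per-channel statistics, as in the original actnorm: shape (1, 3, 1, 1).
per_channel = torch.std(batch, dim=[0, 2, 3], keepdim=True)
```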
These changes include a few stability improvements and minor bugfixes to FrEIA modules: