Pooled texture model #284

Draft · billbrod wants to merge 28 commits into main
Conversation

billbrod (Collaborator) commented Aug 19, 2024

This pull request adds the pooled texture statistic model from Freeman and Simoncelli, 2011 and elsewhere. In our implementation, the statistics are computed within user-defined pooling regions, specified by a masks argument: a list of 3d tensors of shape (masks, height, width). The list may contain any number of tensors, which are multiplied together to create the final pooling regions. The intended use case is to define the regions as two-dimensional and separable, either in x and y, or in polar angle and eccentricity.
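For example, separable Cartesian masks for a 256 x 256 image could be built like this (a sketch, using hard rectangular windows normalized so that each product region sums to 1):

    import plenoptic as po
    import torch

    # 4 vertical strips times 4 horizontal strips = 16 square pooling regions
    x_masks = torch.zeros(4, 256, 256)
    y_masks = torch.zeros(4, 256, 256)
    for i in range(4):
        x_masks[i, :, 64*i:64*(i+1)] = 1 / 64
        y_masks[i, 64*i:64*(i+1), :] = 1 / 64
    # each product x_masks[i] * y_masks[j] is a 64 x 64 square summing to 1
    model = po.simul.PortillaSimoncelliMasked((256, 256), [x_masks, y_masks])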

This pull request contains a notebook, pooled_texture_model, which demonstrates the usage of the model with different images and types of windows.

This pull request is not yet ready; the following questions still need to be addressed:

Math / model details:

  • Mask sums:
    • The model performs best when each of the pooling regions (that is, the product of all mask tensors) sums to approximately 1. If their sums are too large or too small, the model's statistics will vary too much in magnitude, making optimization difficult. In particular, if the mask sums are too large, the marginal pixel statistics (first four moments in each window) will be really large, and if the sums are too small, the skew/kurtosis of each reconstructed lowpass image will be really large.
    • The windows should all have approximately the same sum, so that each of them contributes equally to the loss.
    • Currently, I rely on the user to handle both of these. Should I just do it myself during model initialization?
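    If the model did handle this, the initialization step could look something like the following two-mask sketch (a hypothetical helper, not in the current code):

      import torch

      def scale_masks(masks):
          # pooling regions are outer products of the two mask tensors,
          # each of shape (n_i, height, width); their sums should all be
          # approximately 1
          regions = torch.einsum('ahw,bhw->abhw', masks[0], masks[1])
          sums = regions.sum((-2, -1))
          # a single global rescale fixes the overall magnitude (the first
          # point above); it leaves the relative sums unchanged, so unequal
          # windows (the second point) would still warrant a warning
          return [masks[0] / sums.mean(), masks[1]]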
  • We use an epsilon parameter, currently with an arbitrary value of 1e-6, to make sure that the sqrt and division operations in the model don't return NaNs or very large numbers. This has worked for the models and windows I've tested, assuming the normalization above has been done (I noticed that if the window sums are too small, a larger epsilon is needed). But it feels fairly arbitrary; I can try to work out the relationship between the window sum and the epsilon value, and then do one of the following: use the window sum to determine epsilon; raise an error if the window sum looks wrong; or normalize the windows myself.
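    The stabilization amounts to something like this (a sketch, not the exact code):

    import torch

    eps = 1e-6  # the current, arbitrary value

    def stable_sqrt(x, epsilon=eps):
        # sqrt is finite at 0 but its gradient is not; the epsilon keeps
        # both the output and the gradient well-behaved
        return torch.sqrt(x + epsilon)

    def stable_divide(numerator, denominator, epsilon=eps):
        # guards against near-zero windowed variances in the denominator
        return numerator / (denominator + epsilon)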
  • For the pooling regions, you want to avoid aliasing. This is an interaction between the sampling distance and the function used to generate the regions, and can be checked by seeing if they're equivariant to translation (if the windows are Cartesian) or rotation / dilation (if the windows are polar). I don't think the model should check this, but we can show an example of how to check this in our documentation and point out that this is important.
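    A documentation example along these lines could check Cartesian windows as follows (a sketch; it assumes the windows within a tensor are evenly spaced translates along the last dimension, with spacing given in pixels):

    import torch

    def windows_are_translates(masks, spacing):
        # for alias-free Cartesian windows, rolling window i by one
        # spacing should reproduce window i+1; if the window function and
        # the sampling distance interact badly, the two will differ
        rolled = torch.roll(masks[:-1], shifts=spacing, dims=-1)
        return torch.allclose(rolled, masks[1:], atol=1e-6)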
  • In plenoptic's implementation of the Portilla-Simoncelli texture model, the statistics are all accurate. That is, we're using the actual mean, the actual variance, the actual skew, the actual auto- and cross-correlations. The statistics in the pooled version won't be the actual averages unless the windows sum to 1. And they won't be the actual correlations or higher moments unless I subtract the mean off of them, but that ends up being a bit tricky to implement. I don't think this is a problem (we still end up with good looking metamers), but just need to be clear that this is the case.
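    A toy illustration of the moment caveat (pure torch, nothing model-specific): with a window summing to 1, the pooled second moment differs from the true windowed variance by exactly the squared windowed mean.

    import torch

    torch.manual_seed(0)
    x = torch.randn(100) + 2.0       # nonzero mean on purpose
    w = torch.rand(100)
    w = w / w.sum()                  # window summing to 1

    mean = (w * x).sum()             # actual weighted mean
    central_var = (w * (x - mean) ** 2).sum()
    noncentral = (w * x ** 2).sum()  # what the pooled model computes

    # noncentral == central_var + mean ** 2
    print(noncentral - central_var, mean ** 2)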
  • The auto- and cross-correlations can be computed by either correlating and then masking, or masking and then correlating. These are not the same thing (they will differ at the boundaries of the masking regions), and I think correlating and then masking is the right way to do it (though I think that Jeremy Freeman's original code does it the other way round).
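    The difference is easy to see on a 1d signal (a sketch; the model operates on 2d images, but the boundary effect is the same):

    import torch

    x = torch.arange(1., 9.)
    mask = torch.tensor([1., 1., 1., 1., 0., 0., 0., 0.])  # one hard region
    shift = 1

    # correlate (multiply by a shifted copy of the signal) and then mask:
    corr_then_mask = (mask * x * torch.roll(x, shift)).sum()

    # mask and then correlate:
    xm = mask * x
    mask_then_corr = (xm * torch.roll(xm, shift)).sum()

    # the second order zeros out the product that straddles the region
    # boundary, so the two sums differ: 28 vs 20 here
    print(corr_then_mask, mask_then_corr)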

Software design:

  • Currently the pooled texture model is completely separate from the regular one. That seems bad. But the implementations of the actual computations differ pretty substantially, so how best to relate them?
  • Should see if I can improve the efficiency. In particular, the way I compute the auto-correlations is not very efficient on the CPU (the two auto-correlation computations take up about 60% of the total forward call). In the regular texture model, the autocorrelation is computed using the Fourier transform (see Wiener-Khinchin theorem), which is very efficient but doesn't work for the pooled model. Here, we multiply the image by shifted copies of itself, which is not efficient at all. I get around this by pre-computing indices required to shift the image and then using torch.gather at run-time. This works well on the GPU, but not on the CPU and isn't very memory-efficient. I'm not sure if there's a better way here.
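    For reference, the two approaches look like this (a sketch; torch.roll stands in for the precomputed-index torch.gather trick):

    import torch

    img = torch.randn(1, 1, 64, 64)

    # regular model: Wiener-Khinchin -- the (circular) autocorrelation is
    # the inverse FFT of the power spectrum, so one FFT pair yields every
    # shift at once
    acf_fft = torch.fft.ifft2(torch.fft.fft2(img).abs() ** 2).real

    # pooled model: multiply the image by shifted copies of itself, one
    # shift at a time, so that each product can be pooled by the masks
    shifts = [(0, 1), (1, 0), (1, 1)]  # an example subset of shifts
    acf_shift = torch.stack(
        [img * torch.roll(img, s, dims=(-2, -1)) for s in shifts], dim=2
    )
    # pooling would then happen per shift, e.g. (masks * acf_shift).sum((-2, -1))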
  • Relatedly, I need to benchmark synthesis speed against other implementations. It's already quite quick, especially on the GPU, though speed will depend on the number of pooling regions and the size of the image: on the GPU with a total of 125 pooling regions, it takes about 30 seconds to generate a 256 x 256 image.

User interface:

  • Currently, the masks argument is a list containing some number of 3d tensors, which are all multiplied together. In practice, I think the majority of users will pass two tensors here (x/y or angle/eccentricity) and I can't imagine a use case for three or more. Should I just require two tensors as the input? Or keep the list (allowing one or two), but raise an exception if more than two are passed? Allowing for a single mask allows me to test the pooled version against the original (see below).
  • The foveated pooling windows are not currently in plenoptic and are a pain to implement yourself. Pytorch implementations exist (as well as ways to get existing windows into pytorch), including one that I wrote. I can show examples making use of some of these; is that sufficient? None of them are python packages (so they can't be installed with pip), but I could at least package the one I wrote. I've hesitated because it's definitely research code, but it would simplify the process for people.

Testing / relationship to other models:

  • What tests to write? Regression tests like those for the regular model: cache the model's output, a metamer result, and _representation_scales.

  • Create a model with a single image-wide mask and compare it against the regular model. The following works:

    import plenoptic as po
    import torch

    img = po.data.einstein()  # any 256 x 256 grayscale image, shape (1, 1, 256, 256)
    ps = po.simul.PortillaSimoncelli((256, 256))
    # a single mask covering the whole image, normalized to sum to 1
    ps_mask = po.simul.PortillaSimoncelliMasked((256, 256), [torch.ones_like(img)[0] / (256**2)])
    ps_mask._stability_epsilon = 0
    ps_rep = ps.convert_to_dict(ps(img))
    mask_rep = ps_mask.convert_to_dict(ps_mask(img))
    # compare the two representations statistic by statistic
    diffs = []
    for k, v in ps_rep.items():
        if k == 'pixel_statistics':
            # the masked model uses non-central moments and doesn't have the min or max
            d = torch.cat([img.pow(i+1).mean((-2, -1)) for i in range(4)]) - mask_rep[k].flatten()
        else:
            # because of the difference in how we compute the auto-correlations, those
            # statistics have different shapes in the two models. dropping all the
            # redundant (NaN) values here still returns a vector of the same length.
            d = v[~v.isnan()] - mask_rep[k][~mask_rep[k].isnan()]
        diffs.append(d)
    diffs = torch.cat(diffs)
  • I would like to do something similar but with a "folded" image, like so:

    import einops

    # convert the image from (1, 1, 256, 256) to (1, 4, 128, 128), to match the masks below
    folded_img = einops.rearrange(img, 'b c (m0 h) (m1 w) -> b (c m0 m1) h w', m0=2, m1=2)
    ps = po.simul.PortillaSimoncelli(folded_img.shape[-2:], spatial_corr_width=7)
    # two mask tensors: one splits the image into top/bottom halves, the
    # other into left/right halves; their products are the four quadrants
    masks = []
    for i in range(2):
        mask = [torch.zeros_like(img)[0] for _ in range(2)]
        mask_sz = 256 // len(mask)
        for j, m in enumerate(mask):
            if i == 0:
                m[..., j*mask_sz:(j+1)*mask_sz, :] = 1
            else:
                m[..., j*mask_sz:(j+1)*mask_sz] = 1
        masks.append(torch.cat(mask, 0) / 128)
    ps_mask = po.simul.PortillaSimoncelliMasked((256, 256), masks, spatial_corr_width=7)
    ps_mask._stability_epsilon = 0
    ps_rep = ps.convert_to_dict(ps(img))
    mask_rep = ps_mask.convert_to_dict(ps_mask(img))

    However, this doesn't work because, in the regular model run on the "folded" image above, the steerable pyramid coefficients, their magnitudes, and the reconstructed lowpass images have zero mean within each of the four regions, whereas in the pooled version they have zero mean only globally: across the whole image, but not within each region. This ends up being fine for the purposes of synthesis, but it means the outputs are quite different.

    Maybe I can just synthesize the two metamers and (after rearrangement) check that each version of the model agrees it's a good metamer?

  • Would be nice to better understand the relationship between my implementation and at least Freeman's implementation and Brown et al. 2023's. The actual implementation details are different enough that getting the model outputs to match seems unlikely, but maybe check that metamers for those models are decent metamers for mine?

  • Any other tests?

Commits

several steps all in here together

 - store the n_autocorrs as a hidden attribute for easy reference.

 - rename n_shifts to n_autocorrs throughout

 - fix autocorr calculation: the output of torch.gather has the same
   shape as the *indices*, for some reason. this meant that we were doing
   something weird when we were computing the autocorrs of the
   magnitudes, which had an orientation dimension as well. changed how
   those indices are created in order to fix this

 - fixes shape_dict and the necessary-stats mask to work with the new
   shapes: there are now only 4 pixel stats, the autocorrs dropped a
   dimension, and we no longer have any unnecessary stats

 - got compute_cross_correlation working. this also required changing
   the shape of the variance returned by the autocorr function (which
   we reshape again afterwards)

 - gets plot_representation callable. this still needs more work, but
   right now it runs, creating a bunch of images

change back to using stem plot, gets update_plot working on GPU

two changes here:

 - use register_buffer for masks and autocorr_rolls, so that we can
   move model to GPU correctly

 - when downsampling the masks, scale the pixel values up so that the
   sum remains approximately the same across scales
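
(For reference, a minimal sketch of the register_buffer pattern; the class and names here are illustrative, not the actual model code.)

    import torch

    class PooledModel(torch.nn.Module):
        def __init__(self, masks):
            super().__init__()
            # buffers travel with the module: model.to('cuda') moves them
            # along with the parameters and they appear in state_dict,
            # but they never receive gradients
            self.register_buffer('masks', masks)

        def forward(self, img):
            # self.masks is always on the same device as the rest of the
            # model, so this works after moving the model to the GPU
            return (self.masks * img).sum((-2, -1))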

with this:

 - use blur_downsample rather than shrink to downsample masks across
   scales (avoids aliasing)

 - check that mask has no negative values (which messes up computation
   of say, variance)

 - clip the downsampled masks so that they have no negative values,
   for same reason as above
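
(A rough torch-only equivalent of this commit's per-scale downsampling; plenoptic's blur_downsample is the actual implementation, and the binomial filter here is an assumption.)

    import torch
    import torch.nn.functional as F

    def downsample_mask(mask):
        # mask: (n_masks, height, width). blur with a normalized binomial
        # filter, subsample by 2, then scale by 4 so the sum stays roughly
        # constant despite the 4x drop in pixel count; clamp because the
        # blur can produce tiny negative values, which would corrupt
        # variance-like statistics
        binom = torch.tensor([1., 4., 6., 4., 1.])
        kern = binom[:, None] * binom[None, :]
        kern = (kern / kern.sum())[None, None].repeat(mask.shape[0], 1, 1, 1)
        blurred = F.conv2d(mask[None], kern, padding=2, groups=mask.shape[0])[0]
        return (4 * blurred[..., ::2, ::2]).clamp(min=0)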

Two changes, intermixed here:

 - adds _division_epsilon and _sqrt_epsilon attributes, which are used
   to make the sqrt and division operations more stable, in order to
   avoid NaNs or very large numbers. still needs work to determine
   what values are reasonable

 - we still had some redundant autocorrs, which I removed: was
   including the "other side" of the diagonal, as well as the "zero
   shift" term itself. got rid of those, and then added the extra
   shifts for even spatial_corr_width

instead of separate epsilons for division and sqrt, we now use a
single epsilon. this works as long as the windows aren't *too* small,
which can be checked by taking their sum.

this also removes the unstable_locs function which, with the above,
wasn't necessary
codecov bot commented Aug 19, 2024

Codecov Report

Attention: Patch coverage is 15.52795% with 272 lines in your changes missing coverage. Please review.

Files with missing lines                               Patch %   Lines
...ptic/simulate/models/portilla_simoncelli_masked.py   14.73%   272 Missing ⚠️

Files with missing lines                               Coverage Δ
src/plenoptic/simulate/models/__init__.py              100.00% <100.00%> (ø)
...c/plenoptic/simulate/models/portilla_simoncelli.py   97.76% <100.00%> (+0.35%) ⬆️
src/plenoptic/tools/conv.py                            100.00% <100.00%> (ø)
...ptic/simulate/models/portilla_simoncelli_masked.py   14.73% <14.73%> (ø)

... and 1 file with indirect coverage changes

ershook commented Oct 11, 2024

Adding thoughts to questions/comments inline below:

> Currently, both of these approaches are supported -- should they be? Or should we require the user to pass two masks that we multiply together? I can't think there's any case in which someone would pass a list with three or more mask tensors, but that would work as well. Even if we allow a list of masks, should we allow more than two?

At least for me, I don't see a need for more than two mask tensors, or more than two masks generally. Although I do think that, for simplicity, allowing either format of mask input may be useful. Large images are where using the product of the two masks will matter: I can imagine wanting to measure statistics on larger images, where this will be important.

> Before moving on, another question: how to handle mask normalization? The model works best if the individual masks (as shown above) sum to approximately 1 because, otherwise, some of the statistics end up being much larger than the others and optimization is hard. Currently, we're not normalizing within the model code, but should we?

I would normalize within the model code: it's one of the things I could imagine a first-time user forgetting even if you make it very explicit. At the very least, warn when the masks aren't normalized, to remind the user that this is an issue.

> For the pooling regions, you want to avoid aliasing. This is an interaction between the sampling distance and the function used to generate the regions, and can be checked by seeing if they're equivariant to translation (if the windows are Cartesian) or rotation / dilation (if the windows are polar). I don't think the model should check this, but we can show an example of how to check this in our documentation and point out that this is important.

This seems reasonable to me.

> The foveated pooling windows are not currently in plenoptic and are a pain to implement yourself. Pytorch implementations exist (as well as ways to get existing windows into pytorch), including one that I wrote. I can show examples making use of some of these; is that sufficient? None of them are python packages (so they can't be installed with pip), but I could at least package the one I wrote. I've hesitated because it's definitely research code, but it would simplify the process for people.

I have always found using the pooling-windows repo straightforward, so I think that is sufficient. You could package it, but cloning has always been easy enough for me to get running.
