
add PD3O with tests #1834

Merged: 16 commits into TomographicImaging:master on Aug 22, 2024
Conversation

@epapoutsellis (Contributor) commented Jun 14, 2024

Description

Adding the PD3O algorithm with unit tests.

Implementation based on A new primal-dual algorithm for minimizing the sum of three functions with a linear operator. With this algorithm we get two additional algorithms for free: PAPC and Davis-Yin (sometimes called PDDY). PDHG is a special case. We can also obtain Condat-Vu and AFBA; see details in the above paper. Finally, all of the above algorithms can be combined with stochastic (variance-reduced) algorithms.

Below are some results, presented two years ago (at the Workshop on modern image reconstruction algorithms and practices for medical imaging) using the stochastic framework, just before removing the 1/n weight convention.

[Screenshot: results using the stochastic framework]

Related issues/links

Closes #1890

Checklist

  • I have performed a self-review of my code
  • I have added docstrings in line with the guidance in the developer guide
  • I have updated the relevant documentation
  • I have implemented unit tests that cover any new or modified functionality
  • CHANGELOG.md has been updated with any functionality change
  • Request review from all relevant developers
  • Change pull request label to 'Waiting for review'

Contribution Notes

Please read and adhere to the developer guide and local patterns and conventions.

  • The content of this Pull Request (the Contribution) is intentionally submitted for inclusion in CIL (the Work) under the terms and conditions of the Apache-2.0 License
  • I confirm that the contribution does not violate any intellectual property rights of third parties

@gfardell (Member)

Thanks @epapoutsellis, this looks great! We're not far from releasing 24.1, but we'll be planning the next release in a couple of weeks and we can add this to the workplan.

@MargaretDuff (Member)

Thanks @epapoutsellis! It looks good. I added a few todos for myself around testing and documentation. Feel free to address them if you have time before I do, but I will aim to look again around the beginning of July.

@MargaretDuff (Member) commented Jul 2, 2024

Hi @epapoutsellis - I am going through this in detail and have a few questions:

  1. Comparing the code to the paper, I have lost this term:
    [image: term from the paper's update equations]
    Can you point me in the right direction in the code?

  2. How do we want to set default step sizes?
    The paper suggests convergence if $\gamma < 2\beta$ and $\lambda \|A\|^2 \le 1$, where $\beta = 1/f.L$ and $\lambda = \delta\gamma$, and thus I suggest $\gamma = 2/f.L$ and $\delta = 1/(\gamma \|A\|^2)$, but I don't know if you have any experience in practice (see the sketch after this list).

  3. I presume we raise a warning, not an error, for testing purposes, so we can compare against PDHG?
    [image: the warning raised when f is a ZeroFunction]

  4. Do we want a check-convergence option, like we have for PDHG and FISTA?
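A sketch of that suggested default in CIL terms (my reading of the condition; `f` and `operator` as in the PD3O constructor):

    # Hypothetical defaults from gamma < 2*beta and lambda*||A||^2 <= 1,
    # with beta = 1/f.L and lambda = delta*gamma.
    beta = 1.0 / f.L
    gamma = 2.0 * beta                            # boundary of the gamma condition
    delta = 1.0 / (gamma * operator.norm() ** 2)  # so delta*gamma*||A||^2 = 1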

@MargaretDuff (Member) commented Jul 2, 2024

Re. my question 2 above on default step sizes: actually, playing with this further, I considered

    def test_pd3o_vs_fista(self):
        # Solve the same TV-regularised, non-negativity-constrained
        # least-squares problem with FISTA and with PD3O.
        alpha = 0.1
        G = alpha * TotalVariation(max_iteration=5, lower=0)

        F = 0.5 * L2NormSquared(b=self.data)
        algo = FISTA(f=F, g=G, initial=0 * self.data)
        algo.run(200)

        F1 = 0.5 * L2NormSquared(b=self.data)
        H1 = alpha * MixedL21Norm()
        operator = GradientOperator(self.data.geometry)
        G1 = IndicatorBox(lower=0)
        norm_op = operator.norm()

        # gamma and delta defined as discussed below
        algo_pd3o = PD3O(f=F1, g=G1, h=H1, operator=operator,
                         initial=0 * self.data, gamma=gamma, delta=delta)
        algo_pd3o.run(500)

        np.testing.assert_allclose(algo.solution.as_array(),
                                   algo_pd3o.solution.as_array(), atol=1e-2)
        np.testing.assert_allclose(algo.objective[-1],
                                   algo_pd3o.objective[-1], atol=1e-2)

For the proposed default step sizes,

    gamma = 2. / F.L
    delta = 1. / (gamma * norm_op ** 2)

PD3O does not converge to the correct solution; however, with

    gamma = 0.99 * 2. / F.L
    delta = 1. / (gamma * norm_op ** 2)

it does converge. Any thoughts?

@jakobsj (Contributor) commented Jul 3, 2024

For the step size, it sounds like the "safety factor" of 0.99 is sensible.

For your question 1 on AA^T having gone missing: maybe I misunderstand, but I read the implemented code as based on equation (4) (the code comment also states this), which is reformulated from equation (3), and the AA^T appears only in eq. (3).

I'm not sure whether this affects whether AA^T should be included in the step-size bound as well?

@jakobsj (Contributor) commented Jul 3, 2024

Eq. (4a) contains the gradient of l^*. I don't see that in the code?

@MargaretDuff (Member)

Re. "the AA^T appears only in eq. (3)": aah, I was looking at the arXiv version of the paper, which had a different eq. (4). That makes more sense.

@MargaretDuff (Member)

Re. "Eq. (4a) contains the gradient of l^*. I don't see that in the code?":

The paper considers optimising $\min_x f(x) + g(x) + (h \,\square\, l)(Ax)$, where we are considering an objective $f(x) + g(x) + h(Ax)$. I don't know enough about infimal convolutions to know which $l$ makes the two equivalent.

@jakobsj (Contributor) commented Jul 3, 2024

For reference, the link to the published version:
https://link.springer.com/article/10.1007/s10915-018-0680-3

In the lines just below eq. (4) it says $l^*$ is considered equal to zero. So perhaps this implementation assumes $l^* = 0$. @epapoutsellis can you confirm?

@jakobsj (Contributor) commented Jul 3, 2024

Re. question 3, on whether to raise a warning: I think it makes sense to allow the algorithm to run with a ZeroFunction being passed. I think this is also consistent with allowing it in PDHG and FISTA? Have you checked whether the algorithm behaves sensibly with a ZeroFunction passed, including converging to the expected result? I think having a warning recommending the user switch to PDHG is okay, but we should allow it to run. If the algorithm runs and behaves as expected, I'd also be okay with omitting the warning.

@MargaretDuff (Member) commented Jul 3, 2024

Re. "perhaps this implementation assumes $l^* = 0$": I think that makes sense. If $l = \iota_{\{0\}}$ then $h \,\square\, l = h$, $l^* = 0$ and thus $\nabla l^* = 0$.
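For the record, a short derivation (standard convex analysis; $\iota_{\{0\}}$ is the indicator function of the set $\{0\}$):

$$(h \,\square\, l)(x) = \inf_{y} \left\{ h(x - y) + \iota_{\{0\}}(y) \right\} = h(x), \qquad l^*(s) = \sup_{x} \left\{ \langle s, x \rangle - \iota_{\{0\}}(x) \right\} = 0,$$

so $\nabla l^*$ vanishes and eq. (4a) reduces to the implemented update.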

@MargaretDuff (Member)

Re. question 3, on whether to raise a warning:

Yes, one of the tests compares the behaviour of PDHG and PD3O with a zero function. PDHG has the strong-convexity additions, so the user is probably better off using it over PD3O in that case.

@jakobsj (Contributor) commented Jul 3, 2024

OK, great. Then I suggest keeping the warning, so that users can run PD3O with ZeroFunction to deliberately create the special-case algorithm and compare with our PDHG implementation, and having the warning continue to recommend the user switch to PDHG for efficiency reasons.

@jakobsj (Contributor) commented Jul 3, 2024

BTW, a few places appear to use the number 0 instead of the letter O, i.e. "PD30" instead of "PD3O"; please search and replace.

@epapoutsellis (Contributor, Author)

Yes, I use the equations below from https://doi.org/10.1007/s10915-018-0680-3

[Screenshot: the equations used, from the published paper]

  • No need to cover the inf-conv case for now. Maybe an InfimalConvolution Function class could be useful in the future and could compute the primal-dual gap for the PD3O case. This is why I included only the primal objective in update_objective.

  • Better to allow f=ZeroFunction, with a warning. In this case, the user can go back to PDHG, which is safer in terms of default step sizes.

  • For the step sizes, better to have the 0.99*2 factor, which is close to theory. In practice, it depends on the number of iterations used for the power method and on the projection operator (shape of ig, ag, etc.). "Better" step sizes are in https://arxiv.org/pdf/2201.00139, Table 2. This is also related to the previous point: in theory, you can use any parameter in $(0,\infty)$ when f=0; in practice, I noticed that you need to tune this parameter. Another option is to use an epsilon/2*L2NormSquared() instead, where epsilon is small (see the sketch below).
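A sketch of that last suggestion (the value of epsilon is illustrative, not a recommendation):

    # Replace f = ZeroFunction() with a small smooth surrogate so that f.L is
    # finite and the PD3O step-size rule still applies.
    from cil.optimisation.functions import L2NormSquared

    epsilon = 1e-6                       # small, problem-dependent constant
    f = (epsilon / 2) * L2NormSquared()  # smooth stand-in for the zero function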

@MargaretDuff (Member) commented Jul 11, 2024

Re. "For the step sizes, better to have the 0.99*2 factor, which is close to theory": thanks Vaggelis, I think I have done this.

Re. "In theory, you can use any parameter in $(0,\infty)$ when f=0": I have set the same defaults as PDHG when f=0, so 1.0/operator.norm() for both gamma and delta.
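That is, something like this sketch of the default logic (my paraphrase, not the exact implementation):

    # When f is a ZeroFunction, PD3O reduces to PDHG, so mirror PDHG's
    # sigma = tau = 1/||A|| defaults.
    from cil.optimisation.functions import ZeroFunction

    if isinstance(f, ZeroFunction):
        gamma = 1.0 / operator.norm()
        delta = 1.0 / operator.norm()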

@MargaretDuff (Member)

@epapoutsellis @jakobsj - please can we revive this PR? Do you have any more comments?

@jakobsj (Contributor) left a review comment

Took a brief look at the updates and everything looks good to me. Exciting to have PD3O added - thanks!

@gfardell (Member) left a review comment

This could be more memory-efficient with some simple changes to the order of operations in update. This will impact what needs to be in set_up and update_objective too.

Review comments on Wrappers/Python/cil/optimisation/algorithms/PD3O.py (outdated, resolved)
@MargaretDuff added the label "Merge pending jenkins" (once jenkins is happy this can be merged) on Aug 21, 2024
Signed-off-by: Margaret Duff <[email protected]>
@MargaretDuff (Member)

Jenkins is happy
[Screenshot: passing CI checks]

@MargaretDuff merged commit bd21df3 into TomographicImaging:master on Aug 22, 2024
8 checks passed
Labels: Merge pending jenkins (once jenkins is happy this can be merged)

Successfully merging this pull request may close these issues.

Add PD3O to extend our algorithm class
4 participants