Richardson CG #29
base: devel
Conversation
Excellent work, Martin. I will open a pull request on QPhiX soon so that someone else can test this. As for the time to solution: did you actually do mixed precision, or did you have a double-double solver? Also, you oversubscribed, so any benefit might be masked by that.
Thank you for the kind words! It seems I did a single-single solve. My QDP++ is configured with default precision (which seems to default to single) and Chroma is configured with default inner precision (defaults to single). Therefore I have an inner and outer veclen of 8 on my 256-bit machine, so no conversion was done. I will recompile QDP++ and Chroma in double precision on my machine; then I can test it again, and I will also use 1 MPI rank and 2×2 threads.
In the case of differing inner and outer precisions, it does not compile any more:

So this needs a bit more work to be useful; I am working on it.
Seemed to be an easy fix. Now it works with double-single on my laptop, though the speed-up is not significant. I'll set it up on Hazel Hen after getting ice cream.
@bjoo you mentioned that you had observed some BiCGStab convergence issues. If this was "physics", these changes may help. Using Richardson-CG for the most chiral determinant ratio monomial may solve your stability problems.
@martin-ueding How come there is a conflict? Does that just mean that a merge must take place? |
I have no idea, but it seemed strange to me as well. I probably did not do a
Conflict is gone now; it was a simple issue. Bálint has added a
I just see that I have added a
Now it is working on Hazel Hen; this is the performance on the thermalized L = 24 lattice. One can see that Richardson-CG has an advantage over regular CG, though BiCGStab is still faster. But in our situation, where BiCGStab is not stable, this already is a nice win.

The details follow; it is the first solve that is done. The other monomials probably have a different speed-up, but I did not look into that further because it seems to work and it seems faster.

Regular CG (double):

Richardson-BiCGStab (double-float):

Richardson-CG (double-float):
Excellent, looks good. We'll try Richardson-BiCGStab for all monomials to begin with, and if this proves unstable for the lightest determinant ratio, we'll go with Richardson-CG there. The more iterations, the closer the advantage of mixed precision gets to the theoretical factor of 2.
I have removed the
Could you adjust the output to read:

for completeness?
Yes, that can be done. While we are at it, the XML name should also be changed in order to remove the
An alternative way would be to register the solver in the factory twice, with the new and the old name. Existing configuration files would only get the correct behavior if there is no
Regarding breaking backwards compatibility versus having two factory entries: what would be the preferred option? Or should we just leave the name as-is and hope that users find out from the C++ code that CG is supported now?
I have changed the name in standard output from
Yes, that looks good. Currently testing this (without the new name) on Hazel Hen. CG double-single iterative refinement in all light determinant and determinant ratio terms gives a speed-up of roughly 1.8 over CG in double (measured on total trajectory time). Not bad, I'd say! I will prepare a pull request for my QPhiX branch (I need to check some things for completeness before proceeding) and then this can be tested by someone else. A point which might be interesting for backwards-compatibility reasons is to use BICGSTAB by default, such that input files which don't specify the solver type will continue to function unmodified.
I just had a quick look at the code and I don't see where the
This probably is not hard to fix; I just did not see it right away. I'll be out of the loop for some weeks now, so please just go ahead and change it.
Okay, thanks. I'll take over from here then.
Exposing the QPhiX Richardson Solver with CG in Chroma
Rationale
Chroma already has the QPhiX Richardson solver exposed, but only with BiCGStab
as a solver. For our Wilson clover simulation, the BiCGStab seems to be really
unstable. Bartek has extended QPhiX such that one can use the CG in the
Richardson solver. We want to have this exposed in Chroma in order to use it
for our HMC simulation.
Current Implementation
HMC and Propagator Solver
With a

```
git grep QPHIX_CLOVER_ITER_REFINE_BICGSTAB_INVERTER
```

one can find where the Richardson solver is defined within Chroma:

- lib/actions/ferm/invert/qphix/syssolver_linop_clover_qphix_iter_refine_w.cc
- lib/actions/ferm/invert/qphix/syssolver_mdagm_clover_qphix_iter_refine_w.cc
In the `linop` file, there is a comment stating:

Whereas in the `mdagm` file, there is a comment stating:

The corresponding header files enlighten with the following comment:
At this point, I am completely lost. I guess that one inverts M†M whereas the other only inverts M. My bet is that `mdagm` solves M†M and `linop` solves, well, just the linear operator M.
Indeed, there is one call to `mixed_solver` in the `linop` version and two invocations in the `mdagm` variant:

Versus:
So these two are indeed the solvers for the propagators (M) and the HMC (M†M).
Despite the comments in the code, both cases use the BiCGStab solver as the inner solver. The outer solver has always been BiCGStab as well; after Bartek's changes it is merely the default.
“Regular” QPhiX Solver
For the regular QPhiX solver, there is an additional XML tag that can be used
to select the CG or BiCGStab solver. It would be nice if something similar
could be implemented for the Richardson solver such that the user can specify
whether CG or BiCGStab is to be used.
Analogously to the Richardson solver, the regular solver is defined in an HMC and a propagator variant:

- lib/actions/ferm/invert/qphix/syssolver_linop_clover_qphix_w.cc
- lib/actions/ferm/invert/qphix/syssolver_mdagm_clover_qphix_w.cc
In the constructor of the parameter `struct`, it reads the solver type into an `enum` variable:

And then later on there is a `switch` statement using dynamic polymorphism to create the correct operator.
For the inversion of M, the actual inversions happen in these two functions:
This uses the “conjugate gradient normal relation” to invert M using CG whereas
for BiCGStab this is just a simple application. For the M†M case in the other
file, there are these methods:
There is a `switch` statement on the `operator()` which chooses the right function to call based on the runtime parameter.
Planning
The regular QPhiX solvers (solvers meaning inverters for M and M†M) have a `switch` to allow for both CG and BiCGStab. There is no dynamic polymorphism involved because CG might need the normal relation in the M case and BiCGStab needs two applications in the M†M case. Therefore this should be applicable to the Richardson solver as well.
For now we only need the Richardson solver in the HMC as we do our propagators
with the sLapH method using the peram_gen code based on tmLQCD and QUDA.
Therefore I will start with the M†M implementation and do the M implementation
later on.
Implementation
The new feature needs to be developed against Bartek's QPhiX branch. This has
to be compiled and installed locally; then Chroma has to be built against it. This worked directly, which is a nice surprise.
The regular and Richardson solvers share the `SysSolverQPhiXCloverParams` `struct`, which means that it also has the field `SolverType`. Therefore the XML parsing does not need to be changed; the option just needs to be honored. In `syssolver_mdagm_clover_qphix_iter_refine_w.h`, I add the same `switch` statement as there already is in `syssolver_mdagm_clover_qphix_w.h`, in the following places:
- as the outer solver.
- The `operator()` which takes the chronological predictor has the `switch`.

The regular QPhiX solver has member functions `cgSolve` and `biCGStabSolve`. Similar functions are going to be implemented in the Richardson case as well. There the `operator()` is renamed into `biCGStabSolve`, and `cgSolve` is added analogously to the regular CG case. This will be easier to implement because I just need to call CG once and I have solved M†M.
Testing
Short HMC on my Laptop
I have compiled Chroma using Bartek's QPhiX branch on my AVX laptop and tested the HMC with Nf = 2 on an 8³×8 lattice from a random starting configuration. I have let it run for two trajectories with just a single time step on the outermost time scale. I have used two MPI ranks with four threads each (effectively oversubscribing my laptop).
Comparing CG (`-`) with Richardson-CG (`+`), one can see that there are changes, but not very large ones. I would conclude that the Richardson-CG did not break anything.
You can have a look at the input file hmc-in.xml.txt, the output from plain CG
dat-cg.xml.txt, and the output from Richardson-CG dat-richardson-cg.xml.txt.
The overall time to solution seems to be the same; this can probably not be judged on my laptop with a non-thermalized random configuration.