Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SplineC2C/R2R rotation with BLAS #4710

Merged
merged 15 commits into from
Aug 28, 2023

Conversation

camelto2
Copy link
Contributor

@camelto2 camelto2 commented Aug 24, 2023

Please review the developer documentation
on the wiki of this project that contains help and requirements.

Proposed changes

change naive matrix multiplication in SplineR2R::applyRotation to a blas call. Speeds the applyRotation call up by quite a bit.

note that these changes should be covered by the unit test, just changing how the matrix multiplication is done

Update: also added the corresponding change needed for SplineC2C.

What type(s) of changes does this code introduce?

Delete the items that do not apply

  • New feature

  • Other (please describe):

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

macOS

Checklist

Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is
simply a reminder of what we are going to look for before merging your code.

  • Yes. This PR is up to date with current the current state of 'develop'
  • Yes. Code added or changed in the PR has been clang-formatted
  • No. This PR adds tests to cover any new code, or to catch a bug that is being fixed
  • No. Documentation has been added (if appropriate)

@quantumsteve
Copy link
Contributor

quantumsteve commented Aug 24, 2023

Let's be careful changes in derived classes aren't lost as we move to templated classes in #4684

@markdewing
Copy link
Contributor

There may be some subtleties related to precision. If a single precision spline is used, this change reduces the rotation matrix to single precision before the multiplication.

}
}
rot_mat_padded[i * Nsplines + j] = rot_mat.data()[i * OrbitalSetSize + j];
BLAS::gemm('N', 'N', Nsplines, BasisSetSize, Nsplines, ST(1.0), rot_mat_padded.data(), Nsplines, (*coef_copy_).data(), Nsplines, ST(0.0), spl_coefs, Nsplines);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we don't need rot_mat_padded.

BLAS::gemm('N', 'N', OrbitalSetSize, BasisSetSize, OrbitalSetSize, ST(1.0), rot_mat.data(), OrbitalSetSize, (*coef_copy_).data(), Nsplines, ST(0.0), spl_coefs, Nsplines);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still a precision issue, unless BLAS::gemm can handle matrices with different precision types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was having trouble getting the BLAS call to work without padding. I am trying to figure it out now

I agree there is still a precision issue. I wasn't sure the best way to handle that either. the spline coeffs are ST type and the rotation is RealType. The BLAS::gemm doesn't like mixed precision types, so I had to do some conversion somewhere. If I changed coef_copy_ back to RealType, and the rotation is RealType, the problem then lies in spl_coefs. It is ST*, and I can't just cast it to RealType* in the case of ST=float.
Making another temporary copy would be costly memory wise

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you intending to work with single or double precision splines? I ran into some problems with rotation and single precision splines early on, and since then have only used double precision splines.
The code might need to be split differently based spline precision - call gemm for double precision, and an explicit loop for single precision. That would at least speed up the double precision case.

The precision issue also relates to the history list vs. global rotation method. With the history list, the rotations get accumulated in the existing coefficient storage. Single precision coefficients fail at this accumulation of small changes. With the global rotation method this might not be an issue, since this accumulation of rotations happens in the rotation matrix. It might work to compute the rotation matrix in double precision, then convert to single precision, and then do the matrix multiplication. But that would need to be tested, since the matrix multiplication step might also be sensitive to precision.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer double only, but some of the unit tests use float so I was trying to get it to work in general

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be okay with different code paths for float vs double (use the loop for float and gemm for double). Using "if constexpr" to test the type should make it straightforward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markdewing, added a split depending on float vs. double. this should avoid the loss in precision.

@ye-luo still trying to figure out how to avoid adding the padded rotation matrix. I thought it should be as simple as fiddling with the LDx parameters, but having trouble getting it to work

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A code path split by precision should also avoid the need for the extra padded rotation matrix.

Copy link
Contributor Author

@camelto2 camelto2 Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ye-luo I think I have the blas call correct now to avoid needing to construct the padded rotation matrix. It is working correctly on some test matrices, and unit tests pass

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we don't need rot_mat_padded.

BLAS::gemm('N', 'N', OrbitalSetSize, BasisSetSize, OrbitalSetSize, ST(1.0), rot_mat.data(), OrbitalSetSize, (*coef_copy_).data(), Nsplines, ST(0.0), spl_coefs, Nsplines);

I should have read this in detail sooner. I didn't realize you had already told me how to it. Took me a bit, but I stumbled onto doing correctly

@camelto2
Copy link
Contributor Author

camelto2 commented Aug 24, 2023

Let's be careful changes in derived classes aren't lost as we move to templated classes in #4684

Thanks for pointing this out. I had two PRs recently (#4706, #4701) that touched RotatedSPOs that doesn't look like it has been captured in #4684. I can make a PR into @williamfgc's branch with the corresponding changes to bring RotatedSPOs up to date. And if this gets merged once I sort out the review comments can do the same for this if that works

@williamfgc
Copy link
Contributor

@camelto2 @quantumsteve thanks for bringing this up. Your changes can be accommodated into the refactoring effort after merged into develop, we just won't touch it for now. We rebase develop constantly.

(*coef_copy_).data(), Nsplines, ST(0.0), spl_coefs, Nsplines);
}
else
{
Copy link
Contributor

@jptowns jptowns Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tough: apparently everything I did had hidden type mis-matches because ST is not guaranteed to be equivalent to ValueType. Seems to me the underlying problem is the type system of the splines is kinda divorced from the rest of the code. If it's true that single precision may be problematic for rotations, then why not ditch single precision splines and keep the production code simple? Or is there some way to enforce double precision splines if rotation is added? Just trying to think of ways to avoid adding complexity to production code if not absolutely necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is single precision splines are still of interest for size reasons.

markdewing
markdewing previously approved these changes Aug 24, 2023
@camelto2
Copy link
Contributor Author

Figured out how to do C2C correctly using the BLAS call. added that here as well

@camelto2 camelto2 changed the title SplineR2R rotation with blas SplineX2X rotation with BLAS Aug 24, 2023
@ye-luo ye-luo changed the title SplineX2X rotation with BLAS SplineC2C/R2R rotation with BLAS Aug 25, 2023
markdewing
markdewing previously approved these changes Aug 28, 2023
@markdewing
Copy link
Contributor

Test this please

jptowns
jptowns previously approved these changes Aug 28, 2023
src/QMCWaveFunctions/BsplineFactory/SplineC2C.cpp Outdated Show resolved Hide resolved
else
{
// if ST is float, RealType is double and ValueType is std::complex<double> for C2C
// Just use naive matrix multiplication in order to avoid losing precision on rotation matrix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of doubt precision loss matters.

Copy link
Contributor Author

@camelto2 camelto2 Aug 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markdewing was mentioning that he had run into problems with precision when using the float splines and wanted the split between double and float. Maybe he can chime in with more details

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were cases where splines with float precision had trouble with orbital rotation optimization, and those problems went away when I switched to double precision splines. I haven't investigated further.
The important point for these code paths is that the type of the spline (ST) and the main code type (RealType) have to match for BLAS calls to work without additional copies. For full precision builds, that means double precision splines. For mixed precision builds, I would guess the single precision splines would follow the BLAS code path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How it stands currently is that coefs_copy_ and spl_coefs are ST, whereas rot_mat is passed in as ValueType. There isn't a mixed precision blas call, so in order to use blas we would have to make a copy of rot_mat of ST type. That would allow BLAS to be used always, but you are losing precision when ST is float since it is passed in as ValueType (double or std::complex). I'm not entirely sure how much that precision matters, since the output spline coeffs are ST anyway. We are doing an extra copy, but it is only Norb^2 so shouldn't be a problem.

How it is currently written avoids the extra copy of rot_mat, but only benefits from BLAS when ST == RealType.

How common are runs where ST != RealType?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NiO performance tests have the splines set with precision="single", so I think it's still an important case in general. I'm not sure how important it will be for cases involving rotation. Though my guess is someone will want to try it due to memory pressure on the size of the spline coefficients.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us defer the investigation of precision. It is important but we need the feature working first.

src/QMCWaveFunctions/BsplineFactory/SplineR2R.cpp Outdated Show resolved Hide resolved
@camelto2 camelto2 dismissed stale reviews from jptowns and markdewing via 91adaa9 August 28, 2023 20:24
@ye-luo
Copy link
Contributor

ye-luo commented Aug 28, 2023

Test this please

@ye-luo ye-luo enabled auto-merge August 28, 2023 21:10
@prckent
Copy link
Contributor

prckent commented Aug 28, 2023

Agreeing with Ye here - let us focus on getting the feature working.

@ye-luo ye-luo merged commit 6c7b0d3 into QMCPACK:develop Aug 28, 2023
22 of 23 checks passed
@camelto2 camelto2 deleted the spl_rotation_with_blas branch August 28, 2023 22:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants