ode-vec branch integration #21

Open
ladc opened this issue Jan 28, 2014 · 18 comments

Comments

@ladc
Contributor

ladc commented Jan 28, 2014

I took the liberty of rebasing your ode-vec branch on the master because I needed the RK8PD integrator recently (in vector form). Do you still intend to integrate this feature? If there are any known issues with it I can try to fix them and clean it up a little.

@franko
Owner

franko commented Jan 28, 2014

Hi Lesley,

I will be glad to accept your contribution if you have the time and interest to work on it. As far as I remember there were no real problems left in finalizing the integrators: the work was mostly finished, or at least functional. I believe the main remaining issue was that the integrator exposed a crude C array to the user, and I was planning to wrap the array in a column matrix.

If you are interested I can review the code and give you an overview of its current state. I'm pretty busy at work, but I should be able to spare an hour or two for that.

By the way, and not related to this: are you knowledgeable about permittivity (dielectric constant) calculations in a crystal? We are interested in the quantum confinement effect on the permittivity in the optical spectrum, and I would like to know what is computable for a crystal, and how. Do you know where I can post this kind of question to get some directions?

@ladc
Contributor Author

ladc commented Jan 28, 2014

Hi Francesco,

I haven't scrutinized the code, but it does seem to be functional. However, on one occasion I saw it cause memory usage to ramp up steadily, followed by a LuaJIT out-of-memory panic. I'm having a hard time reproducing that, though. Otherwise, it was fairly compatible with my home-brewed array type (which has a data part and some questionable metamethods) -- yay for duck typing, I suppose :-) I'll have a look at how the interface could be made a bit friendlier (just-in-time conversion from/to a column matrix, or something like that?). It might be a week or so before I can have a closer look, though. But I think it would be a shame not to merge this functionality into the main branch (this and other features!)

As for the permittivity calculations in a crystal, I'm afraid my knowledge of solid state physics is more or less limited to textbook basics. Do you mean quantum confinement as in quantum dots?
I know of someone who numerically solves Maxwell's equations (http://mumax.github.io), but I suppose that's not quite the scale you're looking at :-)

@ladc
Contributor Author

ladc commented Feb 7, 2014

Apparently the problem disappeared after I had pulled in the latest LuaJIT. The excessive memory usage was due to this bug:
http://www.freelists.org/post/luajit/GC-break-after-Fix-GC-steps-threshold-handling-when-called-by-JITcompiled-code
which was fixed in commit 9d90988. It was related to the earlier GC bug fix.

@franko
Owner

franko commented Feb 9, 2014

Hi Lesley,

> Apparently the problem disappeared after I had pulled in the latest LuaJIT. The excessive memory usage was due to this bug:
> http://www.freelists.org/post/luajit/GC-break-after-Fix-GC-steps-threshold-handling-when-called-by-JITcompiled-code
> which was fixed in commit 9d90988. It was related to the earlier GC bug fix.

Good to know. Normally the LuaJIT fix would be merged with the next GSL Shell release, but I can merge LuaJIT's changes into the master branch now if you prefer.

Francesco

@ladc
Contributor Author

ladc commented Feb 9, 2014

That's fine by me, but there's no hurry!

@ladc ladc closed this as completed Feb 9, 2014
@ladc ladc reopened this Feb 9, 2014
@ladc
Contributor Author

ladc commented Feb 16, 2014

Francesco, why exactly do cblas.lua and gsl.lua return the global namespace (ffi.C) under Linux? I just noticed that this is about 20x slower than if I explicitly load the library (again?) with return ffi.load("gslcblas"). Any idea why?
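For context, the two binding styles being compared can be sketched as follows under LuaJIT's FFI. This is illustrative only: the `cdef` line stands in for the full declarations actually made in cblas.lua, and it assumes a library named "gslcblas" is on the dynamic loader's path.

```lua
local ffi = require("ffi")

-- Declarations normally provided by cblas.lua (abbreviated to one
-- routine here, purely for illustration).
ffi.cdef[[
double cblas_ddot(int n, const double *x, int incx,
                  const double *y, int incy);
]]

-- Style 1: resolve symbols from the process's global namespace.
-- On POSIX this picks up whatever BLAS was linked into the executable.
local cblas_global = ffi.C

-- Style 2: load the shared library explicitly by name; symbols then
-- resolve to whatever implementation "gslcblas" names on this system.
local cblas_explicit = ffi.load("gslcblas")
```

The key point is that the two styles can bind to different BLAS implementations in the same process, so a timing difference between them need not mean the FFI call itself is slower.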

@ladc
Contributor Author

ladc commented Feb 16, 2014

I wrote:

> this is about 20x slower than if I explicitly load the library (again?) with return ffi.load("gslcblas"). Any idea why?

Apparently that was due to libgslcblas being astronomically faster than the libblas that was linked in...

@franko
Owner

franko commented Feb 16, 2014

I'm not sure, but I think the reason for returning ffi.C was that the GSL library was already linked into the executable because of the C API calls to the GSL library.

I didn't understand the point about which library is faster than which other. Normally gslcblas is very slow compared to, for example, openblas.

@ladc
Contributor Author

ladc commented Feb 18, 2014

> I'm not sure but I think the reason for returning ffi.C was that the gsl library were already linked to the executable because of the C API calls to the GSL library.

Yes, that was probably the reason. If I interpret the LuaJIT docs correctly, this would be the case on all POSIX systems though (OS X as well), so perhaps it's safest to only explicitly load the library on Windows and just return ffi.C by default. Do you have a way to check whether this works on OS X?
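A minimal sketch of that policy (hypothetical; the real cblas.lua may arrange this differently). It relies on the documented LuaJIT behavior that on POSIX systems ffi.C resolves symbols from the whole process, including those linked into the executable, whereas on Windows ffi.C only covers a fixed set of system DLLs, so the library must be loaded explicitly there:

```lua
local ffi = require("ffi")

if ffi.os == "Windows" then
   -- Symbols linked into the .exe are not reachable via ffi.C on
   -- Windows, so load the DLL explicitly.
   return ffi.load("gslcblas")
else
   -- On POSIX the GSL/BLAS symbols are already in the process's
   -- global namespace, so ffi.C is enough.
   return ffi.C
end
```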

> I didn't understood the point about which library is faster then which other. Normally gslcblas is very slow compared, for example, to openblas.

I just tested it against openblas, and that also appears to be much slower than gslcblas for this application. But in the general case it's better to use the system blas library (that's why the blas implementation was split off from GSL in the first place).

@ladc
Contributor Author

ladc commented Feb 18, 2014

BTW, to make the ode-vec implementation compatible with matrices, I suppose you don't mind if I make everything 1-based? y[0] -> y[1] in the user-provided function, I mean.

@franko
Owner

franko commented Feb 18, 2014

Of course you can :-)

I was thinking of a different approach: keep a column matrix that is visible to the user, so that it is 1-indexed, and use the .data field in the implementation. Since the .data field is a C array, it will be 0-based.
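Roughly, the idea could look like this (an illustrative sketch, assuming a `matrix.new` constructor and a `.data` field holding the backing C array, as in GSL Shell; the exact names and API may differ):

```lua
-- User-facing state: a 1-indexed N x 1 column matrix with bounds
-- checking on element access.
local y = matrix.new(N, 1)

-- Implementation side: the raw C array backing the matrix.
-- This is cdata (double*), 0-indexed: ydata[0] .. ydata[N-1].
local ydata = y.data

-- Both views share the same storage, so the integrator kernel can
-- hand ydata straight to the cblas routines while the user only
-- ever sees y[1], y[2], ... through the matrix.
```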

@ladc
Contributor Author

ladc commented Feb 18, 2014

But then you get a pretty inconsistent situation, no? The y[0] in the user-defined function becomes y[1] if you access the column matrix.

@franko
Owner

franko commented Feb 18, 2014

My idea was that the user always sees the matrix and nothing else, but maybe I'm missing something. I guess a usage sketch could help to clarify the idea.

@ladc
Contributor Author

ladc commented Feb 18, 2014

That's another possibility. I feared that this might be slower, but I'd have to benchmark it first, of course. Otherwise it would just require some simple pointer arithmetic: use f(t, s.y-1, s.dydt-1) in the rk*-vec implementation -- but that's probably asking for trouble. Having bounds checks is a better idea :-)
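For the record, the offset trick amounts to the following (illustrative fragment; `s.y` and `s.dydt` are assumed to be cdata `double*` fields of the solver state, and `f` is the user-supplied derivative function):

```lua
-- LuaJIT FFI pointer arithmetic: (p - 1)[i] == p[i - 1], so passing
-- the pointers shifted back by one element lets the user index from 1
-- while the solver's storage stays 0-based.
-- Caveat: nothing catches an out-of-range index -- hence "asking for
-- trouble".
f(t, s.y - 1, s.dydt - 1)
```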

BTW, why is the workspace matrix for rk8pd (13 + 6) x N-dimensional? I can't figure it out.

@franko
Owner

franko commented Feb 18, 2014

I didn't remember this detail, but looking at the code I think I understand. The space is (13 + 6) x N. I store several N-dimensional arrays in it: y, dydt, ksum8, ytmp1, ytmp2, ksum7 -- that's the 6. Finally, "k" needs a space of 13 x N for itself. For example, you have the following code:

# for S = 2, 13 do
         cblas.cblas_dcopy($(N), y, 1, ytmp, 1)
         cblas.cblas_dgemm(RowMajor, Trans, NoTrans, $(N), 1, $(S-1), h, k, $(N), B$(S), 1, 1.0, ytmp, 1)
         f(t + $(ah[S-1]) * h, ytmp, k + $((S-1) * N))
# end

where "k" is addressed up to 13 x N with pointer arithmetic, ouch!
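Put another way, the (13 + 6) x N workspace seems to break down as sketched below. The ordering of the blocks here is purely illustrative (the actual layout is in the source); only the sizes follow from the discussion above:

```lua
-- Hypothetical layout of the (13 + 6) x N workspace, offsets in
-- doubles. RK8PD has 13 stages, so "k" holds 13 slope vectors.
-- k      : 13 * N   -- one N-vector per stage, k + (S-1)*N above
-- y      :      N
-- dydt   :      N
-- ksum8  :      N   -- 8th-order weighted sum of the k's
-- ytmp1  :      N
-- ytmp2  :      N
-- ksum7  :      N   -- 7th-order sum, used for the error estimate
-- total  : (13 + 6) * N
```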

@ladc
Contributor Author

ladc commented Feb 18, 2014

Thanks! Hence the difference in implementation between the two integration methods.
I've pushed an example of the pointer arithmetic approach to my ode-vec-rebased branch, but feel free to reject it on the grounds of too ugly and unsafe ;-)

@franko
Owner

franko commented Feb 24, 2014

Hi Lesley,

I'm looking at what you have done right now. I see that my beloved "damped-ho-evol.lua" does not work any more; I have to figure out why! :-)

The rk8pd-vec benchmark is also very slow. Do you think it is related to the gslcblas/openblas issue?

I would like to have a more realistic benchmark with a big vector size, since the vector form is supposed to be used only for systems with many variables. I was thinking of the damped HO example, but we can also find something else.

As for the pointer-offset solution: it is quite smart, but the problem is that you are still passing the user a naked cdata array. This means that if the user accesses the vector out of range, anything unpredictable can happen, including segfaults. For me this is not acceptable; the user should be protected from out-of-bounds accesses, like what happens when using a matrix.

Sorry, but for the moment I don't have much time to go beyond that first analysis. I'm not helping you very much!

@ladc
Contributor Author

ladc commented Feb 24, 2014

Hi Francesco,

Of course, you're absolutely right, this should be exposed to the user as a matrix with proper bounds checking. In any case, that branch is far from ready... I hope to find some time to rework and finish (and test!) it soon.

As for the slow benchmark: you could be seeing the same discrepancy I saw with OpenBLAS vs gslcblas. Alternatively, it could be due to the memory issue in the LuaJIT snapshot you're using in GSL Shell's master branch.
