
LuaJIT 2.1? #25

Open
pygy opened this issue Apr 13, 2014 · 12 comments

Comments

@pygy

pygy commented Apr 13, 2014

Do you plan to migrate to the 2.1 branch? It is faster than v2.0.x, and AFAIK stable enough to be used in production at CloudFlare.

@franko
Owner

franko commented Apr 14, 2014

Hi,

Actually, I was just waiting for the first stable release of the 2.1 branch but, as you suggest, it is probably OK to migrate with the next release of GSL Shell. I have a lot of minor changes to include, and a new release would be a good thing.

Otherwise, what about a new GSL Shell branch on GitHub to integrate LuaJIT 2.1?

@pygy
Author

pygy commented Apr 14, 2014

> Otherwise, what about a new GSL Shell branch on GitHub to integrate LuaJIT 2.1?

Why not.

AFAICT, the parse.c and Makefile modifications work as-is in 2.1.

I'll also have to send you a patch for compiling on OS X 10.8.

@franko
Owner

franko commented Apr 16, 2014

There is now a v2.1 branch in GSL Shell's repository:

https://github.com/franko/gsl-shell/tree/master-lj2.1

The merge was very easy thanks to the power of git :-) and everything seems to work just fine.

Francesco

@pygy
Author

pygy commented Apr 16, 2014

Cool :-)

The Julia guys are about to add the LuaJIT/GSL Shell benchmarks you wrote to their home page. I'll point them to the LJ 2.1 branch.

LuaJIT v2.1 is 10 times faster than v2.0 for parseint, but a bit slower for mandel (in both cases, though, it still beats the hell out of C :-).

The pure JavaScript (V8) implementation of rand_mat_stat is faster than its GSL Shell counterpart, which relies on BLAS, as do the C, Julia, and Fortran benchmarks. The latter three are also faster than LuaJIT/GSL Shell. Maybe you're not using the same BLAS?

LuaJIT is ~10 times slower than C for rand_mat_mul, but faster than JS.

Check here for the results on my machine: JuliaLang/julia@9a57b99#commitcomment-5996981

Edit: note also that quicksort can be made faster by switching to a FFI array.
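For what it's worth, the switch is small. Here is a minimal sketch of quicksort over an FFI array (Hoare-style in-place partitioning, illustrative rather than the exact benchmark code):

```lua
local ffi = require("ffi")

-- Sorting a raw FFI double array instead of a Lua table: the JIT
-- compiles element accesses into direct memory loads/stores.
local function qsort(a, lo, hi)
    if lo >= hi then return end
    local pivot = a[math.floor((lo + hi) / 2)]
    local i, j = lo, hi
    while i <= j do
        while a[i] < pivot do i = i + 1 end
        while a[j] > pivot do j = j - 1 end
        if i <= j then
            a[i], a[j] = a[j], a[i]
            i, j = i + 1, j - 1
        end
    end
    qsort(a, lo, j)
    qsort(a, i, hi)
end

local n = 5000
local v = ffi.new("double[?]", n)
for i = 0, n - 1 do v[i] = math.random() end
qsort(v, 0, n - 1)
```

The only structural difference from a table-based version is the `ffi.new("double[?]", n)` allocation and the 0-based indexing.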

@franko
Owner

franko commented Apr 17, 2014

The benchmark results look good to me.

I agree that there are some odd things. I already noticed in the past that Julia was faster in rand_mat_mul, but I cannot tell the reason. The only thing I can suggest is to ensure that OpenBLAS is actually used for GSL Shell. In my view, the overall speed should be determined by the underlying BLAS implementation.

Otherwise, I would not be too picky about these benchmark results, and I'm afraid I don't have enough time to further investigate the problem.

In any case, I will be glad if they include lua/gsl-shell in their benchmark page. Thank you for your help with that.

@pygy
Author

pygy commented Apr 18, 2014

How can I set the BLAS version?

On my machine, the GSL-based rand_mat_mul is ~10% faster than a straight port of the JavaScript code to Lua:

local ffi = require("ffi")

-- rng and timeit are provided by the GSL Shell environment;
-- randmatmul (used below) is the BLAS-backed GSL Shell version.
local darray = ffi.typeof("double[?]")

local function randd(n)
    local v, r
    v = darray(n)
    r = rng.new('rand')

    for i = 0, n-1 do
        v[i] = r:get()
    end

    return v
end

-- Transpose mxn matrix.
local function mattransp(A, m, n)
    local T = darray(m * n)

    for i = 0, m - 1 do
        for j = 0, n-1 do
            T[j * m + i] = A[i * n + j]
        end
    end
    return T
end


local function matmul(A,B,m,l,n)
    local C, total
    C = darray(m*n)
    -- Transpose B to take advantage of memory locality.
    B = mattransp(B,l,n)

    for i = 0, m - 1 do
        for j = 0, n - 1 do
            total = 0

            for k = 0, l - 1 do
                total = total + A[i*l+k]*B[j*l+k]
            end

            C[i*n+j] = total
        end
    end

    return C
end


local function randmatmulLJ(n)
    local A, B
    A = randd(n*n)
    B = randd(n*n)

    return matmul(A, B, n, n, n)
end

timeit(|| randmatmul(1000), "rand_mat_mul")      --> 1129.19
timeit(|| randmatmulLJ(1000), "rand_mat_mul_LJ") --> 1255.42

BTW:

$ node perf.js
...
javascript,rand_mat_mul,2933

:-)

@franko
Owner

franko commented Apr 22, 2014

To check the BLAS library, run "ldd" on the executable, then use "ls -l" to see which file libblas.so points to.

I'm now wondering if Julia is faster because it transposes the matrix before the multiplication, just like the JS code does. In principle I should run some tests with dgemm, with and without the transpose as in the JS code, but unfortunately I don't have time to work on that.

@pygy
Author

pygy commented Apr 22, 2014

There's no ldd on OS X; otool -L does the trick.

$ otool -L gsl-shell | grep blas
    /usr/local/lib/libgslcblas.0.dylib (compatibility version 1.0.0, current version 1.0.0)
$ ls -l /usr/local/lib/libgslcblas.0.dylib
lrwxr-xr-x  1 pygy  staff  42 Apr 12 23:57 /usr/local/lib/libgslcblas.0.dylib -> ../Cellar/gsl/1.16/lib/libgslcblas.0.dylib

GSL, as installed by brew, relies on the default libgslcblas. I've tried to redirect the symlink to a freshly compiled OpenBLAS, but it complains about version issues (1.0.0 required, 0.0.0 found). The same goes for the Julia BLAS.

I'm also trying to build GSL by hand, but I don't know how to tell it to use another BLAS.

@pygy
Author

pygy commented Apr 24, 2014

I got it to compile with OpenBLAS (by adding the proper paths and options in the GSL Shell Makefile).
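For anyone else doing this, the change amounts to something like the following (the paths and variable names here are illustrative, not the exact ones in the GSL Shell Makefile):

```make
# Hypothetical sketch: point the link step at OpenBLAS instead of the
# reference gslcblas. Adjust the prefix to your install location.
OPENBLAS_DIR = /usr/local/opt/openblas
LDFLAGS += -L$(OPENBLAS_DIR)/lib
LIBS += -lopenblas
```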

rand_mat_mul is now as fast as C/Julia :-)

It might be nice to make LIBS and LDFLAGS customizable in makeconfig.

@franko
Owner

franko commented Apr 25, 2014

Good :-)

Actually, the libraries are supposed to be configurable using the file "makepackages", but maybe this is not very intuitive.

On Linux, "makepackages" links with any "blas" library (using GSL_LIBS) provided by the system, and thus OpenBLAS is not required. It is possible to modify the default makefile to link explicitly to OpenBLAS, but I'm not sure this is a good idea.

Maybe a warning could be shown at compile time if the gslcblas library is used, since the latter is really slow.

Suggestions & patches are welcome.

@pygy
Author

pygy commented Apr 25, 2014

makepackages is probably fine... I tend to explore code rather than read the docs (too often, there are none), and I thought that makeconfig was where users were supposed to tweak things.

OS X also provides a fast BLAS; I'll look into how to link to it.

@pygy
Author

pygy commented Apr 25, 2014

I found the system BLAS, which is even faster than OpenBLAS, but I don't know if it is found at the same path for all OS X versions.

Edit: actually, adding -lBLAS to GSL_LIBS does the trick, without adding any path to the linker (which makes sense).
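In makepackages terms, that amounts to something like this (a sketch, assuming GSL_LIBS is the variable used on OS X):

```make
# Link against the OS X system (Accelerate) BLAS instead of gslcblas.
# No extra -L path is needed; libBLAS.dylib lives in a default location.
GSL_LIBS = -lgsl -lBLAS
```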
