Hacking/helping #12
Also interested.
Thanks guys. I'm pretty sure the instructions are not up to date anymore. I have another branch for GHC 7.6 which I'll merge tomorrow, and I'll take another look at the instructions. The biggest issue at the moment is that you need a custom-built GHC 7.0 (for current master) or a GHC 7.6 which is compiled with the integer-simple library.
I just tried to build the ghc-7.6 branch and got the following error after following the build instructions:

    vm/memorymanager.hh:10:10: fatal error: 'tr1/unordered_set' file not found
    #include <tr1/unordered_set>

So the JIT is pure C/C++? Out of curiosity: is that better than Haskell for this, or are there other benefits/constraints (e.g. maybe most of this comes from the LuaJIT implementation)? I also saw some movement in the sources from GHC Core to STG; I am curious about the motivation. Also, how does the compilation work? It generates bytecode, which would be the equivalent of the .o/.a files that are currently generated, correct? What about packaging and linking: do you have specific thoughts there? Would the bytecode be dynamically linked too?
I'm not using many C++ features. I mostly felt that it would make certain abstractions nicer. I'm using C++'s unordered sets simply because I didn't want to spend time on implementing my own data structures. It looks like the ./configure stuff doesn't figure out the right header version on your system; try changing that include. A lot of the code generation stuff is copied or adapted from LuaJIT, but I chose some more verbose names for readability. All that could use some clean-up and documentation. The switch from Core to STG was unfortunately necessary. In GHC 7.0, Core didn't allow unboxed tuples as function arguments, only as results. Max Bolingbroke lifted that restriction, but he added a magic pass that turns functions with unboxed tuples as arguments into functions taking multiple arguments. That pass is untyped and therefore only works with STG. That said, I think the STG->Bytecode bit is actually simpler. Before, I was using CorePrep, and STG is really just CorePrep without types. I used CorePrep because I thought I might need the type information, but dealing correctly with types turned out to be very messy, so I think STG is the right level. The bytecode compiler generates
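(For anyone else hitting the tr1 error on Mavericks: the exact change suggested above wasn't preserved in this thread, but the kind of fix being described is a guarded include. The snippet below is only an illustration, not the project's actual patch; the LC_HAVE_STD_UNORDERED_SET macro and the BlockSet typedef are made-up names.)

    // Illustrative sketch only -- not the project's actual patch. Probe for the
    // C++11 header (clang on Mavericks ships libc++, which has no tr1/ headers)
    // and fall back to the TR1 header otherwise.
    #if defined(__has_include)
    #  if __has_include(<unordered_set>)
    #    define LC_HAVE_STD_UNORDERED_SET 1
    #  endif
    #endif

    #ifdef LC_HAVE_STD_UNORDERED_SET
    #  include <unordered_set>
    typedef std::unordered_set<void *> BlockSet;       // hypothetical name
    #else
    #  include <tr1/unordered_set>
    typedef std::tr1::unordered_set<void *> BlockSet;  // hypothetical name
    #endif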
Thanks for the pointers! I will try to get past this compilation issue (I am on Mavericks with Xcode/clang 5.0 and gcc 4.2 installed via brew; it was able to compile GHC 7.6 from source) and see how far I get.
I didn't have time to follow up on this so far (I hope to soon). In any case, I was wondering if you saw this: https://github.com/franko/luajit-lang-toolkit and whether your project could benefit from it somehow.
Yes, I saw that. But it didn't exist (or at least I didn't know about it) when I started Lambdachine (2010, I think), and there's also the issue that LuaJIT is optimised for Lua's semantics (Mike Pall reiterated that a few times in a number of places on the web). Since Haskell's semantics and runtime behaviour are quite different, this wouldn't work too well once you want to go past the prototype stage.
I would be extremely interested in hacking on the interpreter. I've implemented my own high-performance interpreter using the threaded-code method (like LuaJIT), but looking at lambdachine in its current state there's quite a bit of expensive code in the instruction dispatch. Maybe you'd welcome some help slimming down the interpreter? Edit: Now I'm a little confused. BCi.S in the top level isn't being used, is it? The actual interpreter is in rts/inter* ?
The actual interpreter is in Most of porting the interpreter to assembly should be straightforward, except for handling overapplication and partial applications; the current handling of those is very complicated. I'm not particularly happy with the current design of the That said, except for
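(As a concrete reference point for the dispatch-cost discussion above, this is roughly what the threaded-code method mentioned here looks like: a minimal, self-contained sketch using the GCC/Clang computed-goto extension. The opcodes and operand encoding are invented for illustration and are not lambdachine's bytecode format.)

    // Minimal sketch of direct-threaded dispatch via computed goto
    // (GCC/Clang extension). Toy opcode set, not lambdachine's.
    #include <cstdint>
    #include <cstdio>

    enum Op { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    static int64_t run(const uint8_t *code) {
        static void *handlers[] = { &&op_push, &&op_add, &&op_print, &&op_halt };
        int64_t stack[64]; int sp = -1;
        const uint8_t *ip = code;

    #define DISPATCH() goto *handlers[*ip++]
        DISPATCH();
    op_push:  stack[++sp] = (int8_t)*ip++; DISPATCH();            // push imm8
    op_add:   stack[sp-1] += stack[sp]; --sp; DISPATCH();         // add top two
    op_print: std::printf("%lld\n", (long long)stack[sp]); DISPATCH();
    op_halt:  return sp >= 0 ? stack[sp] : 0;
    #undef DISPATCH
    }

    int main() {
        const uint8_t prog[] = { OP_PUSH, 2, OP_PUSH, 40, OP_ADD, OP_PRINT, OP_HALT };
        return run(prog) == 42 ? 0 : 1;                           // prints 42
    }

The point of the technique is that each handler ends by jumping straight to the next handler, so there is no shared dispatch loop for the branch predictor to mispredict on.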
Is it really necessary to break out into assembler? I was able to get
Also, some pseudo-code for a crude bump-pointer PAP thing which I hacked up: 1 branch per partial application, but it makes a lot of assumptions. It
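(The pseudo-code itself did not survive in this thread. Purely as an illustration of what a crude bump-pointer PAP scheme can look like, and not the poster's actual sketch, here is one possible shape; every name and the record layout are made up.)

    // Illustration only: a bump-pointer arena plus a flat PAP (partial
    // application) record. All names and the layout are invented.
    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    struct PAP {
        void    *fn;        // closure being partially applied
        uint32_t missing;   // arguments still needed before we can enter fn
        uint32_t nargs;     // arguments collected so far
        intptr_t args[1];   // payload (over-allocated past 1 element)
    };

    struct Arena {
        char *ptr, *end;
        void *bump(size_t bytes) {
            if (ptr + bytes > end) return nullptr;  // slow path: GC / fallback allocator
            void *p = ptr; ptr += bytes; return p;
        }
    };

    // Extend a PAP with n more arguments; assumes n < old->missing, i.e. the
    // application is still partial. The only branch is the arena overflow check.
    static PAP *extendPAP(Arena &a, const PAP *old, const intptr_t *extra, uint32_t n) {
        size_t sz = sizeof(PAP) + (old->nargs + n - 1) * sizeof(intptr_t);
        PAP *p = static_cast<PAP *>(a.bump(sz));
        if (!p) return nullptr;                     // caller takes the slow path
        p->fn = old->fn;
        p->missing = old->missing - n;
        p->nargs = old->nargs + n;
        std::memcpy(p->args, old->args, old->nargs * sizeof(intptr_t));
        std::memcpy(p->args + old->nargs, extra, n * sizeof(intptr_t));
        return p;
    }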
I may have invented something new by accident which could potentially drastically increase the performance of even LuaJIT's interpreter, by taking the dispatch onto the CPU. I'll give you more info as it comes. The general idea is to produce a large buffer of opcode handler addresses ahead of time. By pointing rsp at this buffer, with each handler ending in a ret, the CPU itself pops the address of the next handler and jumps to it. This may be superior to a call-splat approach (context threading) as it doesn't require another decoder window for the dispatch instructions.
What happens if an interrupt occurs? The CPU will push the values of the registers onto the stack, which will overwrite your opcode chain. It was a nice idea though. :) LuaJIT's interpreter may not be as fast as context threading (though it probably is most of the time), but it's very flexible. You can change the semantics of an instruction just by swapping out the pointer to the code section that implements it. This is great for switching between normal mode and trace-recording mode. Mike Pall wrote it in assembly to optimise for the CPU pipeline as much as possible.
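(To make the "swap out the pointer" idea concrete: a dispatch table of function pointers can be patched wholesale when the VM enters trace-recording mode, so every instruction first records itself and then runs its normal semantics. This is only a toy sketch of the mechanism, not LuaJIT's or lambdachine's code; the VM state, the single toy opcode, and the printf stand-in for the recorder are invented.)

    // Toy sketch of mode switching via dispatch-table swapping. Invented names.
    #include <cstdint>
    #include <cstdio>

    struct VM { int64_t acc; };                        // stand-in interpreter state
    typedef void (*Handler)(VM &, uint32_t);

    static const int kNumOps = 256;
    static Handler dispatch[kNumOps];                  // consulted on every instruction
    static Handler normal[kNumOps];                    // saved plain-interpretation handlers

    static void opAddImm(VM &vm, uint32_t ins) {       // toy instruction: acc += imm8
        vm.acc += (int8_t)((ins >> 8) & 0xff);
    }

    static void recordWrapper(VM &vm, uint32_t ins) {
        std::printf("recording op %u\n", ins & 0xffu); // a real recorder appends to the trace
        normal[ins & 0xff](vm, ins);                   // then run the normal semantics
    }

    static void enterRecording() {
        for (int i = 0; i < kNumOps; ++i) { normal[i] = dispatch[i]; dispatch[i] = recordWrapper; }
    }
    static void leaveRecording() {
        for (int i = 0; i < kNumOps; ++i) dispatch[i] = normal[i];
    }

    int main() {
        for (int i = 0; i < kNumOps; ++i) dispatch[i] = opAddImm;  // toy table
        VM vm = { 0 };
        dispatch[0](vm, 0u | (40u << 8));   // plain mode
        enterRecording();
        dispatch[0](vm, 0u | (2u << 8));    // recorded, then executed
        leaveRecording();
        std::printf("acc = %lld\n", (long long)vm.acc);            // 42
        return vm.acc == 42 ? 0 : 1;
    }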
Is it even allowed to do that? I would imagine it would break all kinds of stack optimizations, including the fancy stuff for exception handling which I've noticed MSVC emits at negative sp slots. Oh well :(
OK, not 100% sure. It might be that the CPU is guaranteed to switch to the thread's kernel stack first and then push stuff there. In that case it would mainly confuse GDB and Valgrind.
I'll ask StackOverflow because I want to implement this myself if not.
Good news! I think it's OK. See https://stackoverflow.com/questions/22211227/can-i-use-rsp-as-a-general-purpose-register/22212576#22212576 ! :)
I made a prototype implementation and benchmarked this about a month ago. The CPU actually has a return address cache which drastically decreases performance if you pull it out of alignment (by changing rsp directly). This is actually documented in the AMD developer manuals. Oh well.
I see that @kvanberendonck has fixed the tr1 issue with the build: thanks! The build now compiles all the way until it hits the need for integer-simple, which is my fault since I am using a stock GHC 7.8.3. I will fix that. Build aside, I see a recurring pattern here: copy & paste and hacking of the base libraries to make them suitable for lambdachine. This also happens in Haste and GHCJS, and it seems to be an anti-pattern caused by GHC's base being in need of refactoring (there is an effort for it: split-base). In an ideal world, I would expect lambdachine to simply re-use most of the base libraries as-is and plug in a few that are lambdachine-specific. What were the main reasons for reimplementing some of the base packages? EDIT: btw, would it make sense to create an IRC channel for a more interactive discussion?
I modified base because I don't support some language features yet. Specifically, calling C functions, which is obviously used a lot throughout the base libraries. I registered an IRC channel.
May be worth your time talking to erikd (m3ga) because he ported GMP to
Well, the goal is to be able to build it with an out-of-the-box GHC, so you don't have to build a custom GHC version. I agree that it could be interesting to see how the JIT could make this fast in the common case, but the goal right now is to make it easier for contributors and to make it more useful out of the box by supporting more
@nominolo I see, re: language features not supported. I think any specific requirements and needs from lambdachine should feed back into the split-base effort. As you said, it would be great if this could be used with an out-of-the-box GHC. I was also thinking about to what extent lambdachine could be the foundation for a next-generation GHCi: one that is much faster than the current one and so avoids compilation entirely, for faster development cycles.
I am creating a ticket since there doesn't seem to be any other way to interact about this project (what about a Google group?).
I find this an extremely interesting project for the Haskell ecosystem and would like to help (to be honest, I will not be able to contribute to the inner workings of the tracing JIT at first, but I am sure there are other things around it that I can help with).
Is there a high-level overview of the architecture? Here I am looking for practical information such as: what are the GHC / GHC API touch points (present and planned), what are the existing patches on top of GHC, what are the artifacts of the build process, etc.
In addition, any pointers on how you go about debugging the code would also be helpful.
UPDATE: Sorry, just saw instructions on how to build.