Linking
Emscripten has support for static linking, using emlink.py. emlink takes two compiled codebases and generates a combined codebase. This is very similar to static linking in general, but different in some ways, because we are linking JavaScript here, and specifically asm.js modules of code. A general overview of using emlink is as follows:
- Compile one codebase using -s MAIN_MODULE=1; this is the "main module" (see below).
- Compile another codebase using -s SIDE_MODULE=1; this is the "side module".
- Link them using emlink.py main.js side.js output.js
While this example talks about two modules being linked, it is possible to link several. The result of linking a main module with a side module is a main module, which can then be linked with another side module.
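Because the output of each link is itself a main module, linking several side modules amounts to a chain of emlink.py invocations. The following is a minimal sketch of that chain in Python; the helper name chain_link_commands and the intermediate file names (linked_0.js and so on) are made up for illustration and are not part of Emscripten. It only builds the command lines and does not run them.

```python
def chain_link_commands(main_js, side_modules, output_js):
    """Return emlink.py command lines that fold side modules into a main module.

    Hypothetical helper: only the 'emlink.py main side output' argument order
    comes from the text above; everything else is illustrative.
    """
    if not side_modules:
        raise ValueError("need at least one side module")
    commands = []
    current_main = main_js
    for index, side_js in enumerate(side_modules):
        is_last = index == len(side_modules) - 1
        # Intermediate results are main modules too, so they can be linked again.
        target = output_js if is_last else f"linked_{index}.js"
        commands.append(["emlink.py", current_main, side_js, target])
        current_main = target
    return commands


if __name__ == "__main__":
    for command in chain_link_commands("main.js", ["side_a.js", "side_b.js"], "output.js"):
        print(" ".join(command))
```

Each command could be passed to subprocess.check_call once the corresponding modules have been compiled.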
It is important to note that static linking of JS using emlink generates suboptimal results. The best results will always be achieved when building all the code into one big bitcode file and compiling that as a whole into JavaScript, because that allows whole-program optimizations that can be very important. Static linking can be useful during development, however: if you just modify a small part of your project and want to rebuild it, linking the changed part (that you just compiled again) with the rest of the code (that is unchanged) will be far faster than rebuilding the whole thing. For example, building Bullet from bitcode to JS takes over 10 seconds, but linking Bullet statically takes less than half a second.
Code that is intended to be linked is called a "linkable module". There are two kinds, main modules and side modules. We can only link a main module with a side module, and no other combination. The output of linking is another main module (which can then be linked with another side module and so forth).
Main modules are runnable code. A main module that still has missing symbols (to be linked in later) will fail when it tries to use them, of course, but otherwise it is usable. Main modules do not have any special relocation information embedded in them; they are very similar to normal Emscripten-generated code. The main difference is that some optimizations are disabled when building them, for example full dead-code elimination (which could remove things that the code linked in later would need) and function name minification (which would prevent the linker from identifying which functions to link to what).
Side modules are code that is only intended to be linked to a main module; a side module cannot be run by itself. It contains relocation information, which allows us to place its globals and function pointers into the proper places during linking. Side modules disable linking of standard libraries (libc, libc++, etc.); they expect those to be present in the main module they will be linked with. Finally, side modules, like main modules, disable some optimizations so that linking can work.
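To make the idea of relocation concrete, here is a toy model in Python. It is not how emlink is implemented: the ToyModule class, its fields, and the rule of simply appending the side module's globals and function table after the main module's are all assumptions chosen for illustration (a real linker also has to deal with alignment, symbol resolution, and more).

```python
from dataclasses import dataclass, field


@dataclass
class ToyModule:
    """Hypothetical stand-in for a linkable module; not emlink's real data model."""
    globals_size: int                                 # bytes of global data owned by the module
    table: list = field(default_factory=list)         # function table entries, by name
    global_addrs: dict = field(default_factory=dict)  # global name -> address
    func_ptrs: dict = field(default_factory=dict)     # function name -> table index


def link(main, side):
    """Place the side module's globals and function pointers after the main module's."""
    global_base = main.globals_size   # side globals start where main's end
    table_base = len(main.table)      # side table entries are appended to main's
    moved_globals = {name: addr + global_base for name, addr in side.global_addrs.items()}
    moved_ptrs = {name: index + table_base for name, index in side.func_ptrs.items()}
    return ToyModule(
        globals_size=main.globals_size + side.globals_size,
        table=main.table + side.table,
        global_addrs={**main.global_addrs, **moved_globals},
        func_ptrs={**main.func_ptrs, **moved_ptrs},
    )


if __name__ == "__main__":
    main = ToyModule(64, ["main_fn"], {"counter": 8}, {"main_fn": 0})
    side = ToyModule(32, ["side_fn"], {"buffer": 0}, {"side_fn": 0})
    linked = link(main, side)
    print(linked.global_addrs)  # side's "buffer" now lives after main's 64 bytes
    print(linked.func_ptrs)     # side's "side_fn" now has index 1 in the merged table
```

In this model the result of link is again a ToyModule with nothing left to relocate, mirroring the statement that linking a main module with a side module yields another main module.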
A good use case for static linking is a large codebase in which you are modifying only a small number of files, and want to iterate rapidly without waiting for entire builds. A recommended workflow for that is as follows:
- Put the bulk of the project, that is not changing all the time, into the main module. That means compiling the bitcode for those files into JS using -s MAIN_MODULE=1 -o main.js. This generates a JS file that you will later link against.
- Put the rest of the project, the small part you are compiling a lot, into the side module. That means compiling the bitcode for its files into JS using -s SIDE_MODULE=1 -o side.js.
- Every time you do an iteration after changing some of those files, you rebuild the side module sources into a new build of the side module, as just described.
- Run emlink.py main.js side.js all.js, which links the modules and generates all.js.
- Use all.js in the same way as you would use a full rebuild of the whole project.
- When, less frequently, you want to see a fully-optimized build of minimal size, build all the code together into one big bitcode file and compile that into JS (not as a main module or a side module).
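The iteration loop in this workflow can be sketched as a small Python helper that works out which commands are needed based on file timestamps. This is only an illustration under assumptions: the function names, the bitcode file names passed in, and the timestamp-based staleness check are hypothetical, and the emcc invocations simply combine the flags mentioned above with a single bitcode input.

```python
import os


def is_stale(output, inputs):
    """True when output is missing or older than any of its inputs."""
    if not os.path.exists(output):
        return True
    out_time = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_time for path in inputs)


def iteration_commands(main_bc, side_bc, main_js="main.js", side_js="side.js",
                       linked_js="all.js"):
    """Return the commands one iteration needs, in order (hypothetical helper)."""
    commands = []
    if is_stale(main_js, [main_bc]):
        # The rarely changing bulk of the project: rebuilt only when its bitcode changes.
        commands.append(["emcc", main_bc, "-s", "MAIN_MODULE=1", "-o", main_js])
    if is_stale(side_js, [side_bc]):
        # The small, frequently edited part of the project.
        commands.append(["emcc", side_bc, "-s", "SIDE_MODULE=1", "-o", side_js])
    if commands or is_stale(linked_js, [main_js, side_js]):
        # Relink whenever either module was rebuilt or the linked output is out of date.
        commands.append(["emlink.py", main_js, side_js, linked_js])
    return commands
```

In the common case where only the side module's bitcode has changed, the helper returns just the side module build followed by the emlink.py step, which is where the time savings over a full rebuild come from.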
Note that we could reverse the roles of the main and side modules in the above workflow, and it would still work. However, it is a good idea to make the main module the one that changes less, since as mentioned above the standard libraries are linked in and compiled to JS in that one.
Note also that the main() function can be in either the main module or the side module. Don't be confused by the term "main module": it is the "main" module in that the other will be relocated "against" it, and in that the system libs are in it.
- GL emulation errors: The main module includes all the JS library code (and the side module includes none of it); this approach makes linking of JS library code trivial, and the downside of code size is not that big in a large project anyhow. However, it does mean we include GL emulation code, which can confuse some types of GL-using code. It is recommended to build the main module with -s DISABLE_GL_EMULATION=1 unless you specifically know that you need emulation of older GL features.
- System libs not being included: Side modules do not link in standard libraries like libc and libc++, as mentioned above. As a consequence, if for example you use libc++ in the side module but not in the main module, neither will link in libc++ and the linked code will fail. To get around this, you can build the main module with EMCC_FORCE_STDLIBS=1 to force inclusion of all standard libs; a more refined approach is to build the side module with -v in order to see which system libs are actually needed (look for "including lib[...]" messages) and then build the main module with something like EMCC_FORCE_STDLIBS=libcxxabi (if you need libcxxabi). Note that you only need the first library mentioned, as each depends on the ones after it so they will be auto-included anyhow.
- Minification of function names is not done on linkable modules (neither main modules nor side modules), which increases code size. We could in principle do minification after linking of all modules, but this is not implemented yet.
- The static linker links asm.js modules, and does not have all the rich metadata available to a normal linker. As a consequence, we duplicate some code and globals that a more optimal linker could coalesce. As mentioned above, the only way to get the best results is to build all the code together and not do static linking of JS.
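The two workarounds above for the main module (disabling GL emulation and forcing standard libraries) can be combined in a build script. The sketch below assumes EMCC_FORCE_STDLIBS is passed as an environment variable, as its form suggests; the helper name main_module_build, its parameters, and the input file name are hypothetical.

```python
import os


def main_module_build(main_bc, force_stdlibs="1", disable_gl_emulation=True,
                      base_env=None, output_js="main.js"):
    """Return (command, env) for building the main module (hypothetical helper).

    force_stdlibs: "1" to force all standard libs, a library name such as
    "libcxxabi" to force that one and its dependencies, or None to leave
    the variable unset.
    """
    command = ["emcc", main_bc, "-s", "MAIN_MODULE=1"]
    if disable_gl_emulation:
        command += ["-s", "DISABLE_GL_EMULATION=1"]
    command += ["-o", output_js]
    # Copy the environment so the caller's (or the process's) is not modified.
    env = dict(os.environ if base_env is None else base_env)
    if force_stdlibs:
        env["EMCC_FORCE_STDLIBS"] = force_stdlibs
    return command, env
```

The returned pair can be handed to subprocess.check_call(command, env=env). Keeping the environment separate from the command makes it easy to switch between EMCC_FORCE_STDLIBS=1 and a specific library without touching the rest of the build.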
This allows caching of libraries on the client, and linking them with code that is sent from the server that updates less frequently (and the result is then cached too).
We can almost do this using the current static linking code. But we would need to convert a little python code to JS, figure out how to do reasonable dead code elimination despite linking, and would need to consider what to do with minification.
dlopen, dlsym, etc. This is needed for things like Python loading modules at runtime.
We can do something similar to the old approach we had, which is still in the codebase (but deprecated - see old_shared_libs for where it still works). Basically, make shared libs slower in that they have named globals and relocation offsets at each function pointer use.
Perhaps instead of named globals, we could move some of the linker responsibility to the runtime - output the indexed globals structure and rewrite the module after patching the offsets at load.
We would need to add logic to call a function pointer from another asm.js module, through trampolines.