Compiling applications and assets to a single binary #10

Open
alexwhitman opened this issue Oct 19, 2014 · 27 comments

@alexwhitman

One of the talked-about benefits of golang is that it compiles to a single binary. Projects such as nexe and jxcore allow compiling to a single binary, but nexe isn't actively maintained and I've never made jxcore work consistently.

require and fs functions would need to be changed to know about packaged files and assets, and I'm sure there are other things that would need consideration.

What are your thoughts about being able to do this in a maintained and supported manner?

@rvagg

rvagg commented Oct 19, 2014

/cc @bmeck

@bmeck

bmeck commented Oct 19, 2014

@alexwhitman we have work underway w/ some oversight into the api and commitment from @trevnorris to help see this through. https://gist.github.com/bmeck/0deeefd070224c10566f

We will never emulate a virtual file system, and most bundling applications (Windows EXE format, Bundler, Jar files, etc.) take the same stance on emulating file systems. A presentation will be done at Empire Node with more information.

An old binary supporting this behavior is at http://bmeck.github.io/ , but the spec has changed since then.

@junosuarez

I know @creationix had an idea around this a few years ago

@sindresorhus

Related: electron/electron#251

@Mickael-van-der-Beek

If you have asynchronous require calls, you could almost use something like r.js or browserify to build a minified and optimised single file.

I say almost because of course native libraries or modules that have to be compiled or use C/C++ bindings are still an issue.

@bmeck

bmeck commented Oct 20, 2014

@Mickael-van-der-Beek As stated in the gist linked, shared libraries will need to be dumped to disk in order to be loaded. Only Solaris supports loading a shared library from memory (Mac OS used to but deprecated the API). That is how jar files handle shared libraries in industry.

@sindresorhus I think the only thing that will be slightly different from the thoughts in that issue is that we implemented a resources API rather than making a new protocol.

@martinies

I can't say I'm an expert on this subject, but we have had lots of cases recently. At least 4 out of 10 tasks we get from customers are mostly to show a design or proof of concept, and those have to run on 'closed systems' during the presentations, POCs, etc. Node is a great way to combine things together, but we have had lots of trouble on client systems with packages and file-system-related differences. Before node, we were using PHP and things were even worse. I believe there are many others like me who would enjoy a packaging/compiling feature with a virtual file system (something similar to jxcore's).

@bmeck why no to a virtual file system? Eventually it worked for jxcore (after several months of struggling, but it works smoothly for most cases now). The only problem is .node files. If the package has a .node file, we just write it to the disk on initial run from the virtual file system. Thankfully we don't use native modules regularly.

Some of our team members don't have cross-platform development experience (differences in paths etc.); a virtual file system now helps us ease deployments, especially when the customer's test or server environment is unknown.

+1

@Mickael-van-der-Beek

If a virtual file system is an option, would Docker + Boot2Docker be a solution?

@bmeck

bmeck commented Oct 20, 2014

@martinies because filesystems work differently in different environments and bundled assets are read-only. A bunch of things will have edge cases and randomly break. We tried that at the beginning of the year, but there were too many problems with people needing to run stat() and expecting file system settings like case insensitivity for it to be worth the trouble.
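
To illustrate the kind of mismatch described above, here is a minimal sketch. The `bundle` map and the `readBundled` helper are hypothetical and not part of any proposed API. Assets held in memory are looked up by exact key, so code that relied on a case-insensitive disk quietly stops finding its files once it is packed.

```javascript
// Hypothetical in-memory bundle: keys are exact paths, values are file contents.
const bundle = new Map([
  ['templates/Index.html', Buffer.from('<h1>hello</h1>')]
]);

// Look up a bundled asset by its exact path; returns null when missing.
function readBundled(name) {
  return bundle.has(name) ? bundle.get(name) : null;
}

// Works: the key matches byte for byte.
const exact = readBundled('templates/Index.html');

// On a case-insensitive disk this spelling would open the same file;
// against the bundle it is simply missing.
const wrongCase = readBundled('templates/index.html');

console.log(exact !== null, wrongCase === null);
```

There is also nothing sensible for stat() to report about such an entry (no inode, no mtime, no permissions), which is part of why an emulated fs tends to leak.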

Take a look at the formats that are also in the same mindset:

  1. Windows EXEs ( http://msdn.microsoft.com/en-us/library/windows/desktop/ms648042(v=vs.85).aspx )
  2. Java ( http://docs.oracle.com/javase/7/docs/api/java/lang/Class.html#getResource(java.lang.String) )
  3. Apple ( https://developer.apple.com/library/mac/documentation/Cocoa/Reference/Foundation/Classes/NSBundle_Class/ )
  4. Ruby Gems ( extract it http://docs.seattlerb.org/rubygems/Gem.html#method-c-datadir )
  5. Perl ( http://search.cpan.org/dist/PAR/lib/PAR/Tutorial.pod#Accessing_packed_files )

If you want to mount a virtual file system via fuse or using self extracting executables that would be fine, but static in memory assets are not files and treating them as such leads to misunderstandings and leaky abstractions: see ( http://www.py2exe.org/index.cgi/WorkingWithVariousPackagesAndModules ). However, when we tried doing self extracting executables there were serious cleanup problems ( same as noted in electron/electron#251 )

We want the cases where those break to be obvious (so developers can learn to diagnose the actual problem), even if it means tweaking code a little.

@Mickael-van-der-Beek docker is a bit heavy for distribution and still has some problems like how would you run 2 node apps in the same container.

tl;dr We don't want to lie; lying makes fewer but much harder to deal w/ errors.

@bnoordhuis
Member

> Only Solaris supports loading a shared library from memory (Mac OS used to but deprecated the API).

Linux does too, indirectly. You start a thread, the thread creates a pipe or socket pair, then the parent thread calls dlopen("/proc/self/fd/$fd") where $fd is the read end of the pipe. Should even work in secure ld.so mode.

@kkoopa

kkoopa commented Oct 20, 2014

Windoze supports it too. Nothing says you have to use the operating system's LoadLibrary routine. Just replicate it. This is how every other PE-"protector" works.

@bmeck

bmeck commented Oct 20, 2014

@bnoordhuis had not thought of that, would require some work for the setup and teardown of the pair.
@kkoopa technically we could implement our own library loader yes, but that is painful.

If the experience is the exact same everywhere either would work. Current implementation extracts to disk since that seems the normal solution across the board.

@kkoopa

kkoopa commented Oct 20, 2014

It's quite straight-forward actually. Anyway, why reinvent the wheel: https://github.com/fancycode/MemoryModule

@creationix

I've been implementing the exact same feature for luvit recently. It's pretty trivial to append a filesystem to the main binary using zip format since most exe formats ignore extra data at the end and zip format ignores data at the beginning (hence how self-extracting zips work)
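
A rough sketch of why the zip-at-the-end trick works: zip readers locate the archive from the end-of-central-directory (EOCD) record near the end of the file, so whatever precedes the archive is skipped. The helper name `findEndOfCentralDirectory` and the fake executable bytes are made up for illustration; this is not luvi's code.

```javascript
// End-of-central-directory signature, stored little-endian as "PK\x05\x06".
const EOCD_SIGNATURE = 0x06054b50;
const EOCD_MIN_SIZE = 22; // fixed part of the record, without a comment

// Scan backwards from the end of the buffer for the EOCD record.
// Returns its offset, or -1 if no record is found.
function findEndOfCentralDirectory(buf) {
  for (let i = buf.length - EOCD_MIN_SIZE; i >= 0; i--) {
    if (buf.readUInt32LE(i) === EOCD_SIGNATURE) return i;
  }
  return -1;
}

// Stand-in for an executable image (hypothetical bytes, not a real binary).
const fakeExe = Buffer.from('\x7fELF...machine code...', 'latin1');

// The smallest valid zip: an empty archive is just a 22-byte EOCD record.
const emptyZip = Buffer.alloc(EOCD_MIN_SIZE);
emptyZip.writeUInt32LE(EOCD_SIGNATURE, 0);

// "Packaging" is plain concatenation.
const packaged = Buffer.concat([fakeExe, emptyZip]);

const eocdOffset = findEndOfCentralDirectory(packaged);
console.log(eocdOffset === fakeExe.length);
```

A real reader would go on to parse the central directory the record points at, and would need to account for the length of the prepended binary when following stored offsets.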

I've been fighting the dlopen issue. If it's so easy to write code to load a module from memory, then why do jars write out to disk as was mentioned above?

My new work is https://github.com/luvit/luvi

@creationix

Also, responding to the original post. I would not recommend patching fs to load from the vfs. It's quite a different beast and should have a different API I think. I do recommend patching require to look there in certain cases. For mine, that is for bootstrapping a single-binary app and for relative requires from files already in the vfs.
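
As a sketch of what "patching require for relative requires inside the vfs" could look like, assuming the vfs is just an in-memory map of path to source text. `vfsFiles` and `requireFromVfs` are hypothetical names, and this is neither luvi's nor Node's actual loader.

```javascript
const path = require('path');

// Hypothetical in-memory "vfs": absolute POSIX-style paths mapped to source text.
const vfsFiles = {
  '/app/main.js': 'module.exports = require("./lib/greet")("world");',
  '/app/lib/greet.js': 'module.exports = function (name) { return "hello " + name; };'
};

const moduleCache = {};

// Load a module from the vfs. Relative requires made by that module are
// resolved against its own location inside the vfs; anything else falls
// through to Node's regular require.
function requireFromVfs(filename) {
  if (moduleCache[filename]) return moduleCache[filename].exports;
  if (!(filename in vfsFiles)) throw new Error('Not in vfs: ' + filename);

  const module = { exports: {} };
  moduleCache[filename] = module;

  const localRequire = function (request) {
    if (request.startsWith('./') || request.startsWith('../')) {
      let resolved = path.posix.resolve(path.posix.dirname(filename), request);
      if (!path.posix.extname(resolved)) resolved += '.js';
      return requireFromVfs(resolved);
    }
    return require(request);
  };

  // Same wrapper shape Node uses for CommonJS modules, minus __filename/__dirname.
  const wrapper = new Function('exports', 'require', 'module', vfsFiles[filename]);
  wrapper(module.exports, localRequire, module);
  return module.exports;
}

const result = requireFromVfs('/app/main.js');
console.log(result);
```

Note that fs is left untouched here: only module loading knows about the vfs, which matches the split recommended above.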

@alexwhitman
Author

Good to see that this has kicked off some discussion.

@creationix The reason I mentioned fs is so that, ideally, files that make up the application can be loaded when both packed and unpacked. For example, I might want to load a template file for rendering. During development I'd want to load that from the regular file system but when deployed I'd want it loaded from the binary. That way I wouldn't have to build the binary for each small change during development.

@bmeck

bmeck commented Oct 20, 2014

@creationix from what I can tell it is that JAR file conventions rely on acting the same in many environments, and custom code loaders do not get 100% parity with the OS dlopen(). We were using archives as namespaces, which is how require() would work inside of them. If you wanted to grab files inside of the archive we use a read-only createResourceReadStream(path, opts) -> stream / readResource(path, opt, cb). I would be interested in talking things over with you during a hangout if you have any problems, because our implementation seems to work fine.

@alexwhitman in the spec and implementation posted we do allow loading from disk or inside an archive via a concept called resources. These are present in most application programming environments with archive files. The important thing to note is that the concept of resources is not tied to archives themselves. If we ever emulate fs, we are encouraging that can of worms in other situations, like mounting a remote FTP where all the Sync functions would suddenly not make any sense. Is there any problem with using a more abstract API without references to stat() and inodes etc.?
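
To make the resources idea concrete, here is a minimal sketch of a reader that serves a name from a packed archive when one is present and from disk otherwise. Only the `readResource(path, opts, cb)` shape comes from the comment above; the `createResourceReader` factory, the Map-based archive, and the file names are assumptions for illustration and not the implementation in the gist.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Build a reader over an optional in-memory archive (Map of name -> Buffer)
// and a directory on disk. Both the factory and its arguments are hypothetical.
function createResourceReader(archiveEntries, diskRoot) {
  // Signature follows the readResource(path, opts, cb) shape mentioned above.
  return function readResource(name, opts, cb) {
    if (archiveEntries && archiveEntries.has(name)) {
      const data = archiveEntries.get(name);
      // Stay asynchronous so callers see the same behaviour in both modes.
      return process.nextTick(cb, null, opts.encoding ? data.toString(opts.encoding) : data);
    }
    // No stat(), no directory listing: just "give me the bytes for this name".
    fs.readFile(path.join(diskRoot, name), opts, cb);
  };
}

// Development: nothing packed, the template is read from disk.
const devRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'resources-'));
fs.writeFileSync(path.join(devRoot, 'page.tmpl'), 'dev template');
const readDev = createResourceReader(null, devRoot);

// Deployment: the same name is served from the packed archive.
const packed = new Map([['page.tmpl', Buffer.from('packed template')]]);
const readPacked = createResourceReader(packed, devRoot);

readDev('page.tmpl', { encoding: 'utf8' }, (err, text) => console.log(err, text));
readPacked('page.tmpl', { encoding: 'utf8' }, (err, text) => console.log(err, text));
```

The calling code is identical in both modes, which covers the "don't rebuild the binary for every template change" case without pretending the archive is a file system.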

@trevnorris

Re: Virtualized File System

Not going to happen. Has far too large a footprint on core code, too many unknowns to deal with and an overall general PITA. I think @indutny properly stated it on IRC:

<indutny> oh god
<indutny> no no no

@zcbenz

zcbenz commented Oct 21, 2014

Hi, I want to share my experience on implementing app packaging in atom-shell, which I hope would be helpful for Node.

Introduction

The app packaging in atom-shell works by modifying node's fs module (and others like child_process) to recognize asar archives, and treat /path/to/*.asar as a directory. In general, it is a virtual filesystem compatible with current Node's APIs.

Examples of uses:

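// Inside atom-shell, paths under *.asar are served from the archive.
var fs = require('fs');
var child_process = require('child_process');
require('./test.asar/main.js');
fs.readFileSync('./test.asar/README.md');
fs.readdirSync('./test.asar');
child_process.fork('./test.asar/task.js');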
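
For readers wondering how a patched fs call might decide whether a path points into an archive, a minimal sketch follows. `splitArchivePath` is a hypothetical helper written for this illustration; it is not taken from atom-shell's asar.coffee.

```javascript
const path = require('path');

// Split a path at the first component ending in ".asar".
// Returns null for ordinary paths that should go to the real file system.
function splitArchivePath(p) {
  const parts = path.posix.normalize(p).split('/');
  const index = parts.findIndex((part) => part.endsWith('.asar'));
  if (index === -1) return null;
  return {
    archive: parts.slice(0, index + 1).join('/'),
    inner: parts.slice(index + 1).join('/') // '' means the archive root
  };
}

console.log(splitArchivePath('./test.asar/main.js'));
console.log(splitArchivePath('./lib/util.js'));
```

A wrapped fs function would call something like this first and fall back to the original implementation whenever it returns null.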

Archive format

The packages use asar archives, a custom archive format designed for fast random access. I didn't use Zip because it is both overcomplicated for our case and lacks some core features we need. I had listed our requirements and comparisons of different archive types here, and the conclusion was that developing our own archive type was a better choice.

Pros and cons

I don't know how many users use atom-shell's app packaging, but it works for most current node apps without modifying one line of source code. There are still some limitations, though, and I have listed them in atom-shell's wiki.

Implementation

If you have skimmed the structure of the asar archive format, you should know that the implementation would be very simple. You can find out how Node's APIs are overloaded in asar.coffee, which only has 300 lines, and the native asar format parsing code can be found in archive.cc, which is 250 lines.

Single binary

The asar format doesn't support being concatenated to binaries like Zip, but it would be quite easy to do by putting the size of the asar archive at the end of the archive file.
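
One way the size-at-the-end idea could be sketched, assuming a 4-byte little-endian length trailer. The trailer layout and the function names `appendArchive` and `extractArchive` are assumptions for illustration, not part of the asar format.

```javascript
const TRAILER_SIZE = 4; // one unsigned 32-bit little-endian length (assumed layout)

// Append an archive to a host binary, followed by the archive's byte length.
function appendArchive(binary, archive) {
  const trailer = Buffer.alloc(TRAILER_SIZE);
  trailer.writeUInt32LE(archive.length, 0);
  return Buffer.concat([binary, archive, trailer]);
}

// Recover the archive by reading the length from the last bytes of the file.
function extractArchive(combined) {
  const trailerStart = combined.length - TRAILER_SIZE;
  const size = combined.readUInt32LE(trailerStart);
  return combined.subarray(trailerStart - size, trailerStart);
}

const binary = Buffer.from('host executable bytes');
const archive = Buffer.from('archive bytes');

const combined = appendArchive(binary, archive);
const recovered = extractArchive(combined);
console.log(recovered.equals(archive));
```

A production version would likely add a magic number next to the length, so a binary with no archive appended is not misread.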

Thoughts for node

I don't think adding new APIs for packaging is a solution. If fs can not read a file in archive with current APIs, nearly all modules that read from filesystem would break when used in archives.

Imagine a user who wants to compile an existing Express app into a single binary: they would have tons of code (including their own and third-party modules) to change to make it work.

@bmeck

bmeck commented Oct 21, 2014

@zcbenz Zip archives support symbolic links through the external file attributes. The fast lookup is generally only a minor problem since we do cache the central directory in memory.
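
For reference, a small sketch of how a symlink can be recognised from a zip entry's external file attributes. It assumes the entry was written by a Unix host, in which case the high 16 bits carry the Unix mode; `isSymlinkEntry` is a hypothetical helper name.

```javascript
const S_IFMT = 0o170000;  // mask for the file-type bits of a Unix mode
const S_IFLNK = 0o120000; // file type: symbolic link

// For entries written by a Unix host, the high 16 bits of the 32-bit
// "external file attributes" field hold the Unix st_mode.
function isSymlinkEntry(externalFileAttributes) {
  const unixMode = externalFileAttributes >>> 16;
  return (unixMode & S_IFMT) === S_IFLNK;
}

// A symlink with rwxrwxrwx permissions, shifted into the high 16 bits.
const linkAttrs = (0o120777 << 16) >>> 0;
// A regular file with rw-r--r-- permissions.
const fileAttrs = (0o100644 << 16) >>> 0;

console.log(isSymlinkEntry(linkAttrs), isSymlinkEntry(fileAttrs));
```

By the usual convention, the link target is stored as the entry's data.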

@indutny

indutny commented Oct 21, 2014

I think technically it could be possible to combine two ELF files (or two Mach files) into one single file. Making it load all symbols and relocate all data.

@pmuellr

pmuellr commented Oct 21, 2014

Another kinda wacky thing, for "single binary" deployments, would be to create and use snapshots. Briefly alluded to here: https://developers.google.com/v8/embed#contexts . Anyone use these in practice?

This is just for code; non-code resources (data files) would need a traditional archive story.

We built something like this many years ago for IBM's J9 Java VM. Compiled Java .class files to an optimized, quicker-to-load format that the VM could consume. It worked, provided some interesting value in some situations, but in the end was a little too complex for most people to easily use. It only potentially improves the runtime startup, and often at the expense of a larger disk footprint for the archive. Our target was mobile devices, which is where you get the most bang for the buck with this kind of story.

We had a zip-based archive story that included the "snapshot" with a well-known file name, and resources (aligned on 4-byte boundaries), so in the end you had a single file. Kinda similar to Android's APK story.

Would be an interesting area to explore, but obviously not a critical path item. :-)

Patrick Mueller
http://muellerware.org

@bmeck

bmeck commented Oct 27, 2014

@pmuellr I would have concerns about vendor lock in if we do use snapshotting, loading the code would be faster but would not be portable.

@pmuellr

pmuellr commented Oct 27, 2014

ya, there's lots to worry about if you want to do this; platform specificity is of course a concern as well. Not so sure about "vendor" lock-in, but certainly "version" lock-in is another real concern.

You'll notice I didn't really paint this as a happy outcome of the work we did in Java. :-) The tooling for this kind of thing tends to be ... complicated. I don't believe we expose this functionality or tooling in the product anymore.

Also true that as processor speeds increase, the benefits of this kind of approach decrease.

Still, snapshots do seem to be an interesting idea, and perhaps there's other benefits to them like "obfuscated code" that would be of value to someone. I like to keep weird things like this in the mix of wacky ideas - sometimes you find these things useful in unexpected ways.

@bnoordhuis
Member

> Another kinda wacky thing, for "single binary" deployments, would be to create and use snapshots. Briefly alluded to here: https://developers.google.com/v8/embed#contexts . Anyone use these in practice?

node-webkit does, I think. There is a provision for it in V8's mksnapshot tool: you can make it load extra code with the (aptly named) --extra_code <filename> switch.

It's not very amenable to general purpose code, however. For example, snapshots with "foreign" objects don't work; that means you can't use buffers or handles and those are rather pervasive in node.

@bmeck

bmeck commented Nov 5, 2014

@imlucas

imlucas commented Nov 13, 2014

I also took a stab at this recently that might be helpful for other folks interested.

method

Using the löve/node-webkit self-extracting zip "single binary" approach, which is basically these 10 steps.

mksnapshot

The node-webkit docs @pmuellr mentioned detail all of the weird/gnarly tradeoffs you have to deal with.

auto-update

For python, esky has a really elegant way of handling this. There are some prototypes for this that are promising, but a real need hasn't come up (deployment scripts just wget from github releases and stomp the local copy).

status

In production and "good enough for me", so I haven't fiddled in a while. Windows binary add-ons are back and forth. There are real live tests though, if anyone has interest.

conclusions

  • lots of hard, thankless problems around compat
  • windows build box availability :( hopefully this will change as mapbox/node-pre-gyp and node-forward/build progress though
  • Python has taken a lot of stabs at this (py2exe, esky, rumps, etc) that should be thoroughly researched
  • scripting and infrastructure work no one wants to tackle
  • it's really fun to email apps/scripts to customers and just skip the whole "now you install node and there's this thing called npm" dance
