NodeJS performance on alpine/arch running on v86 #1005
Some ideas:
If it's none of the above, it's probably cpu bound. Note that a 10x slowdown compared to native is expected. Profiling could help (but we really need that minimal testcase):
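Since a minimal testcase keeps coming up in this thread, here is a sketch of one (mine, not from the issue) that isolates pure compute time from process startup and module loading:

```javascript
// Minimal CPU-bound testcase: times a pure arithmetic loop, so the result
// excludes process startup, module loading, and disk I/O.
function compute(n) {
    let acc = 0;
    for (let i = 0; i < n; i++) acc += i % 7;
    return acc;
}

const t0 = process.hrtime.bigint();
const result = compute(10_000_000);
const t1 = process.hrtime.bigint();
console.log(`compute: ${(Number(t1 - t0) / 1e6).toFixed(1)} ms (result=${result})`);
```

Comparing this number against the total `time node testcase.js` wall time separates a CPU-bound slowdown from a startup/I/O-bound one.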
|
You probably need to load node_modules into the cache early, or put them into tmpfs, as suggested before. About the long startup of node.js itself, I have an idea (tested only on 9pfs): cache the node binary and its libs manually, like this on a cold boot. By the way, I found that node's binary compiled without … Log:

```
root@localhost:~# vmtouch -v /usr/bin/node
/usr/bin/node
[ ] 0/4
Files: 1
Directories: 0
Resident Pages: 0/4 0/16K 0%
Elapsed: 0.0654 seconds
root@localhost:~# vmtouch -v /usr/lib/libnode.so.111
/usr/lib/libnode.so.111
[ ] 0/12781
Files: 1
Directories: 0
Resident Pages: 0/12781 0/49M 0%
Elapsed: 0.10016 seconds
root@localhost:~# time node /root/hello.js
Hello, world
real 1m53.508s
user 0m39.370s
sys 0m3.309s
root@localhost:~# time node /root/hello.js
Hello, world
real 0m30.253s
user 0m25.674s
sys 0m1.164s
root@localhost:~# vmtouch -v /node
/node
[ ] 0/22540
Files: 1
Directories: 0
Resident Pages: 0/22540 0/88M 0%
Elapsed: 0.12426 seconds
root@localhost:~# time /node /root/hello.js
Hello, world
real 0m39.985s
user 0m6.341s
sys 0m2.483s
root@localhost:~# time /node /root/hello.js
Hello, world
real 0m5.739s
user 0m4.173s
sys 0m0.796s
root@localhost:~# /node -v
v20.11.1
root@localhost:~# node -v
v19.0.0
```

And I wrote a simple program that calculates Pi by the Leibniz formula as a benchmark, pitest.js:

```js
var pi = 0, k = 1;
function picalc(n) {
    for (var i = 0; i < n; i++) {
        if (i % 2 == 0) {
            pi += (4 / k);
        } else {
            pi -= (4 / k);
        }
        k += 2;
    }
}
console.time("pi");
picalc(10000000);
console.timeEnd("pi");
```

Results:

```
root@localhost:~# node /pitest.js
pi: 55.320s
root@localhost:~# node /pitest.js
pi: 17.297s
root@localhost:~# /node /pitest.js
pi: 15.025s
root@localhost:~# /node /pitest.js
pi: 13.192s
```

And out of interest, I tested on Void Linux (on an unpartitioned image); unlike Arch, it is built with the --shared flag (config from Void Linux), i.e. without separated … Results:

```
# cold start
[supermaxusa@void ~]$ time node hello.js
Hello, world
real 0m38.685s
user 0m15.197s
sys 0m10.290s
[supermaxusa@void ~]$ time node hello.js
Hello, world
real 0m24.024s
user 0m16.345s
sys 0m5.421s
[supermaxusa@void ~]$ node -v
v18.19.1
[supermaxusa@void ~]$ node pitest.js
pi: 21.739s
[supermaxusa@void ~]$ node pitest.js
pi: 20.642s
```

NPM is also slow; in my case, I installed express.js from a global package.json:

```json
{
  "name": "npm-test",
  "version": "1.0.0",
  "description": "",
  "main": "/dev/null",
  "scripts": {},
  "author": "",
  "license": "ISC",
  "dependencies": {
    "express": "*"
  }
}
```

...but with precached files, the result is somewhat better in different cases; for example, I tried to import …
|
Thank you guys for the pointers, I'll go through each and every point, do some testing, and update here with the results. Thanks again! |
@farhan-ct, to clarify, did you try to test this on a 9p root fs? For now I have thought only about I/O bottlenecks (I think that CPU bottlenecks don't matter yet).
You mean that node needs to run in an X session instead of running in ttyS0 or tty1 sessions? |
Yes, the Linux framebuffer console emulator is pretty slow (might be relevant if npm prints progress bars). Using ttyS0 could also work. |
I ran all these tests on Alpine Standard Linux, both on my packer version and the docker version you shared in the discussions (both in v86). However, my packer version doesn't have 9pfs at all, so the performance issues exist both with and without 9p. Also, I am having issues running the node v20.11.1 downloaded from the unofficial builds site, although the nodejs version installed using apk add is the same version for me. Trying to load the node binary into memory using vmtouch also didn't help in my case. I just saw that you edited your answer above, let me go through your results once again 😃
What does 'precached files' stand for here? Is it the files that were loaded into memory when installing modules from the global cache into the project? |
@copy I am using pnpm instead; not too much output, but it gave me relatively better performance in most cases.
Both the cases have the same performance issues. I am trying to prepare a well written document that includes testcases (minimal, simple imports, and a server) and benchmarks that covers the pointers that you and @SuperMaxusa provided. |
Maybe gcompat and some dependencies (
Interesting, thanks for the info! My previous tests were run on ttyS0. About progress: on xterm, npm runs a bit faster with progress=false than without it (warmed): …but results on the framebuffer were (for some reason) the opposite. Based on these results, I tried reducing npm's output and logging, and this helps.
This is file caching (the page cache), which reduces repeated I/O accesses; see https://www.linuxatemyram.com/ and https://en.wikipedia.org/wiki/Page_cache. |
Did you try using different chunk sizes for the image? #710 |
I split my image into fixed-size 1 MB chunks with a filename format from one of the Windows 95/98 demos in copy.sh. While testing, I've been trying both the split version and passing async: true and size, which effectively makes v86 auto-chunk the image itself, on the fly.
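For context, the auto-chunking variant is configured on the v86 side roughly like this (a sketch based on v86's documented disk-image options; the paths and sizes are placeholders, and the constructor name differs between v86 versions):

```javascript
// Sketch: let v86 fetch the disk image in pieces on demand instead of
// downloading it whole up front. url/size values below are placeholders.
const emulator = new V86({
    wasm_path: "build/v86.wasm",
    memory_size: 512 * 1024 * 1024,
    hda: {
        url: "images/alpine.img",
        async: true,               // fetch parts of the image on demand
        size: 1024 * 1024 * 1024,  // total image size must be declared
    },
    autostart: true,
});
```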
Yes, I had to install gcompat for the missing loader, and the node binary at least executed after that, but when I tried any node command, even node --help, I constantly got the below error |
This node binary is for glibc, so it can work incorrectly with gcompat. Anyway, nodejs in Alpine is compiled without --shared: https://git.alpinelinux.org/aports/tree/main/nodejs/APKBUILD#n200 |
If this is useful: node with mimalloc2 (gliderlabs/docker-alpine#161 (comment)) or jemalloc (https://stackoverflow.com/questions/53234410/how-to-use-node-js-with-jemalloc) is faster (in some cases) than without (tested on Alpine v3.18.6 and Node 18.19.1):

```
localhost:~# time node hello.js
Hello, world
real 0m 10.31s
user 0m 7.15s
sys 0m 2.23s
localhost:~# LD_PRELOAD=/usr/lib/libmimalloc.so.2 time node hello.js
Hello, world
real 0m 7.81s
user 0m 6.34s
sys 0m 0.92s
localhost:~# LD_PRELOAD=/usr/lib/libjemalloc.so.2 time node hello.js
Hello, world
real 0m 7.43s
user 0m 6.06s
sys 0m 0.87s
localhost:~# time node pitest.js
pi: 13.077s
real 0m 21.50s
user 0m 16.69s
sys 0m 2.51s
localhost:~# LD_PRELOAD=/usr/lib/libjemalloc.so.2 time node pitest.js
pi: 9.576s
real 0m 16.20s
user 0m 14.06s
sys 0m 1.20s
localhost:~# LD_PRELOAD=/usr/lib/libmimalloc.so.2 time node pitest.js
pi: 9.961s
real 0m 16.19s
user 0m 14.28s
sys 0m 1.02s
```
|
Thank you @SuperMaxusa for the clear details and tests, will try it out :D |
I've pushed a fix for slow disk IO into the wip branch (dfacd61). Please test. |
Thank you @copy Will test now |
Hi @copy, thanks for the update on the fix. I built the wip branch and did some quick testing using the provided docker debian example and took some screenshots. I couldn't arrive at a fair conclusion after testing the changes. @SuperMaxusa, I request you to take a look at this and test/confirm whether you are seeing similar results. All 4 tests were performed under the same conditions |
Here are my results (browser: Chrome 124 on localhost, guest OS: Alpine Linux 3.19.1, fs: ext4 with …

Test commands:

```
# apk add coreutils
# hello.js is the same as in the post above
# 1. dd [write]
dd if=/dev/zero of=/testfile bs=1M count=128 status=progress
# 2. dd [write dsync]
dd if=/dev/zero of=/testfile bs=1M count=128 status=progress oflag=dsync
# 3. cat [read]
time cat /testfile > /dev/zero
# 4. dd [read]
dd if=/testfile of=/dev/zero bs=1M count=128 status=progress
# 5. start hello.js (cold start)
time node hello.js
```

I changed the version in the Dockerfile to trixie and added … Some ideas:
|
To be clear, the fix mostly improves performance when files are actually downloaded from the server, so I wouldn't expect writes to be any faster, nor reads of files that have previously been read (v86 caches downloaded files in most cases). The cold performance looks pretty solid now (in comparison to warm); I think much of the remaining cold-start overhead can be explained by jit warmup. The warm performance numbers you posted still look pretty bad, and much worse than my numbers. For example:
This is on Alpine 3.19.1 built with the docker script (I had to add |
Note: my previous test used a hard drive image loaded from the browser; I hadn't tested speed on 9pfs yet. I have now tested on a 9pfs rootfs with Alpine Linux 3.19.1 as you said (+ COOP/COEP headers).
Is this caching done by the browser, or by the filestorage's cache? (sorry if this is an incorrect question) v86/src/browser/filestorage.js Lines 15 to 21 in c69cbd7
|
Hmm, 9pfs data is not cached currently, my previous statement was incorrect. However, files are only re-downloaded if the 9pfs closes the file and it hasn't been written to. Let's try to narrow down why node runs ~20x slower for you:
|
Thanks, previously I had added a profile in src/browser/main.js and used it for my tests. It's almost the same; at least I also have in the kernel params "
For now I have only tested on an i3 with 6 GB of RAM (running only the browser and an HTTP server) on Windows, and this may be the main problem. However, a test with qemu-tcg (singlethreaded,
I tried the wip branch (34ebae1) and a new alpine container; for some reason, I get around 11 seconds of warm-start node.js on Firefox (release and Developer Edition) and around 18 seconds on Chrome (can't record properly with the performance profiler):
I have no idea why this happened, because I used the same conditions for the tests (just a server with COOP and a browser with a clean profile, without extensions). |
I think that explains most of the performance problems. You could try setting Line 67 in c69cbd7
v86 isn't as fast as qemu here, but there are three layers of jit warmups going on (js->x86, x86->wasm, wasm->hostcpu). I don't think there is any single optimisation that will be an "easy win" here. The following could net a few percent:
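The jit-warmup contribution is easy to see from inside the guest; an illustrative sketch (not from the thread) that times the same function over several rounds:

```javascript
// Later rounds are usually faster once the engine's optimizing JIT has
// kicked in; under v86 the x86->wasm and wasm->host layers warm up too.
function work(n) {
    let s = 0;
    for (let i = 0; i < n; i++) s += Math.sqrt(i);
    return s;
}

for (let round = 1; round <= 3; round++) {
    const t0 = process.hrtime.bigint();
    work(1_000_000);
    console.log(`round ${round}: ${(Number(process.hrtime.bigint() - t0) / 1e6).toFixed(1)} ms`);
}
```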
@farhan-ct I'd suggest playing around with the alpine image and the various suggestions. Disk performance should be fine now |
Will do, @copy. Thank you for sharing lots of info that I can learn from :D I've been using Debian trixie until now and it's fine so far. One exception: I am creating states after a nodejs cold run and saving them without clearing caches. The state size is big, but it works nicely. I will try this updated alpine too, with the optimizations and suggestions provided. Maybe out of context, but here are a few observations:
|
It seems that v8 uses SSE2 (v8/v8@3fa6100), SSE3 (can be turned on/off at runtime with popcnt), FPU (not sure how often this is used, but at least it is required for all CPU arches: https://forum.openwrt.org/t/building-node-js-for-ramips/183164, https://github.com/v8/v8/blob/main/test/unittests/base/cpu-unittest.cc#L64) and SSSE3 (#1038).
With |
I did it with -22 (zstd --ultra -22 state.bin), and for some reason it seems that the loaded cache is lost after compression. Example: I boot for the 1st time, run some node command, and create a state without clearing VM caches and such. The state is obviously big, and when I boot using this initial_state, node is fast. But if I compress it using zstd and then boot with initial_state, node is slow again. |
Can you try to compare
Node.JS 22.0 (or 21.x with |
This worked, thank you! Will update about the |
From the latest tests, v86 takes about one second to run a warm nodejs hello world on a low-end machine, which looks reasonable to me considering the constraints. I'll gently close this issue, but feel free to create a new issue if you have another testcase where v86 is significantly slower than qemu-tcg. Also, if you have any tuning suggestions that improve fs9p performance on Linux, please send a PR to update examples/alpine.html or docs/archlinux.md.
Just a note on this. |
Hi,
I've been trying to run some nodejs programs and projects on v86. So far, this has involved using npm/pnpm to install packages online & offline (npm cache), running simple http and express server programs, etc.
Below are some of my experiences.
(ignore boot times, I am using a saved booted state to load)
I know all this sounds very vague and perhaps unrelated to v86 but I want to dig deeper, explore and perhaps fix/improve things if I can.
@copy @SuperMaxusa
I request you to kindly point me to where I should start looking and studying to improve v86 performance, at least for nodejs temporarily (though it's understandable that it's running in a web browser).
Thank you!