Version bump, MRJar, and use virtual threads on Java 21+ #15

Open · wants to merge 2 commits into master

Conversation

@sworisbreathing commented Jan 23, 2024

Summary of changes:

  • Upgrade from Spring Boot 2 to Spring Boot 3
  • Upgrade all dependencies to latest
  • Switch from adoptopenjdk to temurin (Temurin is the Eclipse Adoptium project's distribution, the successor to AdoptOpenJDK)
  • Use an Alpine 21 base image for the Docker build
  • Build a multi-release Jar
  • Use virtual threads when running on Java 21 or higher

Breaking Changes:

  • The minimum Java version required to run winfoom is increased from Java 11 to Java 17 (due to Spring Boot 3 upgrade)
  • The minimum Java version required to build winfoom is increased from Java 11 to Java 21 (due to virtual thread support)

Comments:

I've tested this locally on temurin 17, temurin 21, and graalvm 21 (simply changing JAVA_HOME when launching the GUI), and verified it's working by connecting to the app with jconsole. On Java 17 I see threads named pool-x-thread-y; on Java 21 I see ForkJoinPool threads instead. In both cases, basic smoke testing (i.e. curl https://google.com) works through winfoom.

Unfortunately, supporting the MRJar required a bit more refactoring than I'd like (i.e. introducing a new package, org.kpax.winfoom.proxy.concurrent). There was a lot of trial and error with the multi-release JAR before I figured out that Spring Boot repackaging doesn't play nicely with importing classes from dependencies in a multi-release jar. That's why there are no Lombok annotations, logging, etc. in the new package (I would really have liked to use an slf4j logger here to say whether we are using a thread pool or virtual threads, but alas, it was not meant to be).
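
To illustrate the shape of the change, here's a minimal sketch of the multi-release layout. The PR does use a class named ExecutorServiceFactory, but the exact API below is my simplification, and the deliberately bare style (no Lombok, no slf4j, no dependency imports) reflects the repackaging constraint described above:

```java
// Baseline, compiled for Java 17 and packaged at the root of the jar.
// Kept deliberately self-contained because of the Spring Boot repackaging constraint.
package org.kpax.winfoom.proxy.concurrent;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorServiceFactory {
    public ExecutorService newExecutorService() {
        // Java 17 fallback: a regular platform-thread pool (a stand-in for whatever
        // pool the existing code uses).
        return Executors.newCachedThreadPool();
    }
}
```

```java
// Java 21+ override, packaged under META-INF/versions/21/ in the multi-release jar.
// Same package and class name; the runtime picks this copy on Java 21 or newer.
package org.kpax.winfoom.proxy.concurrent;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorServiceFactory {
    public ExecutorService newExecutorService() {
        // One virtual thread per task; the carriers show up as ForkJoinPool workers in jconsole.
        return Executors.newVirtualThreadPerTaskExecutor();
    }
}
```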

Bump minimum Java version to 17 (required due to Spring Boot 3)
Switch from adoptopenjdk to eclipse temurin (temurin is the new adoptopenjdk)
Bump major version (required due to changing minimum java version)
Use virtual threads when running on Java 21+.
Bump minimum JDK version for building to JDK21 (required due to virtual threads)
@sworisbreathing (Author) commented:

I ran a couple real-world test scenarios to see what the impacts are of virtual threads vs platform threads.

tl;dr version

Users who are most likely to benefit from switching to virtual threads are those who run winfoom on the same host as other resource-heavy workloads. On a typical developer PC/laptop you (probably) won't see huge gains on traffic running through the proxy (virtual threads don't magically make your network run faster, and network I/O is generally the limiting factor here). What you will (probably) see is more efficient resource utilization in winfoom itself (especially memory), meaning less contention with other stuff running on the host (e.g. building and testing software).

Although the overall consumption in these tests wasn't huge, the improved memory utilization is nothing to sneeze at - platform threads consumed more than 3x the amount of memory vs virtual threads. This memory isn't released once the load drops off, or even after pushing the stop button in the gui. Once the JVM decides to expand the heap, it will hold on to that memory and only release it back to the operating system after a full GC (which the JVM tries very hard to avoid). So you're arguably better off if winfoom doesn't need to ask for that memory in the first place.

Test environment

I tested this on my developer workstation (Windows 10, 8 cores, 16GB RAM) on a home WiFi network, using a residential broadband service (the last mile runs on 30-40 year old copper phone line) while connected through an IPSec VPN with an upstream forward proxy.

Suffice it to say there are a number of hops/points of failure involved, the result of which is that some of the steps in the test scenarios below had to be repeated multiple times due to I/O errors (especially when lots of stuff was trying to hit the proxy at the same time). I've seen similar issues with CNTLM as well as winfoom 4.0.3, so at this point I'm convinced those issues all lie somewhere upstream (i.e. between my computer and the servers I'm hitting in these tests). My guess is that the sudden spikes in traffic are triggering DoS prevention somewhere upstream which kills the connection, but I'm not in a position to be able to validate that with evidence.

Tests

The test case was a GitLab CI pipeline using a local runner in Docker, with WinFoom running on the host.

The CI pipeline is for a terraform provider I've been working on. There's a bit of small stuff going on early on, but the real meaty stuff comes in during the acceptance test stage (10 jobs running go test in parallel, and having to resolve all their dependencies through winfoom at the start of each job, as well as hitting the target system during the acceptance tests). Once all 10 test jobs have passed, the build stage kicks in with 15 parallel go build jobs (also resolving their dependencies at the start of each job). There's no local caching of the dependencies, so they have to be downloaded through winfoom every time.

Test 1

Test 1 was performed with winfoom running on temurin 21 (therefore using virtual threads). I got pulled away from my desk for a couple hours towards the end of the test run. The upshot of that is that jconsole was also able to capture idle performance.

jconsole Overview tab with time range All

Heap memory remained under 40 MB for the entire test run. The platform thread count peaked at 33. Maximum CPU usage was around 6-7%.

jconsole Memory tab showing non-heap memory usage with time range all

Non-heap memory was highest at the beginning of the test, coming in slightly above 100 MB. By about 15 minutes into the test it had dropped to around 90 MB, then inched back up to around 95 MB over the next 30 minutes. It started to drop back down about 2 hours later (which I'm assuming is when the CI pipeline finished), finally settling around 92 MB after about half an hour (give or take).

jconsole Threads tab showing time range All

The JVM started with around 30 threads. The count was a bit up-and-down over the next half hour, then about 30 minutes later it settled at 33 threads, where it remained (33 was also the maximum during the first 30 minutes). Although not captured in this screenshot, I saw a number of ForkJoinPool worker threads instead of pool-x-thread-y threads. I did not see any threads named virtual-thread-x.
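
(That's expected, as far as I can tell: jconsole's thread view is backed by ThreadMXBean, which only reports platform threads, so virtual threads don't appear at all and what you see are their ForkJoinPool carrier threads. Virtual threads also have an empty name unless one is set explicitly. A tiny illustration, not code from the PR:)

```java
// Not from the PR: shows that virtual threads are unnamed by default and that
// ThreadMXBean (which backs jconsole's Threads tab) counts only platform threads.
import java.lang.management.ManagementFactory;

public class VirtualThreadVisibility {
    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().start(() ->
                System.out.println("virtual thread name: '" + Thread.currentThread().getName() + "'"));
        vt.join(); // prints an empty name
        System.out.println("platform thread count: "
                + ManagementFactory.getThreadMXBean().getThreadCount());
    }
}
```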

jconsole VM Summary tab

Total process uptime was 4 hours 40 minutes. CPU time was 8 minutes. Committed memory at the end of the test run was 47,104 kbytes, of which 36,540 kbytes was heap memory.

Test 2

For test 2, I relaunched the same build of winfoom on temurin 17 (by simply changing JAVA_HOME) and re-ran the same GitLab CI pipeline. Since I'd captured idle time at the end of test 1, I also captured idle time at the end of test 2. Unlike test 1, I pushed the stop button in the winfoom GUI during the idle time, as well as the "Perform GC" button in jconsole a few minutes later (more on this below).

jconsole Overview tab showing time range All

Heap memory hovered somewhere between 40-50 MB through the bulk of the go test stage, then jumped up to a max of 120 MB (more on this below). Platform threads jumped up a few times to around 80 or 90 during the go test stage, then had a huge spike at the beginning of the go build stage. CPU utilization peaked briefly at around 10% during the first 30 minutes. Maximum CPU usage beyond that was around 4%.

jconsole Memory tab showing heap memory usage with time range All

Heap memory on this test run was worth looking into in a bit more detail. I've divided the screenshot into 4 sections:

  1. the 10 parallel go test jobs (plus the stuff before that, which doesn't really amount to much)
  2. the 15 parallel go build jobs
  3. idle time between the end of the CI pipeline and the manual, full GC. I was hoping to see memory released back to the OS when everything was idle, but that didn't happen, so I pushed the stop button to see what that would do. The heap cycle slowed down a bit, but it still looked like it was going to climb back up to where it had been before. I got impatient here and decided to trigger a GC through jconsole to see what would happen
  4. after the manual GC (presumably a full GC; I haven't looked into the JDK sources). This finally shrank the heap and released the unused memory back to the operating system. After this, we still see the sawtooth pattern that's typical of a healthy Java application, but the peak is much smaller (NB: it actually shrank a bit more after a while, and a little more memory was released back to the OS, but it wasn't really worth capturing another screenshot)

What we can see here is that heap remained somewhere below 50 MB until some time during the go test stage, when the JVM suddenly decided it needed to grow the heap to somewhere between 100-125 MB. If you've ever done JVM memory profiling, you'll notice the familiar sawtooth pattern, where heap memory steadily climbs until a GC kicks in and drops it back down suddenly.

I was hoping the max heap would drop back down during the idle period, but this didn't really happen. I pushed the "stop" button in the winfoom GUI to see what effect this would have. Heap utilization still climbed, albeit more slowly than before pushing the stop button. I then pushed the Perform GC button in jconsole, and that did drop the max heap memory down (and, as shown a bit further below, also released the memory back to the OS).
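
(For reference, jconsole's "Perform GC" button invokes, as far as I know, the gc() operation on the memory MXBean, which behaves like System.gc(). The same thing can be requested programmatically; a sketch, not code from the PR:)

```java
// Programmatic equivalent of jconsole's "Perform GC" button for the current JVM:
// the MemoryMXBean's gc() operation, which behaves like System.gc().
import java.lang.management.ManagementFactory;

public class ForceGc {
    public static void main(String[] args) {
        ManagementFactory.getMemoryMXBean().gc();
        System.out.println("heap after GC request: "
                + ManagementFactory.getMemoryMXBean().getHeapMemoryUsage());
    }
}
```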

jconsole Memory tab showing non-heap memory usage with time range All

Non-heap memory stayed around 100 MB for most of the run. Interestingly, it climbed slowly even during the idle period. I think I took this screenshot before pushing the stop button, so it doesn't really show what non-heap memory looks like after stopping winfoom or after a full GC. I'm not too concerned about it, though. Most likely it's a bit higher overall than in the first test because there were more platform threads this time around (and more threads = more stack space = more non-heap memory).

jconsole Threads tab showing time range All

The JVM started with around 30 or 40 threads, but the number of threads varied more during this test than during the first one (this test was using platform threads). It's pretty easy to infer from this screenshot when CI jobs were running and when they weren't (it looks like I had to retry failed go test jobs 3 or 4 times). The maximum number of threads during the go test phase was just shy of 100.

Interestingly, there's a big spike in the number of threads at the beginning of the go build stage, peaking at 186 threads. It very quickly dropped down to a more reasonable level, though.

jconsole VM Summary tab

Total process uptime was 3 hours, 42 minutes. CPU time was 5 minutes. Committed memory at the end of the test run was 38,912 kbytes, of which 27,650 kbytes was heap memory (however, I had to trigger a full GC in order to get it down to this point).

Conclusions/Caveats

I only went with a single run of each scenario (platform threads on JDK 17 vs virtual threads on JDK 21), so while I wouldn't go publishing a Ph.D. thesis just yet, the results do look promising. And it makes sense: virtual threads are meant to improve scalability in situations where you have a lot of threads that spend the majority of their time blocked on I/O or waiting for a concurrency lock to be released. (NB: the official guidance on Java 21 is to eschew synchronized blocks in favor of locks/mutexes/etc. from java.util.concurrent when using virtual threads; they might eventually get around to fixing up synchronized blocks in a future release.)
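
To make that last point concrete, a hypothetical example (not code from winfoom) of the pattern the guidance suggests:

```java
// Hypothetical example: on Java 21, blocking while inside a synchronized block pins
// the virtual thread to its carrier, so the guidance is to guard critical sections
// that may block with java.util.concurrent locks instead.
import java.util.concurrent.locks.ReentrantLock;

class SharedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private long value;

    long incrementAndGet() {
        lock.lock();          // a virtual thread blocked here can unmount from its carrier
        try {
            return ++value;
        } finally {
            lock.unlock();
        }
    }

    // The equivalent "synchronized long incrementAndGet()" would pin the carrier thread
    // if anything inside the block parked or did blocking I/O.
}
```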

For a proper "apples to apples" comparison, I'd probably have edited the jar file to remove the java21 implementation of ExecutorServiceFactory and then done a test run using platform threads on jdk21, but I'd be really surprised if there was a significant difference. I can't imagine there've been many significant changes to how platform threads work on java17 vs java21, G1GC, etc.

I deliberately chose not to use ZGC for these tests because it's still relatively new and I thought there could be some changes to it between 17 and 21, but G1's been around for ages now so I doubt much has changed in it between recent JDK releases. I am kinda curious to see the performance characteristics of ZGC, but I think it'd be better to look at that under a separate PR.

My network bandwidth was definitely a limiting factor here, and I'd be interested to see what the numbers look like under a proper load test in an environment with better network bandwidth (in an on-prem datacentre, public cloud, etc... really anything with a big fat pipe to the outside world).

A couple things surprised me here. The first was the big spike in the number of threads when the go build jobs kicked off using platform threads. I'm not really sure why the number of threads went up so much more than during the go test run. Granted, there were 15 builds vs 10 tests, but the builds themselves would have been hitting the proxy less. (maybe this was just a timing issue?)

The other thing that surprised me was the difference in memory usage between platform threads and virtual threads. More threads = more stack space, which should equate to more non-heap memory usage, but overall there wasn't a huge difference in non-heap memory usage between these two test runs, nor was there a significant change in non-heap usage when the big spike in platform threads happened.

Even more surprising is the fact that platform threads resulted in more than 4x heap memory usage vs virtual threads, even though virtual threads store their "stack" on heap. I'm kinda scratching my head on this one tbh and wondering if it's just a discrepancy due to having a sample size of 1 for each test execution.

Regardless, the conclusions I draw from this are:

  1. In the worst case, you won't see any noticeable difference when running winfoom using virtual threads vs platform threads.
  2. In the best case, using virtual threads in winfoom will result in lower resource utilization (and therefore less contention for host resources) than platform threads.

@sworisbreathing (Author) commented Feb 1, 2024

update: I've been running this build on GraalVM 21 for a little over a week now without issue.

I've been playing around with GC settings over the past couple of days. My observations thus far:

  • ZGC is a bit of a memory hog. It started off with a heap of around 250 MB and kept growing the heap even when no traffic was going through winfoom. Its default behavior is supposed to be to release unused memory to the OS after about 5 minutes, but I can't say I really saw much of that. Suffice it to say, I won't be raising a PR to switch to it; I don't think it's appropriate as a default GC for running locally on a developer workstation.
  • CMS was removed (I think in Java 14), so I can't test it on this build.
  • Removing all GC settings from the JVM flags actually gives pretty good results. G1 is the default collector on modern JVMs, and the heap size (and overall process memory) actually turns out smaller by removing the other GC tuning settings and just letting the JVM sort itself out. There wasn't really any noticeable difference in performance (though I haven't been smashing it with multiple CI jobs).

If need be, I'm happy to update the PR to remove the GC settings from the launch scripts, but I'm happy to leave things as-is for now.
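
(For reference, the variations above roughly correspond to launch commands like the following; these are hypothetical, and the actual launch scripts, flags, and jar name may differ:)

```
java -XX:+UseZGC -jar winfoom.jar    # the ZGC experiment
java -jar winfoom.jar                # no explicit GC flags: G1 by default on modern JVMs
```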

@ArchonMegalon commented:

Can we please already merge this commit... ;)

@ArchonMegalon commented:

Trying to build your branch, I get this error:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile (default-compile) on project winfoom: Compilation failure
[ERROR] /d:/Source/Repos/Winfoom/src/main/java/org/kpax/winfoom/proxy/ProxyExecutorService.java:[33,8] types java.util.concurrent.ExecutorService and org.kpax.winfoom.proxy.listener.StopListener are incompatible;
[ERROR] class org.kpax.winfoom.proxy.ProxyExecutorService inherits unrelated defaults for close() from types java.util.concurrent.ExecutorService and org.kpax.winfoom.proxy.listener.StopListener

@sworisbreathing (Author) commented:

Hey @ArchonMegalon, apologies for the delayed response. I did run into an issue related to unrelated defaults for close(), but the fix is already part of the PR, so I'm not sure why you're seeing that.
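
For anyone hitting the same compiler error: it shows up on newer JDKs because java.util.concurrent.ExecutorService now has a default close() (via AutoCloseable), and when a class implements two interfaces that each provide a default close(), javac requires the class to override close() itself. A minimal reproduction with the usual fix (the actual resolution in this PR may differ):

```java
// Minimal reproduction of "inherits unrelated defaults for close()" and the usual fix:
// the implementing class declares its own close() and can delegate to either default.
interface ExecutorLike extends AutoCloseable {
    @Override
    default void close() { /* shut the executor down */ }
}

interface StopListenerLike {
    default void close() { /* react to the proxy being stopped */ }
}

class ProxyExecutorServiceLike implements ExecutorLike, StopListenerLike {
    @Override
    public void close() {
        ExecutorLike.super.close();      // explicitly pick one default...
        StopListenerLike.super.close();  // ...or invoke both, as appropriate
    }
}
```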

What JDK are you trying to build the PR from? I think the last build I did of this was temurin 21 but I'm not sure of the exact revision.
