Version bump, MRJar, and use virtual threads on Java 21+ #15
base: master
Bump minimum Java version to 17 (required due to Spring Boot 3)
Switch from adoptopenjdk to eclipse temurin (temurin is the new adoptopenjdk)
Bump major version (required due to changing minimum Java version)
Use virtual threads when running on Java 21+
Bump minimum JDK version for building to JDK 21 (required due to virtual threads)
1dfebba to 4299f9b
I ran a couple of real-world test scenarios to see what the impacts are of virtual threads vs platform threads.

tl;dr version

Users who are most likely to benefit from switching to virtual threads are those who run winfoom on the same host as other resource-heavy workloads. On a typical developer PC/laptop you (probably) won't see huge gains on traffic running through the proxy (virtual threads don't magically make your network run faster, and network I/O is generally the limiting factor here). What you will (probably) see is more efficient resource utilization in winfoom itself (especially memory), meaning less contention with other stuff running on the host (i.e. building and testing software).

Although the overall consumption in these tests wasn't huge, the improved memory utilization is nothing to sneeze at - platform threads consumed more than 3x the amount of memory vs virtual threads. This memory isn't released once the load drops off, or even after pushing the stop button in the gui. Once the JVM decides to expand the heap, it will hold on to that memory and only release it back to the operating system after a full GC (which the JVM tries very hard to avoid). So you're arguably better off if winfoom doesn't need to ask for that memory in the first place.

Test environment

I tested this on my developer workstation (Windows 10, 8 cores, 16GB RAM) on a home WiFi network, using a residential broadband service (the last mile runs on 30-40 year old copper phone line) while connected through an IPSec VPN with an upstream forward proxy. Suffice it to say there are a number of hops/points of failure involved, the result of which is that some of the steps in the test scenarios below had to be repeated multiple times due to I/O errors (especially when lots of stuff was trying to hit the proxy at the same time). I've seen similar issues with CNTLM as well as winfoom 4.0.3, so at this point I'm convinced those issues all lie somewhere upstream (i.e. between my computer and the servers I'm hitting in these tests). My guess is that the sudden spikes in traffic are triggering DoS prevention somewhere upstream which kills the connection, but I'm not in a position to be able to validate that with evidence.

Tests

The test case was a GitLab CI pipeline using a local runner in Docker, with winfoom running on the host. The CI pipeline is for a terraform provider I've been working on. There's a bit of small stuff going on early on, but the real meaty stuff comes in during the acceptance test stage (10 jobs running

Test 1

Test 1 was performed with winfoom running on temurin 21 (therefore using virtual threads). I got pulled away from my desk for a couple of hours towards the end of the test run. The upshot of that is that jconsole was also able to capture idle performance.

Heap memory remained under 40Mb for the entire test run. Platform threads peaked at 33 threads. Maximum CPU usage was around 6-7%.

Non-heap memory was highest at the beginning of the test, coming in slightly above 100Mb. By about 15 minutes into the test it had dropped to around 90Mb, then inched back up to around 95Mb over the next 30 minutes. It starts to drop back down about 2 hours later (which I'm assuming is when the CI pipeline finished), finally settling around 92Mb after about half an hour (give or take).

The JVM started with around 30 threads. Things got a bit up-and-down over the next half hour, but then about 30 minutes later it settled on 33 threads, where it remained (this was also the maximum during the first 30 minutes).
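(Side note for anyone who wants to reproduce these numbers: the figures jconsole displays come from the standard platform MXBeans, which can also be read in-process or over a remote JMX connection. The sketch below is purely illustrative and not part of winfoom. Also note that `ThreadMXBean.getThreadCount()` only counts platform threads, which is part of why the virtual-thread run shows such a small, stable thread count.)

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

// Illustrative monitoring loop (not part of winfoom): prints the current JVM's
// own heap, non-heap, and platform-thread figures - roughly what jconsole shows.
public class JvmStats {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        while (true) {
            MemoryUsage heap = memory.getHeapMemoryUsage();
            MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
            System.out.printf("heap %d/%d MB, non-heap committed %d MB, platform threads %d%n",
                    heap.getUsed() >> 20, heap.getCommitted() >> 20,
                    nonHeap.getCommitted() >> 20, threads.getThreadCount());
            Thread.sleep(10_000); // sample every 10 seconds
        }
    }
}
```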
Although not captured in this screenshot, I saw a number of
Total process uptime was 4 hours 40 minutes. CPU time was 8 minutes. Committed memory at the end of the test run was 47,104 kbytes, of which 36,540 kbytes was heap memory.

Test 2

For test 2, I relaunched the same build of winfoom on temurin 17 (by simply changing `JAVA_HOME`).

Heap memory hovered somewhere between 40-50Mb up through the bulk of the
Heap memory on this test run was worth looking into in a bit more detail. I've divided the screenshot into 4 sections:
What we can see here is that heap remained somewhere below 50Mb until some time during the
I was hoping the max heap would drop back down during the idle period, but this didn't really happen. I pushed the "stop" button in the winfoom gui to see what effect this would have. Heap utilization still climbed, albeit slower than before pushing the stop button. I then pushed the
Non-heap memory stayed around 100Mb for most of the run. Interestingly, it climbs slowly even during the idle period. I think I took this screenshot before pushing the stop button, so it doesn't really show what non-heap memory looks like after stopping winfoom or a full gc. I'm not too concerned about it though. Most likely the fact that it's a bit higher overall than in the first test is down to there being more platform threads this time around (and more threads = more stack space = more non-heap memory).

The JVM started around 30 or 40 threads, but the number of threads varied more during this test vs the first one (this test was using platform threads). It's pretty easy to infer from this screenshot when CI jobs were running and when they weren't (it looks like I had to retry failed
Interestingly, there's a big spike in the number of threads at the beginning of the
Total process uptime was 3 hours, 42 minutes. CPU time was 5 minutes. Committed memory at the end of the test run was 38,912 kbytes, of which 27,650 kbytes was heap memory (however, I had to trigger a full gc in order to get it down to this point).

Conclusions/Caveats

I only went with a single run of each scenario (platform threads on jdk17 vs virtual threads on jdk21), so while I wouldn't go publishing a Ph.D. thesis just yet, the results do look promising. And it kinda makes sense - virtual threads are meant to improve scalability in situations where you have a lot of threads that spend the majority of their time blocked on I/O or waiting for a concurrency lock to be released (NB: the official guidance on Java 21 is to eschew

For a proper "apples to apples" comparison, I'd probably have edited the jar file to remove the java21 implementation of

I deliberately chose not to use ZGC for these tests because it's still relatively new and I thought there could be some changes to it between 17 and 21, but G1's been around for ages now, so I doubt much has changed in it between recent JDK releases. I am kinda curious to see the performance characteristics of ZGC, but I think it'd be better to look at that under a separate PR.

My network bandwidth was definitely a limiting factor here, and I'd be interested to see what the numbers look like under a proper load test in an environment with better network bandwidth (in an on-prem datacentre, public cloud, etc... really anything with a big fat pipe to the outside world).

A couple of things surprised me here. The first was the big spike in the number of threads when the

The other thing that surprised me was the difference in memory usage between platform threads and virtual threads. More threads = more stack space, which should equate to more non-heap memory usage, but overall there wasn't a huge difference in non-heap memory usage between these two test runs, nor was there a significant change in non-heap usage when the big spike in platform threads happened. Even more surprising is the fact that platform threads resulted in more than 4x the heap memory usage vs virtual threads, even though virtual threads store their "stack" on heap.
I'm kinda scratching my head on this one tbh and wondering if it's just a discrepancy due to having a sample size of 1 for each test execution. Regardless, the conclusions I draw from this are:
update: I've been running this build on graalvm21 for a little over a week now without issue. I've been playing around with GC settings over the past couple of days. My observations thus far:
If need be, I'm happy to update the PR to remove the GC settings from the launch scripts, but I'd prefer to leave things as-is for now.
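(Illustrative aside, not part of this PR: when experimenting with GC settings, a quick way to confirm which collector a JVM actually ended up using is to list the garbage collector MXBeans - with G1 they show up as "G1 Young Generation" and "G1 Old Generation".)

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch (not part of winfoom): prints the active collectors and
// how many collections each has run, handy when comparing GC settings.
public class ActiveGc {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections");
        }
    }
}
```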
Can we please already merge this commit... ;)
Trying to build your branch, I get this error:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile (default-compile) on project winfoom: Compilation failure
Hey @ArchonMegalon, apologies for the delayed response. I did run into an issue related to unrelated defaults for
What JDK are you trying to build the PR from? I think the last build I did of this was temurin 21, but I'm not sure of the exact revision.
Summary of changes:
Breaking Changes:
Comments:
I've tested this locally on temurin 17, temurin 21, and graalvm 21 (simply changing `JAVA_HOME` when launching the gui), and verified it's working by connecting to the app using jconsole. On Java 17, I see threads named `pool-x-thread-y`. On Java 21, I see `ForkJoinPool` threads instead. In both cases, some basic smoke testing (i.e. `curl https://google.com`) is working through winfoom.

Unfortunately there's been a bit more refactoring required to support the MRJar than I'd like (i.e. introducing a new package, `org.kpax.winfoom.proxy.concurrent`). There was a lot of trial and error with the multi-release JAR before I figured out that Spring Boot repackaging doesn't play nicely with importing classes from dependencies in a multi-release jar. Hence there are no Lombok annotations, logging, etc. in the new package (I would really have liked to use an slf4j logger here to say whether we are using a thread pool or virtual threads, but alas... it was not meant to be).
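For anyone unfamiliar with the multi-release JAR approach, the general shape is roughly as follows (the package, class, and method names below are hypothetical placeholders, not the actual contents of `org.kpax.winfoom.proxy.concurrent`): the base implementation is compiled for Java 17 and packaged in the normal location, a Java 21 variant of the same class is packaged under `META-INF/versions/21`, and the manifest carries `Multi-Release: true`, so a Java 21+ JVM transparently loads the virtual-thread version.

```java
// Base implementation, compiled for Java 17 and packaged at the normal location
// inside the JAR. Hypothetical names - a sketch of the technique, not the actual
// contents of org.kpax.winfoom.proxy.concurrent.
package example.concurrent;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorFactory {
    public ExecutorService newExecutor() {
        // A classic pool; its threads get default names like "pool-1-thread-2",
        // matching what jconsole shows on Java 17. (The real implementation may
        // use a different pool type - this is just an example.)
        return Executors.newCachedThreadPool();
    }
}
```

```java
// Java 21 variant of the same class, packaged under META-INF/versions/21 in the
// multi-release JAR. A Java 21+ JVM loads this version instead of the base one.
package example.concurrent;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorFactory {
    public ExecutorService newExecutor() {
        // One virtual thread per task; the carrier threads belong to a ForkJoinPool,
        // which is why jconsole shows "ForkJoinPool" threads on Java 21.
        return Executors.newVirtualThreadPerTaskExecutor();
    }
}
```

Callers just invoke `new ExecutorFactory().newExecutor()`; no runtime version check is needed, because the JVM picks the class file matching its own version, which lines up with the `pool-x-thread-y` vs `ForkJoinPool` thread names observed above.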