-
Notifications
You must be signed in to change notification settings - Fork 986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Promoting performance on string operations #877
Comments
Please share your numbers in this thread. I have a hunch Python has a special representation of strings that consists of ascii characters only. If so, you can try the corresponding benchmark in Chez Scheme using byte strings.
|
Chez Scheme strings are stored as full 32-bit unicode code points. Much of what you are seeing is the difference needed to allocate and GC four times more memory than an implementation that uses UTF-8 encoded strings. The tradeoff is that Chez Scheme offers constant-time access by character offset. Your benchmark has any more operations that depend on the total size of the string than on finding a character offset. |
Well, I don't think full 32-bit unicode code points could explain all differences, because in my benchmark, Chez Scheme consume seconds more than 4 times comparing with other counterparts. |
Strings are immutable in both Javascript and Python, so "substring" does not need to result in a copy of the relevant portion of the string. Since the inner loop in your benchmark has two substrings in it that could be another significant fraction of the difference. (I have no idea whether substring is "almost zero" allocation in the Javascript and Python implementations; you'll have to figure that out if you want to know whether that contributes to the difference.) That said, the implementation of |
I just re-write the code by replace |
I meant changing the implementation of diff --git a/s/5_4.ss b/s/5_4.ss
index 8f5ab069..fe1a4406 100644
--- a/s/5_4.ss
+++ b/s/5_4.ss
@@ -32,10 +32,10 @@
($oops 'substring
"~s and ~s are not valid start/end indices for ~s"
m n s1))
- (let ([s2 ($make-uninitialized-string (fx- n m))])
- (do ([j 0 (fx+ j 1)] [i m (fx+ i 1)])
- ((fx= i n) s2)
- (string-set! s2 j (string-ref s1 i)))))))
+ (let* ([len (fx- n m)]
+ [s2 ($make-uninitialized-string len)])
+ (string-copy! s1 m s2 0 len)
+ s2))))
(let ()
(define do-string-append2 |
It seems reasonable. I'll have a try to see what happens to my benchmark. If it really works, I'll response in this issue. |
Sadly, it doesn't work. time scheme --script ./src/string/string.scm
real 0m2.182s
user 0m1.007s
sys 0m1.168s Origin Version: time scheme --script ./src/string/string.scm
real 0m2.023s
user 0m1.006s
sys 0m1.009s |
Interestingly I saw a noticeable improvement using jtaylor's patch. I also wrapped the call to (my-try 5000000) to have chez-scheme handle the profiling instead of using the time command. This avoids polluting the profiling data with start-up time:
|
Thank @gwatt , you're right! @jltaylor-us 's patch works! time ta6le/bin/ta6le/scheme --script ../various-program-language-benchmark/src/string/string.scm
real 0m1.432s
user 0m0.534s
sys 0m0.888s And my fault, last test I did something wrong. I'll give out a full benchmark performance report here. And @jltaylor-us , it's your smart work,will you make a pr contributing your patch? My scheme-langserver has been really bothered by string performance recently... |
Here's the results:
|
It's really interesting and useful to see a comparison of different dynamic languages. |
@ufo5260987423 It might be better to display the machine information in your benchmark results. I ran the string benchmark without @jltaylor-us 's patch, using ChezScheme Version 10.0.0:
Using
My CPU is i9-12900HK, perhaps this contributes to the small numbers? |
Unfortunately, it actually makes things worse on small strings. (On my machine, anyway) There is likely some small string size threshold for switching between the two implementations, but I think we need more data (preferably across a variety of platforms) to make sure we're not optimizing a synthetic benchmark workload at the expense of more common real-world performance. |
I'd be curious to know why the default substring implementation causes 79 collection while the string-copy! based implementation causes 2 collections. Subtracting out the time spent in collections brings their runtime approximately the same. |
How small? Maybe I can add a new benchmark. |
I'm pretty curious about that, myself. My current theory is that the implementation with the larger number of scheme "instructions" is causing the collect request handler to fire more often (approximately when it should based on |
Swift switched to UTF-8 strings while preserving amortized-constant-time access to characters by storing "breadcrumbs" to convert offsets. There is a blog post and a forum post with even-more-internal details. Here's the most relevant part of the blog post:
|
Recently I made a benchmark among javascript, scheme and python here. Actually, I'm glad that Chez Scheme achieves good performances on many items. However, it's really shocking the string operations make Chez Scheme a main hinder: among javascript/bun, javascript/nodejs, python/pypy, python/cpython,scheme/guile, Chez Scheme is the most time-consuming.
I want to know:
The text was updated successfully, but these errors were encountered: