Performance should be measured and improved #8
Time to tokenize four sentences 10,000 times. The Scala measurement looped only 1,000 times, and the result was multiplied by 10. Scala used the j4rs interface, which is still subject to improvement.
Code for Python is here and code for Scala is here. FYI @MihaiSurdeanu
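For context, the shape of the Python side of such a benchmark can be sketched as follows. This is a toy reconstruction, not the linked code: `tokenize` is a stand-in for whatever tokenizer the library under test provides, and the sentences are invented.

```python
import time

def tokenize(text):
    # Stand-in tokenizer: the real benchmark calls the library under test.
    return text.split()

SENTENCES = [
    "This is a short sentence.",
    "Here is another one.",
    "Tokenization speed matters downstream.",
    "A somewhat longer sentence makes the workload less uniform.",
]

start = time.perf_counter()
for _ in range(10_000):
    for sentence in SENTENCES:
        tokenize(sentence)
elapsed = time.perf_counter() - start
print(f"Tokenized {len(SENTENCES)} sentences 10,000 times in {elapsed:.2f}s")
```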
Thanks! This is incredibly bad :)
I believe this particular thin Rust wrapper serializes everything to JSON text and then deserializes it, and that includes converting the individual ints of an array to text, then to Integer, then to int, etc. I decided to try a straight JNI version. However, I think it would be useful to try out the interface already (as soon as I can publish it) while waiting for a faster version, because there is so much else downstream that needs to be tried out and might not work.
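The cost of that round trip can be illustrated in miniature: serializing an int array to JSON text and parsing it back does far more work than handing the array over directly. This is a toy Python sketch of the general overhead, not the actual j4rs internals:

```python
import json
import time

# Pretend these are token ids crossing the language boundary.
token_ids = list(range(1_000))

# Direct pass: the array is handed over as-is,
# roughly what a straight JNI call can do.
start = time.perf_counter()
for _ in range(1_000):
    _ = token_ids
direct = time.perf_counter() - start

# JSON round trip: every int becomes text and is parsed back, per call.
start = time.perf_counter()
for _ in range(1_000):
    _ = json.loads(json.dumps(token_ids))
roundtrip = time.perf_counter() - start

print(f"direct: {direct:.4f}s, json round-trip: {roundtrip:.4f}s")
```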
Agreed on both points!
A straight JNI version is faster than j4rs, but it looks like the key is to use the release build rather than the debug build. In C programs the difference is usually fairly minimal, around 2x, but here for Rust the speedup is about 16x. It is now on par with Python; the remaining gap (43 vs. 45) is within run-to-run variation.
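The debug-vs-release gap comes down to optimization level. For reference, these are Cargo's default profile settings, written out explicitly; `cargo build` uses `[profile.dev]` and `cargo build --release` uses `[profile.release]`:

```toml
# Cargo defaults, shown explicitly for illustration.
[profile.dev]
opt-level = 0   # no optimization: fast compiles, slow code

[profile.release]
opt-level = 3   # full optimization
```

Tight tokenization loops are exactly the kind of code where opt-level 0 vs. 3 can account for a double-digit slowdown, so benchmarks should always be run against the release artifact.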
Awesome!! However, the multi-threaded version is not showing the expected speedup. Do you think JNI has some syncs in there that we are not aware of?
For the release version above, I was still multiplying by 10, and perhaps within those 2.1 seconds there wasn't enough room for the parallelism to show, or some of my processors were busy with other work. Here are some more measurements that show a 5x speedup. The "by sentence" parallelism isn't the best test, because one of the four sentences is much longer than the others and the remaining threads have to wait for it to finish; the parallelism is also applied in an inner loop, so its overhead is incurred 10,000 times. The "by document" parallelism should hopefully approach the number of processors. 5 seems close enough to 8 that I don't think something I've done is getting in the way.
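The granularity point can be sketched with a toy model (Python here for brevity, though the real code is Rust/Scala; this models the scheduling structure only, since Python threads don't speed up CPU-bound work). Parallelizing per sentence makes every iteration wait for the slowest sentence and re-pays dispatch overhead in the hot loop, while parallelizing per document splits the outer loop itself across workers:

```python
from concurrent.futures import ThreadPoolExecutor

SENTENCES = ["Short one.", "Another.", "Tiny.",
             "One very long sentence " * 50]

def tokenize(text):
    return text.split()

# "By sentence": a parallel map inside the hot loop. Each iteration
# blocks on the slowest sentence, and submission overhead repeats
# once per iteration.
def by_sentence(iterations, pool):
    for _ in range(iterations):
        list(pool.map(tokenize, SENTENCES))

# "By document": split the iterations themselves across workers, so
# each worker runs an independent chunk of the outer loop and the
# dispatch overhead is paid only once per worker.
def by_document(iterations, pool, workers=4):
    chunk = iterations // workers
    def run_chunk(_):
        for _ in range(chunk):
            for s in SENTENCES:
                tokenize(s)
    list(pool.map(run_chunk, range(workers)))

with ThreadPoolExecutor(max_workers=4) as pool:
    by_sentence(100, pool)
    by_document(100, pool)
```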
Nice!