Use shaded dependency on JCTools instead of copy and paste #18

nitsanw · 2016-03-01T11:44:15Z

Currently this library uses some queue implementations lifted in spirit and to some extent code out of JCTools. The 2 projects share a contributor, @akarnokd.
I would recommend this project directly shades JCTools and uses the originals. Versioned dependencies allow for clear release boundaries and well understood state of code while also enabling future improvements and corrections to be fed in easily. This is why everyone does it...

smaldini · 2016-03-01T11:54:47Z

For the context we talk about https://github.com/reactor/reactive-streams-commons/blob/master/src/main/java/reactivestreams/commons/util/SpscArrayQueue.java and https://github.com/reactor/reactive-streams-commons/blob/master/src/main/java/reactivestreams/commons/util/SpscLinkedArrayQueue.java

I need to evaluate what else JCTools offer nowadays, @akarnokd any thought ?

akarnokd · 2016-03-01T12:12:10Z

Rsc is a research effort and is thought as a basis for other reactive libraries which can have their own notion of scheduling and queueing. Therefore, Rsc was designed to be parametrizable by the queue implementation through a Supplier in each relevant operator. We did this because Reactor was mainly using Disruptor as its queue-like backing data structure. Thus, if one chooses to drive Rsc with JCTools queues, he/she is free to do so.

I didn't use JCTools directly because:

Rsc isn't really meant to be a standalone library at this point, but we have fluent operators for test and benchmark convenience. I use it in comparison benchmarks because I know exactly hot it is implemented and how it should behave (unlike RxJava 1.x which has still some hidden skeletons everywhere).
We don't need more than the 2 queue types we currently have. There are some places which could benefit from a growable, array-backed MPSC but we don't have the means to detect what XpYc queues the user supplies.
First wanted to explore the non-queue related optimization possibilities the operator-fusion gives
We can afford inlining the AtomicReferenceArray whereas JCTools can't really.
Our queues are small and relatively short lived (max up to 1M elements transferred), the peculiar fill-drain pattern it receives doesn't seem to benefit from lookahead and seems to be hurt by the excess padding and more complex class hierarchy.

nitsanw · 2016-03-01T12:45:51Z

The choice is yours. Let me see if I understand the argument:

"Rsc was designed to be parametrizable by the queue implementation through a Supplier in each relevant operator. We did this because Reactor was mainly using Disruptor as its queue-like backing data structure. Thus, if one chooses to drive Rsc with JCTools queues, he/she is free to do so." : we are therefore discussing the default implementation source. In my experience most people stick with the default, so I would urge care in choosing it. That people can throw in other impls make no difference. I'm concerned with sloppy dependencies, so people making a clear dependency is not an issue.
"...I use it in comparison benchmarks because I know exactly how it is implemented and how it should behave..." - equally, picking any one version impl would work. This is the same point as "First wanted to explore the non-queue related optimization possibilities the operator-fusion gives".
"We don't need more than the 2 queue types we currently have." - neither here nor there IMO. The number of classes is immaterial, if it's good enough to copy it's probably worth depending on. I think this is an argument for least effort, it's easier to copy than to fix the build script. We are all more familiar with copy and paste than the fiddly innards of maven and gradle.
"We can afford inlining the AtomicReferenceArray whereas JCTools can't really." - I thought you weren't concerned with queue performance, just wanted a known quantity? In the context of a framework I'd argue the Atomic option is for anti-Unsafe environments, where allot of assumptions go out the window anyhow. Where Unsafe is available using the SpscGrowable will likely give better results.

I understand JCTools may not be a good fit if you want to avoid the paddind etc. But the AtomicSpsc is notionally targeting this exact usecase. Why not collaborate to provide a common solution?
This hits on another obejction to C&P dependencies, they make a bad relationship in that they block the consuming library authors from improving the origin.

akarnokd · 2016-03-01T13:16:39Z

I thought you weren't concerned with queue performance, just wanted a known quantity

With our Spsc, it's more about allocation than queue performance at the moment. SpscAtomicArrayQueue allocates 1 parent class, 2 AtomicLong, 1 AtomicReferenceArray and 1 backing array. By inlining the field updaters, Rsc's Spsc allocates 1 parent class and 1 array. Unfortunately, the linked-array spsc can only save on the 2 AtomicLong instances compared to JCTools' version. This allocation amount is dominant for short-lived reactive sequences.

Why not collaborate to provide a common solution?

I don't think you could encode all the configuration option in a single class: to pad index fields or not, pad elements or not, use lookahead or not, support exact capacity or not. I consider both Rsc and JCTools as a template where one can extract and customize them to his/her liking.

I'll allocate some time and do a comparative benchmark of the queues in our dataflow context.

nitsanw · 2016-03-01T13:24:50Z

AFU change can be implemented for JCTools, might be worth it. I'm similarly open to extending the AtomicRefArray (you objected last we discussed it, but I actually have no issue with it. If people down cast from Queue or use the other methods directly they are shooting their own foot.)

"..I don't think you could encode all the configuration option in a single class.." This is something I mean to address using bytecode generation.

akarnokd · 2016-03-01T14:30:50Z

I've run a comparison benchmark and got interesting results.

with errors:

I expected unsafe to be better, but I'm surprised how the growable and unbounded variants do even better in some cases.

Also, Rsc's spsc queue has around 5-30% more overhead; could be the index trashing or the field updater overhead.

akarnokd · 2016-03-01T15:38:02Z

It seems the main source of lost throughput was the index trashing:

The rsc column is the original plain SpscArrayQueue, the rsc1 had just item padding and the rsc2 has index padding but no item padding.

Comparing against the unsafe version:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use shaded dependency on JCTools instead of copy and paste #18

Use shaded dependency on JCTools instead of copy and paste #18

nitsanw commented Mar 1, 2016

smaldini commented Mar 1, 2016

akarnokd commented Mar 1, 2016

nitsanw commented Mar 1, 2016

akarnokd commented Mar 1, 2016

nitsanw commented Mar 1, 2016

akarnokd commented Mar 1, 2016

akarnokd commented Mar 1, 2016

Use shaded dependency on JCTools instead of copy and paste #18

Use shaded dependency on JCTools instead of copy and paste #18

Comments

nitsanw commented Mar 1, 2016

smaldini commented Mar 1, 2016

akarnokd commented Mar 1, 2016

nitsanw commented Mar 1, 2016

akarnokd commented Mar 1, 2016

nitsanw commented Mar 1, 2016

akarnokd commented Mar 1, 2016

akarnokd commented Mar 1, 2016