Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance issue in CoroutineDescendantAxis #609

Open
JohannesLichtenberger opened this issue May 10, 2023 · 7 comments
Open

Fix performance issue in CoroutineDescendantAxis #609

JohannesLichtenberger opened this issue May 10, 2023 · 7 comments

Comments

@JohannesLichtenberger
Copy link
Member

https://github.com/sirixdb/sirix/blob/master/bundles/sirix-kotlin-api/src/main/kotlin/org/sirix/axis/CoroutineDescendantAxis.kt

It should read mainly from the page cache (BufferManagerImpl => caches) / in the test, actually always, but I assume it'll make I/Os all the time. That's why it's more than 100x slower than the standard "serial" DescendantAxis.

The idea of the axis is to prefetch right sibling nodes in parallel in a second read-only trx.

So we'll have to debug and profile...

@JohannesLichtenberger
Copy link
Member Author

It seems in the Producer there's a right sibling key stack, which is an issue with the proposed algorithm (I thought maybe we could process all right siblings in parallel (perhaps we can also take the descendantCount of each inner node into account and only fetch in parallel if the count is beyond a threshold.

@Kiriakos1998
Copy link

Hello, @JohannesLichtenberger, just realized that perhaps it was not clear that I looked into this issue eventually. So I specifically added a timestamp in the testAxisConventions in AbsAxisTest.
iteration -> 1001400
iteration -> 4000400
iteration -> 4000400
iteration -> 87000500
iteration -> 87000500
iteration -> 97001000
iteration -> 97001000
iteration -> 97001000
iteration -> 98001800
iteration -> 98001800
Here is the result in Nanos per iteration. Perhaps the way to go is to see which methods are specifically involved in these iterations. Also, it will be interesting to see which line of code inside the while loop is creating these delays.

@JohannesLichtenberger
Copy link
Member Author

JohannesLichtenberger commented May 31, 2023

I think the implementation itself is not good. First of all the producer seems to explicitly block until it's finished and only afterwards the next right sibling of a node during preorder is traversed to send the nodes to the axis. Second, AFAICS in the producer itself the right sibling key stack is used instead of a kind of fork join approach.

I'm also not sure if thread synchronization itself might be an issue as well as starting of new transactions...

Maybe there's also a better approach for parallelization of preorder traversal with the DOM like pointer based encoding (firstChild/lastChild/rightSibling/leftSibling/parent) of every node.

The other issue with the slow IO using based backend might also be interesting...

@xDido
Copy link

xDido commented Sep 13, 2024

Still open ?

@JohannesLichtenberger
Copy link
Member Author

JohannesLichtenberger commented Sep 13, 2024

Yes, but it's completely broken. It should check for each right sibling if it's on the same page or another one. If it's in a different page, it should fetch this page in parallel and traverse this subtree already.

@JohannesLichtenberger
Copy link
Member Author

@xDido further clarification needed?

@xDido
Copy link

xDido commented Oct 16, 2024

@JohannesLichtenberger I'm not working on the issue at the moment. I tried to read the code multiple times, but I'm just a bit busy with Midterm exams. I will probably start looking again on the 27th of October.
If I want any clarification on the code I will ask you on Discord to keep this issue clean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

No branches or pull requests

3 participants