
Flicking problem #334

Closed
caxieyou opened this issue Jun 28, 2023 · 11 comments · Fixed by #526

Comments

@caxieyou

caxieyou commented Jun 28, 2023

I have tested an SVG file, which is not that big, certainly not as big as the CIA map case.

When I load the file and zoom in, the screen flickers and some small parts do not render correctly.

Is this caused by a floating-point precision problem?

I'm quite sure there is no clipping in this file.

test

@llsansun

I found that the problem is related to config.n_drawobj in coarse.wgsl. When it exceeds 65535, could the data before and after index 65535 conflict because of workgroup synchronization, causing the graphics to display incorrectly? Is there a better solution to the 65535 draw-object limit? I hope it can be resolved.

@raphlinus
Contributor

Yes, a limit of 64k draw objects is a known problem, and has a straightforward solution. This issue can serve as the tracking bug for that. Thanks for the analysis!
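To see where the 64k figure comes from, here is a simplified sequential model of a two-level scan (not Vello's actual shaders), assuming a workgroup size of 256, which is the default limit WebGPU guarantees. Each workgroup reduces one 256-element block to a partial sum, a single workgroup then scans those partials, and each block adds its offset back in. Because the middle step runs in one workgroup, it can handle at most 256 partials, so the whole scan tops out at 256 × 256 = 65,536 elements.

```python
import math

WG_SIZE = 256  # assumed workgroup size (WebGPU's default per-workgroup limit)


def two_level_scan(values):
    """Exclusive prefix sum structured like a two-level GPU scan (sequential model)."""
    n = len(values)
    num_blocks = math.ceil(n / WG_SIZE)
    # The partials are scanned by a single workgroup, so there can be at most WG_SIZE of them.
    if num_blocks > WG_SIZE:
        raise ValueError(
            f"{n} elements need {num_blocks} blocks, but one workgroup can scan only "
            f"{WG_SIZE} partials (limit {WG_SIZE * WG_SIZE} elements)"
        )
    blocks = [values[b * WG_SIZE:(b + 1) * WG_SIZE] for b in range(num_blocks)]
    # Pass 1 (reduce): one partial sum per block, i.e. per workgroup.
    partials = [sum(block) for block in blocks]
    # Pass 2 (single workgroup): exclusive scan of the partials gives each block's offset.
    offsets, running = [], 0
    for p in partials:
        offsets.append(running)
        running += p
    # Pass 3 (leaf): each block scans locally, starting from its offset.
    out = []
    for offset, block in zip(offsets, blocks):
        acc = offset
        for v in block:
            out.append(acc)
            acc += v
    return out
```

With 65,536 elements the model works; with 65,537 it raises, mirroring the (workgroup size)² ceiling discussed in this thread.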

@caxieyou
Author

Can you tell us which bug or issue tracks this, so we can follow its progress and know when it's fixed?
Thanks a lot!

@caxieyou
Author

Has this bug been fixed?

@raphlinus
Contributor

Not yet. The stroke rework is taking a lot longer than expected, though there is progress. This will be a high priority after that, and is also one of the items tracked in #302.

@DorianRudolph

DorianRudolph commented Dec 23, 2023

Is this the same issue?

warning: flashing images

Screen.Recording.2023-12-23.at.17.22.19.mov

@raphlinus
Contributor

No, that issue is caused by overflow of internal buffers (related to #366), which is in turn provoked by not culling lines and tiles that land outside the viewport. We do plan to work on all that.
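A back-of-the-envelope model of why missing viewport culling hurts when zooming in. It assumes 16×16-pixel tiles and a rectangle that exactly fills the viewport at zoom 1 (both the tile size and the helper names are assumptions for illustration, not Vello's code). Without culling, the tiles allocated for that rectangle grow with the square of the zoom factor, even though the visible area stays the same.

```python
import math

TILE = 16  # assumed tile size in pixels (hypothetical)


def tile_count(width, height):
    """Number of TILE x TILE tiles needed to cover a width x height region."""
    return math.ceil(width / TILE) * math.ceil(height / TILE)


def tiles_for_zoomed_rect(view_w, view_h, zoom, cull):
    """Tiles allocated for a viewport-filling rectangle after zooming by `zoom`."""
    # The rectangle grows by `zoom` along each axis.
    w, h = view_w * zoom, view_h * zoom
    if cull:
        # Clamp to the viewport: off-screen tiles are never allocated.
        w, h = min(w, view_w), min(h, view_h)
    return tile_count(w, h)


print(tiles_for_zoomed_rect(1920, 1080, 10, cull=False))
print(tiles_for_zoomed_rect(1920, 1080, 10, cull=True))
```

At 10× zoom on a 1920×1080 viewport the unculled model allocates 100 times as many tiles as the culled one, which is the kind of growth that can overflow fixed-size internal buffers.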

@raphlinus
Contributor

I plan on addressing the 64k draw object problem shortly. There are three approaches that can be taken.

One is to conditionally apply a 3-level dispatch when the (workgroup size)^2 limit is crossed. This is what's done with pathtags, and I find it ugly. Among other things, it requires more permutations of shaders to be compiled, and there's also some complex conditional logic for which shaders to dispatch. I do have a local patch which is almost done, so it is perhaps the path of least resistance.
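The conditional approach boils down to deciding on the host how many reduction levels an input needs. A minimal sketch, assuming a workgroup size of 256 and that each level shrinks the data by that factor until the remaining partials fit in one workgroup (the function name is hypothetical):

```python
import math


def reduce_dispatch_sizes(n, wg_size=256):
    """Workgroup counts for each reduction level of a tree-structured scan.

    Levels are added until the remaining partials fit in a single workgroup,
    so len(result) + 1 is the total number of scan levels.
    """
    sizes = []
    count = n
    while count > wg_size:
        count = math.ceil(count / wg_size)
        sizes.append(count)
    return sizes
```

Up to 256² = 65,536 elements this yields at most one reduce level (the two-level case); one element more needs a second reduce level, and it is that switch which brings in the extra shader permutations and dispatch logic.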

The second approach is inspired by a technique I saw in FidelityFX sort, and is implemented in my recent sorting exploration. In that approach, each workgroup iterates over num_blocks_per_wg blocks, where each block is the amount of data currently handled by a single workgroup (256 draw objects). In that way, the size of the sequence is not inherently bounded by workgroup sizes.
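A sequential sketch of the block-iteration idea, assuming 256-element blocks and a cap of 256 workgroups so the partials still fit in one workgroup's scan. Each workgroup owns a contiguous run of `blocks_per_wg` blocks (the name echoes num_blocks_per_wg above but is otherwise hypothetical), reduces them in a loop, and later re-walks them carrying a running prefix. This models the technique; it is not the FidelityFX or Vello code.

```python
import math

BLOCK = 256           # assumed elements per block (one workgroup's worth before this change)
MAX_WORKGROUPS = 256  # assumed cap so the partials fit in a single-workgroup scan


def multi_block_scan(values):
    """Exclusive prefix sum where each workgroup iterates over several blocks."""
    n = len(values)
    num_blocks = max(1, math.ceil(n / BLOCK))
    num_wg = min(num_blocks, MAX_WORKGROUPS)
    blocks_per_wg = math.ceil(num_blocks / num_wg)

    def wg_range(wg):
        # Contiguous element range owned by workgroup `wg` (may be empty at the tail).
        start = wg * blocks_per_wg * BLOCK
        return start, min(start + blocks_per_wg * BLOCK, n)

    # Pass 1 (reduce): each workgroup loops over its blocks and emits one partial.
    partials = [sum(values[s:e]) for s, e in map(wg_range, range(num_wg))]
    # Pass 2 (single workgroup): exclusive scan over at most MAX_WORKGROUPS partials.
    offsets, running = [], 0
    for p in partials:
        offsets.append(running)
        running += p
    # Pass 3 (leaf): each workgroup re-walks its blocks in order, carrying its prefix.
    out = [0] * n
    for wg in range(num_wg):
        acc = offsets[wg]
        s, e = wg_range(wg)
        for i in range(s, e):
            out[i] = acc
            acc += values[i]
    return out
```

Because the number of workgroups never exceeds 256, input size is no longer bounded by the workgroup size squared; larger inputs just mean more loop iterations per workgroup.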

A drawback to the latter approach is that it may limit the amount of addressable parallelism. Doing a quick calculation, for very large inputs it will dispatch 64k threads, regardless of the size of the input. That is more threads than directly supported by any existing hardware (RTX 4090 has 16k), though it may limit opportunities for latency hiding.
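The quick calculation can be reproduced in a few lines, assuming 256-thread workgroups and a cap of 256 workgroups (the helper is hypothetical). Small inputs dispatch only as many workgroups as they have blocks; past 65,536 elements the thread count stays pinned at 256 × 256 = 65,536 and each workgroup simply loops over more blocks.

```python
import math


def dispatch_plan(n, wg_size=256, max_wg=256):
    """Return (workgroups, blocks per workgroup, total threads) for n elements."""
    num_blocks = max(1, math.ceil(n / wg_size))
    num_wg = min(num_blocks, max_wg)
    blocks_per_wg = math.ceil(num_blocks / num_wg)
    return num_wg, blocks_per_wg, num_wg * wg_size


for n in (1_000, 65_536, 1_000_000, 10_000_000):
    print(n, dispatch_plan(n))
```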

An advantage to the latter approach is that it's two fewer dispatches.

As a future potential optimization, we may want to have more permutations (specialization by pipeline override) to (a) allow larger workgroups when the hardware supports it (the WebGPU spec only requires 256, which informs the choices we've made), and (b) support iteration over multiple elements per thread. The former is probably the best way to improve opportunities to exploit parallelism on powerful GPUs (1M threads should be plenty for at least a while) and has no real downside other than wiring up the plumbing. The latter is more of a tradeoff, as it improves bandwidth for large problems but limits parallelism for small ones. To switch between the two adaptively requires potentially compiling both variants (affecting cold-start time including shader compilation) and of course the complexity of the logic.
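A rough model of the two knobs, assuming the same structure as above (at most `wg_size` workgroups of `wg_size` threads) and hypothetical helper names. Raising the workgroup size from 256 to 1024 lifts the thread ceiling from 65,536 to 1,048,576, while processing several items per thread reduces the thread count for small inputs.

```python
import math


def max_threads(wg_size):
    """Thread ceiling when at most wg_size workgroups of wg_size threads are dispatched."""
    return wg_size * wg_size


def threads_used(n, wg_size, items_per_thread):
    """Threads dispatched for n elements when each thread handles items_per_thread of them."""
    per_wg = wg_size * items_per_thread
    num_wg = min(max(1, math.ceil(n / per_wg)), wg_size)
    return num_wg * wg_size
```

For a 10,000-element input, `threads_used(10_000, 256, 1)` gives 10,240 threads and `threads_used(10_000, 256, 4)` gives 2,560: more work per thread (good for bandwidth on large inputs) but a quarter of the parallelism on this small one.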

The third approach is to go back to single pass scan techniques, as was done in piet-gpu. We now know how to do this in WebGPU (see Zulip thread) but the performance implications are mixed; in particular it would be a performance regression on Apple Silicon.
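For contrast, single-pass scans of the kind used in piet-gpu are commonly built on decoupled look-back; the sketch below assumes that variant and models it sequentially. Each partition first publishes its local aggregate, then walks backwards over its predecessors, adding aggregates until it reaches one that has already published its inclusive prefix. On a GPU that walk would have to wait whenever a predecessor has not published anything yet; in this model every aggregate is published up front, so the walk never waits. Names and flag values are illustrative.

```python
NOT_READY, AGGREGATE, PREFIX = 0, 1, 2  # per-partition status flags (illustrative)


def decoupled_lookback_scan(values, part_size, resolve_order):
    """Exclusive prefix sum modeled on decoupled look-back.

    resolve_order is a permutation of partition indices giving the order in
    which partitions finish their look-back.
    """
    n = len(values)
    num_parts = (n + part_size - 1) // part_size
    flags = [NOT_READY] * num_parts
    aggregates = [0] * num_parts
    inclusive = [0] * num_parts
    # Phase 1: every partition computes and publishes its local aggregate.
    for p in range(num_parts):
        aggregates[p] = sum(values[p * part_size:(p + 1) * part_size])
        flags[p] = AGGREGATE
    out = [0] * n
    # Phase 2: partitions resolve in arbitrary order by looking back.
    for p in resolve_order:
        exclusive = 0
        j = p - 1
        while j >= 0:
            if flags[j] == PREFIX:
                # Predecessor knows its full prefix: add it and stop.
                exclusive += inclusive[j]
                break
            # Only the aggregate is known: add it and keep walking back.
            exclusive += aggregates[j]
            j -= 1
        inclusive[p] = exclusive + aggregates[p]
        flags[p] = PREFIX
        acc = exclusive
        for i in range(p * part_size, min((p + 1) * part_size, n)):
            out[i] = acc
            acc += values[i]
    return out
```

The appeal is one dispatch instead of several; the cost is the cross-workgroup communication through flags, whose performance varies by hardware.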

I'm most inclined to go with the second approach, as I think it's the best set of tradeoffs and admits additional optimization that would address the biggest shortcoming. I'll start on a PR, and if that goes well, probably apply the same technique to path tags.

raphlinus added a commit that referenced this issue Mar 19, 2024
Previously there was a limit of workgroup size squared for the number of draw objects, which is 64k in practice. This PR makes each workgroup iterate multiple blocks if that limit is exceeded, borrowing a technique from FidelityFX sort.

WIP, this causes hangs on mac. Uploading to test on other hardware.

Also contains some changes for testing that may not want to be committed as is.

Fixes #334
github-merge-queue bot pushed a commit that referenced this issue Mar 20, 2024
* Allow large numbers of draw objects

Previously there was a limit of workgroup size squared for the number of draw objects, which is 64k in practice. This PR makes each workgroup iterate multiple blocks if that limit is exceeded, borrowing a technique from FidelityFX sort.

WIP, this causes hangs on mac. Uploading to test on other hardware.

Also contains some changes for testing that may not want to be committed as is.

Fixes #334

* Add missing barrier

Add barrier for write-after-read hazard in coarse. The loop in question processes 64k draw objects at a time, so the barrier only gets invoked when that limit is exceeded.

Also move new test scene so it isn't the first.

* Address review comments

Set resolution in params for test scene. Add comments explaining division of work.
@NyxAlexandra
Contributor

Should the README be updated now that this is closed?

@DJMcNab
Member

DJMcNab commented Apr 3, 2024

Thanks for the reminder!

We intend to go through the list of issues in the README before publishing version 0.2.0, but a PR to remove the outdated items now would be welcome.

@NyxAlexandra
Contributor

See #543

This issue was closed.