-
Notifications
You must be signed in to change notification settings - Fork 668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Excessive memory footprint in gltf_loader #942
Comments
Does your machine natively support ASTC? One of the issues with the loader is how we decode ASTC inline which may be adding to your excessive memory usage. Some of the scenes are also 4x duplicates to simulate load on a device which doesn't help the loader. In some samples the goal is to demonstrate the performance delta when using a specific technique - hence the large inefficient scenes. The goal is not to make every sample 100% efficient in all aspects (although an efficient GLTF Loader is definitely a goal) The loader is on the chopping block soon and will likely be refactored. This includes sorting out the image loading a decoding phase (see framework/scene_graph/images/* if you are interested in those issues) Completely agree with your assessment though, it is currently not in the best state! |
See #860. ASTC is mobile only, so pretty much all desktop platforms take ages to load larger assets using ASTC compressed textures. |
I get that the ASTC conversion exacerbates memory overhead, and that this code may be a target for refactoring at some point, but I still think it would be a relatively small engineering challenge to not potentially load every image into memory while still allowing it to use threading to pre-load the assets for better performance. I'll probably try to make a small PR for this. |
@jherico apologies, I wasn't trying to dismiss improvements. Just letting you know what we know at this point More than happy for you to take a shot at it! |
My initial work did a little to improve the memory footprint, but not as much as I expected, and it drastically slowed down the load time. I did some profiling and found a lot of memory being allocated and not released in surprising places deep inside the astc library. I'm not 100% certain that this accounts for the amount of memory being consumed, but the astc code, at least the version in use by this repo, is very not thread safe, and it leaves a ton of performance on the table by not having any SIMD support. I've started working on a branch to at least update astc to a recent version, but getting it set up properly will probably be a chunk of work. It does, however, fix the threading issue and also includes support for NEON, SSE2, SSE4.1 and AVX2 instruction sets, but I would need to find a cmake friendly method of determining which one to enable for a given build environment, but the "native" ISA is always available so there's a fallback for people working in obscure environments. Alternatively, I could put some effort into refactoring the assets to convert he astc assets to basisu, and include the mips in the files so they don't have to be generated at runtime. Yet another option for improving performance would be to use a temp directory and whenever an astc resource is uncompressed, write the uncompressed version and its mips to the temp folder, so that the next time they could be loaded directly. |
We've been discussing this several times, and we should replace the astc files. basisu is the way to go, I already did a sample for it, so we could leverage the code from that. Will discuss this on our next meeting. It's probably best to have the original asset authors update them for basisu instead of converting the astc (which would probably result in a quality loss). |
The current gltf loader code uses threading to load images from disk into memory prior to pushing them to the GPU.
However, it does this by creating a thread-pool of
std::thread::hardware_concurrency()
size.Vulkan-Samples/framework/gltf_loader.cpp
Lines 545 to 564 in 216e065
The code attempts to mitigate this later on by limiting the number of staging buffers that will be created to 64 MB at a time:
Vulkan-Samples/framework/gltf_loader.cpp
Lines 568 to 584 in 216e065
but IMO the "damage" is already done. My machine peaks at over 5 GB of RAM usage at loading the large scenes used in performance examples.
While the threading might be useful for speeding up the overall load, consuming all the CPUs on the machine to do so feels excessive, and I would suggest that at least 1 CPU be left in reserve for the main thread.
Additionally, it should be possible to limit the number of images loaded concurrently by using a blocking queue or a (C++) semaphore to block the thread pool from doing more work when there pending images waiting for upload.
The text was updated successfully, but these errors were encountered: