Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clearing chunk cache takes ages #246

Open
madmaxoft opened this issue Mar 24, 2021 · 13 comments
Open

Clearing chunk cache takes ages #246

madmaxoft opened this issue Mar 24, 2021 · 13 comments

Comments

@madmaxoft
Copy link
Contributor

As of commit 4ff2053, clearing the chunk cache (i. e. refreshing or reloading the map) takes an awful lot of time - long minutes (!) even in a Release build. It seems that most of the time is spent deleting the per-section palettes, mostly the properties member.

Tested on a self-compiled Win64 binary built with Qt 5.15.1.

@EtlamGit
Copy link
Collaborator

I experienced this behavior for a much longer time. But only for Debug builds!

The mentioned commit only removes Ubuntu 16.04 support from the build chain. (No code change.)
So the problem is older and as mentioned probably in the Chunk destructor.

@madmaxoft
Copy link
Contributor Author

madmaxoft commented Jun 14, 2021

I did some digging around and finally found out the culprit is not Minutor itself, but the QtCreator's debugger. When running under the debugger, the QMap and QString operations take about 5 - 60 times longer than when running in a standalone executable. For me, this makes even Release builds barely usable under QtCreator, reloading after displaying just a screenful of a world takes several minutes.

I even made a simple test project to measure the QMap<QString> performance against std::map<QString>, std::map<std::string>, QMap<std::string>, std::undordered_map<QString> and std::unordered_map<std::string>. The test fills 50k instances of map with 30 items (9-char key, 9-char value) each. The results below are from a MSVC 2019 Release build on Win8.1, standalone executable:

QMap-QString             : Filling  the containers took 686 msec | Clearing took 220 msec
StdMap-QString           : Filling  the containers took 507 msec | Clearing took 211 msec
StdMap-StdString         : Filling  the containers took 373 msec | Clearing took 186 msec
QMap-StdString           : Filling  the containers took 493 msec | Clearing took 139 msec
StdUnorderedMap-QString  : Filling  the containers took 447 msec | Clearing took 190 msec
StdUnorderedMap-StdString: Filling  the containers took 336 msec | Clearing took 147 msec

I assume the biggest difference between std::string and QString is that the latter is in UTF-16, so there's some conversion overhead when creating the strings, as well as comparison going through OS's wide character handling routines DLL call instead of direct in-code comparison. So if Minutor was to change to a std::unordered_map<std::string, QVariant> storage for the per-chunk-section block palette, it would run twice as fast.

For comparison, the very same executable, when run under QtCreator's debugger, produces these results:

QMap-QString             : Filling  the containers took 3072 msec | Clearing took 13467 msec
StdMap-QString           : Filling  the containers took 38562 msec |Clearing took 6462 msec
StdMap-StdString         : Filling  the containers took 21542 msec |Clearing took 5902 msec
QMap-StdString           : Filling  the containers took 11225 msec | Clearing took 9416 msec
StdUnorderedMap-QString  : Filling  the containers took 18860 mse c| Clearing took 8279 msec
StdUnorderedMap-StdString: Filling  the containers took 22438 msec | Clearing took 7316 msec

Here if we compare the std::unordered_map<std::string, QVariant> to the current code, it is actually much slower on the "load" side and not that much faster on the "unload" part.

I think this issue can be closed, with "won't fix, have workaround".

@mrkite
Copy link
Owner

mrkite commented Jun 14, 2021

Is that still the case under Qt6?

@EtlamGit
Copy link
Collaborator

EtlamGit commented Jun 14, 2021

In case of Palette entries, we know that Mojang will always use english names. Therefore these strings can always be represented with 7bit ASCII (or UTF-8 to have some margin). So we definitely do not need UTF-16 support there.

I will think about it during August . . .
Could you share your test project, so that we can repeat the measurement on different platforms?

@madmaxoft
Copy link
Contributor Author

I haven't tested this with anything other than Qt 5.15.1 on Windows. I expect it to be Windows-specific and QtCreator-specific (rather than Qt-specific).

My thinking was that Minutor could instead use a global palette (number -> complete block identification) and have each chunk section provide a number -> number map into it. Then we could even use std::string_view while loading the palette, since the palette data is only needed while loading and it's already in the NBT's memory. This could provide even more speedup. But that's the long-term future, I suppose :)

Here's the test project, feel free to experiment with it and let me know if you find anything interesting:
https://github.com/madmaxoft/StringMapSpeedTest
Note that it's a poor quality code, jsut as proof-of-concept, so don't judge me :)

@EtlamGit
Copy link
Collaborator

? We are exactly doing that already ?

There is ONE global definition with all Blocks having a number (hash of ID name).
While loading the Section specific Palette, the strings are translated into numbers and stored in an array for each possible Block position (16x16x16 numbers per Section).
The very first flatting approach was using strings to access the global palette. But that was the first thing I changed for performance gain.

@madmaxoft
Copy link
Contributor Author

madmaxoft commented Jun 14, 2021

I don't think that's what Minutor does, at least in the 1.13+ branch of the loading code. The section's local palette seems to stay allocated until the entire Chunk is deleted:
Loading: https://github.com/mrkite/minutor/blob/master/chunk.cpp#L276
Freeing: https://github.com/mrkite/minutor/blob/master/chunk.cpp#L28-L34

@EtlamGit
Copy link
Collaborator

EtlamGit commented Jun 14, 2021

Maybe we mean the same, but use different words?

Per Section the Palette with string IDs is loaded and converted into a HID (hashed ID). These HIDs are later used to get Block information from the global BlockIdentifier. That is what I mean with "number -> BlockInfo"

Probably the string representation of the Palette stays (unused) in memory and some more ugly stuff.

When we need BlockInfo for rendering, we access it by numbers.
When we generate the ToolTip in the status bar, we use the string representation.

@EtlamGit
Copy link
Collaborator

EtlamGit commented Oct 20, 2021

Here are my results (I put both values of a test into one line for better comparison):

[Release Build]
QMap-QString                : Filling took 457 msec | Clearing  took 208 msec
StdMap-QString              : Filling took 383 msec | Clearing  took 225 msec
StdMap-StdString            : Filling took 286 msec | Clearing  took 168 msec
QMap-StdString              : Filling took 341 msec | Clearing  took 144 msec
StdUnorderedMap-StdString   : Filling took 262 msec | Clearing  took 150 msec

[Debug Build]
QMap-QString                : Filling took 4808 msec | Clearing  took 1004 msec
StdMap-QString              : Filling took 5449 msec | Clearing  took 1108 msec
StdMap-StdString            : Filling took 4243 msec | Clearing  took 1184 msec
QMap-StdString              : Filling took 3877 msec | Clearing  took 1101 msec
StdUnorderedMap-StdString   : Filling took 3840 msec | Clearing  took 1160 msec

I do not have relevant difference if I start the executable from a console window or from QtCreator.

AMD Ryzen 5 3600, Windows 10, QtCreator 4.12.3, Qt 5.12.10, MSVC 2019
It seems, that std::whatever is slightly faster.
I would be interested in more results form other platforms, compilers and CPU architectures.

@mrkite
Copy link
Owner

mrkite commented Oct 20, 2021

Access times too quick to accurately measure?

@EtlamGit
Copy link
Collaborator

Yes, the values are different each run. But not that bad to draw a general conclusion.

Before changing our data types I would do more benchmarking.
This benchmark is very limited, as it uses only 30 items. We have to repeat for each location with more reasonable numbers.

std::map is claimed to be O( log(N) )
std::unordered_map is claimed to be O( 1 )

Therefore the unordered_map should always be faster when the map is large enough.

@madmaxoft
Copy link
Contributor Author

I think another point to consider is character encoding - QString uses UTF-16 internally, while std::string uses whatever we push into it, so it can have the native NBT encoding (UTF-8?) So std::string should use less memory and be faster to read since there's no encoding conversion.

@EtlamGit
Copy link
Collaborator

I'm already convinced to change data types that hold NBT data to std::string.
But also there, I plan some benchmarking (after 1.18 updates are working).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants