Loading optimizations #269

Open
wants to merge 10 commits into
base: dev

@gotmachine gotmachine commented Oct 16, 2024

Note that this PR depends on / includes #272

Various changes focused on loading performance, part of the general loading re-implementation:

  • Added performance metrics logging upon reaching the main menu
  • Added an (almost) entirely custom MU model parser, with roughly 3 times the throughput of the stock parser (~300 MB/s on my machine). This will only really benefit people with fast NVMe drives that have good random read performance.
  • KSPCF now maintains dictionaries of loaded models and texture assets by their url/name, and patches the stock GameDatabase.GetModel* / GameDatabase.GetTexture* methods to use them instead of doing a linear search.
    This was especially bad with models, as the method would compare the requested string to the GameObject.name property for every model in the database.
    Unfortunately, we have no way to know if additional assets are loaded by custom means, so we have to fall back to a linear search when the requested texture isn't known. We also can't assume that a texture that isn't found at some point won't be found in the future, so when a not loaded / absent texture is queried, there is no performance improvement.
    Overall, this benefits many scenarios: initial loading (model parsing and part compilation) and, more marginally, scene switches, when various initialization paths re-acquire a texture reference.
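
For readers unfamiliar with the pattern, here is a minimal sketch of a dictionary lookup backed by a linear-search fallback, as described in the list above. It is not the KSPCF code: `Asset`, `AssetIndex`, `Register` and `SlowPathHits` are hypothetical names, and a plain class stands in for the Unity `GameObject` / texture types so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a loaded model or texture.
public sealed class Asset
{
    public string Url { get; }
    public Asset(string url) { Url = url; }
}

public sealed class AssetIndex
{
    // Plays the role of the stock list, which any external code may modify directly.
    public readonly List<Asset> StockList = new List<Asset>();

    // Fast path: exact url lookup.
    private readonly Dictionary<string, Asset> byUrl = new Dictionary<string, Asset>();

    // Number of times the linear fallback ran (for illustration only).
    public int SlowPathHits { get; private set; }

    // Used by the loader, which keeps the list and the dictionary in sync.
    public void Register(Asset asset)
    {
        StockList.Add(asset);
        byUrl[asset.Url] = asset;
    }

    public Asset Get(string url)
    {
        if (byUrl.TryGetValue(url, out Asset known))
            return known;

        // Unknown url: the asset may have been added to the list by other means,
        // so the whole list has to be scanned, as the stock method does.
        SlowPathHits++;
        foreach (Asset asset in StockList)
        {
            if (asset.Url == url)
                return asset;
        }

        // A miss is deliberately not remembered: the asset could be added later.
        return null;
    }
}
```

With this shape, a url registered through `Register` never touches the list, while a url that is absent pays for a full scan on every call, which matches the "no performance improvement" case mentioned above.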

Other changes, also loading related:

  • As a part of the MinorPerfTweaks patch, patched the FlightGlobals.fetch property so it no longer falls back to a FindObjectOfType() call when the FlightGlobals._fetch field is null, which is always the case during loading. In a stock + BDB test case, this alone was about 10% of the total loading time, 7+ seconds. The call doesn't seem necessary, as FlightGlobals._fetch is set/unset from FlightGlobals.Awake() / FlightGlobals.OnDestroy(). I guess there are some very rare edge cases where a (just about to be destroyed) instance could be acquired by a call to FindObjectOfType(), but in any case I would qualify such behavior as a bug. On a side note, the same issue is present with PhysicsGlobals, but I haven't been able to detect any benefit in patching that one.
  • New patch PartParsingPerf
    • Slightly faster part icon generation. Part icon GameObject generation is done by cloning the part prefab and basically removing everything on it. The operation represents roughly 35% of the part compilation time, a good two thirds of that being spent instantiating the copy of the prefab and running all the associated code (part / modules KSPField scaffolding and Awake() code), only to immediately destroy everything. Unfortunately, this can't really be avoided due to the need to run PartModule.OnIconCreate() and, in some cases, some model-manipulation code in Awake().
    • Faster Part fields parsing, by creating a dictionary of IL-emitted parser delegates instead of the very generic reflection-based stock approach.
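
To illustrate the last point, here is a rough sketch of what a dictionary of IL-emitted parser delegates can look like. It is only an assumption about the general technique, not the actual PartParsingPerf implementation: `DemoPart`, `FieldParsers`, `ParseFloat` and `ParseInt` are made-up names, and only `float` and `int` fields are handled.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical stand-in for a Part with a couple of parsable fields.
public class DemoPart
{
    public float mass;
    public int crewCapacity;
}

public static class FieldParsers
{
    // Culture-invariant helpers called from the emitted IL.
    public static float ParseFloat(string s) => float.Parse(s, CultureInfo.InvariantCulture);
    public static int ParseInt(string s) => int.Parse(s, CultureInfo.InvariantCulture);

    // Builds a "field name -> parse the string and store it in the field" table once per type.
    public static Dictionary<string, Action<object, string>> Build(Type type)
    {
        var parsers = new Dictionary<string, Action<object, string>>();
        foreach (FieldInfo field in type.GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            MethodInfo parse;
            if (field.FieldType == typeof(float))
                parse = typeof(FieldParsers).GetMethod(nameof(ParseFloat));
            else if (field.FieldType == typeof(int))
                parse = typeof(FieldParsers).GetMethod(nameof(ParseInt));
            else
                continue; // other field types are out of scope for this sketch

            var method = new DynamicMethod(
                "Parse_" + field.Name, typeof(void),
                new[] { typeof(object), typeof(string) },
                typeof(FieldParsers).Module, true);

            ILGenerator il = method.GetILGenerator();
            il.Emit(OpCodes.Ldarg_0);                        // push the target instance
            il.Emit(OpCodes.Castclass, field.DeclaringType); // object -> declaring type
            il.Emit(OpCodes.Ldarg_1);                        // push the raw string value
            il.Emit(OpCodes.Call, parse);                    // convert it to the field type
            il.Emit(OpCodes.Stfld, field);                   // store the result in the field
            il.Emit(OpCodes.Ret);

            parsers[field.Name] =
                (Action<object, string>)method.CreateDelegate(typeof(Action<object, string>));
        }
        return parsers;
    }
}
```

Once built, setting a field from a config value is a dictionary lookup plus a delegate call, for example `parsers["mass"](part, "1.5")`, with no per-value reflection.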

Overall, these changes can provide a quite significant boost to loading time, mainly to part compilation and model loading. On a hot boot (config/model/texture loading not throughput limited by I/O), total time from exe launch to main menu for a stock + BDB install is now around 35 seconds on my 5800X3D.

As usual, this probably needs to be tested a bit more before being released.

@gotmachine gotmachine changed the base branch from master to dev October 16, 2024 16:24
@gotmachine gotmachine added the kspPerformance Possible performance improvement in KSP label Oct 18, 2024
gdb.progressTitle = "Loading model assets...";
yield return null;

// call non-stock model loaders
modelsByUrl = new Dictionary<string, GameObject>(allModelFiles.Count);
@Cgettys Cgettys Oct 19, 2024

Why not

modelsByUrl = new Dictionary<string, GameObject>(allModelFiles.Count, StringComparer.OrdinalIgnoreCase);
modelsByDirectoryUrl = new Dictionary<string, GameObject>(allModelFiles.Count, StringComparer.OrdinalIgnoreCase);

And the same for texturesByUrl?

It should have the same semantics as your existing equality logic, but probably will be faster based on your comment elsewhere about frequently falling through.

It might be slightly slower in the case where it's an exact match (since it has to do a tiny bit more work to compare), but not drastically - and it should eliminate fallback to the O(n) case for case insensitive matches.

I don't know if you still need the fallback (I saw in some cases that you might have to check a List because you could just directly modify it in stock KSP code for one of these optimizations), but thought I'd mention it.

https://learn.microsoft.com/en-us/dotnet/api/system.stringcomparer.ordinalignorecase?view=net-8.0
https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2.-ctor?view=net-8.0#system-collections-generic-dictionary-2-ctor(system-int32-system-collections-generic-iequalitycomparer((-0)))
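
As a small illustration of the suggestion above, the following sketch shows how the comparer passed to the `Dictionary` constructor changes lookup behaviour. `ComparerDemo` and `Finds` are hypothetical names used only for this example, and the dictionary value type is arbitrary.

```csharp
using System;
using System.Collections.Generic;

public static class ComparerDemo
{
    // Stores one key, then reports whether a (possibly differently cased) key is found.
    public static bool Finds(IEqualityComparer<string> comparer, string storedKey, string requestedKey)
    {
        var textures = new Dictionary<string, int>(16, comparer);
        textures[storedKey] = 1;
        return textures.ContainsKey(requestedKey);
    }
}
```

`ComparerDemo.Finds(StringComparer.Ordinal, "Squad/Parts/Tex", "squad/parts/tex")` returns false, while the same call with `StringComparer.OrdinalIgnoreCase` returns true, at the cost of a case-folding hash and comparison on every lookup.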

Contributor Author

Model search is case sensitive, but yeah that could make sense for textures.

TBH, occurrences of the methods being called with mismatching casing were pretty rare in my tests (less than 1%), and for the few cases where it happens, the requested casing will be added as a duplicate entry, so the overhead will only occur on the first call for a specific casing. The case-insensitive comparer will still induce quite a bit of overhead, so I'm not so sure that would be a net gain overall.

Falling back to a linear search on the original stock lists is necessary in any case (:P), as (a few) models/textures are also loaded from asset bundles and because there is nothing preventing external code from modifying the lists directly. That still leaves a hole if someone decides to remove models/textures from them, but that feels unlikely enough.

On a side note, upon further review, we should probably also patch GetTextureInfoIn() and GetModel(), as those are commonly used too, including during part compilation.
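
The "duplicate entry" idea from this reply can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: `TextureIndex`, `Register`, `EntryCount` and `SlowPathHits` are invented names, and plain url strings stand in for the texture objects.

```csharp
using System;
using System.Collections.Generic;

public sealed class TextureIndex
{
    // Plays the role of the stock texture list (urls only, to keep the sketch short).
    private readonly List<string> stockUrls = new List<string>();

    // Case-sensitive dictionary (default comparer): cheap hashing on the common path.
    private readonly Dictionary<string, string> byUrl = new Dictionary<string, string>();

    public int SlowPathHits { get; private set; }
    public int EntryCount => byUrl.Count;

    public void Register(string url)
    {
        stockUrls.Add(url);
        byUrl[url] = url;
    }

    // Returns the url as stored, or null when no texture matches.
    public string Get(string requestedUrl)
    {
        if (byUrl.TryGetValue(requestedUrl, out string exact))
            return exact;

        // Slow path: case-insensitive scan of the list.
        SlowPathHits++;
        foreach (string candidate in stockUrls)
        {
            if (string.Equals(candidate, requestedUrl, StringComparison.OrdinalIgnoreCase))
            {
                // Remember this particular casing as an extra key, so the scan
                // only happens on the first request for it.
                byUrl[requestedUrl] = candidate;
                return candidate;
            }
        }
        return null;
    }
}
```

The trade-off is a little extra memory per distinct mis-cased request, in exchange for keeping the default string comparer on every other lookup.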

Fair - I assumed it was significantly more frequent based on this comment on the issue (which I happened across by chance when being curious about what the latest version you just published held :)).

"The dictionary approach is however not as beneficial here, because those methods allow to get a texture by its case-insensitive url, so we have to fallback to iterating on the list with a case insensitive string comparison if the requested url isn't correctly cased. The net result was still beneficial in my tests, but not as much as I hoped due to that slow path being taken relatively often."

InvariantCultureIgnoreCase and CurrentCultureIgnoreCase are fairly expensive; I'd be surprised (but would believe you) if OrdinalIgnoreCase was particularly expensive though. Then again, my knowledge may be more relevant for newer dotnet versions? (ex: https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-8/).

But fair, you're trading an initial miss + a bit of memory in an uncommon case, to save a bit of arithmetic / bit-twiddling + possibly other overheads on every case - I'm not a KSP modding expert, your intuition is likely better than mine in this area and you're in a position to measure it for sure :).

Pity you can't get rid of that silly list - I was gonna suggest "well if you can modify, why not binary search" - but that won't work if code makes additions without being aware of the need to insert at the right position.

Contributor Author

Yeah, unfortunately, the Mono-derived BCL and JIT compiler used in KSP/Unity are missing all the juicy improvements from more or less the last 10 years (and in many cases perform just plain worse than the good old .NET Framework).

This being said, the case-insensitive dictionary still makes sense and is a trivial change, so I will try it and check what the profiling results have to say.

Ah, right... I haven't touched Unity in about half a decade. My frame of reference for "old dotnet performance" is .net framework... I forgot that Unity has its own fork of Mono.
And I'm assuming Unity's version of Mono is just different enough / just tightly coupled enough that we can't e.g. just drop in a modern version of the dotnet runtime either... Though it might be fun to try sometime 😄

…e called manually. Main intended usage is for patches requiring to be applied before ModuleManagerPostLoad.
@gotmachine
Contributor Author

So, for reference, here are some profiling results regarding the GameDatabase methods.
This was tested on a stock + DLC + BDB + Tantares/TantaresLV + SXT + KWRocketry + a few NearFuture mods install, for a total of 2676 parts and 142 IVAs.

The texture getting methods were called 13,326 times during loading:

  • Stock :
    image
  • KSPCF with the default comparer. The slow path was taken 26 times (0.22%) :
    image
  • KSPCF with the OrdinalIgnoreCase comparer. The slow path was never taken :
    image

So all in all, I'd say that the difference is pretty negligible, but using the default comparer is either faster or just as fast. I haven't verified it, but I would guess the majority of the slow path cases happened in GetTextureInfo() (which seems likely given how it is used). The figures might start to invert a bit when the slow path is taken more often, but this feels unlikely.

The real gains are for models anyway :

  • Stock :
    image
  • KSPCF :
    image

…e file / class.

- Refactored those patches as a BasePatch using the new [ManualPatch] attribute (instead of patching manually with a separate harmony instance).
- Patched a few additional texture/model getting methods.
Cgettys commented Oct 20, 2024

I'd agree, thanks for sharing the numbers / satisfying my curiosity. And it's an interesting technique I hadn't thought of for handling case-insensitive lookup where the casing will usually match, but not always.
