Sort code output for more accurate analysis #495

bcssov · 2024-03-17T21:20:39Z

Jippi — Today at 9:46 PM
is the language used by Paradox order-sensitive in terms of keys/blocks in the files? I've noticed a very high ratio of my conflicts are just things being ordered slightly differently from each other - would be kinda cool if keys/blocks were sorted alphabetically if it doesn't matter to the engine - though probably very compute intensive to do on the fly for the patch/diff, and it would mean essentially copying all mods 🤔

Mario — Today at 9:49 PM
I don't think it is. At one point I did consider converting it to a dictionary and ordering it (for example). But the extra processing would add to the analysis time, while I wanted to reduce how long the analysis takes.

Mario — Today at 9:50 PM
I'm open to ideas and discussion about an approach that might work (as in not be too resource intensive and add too much time to the already long analysis). I do some small post processing to allow me to calculate the code hash and actually use that for comparing differences.
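A minimal sketch of the "sort, then hash" idea, not Irony's actual implementation. It assumes ordering really does not matter to the engine for the blocks being compared, which is not confirmed in this discussion. The node shape and the names `canonicalize`, `serialize` and `code_hash` are hypothetical:

```python
import hashlib

# Hypothetical node shape: a block is a list of (key, value) pairs, where
# value is either a string or another block. A list is used instead of a
# dict because the same key may appear more than once in a block.

def canonicalize(block):
    """Return a copy of `block` with every nesting level sorted by key."""
    items = []
    for key, value in block:
        if isinstance(value, list):
            value = canonicalize(value)
        items.append((key, value))
    # repr() is a deterministic tie-breaker for repeated keys.
    return sorted(items, key=lambda kv: (kv[0], repr(kv[1])))

def serialize(block, indent=0):
    """Turn a block back into text so it can be hashed or diffed."""
    pad = "\t" * indent
    lines = []
    for key, value in block:
        if isinstance(value, list):
            lines.append(f"{pad}{key} = {{")
            lines.append(serialize(value, indent + 1))
            lines.append(f"{pad}}}")
        else:
            lines.append(f"{pad}{key} = {value}")
    return "\n".join(lines)

def code_hash(block):
    """Hash of the sorted form, so key order does not change the result."""
    text = serialize(canonicalize(block))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

With this, two definitions that differ only in key order get the same hash, while a changed value still produces a different one.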

Jippi — Today at 9:53 PM
it's quite an interesting engineering problem 😄
how does Irony internally represent the files? is it an AST or plain files?

Mario — Today at 9:58 PM
It's converted to a string once all the post processing is done.

Mario — Today at 10:00 PM
Yes. Didn't see the need to keep the AST, though I can always break it down again if I have to. Just keeping an AST increases the number of objects and increases RAM usage.
Still, the ordering can be done during this phase, when we still have access to the AST. That wouldn't be an issue; the issue was the additional performance overhead it might create.

Jippi — Today at 10:02 PM
yeah, would probably be a bit compute intensive, and certainly IO intensive (poor HDD or slow SSD)
I wonder if it would be as useful / convenient if it was a button in the conflict resolver when you as a human see "oh, this seems like ordering is most of it, let me click this button and wait a bit for the system to parse, sort, write and reload the files" 🤔

Mario — Today at 10:05 PM
Maybe not, I do keep the whole file in memory when I do processing. It helps me avoid repeated IO loads.
Mainly what I see as the bottleneck is iterating over hundreds of thousands of objects and sorting them.

Jippi — Today at 10:08 PM
need GPU acceleration for file processing - slap some LLM on it and raise 200M USD for the project - ✅
in all seriousness, I wonder if having it as a button in the resolver, so you can apply the sort logic on a case-by-case basis, would be useful, at least to start with - if it turns out to be as useful as I think it could be, then it can be opt-in for all files during processing, or remain a human-triggered event in the resolver 🤔
or... you know which files have conflicts after the first processing, so you only sort + reprocess those files - and if sorting removes a conflict, you remove it from the list before showing the UI.

It's usually in the 1000s or fewer (even 100s once the first mod update frenzy after a new patch is over) - so it would put significantly less CPU / IO strain on the system - and probably add around a mid-to-high single digit increase in processing time

Mario — Today at 10:13 PM
I've had all sorts of ideas, but lacked some help with that. I first thought about maybe uploading resolutions to the cloud for a start, so we look up whether someone has already resolved the conflicts for you with the specified mod and hash combo. Then there was loot, all sorts of ideas to use in conflict rules. Mod A wins over mod B.

Mario — Today at 10:13 PM
I actually thought of a similar thing: do the post processing only near the end, and filter out those that might match.
Think this approach would actually work.
As for the LLM I won't even go into that, I actually considered that at one point. I know it's a joke.

Mario — Today at 10:15 PM
On average we would be sorting and comparing 5-10k objects instead of hundreds of thousands. And I can skip localization items.
Localization holds a lot of items. Now that I think about it, it would definitely work. I will log this to GitHub.

Jippi — Today at 10:16 PM
5-10k objects sounds pretty snappy to me

Mario — Today at 10:16 PM
It does seem it should not add too much overhead.
Without localization you can reduce that to about 50%
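The approach settled on above (sort only what already conflicts, skip localization, drop anything that matches once sorted) could look roughly like the sketch below. The conflict record shape, the `"localisation"` type name, and the `normalize` / `drop_order_only_conflicts` helpers are all assumptions for illustration, not Irony's API. `normalize` is a crude stand-in that only handles flat `key = value` lines; a real version would sort the parsed tree.

```python
def normalize(text):
    """Crude stand-in: sort the non-empty, stripped lines of a flat block."""
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(sorted(line for line in lines if line))

def drop_order_only_conflicts(conflicts, skip_types=("localisation",)):
    """Return only the conflicts that still differ after sorting.

    Each conflict is assumed to be a dict with a "type" and a list of
    competing "definitions" (one text per mod).
    """
    remaining = []
    for conflict in conflicts:
        if conflict["type"] in skip_types:
            # Skipped types are never sorted; they stay in the list as-is.
            remaining.append(conflict)
            continue
        variants = {normalize(text) for text in conflict["definitions"]}
        if len(variants) > 1:
            # Still different after sorting, so this is a real conflict.
            remaining.append(conflict)
    return remaining
```

Because this runs only on entries the first pass already flagged, the sort touches thousands of objects rather than the whole mod collection.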

bcssov added the feature (New feature or request) label Mar 17, 2024
