Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate signal strength to all outputs #129

Merged
merged 21 commits into from
Dec 14, 2023

Conversation

BramOtte
Copy link
Contributor

@BramOtte BramOtte commented Oct 15, 2023

We duplicate the signal strength of a component to all outputs, so we can get the input signal strength without having to fetch the inputs each time. Additionally by having separate buckets for each signal strength the combined input strength is calculated with just a few u128 instructions instead of having to loop over all the input.
This results in the first tenth of the chungus benchmark to only take ~7.6 on my machine while without this change it takes ~9.9 seconds, so ~30% faster.
Tested on: Ubuntu 22.04.3 LTS, intel core I7-10750H

@Paul1365972
Copy link
Contributor

Wow wouldn't have thought that such huge performance improvements were still possible. Confirmed similar performance increases (37-40% on my windows intel i7-6700HQ machine).
However this change would hard lock this project to never support HSS (Signal strengths above 15, in vanilla 897 is the limit iirc), just something to consider.
By the way, did you somehow verify that redstone behaves the same? While testing with this patch there are about 1% less redstone ticks. Just making sure, but most likely nothing to worry about since there would probably be larger discrepancies if something actually went wrong.
Also probably a good idea to add a sanity check and panic if either the default or side input has more than 255 input connections.

@Paul1365972
Copy link
Contributor

Btw on line 272 of direct.rs there is one constant 16 that should have also been replaced by Self::NUM_QUEUES.

Copy link
Member

@StackDoubleFlow StackDoubleFlow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I'm happy to merge this after a quick cargo fmt.

crates/core/src/redpiler/backend/direct.rs Show resolved Hide resolved
crates/core/src/redpiler/backend/direct.rs Show resolved Hide resolved
@@ -565,33 +599,26 @@ fn schedule_tick(
scheduler.schedule_tick(node_id, delay, priority);
}

fn link_strength(link: DirectLink, nodes: &Nodes) -> u8 {
nodes[link.to].output_power.saturating_sub(link.weight)
const INPUT_MASK: u128 = u128::from_ne_bytes([0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe a better name being BOOL_INPUT_MASK?
Better to be specific since it's module scoped.

@BramOtte
Copy link
Contributor Author

BramOtte commented Dec 5, 2023

Great work! I'm happy to merge this after a quick cargo fmt.

alright I'l do that there is also one other minor fix I will add shortly
Also a question when I do cargo fmt another parts of the code also changed I assume I should not include these in my pull request? (this happens in crates/core/src/plot/mod.rs and crates/core/src/world/storage.rs)

@StackDoubleFlow
Copy link
Member

It's normally good practice not to, but I'll allow it in this case. Eventually I'd like to have CI fail if code isn't formatted properly so that it stops seeping into the repo.

@BramOtte
Copy link
Contributor Author

BramOtte commented Dec 5, 2023

alright that should be it then I also changed ForwardLink::new and tested it with chungus2 and all is fine and well (or at least if you ignore the known issue with chungus2).
previously it would crash on non-optimized builds because ss was too large so I made it clamp instead of crash.
It might also indicate that something else in the code is going wrong but for now this will suffice.

@Paul1365972
Copy link
Contributor

Maybe just putting a graph.retain_edges(|g, edge| g[edge].ss <= 15); at the top of the compile function (or in a compiler pass) would make sense here as well.

@StackDoubleFlow
Copy link
Member

Maybe just putting a graph.retain_edges(|g, edge| g[edge].ss <= 15); at the top of the compile function (or in a compiler pass) would make sense here as well.

Like in the ClampWeights pass?

@StackDoubleFlow
Copy link
Member

Please rebase and I'll merge

@Paul1365972
Copy link
Contributor

You're right, completly forgot about that pass. In that case I think making the ClampWeights pass mandatory (or just merging it into the InputSearch pass) and keeping the assert would be a good idea, not that it will make much of a difference.

@BramOtte
Copy link
Contributor Author

BramOtte commented Dec 8, 2023

Please rebase and I'll merge

I don't really know how rebase works so I merged it instead.
From what I understand how rebase works is it is basically equivilant to merging on master.
So I created a new branch from the current state of master: https://github.com/BramOtte/MCHPRS/tree/duplicate-signal-strength-rebase
aswell and merged my changes onto that incase you find that more convenient.

never mind, I pushed another change just merge this branch

@StackDoubleFlow
Copy link
Member

Your previous merge commit was empty, there are still conflicts with master.

@BramOtte
Copy link
Contributor Author

I do not understand what exactly went wrong, but I got the rebase to work, turnsout I just had to force push to make it work.
So you should be able to merge with no issues now

@StackDoubleFlow StackDoubleFlow merged commit 6ae0623 into MCHPR:master Dec 14, 2023
1 check passed
BramOtte added a commit to BramOtte/MCHPRS that referenced this pull request Jan 3, 2024
* decrease size of DirectLink to 4 bytes

* duplicate signal strength to all outputs
This allows input states to be checked with simd
because the input nodes don't need to be fetched

* cleanup old code

* remove erroring line

* make index of NodeId private again

* Make TickScheduler::NUM_QUEUES constant

* remove from_ne_bytes from last_index_positive

* only get needed repeater inputs

* replace magic number with NUM_QUEUES

* don't crash the program when ss is larger than 15
when creating a forward link

* fix minor typo

* rename INPUT_MASK to BOOL_INPUT_MASK

* Run cargo fmt

* address nits

* Add check for maximum inputs.
This isn't really neccesary right now but if we ever add optimizations which combine nodes it might become likely that a node will have more than 255 inputs.

* make poper use of input counters in from_compile_node

* Make clamp_weights pass mandatory

* Use get_unchecked_mut() for incrementing signal strength counters

* skip updating if output power did not change

* wrap input arrays into a struct to ensure they are aligned

* run cargo fmt
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants