-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement element-addressable memory #1598
Implement element-addressable memory #1598
Conversation
6eadbd9
to
74efdb5
Compare
@bobbinth I am done implementing the processor part of the change, which is ready for review. All tests in the processor pass (except 2 which run the verifier, or verify the correctness of the bus). Left to do after (high-level):
|
83c63ef
to
ec1094f
Compare
31c3e2a
to
f07da49
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you! I reviewed most of the non-test and non-MASM code - though, I only had a superficial look at the constraint evaluation code. Left some comments inline - mostly small/nits.
A couple of questions:
- How did these changes impact performance?
- Would the current design support reading multiple elements from the same word in the same cycle? (see here for context).
let c_i = |f_i| { | ||
frame.read_write_next() | ||
+ binary_not(frame.read_write_next()) | ||
* binary_not(frame.element_or_word_next()) | ||
* binary_not(f_i) | ||
}; | ||
|
||
(c_i(f0), c_i(f1), c_i(f2), c_i(f3)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use the intermediate variable z_i
and add the explanations in this comment with the modifications needed (e.g., use next
variables in the definition and remove f_scb
from the expression defining c_i
.).
This will help in understanding the logic and catching soundness bugs in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added z_i
, and a bunch of comments (including the docstring for the method). Let me know if anything important is still missing.
|
||
#[inline(always)] | ||
fn element_or_word(&self) -> E { | ||
self.current()[MEMORY_ELEMENT_OR_WORD_COL_IDX] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this might be a personal cognitive bias but I would've preferred IS_ELEMENT_OP
or something similar as a name for the flag indicating whether the current op is element or word based.
Similarly for MEMORY_READ_WRITE_COL_IDX
, taking reads as the default or base op, I would have called it instead IS_MEMORY_WRITE_COL_IDX
and similarly for related methods and constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, fixed. I had started with similar names but reverted for some reason that I can't recall... Definitely more readable now.
/// the specified delta type, which determines the column over which the delta and delta inverse | ||
/// values of the trace would be calculated. | ||
/// | ||
/// - When the delta type is Context, the address and clock columns can be anything. | ||
/// - When the delta type is Address, the context must remain unchanged but the clock can change. | ||
/// - When the delta type is Batch, the context must remain unchanged but the clock can change. | ||
/// - When the delta type is Clock, both the context and address columns must remain unchanged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be context and batch.
Also is it Batch or batch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Also is it Batch or batch?
I went with capitalized when designating a variant of MemoryTestDeltaType
, and not capitalized otherwise.
processor/src/chiplets/memory/mod.rs
Outdated
/// clock cycles computed as described above. | ||
/// - `f_scb` is a flag indicating whether the context and the batch are the same as in the next |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: " .. and the batch of the current row are the same as in the next one."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
mem_loadw | ||
|
||
# load pi_{i+512} | ||
# load pi_{i+2048} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the index of the coefficient of the polynomial should stay the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted
padw | ||
dup.12 | ||
mem_loadw | ||
#=> [PI, PI_{i+512}, C, pi_ptr, ...] | ||
#=> [PI, PI_{i+2048}, C, pi_ptr, ...] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted
#! The i-th coefficient s1_i of s1 is equal to c_i - (pi_i - pi_{512 + i}) which is equal to | ||
#! c_i + pi_{512 + i} - pi_i. Now, we know that the size of the pi_i coefficients is bounded by | ||
#! J := 512 * q^2 and this means that J + pi_{512 + i} - pi_i does not Q-underflow and since | ||
#! The i-th coefficient s1_i of s1 is equal to c_i - (pi_i - pi_{2048 + i}) which is equal to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the index of the coefficient of the polynomial should stay the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right; reverted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably add a disclaimer that most, if not all, of the procedures in this file assume that the input pointers are word-aligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Looks great, thank you! |
@bobbinth @Al-Kindi-0 ready for another look. I addressed all comments, and |
How does the performance look like in this branch vs. |
No change in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you! This is almost a full review (I have just a handful of files to go through).
btw, I think we should probably merge this into a new branch - e.g., something like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went through the constraints and all looks good except for a few questions and comments.
result: &mut [E], | ||
memory_flag_no_last_row: E, | ||
) -> usize { | ||
result[0] = memory_flag_no_last_row |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this consistent with the original definition current
and what is next
is a bit different.
If the code is correct then it would be nice to update the constraints appearing in the discussions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes good point. In the discussions, given a frame, the result for "same context and word address" was stored in the current
row, whereas in the implementation it is stored in the next
row. I only did that since it was easier to implement given how Memory::fill_trace()
is structured, but both are equivalent.
And after double-checking, this version is also correct given how memory_flag_not_last_row
works (i.e. it is turned off when the next
row of the frame lies outside the chiplet, so we never access the memory of the wrong chiplet).
If the code is correct then it would be nice to update the constraints appearing in the discussions.
This would be hard since it spans many posts, half of which are written by Bobbin. I will instead properly document the current constraints in the docs (#1616).
let f3 = frame.idx1_next() * frame.idx0_next(); | ||
|
||
let c_i = |f_i| { | ||
// z_i is set to 1 when `v'[i]` is not being accessed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be:
z_i is set to 1 when we are operating on elements but not the i-th element
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but not the i-th element
It's not quite the i-th element that's not set to 1; doesn't that mean that all i
are not set to 1?
I reworded to something that's hopefully better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not quite the i-th element that's not set to 1; doesn't that mean that all i are not set to 1?
Yeah it is the z_i we are talking about.
I still feel that the sentence doesn't capture the function of z_i but it should be easy to get it from the expression in the second line, hopefully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated it to be your suggestion z_i is set to 1 when we are operating on elements but not the i-th element
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you! I didn't review the constraints in too much detail, but will do this when reviewing the docs PR.
No additional comments from this review, but there are a couple of comments remain from the previous review (re handling padding for locals and updating cycles counts so that at least what we have there is not wrong).
All looks good to me now, thanks again! |
Closes #1064
Note that I changed the
batch
column to hold4 * previous_defn_of_batch
(now effectively the first address in the batch). This is to avoid a needless division per row in the processor.This PR does not address the concern raised #1064 (comment); we should do that one in a subsequent PR, as it is a backend thing, and getting this PR out the door will allow
miden-base
and the compiler to start working fixing their code.