Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement element-addressable memory #1598

Merged
merged 19 commits into from
Jan 10, 2025

Conversation

plafer
Copy link
Contributor

@plafer plafer commented Dec 10, 2024

Closes #1064

Note that I changed the batch column to hold 4 * previous_defn_of_batch (now effectively the first address in the batch). This is to avoid a needless division per row in the processor.

This PR does not address the concern raised #1064 (comment); we should do that one in a subsequent PR, as it is a backend thing, and getting this PR out the door will allow miden-base and the compiler to start working fixing their code.

@plafer plafer force-pushed the plafer-1064-element-addressable-mem branch 2 times, most recently from 6eadbd9 to 74efdb5 Compare December 12, 2024 21:17
@plafer plafer requested a review from bobbinth December 12, 2024 21:17
@plafer
Copy link
Contributor Author

plafer commented Dec 12, 2024

@bobbinth I am done implementing the processor part of the change, which is ready for review. All tests in the processor pass (except 2 which run the verifier, or verify the correctness of the bus).

Left to do after (high-level):

  • fix bus
  • update air constraints
  • update the book
  • a bit of cleanup (the TODO(plafer)'s)

@plafer plafer force-pushed the plafer-1064-element-addressable-mem branch 3 times, most recently from 83c63ef to ec1094f Compare December 18, 2024 22:05
@plafer plafer force-pushed the plafer-1064-element-addressable-mem branch from 31c3e2a to f07da49 Compare December 19, 2024 20:53
@plafer plafer marked this pull request as ready for review December 21, 2024 18:40
Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! I reviewed most of the non-test and non-MASM code - though, I only had a superficial look at the constraint evaluation code. Left some comments inline - mostly small/nits.

A couple of questions:

  • How did these changes impact performance?
  • Would the current design support reading multiple elements from the same word in the same cycle? (see here for context).

CHANGELOG.md Outdated Show resolved Hide resolved
air/src/trace/mod.rs Show resolved Hide resolved
docs/src/user_docs/stdlib/mem.md Outdated Show resolved Hide resolved
stdlib/docs/mem.md Outdated Show resolved Hide resolved
miden/src/repl/mod.rs Outdated Show resolved Hide resolved
processor/src/chiplets/aux_trace/mod.rs Outdated Show resolved Hide resolved
air/src/trace/chiplets/memory.rs Outdated Show resolved Hide resolved
air/src/trace/chiplets/memory.rs Outdated Show resolved Hide resolved
assembly/src/assembler/mod.rs Outdated Show resolved Hide resolved
docs/src/user_docs/stdlib/collections.md Outdated Show resolved Hide resolved
Comment on lines 169 to 176
let c_i = |f_i| {
frame.read_write_next()
+ binary_not(frame.read_write_next())
* binary_not(frame.element_or_word_next())
* binary_not(f_i)
};

(c_i(f0), c_i(f1), c_i(f2), c_i(f3))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use the intermediate variable z_i and add the explanations in this comment with the modifications needed (e.g., use next variables in the definition and remove f_scb from the expression defining c_i.).
This will help in understanding the logic and catching soundness bugs in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added z_i, and a bunch of comments (including the docstring for the method). Let me know if anything important is still missing.

air/src/constraints/chiplets/memory/mod.rs Outdated Show resolved Hide resolved

#[inline(always)]
fn element_or_word(&self) -> E {
self.current()[MEMORY_ELEMENT_OR_WORD_COL_IDX]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this might be a personal cognitive bias but I would've preferred IS_ELEMENT_OP or something similar as a name for the flag indicating whether the current op is element or word based.
Similarly for MEMORY_READ_WRITE_COL_IDX, taking reads as the default or base op, I would have called it instead IS_MEMORY_WRITE_COL_IDX and similarly for related methods and constants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, fixed. I had started with similar names but reverted for some reason that I can't recall... Definitely more readable now.

air/src/constraints/chiplets/memory/tests.rs Outdated Show resolved Hide resolved
/// the specified delta type, which determines the column over which the delta and delta inverse
/// values of the trace would be calculated.
///
/// - When the delta type is Context, the address and clock columns can be anything.
/// - When the delta type is Address, the context must remain unchanged but the clock can change.
/// - When the delta type is Batch, the context must remain unchanged but the clock can change.
/// - When the delta type is Clock, both the context and address columns must remain unchanged.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be context and batch.
Also is it Batch or batch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Also is it Batch or batch?

I went with capitalized when designating a variant of MemoryTestDeltaType, and not capitalized otherwise.

/// clock cycles computed as described above.
/// - `f_scb` is a flag indicating whether the context and the batch are the same as in the next
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: " .. and the batch of the current row are the same as in the next one."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

mem_loadw

# load pi_{i+512}
# load pi_{i+2048}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the index of the coefficient of the polynomial should stay the same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted

padw
dup.12
mem_loadw
#=> [PI, PI_{i+512}, C, pi_ptr, ...]
#=> [PI, PI_{i+2048}, C, pi_ptr, ...]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted

#! The i-th coefficient s1_i of s1 is equal to c_i - (pi_i - pi_{512 + i}) which is equal to
#! c_i + pi_{512 + i} - pi_i. Now, we know that the size of the pi_i coefficients is bounded by
#! J := 512 * q^2 and this means that J + pi_{512 + i} - pi_i does not Q-underflow and since
#! The i-th coefficient s1_i of s1 is equal to c_i - (pi_i - pi_{2048 + i}) which is equal to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the index of the coefficient of the polynomial should stay the same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right; reverted

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add a disclaimer that most, if not all, of the procedures in this file assume that the input pointers are word-aligned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Al-Kindi-0
Copy link
Collaborator

Looks great, thank you!
I did an initial round of reviews and left some comments inline. Went through all files in detail except for the constraints one as it seems to have some todos.

air/src/trace/mod.rs Outdated Show resolved Hide resolved
core/src/debuginfo/source_file.rs Show resolved Hide resolved
miden/src/helpers.rs Outdated Show resolved Hide resolved
processor/src/chiplets/aux_trace/mod.rs Outdated Show resolved Hide resolved
@plafer plafer requested review from Al-Kindi-0 and bobbinth January 7, 2025 22:40
@plafer
Copy link
Contributor Author

plafer commented Jan 7, 2025

@bobbinth @Al-Kindi-0 ready for another look. I addressed all comments, and TODO(plafer). I will open an issue tomorrow for what's left after this PR (namely docs and implementing the additional required range-checks).

@bobbinth
Copy link
Contributor

bobbinth commented Jan 7, 2025

@bobbinth @Al-Kindi-0 ready for another look. I addressed all comments, and TODO(plafer). I will open an issue tomorrow for what's left after this PR (namely docs and implementing the additional required range-checks).

How does the performance look like in this branch vs. next?

assembly/src/assembler/instruction/mem_ops.rs Show resolved Hide resolved
assembly/src/assembler/mod.rs Outdated Show resolved Hide resolved
processor/src/errors.rs Outdated Show resolved Hide resolved
@plafer
Copy link
Contributor Author

plafer commented Jan 8, 2025

How does the performance look like in this branch vs. next?

No change in the program_execution benchmarks

@plafer plafer requested a review from bobbinth January 8, 2025 15:01
Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! This is almost a full review (I have just a handful of files to go through).

assembly/src/assembler/procedure.rs Outdated Show resolved Hide resolved
assembly/src/assembler/mod.rs Outdated Show resolved Hide resolved
assembly/src/assembler/procedure.rs Outdated Show resolved Hide resolved
assembly/src/errors.rs Show resolved Hide resolved
processor/src/chiplets/aux_trace/mod.rs Outdated Show resolved Hide resolved
stdlib/asm/crypto/dsa/rpo_falcon512.masm Outdated Show resolved Hide resolved
stdlib/asm/crypto/dsa/rpo_falcon512.masm Show resolved Hide resolved
stdlib/asm/crypto/dsa/rpo_falcon512.masm Show resolved Hide resolved
stdlib/asm/crypto/dsa/rpo_falcon512.masm Show resolved Hide resolved
stdlib/asm/crypto/dsa/rpo_falcon512.masm Show resolved Hide resolved
@bobbinth
Copy link
Contributor

bobbinth commented Jan 9, 2025

btw, I think we should probably merge this into a new branch - e.g., something like element-mem. This way, we can start migrating miden-base to this branch but at the same time, would be able to make additional PRs against it.

Copy link
Collaborator

@Al-Kindi-0 Al-Kindi-0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through the constraints and all looks good except for a few questions and comments.

air/src/constraints/chiplets/memory/mod.rs Outdated Show resolved Hide resolved
air/src/constraints/chiplets/memory/mod.rs Outdated Show resolved Hide resolved
air/src/constraints/chiplets/memory/mod.rs Outdated Show resolved Hide resolved
result: &mut [E],
memory_flag_no_last_row: E,
) -> usize {
result[0] = memory_flag_no_last_row
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this consistent with the original definition $f_{scb} = [1 - (n_0 + (1-n_0) \cdot n_1)] \cdot f_{mem}'$ ? The expression is correct but what is current and what is next is a bit different.
If the code is correct then it would be nice to update the constraints appearing in the discussions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes good point. In the discussions, given a frame, the result for "same context and word address" was stored in the current row, whereas in the implementation it is stored in the next row. I only did that since it was easier to implement given how Memory::fill_trace() is structured, but both are equivalent.

And after double-checking, this version is also correct given how memory_flag_not_last_row works (i.e. it is turned off when the next row of the frame lies outside the chiplet, so we never access the memory of the wrong chiplet).

If the code is correct then it would be nice to update the constraints appearing in the discussions.

This would be hard since it spans many posts, half of which are written by Bobbin. I will instead properly document the current constraints in the docs (#1616).

air/src/constraints/chiplets/memory/mod.rs Outdated Show resolved Hide resolved
let f3 = frame.idx1_next() * frame.idx0_next();

let c_i = |f_i| {
// z_i is set to 1 when `v'[i]` is not being accessed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be:
z_i is set to 1 when we are operating on elements but not the i-th element

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but not the i-th element

It's not quite the i-th element that's not set to 1; doesn't that mean that all i are not set to 1?

I reworded to something that's hopefully better

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not quite the i-th element that's not set to 1; doesn't that mean that all i are not set to 1?

Yeah it is the z_i we are talking about.
I still feel that the sentence doesn't capture the function of z_i but it should be easy to get it from the expression in the second line, hopefully.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated it to be your suggestion z_i is set to 1 when we are operating on elements but not the i-th element

air/src/constraints/chiplets/memory/mod.rs Outdated Show resolved Hide resolved
air/src/constraints/chiplets/memory/mod.rs Outdated Show resolved Hide resolved
air/src/constraints/chiplets/memory/mod.rs Outdated Show resolved Hide resolved
air/src/constraints/chiplets/memory/mod.rs Outdated Show resolved Hide resolved
@plafer plafer changed the base branch from next to element-addressable-memory January 9, 2025 20:48
@plafer plafer requested review from Al-Kindi-0 and bobbinth January 9, 2025 20:50
Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! I didn't review the constraints in too much detail, but will do this when reviewing the docs PR.

No additional comments from this review, but there are a couple of comments remain from the previous review (re handling padding for locals and updating cycles counts so that at least what we have there is not wrong).

@Al-Kindi-0
Copy link
Collaborator

All looks good to me now, thanks again!

@plafer plafer merged commit 0e7c1d6 into element-addressable-memory Jan 10, 2025
9 checks passed
@plafer plafer deleted the plafer-1064-element-addressable-mem branch January 10, 2025 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants