Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A little review #1

Open
flo-l opened this issue Jan 4, 2019 · 4 comments
Open

A little review #1

flo-l opened this issue Jan 4, 2019 · 4 comments

Comments

@flo-l
Copy link

flo-l commented Jan 4, 2019

Hey! We were chatting about bex on reddit the other day. I glimpsed over the docs and code of the base module and took some notes I wanted to share. I really appreciate you writing this crate. It is already quite a large code base and a useful one. I want to help you polish it a little by sending PRs. This feedback is intended to be the start of a little discussion about what to refactor.

In general: The rust api guidelines are a good resource to design an idiomatic and reusable rust crate.

Bex notes

top level

  1. It could use top level docs, like examples or hints where to look for what. itertools is a good example.
  2. NID, VID and SID are not documented in base, only in bdd. I feel like these names should be self-explaining, eg. VariableId instead of VID.
  3. code style: there is a preferred way of formatting rust code. The rustfmt tool can do it automatically.

trait TBase

  1. needs doc comments for its methods.
  2. needs a better description.
  3. getters should be prefixed with "get_", eg. get_nid() instead of nid().

struct Base

  1. new() should not be taking arguments (like eg. Vec::new()). Define a from_raw_parts(...) instead.
  2. can derive Default
  3. public fields should all be private, use getters, setters and iterators instead.
  4. Methods desperately need docs and examples

base::Op

  1. each variant needs doc comment explaining its purpose.
  2. the names should be self-explaining (subjective, I know..). O and I don't make sense to me for example.
@tangentstorm
Copy link
Owner

Oh, hi there. I never saw the notification about your reply on reddit. Thanks for checking my code out!

I've written several versions of this system in other languages (python, elixir, J). It's sort of become one of the first things I take on after "hello world" when I'm learning a new language. The other implementations have always been so slow that they're basically just toys, though.

So, this started out as another throwaway implementation. But I really like rust, and it's sort of become a little game for me to see how fast I can make it go... And at some point, I decided it might actually be worth sharing. Since then, I've mostly been working on bdd.rs, which is why it's the one with all the nice comments and everything else is still shrouded in mystery. :)

This also started as part of another project in a private repo, and the most illuminating example code was still private. I just copied it over as examples/bdd-solve.rs.

As to your notes:

top level:

  1. more top level docs are on my to-do list... including starting with what BDDs even are and why anyone should care.
  2. NID and VID are my own terms, but I've been using them for so long, they're just second nature to me. But yes, they need to be explained up top. I think SID was probably a mistake and will go away. The new multi-threaded implementation also sprouted a QID for queries... Sorry :)
  3. You have a valid point, and I'd hate for my coding style to scare people away. Part of the reason it looks the way it does is that I'm just used to working with very terse code. (I program in K3 at my day job), but also I do most of my coding for this project on tiny 12-inch laptop on the subway, and vertical space is at a premium. (You might notice that there are ^L (ASCII form-feed) characters scattered throughout the code. On my screen, I'm looking at one little "page" of code at a time. I try to fit every little function or group of functions onto one such page, and so vertical space is at a premium. I find it extremely readable that way... But OTOH, the code looks awful to me on github or in other editors, when it's all presented as just one long file... I'll try and think about what to do here...

trait TBase
Definitely needs better docs. If it's not clear, the intended connotation is "database" rather than "base class" or anything like that. I'm pretty sure "bdd base" is the industry standard name for this general class of software in bdd.rs (It's what Donald Knuth calls them, anyway - he has a whole volume of "the art of computer programming" on bdds)... The thing in base.rs probably ought to be called an "AST base" (for abstract syntax tree...) (or perhaps "bex base" for boolean expression?)

In any case, the interface exposed in base.rs is very similar to the one in bdd.rs ... I really think these should be refactored into just one "base" trait, so that int.rs or the interactive shell (... which you haven't seen, but I just copied over to examples/bex-shell.rs) can work with bases of ASTs or BDDs or any of various other implementations. There are a bunch of interesting representations with different properties that work well for different tasks... ZDDs, algebraic normal form, maybe CNF (a normal form used in sat solvers)...

As for nid()... Is it really a getter? A nid isn't exactly a property of a Base. I don't have a strong opinion about the name, though.

struct Base
All of that sounds good to me. But keep in mind the intent is to unify the two "base" interfaces.

base::Op
So... these all represent standard boolean operators. O and I represent the constant functions 0 (false) and 1 (true). Var(x) just refers to input bit x.

The two-argument ones are hopefully self-explanatory. They're the bitwise operators you most often find in programming languages... I kind of want to have one type that uses this small subset of the 16 two-argument boolean operators, and another type that offers all of them as an optimization... So that Not(Xor(x,y)) becomes Eq(x,y), etc.

Likewise, there are 256 three-argument boolean functions, but only a couple I care about:

Ch(x,y,z) probably ought to be called Ite for consistency with bdd.rs. It's an if/then/else construct. (The name "ch" is used for this in cryptographic functions. It's short for "choose").

Mj(x,y,z) is the majority function. If you're adding two binary numbers, you're dealing with 3 bits in each position: one bit from each input, plus one carry bit. Then you have two output bits: the result bit is (x xor y xor z) (which could easily have its own name - Par(x,y,z) for parity) and the carry bit is Mj(x,y,z)..

Anyway, sorry for the wall of text. Thanks for the prod to add those examples.

I'd love to hear a little more about your project and what you're using all this for. :)

@flo-l
Copy link
Author

flo-l commented Jan 5, 2019

Thanks for this detailed answer!

I wrote my own version of a boolean ast for my project. I only needed to construct the ast and then convert it to cnf. I looked for a library that could do that, but couldn't find one. Because you asked: I tried to find sha256 collisions with a SAT solver ;)

It's not clear to me what an idiomatic API for boolean expressions would look like. A small API surface would be good. That is: A trait for the Base + at least on implementor of that trait and a trait for a boolean expression + at least one implementor. Base would have methods to create fresh Literals etc. and boolean expressions should be composable to more boolean expressions.

It remains an open question if the composition part would need a reference to the Base or not. The latter would be more user friendly, while the former allows more representations and optimizations...

Also, as you said, there are infinite operators, like maj etc. Enums are inherently inextensible. Maybe it makes more sense to use a trait and trait objects for operators. That would be quite similar to a class hierarchy in cpp, with virtual fn calls.

@tangentstorm
Copy link
Owner

I think at the very least, the interface would have all boolean operators that rust the language provides, and perhaps even all boolean operators of two arguments, as well as a handful of the common 3-argument ones.

But these could all have a default implementation in terms of and/xor/not or if-then-else.

The way I pictured it, each sort of base would have an associated NID type... These could be just integers in the simplest case (like base.rs) or more complicated structures. For bdd.rs, it's conceptually a struct with various meta-information about the node so that for many operations, the node itself never has to be looked up from memory. (I actually used a struct at one point, but it turned out to be faster to use a bit-packed u64).

Regarding the composition part needing the base: if you look in int.rs, you'll see there's a compound struct called a BaseBit -- a (BaseRef, NID) pair. This allows you to implement all the standard rust operators on the values. So you can use that fancy overloaded syntax if you want, or you can just call base methods on individual NIDs.

The next higher up abstraction is an array of these bits that act like integers. It's funny you mentioned SHA256, because that was also my original use case: I implemented sha256 for u32, and then once I had that working and tested, passed in my x32 type instead.

Maybe it makes more sense to use a trait and trait objects for operators. That would be quite similar to a class hierarchy in cpp, with virtual fn calls.

I'm not sure I understand this. Can you elaborate?

@flo-l
Copy link
Author

flo-l commented Jan 8, 2019

I think at the very least, the interface would have all boolean operators that rust the language provides, and perhaps even all boolean operators of two arguments, as well as a handful of the common 3-argument ones.

But these could all have a default implementation in terms of and/xor/not or if-then-else.

+1

NID as associated type on base makes sense. I also did it that way.

Having Bit and BaseBit is elegant, sounds great! For the first prototype BaseBit can be left out.

The next higher up abstraction is an array of these bits that act like integers. It's funny you mentioned SHA256, because that was also my original use case: I implemented sha256 for u32, and then once I had that working and tested, passed in my x32 type instead.

Hah, I did the same. However, I'd envision a boolean only lib at first. Bitvectors could live in their own crate.

Regarding trait objects: If you define an enum with a variant for each operator, users can't add new operators, because they can't add enum variants. The solution is to use a trait instead. Users can create their own types for custom operators and implement the trait for them. There is a drawback though: The nodes would need to store custom operators as trait objects. It can be made quite efficient by having an enum variant for custom operators (as trait object). A trait object is a pointer to a struct or enum and a pointer to a virtual function table. It hides the memory layout of an implementor of a trait, allowing users to call methods of the trait anyway. Virtual dispatch is used, which makes performance worse compared to direct method calls.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants