
Add a trait/derive macro for transforming Rust type definitions into Binja Type objects #5256

Open. mkrasnitski wants to merge 16 commits into dev.

@mkrasnitski (Contributor) commented Apr 10, 2024

Supersedes #4545.

This time around I was able to see this work through to a good enough state to not publish as a draft. Obviously, I would very much like feedback on this, as introducing a proc-macro into the codebase adds a clear extra maintenance burden that normal Rust code doesn't come with.

Overview

This PR adds the following trait:

pub trait AbstractType: Sized {
    #[doc(hidden)]
    const LAYOUT: std::alloc::Layout = std::alloc::Layout::new::<Self>();

    fn resolve_type() -> Ref<Type>;
}

Implementing this trait allows creating a Type object corresponding to the layout and representation of whatever type implements it. For example, take the following type:

#[derive(AbstractType)]
#[repr(C)]
struct S {
    first: u64,
    second: u8,
    third: i32
}

Calling S::resolve_type() produces the following type (shown after defining it in a view):

[Screenshot: the resulting type for S, as defined in a view]
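To make the expected layout concrete, here is a plain-Rust check (no derive macro, no Binja API) of the offsets that a #[repr(C)] struct like S receives. These offsets and this size are presumably what the generated Type needs to mirror. The sketch assumes Rust 1.77 or newer for std::mem::offset_of!.

```rust
use std::mem::{align_of, offset_of, size_of};

// The struct from the example above; #[repr(C)] keeps declaration order
// and C-style padding, so the offsets below are predictable.
#[allow(dead_code)]
#[repr(C)]
struct S {
    first: u64,
    second: u8,
    third: i32,
}

fn main() {
    // `first` fills bytes 0..8 and `second` sits at byte 8; three padding
    // bytes follow so that the 4-byte-aligned `third` starts at offset 12.
    println!("first  @ {}", offset_of!(S, first));
    println!("second @ {}", offset_of!(S, second));
    println!("third  @ {}", offset_of!(S, third));
    // The fields end at byte 16, which is already a multiple of the
    // struct's alignment (that of u64), so no tail padding is needed.
    println!("size = {}, align = {}", size_of::<S>(), align_of::<S>());
}
```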

Derive macro

The accompanying derive macro is very convenient for automatically generating implementations of the AbstractType trait, as long as the deriving type meets the following requirements:

  • For structs/unions, the type must be #[repr(C)], and must have named fields.
  • For enums, the type must not be #[repr(C)] but instead #[repr(<int>)] (e.g. u8, u64). Also, variants must not contain any data, and must have an explicit discriminant. Note that usize and isize are not supported as enum reprs.
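As a plain-Rust illustration of what these enum rules guarantee (the enum name Color is hypothetical): with #[repr(u8)] the enum is stored in exactly one byte, and because the variants are fieldless with explicit discriminants, each value can be read back with an as cast. That fixed width and fixed set of values is, presumably, what the macro maps onto a Binja enumeration; the macro's own output is not shown here.

```rust
use std::mem::size_of;

// A fieldless enum with an integer repr and explicit discriminants:
// the shape the derive macro requires for enums.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(u8)]
enum Color {
    Red = 1,
    Green = 2,
    Blue = 4,
}

fn main() {
    // The repr fixes the enum's storage to exactly one u8.
    println!("size = {}", size_of::<Color>());
    // Explicit discriminants can be read back with an `as` cast.
    for c in [Color::Red, Color::Green, Color::Blue] {
        println!("{:?} = {}", c, c as u8);
    }
}
```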

Alignment

The derive macro supports #[repr(packed)] and #[repr(align)] on types and will propagate that information into the generated Binja Type (e.g. marking a struct #[repr(packed)] will result in a __packed type in Binja).
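The effect of these attributes on layout can be checked in plain Rust. The sketch below (struct names are illustrative) applies packed, align(16) and align(2) to the fields of S from the overview; the sizes and alignments it prints are what the corresponding Binja type would be expected to reflect. Note that align(N) only ever raises alignment.

```rust
use std::mem::{align_of, size_of};

// The same three fields under different representation attributes.
#[allow(dead_code)]
#[repr(C)]
struct Plain { first: u64, second: u8, third: i32 }

// No padding at all: alignment drops to 1 and size to 8 + 1 + 4 = 13.
#[allow(dead_code)]
#[repr(C, packed)]
struct Packed { first: u64, second: u8, third: i32 }

// align(16) raises the minimum alignment; the size rounds up to match.
#[allow(dead_code)]
#[repr(C, align(16))]
struct Over { first: u64, second: u8, third: i32 }

// align(N) never lowers alignment: align(2) leaves this identical to Plain.
#[allow(dead_code)]
#[repr(C, align(2))]
struct Under { first: u64, second: u8, third: i32 }

fn main() {
    println!("Plain:  size {:2}, align {:2}", size_of::<Plain>(), align_of::<Plain>());
    println!("Packed: size {:2}, align {:2}", size_of::<Packed>(), align_of::<Packed>());
    println!("Over:   size {:2}, align {:2}", size_of::<Over>(), align_of::<Over>());
    println!("Under:  size {:2}, align {:2}", size_of::<Under>(), align_of::<Under>());
}
```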

Named types

Named types are supported by decorating a field with a #[binja(named)] attribute. As an example:

#[derive(AbstractType)]
#[repr(u64)]
enum E {
    One = 1,
    Two = 2,
}

#[derive(AbstractType)]
#[repr(C)]
struct S {
    first: u64,
    #[binja(named)]
    second: E,
}

The result of S::resolve_type() is the following type:

[Screenshot: S with second typed as a reference to the named type E]

Note that defining S in a view does not require that E already be defined. As long as E is eventually defined, clicking on the type of second will redirect to the correct type.
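The lazy lookup described above can be pictured with a toy model. Everything here (Ty, View, resolve) is hypothetical and far simpler than Binja's real type system; it only shows why a named reference can be stored before its target exists: the field records just a name, and the lookup happens when the type is used.

```rust
use std::collections::HashMap;

// Hypothetical, heavily simplified type model; this is NOT the
// binaryninja crate's API, only an illustration of the idea.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int { width: usize, signed: bool },
    Enum { width: usize, members: Vec<(String, u64)> },
    // A reference by name: nothing about the target is stored here.
    Named(String),
    Struct(Vec<(String, Ty)>),
}

// Stand-in for the set of types defined in a view.
struct View {
    types: HashMap<String, Ty>,
}

impl View {
    // Follow a named reference; None while the target is still undefined.
    fn resolve<'a>(&'a self, ty: &'a Ty) -> Option<&'a Ty> {
        match ty {
            Ty::Named(name) => self.types.get(name),
            other => Some(other),
        }
    }
}

fn main() {
    let second = Ty::Named("E".to_string());
    let s = Ty::Struct(vec![
        ("first".to_string(), Ty::Int { width: 8, signed: false }),
        ("second".to_string(), second.clone()),
    ]);
    let mut view = View { types: HashMap::new() };

    // Defining S before E is fine: `second` only records the name "E".
    view.types.insert("S".to_string(), s);
    assert!(view.resolve(&second).is_none());

    // Once E is defined, the same reference resolves to it.
    let e = Ty::Enum { width: 8, members: vec![("One".to_string(), 1), ("Two".to_string(), 2)] };
    view.types.insert("E".to_string(), e.clone());
    assert_eq!(view.resolve(&second), Some(&e));
    println!("second resolves to {:?}", view.resolve(&second));
}
```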

Pointers

If a struct contains pointer fields (of type *const T or *mut T), it must be decorated with #[binja(pointer_width = <int>)]. Since pointers can change width depending on architecture, users must specify the width ahead of time. As an example, if we modify S from above to look like:

#[derive(AbstractType)]
#[repr(C)]
#[binja(pointer_width = 8)]
struct S {
    first: u64,
    #[binja(named)]
    second: *const E,
}

Then we will get the following:

[Screenshot: S with second typed as a pointer to the named type E]
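Why the width has to be supplied up front can be seen by computing a C layout for two pointer widths. The helper c_layout and the struct it models are hypothetical, and the sketch assumes a pointer is aligned to its own width (typical, but not universal). The S above happens to be 16 bytes either way because its u64 field already forces 8-byte alignment, so the sketch uses a pointer followed by a u32, where the difference shows.

```rust
// Hypothetical helper: computes repr(C) field offsets, total size and
// alignment from (size, align) pairs, the way a C compiler lays out a struct.
fn c_layout(fields: &[(usize, usize)]) -> (Vec<usize>, usize, usize) {
    let mut offsets = Vec::with_capacity(fields.len());
    let mut offset = 0;
    let mut max_align = 1;
    for &(size, align) in fields {
        // Round the running offset up to the field's alignment.
        offset = (offset + align - 1) / align * align;
        offsets.push(offset);
        offset += size;
        max_align = max_align.max(align);
    }
    // Pad the tail so the size is a multiple of the struct's alignment.
    let size = (offset + max_align - 1) / max_align * max_align;
    (offsets, size, max_align)
}

// Layout of a hypothetical `struct { ptr: *const E, len: u32 }` when a
// pointer is `width` bytes wide and aligned to its width.
fn layout_for_pointer_width(width: usize) -> (Vec<usize>, usize, usize) {
    c_layout(&[(width, width), (4, 4)])
}

fn main() {
    println!("{:?}", layout_for_pointer_width(8)); // ([0, 8], 16, 8)
    println!("{:?}", layout_for_pointer_width(4)); // ([0, 4], 8, 4)
}
```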

@plafosse plafosse requested review from CouleeApps, ElykDeer and rssor and removed request for CouleeApps April 17, 2024 17:29
@ElykDeer (Member) left a comment


Thank you for the PR! Sorry for the delay in getting back to you. This looks cleaner than I ever imagined it would be, and I'd love to get this merged.

The following need to occur before this can be merged:

  • @psifertex : Please let us know what we need to do about licensing. This PR adds dependencies for API consumers who explicitly turn on an additional feature (in their Cargo.toml), and does not add any dependencies to the build (unless/until we start using this feature internally). One concern I have at a cursory glance is that Elain only specifies that it's MIT/Apache in its Cargo.toml and doesn't otherwise include a copy of the license in the source code/repo.
  • @mkrasnitski : I'm not going to be comfortable merging this until we have tests to ensure it doesn't go stale... But given that it generates API code that'll get checked by the compiler every build, I'm uniquely unconcerned about runtime bugs! I think adding something to the examples that has as much code coverage in the macro as possible will suffice.
  • @mkrasnitski : This has to be one of the best documented PRs we've ever gotten. Please copy-pasta that into some doc strings too. Feel free to add to the developer guide, even, if you think this necessitates that.
  • @mkrasnitski : Please add the #[name = "..."] thing.
  • @mkrasnitski : Similarly, I like the idea of the struct decorator for pointer widths rather than specifying on each field.

Other feedback/questions:

  • I'm not sure I understand how #[repr(align)] works. Could you please provide an example?
  • I'm fine with #[repr(C)] and #[repr(<int>)] (I think we support u128 as a type in the core btw, and I think this repr solution is actually really nice for the enums), but is there a way to omit #[named]? Is it possible to fall back to being named if the type is otherwise unrecognized?

...this PR also actually has me thinking about a #[tokio::main] sorta thing that'll do the script initialization/shutdown for you.

@ElykDeer ElykDeer self-assigned this May 10, 2024
@ElykDeer ElykDeer added Type: Enhancement Issue is a small enhancement to existing functionality Component: Rust API Issue needs changes to the Rust API labels May 10, 2024
@mkrasnitski (Contributor Author)

> I'm not going to be comfortable merging this until we have tests to ensure it doesn't go stale...

For sure, I'll look into adding some tests. I probably should have included them from the start, since I was already using a standalone Rust plugin to test how the generated types would look in the UI. I see that Type implements PartialEq; could I use that to test with assert_eq? I don't know how type equality is done in Binja or what exactly is compared.

> Please add the #[name = "..."] thing.

Yeah, this is a good idea in principle. I'm thinking there should be a single attribute here instead of two (named and name), so that the mental model becomes: if a string literal value isn't provided, then the name of the Rust type is used as the default. Bikeshedding on an attribute name is definitely welcome here, as well as for the #[pointer_width] attribute (that's not a very good name imo).

> I'm not sure I understand how #[repr(align)] works. Could you please provide an example?

Decorating a type declaration with #[repr(align(N))] raises that type's minimum alignment to at least N. For example, say a struct's normal alignment is 4 bytes. Decorating the struct definition with #[repr(align(8))] will change it to be 8-byte aligned, whereas decorating it with #[repr(align(2))] will keep the alignment at 4. I also use this internally in my macro for mocking pointer types:

// Assuming we know T
const ALIGN: usize = std::mem::align_of::<T>();
const SIZE: usize = std::mem::size_of::<T>();

#[repr(C)]
#[repr(align(ALIGN))] // invalid syntax - we use `elain` to accomplish this
struct Mock {
    t: [u8; SIZE],
}

We've basically created a type with arbitrary size and alignment, since we can change the values of the SIZE and ALIGN constants. And, since align_of::<[u8; N]>() == align_of::<u8>() == 1, we know that align_of::<Mock>() == ALIGN, because ALIGN is always at least 1.
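For a concrete, compilable version of the mock above, the constants can be replaced by literals, since repr(align) only accepts an integer literal. This sketch picks SIZE = ALIGN = 4 (a 4-byte target pointer) and swaps the mock into the S from the pointers example. The names Mock and SMock are illustrative; the PR itself uses elain instead of hand-written literals.

```rust
use std::mem::{align_of, offset_of, size_of};

// Mock written out by hand for SIZE = 4, ALIGN = 4 (a 4-byte pointer).
#[allow(dead_code)]
#[repr(C, align(4))]
struct Mock {
    t: [u8; 4],
}

// A copy of S from the pointers example with `*const E` swapped for Mock.
#[allow(dead_code)]
#[repr(C)]
struct SMock {
    first: u64,
    second: Mock,
}

fn main() {
    // The byte array alone has alignment 1; align(4) raises it to 4.
    println!("align_of [u8; 4] = {}", align_of::<[u8; 4]>());
    println!("Mock: size {}, align {}", size_of::<Mock>(), align_of::<Mock>());
    // The compiler now lays out S with a 4-byte "pointer", regardless of
    // the host's actual pointer width (other fields keep host alignment).
    println!("second @ {}", offset_of!(SMock, second));
    println!("size = {}", size_of::<SMock>());
}
```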

> Is there a way to omit #[named]? Is it possible to fall back on to being named if the type is otherwise unrecognized?

There might be some cases where users might want to keep the type definition inline rather than reference a named type. I think for that reason, always falling back to the Rust name as a default might be too opinionated.

@ElykDeer (Member)

Thanks for the additional feedback. I already mentioned this in Slack, but don't worry about assert_eqing types against each other; so long as the procedural macro is able to generate valid API code that gets checked by cargo when we build, I'll be happy with it.

> Bikeshedding on an attribute name is definitely welcome here, as well as for the #[pointer_width] attribute (that's not a very good name imo).

Personally I'm fine with name, named, and pointer_width. If you wanna merge name and named, I'm more partial to named since it has to do with named_types.

> There might be some cases where users might want to keep the type definition inline rather than reference a named type. I think for that reason, always falling back to the Rust name as a default might be too opinionated.

What about the opposite, then? Add an #[inline]/#[inline_type] attribute?

@mkrasnitski (Contributor Author) commented May 15, 2024

> What about the opposite then? Add a #[inline]/#[inline_type]

IMO this feels slightly awkward, since then the default behavior with no attributes would be to create a named type using the Rust type's name, and then providing #[name = "..."] would add a custom name whereas #[inline_type] would remove the name. So basically we have two switches controlling a single property.

> Personally I'm fine with name, named, and pointer_width. If you wanna merge name and named, I'm more partial to named since it has to do with named_types.

For some reason, I think if we wrap all three in a #[binja(...)] attribute it might actually look fine. So, it would be #[binja(named)], #[binja(name = "...")], and #[binja(pointer_width = "u64")]. Then, it's kind of like those attributes belong to the binja namespace, whereas when they nakedly decorate a field they feel a bit out of place. Is binja alright for this or would binaryninja be preferred? I don't know how hard y'all stick to not using the Binja nickname in user-facing code/docs.

Btw, should the value of pointer_width be changed to an int literal instead of a string literal? That way the "width" can be interpreted more literally, and it seems that Binja supports pointers of arbitrary width.

@mkrasnitski (Contributor Author)

Alright, I think all the behavioral changes are taken care of. To briefly demonstrate: a type that derives AbstractType might now look like this:

#[derive(AbstractType)]
#[repr(u64)]
enum E {
    One = 1,
    Two = 2,
}

#[derive(AbstractType)]
#[repr(C)]
#[binja(pointer_width = 8)]
struct S {
    first: u64,
    #[binja(named)]
    second: *const E,
}

What remains to be done is adding documentation and tests. Ideally I would like #5435 to be resolved first so I can rebase on top of it; however, that might take some time, so in the meantime I'll go with the original suggestion of adding to examples/, since those aren't run in CI.

We accomplish this by generating alternative struct layouts with swapped-out field types, and recursively calculating size and alignment using associated constants on the `AbstractType` trait.
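The idea in the commit note above can be sketched in a few lines under heavy simplification: a trait whose associated constant defaults to the Rust type's own Layout, an override for a stand-in pointer of target width, and a const fn applying repr(C) rules so that a struct's layout is computed from its fields' constants. LayoutOf, TargetPtr, c_struct_layout and S32 are hypothetical names, not the PR's code.

```rust
use std::alloc::Layout;

// Hypothetical, cut-down trait: just the layout constant, no Binja types.
trait LayoutOf: Sized {
    // By default, use the layout of the Rust type itself.
    const LAYOUT: Layout = Layout::new::<Self>();
}

impl LayoutOf for u8 {}
impl LayoutOf for u64 {}

// Stand-in for a pointer field on a target with WIDTH-byte pointers.
#[allow(dead_code)]
struct TargetPtr<const WIDTH: usize>;

impl<const WIDTH: usize> LayoutOf for TargetPtr<WIDTH> {
    // Override: report the target's pointer layout, not the host's.
    const LAYOUT: Layout = match Layout::from_size_align(WIDTH, WIDTH) {
        Ok(layout) => layout,
        Err(_) => panic!("pointer width must be a power of two"),
    };
}

// repr(C) layout rules, evaluated at compile time from field layouts.
const fn c_struct_layout(fields: &[Layout]) -> Layout {
    let (mut offset, mut align, mut i) = (0, 1, 0);
    while i < fields.len() {
        let a = fields[i].align();
        offset = (offset + a - 1) / a * a; // pad up to the field's alignment
        offset += fields[i].size();
        if a > align {
            align = a;
        }
        i += 1;
    }
    let size = (offset + align - 1) / align * align; // tail padding
    match Layout::from_size_align(size, align) {
        Ok(layout) => layout,
        Err(_) => panic!("invalid layout"),
    }
}

// `struct S { first: u64, second: *const E }` with pointer_width = 4,
// computed from the fields' own LAYOUT constants.
#[allow(dead_code)]
struct S32;

impl LayoutOf for S32 {
    const LAYOUT: Layout = c_struct_layout(&[
        <u64 as LayoutOf>::LAYOUT,
        <TargetPtr<4> as LayoutOf>::LAYOUT,
    ]);
}

fn main() {
    let l = <S32 as LayoutOf>::LAYOUT;
    println!("size = {}, align = {}", l.size(), l.align());
}
```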

The `#[binja(...)]` attribute supports three properties:
 - Pointer width is now specified once for the whole struct/union, using a
   `#[binja(pointer_width = <int>)]` property, instead of on each field.
 - Named fields are specified either with `#[binja(name = "...")]`, or
   with `#[binja(named)]`, which uses the name of the Rust type as a
   default value.
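The naming rule above (an explicit string wins, otherwise the Rust type's name is used) can be mimicked with a tiny declarative macro. The macro binja_type_name is hypothetical and only operates on a bare identifier; whether the real derive uses the bare identifier or a full path is an assumption not settled here.

```rust
// Hypothetical macro illustrating the rule: an explicit name wins,
// otherwise the identifier of the Rust type is used as the default.
macro_rules! binja_type_name {
    ($ty:ident) => {
        stringify!($ty)
    };
    ($ty:ident, $name:literal) => {
        $name
    };
}

fn main() {
    // Analogue of #[binja(named)] on a field of type E.
    println!("{}", binja_type_name!(E));
    // Analogue of #[binja(name = "MyEnum")] on the same field.
    println!("{}", binja_type_name!(E, "MyEnum"));
}
```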
@mkrasnitski (Contributor Author) commented Jun 11, 2024

@KyleMiles This PR should be ready for re-review now. In total, the new changes since you last reviewed are in the commit range 98fde41 through 9a23a8e.

In summary:

  • Added support for 128-bit reprs on enums
  • Switched field attributes to use the #[binja(...)] format instead of raw paths
  • Added documentation/comments to macro implementation
  • Added a dev guide for using the macro
  • Added tests in the form of an abstract-type example

The only thing missing at this point is support for fields of type void *. However, I'm not sure how to support it, given that raw void is not a valid type for a field. Additionally, the documentation for std::ffi::c_void explains that void corresponds to (), but void * corresponds to *{const,mut} c_void rather than *{const,mut} ().
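On the Rust side, that mapping means a C struct with a void * member is written with *mut c_void (or *const c_void), never *mut (). The sketch below (Node is a hypothetical name) is independent of the macro and only confirms that such a field is an ordinary, pointer-sized member of a #[repr(C)] struct.

```rust
use std::ffi::c_void;
use std::mem::{offset_of, size_of};

// A C `struct { uint64_t id; void *data; }` expressed in Rust: the `void *`
// field becomes `*mut c_void`.
#[allow(dead_code)]
#[repr(C)]
struct Node {
    id: u64,
    data: *mut c_void,
}

fn main() {
    // A pointer to c_void is an ordinary thin pointer, usize-sized on the host.
    println!("pointer size = {}", size_of::<*mut c_void>());
    println!("data @ {}", offset_of!(Node, data));
}
```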

`StructureBuilder::insert` doesn't affect the alignment of the type, so
we always need to specify alignment explicitly.
@emesare (Member) commented Jul 29, 2024

I have discussed this with @mkrasnitski, and the plan forward is to ship this as a separate crate. We need to actually version the binaryninja crate with an attached ABI version (e.g. 0.1.0-69) before this can be split off.

We should aim to have a myriad of helper crates instead of maintaining large, high-effort abstractions/helpers in this repository. This will allow greater flexibility and draw a clear line between where the API ends and the helpers begin, something that is not clear within our Python bindings.

We are going to keep this PR open until the above is at least partially completed.
