Skip to content

Conversation

thejtshow
Copy link
Contributor

@thejtshow thejtshow commented Apr 10, 2025

Hello! This is an awesome project and I'm eager to share some work I think would greatly benefit the flatbuffers community.

Description

This PR is my attempt to create a full-featured Wireshark dissector generator (see #153 and #8333).

This implementation is based on bfbs_gen_lua.h|cpp and relies only on reflection to retrieve all the necessary data during code gen (no runtime reflection used by the generated code). One thing I attempted to do is put all of the actual lua logic into supporting files in the wireshark/ directory - it was much easier to iterate on the code in this way. Almost all generated code boils down to "call this generic function with these specific parameters".

The reason why I built this wholly separate from the existing lua generator, is because the existing generator hides a lot of the metadata behind hard-coded locals, and I think the output of this PR could be used as a great hands-on teaching tool for flatbuffers internals for those of us curious (or masochistic) enough to dive into the details.

File naming convention is a little weird for generated/supporting files as Wireshark itself has some odd file loading quirks (see wireshark/README.md

Output currently has two modes, regular and verbose. Regular is intended to just shows you the data you want to see:

image

Verbose is intended to tell you what every single byte (except for padding) in a buffer is used for:

image

*(both of these screenshots are of auto generated data full of gibberish - no, you aren't supposed to be seeing "real" strings)

This PR is still a WIP and I intend to edit this description down to just the relevant information once I handle the last few features/issues. I wanted to get this up now to collect feedback and get some questions answered.

Still TODO:

  • Generate standalone dissector objects for root types (so you can point Wireshark at your proto and dissect without writing any lua, if your packet is pure flatbuffer data)
  • Support multiple root_type objects in the plugins directory
    • Support two different root types including the same shared type
  • Support vector of union
    • figured out my bfbs parsing error
    • need to figure out how to call nested functions appropriately - may need a new generic handler function
  • Support displaying non empty default strings and default vectors
  • Support proper display of bit_flags
    • bfbs-builtins now required
    • need to generate a fresh set of ProtoField objects per bit_flags object
    • need to decide if/how we can/should support multi-bit flags (MSB or LSB, width attribute)
  • support automatically parsing nested_flatbuffer
    • bfbs-builtins now required
    • just need to parse the attributes and add them to the dissection
  • Verify/fix windows paths
  • Support Wireshark 3 - requires require statements
  • add new attribute display which will take hex, binary, octal, decimal and configure Wireshark's output format for that scalar type
  • Add a healthy dose of comments
  • Squash everything into a single commit
  • Handle (relevant) attributes (if possible?)
  • Testing? Unsure what automated testing would look like for this feature. I have worked to manually verify every feature I could. I'm sure I've missed plenty.
  • Documentation

Open Questions

  1. Is there a canonical way to handle paths between operating systems in flatbuffers? I am trying to use full paths as namespacing for generated files I need to figure out how to properly handle complex includes. I might have missed something simple here
    1. there seem to be some helper functions in util.h but unsure if those are fair game in a bfbs generator (there was a comment in lua about not using any flatc internals)
  2. Am I reinventing the wheel anywhere? Am I doing anything bone headed that is super trivial to do with tools I missed?
  3. What would be required documentation to add to get this feature merged?
  4. Does anyone have any better ideas for file naming conventions? I went with a dumb, straight forward approach, but it doesn't exactly jibe with the rest of the repository.

Possible Future Work

  1. built-in gRPC support. This might be as simple as either doing nothing or adding some simple extra code to the generated root types, but I'm not familiar enough with gRPC to integrate this right now. I think nothing stops one from writing a custom lua dissector that calls both the gRPC built in dissector then one of these generated ones.
  2. flexbuffer support. No clue what this would look like, as I have no experience with flexbuffers, and would also likely need some attribute visibility modifications to be able to act on them.
  3. The current architecture does not support filtering vectors/arrays of scalars in Wireshark's filter bar as these do not have their own specific protocol field types. (i.e. you can't say "show me packets which have a value of 5 in this array"). This may be a minor update I can tackle in the initial release.

Copy link

google-cla bot commented Apr 10, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@github-actions github-actions bot added c++ CI Continuous Integration codegen Involving generating code from schema documentation Documentation lua labels Apr 10, 2025
@thejtshow
Copy link
Contributor Author

Found a small bug when the payload doesn't start at the beginning of the buffer, and another small issue with parse_struct. Additionally, loading is a little weird on Wireshark < 4.0. Will address these later today.

@thejtshow
Copy link
Contributor Author

Currently tracking a limitation where you can only have one dissector present in the plugins directory at a time. Wireshark doesn't like you sharing common ProtoFields apparently.

@thejtshow
Copy link
Contributor Author

hey @dbaileychess or @aardappel -- would either of you be willing to provide some insight on the open questions on my PR? :) I would really appreciate the feedback as this addition has been very useful and I would like to continue improving it.

@aardappel
Copy link
Collaborator

I have no experience with Wireshark, but this seems cool functionality to have.

The reason why I built this wholly separate from the existing lua generator, is because the existing generator hides a lot of the metadata behind hard-coded locals, and I think the output of this PR could be used as a great hands-on teaching tool for flatbuffers internals for those of us curious (or masochistic) enough to dive into the details.

I think that is a high cost decision, both in terms of having a lot of duplicated/unshared code with the main Lua implementation, and also that using reflection (esp in a language like Lua) instead of hardcoded accessors is going to make things reaaally slow.

That to me would seem to outweight the benefits.

We generally are wary of adding a lot of new code, especially duplicated code, to the project because at some point someone who is not the original contributor will end up paying for the maintenance cost. Now Lua changes likely have to be made in 2 places. For example, at some point we ditched our JavaScript generator in favor of compiling thru the Typescript generator because of maintenance cost.

As such, all of this would be a lot more palatable if it was instead an option to the Lua generator that keeps as much shared as possible.

Is there a canonical way to handle paths between operating systems in flatbuffers? I am trying to use full paths as namespacing for generated files. I might have missed something simple here

I think there are some functions in util.cpp/.h that we use for this purpose thruout the codebase.

Am I reinventing the wheel anywhere? Am I doing anything bone headed that is super trivial to do with tools I missed?

Again not familiar with Wireshark and how this would be used, but see the above.

What would be required documentation to add to get this feature merged?

A similar single .md files to other implementations would be sufficient. Better yet, a section in the Lua docs as per the above.

Does anyone have any better ideas for file naming conventions? I went with a dumb, straight forward approach, but it doesn't exactly jibe with the rest of the repository.

Not sure :)

@thejtshow
Copy link
Contributor Author

@aardappel thank you for the thorough response!

I think that is a high cost decision, both in terms of having a lot of duplicated/unshared code with the main Lua implementation, and also that using reflection (esp in a language like Lua) instead of hardcoded accessors is going to make things reaaally slow.

I certainly agree that this feature adds a lot of cost -- before I continue I should clarify -- when I mentioned relies only on reflection to retrieve all the necessary data I was referring to the generator itself -- I implemented my generator as a bfbs gen instead of an idl gen. The generated code is much like the original Lua implementation where offsets are hard coded and the code is quite performant. I do not perform any runtime reflection for parsing buffers. I will update the PR description to be more clear on this point.

I would love to use or extend the current lua generator to accomplish this task.. unfortunately I think that wireshark's.... "creative" lua integrations make this not feasible (especially for the level of information I want to expose to wireshark).

I realized (belatedly) that this PR implements something much closer to the flatc --annotate function than actual code gen for creation and parsing, if that helps thinking about this PR. :)

@aardappel
Copy link
Collaborator

Well, it would be feasible to integrate the two, the question is if that would be better or worse. If the integrated version is just a ton of if-then without much shared code, then indeed a separate implementation like your current one is better.

That is hard for me to judge though.. I'd like some other opinions on this PR. Would be great to hear from @dbaileychess and others with Lua or codegen experience.

@thejtshow thejtshow force-pushed the feature/wireshark branch 2 times, most recently from e6e121a to 443b593 Compare August 14, 2025 01:40
…y working copy of 00_wireshark.lua back into the repo folder
1. if offset is nonzero, the parse function failed
2. struct and table name lookup now done by member_list
fix root type arguments
fix union dependency collection
start on fixing bit flags now that bfbs-builtins are required.
@thejtshow thejtshow marked this pull request as draft August 29, 2025 12:51
@thejtshow thejtshow changed the title [WIP] Feature: Wireshark Dissector Generator Feature: Wireshark Dissector Generator Aug 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ CI Continuous Integration codegen Involving generating code from schema documentation Documentation lua
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants