Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Packet builder #209

Draft
wants to merge 21 commits into
base: main
Choose a base branch
from
Draft

Packet builder #209

wants to merge 21 commits into from

Conversation

Y-Less
Copy link

@Y-Less Y-Less commented Mar 10, 2023

Another draft PR for comments, since it is a larger one. I've basically written a class to do all the maths and position tracking for building up packets with lots of compressed data in.

The current code has lots of instances of code to build packets like:

data[0] = EXTENDED_CONNECTION_ABORT_MULTIPLEXOR;
data[1] = static_cast<std::uint8_t>(reason);
data[2] = 0xFF;
data[3] = 0xFF;
data[4] = 0xFF;
data[5] = static_cast<std::uint8_t>(pgn & 0xFF);
data[6] = static_cast<std::uint8_t>((pgn >> 8) & 0xFF);
data[7] = static_cast<std::uint8_t>((pgn >> 16) & 0xFF);

Or deconstruct packets like:

auto &data = message->get_data();
parentInterface->languageCommandTimestamp_ms = SystemTiming::get_timestamp_ms();
parentInterface->languageCode.clear();
parentInterface->languageCode.push_back(static_cast<char>(data.at(0)));
parentInterface->languageCode.push_back(static_cast<char>(data.at(1)));
parentInterface->timeFormat = static_cast<TimeFormats>((data.at(2) >> 4) & 0x03);
parentInterface->decimalSymbol = static_cast<DecimalSymbols>((data.at(2) >> 6) & 0x03);
parentInterface->dateFormat = static_cast<DateFormats>(data.at(3));
parentInterface->massUnitSystem = static_cast<MassUnits>(data.at(4) & 0x03);
parentInterface->volumeUnitSystem = static_cast<VolumeUnits>((data.at(4) >> 2) & 0x03);
parentInterface->areaUnitSystem = static_cast<AreaUnits>((data.at(4) >> 4) & 0x03);
parentInterface->distanceUnitSystem = static_cast<DistanceUnits>((data.at(4) >> 6) & 0x03);
parentInterface->genericUnitSystem = static_cast<UnitSystem>(data.at(5) & 0x03);
parentInterface->forceUnitSystem = static_cast<ForceUnits>((data.at(5) >> 2) & 0x03);
parentInterface->pressureUnitSystem = static_cast<PressureUnits>((data.at(5) >> 4) & 0x03);
parentInterface->temperatureUnitSystem = static_cast<TemperatureUnits>((data.at(5) >> 6) & 0x03);

This is (IMHO) very complex to read and error-prone, though admittedly it is very explicit in what is happening and where things are being stored.

Anyway, those two examples rewritten with this PR would look something like:

isobus::ParameterGroupBuilder builder {};
builder.write<std::uint8_t>(EXTENDED_CONNECTION_ABORT_MULTIPLEXOR);
builder.write(static_cast<std::uint8_t>(reason));
builder.pad(24);
builder.write(pgn, 24);
isobus::ParameterGroupBuilder builder(message->get_data());
parentInterface->languageCommandTimestamp_ms = SystemTiming::get_timestamp_ms();
builder.read((char *)&parentInterface->languageCode, 16);
builder.skip(4);
builder.read(parentInterface->timeFormat, 2);
builder.read(parentInterface->decimalSymbol, 2);
builder.read(parentInterface->dateFormat);
builder.read(parentInterface->massUnitSystem, 2);
builder.read(parentInterface->volumeUnitSystem, 2);
builder.read(parentInterface->areaUnitSystem, 2);
builder.read(parentInterface->distanceUnitSystem, 2);
builder.read(parentInterface->genericUnitSystem, 2);
builder.read(parentInterface->forceUnitSystem, 2);
builder.read(parentInterface->pressureUnitSystem, 2);
builder.read(parentInterface->temperatureUnitSystem, 2);

Might just need a tiny bit of tweaking for types/widths on some of the variables, but otherwise that's complete.

Now I've already mixed up pad (for writing) and skip (for reading) in my own code, so it might make more sense to have the reading and writing capabilities split in to two different classes, but the core algorithms are there. Interesting, one thing I noticed is that the spec goes in to some detail about how odd-bit-width data spans byte boundaries, and this code correctly implements that, but then all the PGNs in the spec use padding bits to explicitly avoid those cases. And thus this code also checks for those common cases where data is byte-aligned or fully contained within a byte to better optimise for those cases.

@Y-Less Y-Less temporarily deployed to Integrate Pull Request March 10, 2023 09:35 — with GitHub Actions Inactive
@Y-Less
Copy link
Author

Y-Less commented Mar 10, 2023

I don't know where the best place for this is. I've not committed any tests (but did run some and am using it locally). This is mostly just the initial basic algorithm. And even within that I'm sure some parts could be compressed - there are a few bits of repeated code, especially in the sections handing large data spanning multiple bytes, but honestly I don't think rearranging the code is worth it to save a few lines at the expense of complexity.

@Y-Less
Copy link
Author

Y-Less commented Mar 10, 2023

Two big formatting things I've noticed that probably need changing - I've used #pragma once instead of include guards, and early returns, neither of which seem tot be used anywhere else in the codebase.

@ad3154
Copy link
Member

ad3154 commented Mar 12, 2023

I'll take a look at it and make some comments. Something like this has been requested before, so it could be helpful for people who do not want to do manual shifting and masking. The style is fairly off from the rest of the repo, so a general comment would be to try and match the rest of the repo's style if possible, I'll try and comment some examples.

To answer some of your questions up-front, for #pragma generally we lean on the Autosar rule:
image
Since it's compiler specific behavior and is not always even implemented, it should be avoided for a highly cross-platform library like ours.

As for multiple return paths, I like to lean on MISRA's rule:
image

This is a fairly controversial rule in C++ I will admit, as there's nothing "wrong" with early returns. Most of the benefit though is that code will always have a normal code flow top to bottom, there's only 1 place where you need to put a breakpoint to catch the return value of the function (makes debugging easier I would argue, especially on devices that have a maximum number of breakpoints (!!) like the stm32f4 which can only have something like 7 breakpoints), and lastly it helps ensure allocated resources get cleaned up, since you'd either need to copy the lines of code that do that for each return statement, or do it once at the bottom (though, a pure RAII approach mitigates this). The main downside to a single return path as I see it is that there is additional indenting in the code due to making those early returns into more highly nested conditionals, but I have rarely found that a compelling reason to lose the benefits while debugging. I don't think I'd prevent a merge due to an early return, but I don't prefer them, especially in a library marketed towards an industry that will skew towards that Autosar/Misra side of things.

Copy link
Member

@ad3154 ad3154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a first pass through it. Generally I think the library could use something that is similar to this, but it seems like a number of your code's assumptions have made their way into here, and I have concerns about it not working correctly depending on a machine's endianness. It also isn't quite generic enough to fit a library use-case, up to the ETP protocol size limit.

So, I guess where we go from there is kind of up to you. This kind of thing is an ideal candidate for heavy unit testing since it's easy to provide lots of stateless test cases, so if you want to keep working on it and improving it, and if you can write unit tests for it, I think we could revisit adding it in. Or, if you think it's meeting your needs in your software, you are of course welcome to keep using it in your fork.

It can be tough to convey emotion through a GitHub PR review, so I just wanted to say that I do appreciate all your PR submissions, including this one. I probably spent at least a good hour doing this review (it takes time to read the code to come up-to-speed on what it's doing), which I wouldn't do if I didn't think it was valuable.

@@ -0,0 +1,503 @@
#pragma once
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do add this file, it would need a header comment block like the rest of the files, and generally it would need doxygen style comments for all members, the file, and the class itself so that the doxygen generates with no warnings. If you have doxygen installed, you can check this by running doxygen doxyfile at the root of the repo and checking for errors in the output.

// I wrote a huge amount of this file once, then somehow it vanished. I don't know how!
class ParameterGroupBuilder
{
private:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer public stuff at the top. When someone comes to look at the class to learn the interface, they'll start at the top of the file, so making people scroll is just wasting their time.

{
private:
std::size_t writeOffset = 0;
std::size_t readOffset = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess technically the max message length is 117440505, so a std::uint32_t should be enough. I would suggest using std::uint32_t instead if possible, since size_t may be smaller than 32 bits on some (admittedly small) platforms.


bool write_bits(std::uint8_t const * data, std::size_t bits)
{
if (bits == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, prefer the constant on the left of an ==. This is sort of a MISRA thing, where you reduce the likelihood of mis-typing = instead of ==, because = in that case would not compile instead of creating a runtime bug where = is always true.

}

public:
ParameterGroupBuilder() :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably be = default

// boundaries, and how to align the bits when a data-type that is not a multiple of eight bits
// crosses said boundaries. Then every single parameter group uses padding to avoid those
// cases! The five-bit tractor command types have three bits of padding to make them exactly
// one byte. Why even specify the scheme if it isn't used?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do a good job of padding, that is true, but I have often seen SPNs span bytes in an unaligned fashion in proprietary messaging, so it is more common than you might think

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more a rant than anything. the code does handle byte spans correctly (at least I believe it does).

template <typename T>
bool read(T & data)
{
return read_bits((std::uint8_t *)&data, sizeof (T) * 8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casting to uint8_t * from T is not endianness friendly, plus it's a C style cast which should not be used. This happens multiple places, and may require a fairly substantial refactor to fully resolve.

bool read<char *>(char * & data)
{
// Read until NULL.
return read<std::uint8_t*>((std::uint8_t * &)data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like a dangerous assumption that a uint8_t * will have null termination here and other places. That's a non-trivial assumption that cannot be enforced. std::vector<std::uint8_t> is not a std::string()::c_str()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is, and I know how that bad assumption sneaked in (I did a find/replace on unsigned char to uint8_t). TBH I was already unhappy with the way strings were handled in here, so I'll have to revisit that entire part.

while (bits)
{
// Move in the new data.
*data = (*data & (255 >> space)) | ((buffer[byte] >> input) << output);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

255 in a few places can be replaced with a constant in <limits>

@sonarcloud
Copy link

sonarcloud bot commented Mar 12, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@Y-Less
Copy link
Author

Y-Less commented Mar 13, 2023

Thanks for the comments. I realised it was very far from the rest of the code style, sorry about that. While I'm happy to work on the code style changes, are there any more generic comments about the general approach/API/etc?

It can be tough to convey emotion through a GitHub PR review, so I just wanted to say that I do appreciate all your PR submissions, including this one. I probably spent at least a good hour doing this review (it takes time to read the code to come up-to-speed on what it's doing), which I wouldn't do if I didn't think it was valuable.

And I appreciate your time and effort for reviews. Don't worry, I didn't infer any ill intent/bad emotions at all; I've been on your side of code reviews plenty of times before.

@ad3154
Copy link
Member

ad3154 commented Mar 14, 2023

are there any more generic comments about the general approach/API/etc?

Overall I think the API is pretty intuitive, you build up a packet sequentially and likewise disassemble it sequentially, and it hides the masking and shifting well. "pad"and "skip" and even "read" and "write" might be not the most clear on what's going on but the core of it is all there for sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants