Failed DDOP upload with GS4 #465

Open
trm01 opened this issue Apr 21, 2024 · 10 comments
Labels
investigating Looking into this issue / need more info

Comments


trm01 commented Apr 21, 2024

Running the TaskControl on the GS4 on main branch 2c58692e fails to complete the DDOP upload.

I was able to work around the issue by changing the following:

--- a/isobus/include/isobus/isobus/can_network_configuration.hpp
+++ b/isobus/include/isobus/isobus/can_network_configuration.hpp
@@ -90,8 +90,8 @@ namespace isobus
std::uint32_t maxNumberTransportProtocolSessions = 4; ///< The max number of TP sessions allowed
std::uint32_t minimumTimeBetweenTransportProtocolBAMFrames = DEFAULT_BAM_PACKET_DELAY_TIME_MS; ///< The configurable time between BAM frames
std::uint8_t networkManagerMaxFramesToSendPerUpdate = 0xFF; ///< Used to control the max number of transport layer frames added to the driver queue per network manager update
-               std::uint8_t numberOfPacketsPerDPOMessage = 16; ///< The number of packets per DPO message for ETP sessions
-               std::uint8_t numberOfPacketsPerCTSMessage = 16; ///< The number of packets per CTS message for TP sessions
+               std::uint8_t numberOfPacketsPerDPOMessage = 255; ///< The number of packets per DPO message for ETP sessions
+               std::uint8_t numberOfPacketsPerCTSMessage = 255; ///< The number of packets per CTS message for TP sessions
};
} // namespace isobus

Environment

I've reproduced it on two Ubuntu systems: my desktop and an embedded ARM64 ECU.
The upload completes when attached to a TopCon controller, but not the Deere. I also experimented with the examples and can get it working if I shut off the VT upload (so only the TC DDOP is uploaded).

  • OS: Ubuntu 20.04
  • Compiler: GCC 11.4.0
  • CAN Driver: Socket CAN

Additional context

FailedTcDdopUpload.log

trm01 added the investigating label Apr 21, 2024
Author

trm01 commented Apr 21, 2024

I forgot the log output:

[Debug][NM]: A control function claimed address 28 on channel 0
[Debug][NM]: A control function claimed address 240 on channel 0
[Debug][NM]: A control function claimed address 251 on channel 0
[Debug][NM]: A control function claimed address 237 on channel 0
[Debug][NM]: A control function claimed address 38 on channel 0
[Info][NM]: Partnered control function with name a0008200042d289d has claimed address 247 on channel 0.
[Debug][NM]: A control function claimed address 210 on channel 0
[Debug][NM]: A control function claimed address 42 on channel 0
[Debug][NM]: A control function claimed address 238 on channel 0
[Debug][NM]: A control function claimed address 248 on channel 0
[Debug][AC]: Internal control function a00c8000afe00002 could not use the preferred address, but has claimed address 128 on channel 0
[Debug][TC]: Startup delay complete, waiting for TC server status message.
[Debug][TC]: TC Server supports version 3 with 5 booms, 255 sections, and 5 position based control channels.
[Warn][TC]: Timeout waiting for version request from TC. This is not required, so proceeding anways.
[Warn][TC]: The TC is < version 4 but no VT was provided. Language data will be requested globally, which might not be ideal.
[Debug][VT/TC]: Language and unit data received from control function 240 language is: en
[Debug][VT/TC]: Language and unit data received from control function 38 language is: en
[Info][TC]: DDOP will be generated using the server's version instead of the specified version. New version: 3
[Debug][TC]: DDOP Generated, size: 2328
[Debug][TC]: Server indicates there may be enough memory available.
[Debug][ETP]: New tx session for 0x0CB00. Source: 128, destination: 247
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 255, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 255, which is greater than the configured maximum of 16, using the configured maximum instead.
[Warn][TC]: Recieved unexpected object pool transfer response
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 255, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 255, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 255, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 253, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 237, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 221, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 205, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 189, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 173, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 157, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 141, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 125, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 109, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 93, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 77, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 61, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 45, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 29, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][ETP]: Session Closed
[Debug][ETP]: Completed tx session for 0x0CB00 from 247
[Error][TC]: Timeout waiting for object pool transfer response. Resetting client connection.
[Debug][TC]: Startup delay complete, waiting for TC server status message.
[Debug][TC]: TC Server supports version 3 with 5 booms, 255 sections, and 5 position based control channels.
[Warn][TC]: Timeout waiting for version request from TC. This is not required, so proceeding anways.
[Warn][TC]: The TC is < version 4 but no VT was provided. Language data will be requested globally, which might not be ideal.
[Debug][VT/TC]: Language and unit data received from control function 240 language is: en
[Debug][VT/TC]: Language and unit data received from control function 38 language is: en
[Debug][TC]: Using previously generated DDOP binary
[Debug][TC]: Server indicates there may be enough memory available.
[Debug][ETP]: New tx session for 0x0CB00. Source: 128, destination: 247
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 255, which is greater than the configured maximum of 16, using the configured maximum instead.
[Debug][TP]: Received Request To Send (RTS) with a CTS packet count of 255, which is greater than the configured maximum of 16, using the configured maximum instead.
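The packet counts in the log above follow a regular pattern, and a short calculation can reproduce it. The sketch below is only an interpretation of the log, not a description of how the stack or the GS4 actually works. It assumes that each ETP data packet carries 7 payload bytes, that the server offers min(remaining packets, 255) before every burst, and that the client caps each burst at its configured maximum. The function names and constants are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Assumption: each ETP data packet carries 7 payload bytes.
constexpr std::uint32_t BYTES_PER_DATA_PACKET = 7;
// Largest packet count that appears in the log (fits in one byte).
constexpr std::uint32_t MAX_OFFERED_PACKETS = 255;

// Number of data packets needed to move payloadBytes bytes, rounded up.
std::uint32_t packets_for_payload(std::uint32_t payloadBytes)
{
	return (payloadBytes + BYTES_PER_DATA_PACKET - 1) / BYTES_PER_DATA_PACKET;
}

// Hypothetical model of the exchange: before every burst the receiver offers
// min(remaining, 255) packets, and the sender transmits at most configuredMax.
// Returns the sequence of offered counts until the transfer is complete.
std::vector<std::uint32_t> offered_packet_counts(std::uint32_t totalPackets, std::uint32_t configuredMax)
{
	std::vector<std::uint32_t> offers;
	if (0 == configuredMax)
	{
		return offers; // A limit of zero would never make progress
	}

	std::uint32_t remaining = totalPackets;
	while (remaining > 0)
	{
		const std::uint32_t offered = std::min(remaining, MAX_OFFERED_PACKETS);
		offers.push_back(offered);
		remaining -= std::min(offered, configuredMax);
	}
	return offers;
}
```

For the 2328-byte DDOP in the log, `packets_for_payload(2328)` gives 333. Under the assumptions above, `offered_packet_counts(333, 16)` yields 255 five times, then 253, 237, 221 and so on down to 29, and finally 13. That matches the order of the counts in the debug messages; the last value of 13 is below the configured limit of 16, so it would not produce a message.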

Author

trm01 commented Apr 21, 2024

I just came to another realization:
The workaround in the diff above only seems to fix the issue once I roll back to 89b8e.

Member

ad3154 commented Apr 22, 2024

Interesting... sending a clear to send with more frames than our request is unusual behavior on John Deere's side. And, getting an object pool transfer response so early is strange when I don't see any abort messages in the log....

Could we also ask for a CAN trace? That would help us quite a bit in this case. On Ubuntu, it'd be candump -l <name of can interface> to start a log, then Ctrl+C to stop collecting the log. Uploading this would be very appreciated.

Author

trm01 commented Apr 22, 2024

> Interesting... sending a clear to send with more frames than our request is unusual behavior on John Deere's side. And, getting an object pool transfer response so early is strange when I don't see any abort messages in the log....
>
> Could we also ask for a CAN trace? That would help us quite a bit in this case. On Ubuntu, it'd be candump -l <name of can interface> to start a log, then Ctrl+C to stop collecting the log. Uploading this would be very appreciated.

I uploaded the output of candump -l can0 in the original message:
candump-2024-04-21_195501.log

I've been trying to gather more information on a potentially related issue that is causing a failed upload. There is a direct correlation between adding these specific devices to the bus and the upload failing to complete. My best guess right now is that they are broadcasting an address claim PGN on the bus every 5 seconds and that is causing all the nodes to reply. I end up seeing several of these from the example code:

[Debug][NM]: External CF 'a0022008042d289d' is now active at address '42' on channel '0'.
... then shortly after:
[Info][NM]: Control function with address 42 and NAME a0022008042d289d is now offline on channel 0.
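One way to check the address claim hypothesis against a candump log is to pull out the frames whose PGN is the address claim PGN (0xEE00) and look at their timestamps and source addresses. The following is a minimal sketch, assuming the usual `candump -l` line layout of `(timestamp) interface ID#DATA` with classic CAN frames and the J1939/ISO 11783 29-bit identifier layout. The struct and function names are hypothetical and are not part of the stack.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <string>

// One frame as read from a candump log line (hypothetical helper type).
struct LoggedFrame
{
	double timestampSeconds = 0.0;
	std::uint32_t identifier = 0;
	std::string dataHex;
};

// Parses a line such as "(1713729301.123456) can0 18EEFF2A#0011223344556677".
// Returns false if the line does not match that layout.
bool parse_candump_log_line(const std::string &line, LoggedFrame &frame)
{
	const std::size_t open = line.find('(');
	const std::size_t close = line.find(')');
	const std::size_t hash = line.find('#');
	if ((std::string::npos == open) || (std::string::npos == close) ||
	    (std::string::npos == hash) || (close < open) || (hash < close))
	{
		return false;
	}

	// The identifier sits between the last space before '#' and the '#' itself
	const std::size_t space = line.rfind(' ', hash);
	if ((std::string::npos == space) || (space < close))
	{
		return false;
	}

	try
	{
		frame.timestampSeconds = std::stod(line.substr(open + 1, close - open - 1));
		frame.identifier = static_cast<std::uint32_t>(
		  std::stoul(line.substr(space + 1, hash - space - 1), nullptr, 16));
	}
	catch (const std::exception &)
	{
		return false; // Timestamp or identifier was not a valid number
	}
	frame.dataHex = line.substr(hash + 1);
	return true;
}

// Extracts the PGN from a 29-bit identifier. For PDU1 messages (PDU format
// below 240) the low byte is a destination address and is not part of the PGN.
std::uint32_t extract_pgn(std::uint32_t identifier)
{
	const std::uint32_t pduFormat = (identifier >> 16) & 0xFF;
	std::uint32_t pgn = (identifier >> 8) & 0x3FFFF;
	if (pduFormat < 240)
	{
		pgn &= 0x3FF00;
	}
	return pgn;
}

// True if the identifier carries the address claim PGN (0xEE00).
bool is_address_claim(std::uint32_t identifier)
{
	return 0xEE00 == extract_pgn(identifier);
}

// The source address is the lowest byte of the identifier.
std::uint8_t source_address(std::uint32_t identifier)
{
	return static_cast<std::uint8_t>(identifier & 0xFF);
}
```

Feeding each line of the log through `parse_candump_log_line` and keeping the timestamps of frames where `is_address_claim` is true, grouped by `source_address`, would show whether any device is re-claiming its address at a regular interval.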

Member

ad3154 commented Apr 22, 2024

Yeah, so, looking at the CAN trace, the Deere display isn't aborting the ETP session, and is sending an incorrect transfer response, which is what's causing the issue. I also don't see any obvious protocol exceptions like timeouts, though their device is being weird by clearing us to send more packets than requested - that is certainly unusual.

> My best guess right now is that they are broadcasting an address claim PGN on the bus every 5 seconds

This log file at least doesn't have that going on.

Regardless, I would definitely say this is a Deere software issue. Their device is behaving very strangely. Going by what the CAN trace says, which has to be the source of "who is wrong" here, it is pretty clear to me that they are doing bad things.

I will say, you can change those sizes you mentioned without modifying the library code!

See CANNetworkManager::CANNetwork.get_configuration(), which allows you to get the configuration object for the stack. Then call set_number_of_packets_per_dpo_message and set_max_number_of_network_manager_protocol_frames_per_update and maybe set_number_of_packets_per_cts_message on that object.

Changing the max packet count seems like it would be an OK workaround if it helps.
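As a rough illustration of that suggestion, the sketch below applies the same values as the diff at the top of the issue through the setters named above instead of editing the header. The `ConfigurationStandIn` class, its getters, and the `apply_packet_count_workaround` helper are hypothetical and exist only so the example compiles on its own. The exact signatures of the real setters are an assumption here (a single `std::uint8_t` argument).

```cpp
#include <cstdint>

// Hypothetical stand-in for the stack's configuration object, used only so
// this sketch is self-contained. The setter names come from the comment
// above; the getters are invented for illustration.
class ConfigurationStandIn
{
public:
	void set_number_of_packets_per_dpo_message(std::uint8_t count) { packetsPerDPO = count; }
	void set_number_of_packets_per_cts_message(std::uint8_t count) { packetsPerCTS = count; }
	std::uint8_t get_number_of_packets_per_dpo_message() const { return packetsPerDPO; }
	std::uint8_t get_number_of_packets_per_cts_message() const { return packetsPerCTS; }

private:
	std::uint8_t packetsPerDPO = 16; // Default shown in the diff at the top of the issue
	std::uint8_t packetsPerCTS = 16; // Default shown in the diff at the top of the issue
};

// Raises both packet counts at runtime. Written as a template so it accepts
// any object that exposes the two setters, under the signature assumption above.
template<typename Configuration>
void apply_packet_count_workaround(Configuration &configuration, std::uint8_t packetCount = 255)
{
	configuration.set_number_of_packets_per_dpo_message(packetCount);
	configuration.set_number_of_packets_per_cts_message(packetCount);
}
```

In an application, the call would presumably be made on the object returned by CANNetworkManager::CANNetwork.get_configuration() before the TC client starts its upload, assuming that function returns a mutable reference to the configuration.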

If you have a CAN trace of this other situation with all the address claims, I can take a look at that as well, but I'm not really sure what else to say without a contact at Deere to have them investigate their bad behavior in this case.

Author

trm01 commented Apr 23, 2024

Ok, thanks for looking into it and the tip. I will look into changing that through the set_number_of_packets_per_dpo_message call instead. It's too bad (for me) that it is on the JD side. For now, I'll keep using the commit a few back, at least I have some success with it.

I had the device that was address claiming unplugged at the time of this upload, in an effort to reduce the number of variables at play.

I wonder if this could be a CAN hardware/bus issue. I'll see if I can dig up some CAN logs from the JD side; that would at least confirm your findings.

I'll be away from the tractor for a few days, but I will see about getting another capture this week.

I appreciate the help.

Author

trm01 commented Apr 28, 2024

Attached are some logs and candumps of the demo working, and of it not working when I add the ECU doing the address claiming.
There are two CAN logs in each test. One was taken from one end of the bus, the other from in the cab. From what I can tell they are identical, which has me throwing the termination idea out the door (for now). Also, there are other devices near the end of the bus that have an ISO VT, and the terminal is up and running on the GS4.

Any tips are greatly appreciated.

Thanks again!
taskControllerDemo-failedAddressClaims.zip

Author

trm01 commented Apr 28, 2024

Here are some additional crumbs: I was able to get it to upload and work as expected by commenting out the following code:

diff --git a/isobus/src/can_network_manager.cpp b/isobus/src/can_network_manager.cpp
index 8dc5598..257514d 100644
--- a/isobus/src/can_network_manager.cpp
+++ b/isobus/src/can_network_manager.cpp
@@ -1042,7 +1042,7 @@ namespace isobus
{
CANMessage currentMessage = get_next_can_message_from_rx_queue();

-                       update_address_table(currentMessage);
+                       // update_address_table(currentMessage);
process_can_message_for_address_violations(currentMessage);

// Update Special Callbacks, like protocols and non-cf specific ones

I don't know enough about the stack to understand why, or even what consequences I'll see by leaving that code out. But hopefully it helps understand the issue I'm facing a bit more.

Thanks

Member

ad3154 commented Jun 18, 2024

Update: Might be related to/same issue as #476, possible fix in #479

Member

ad3154 commented Jun 18, 2024

@trm01 If you're still around: since #479, this might be fixed in the stack. It could be worth re-testing with the frames per DPO set to 255 and your above modification removed, if you find time.
