Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialisation timings seem finicky. #199

Closed
Y-Less opened this issue Mar 3, 2023 · 9 comments
Closed

Initialisation timings seem finicky. #199

Y-Less opened this issue Mar 3, 2023 · 9 comments
Labels
investigating Looking into this issue / need more info

Comments

@Y-Less
Copy link

Y-Less commented Mar 3, 2023

Describe the bug

I've written two programs, one (which I've dubbed "server") setting up an InternalControlFunction for isobus::NAME::Function::Engine and the other ("client") setting up a PartneredControlFunction for same. If I add a large wait to the code of the client and start the server within that wait things seem to pick up correctly. But almost any other variation fails to detect something, i.e. the partner function's address and port are under.

I'm really not sure what's going on, and nailing down the timings in the two projects to see what initialisation order works is not easy. I know this is an extremely vague report with no code, I'll have to put some more work in to get those narrowed down.

Basically I can detect and use a remote function call (just detecting the partner control function) only in extremely exact circumstances. The standard setup would not allow this at all, one or both of the nodes will be on and off at wildly different times.

Before I do more digging to try and narrow down an MCVE I was just wondering if you had seen something similar already. If it helps, this loop is never entered because the list is empty:

						for (auto currentActiveControlFunction = activeControlFunctions.begin(); currentActiveControlFunction != activeControlFunctions.end(); currentActiveControlFunction++)

Then:

					partner->initialized = true;

Is marked as true anyway.

Supporting Documentation
If your issue is CAN behavior related, it may be helpful to have a CAN trace of your issue that clearly shows the undesirable behavior. Please attach one if you think it might help, or if requested. Try to capture address claiming in addition to the bad behavior if possible. We prefer files captured with candump -l or .asc ascii format log files. Binary Vector files .blf will not be accepted.

Environment (please complete the following information):

  • OS: [e.g. Ubuntu 20.04]
  • Compiler [e.g. GCC 9.4.0]
  • CAN Driver [e.g. Socket CAN]

Additional context
Add any other context about the problem here that you think might help us recreate or otherwise address your issue if applicable.

@Y-Less Y-Less added the investigating Looking into this issue / need more info label Mar 3, 2023
@ad3154
Copy link
Member

ad3154 commented Mar 3, 2023

The behavior you describe with the active control function list should be fine, because if it has not address claimed at that time (or at least the network manager hasn't processed it's claim yet) then it will not be added to the active list until it claims an address. So that at least seems normal.

Some things to keep in mind as you dig deeper:

  • Address claiming takes time and happens in the background, which requires some amount of waiting that cannot be abstracted away. In reality it can take over 1.5 seconds (worst case) to complete. So before you can talk from an internal control function, it helps to check get_address_valid and likewise check the partner's get_address_valid to know if you can even talk to them. If the addresses never become valid, then there is almost certainly a problem communicating with the bus.
  • There could be an issue in the CAN hardware plugin you are using if the two programs are trying to share that piece of hardware, or whatever is the back end for that if you are using that new one in NTCAN hardware layer #198

@Y-Less
Copy link
Author

Y-Less commented Mar 4, 2023

So the obvious conclusion that the new hardware layer is at fault is fair, but because it sometimes claims and sometimes doesn't I don't think so - messages are exchanged and there's not much else to that part.

So to further demonstrate I built two tests based on one of the unit tests. This is the basic version, copied from the original unit test but with some parts renamed for clarity compared to the second test. I've added massive delays everywhere to give the propagation as much chance as possible. I also switched to using the NTCAN virtual network (42) rather than the in-built virtual network and the code still works, just to further narrow down causes. So this code works:

#include <Windows.h>

#include "isobus/hardware_integration/can_hardware_interface.hpp"
#include "isobus/hardware_integration/virtual_can_plugin.hpp"
#include "isobus/hardware_integration/ntcan_fifo_plugin.hpp"
#include "isobus/isobus/can_NAME_filter.hpp"
#include "isobus/isobus/can_constants.hpp"
#include "isobus/isobus/can_internal_control_function.hpp"
#include "isobus/isobus/can_network_manager.hpp"
#include "isobus/isobus/can_partnered_control_function.hpp"

#include <chrono>
#include <thread>
#include <iostream>

using namespace isobus;

#define STR_(t) (#t)
#define STR(t) STR_(t)
#define EXPECT_TRUE(t) do { if (t) std::cout << "pass" << std::endl; else std::cout << "test failed: " << STR(t) << std::endl; } while (0)

int main()
{
	std::shared_ptr<NTCANFIFOPlugin> server = std::make_shared<NTCANFIFOPlugin>(42);
	std::shared_ptr<NTCANFIFOPlugin> client = std::make_shared<NTCANFIFOPlugin>(42);
	CANHardwareInterface::set_number_of_can_channels(2);
	CANHardwareInterface::assign_can_channel_frame_handler(0, server);
	CANHardwareInterface::assign_can_channel_frame_handler(1, client);
	CANHardwareInterface::start();

	CANHardwareInterface::add_can_lib_update_callback(
		[](void*) {
			CANNetworkManager::CANNetwork.update();
		},
		nullptr);
	CANHardwareInterface::add_raw_can_message_rx_callback(
		[](HardwareInterfaceCANFrame& rxFrame, void* parentPointer) {
			CANNetworkManager::CANNetwork.can_lib_process_rx_message(rxFrame, parentPointer);
		},
		nullptr);

	std::this_thread::sleep_for(std::chrono::milliseconds(5000));

	NAME firstName(0);
	firstName.set_arbitrary_address_capable(true);
	firstName.set_industry_group(1);
	firstName.set_device_class(0);
	firstName.set_function_code(static_cast<std::uint8_t>(isobus::NAME::Function::CabClimateControl));
	firstName.set_identity_number(1);
	firstName.set_ecu_instance(0);
	firstName.set_function_instance(0);
	firstName.set_device_class_instance(0);
	firstName.set_manufacturer_code(69);
	InternalControlFunction icf(firstName, 0x1C, 0);

	//std::cout << "Start the client now." << std::endl;
	//std::this_thread::sleep_for(std::chrono::milliseconds(15000));
	//std::cout << "Start the server now." << std::endl;
	
	std::this_thread::sleep_for(std::chrono::milliseconds(5000));

	const isobus::NAMEFilter filter(NAME::NAMEParameters::FunctionCode, static_cast<std::uint8_t>(NAME::Function::CabClimateControl));
	PartneredControlFunction pcf(1, { filter });

	std::this_thread::sleep_for(std::chrono::milliseconds(5000));

	EXPECT_TRUE(pcf.get_address_valid());

	CANHardwareInterface::stop();
}

Secondly, I split that code up in to two separate programs, one acting as the server (internal) and one as the client (partnered). Again with huge delays and some instructions on when to start the other program. if you start the client first and start the server when instructed everything works. If you do the reverse, it doesn't. And I've tried many other combinations of timings and start orders. Am I missing something obvious in this code, because I can't see why initialisation order matters so much.

Client:

#include <Windows.h>

#include "isobus/hardware_integration/can_hardware_interface.hpp"
#include "isobus/hardware_integration/ntcan_fifo_plugin.hpp"
#include "isobus/isobus/can_NAME_filter.hpp"
#include "isobus/isobus/can_constants.hpp"
#include "isobus/isobus/can_internal_control_function.hpp"
#include "isobus/isobus/can_network_manager.hpp"
#include "isobus/isobus/can_partnered_control_function.hpp"

#include <chrono>
#include <thread>
#include <iostream>

using namespace isobus;

#define STR_(t) (#t)
#define STR(t) STR_(t)
#define EXPECT_TRUE(t) do { if (t) std::cout << "pass" << std::endl; else std::cout << "test failed: " << STR(t) << std::endl; } while (0)

int main()
{
	std::shared_ptr<NTCANFIFOPlugin> client = std::make_shared<NTCANFIFOPlugin>(42);
	CANHardwareInterface::set_number_of_can_channels(1);
	CANHardwareInterface::assign_can_channel_frame_handler(0, client);
	CANHardwareInterface::start();

	CANHardwareInterface::add_can_lib_update_callback(
		[](void*) {
			CANNetworkManager::CANNetwork.update();
		},
		nullptr);
	CANHardwareInterface::add_raw_can_message_rx_callback(
		[](HardwareInterfaceCANFrame& rxFrame, void* parentPointer) {
			CANNetworkManager::CANNetwork.can_lib_process_rx_message(rxFrame, parentPointer);
		},
		nullptr);

	std::this_thread::sleep_for(std::chrono::milliseconds(5000));

	std::cout << "Start the server now." << std::endl;
	std::this_thread::sleep_for(std::chrono::milliseconds(5000));

	const isobus::NAMEFilter filter(NAME::NAMEParameters::FunctionCode, static_cast<std::uint8_t>(NAME::Function::CabClimateControl));
	PartneredControlFunction pcf(0, { filter });

	std::this_thread::sleep_for(std::chrono::milliseconds(5000));

	EXPECT_TRUE(pcf.get_address_valid());

	CANHardwareInterface::stop();
}

Server:

#include <Windows.h>

#include "isobus/hardware_integration/can_hardware_interface.hpp"
#include "isobus/hardware_integration/ntcan_fifo_plugin.hpp"
#include "isobus/isobus/can_NAME_filter.hpp"
#include "isobus/isobus/can_constants.hpp"
#include "isobus/isobus/can_internal_control_function.hpp"
#include "isobus/isobus/can_network_manager.hpp"
#include "isobus/isobus/can_partnered_control_function.hpp"

#include <chrono>
#include <thread>
#include <iostream>

using namespace isobus;

#define STR_(t) (#t)
#define STR(t) STR_(t)
#define EXPECT_TRUE(t) do { if (t) std::cout << "pass" << std::endl; else std::cout << "test failed: " << STR(t) << std::endl; } while (0)

int main()
{
	std::shared_ptr<NTCANFIFOPlugin> server = std::make_shared<NTCANFIFOPlugin>(42);
	CANHardwareInterface::set_number_of_can_channels(1);
	CANHardwareInterface::assign_can_channel_frame_handler(0, server);
	CANHardwareInterface::start();

	CANHardwareInterface::add_can_lib_update_callback(
		[](void*) {
			CANNetworkManager::CANNetwork.update();
		},
		nullptr);
	CANHardwareInterface::add_raw_can_message_rx_callback(
		[](HardwareInterfaceCANFrame& rxFrame, void* parentPointer) {
			CANNetworkManager::CANNetwork.can_lib_process_rx_message(rxFrame, parentPointer);
		},
		nullptr);

	std::this_thread::sleep_for(std::chrono::milliseconds(5000));

	NAME firstName(0);
	firstName.set_arbitrary_address_capable(true);
	firstName.set_industry_group(1);
	firstName.set_device_class(0);
	firstName.set_function_code(static_cast<std::uint8_t>(isobus::NAME::Function::CabClimateControl));
	firstName.set_identity_number(1);
	firstName.set_ecu_instance(0);
	firstName.set_function_instance(0);
	firstName.set_device_class_instance(0);
	firstName.set_manufacturer_code(69);
	InternalControlFunction icf(firstName, 0x1C, 0);

	std::cout << "Start the client now." << std::endl;
	std::this_thread::sleep_for(std::chrono::milliseconds(20000));

	CANHardwareInterface::stop();
}

@Y-Less
Copy link
Author

Y-Less commented Mar 4, 2023

Note that the local version also works fine with VirtualCANPlugin, and I'd be interested to see if the separated version works with another network that can connect processes (the whole reason I wrote the NTCAN backend in the first place was because I couldn't see another one that did that virtually on one PC).

@ad3154
Copy link
Member

ad3154 commented Mar 4, 2023

Thanks for putting in all this work to help narrow it down, I appreciate it.

I can definitely tell you why it works one way and not the other after reading this!
The issue is, because only the server has an ICF, it's the only one that address claims (and the only one that requests the address claim PGN). The client is completely silent, in a sniffing only mode. So if you start up the client after the server's address claim, you will never see anything.

This is normal behavior I'd say if you have one sniffer and you miss the messages you are interested in. If you want to ensure they see each other, you will need an ICF in the client to communicate from with a unique NAME. That way, it will request the address claim PGN and always get a copy of the address table.

@ad3154
Copy link
Member

ad3154 commented Mar 4, 2023

As you can see, here is the entirety of the CAN communication from your example:
image
The first message is the server's request for address claim, and the second message is the server's claim itself. So the timing would really be important if the client is just sniffing the bus.

@GwnDaan
Copy link
Member

GwnDaan commented Mar 5, 2023

Hi @Y-Less, like @ad3154 says, I feel like this currently is the expected behaviour of the stack. Though a concrete solution in your example could be; claim an address with InternalControlFunction with your client as well, then the server will announce it's address again. Therefore your client also knows it's available to partner. IIRC.

Also we could consider making it so defining a ParneredControlFunction also sends the message to "request" the addresses of all the devices on the bus? Like the ICF does, but then don't proceed to the actual claim itself. I don't know if that's against the ISO standard? And if it even would work haha? @ad3154

@ad3154
Copy link
Member

ad3154 commented Mar 5, 2023

I would prefer to avoid adding transmits with no ICF if possible, as it's not really in the spirit of J1939 nor ISOBUS, but I wouldn't be totally against adding a manual request for address claim to the network manager. There are some niche times where it might be useful I'd imagine, though we should document it well and add a logger output that says "hey maybe don't do this" haha.

Generally if you want to participate on the bus in any way, you should have an address, even if you never do anything with it. Like, claiming a name as a "on-board diagnostic data logger" (function 130 for non-specific device class 0) if you're just a sniffer is more correct probably than truly not participating on the bus.

@GwnDaan
Copy link
Member

GwnDaan commented Mar 8, 2023

Ahh, I see that makes sense. Let's not transmit anything when no ICF are present yea.

@Y-Less
Copy link
Author

Y-Less commented Mar 10, 2023

Just saying that this was indeed the problem, so thank you.

@Y-Less Y-Less closed this as completed Mar 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
investigating Looking into this issue / need more info
Projects
None yet
Development

No branches or pull requests

3 participants