Trigger latencies + monitoring (v5) #342
Conversation
From the point of view of what information is published, this seems ok.
The usage of locking, however, is questionable at the moment.
include/trigger/Latency.hpp
std::atomic<uint64_t> m_latency_in; // Member variable to store latency_in
std::atomic<uint64_t> m_latency_out; // Member variable to store latency_out
std::atomic<double> m_clock_ticks_conversion; // Dynamically adjusted conversion factor for clock ticks
mutable std::mutex m_mutex;
This class implementation is a bit odd from the technical point of view. The mutex requirement is unclear, because you already have atomic counters. If needed, we can discuss it.
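For illustration, a minimal sketch of the atomics-only shape this comment seems to point at, assuming the class only needs independent per-counter updates; the method names and the conversion value are assumptions, not the actual Latency.hpp implementation:

#include <atomic>
#include <cstdint>

// Sketch: with std::atomic members, individual loads/stores are already
// thread-safe, so a mutex is only needed if several members must be read or
// written together as one consistent snapshot.
class LatencySketch
{
public:
  void update_latency_in(uint64_t ticks)  { m_latency_in.store(ticks, std::memory_order_relaxed); }
  void update_latency_out(uint64_t ticks) { m_latency_out.store(ticks, std::memory_order_relaxed); }

  double get_latency_in_ms() const
  {
    return m_latency_in.load(std::memory_order_relaxed)
           * m_clock_ticks_conversion.load(std::memory_order_relaxed);
  }

private:
  std::atomic<uint64_t> m_latency_in{ 0 };
  std::atomic<uint64_t> m_latency_out{ 0 };
  std::atomic<double> m_clock_ticks_conversion{ 16e-6 }; // assumed example: 62.5 MHz clock, 16 ns ticks, expressed in ms
};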
// Message for MLT TD requests latency vars
// Latency represents the difference between current system (clock) time and the requested TD readout window (start/end)
// Units are ms
I would recommend to publish us
Overall looks very good! There are only 2 serious comments:
- do_configure is deprecated in v5, and we need to use init instead (or conf, but only in non-DAQModule classes).
- There's a logic issue with when we're loading the "current system time". We should do it at latency measurement time, like we did in v4, not at sending-to-opmon time. Otherwise this adds unnecessary latency to the reported values, especially in slow-flow systems (e.g. 1 Hz rates).
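For the second point, a rough sketch of capturing the system time at measurement time rather than at opmon-publishing time; the member and setter names below are hypothetical, not the PR's actual code:

#include <atomic>
#include <chrono>
#include <cstdint>

// Helper: current system time in milliseconds since epoch.
inline uint64_t now_system_ms()
{
  using namespace std::chrono;
  return duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
}

// At latency-measurement time (e.g. when the TC/TD is handled), compute and
// store the difference immediately:
//   m_latency_out_ms.store( now_system_ms() - candidate_time_ms );  // hypothetical members
//
// In generate_opmon_data(), only publish the value that was already computed:
//   info.set_latency_out( m_latency_out_ms.load() );                // hypothetical setter
// Calling now_system_ms() at publish time instead would fold the opmon publishing
// interval into the reported latency, which is the concern above for low-rate (e.g. 1 Hz) flows.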
plugins/CustomTCMaker.cpp
@@ -107,6 +107,15 @@ CustomTCMaker::generate_opmon_data()
  info.set_tc_failed_sent_count( m_tc_failed_sent_count.load() );

  this->publish(std::move(info));

  if ( m_running_flag.load() && m_latency_monitoring.load() ) {
- if ( m_running_flag.load() && m_latency_monitoring.load() ) {
+ if ( m_latency_monitoring.load() && m_running_flag.load() ) {
Not an important comment, I'm only writing this because I've learned something new a few weeks back and want to share :) It's an unnecessary micro-optimisation, but in C++ the order of operands inside an if statement can matter for operators like && and || (not true for all operators).
So in your case, whilst we're running, we will always load 2 booleans, even if we don't want monitoring. With the suggestion above, if monitoring is off, we will only load 1 boolean. It matters more in task-heavy workflows, so probably not here, but maybe in e.g. TPDataProcessor where every little counts.
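A tiny standalone illustration of the short-circuit behaviour described above (the flags and the printout are made up for the example):

#include <atomic>
#include <iostream>

std::atomic<bool> monitoring_enabled{ false };
std::atomic<bool> running{ true };

bool load_running()
{
  std::cout << "second operand evaluated\n";
  return running.load();
}

int main()
{
  // && short-circuits: if the left operand is false, the right one is never evaluated.
  // Putting the condition that is most often false (or cheapest) first avoids the extra load.
  if (monitoring_enabled.load() && load_running()) {
    std::cout << "monitoring while running\n";
  }
  // With monitoring_enabled == false, "second operand evaluated" is never printed.
  return 0;
}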
plugins/CustomTCMaker.cpp
m_latency_monitoring.store( m_conf->get_latency_monitoring_conf()->get_enable_latency_monitoring() );
do_configure is deprecated in v5. All the config should be inside of void init(std::shared_ptr<appfwk::ModuleConfiguration> mcfg). Technically there's also a new conf, but I think only in non-DAQModule classes.
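A sketch of what moving the configuration read into init() could look like, using the init() signature quoted above and the getter chain from the diff; how the module's configuration object is obtained from mcfg (and its type name) is an assumption for illustration:

void
CustomTCMaker::init(std::shared_ptr<appfwk::ModuleConfiguration> mcfg)
{
  // Assumed accessor: retrieve this module's configuration object from mcfg.
  m_conf = mcfg->module<appmodel::CustomTCMakerConf>(get_name());

  // Same getter chain as in the diff, now called from init() instead of do_configure().
  m_latency_monitoring.store( m_conf->get_latency_monitoring_conf()->get_enable_latency_monitoring() );
}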
plugins/RandomTCMakerModule.cpp
}

void
RandomTCMakerModule::do_configure(const nlohmann::json& /*obj*/)
{
  //m_conf = obj.get<randomtriggercandidatemaker::Conf>();
  m_latency_monitoring.store( m_conf->get_latency_monitoring_conf()->get_enable_latency_monitoring() );
Ditto re. do_configure being deprecated in v5, should move to init.
Thanks @ArturSztuc and @mroda88 for insights. Here is v2, changes:
In appmodel:
This was tested:
Some issues and potential improvements are mentioned in #347, for future consideration.
LGTM!
The only further comment I have, and something I didn't notice earlier, is that most of the StandaloneTCMakers don't need both latency in and latency out. They create a new object and then send it straight away, so the time difference between the two is in nanoseconds. Latency out at sending would be enough (this applies only to the Random & Custom TC makers; the HSI one of course does receive an input and processes it).
I don't mind if that's in this PR or a separate one.
m_tc_made_count++;

TLOG_DEBUG(1) << get_name() << " at timestamp " << m_timestamp_estimator->get_timestamp_estimate()
              << ", pushing a candidate with timestamp " << candidate.time_candidate;

if (m_latency_monitoring.load()) m_latency_instance.update_latency_out( candidate.time_candidate );
Do we need both latency in and latency out here? The time difference here is literally the time taken to do m_tc_made_count++, which will be in nanoseconds. The out latency would be more than enough.
I think in & out makes sense if we have some input data, a processing stage, and output data. In the standalone TC makers we just have the output data.
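A sketch of the suggested simplification for the standalone makers, with a single measurement just before sending (member names follow the snippet above; dropping the paired update_latency_in call is the reviewer's suggestion, not the merged code):

// Standalone TC makers (Random/Custom) only produce an output candidate, so one
// measurement at send time carries all the information; a paired "in" update
// would differ only by nanoseconds. (HSI keeps both, since it processes an input.)
if (m_latency_monitoring.load()) {
  m_latency_instance.update_latency_out( candidate.time_candidate );
}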
m_tc_made_count++;

TLOG_DEBUG(1) << get_name() << " at timestamp " << m_timestamp_estimator->get_timestamp_estimate()
              << ", pushing a candidate with timestamp " << candidate.time_candidate;

if (m_latency_monitoring.load()) m_latency_instance.update_latency_out( candidate.time_candidate );
Ditto here, we only need one latency in the standalone makers; having in and out doesn't make sense and will give identical numbers (unless we switch to nanoseconds or picoseconds).
Thanks Artur,
LGTM, thanks for all the work!
This PR implements monitoring of latencies across trigger.
Requires appmodel PR: DUNE-DAQ/appmodel#126.
Changes:
For appmodel changes:
other small changes: