
Binary output format #7

Draft

Pavel-Sedlacek wants to merge 10 commits into skoudmar:graphs from Pavel-Sedlacek:main

Conversation

@Pavel-Sedlacek
Contributor

@Pavel-Sedlacek Pavel-Sedlacek commented Feb 21, 2026

The primary use case of the tool is for the data analysed to be used later on in some other tool (or this one in future). Loading and parsing an entire .json file to render a chart for a single entry is wasteful and takes too long.

This PR implements a new binary format output. Data from the analyses is written to an SQLite database. As of this commit, all analyses reside in the same table and have the format (name, binary_data_blob). The binary blob is generated as the serde structure serialised through postcard (a compact and fast binary format).
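As a rough illustration (table and column names are assumptions based on the description above, not necessarily the PR's actual schema), the single-table layout could look like:

```sql
-- One row per analysis; the blob holds the postcard-serialised serde structure.
CREATE TABLE IF NOT EXISTS blobs (
    name TEXT PRIMARY KEY,          -- analysis name, e.g. "message_latencies"
    binary_data_blob BLOB NOT NULL
);
```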

Pavel Sedlacek added 2 commits February 21, 2026 14:29
The primary use case of the tool is for the data analysed to be used later on in some other tool (or this one in future). Loading and *parsing* an entire `.json` file to render a chart for a single entry is wasteful and takes too long.

This commit implements a new binary format output. Data from the analyses is written to an SQLite database. As of this commit, all analyses reside in the same table and have the format (name, binary_data_blob). The binary_data_blob is generated as the `serde` structure serialised through postcard (a compact and fast binary format).

It also changes the `analyse` subcommand arguments defaults to generate only the binary output.
This commit implements a function and CLI for extracting data from the binary output and fixes a few naming inconsistencies in related structures
Owner

@skoudmar skoudmar left a comment


Hi, thanks for the PR.

If you use cargo clippy, the code in the PR will be more uniform with the codebase. I know there are already too many warnings, so let's not add any more, please.

Also, maybe try Copilot; it found the most significant errors:

Found 3 blocking issues in PR #7: src/analyses/mod.rs:106 writes "message_latencies" but src/extract/mod.rs:56 reads "message_latency" (so latency extraction fails); src/utils/binary_sql_store.rs:24/53/67 never clears existing bundles, always INSERTs, then reads with
LIMIT 1 (so reruns can return stale data); and src/analyses/analysis/dependency_graph.rs:521 uses todo!() for missing callback callers (runtime panic path).
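The first issue (writer and reader disagreeing on the key string) is the kind of bug a shared constant rules out. A minimal sketch, with hypothetical module and function names:

```rust
// Hypothetical shared constants module; the writer in analyses/mod.rs and the
// reader in extract/mod.rs would both refer to it, so the key strings cannot
// drift apart the way "message_latencies" vs "message_latency" did.
mod keys {
    pub const MESSAGE_LATENCIES: &str = "message_latencies";
}

fn write_key() -> &'static str {
    keys::MESSAGE_LATENCIES
}

fn read_key() -> &'static str {
    keys::MESSAGE_LATENCIES
}
```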

@wentasah Can you review and test it after the changes? I have already spent way more time with this PR than I planned. This review is not complete.

let timers = self.timer_nodes.iter().map(|(k, v)| {
let n = k.0.lock().unwrap();
ActivationDelayExport {
interface: format!("Timer({})", n.get_period().unwrap_or(0).to_string()),
Owner

Suggested change
interface: format!("Timer({})", n.get_period().unwrap_or(0).to_string()),
interface: format!("Timer({})", n.get_period().unwrap_or(0)),

to_string applied to a type that implements Display in format! args
for further information visit https://rust-lang.github.io/rust-clippy/rust-1.93.0/index.html#to_string_in_format_args
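To illustrate the lint: format! invokes Display on its arguments directly, so the .to_string() call only adds an intermediate allocation. A small sketch (timer_label is a hypothetical stand-in for the PR code):

```rust
// The period only needs to implement Display; format! formats it in place,
// so `period.unwrap_or(0).to_string()` would allocate a String just to
// copy it into another String.
fn timer_label(period: Option<u64>) -> String {
    format!("Timer({})", period.unwrap_or(0))
}
```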

.map_or(WeakKnown::Unknown, |node_weak| {
get_node_name_from_weak(&node_weak.get_weak())
})
.unwrap_or("".to_owned()),
Owner

Suggested change
.unwrap_or("".to_owned()),
.unwrap_or(String::new()),

This is preferred and potentially faster.
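For the empty string, both spellings happen to avoid a heap allocation (an empty String holds no buffer); the broader point is that unwrap_or evaluates its argument eagerly even when the Option is Some, so unwrap_or_default or unwrap_or_else is the habit that stays cheap. A sketch with a hypothetical helper:

```rust
// unwrap_or_default defers construction of the default until it is needed,
// which also generalizes to defaults that do allocate.
fn name_or_empty(name: Option<String>) -> String {
    name.unwrap_or_default()
}
```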

.map_or(WeakKnown::Unknown, |node_weak| {
get_node_name_from_weak(&node_weak.get_weak())
})
.unwrap_or("".to_owned()),
Owner

Suggested change
.unwrap_or("".to_owned()),
.unwrap_or(String::new()),

let n = k.0.lock().unwrap();

PublicationDelayExport {
interface: format!("Publisher({})", n.get_topic().to_string()),
Owner

Suggested change
interface: format!("Publisher({})", n.get_topic().to_string()),
interface: format!("Publisher({})", n.get_topic()),

.map_or(WeakKnown::Unknown, |node_weak| {
get_node_name_from_weak(&node_weak.get_weak())
})
.unwrap_or("".to_owned()),
Owner

Suggested change
.unwrap_or("".to_owned()),
.unwrap_or(String::new()),

Self::Unknown | Self::Dropped => default,
}
}

Owner

Add this to compile the message_latency suggestion.

Suggested change
#[inline]
pub fn unwrap_or_else(self, default: impl FnOnce() -> T) -> T {
    match self {
        Self::Known(value) => value,
        Self::Unknown | Self::Dropped => default(),
    }
}
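Standalone, the suggestion amounts to something like this (the enum shape is assumed from the surrounding snippets):

```rust
// Minimal sketch of the WeakKnown-style enum with the suggested
// lazily-evaluated accessor: the closure runs only on the fallback path.
#[allow(dead_code)]
enum WeakKnown<T> {
    Known(T),
    Unknown,
    Dropped,
}

impl<T> WeakKnown<T> {
    #[inline]
    fn unwrap_or_else(self, default: impl FnOnce() -> T) -> T {
        match self {
            Self::Known(value) => value,
            Self::Unknown | Self::Dropped => default(),
        }
    }
}
```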

Comment on lines 272 to 304
fn from(value: MessageLatencyStats) -> Self {
let subscriber = value.subscriber.lock().unwrap();
let subscriber_node = subscriber
.get_node()
.map(|node| get_node_name_from_weak(&node.get_weak()).unwrap_or("Unknown".to_string()))
.unwrap_or("Unknown".to_string());
let publisher_node = value.publisher.as_ref().map_or_else(
|| "Unknown".to_string(),
|p| {
let publisher = p.lock().unwrap().get_node();
publisher
.map(|node| {
get_node_name_from_weak(&node.get_weak()).unwrap_or("Unknown".to_string())
let target_node = value
.subscriber
.lock()
.map(|s| {
s.get_node().map(|v| {
get_node_name_from_weak(&v.get_weak()).unwrap_or("Unknown".to_string())
})
})
.map(|v| v.to_string())
.unwrap_or("Unknown".into());

let source_node = value
.publisher
.map(|p| {
p.lock()
.map(|s| {
s.get_node().map(|v| {
get_node_name_from_weak(&v.get_weak()).unwrap_or("Unknown".to_string())
})
})
.unwrap_or("Unknown".to_string())
},
);
.map(|v| v.to_string())
.unwrap_or("Unknown".into())
})
.unwrap_or("Unknown".into());

Self {
topic: value.topic,
subscriber_node,
publisher_node,
source_node,
target_node,
latencies: value.latencies,
}
}
Owner

Use closures for Unknown strings. If the strings are not needed, they are not allocated.

I did use them in some places, so let's unite the design here.

Requires: https://github.com/skoudmar/Ros2TraceAnalyzer/pull/7/changes#r2836312238

Suggested change
fn from(value: MessageLatencyStats) -> Self {
let subscriber = value.subscriber.lock().unwrap();
let subscriber_node = subscriber
.get_node()
.map(|node| get_node_name_from_weak(&node.get_weak()).unwrap_or("Unknown".to_string()))
.unwrap_or("Unknown".to_string());
let publisher_node = value.publisher.as_ref().map_or_else(
|| "Unknown".to_string(),
|p| {
let publisher = p.lock().unwrap().get_node();
publisher
.map(|node| {
get_node_name_from_weak(&node.get_weak()).unwrap_or("Unknown".to_string())
let target_node = value
.subscriber
.lock()
.map(|s| {
s.get_node().map(|v| {
get_node_name_from_weak(&v.get_weak()).unwrap_or("Unknown".to_string())
})
})
.map(|v| v.to_string())
.unwrap_or("Unknown".into());
let source_node = value
.publisher
.map(|p| {
p.lock()
.map(|s| {
s.get_node().map(|v| {
get_node_name_from_weak(&v.get_weak()).unwrap_or("Unknown".to_string())
})
})
.unwrap_or("Unknown".to_string())
},
);
.map(|v| v.to_string())
.unwrap_or("Unknown".into())
})
.unwrap_or("Unknown".into());
Self {
topic: value.topic,
subscriber_node,
publisher_node,
source_node,
target_node,
latencies: value.latencies,
}
}
fn from(value: MessageLatencyStats) -> Self {
    let target_node = value
        .subscriber
        .lock()
        .map(|s| {
            s.get_node().map(|v| {
                get_node_name_from_weak(&v.get_weak()).unwrap_or_else(|| "Unknown".to_string())
            })
        })
        .map_or_else(|_| "Unknown".to_string(), |v| v.to_string());

    let source_node = value
        .publisher
        .and_then(|p| {
            p.lock()
                .map(|s| {
                    s.get_node().map(|v| {
                        get_node_name_from_weak(&v.get_weak())
                            .unwrap_or_else(|| "Unknown".to_string())
                    })
                })
                .ok()
                .map(|v| v.to_string())
        })
        .unwrap_or_else(|| "Unknown".to_string());

    Self {
        topic: value.topic,
        source_node,
        target_node,
        latencies: value.latencies,
    }
}

Comment on lines +25 to +27
sqlite_connection
.execute("DROP TABLE IF EXISTS blobs", ())
.map_err(|e| BinarySQLStoreError::SQLiteError(e))?;
Owner

Not sure why you are deleting the table from a new file that is empty. If you expect that something else accessed the table between the creation of the database file and now, the initialization should be done in a transaction to prevent race conditions. If nothing will access it, then this code is redundant.
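If concurrent access during initialisation were a real concern, the whole setup could be wrapped in a single transaction, e.g. (a sketch; the table name is assumed from the snippet above):

```sql
BEGIN IMMEDIATE;             -- take the write lock before touching the schema
DROP TABLE IF EXISTS blobs;  -- only needed if the file may already be populated
CREATE TABLE blobs (name TEXT PRIMARY KEY, data BLOB NOT NULL);
COMMIT;
```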

),
v => v.to_string(),
},
None => todo!(),
Owner

Did you forget to implement it? If it should panic, use panic! instead.
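For contrast, a sketch of the panic! variant (hypothetical function; the message documents why the branch is believed unreachable, whereas todo!() reads as unfinished work):

```rust
// Hypothetical stand-in for the PR code path: panic! with context instead
// of todo!(), making the intent explicit if the branch is ever hit.
fn caller_name(name: Option<&str>) -> String {
    match name {
        Some(v) => v.to_string(),
        None => panic!("callback caller missing: expected by construction"),
    }
}
```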

@wentasah
Collaborator

@wentasah Can you review and test it after the changes? I have already spent way more time with this PR than I planned. This review is not complete.

Definitely. But it will be a bit slower as I'm on vacation next week. I should have some time at least some evenings.

@Pavel-Sedlacek
Contributor Author

Thank you for all the feedback; I will start working on the fixes tomorrow. I'm sorry for taking up this much of your time. If you don't manage to write a detailed description for each change, don't worry, just say what you'd like to get fixed in general. I expect there to be only one other major PR, so there is no rush.

Some of the issues you mentioned (namely the analysis name inconsistencies) were fixed in later commits, which I forgot to include here; the code as a whole is tested and works (to the best of my knowledge), just not these partial PRs.

@Pavel-Sedlacek Pavel-Sedlacek marked this pull request as draft February 22, 2026 09:00
Fixes several issues regarding useless closures, String allocations, wrongly named properties, ...
Collaborator

@wentasah wentasah left a comment

So far just a few comments about UX. Try to address them and I'll then go deeper.


/// Output formats to save the analyses as
///
/// Defaults to only 'bundle'
Collaborator

--help says confusingly:

          Defaults to only 'bundle'
          
          [default: binary]

So what is the default?

///
/// Defaults to only 'bundle'
#[arg(long, short = 'f', value_delimiter = ',', default_values_t = vec![OutputFormat::Binary])]
output_format: Vec<OutputFormat>,
Collaborator

The value is a vector, but from the doc string, it's not clear what the individual values mean and how to specify them. I guess the doc should mention "This option can be specified multiple times" (but see below).

From the code, it looks like this option acts as a filter to not write certain files. This is quite a weird user interface. I guess users don't care about the formats but about the analyses, which they can already enable individually.

I think you just want a switch to say whether to use the new "bundle" format or the old formats. Can you think of a use case, where one wants both formats at the same time?

pub const REAL_UTILIZATION: &str = "real_utilization.txt";
pub const SPIN_DURATION: &str = "spin_duration.json";

pub const BINARY_BUNDLE: &str = "binary_bundle.sqlite";
Collaborator

This name is not very meaningful to users. It describes an implementation detail: that it's binary. What about calling it r2ta_results.sqlite?

use clap::{Args, ValueEnum, ValueHint};
use derive_more::Display;

const DEFAULT_BUNDLE_NAME: &str = "binary_bundle.sqlite";
Collaborator

It would be better not to duplicate the default from analysis_args.rs.

use derive_more::Display;

const DEFAULT_BUNDLE_NAME: &str = "binary_bundle.sqlite";
const DEFAULT_OUTPUT_NAME: &str = "extract_data.json";
Collaborator

Is it necessary to have the default output name? I guess the extract command will be called mainly by the GUI, and it can always supply some value (or even read it from stdout).

pub struct ExtractArgs {
/// Identifier of the element for which to extract the data
///
/// - For nodes (graphviz nodes) the namespace (ROS node), type (ROS interface) and parameters (ROS topic) need to be specified
Collaborator

Wrap the text to a reasonable width. Separate the paragraph from the next with an empty line (otherwise --help prints both on a single line).

Give an example, in which format these things are specified, because it's not clear from your description.

Also, namespace and ROS node name are different things. Clarify what you mean.

/// - For nodes (graphviz nodes) the namespace (ROS node), type (ROS interface) and parameters (ROS topic) need to be specified
/// - For edges (graphviz edges) name (type + topic) of the source and target node should be provided
///
/// The expected format is URL encoded map
Collaborator

Correct would be URL-encoded, but I don't know which kind of "map" you have in mind, so more information (and an example) is needed.

CallbackDuration,
/// Delays between callback or timer activations
#[display("Delays between activations")]
ActivationsDelay,
Collaborator

Suggested change
ActivationsDelay,
ActivationDelays,

(this is for --help output)

/// The expected format is URL encoded map
element_id: String,

/// The property to extract from the node
Collaborator

Suggested change
/// The property to extract from the node
/// The property to extract from the node.

This commit addresses some UX concerns regarding the CLI and documentation
Owner

@skoudmar skoudmar left a comment

Just a few more comments.

README.md Outdated
## Analyze
This command analyzes the traces and saves relevant information for later use into JSON, TXT and DOT files.

<!-- `$ cargo run analyze -h | sed 's/ \[default:/\n \[default:/g'` -->
Owner

Suggested change
<!-- `$ cargo run analyze -h | sed 's/ \[default:/\n \[default:/g'` -->
<!-- `$ cargo run analyze --help | sed 's/ \[default:/\n \[default:/g'` -->

Please keep the full, detailed --help output instead of a summary with -h.

Comment on lines +95 to +99
/// Flag whether to bundle all outputs into a single file or not
///
/// Defaults to only 'bundle'
#[arg(long, short = 'f', value_delimiter = ',', default_values_t = vec![OutputFormat::Binary])]
output_format: Vec<OutputFormat>,
/// Defaults to only `true`
#[arg(long, default_value = "true")]
bundle_output: bool,
Owner

Clap has a default action ArgAction::SetTrue:

  • If this flag is not provided, it is true (the default value)
  • If this flag is provided, it is set to true (by the default action)

So this can never be false.

Note: Changing the action to something like set to false is an antipattern. A better approach is to rename it to no_bundle_output or a similar name and invert the logic.

Collaborator

Even better would be --output-separate-files or something similar.

Fixed impossible bundle_output flag for the analyze subcommand
@wentasah
Collaborator

It seems that the "bundle" format uses two different binary formats. One is "sqlite" and the second is "postcard". Sqlite is used only for what was previously a separate file, and postcard for the content of the file. The division between the two seems a bit arbitrary. If the trace is huge, extracting the data for a single figure (let's say message latency) would currently require deserializing the message latencies of all messages and then throwing away all but one. Wouldn't it be better to have separate sqlite rows for the latencies of different messages so that extract deserializes just what's needed?

And similarly for other types of data.
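A per-element layout along these lines (hypothetical table and column names) would let extract deserialize only the requested row:

```sql
-- One row per graph element instead of one blob for the whole analysis.
CREATE TABLE message_latencies (
    id INTEGER PRIMARY KEY,  -- element id, e.g. a dependency-graph edge
    data BLOB NOT NULL       -- postcard blob with only this element's latencies
);
```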

@Pavel-Sedlacek
Contributor Author

I did consider that as an option. It will take a bit more manual SQL, but may be worth it in the end. I will have a look at it, but the most reasonable solution seems to be saving the data for each element as a postcard blob.

This commit changes how the data is stored inside the database. Before, it stored each analysed property as a row with all the data in a binary blob; now it stores each property in its own table and each element as a row
@skoudmar
Owner

skoudmar commented Feb 26, 2026

The discussion does not give me the impression of thorough research or a plan. So I would like to get some things clarified.

  1. What is the purpose of the binary bundle?

    • Is it primarily so that you can do fast, selective lookups on the data because the JSON was slow and needed full parsing?
  2. Do you expect or want to support cases where the binary bundle is used by third-party software?

    • Will it be deserialized by third-party Rust software?
    • Will it be deserialized by third-party non-Rust software?
  3. What other options have you considered for serialization? This will depend on the previous answer, but some options are:

    • rkyv with an mmapped, zero-copied B-tree map
    • Cap'n Proto

If you have already decided to use SQLite because you think it is the best option:

  1. What will you store in one row: individual results? One series of values for one entity? The entire analysis results?

  2. What will you use as a primary key?

  3. Will you build an index to speed up queries?

  4. How will you handle variable-length data, e.g., strings, arrays?

  5. Will you use a foreign key, i.e., reference data from other tables? If so, I will require a datamodel diagram.

@Pavel-Sedlacek
Contributor Author

Pavel-Sedlacek commented Feb 26, 2026

I believe there was enough research, but to answer your questions:

  1. It is meant to speed up the loading of data (be it callback duration, message latency, ...) for a single element (dependency graph node) as parsing the JSON takes too long

  2. I personally do not really expect this to be utilised by any third party tools.

  3. Alternatives I did consider are:

  • bincode tar archives (the crate got discontinued in the meantime)
  • flatbuffers (which, from what I understand, is similar to Cap'n Proto; both require an external schema definition) and plain archiving (tar, zip)

I'm in no way decided on SQLite; while I think it is a decent option here, there certainly may be better options (rkyv seems to be one of those). But to address those concerns:

  1. Right now each analysis output resides in its own table, and each row contains the ID of the element and the serialised data for that analysis (say, callback durations)
  2. The PK is the hash of a custom ID of an element (consisting of ROS node, interface type and topic)
  3. The querying is always done only on the primary key, which is indexed by default
  4. There is no need, really. There are always only two columns:

  • the PK, which is an INT (the ID is hashed to an integer)
  • the data, which is a BLOB and, depending on the actual size, will get stored separately from the indexed rows

  5. I did not plan to make it this "robust" and don't really see a point. Storing the elements in a separate table and referencing them from the various analyses would not help in any way; it would just make the output larger
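The hashed-ID primary key described above can be sketched with std's DefaultHasher (note this hasher is not guaranteed stable across Rust versions, so a real store would want a pinned hash function; all names here are illustrative):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical composite element ID (ROS node, interface type, topic),
// hashed down to an i64 suitable for an SQLite INTEGER primary key.
#[derive(Hash)]
struct ElementId<'a> {
    node: &'a str,
    interface: &'a str,
    topic: &'a str,
}

fn element_pk(id: &ElementId) -> i64 {
    let mut h = DefaultHasher::new();
    id.hash(&mut h);
    h.finish() as i64
}
```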

@wentasah wentasah changed the base branch from main to graphs March 2, 2026 15:30
This commit simplifies the identification of nodes during data extraction from the output bundle by using the dependency graph serial IDs instead of the full name. It also restructures the SQLite schema by adding new columns to the tables.

All properties stored in the bundled output are obtained from the dependency graph analysis.

The dependency graph DOT file is now included in the bundled SQLite file and can be extracted through the CLI.
@Pavel-Sedlacek
Contributor Author

Pavel-Sedlacek commented Mar 4, 2026

I changed the way the data is stored in the SQLite file as discussed. The file now contains a version and the dependency graph, and the tables with analysed properties have these schemas:

(schema screenshots)

Collaborator

@wentasah wentasah left a comment

A few more comments related to the UI.


/// Flag whether to bundle all outputs into a single file or not
///
/// Defaults to only `true`
Collaborator

Why only? And this can be removed, as clap prints default values automatically where it makes sense.

///
/// Defaults to only `true`
#[arg(long)]
no_bundle_output: bool,
Collaborator

I still think that a non-negative name (such as --legacy-output-files or --separate-output-files) would be better.

README.md Outdated
## Viewer
This command is reserved for later use. Builtin .dot graphs viewer.

<!-- `$ cargo run viewer --help | sed 's/ \[default:/\n \[default:/g'` -->
Collaborator

Suggested change
<!-- `$ cargo run viewer --help | sed 's/ \[default:/\n \[default:/g'` -->
<!-- `$ cargo run viewer --help` -->

The sed command seems to add a second empty line before each default. I don't understand why it's needed. I think one empty line is sufficient.

README.md Outdated

<!-- `$ cargo run analyze --help | sed 's/ \[default:/\n \[default:/g'` -->
```
Analyze a ROS 2 trace and generate graphs, JSON or bundle outputs
Collaborator

Suggested change
Analyze a ROS 2 trace and generate graphs, JSON or bundle outputs
Analyze a ROS 2 trace and store the result either as a binary bundle or separate files. See the extract subcommand for how to work with the binary bundle.

README.md Outdated
Comment on lines +130 to +131
--binary-bundle [<FILENAME>]
Filename or directory of the binary bundle output
Collaborator

Can it really be a directory? If yes, then the metavar should be something like FILENAME_OR_DIRECTORY. But I think it can be just a filename (either absolute or relative). If it's relative, it's relative to OUT_DIR.

README.md Outdated
Comment on lines +301 to +302
-i, --input-path <INPUT>
The input path, either a file of the data or a folder containing the default named file with the necessary data
Collaborator

Which kind of input is this? Should that be the r2ta_results.sqlite file? Or something else?

README.md Outdated
```

## Extract
This command retreives data from binary analysis output for the specified ROS interface or channel
Collaborator

Suggested change
This command retreives data from binary analysis output for the specified ROS interface or channel
This command retrieves various data from the "binary bundle" produced by the analysis subcommand.

README.md Outdated

<!-- `$ cargo run extract --help | sed 's/ \[default:/\n \[default:/g'` -->
```
Retreive data from bundled analysis results file into JSON format
Collaborator

Don't mention JSON, as the graph is probably in dot format. Or at least I'd prefer it to be.

README.md Outdated
Options:
-i, --input-path <INPUT> The input path, either a file of the data or a folder containing the default named file with the necessary data
-v, --verbose... Increase logging verbosity
-o, --output-path <OUTPUT> The output path, either a folder to which the file will be generated or a file to write into
Collaborator

Why distinguish between files and folders again? I'd prefer just -o, --output <FILE>. If this is not specified, the output would go to stdout.

README.md Outdated
help Print this message or the help of the given subcommand(s)

Options:
-i, --input-path <INPUT> The input path, either a file of the data or a folder containing the default named file with the necessary data
Collaborator

Suggested change
-i, --input-path <INPUT> The input path, either a file of the data or a folder containing the default named file with the necessary data
-i, --input <FILE> The input binary bundle. [default: r2ta_results.sqlite]

This commit addresses several issues related to the CLI
Collaborator

@wentasah wentasah left a comment

Next set of comments.

/// Flag whether to bundle all outputs into a single file or not
///
/// Defaults to only `true`
/// Flag whether to bundle all outputs into a single file or export each analysis as a separate file
Collaborator

Suggested change
/// Flag whether to bundle all outputs into a single file or export each analysis as a separate file
/// Store the results into multiple files rather than to the binary bundle

README.md Outdated
-o, --output-path <OUTPUT> The output path, either a folder to which the file will be generated or a file to write into
-q, --quiet... Decrease logging verbosity
-h, --help Print help
-i, --input-path <FILENAME> Path to the r2ta_results.sqlite file from which to retreive the data [default: r2ta_results.sqlite]
Collaborator

Outdated. Needs to be regenerated.

/// The output path, either a folder to which the file will be generated or a file to write into
#[clap(long, short = 'o', value_name = "OUTPUT", value_hint = ValueHint::AnyPath)]
#[clap(long, short = 'o', value_name = "FILENAME", value_hint = ValueHint::AnyPath)]
output_path: Option<PathBuf>,
Collaborator

Suggested change
output_path: Option<PathBuf>,
output: Option<PathBuf>,

(to be consistent with input)

#[clap(long, short = 'i', value_name = "FILENAME", value_hint = ValueHint::FilePath, default_value = analysis_args::filenames::BINARY_BUNDLE)]
input: Option<PathBuf>,

/// The output path, either a folder to which the file will be generated or a file to write into
Collaborator

The doc comment should stay here. Perhaps something like: "Store the extracted data to the given FILENAME"

#[display("viewer")]
Viewer(viewer_args::ViewerArgs),

/// Retreive data from bundled analysis results file into JSON format
Collaborator

Suggested change
/// Retreive data from bundled analysis results file into JSON format
/// Retrieve data from binary bundle produced by the analysis

store.write_into(
"metadata",
"(version, graph)",
[(1, dot_graph.to_string())].into_iter(),
Collaborator

Version 1 should be defined somewhere else so that extract and other commands can access it.

It would be better if the version is actually stored in binary_sql_store.rs and used just internally. See other comments.

Owner

I agree, and I would prefer not to expose any implementation details (like structure or table/column names) of the SQLite tables outside of the sql store. If something needs to be exposed, use the Rust type system instead of strings (structs, enums, ...).

Comment on lines +116 to +137
AnalysisProperty::MessageLatencies.table_name(),
&[
"id int",
"source_node text",
"destination_node text",
"topic text",
"data blob",
],
)?;
store.write_into(
AnalysisProperty::MessageLatencies.table_name(),
"(id, source_node, destination_node, topic, data)",
a.message_latencies(&dot_graph).iter().map(|m| {
(
m.id,
&m.name.source_node,
&m.name.destination_node,
&m.name.topic,
postcard::to_allocvec(&m.messages_latencies).unwrap(),
)
}),
5,
Collaborator

The definition of the "binary store" format is split across multiple places: here, in extract, and I don't know where else. It would be better if all code related to the actual format lived in one place, perhaps in binary_sql_store.rs. This would provide a typed interface for writing and reading data. It would then be easier to maintain store version numbers correctly, i.e. incrementing the version when something changes in the format and checking which read operations are compatible with which versions.

Comment on lines +58 to +66
store
.read::<CallbackDurationExport>(
property.table_name(),
"id, data, interface, node",
"id = ?1",
(element_id,),
)
.map_err(DataExtractionError::SourceDataParseError)?
.callback_durations,
Collaborator:

As noted elsewhere, this would be better moved into binary_sql_store.rs. You should also add a check for store version number compatibility to detect attempts to read from old (or newer) format versions. For now, the version check could produce just a warning about version mismatch. Failing immediately is probably neither good nor necessary.
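A sketch of the lenient check suggested above: a mismatch yields a warning string rather than an error. The function name and message wording are assumptions, not code from this PR.

```rust
// Hypothetical version check for binary_sql_store.rs: warn on mismatch
// instead of failing, as the review suggests.

/// Returns `Some(warning)` when the file was written by a different
/// format version than the one this binary supports.
pub fn version_mismatch_warning(file_version: u32, supported: u32) -> Option<String> {
    if file_version == supported {
        None
    } else {
        Some(format!(
            "store format version {} does not match supported version {}; \
             extracted data may be incomplete or unreadable",
            file_version, supported
        ))
    }
}
```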

This commit adds proper versioning to the output files and aggregates all code related to the structure of the output file in a single place.
store.write_into(
"metadata",
"(version, graph)",
[(1, dot_graph.to_string())].into_iter(),
Owner:

I agree and I would prefer to not expose any implementation details (like structure or table/column names) of the SQLite tables outside of the sql store. If it needs to be exposed use rust typing system instead of strings (structs, enums, ...).

}

pub fn extract_graph(input: &Path) -> color_eyre::eyre::Result<String> {
let store = BinarySqlStoreV1::from_file(input, false)?;
Owner:

Do not create the sql store file here if it does not exist. rusqlite::Connection::open creates missing files.

input: &Path,
element_id: i64,
property: &AnalysisProperty,
) -> color_eyre::eyre::Result<ChartableData> {
Owner:

Do not create the sql store file here if it does not exist. rusqlite::Connection::open creates missing files.

Comment on lines +10 to +12
/// Path to the r2ta_results.sqlite file from which to retreive the data
#[clap(long, short = 'i', value_name = "FILENAME", value_hint = ValueHint::FilePath, default_value = super::analysis_args::filenames::BINARY_BUNDLE)]
input: Option<PathBuf>,
Owner:

Is it only a filename, or also a directory path? ExtractArgs::input_path suggests it can also be a directory.

Owner:

Suggested change
/// Path to the r2ta_results.sqlite file from which to retreive the data
#[clap(long, short = 'i', value_name = "FILENAME", value_hint = ValueHint::FilePath, default_value = super::analysis_args::filenames::BINARY_BUNDLE)]
input: Option<PathBuf>,
/// Path to the `r2ta_results.sqlite` file from which to retrieve the data
#[clap(long, short = 'i', value_name = "FILENAME", value_hint = ValueHint::FilePath, default_value = super::analysis_args::filenames::BINARY_BUNDLE)]
input: Option<PathBuf>,

if args.bundle_output()
&& let Some(path) = args.binary_bundle_path()
{
let mut store = BinarySqlStoreV1::from_file(&path, true)?;
Owner:

Move this inside the following if because it is not used elsewhere.


if reusing_file {
if clear {
store.clear()?;
Owner:

Does this clear and refresh the metadata, or does it just delete it?

Changed the way missing and existing .sqlite result files are handled during creation
Collaborator @wentasah left a comment:

Comments based on our off-line discussion.

publisher_node: String,
latencies: Vec<i64>,
#[derive(Debug, Serialize, Deserialize)]
pub struct MessageLatencyExport {
Collaborator:

This has the same name as struct MessageLatencyExport in dependency_graph.rs. It's probably not a good idea.

#[derive(Debug, Serialize, Deserialize)]
pub struct MessageLatencyExport {
pub topic: String,
pub source_node: String,
Collaborator:

Add a comment explaining the change from subscriber/publisher to source/destination. Something like

Suggested change
pub source_node: String,
pub source_node: String, // typically publisher or service client/server? or timer?

Comment on lines -75 to +85
let spin_durations: Vec<SpinDurationEntry> = self
.processing_durations
.iter()
.map(|(node, durations)| SpinDurationEntry {
node: node.0.lock().unwrap().get_full_name().unwrap().to_owned(),
spin_duration: durations.clone(),
})
.collect();

serde_json::to_writer(file, &spin_durations)
serde_json::to_writer(file, &self.export())
Collaborator:

There is no need to separate the body into another function. Revert it.

Comment on lines -83 to 100
let latencies: Vec<ExportEntry> = self
.latencies
.iter()
.map(|(callback, latencies)| ExportEntry {
topic: callback
.0
.lock()
.unwrap()
.get_caller()
.unwrap()
.get_caller_as_string()
.unwrap(),
latencies: latencies.clone(),
})
.collect();

serde_json::to_writer(file, &latencies)
serde_json::to_writer(file, &self.export_latencies())
}
Collaborator:

Revert (see below).

if args.bundle_output()
&& let Some(path) = args.binary_bundle_path()
{
if let Some(a) = &self.dependency_graph {
Collaborator:

Suggested change
if let Some(a) = &self.dependency_graph {
if let Some(graph) = &self.dependency_graph {


store.insert(
BinarySqlStoreV1Table::Property(AnalysisProperty::MessageLatencies),
a.message_latencies(&dot_graph).iter().map(|m| {
Collaborator:

I'd prefer something like:

Suggested change
a.message_latencies(&dot_graph).iter().map(|m| {
graph.message_latencies(&dot_graph.edge_ids).iter().map(|m| {

Comment on lines +659 to +666
let from = dot_graph.node_to_id[&k.source()];
let to = dot_graph.node_to_id[&k.target()];

let edge_id = dot_graph
.edges
.iter()
.enumerate()
.find(|(_, e)| e.source == from && e.target == to);
Collaborator:

Don't calculate this multiple times, move it to display_as_dot or DisplayAsDot::new.
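The precomputation suggested here can be sketched as follows: build a (source, target) → edge-id map once, e.g. in DisplayAsDot::new, and do O(1) lookups afterwards instead of scanning `edges` for every latency entry. `Edge` and `edge_index` are simplified stand-ins for the PR's types, not its actual code.

```rust
use std::collections::HashMap;

// Hypothetical one-pass index over the edges; later lookups are map hits
// instead of repeated linear scans.

struct Edge {
    source: usize,
    target: usize,
}

fn edge_index(edges: &[Edge]) -> HashMap<(usize, usize), usize> {
    edges
        .iter()
        .enumerate()
        .map(|(id, e)| ((e.source, e.target), id))
        .collect()
}
```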


fn metadata_table(&self) -> Self::Table;

fn tables(&self) -> &HashMap<Self::Table, SqlTable>;
Collaborator:

Suggested change
fn tables(&self) -> &HashMap<Self::Table, SqlTable>;
fn tables(table: Self::Table) -> SqlTable;

}
}

pub trait BinarySqlStoreBase: Sized {
Collaborator:

Does this need to be separate from BinarySqlStore?

Comment on lines +689 to +701
impl FromRow for ActivationDelayExport {
fn from_row(row: &rusqlite::Row) -> Result<Self, rusqlite::Error> {
Ok(ActivationDelayExport {
id: row.get("id")?,
name: RosInterfaceCompleteName {
interface: row.get("interface")?,
node: row.get("node")?,
},
activation_delays: postcard::from_bytes(&row.get::<_, Vec<_>>("data")?)
.expect("Data must be a serialised list of integers"),
})
}
}
Collaborator:

Don't define this here; keep the database structure solely in binary_sql_store.rs. It may not need to be generic at all. Conversion to a row can be done directly in store.insert_activation_delay() or similar.
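The reverse direction of the suggestion, as a sketch: the store owns the row layout, so analysis code never sees column names or the blob encoding. Struct and field names follow the snippet above; the plain little-endian byte encoding below stands in for postcard, which the real code would use, and `activation_delay_row` is a hypothetical helper.

```rust
// Hypothetical row conversion owned by binary_sql_store.rs, so no
// FromRow/ToRow impl leaks into the analysis modules.

pub struct RosInterfaceCompleteName {
    pub interface: String,
    pub node: String,
}

pub struct ActivationDelayExport {
    pub id: i64,
    pub name: RosInterfaceCompleteName,
    pub activation_delays: Vec<i64>,
}

/// Flattens an export into the column order of the activation-delay
/// table; in the store this would feed a prepared INSERT statement.
fn activation_delay_row(e: &ActivationDelayExport) -> (i64, &str, &str, Vec<u8>) {
    // Stand-in for postcard::to_allocvec(&e.activation_delays):
    let blob: Vec<u8> = e
        .activation_delays
        .iter()
        .flat_map(|v| v.to_le_bytes())
        .collect();
    (e.id, &e.name.interface, &e.name.node, blob)
}
```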
