`mcap filter`: add last-per-channel filtering semantics #1456

james-rms · 2025-09-23T04:51:24Z

Changelog

mcap filter: added --include-last-per-channel-topic-regex option. This includes the last message before the filter start on selected topics. This can be used to retain infrequently-logged topics in filtered MCAPs.

Docs

None.

Description

Adapting @DaleKoenig's changes from #1426, this PR adds last-per-channel inclusion semantics for selected topics.

Differences from Dale's PR include:

- Adds test coverage.
- Tightens up some error messaging around invalid regexes.
- Refers to the feature as "last-per-channel" rather than "latched". Latching of topics is a ROS-ism; I think it's a little clearer to omit that.
- Removes the short form of the flag, `-l`. This feels specialized enough that it doesn't warrant a short flag, and I think we should keep `-l` reserved for a more fundamental feature down the line.
- Removes the behavior where the last-per-channel message has its log time updated to the filter start time. Changing the logged timestamp of a message feels too invasive to me, and I think the only benefit it provides is that the resulting MCAP time range fits the start and end time.
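As an illustration of the error-messaging point in the list above, here is a minimal Go sketch of compiling user-supplied topic patterns so that a failure names both the flag and the offending pattern. The helper name `compileTopicMatchers` and the error wording are assumptions made for this example and are not taken from the PR's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileTopicMatchers is a hypothetical helper, not the actual mcap CLI code.
// It compiles each pattern and, on failure, reports which flag and which
// pattern could not be compiled, wrapping the underlying regexp error.
func compileTopicMatchers(flagName string, patterns []string) ([]*regexp.Regexp, error) {
	matchers := make([]*regexp.Regexp, 0, len(patterns))
	for _, pattern := range patterns {
		re, err := regexp.Compile(pattern)
		if err != nil {
			return nil, fmt.Errorf("invalid regex %q for --%s: %w", pattern, flagName, err)
		}
		matchers = append(matchers, re)
	}
	return matchers, nil
}
```

Wrapping with `%w` keeps the original `regexp` error available to callers through `errors.Unwrap`, while the message itself tells the user which argument to fix.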

| Before | After |
| --- | --- |
| `mcap filter` excludes all messages before the requested filter start time. This can mean that infrequently logged topics like `/tf_static` get dropped entirely, and the user may not want that. | `mcap filter --include-last-per-channel-topic-regex /tf_static` will include the last logged `/tf_static` message before the filter start time, as well as all messages matching the filter. |

Fixes: #1454
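The semantics described above can be sketched in a few lines of Go. This is a simplified model, not the PR's implementation: the `msg` type, the function name, the use of the topic as the channel key, the half-open `[start, end)` range, and the assumption that input is ordered by log time are all choices made for this example. Topic include/exclude filters are left out.

```go
package main

import "sort"

// msg is a hypothetical, simplified stand-in for an MCAP message.
type msg struct {
	Topic   string
	LogTime uint64
}

// filterWithLastPerChannel keeps messages with start <= LogTime < end and,
// for topics accepted by retain, also keeps the last message logged before
// start with its original log time. It assumes msgs is ordered by log time.
func filterWithLastPerChannel(msgs []msg, start, end uint64, retain func(topic string) bool) []msg {
	latest := map[string]msg{} // most recent pre-start message per topic
	var inRange []msg
	for _, m := range msgs {
		switch {
		case m.LogTime < start:
			if retain(m.Topic) {
				latest[m.Topic] = m // overwrite: only the last one survives
			}
		case m.LogTime < end:
			inRange = append(inRange, m)
		}
	}
	// Emit retained messages first, sorted for deterministic output.
	out := make([]msg, 0, len(latest)+len(inRange))
	for _, m := range latest {
		out = append(out, m)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].LogTime != out[j].LogTime {
			return out[i].LogTime < out[j].LogTime
		}
		return out[i].Topic < out[j].Topic
	})
	return append(out, inRange...)
}
```

A regex-based predicate can be passed as `regexp.MustCompile("^/tf_static$").MatchString`. Because the retained messages keep their original log times, the output's time range can begin earlier than `start`, which is the trade-off discussed in the description.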

DaleKoenig

Thanks for taking this over! I think not updating the timestamp of the message will have some downsides, mainly that there may be a large gap between the first publishes and the data of interest when opening the MCAP in Foxglove. However, I think modifying the timestamp is not ideal either, so leaving it unchanged is reasonable.

defunctzombie · 2025-09-26T18:40:20Z

👍 to the semantics and having this be part of the CLI.

Do we need the -topic-regex suffix rather than having a separate --include-last-per-channel which would indicate that for any topic selected by the filter this rule would apply? So you would do:

mcap filter --include-last-per-channel -y /tf_static -y /imu

My thinking is that the typical desire would be to have any of the topics you wanted to filter on present in the truncated file without having to remember special ones to state separately in the flag.

Code review I leave to someone more familiar with the structure.

james-rms · 2025-09-28T22:08:42Z

See @DaleKoenig's response on the last PR when I asked this question:

I do not think this feature makes sense to apply to topics that are not expected to have transient local handling as:

There are many topics that I would not want this to apply to. For example, if a publish on a topic triggers some event, then pulling a message from that topic a minute forward in order to add it to the record could trigger events to happen during record playback that would not be expected.

defunctzombie · 2025-09-29T00:27:48Z

For example, if a publish on a topic triggers some event, then pulling a message from that topic a minute forward in order to add it to the record could trigger events to happen during record playback that would not be expected

We no longer "pull" the topic forward and keep the original timestamp. Does this comment still apply in that context?

DaleKoenig · 2025-09-29T00:43:17Z

For example, if a publish on a topic triggers some event, then pulling a message from that topic a minute forward in order to add it to the record could trigger events to happen during record playback that would not be expected

We no longer "pull" the topic forward and keep the original timestamp. Does this comment still apply in that context?

I think that part no longer applies. However, I would still worry about the performance impact of creating an extra in-memory copy of every message previous to the specified start time, rather than just the (usually infrequent) topics that were transient-local originally.

jtbandes · 2025-09-30T18:33:14Z

go/cli/mcap/cmd/filter.go

+			for i := range opts.includeLastPerChannelTopics {
+				matcher := opts.includeLastPerChannelTopics[i]


nit: would this work? (same could be applied to other matchers below)

Suggested change

-			for i := range opts.includeLastPerChannelTopics {
-				matcher := opts.includeLastPerChannelTopics[i]
+			for _, matcher := range opts.includeLastPerChannelTopics {
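For reference, the two loop forms in the suggestion behave the same when the loop body only reads the element. A small self-contained Go sketch, assuming a slice of `*regexp.Regexp` (the real field type in `filter.go` may differ) and using hypothetical function names and example patterns:

```go
package main

import "regexp"

// exampleMatchers is illustrative data, not taken from the PR.
var exampleMatchers = []*regexp.Regexp{
	regexp.MustCompile(`^/tf_static$`),
	regexp.MustCompile(`^camera_.*$`),
}

// anyMatchIndexed uses the index-based form from the original diff.
func anyMatchIndexed(matchers []*regexp.Regexp, topic string) bool {
	for i := range matchers {
		matcher := matchers[i]
		if matcher.MatchString(topic) {
			return true
		}
	}
	return false
}

// anyMatchRanged uses the value-based form from the suggested change.
func anyMatchRanged(matchers []*regexp.Regexp, topic string) bool {
	for _, matcher := range matchers {
		if matcher.MatchString(topic) {
			return true
		}
	}
	return false
}
```

Both forms copy the element into a local variable on each iteration, so the choice is mostly about readability.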

jtbandes · 2025-09-30T18:45:16Z

go/cli/mcap/cmd/filter.go

+					// We might still need to write the channel
+					channel, ok := channels[mostRecent.ChannelID]
+					if !ok {
+						continue
+					}
+					if !channel.written {


nit: is there a way to avoid the duplication here with the channel-writing code that already exists immediately below this block? At first glance, it looks like we should be able to take advantage of mostRecent.ChannelID being the same as message.ChannelID and avoid doing this twice.

jtbandes · 2025-09-30T18:47:57Z

go/cli/mcap/cmd/filter.go

 			if err != nil {
 				return err
 			}
+			mostRecent, ok := mostRecentMessageBeforeRangeStart[message.ChannelID]


nit: Should we use messagesBeforeRangeStartWritten to short-circuit this logic if the messages have already been written, since they will never be written again?

jtbandes · 2025-09-30T18:48:59Z

go/cli/mcap/cmd/filter.go

+						mostRecentMessageBeforeRangeStart[message.ChannelID] = message
+						// Copy the data buffer explicitly, to avoid keeping a reference to the greater
+						// `buf` array that underlies `message.Data`.
+						mostRecentMessageBeforeRangeStart[message.ChannelID].Data = append([]byte{}, message.Data...)


Caveat: I am not super familiar with this filter code and it's been a long time since I looked at it, but are we assuming here that the file is ordered by log time? I'm not sure whether repeated calls to lexer.Next(buf) give us in-order reading using the index, but I suspect not. If I'm right, then this feature might not do what it claims to do if the file is disordered. I'm not sure what it would take to fix that, but if we don't think it's worth it to fully support disordered files, should we at least try to detect and warn/error when it happens, and maybe also document the limitation in the CLI help?
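One way the detection mentioned above could look is a small streaming check that remembers the largest log time seen so far. This is a sketch under assumptions: the type and method names are hypothetical, and how the filter would react (warn or return an error) is left open.

```go
package main

// orderChecker tracks the largest log time observed so far.
// It is a hypothetical helper for illustration, not part of the mcap CLI.
type orderChecker struct {
	maxLogTime uint64
	seen       bool
}

// Observe records logTime and reports whether it is in order, that is,
// not earlier than every log time observed before it. An out-of-order
// value does not change the tracked maximum.
func (c *orderChecker) Observe(logTime uint64) bool {
	if c.seen && logTime < c.maxLogTime {
		return false
	}
	c.maxLogTime = logTime
	c.seen = true
	return true
}
```

Calling `Observe(message.LogTime)` once per message in the read loop would be enough to notice the first disordered message, at the cost of one comparison per message.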

jtbandes · 2025-09-30T18:51:49Z

go/cli/mcap/cmd/filter_test.go

+			flags: &filterFlags{
+				startNano:                   50,
+				includeLastPerChannelTopics: []string{"camera_.*"},
+				includeTopics:               []string{"camera_a"},


would it be worth adding a test case for excludeTopics as well?

jtbandes · 2025-09-30T19:01:15Z

go/cli/mcap/cmd/filter.go

+						mostRecentMessageBeforeRangeStart[message.ChannelID] = message
+						// Copy the data buffer explicitly, to avoid keeping a reference to the greater
+						// `buf` array that underlies `message.Data`.
+						mostRecentMessageBeforeRangeStart[message.ChannelID].Data = append([]byte{}, message.Data...)


on another note, the pre-emptive copying strikes me as possibly pessimizing, since we will create new copies of almost every preceding message until we reach the desired range. I see that @DaleKoenig already mentioned this too :)

I'm not sure I understand exactly what the copy is fixing -- would storing a reference to the underlying buf mostly be a concern if the number of channels is high (or total message size across channels is large?) Are we specifically concerned that we might end up holding multiple preceding whole chunks in memory at once? At minimum, I think this comment could be expanded to clarify what we are avoiding (assuming you did some performance testing and found that this solution is best)

Another quick note that we could probably improve it a bit by re-using the buffer if we are replacing an item in the map. Since it's already been copied, if the buffer is large enough we should be able to copy into it without allocating again.

DaleKoenig · 2025-10-01

In my implementation, the reasoning was that the filter reads through the file sequentially and old chunks are not kept in memory, so it is necessary to keep a copy of anything we might want to hold onto and only write at a later time. So it did not seem feasible to keep a reference to the old data without copying it. Implementing a method of keeping the old chunks around when they contain messages we want to write later seemed too invasive/difficult.
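The two points in this thread, copying so that the retained data does not alias a reader buffer that will be overwritten, and reusing the previous copy's storage when replacing a map entry, can be combined in one `append` call. A minimal sketch, assuming a plain `map[uint16][]byte` keyed by channel ID rather than the PR's actual message map; the function name is hypothetical:

```go
package main

// retainCopy stores a private copy of data under channelID. When an earlier
// copy exists and has enough capacity, its backing array is overwritten in
// place; otherwise append allocates a new one. The caller's data slice is
// never aliased, so the reader is free to reuse its own buffer afterwards.
func retainCopy(latest map[uint16][]byte, channelID uint16, data []byte) {
	// latest[channelID] is nil for a new channel; slicing nil to [:0] is valid.
	latest[channelID] = append(latest[channelID][:0], data...)
}
```

With this shape, allocation per channel happens only on the first message and whenever a later message exceeds the existing buffer's capacity, though each preceding message on a retained channel is still copied once.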

DaleKoenig and others added 4 commits July 4, 2025 12:00

Allow pulling latched topics to the start of the filtered segment

76b6581

Fix lint issues

fbd5833

Add test coverage, rename to avoid 'latched' terminology

af5d929

remove timestamp doctoring

fca985f

james-rms requested review from defunctzombie and jtbandes September 23, 2025 04:51

james-rms requested a review from gasmith as a code owner September 23, 2025 04:51

james-rms mentioned this pull request Sep 23, 2025

mcap CLI: Allow pulling latched topics to the start of the filtered segment #1426

Closed

Merge remote-tracking branch 'origin/main' into jrms/add-last-per-channel-regex

96fdfe7

github-actions bot deployed to mcap (Preview) September 23, 2025 04:58 View deployment

james-rms changed the title ~~Jrms/add last per channel regex~~ mcap filter: add last-per-channel filtering semantics Sep 23, 2025

github-actions bot deployed to mcap (Preview) September 23, 2025 05:03 View deployment

james-rms requested a review from sofuture September 23, 2025 21:22

DaleKoenig reviewed Sep 24, 2025

fix lint

92f91ef

james-rms force-pushed the jrms/add-last-per-channel-regex branch from 7eaedc4 to 92f91ef Compare September 24, 2025 02:28

github-actions bot deployed to mcap (Preview) September 24, 2025 02:32 View deployment

jtbandes approved these changes Sep 30, 2025

jtbandes reviewed Sep 30, 2025

Update go/cli/mcap/cmd/filter.go

47a0f70

Co-authored-by: Jacob Bandes-Storch <[email protected]>

github-actions bot deployed to mcap (Preview) October 1, 2025 00:57 View deployment
