Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specificity and Improvements to captures array #124

Closed
jacobagilbert opened this issue May 25, 2021 · 12 comments
Closed

Specificity and Improvements to captures array #124

jacobagilbert opened this issue May 25, 2021 · 12 comments

Comments

@jacobagilbert
Copy link
Member

jacobagilbert commented May 25, 2021

It is unclear to me whether the captures array elements must cover the entire file duration. Currently it is allowed to define a nonzero sample_start for the first captures element, but it is not possible to define an end point for this capture other than to create another element. It is also not explicit within the spec if optional fields in captures[0] apply until captures[1] is defined (or EOF), or if they persist until explicitly redefined.

I would also propose that a mechanism to exclude portions of data from the captures array should exist but is missing. For example one use of the captures array to provide frequency information for a segment of data; if I have a device that does not retune instantly, this would look like some duration of data at frequency 1, some duration of invalid data, then data at frequency 2, etc. One could argue that annotations could be used to flag this, however it seems awkward to use annotations to convey information that is otherwise conveniently included in the captures array. The same can be true for gain or saturation metadata.

Two suggestions for which I do not have a strong opinion on the best approach:

  • Create a 'data_valid' boolean entry in the captures array
  • Create a 'comment' section that can be used for this purpose
  • Create a sample_end entry that can be used to specify discontinuous chunks of the file.

I also propose the addition of an optional 'gain' field into the core captures definition for the same reasons frequency is included.

These are very simple PRs (some language on capture element scope, addition of a few fields) but I would like some opinions before I do that.

@bhilburn
Copy link
Contributor

Responding in bullet form to organize what I think are related but separate lines of discussion:

  • Regarding discontinuous chunks of the file, that's the intention of the global_index field. Does that not provide the relevant functionality for this item?
  • Excellent point re: optional fields carrying over into the next captures element. I consider this a bug and it should be addressed explicitly by adding text to the spec that says they do NOT carry over, IMO.

Given the two points above, it seems like the additional ask, here, would be to add an optional "num samps" sort of field to captures, is that right? I think this is a topic we've discussed at-length in some of the GRCon SigMF breakouts, and each time have arrived at the same conclusion of not doing it. What are your thoughts?

@jacobagilbert
Copy link
Member Author

I think discontinuities are handled effectively with the global_index provided they are manifested as dropped samples (i.e.: not in the sigmf-data). The issue here is that currently there is no way to specify where a segment stops other than to specify a new segment (for which 'bad' data included in the sigmf-data for whatever reason is one example of why you may want to do this), and there is no way to categorize segments other than by frequency or datetime.

Adding a sample_count field (as there is for annotations) is one solution, though it sounds like the drawbacks have been discussed at length. I think my best suggestion instead is to add an optional description field to captures as there is in global, a simple, non-breaking change.

I agree about language specifying that captures fields apply from the explicitly identified sample_start until the sample_start of the next capture array item or EOF. This inherently prevents capture overlap which is a major reason not to include a sample_count field.

@bhilburn
Copy link
Contributor

Ahh, I see your point, @jacobagilbert. So, it's not that there are "missing samples", but rather samples that you want to ignore because they are bad for whatever reason.

Setting aside the other arguments against a sample_count sort of field, I actually don't think that would be a good solution, here, anyway -- those samples are "real", and genuinely part of the capture, but rather you are trying to indicate something about the utility of them.

I'm spinning on your proposal to add some sort of description field (optional) to each captures element. My knee-jerk is that on the surface that seems like the right approach, but looking deeper my concern is that it means everyone will have their own "standard" for how to describe common things they want do in it. I think your use case, for example, is probably not uncommon.

So, new question: is it worth creating a canonical extension that adds a field to captures for this purpose and specifies labels that can be used for common things to be indicated?

@jacobagilbert
Copy link
Member Author

Samples to ignore, or maybe just flag as not useful, or something that does not fall into "freq W.XYZ" which is the only non-temporal mechanism currently included.

I definitely see your point about description being a free for all; its sort of intended to be non-critical IMO, and maybe it should be there but not be abused for this. If this is really common enough we could just have a 'valid' bool where 0 would allow a machine-readable indication not to use this capture segment and the reason (mid-tune, saturation, or otherwise not useful data) could live in the description in a human readable format. Not trying to pollute this, but i feel fairly strongly that 'description' is a good thing to have as optional for general human readable information.

@bhilburn
Copy link
Contributor

I think the last piece of what you wrote is what I'm questioning, @jacobagilbert --

for general human readable information.

I think if we are going to introduce this feature, there's a lot of benefit to making it machine-readable, which means high specificity. For example, there may be readers that want to automatically "shade" samples that are marked as "not useful" but still display that period of time in a visualizer.

@jacobagilbert
Copy link
Member Author

jacobagilbert commented May 26, 2021

Yes that is a perfect example of how one might use this to focus on important portions of large datasets. Just to summarize (make sure we are on same page) where I think we are on this whole issue:

  1. [AGREE] Add text specifying scope of captures metadata fields explicitly
  2. [UNSURE] Address important/not-important captures segments by either:
    a. Add an optional contents (or some other name) field which is an enum of {'normal', 'invalid', 'saturation', 'unknown', ...?}
    b. Add an optional valid boolean field to indicated whether or not this capture segment should be considered valid, and an optional description field similar to that in the core global object.
  3. [UNSURE] Add optional core:gain captures object.

@bhilburn
Copy link
Contributor

Agreed on the first one. That should be a quick bug & PR, which I think can be filed outside of this ticket and pursued.

On the second, I agree it's a useful feature, but I think we have to work out the best way to go about it (I like some aspects of your (a) and (b), but not others. I think this topic ought to be the focus of the discussion in this ticket now.

I'm for sure not sold on a core:gain field, actually. My issue is that's really more of a description of the radio system that made the capture than the capture itself, and is not actually needed to "understand" the data. I understand how this is related, but I think it's a different topic. I think this discussion ought to get broken out into a separate ticket to think through.

@jacobagilbert
Copy link
Member Author

OK, will PR 1 and move 3 to a different issue; definitely not urgent and I don't have a pressing use case for that.

Thanks.

jacobagilbert pushed a commit to jacobagilbert/SigMF that referenced this issue May 27, 2021
Per discussion in sigmf#124 this makes the scope of captures fields
explicitly defined.

Signed-off-by: Jacob Gilbert <[email protected]>
@jacobagilbert
Copy link
Member Author

Done, and while resolution of the second item would be nice for v1.0 it is not necessary so im going to push this out of the critical path while we mull on it.

@gmabey
Copy link
Contributor

gmabey commented Jun 14, 2021

Does merge #129 effectively close this issue then?

@gmabey
Copy link
Contributor

gmabey commented Jun 14, 2021

I ask because I still see it in the list of v1.0.0 issues.

@bhilburn
Copy link
Contributor

You're right, this can be closed, and remaining discussion can be moved to #130

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants