Unable to match label with EEG_ (or anything_) #70

Open
josephsdavid opened this issue Apr 18, 2023 · 5 comments

Comments

@josephsdavid

I have a header that looks a bit like this:

```julia
EDF.SignalHeader("EEG_C3-A2", "", "uV", -313.0f0, 313.0f0, -32768.0f0, 32767.0f0, "", 200)
```

Specifically, the label has an `EEG_` prefix (the same issue arises with `ECG_`, etc.), and it cannot be matched unless I manually remove the `EEG_`.

@josephsdavid
Author

Interestingly, if I instead replace `_` with a space, almost everything works as expected!
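A minimal sketch of that workaround, assuming the label string is rewritten before it reaches the label-matching regex (the regex below is the one quoted later in this thread). `normalize_separator` is a hypothetical helper, not part of EDF.jl. It replaces only the first `_` (via `count=1`), on the assumption that later underscores may belong to the sensor specification.

```julia
# Regex used to split a label into signal type and sensor spec
# (requires at least one whitespace character between them).
label_regex = r"[\s\[,\]]*(?<signal>.+?)[\s,\]]*\s+(?<spec>.+)"i

# Hypothetical workaround: turn the first `_` into a space before matching.
normalize_separator(label::AbstractString) = replace(label, "_" => " "; count=1)

label = "EEG_C3-A2"
m = match(label_regex, normalize_separator(label))
println(m[:signal], " | ", m[:spec])  # EEG | C3-A2
```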

@kleinschmidt
Member

The root cause is the regex we use to split the label into its signal type and sensor specification:

```julia
m = match(r"[\s\[,\]]*(?<signal>.+?)[\s,\]]*\s+(?<spec>.+)"i, label)
```
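To see the failure concretely, here is a minimal sketch that applies this regex to both spellings of the label. `split_label` is a hypothetical name used only for illustration.

```julia
# The label-splitting regex as quoted in the comment above.
label_regex_current = r"[\s\[,\]]*(?<signal>.+?)[\s,\]]*\s+(?<spec>.+)"i

# Hypothetical helper: return (signal, spec), or `nothing` if the regex does not match.
function split_label(label::AbstractString)
    m = match(label_regex_current, label)
    m === nothing && return nothing
    return (signal = String(m[:signal]), spec = String(m[:spec]))
end

split_label("EEG C3-A2")  # (signal = "EEG", spec = "C3-A2")
split_label("EEG_C3-A2")  # nothing: the pattern needs at least one whitespace character
```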

Specifically, `[\s,\]]*\s+`, which matches zero or more whitespace characters, commas, or `]`, followed by one or more whitespace characters. I wonder if we should use something like `[\s,\]]*[\s_-]+` instead (with `-` last in the class so it is a literal rather than a range), so that we'd match `EEG_` or `EEG-` even though those are not STRICTLY allowed by the spec.
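A minimal sketch of the suggested change, keeping the rest of the pattern as is. `split_label_proposed` is a hypothetical illustration name, not the project's code.

```julia
# Suggested variant: the final separator may be whitespace, `_`, or `-`.
# `-` is placed last in the class so it is a literal, not a range.
label_regex_proposed = r"[\s\[,\]]*(?<signal>.+?)[\s,\]]*[\s_-]+(?<spec>.+)"i

# Hypothetical helper for illustration only.
function split_label_proposed(label::AbstractString)
    m = match(label_regex_proposed, label)
    m === nothing && return nothing
    return (signal = String(m[:signal]), spec = String(m[:spec]))
end

split_label_proposed("EEG_C3-A2")  # (signal = "EEG", spec = "C3-A2")
split_label_proposed("ECG-II")     # (signal = "ECG", spec = "II")
split_label_proposed("EEG C3-A2")  # (signal = "EEG", spec = "C3-A2")
```

One consequence worth checking: because `-` becomes a separator, a sensor-only label such as `C3-A2` now splits into signal `C3` and spec `A2`, whereas the current regex does not match it at all.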

@kleinschmidt
Member

AFAICT, the only thing we use the extracted `signal` match group for is to throw it away if it matches one of the values in the set of labels being considered...
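A minimal sketch of that usage, assuming a case-insensitive comparison against the set of signal names under consideration. `strip_signal_prefix` is hypothetical and only mirrors the behavior described above; it is not the package's actual function.

```julia
label_regex_strip = r"[\s\[,\]]*(?<signal>.+?)[\s,\]]*\s+(?<spec>.+)"i

# Hypothetical helper: drop the signal-type prefix only when it names one of
# the signals being considered; otherwise return the label unchanged.
function strip_signal_prefix(label::AbstractString, signal_names)
    m = match(label_regex_strip, label)
    m === nothing && return String(label)
    # Assumption: names are compared case-insensitively.
    if lowercase(m[:signal]) in lowercase.(signal_names)
        return String(m[:spec])
    end
    return String(label)
end

strip_signal_prefix("EEG C3-A2", ["eeg", "ecg"])  # "C3-A2"
strip_signal_prefix("EEG_C3-A2", ["eeg", "ecg"])  # "EEG_C3-A2": no match, so nothing is stripped
```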

@kleinschmidt
Member

The spec isn't super clear on this. The EDF+ spec talks about a "standard 'label' text", but it's not obvious that labels are REQUIRED to have this structure (some of the samples it provides do not).

> 2.1. The standard 'label' structure
> The header field 'label' offers 16 ASCII characters. The standard structure consists of three components, from left to right:
>
> - Type of signal (for example EEG).
> - A space.
> - Specification of the sensor (for example Fpz-Cz).
>
> As in all fields, this standard text must be left justified in the 16-character field and then filled out with spaces. In this example, the standard label reads 'EEG Fpz-Cz '. Further possibilities are listed in the signals table below.
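The quoted layout can be sketched as a small builder: type, one space, sensor, left-justified and space-padded to 16 ASCII characters. `standard_label` is a hypothetical helper written from the quoted text, not an EDF.jl function.

```julia
# Hypothetical helper: build an EDF+ "standard" label per section 2.1 above.
function standard_label(signal_type::AbstractString, sensor::AbstractString)
    text = signal_type * " " * sensor
    isascii(text) || throw(ArgumentError("label must be ASCII: $text"))
    length(text) <= 16 || throw(ArgumentError("label longer than 16 characters: $text"))
    return rpad(text, 16)  # left-justify and fill with spaces
end

standard_label("EEG", "Fpz-Cz")  # "EEG Fpz-Cz" followed by 6 spaces
```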

That being said, it's pretty clear that the EDF+ "standard" has an opinion that the separator between the signal type and the sensor specification should be a space, but in practice we don't always see that (e.g. the example above, from an open sleep dataset).

@kleinschmidt
Member

This is a bit complicated, unfortunately, since the signal type itself may contain `_`, so we can't *just* include `_` as a possible separator character: the signal portion uses a lazy match (`.+?`), which would stop at the first `_`, even one that belongs to the signal type.
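One possible way around the lazy-match problem, sketched here as an alternative rather than the project's approach: since the extracted signal is only compared against a known set of names, the code could test those names as explicit prefixes instead of guessing where the signal type ends. `strip_known_prefix` and the underscore-containing name `resp_abd` are hypothetical.

```julia
# Hypothetical helper: if `label` starts with one of `signal_names`
# (case-insensitive) followed by a space, `_`, or `-`, return the remainder;
# otherwise return `nothing`.
function strip_known_prefix(label::AbstractString, signal_names)
    t = strip(label)
    lt = lowercase(t)  # assumes ASCII labels (as EDF requires), so character counts match
    separators = (' ', '_', '-')
    # Longest names first, so "resp_abd" is tried before "resp".
    for name in sort(lowercase.(collect(signal_names)); by=length, rev=true)
        startswith(lt, name) || continue
        rest = chop(t; head=length(name), tail=0)
        # Require a separator right after the prefix, so "EEGX" is not split.
        (!isempty(rest) && first(rest) in separators) || continue
        return String(lstrip(rest, separators))
    end
    return nothing
end

strip_known_prefix("EEG_C3-A2", ["eeg", "ecg"])              # "C3-A2"
strip_known_prefix("RESP_ABD Effort", ["resp", "resp_abd"])  # "Effort"
```

By contrast, a generic lazy regex with `_` as a separator would split `RESP_ABD Effort` after `RESP`. Trying the longest known name first is what keeps the underscore inside the signal type.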
