API and GUI: Creating a highly informative example/jupyter notebook for IC labeling #12

adam2392 · 2022-04-06T18:02:43Z

We'll need an example/notebook that walks any team member through the guidelines for labeling IC components.

Ideally this notebook/example can be used by the team asynchronously to:

load the data
run ICA
manually annotate the components with their best guess and then
save the labels and ICA to centralized repo.

Reference: https://www.sciencedirect.com/science/article/pii/S0165027015000928?via%3Dihub

mscheltienne · 2022-04-13T14:02:03Z

Additional reference:

adam2392 · 2022-05-04T14:22:54Z

^ In addition to the above tutorials, and in line with discussion today on Zoom, we want a lightweight pipeline in-house to annotate IC components that has the following features:

GUI that lets the user easily visualize IC components from a BIDS-formatted dataset stored on disc
interface to select different IC components as different labels (e.g. brain, eye blink, heart beat, etc.)
1-click commands to save to disc, preferably in line with BIDS format

I wonder if others have already done so. I can ask in the MNE-core dev meeting.

@mscheltienne brought up mnelab, which might have a subset of the above features.

adam2392 · 2022-05-10T02:45:16Z

For discussion next week: I was thinking that if we want to produce an "annotation" pipeline, we want to fire up the following subplots:

topoplot
the actual IC time-series
power spectrum of the IC time-series
dipole plot (is this easy to compute?)
single-trial time-course (?)

Possibly also with a corresponding time-series plot of all ICA components. This subplot would be repeated for all ICA components, and the user would be required to create a list of ICA component labels. The question is how to make this interactive. At the very minimum, we should have a "utility" function perhaps in mne-icalabel that automatically produces all those subplots for a single ICA component, perhaps something like:

plot_ica_features(raw, ica, component_index=1)

Following the plots shown here: https://labeling.ucsd.edu/tutorial/labels
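A minimal sketch of what such a utility could look like, assuming it simply wraps MNE's existing `ICA.plot_properties()` and `ICA.plot_sources()` methods. The name `plot_ica_features` and its signature come from the proposal above; it is not an existing mne-icalabel function, and the dipole plot is left out.

```python
def plot_ica_features(raw, ica, component_index):
    """Return the figures describing a single ICA component.

    Hypothetical helper: `raw` is expected to be an mne.io.Raw and `ica` a
    fitted mne.preprocessing.ICA, but only two of the ICA methods are used.
    """
    # plot_properties returns one figure per picked component
    # (topomap, power spectrum, segment image, ...).
    properties_fig = ica.plot_properties(
        raw, picks=[component_index], show=False
    )[0]
    # Time series of this component only.
    sources_fig = ica.plot_sources(
        raw, picks=[component_index], show=False
    )
    return properties_fig, sources_fig
```

Returning the figures instead of showing them leaves the caller free to embed them in a Qt widget later.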

mscheltienne · 2022-05-10T08:50:04Z

Agreed. The single-trial time-course is optional, I think; personally, I'm mostly looking at the first 3 features you listed.
The utility function is a good idea, and it would be nice if it was easy to embed the figure(s) into a Qt widget.

Also, we need to decide on the label classes:
-> Brain / No brain (as I think is the case on swap4ica and on the MARA dataset)
-> Brain / Ocular / Cardiac / Muscular / ... (as e.g. ICLabel)

adam2392 · 2022-05-11T18:59:23Z

Is there any reason to do all the sub-classes of non-brain (i.e. ocular, cardiac, etc.)?

I feel like the utility is low. On the other hand, labeling them accurately can be beneficial for power-users who want to work with a specific signal(?).

I suppose labeling heart-beat is also nice, cuz hypothetically someone can use EEG and back out "EOG" and "heart-rate" data.

hoechenberger · 2022-05-11T19:05:56Z

For me personally having EOG and ECG labeled as such would be very beneficial

adam2392 · 2022-05-11T19:09:17Z

Okay, I was on the border, so I'm convinced :p. I suppose we should just follow the ICLabel set of labels then.

@hoechenberger do you know a good pipeline accessible in MNE with which we can annotate ICA components like this? I think the current GUI just marks "include/exclude".

mscheltienne · 2022-05-11T19:19:07Z

I think ocular and heartbeat could be labeled as such, as both are super easy to spot. For the others.. I'm not sure there is much gain, and the distinction between 'channel noise' and 'muscle' might be a bit difficult to do.
Let's keep in mind that the first objective of the datasets that we will label is to benchmark the models/methods. The training part requires a larger labelled dataset, e.g. MARA or swap4ica, which are labeled binary with brain / no brain only.
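To compare multi-class annotations against binary-labeled datasets, the finer labels can be collapsed after the fact. A small sketch, assuming ICLabel-style class names; the exact spellings and the `to_binary` helper are illustrative, not an existing API.

```python
# ICLabel-style categories; the exact strings are an assumption.
ICLABEL_CLASSES = (
    "brain",
    "muscle",
    "eye",
    "heart",
    "line noise",
    "channel noise",
    "other",
)


def to_binary(label):
    """Collapse a multi-class label to the brain / no brain scheme."""
    if label not in ICLABEL_CLASSES:
        raise ValueError(f"Unknown label: {label!r}")
    return "brain" if label == "brain" else "no brain"
```

Annotating with the finer classes keeps both options open, since the binary scheme can always be derived from them but not the other way around.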

hoechenberger · 2022-05-11T20:11:26Z

> @hoechenberger do you know a good pipeline accessible in MNE with which we can annotate ICA components like this? I think the current GUI just marks "include/exclude".

I'm not exactly sure I fully understand the question; what I do (and what we do in the MNE-BIDS-Pipeline) is that we run ICA.find_bads_eXg() on epochs centered around heartbeat or ocular artifacts (created via create_eXg_epochs()), and then we can select the best-matching components based on the score these functions return.
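A rough sketch of that automatic step, assuming `find_bads_eXg()` stands for `ICA.find_bads_ecg()` and `ICA.find_bads_eog()`, and that the epochs come from `mne.preprocessing.create_ecg_epochs(raw)` and `create_eog_epochs(raw)`. The wrapper name `detect_cardiac_and_ocular` is made up for illustration, and the actual MNE-BIDS-Pipeline code may differ.

```python
def detect_cardiac_and_ocular(ica, ecg_epochs, eog_epochs):
    """Return the component indices flagged as ECG or EOG artifacts.

    `ica` is expected to be a fitted mne.preprocessing.ICA; both methods
    return a (list of component indices, scores) tuple.
    """
    ecg_idx, _ecg_scores = ica.find_bads_ecg(ecg_epochs)
    eog_idx, _eog_scores = ica.find_bads_eog(eog_epochs)
    return {
        "ecg": sorted(int(idx) for idx in ecg_idx),
        "eog": sorted(int(idx) for idx in eog_idx),
    }
```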

mscheltienne · 2022-05-11T20:16:37Z

We are looking for a nice GUI, simple to use that displays a couple of features per component (topoplots, PSD, time-series, ...) and lets you select which category these components are; and then we want to adapt it to load and save simply our files and ICs in a format that we can use for benchmarking or training.
Nothing automatic, just some good old interactive buttons :)

hoechenberger · 2022-05-11T20:22:00Z

Honestly I think I would cook up an extremely simple Qt window that:

embeds the ICA.plot_properties() figure
places a couple buttons below or next to it
appends the selection (via button press) to a TSV file

In the MNE-BIDS-Pipeline we create TSVs like this:

component       type    description     status  status_description
0       ica     Independent Component   good    n/a
1       ica     Independent Component   good    n/a
2       ica     Independent Component   good    n/a
3       ica     Independent Component   bad     Auto-detected ECG artifact
4       ica     Independent Component   bad     Auto-detected ECG artifact
5       ica     Independent Component   good    n/a
6       ica     Independent Component   good    n/a
7       ica     Independent Component   good    n/a
8       ica     Independent Component   bad     Auto-detected EOG artifact
9       ica     Independent Component   good    n/a
10      ica     Independent Component   bad     Auto-detected EOG artifact
11      ica     Independent Component   good    n/a
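For illustration, a table with these columns can be written with the standard library alone. This is a sketch under the assumption that the columns are tab-separated; `write_components_tsv` is a hypothetical helper, not the MNE-BIDS-Pipeline implementation.

```python
import csv

COLUMNS = ["component", "type", "description", "status", "status_description"]


def write_components_tsv(path, n_components, bad=None):
    """Write one row per ICA component.

    `bad` maps a component index to its status description; every other
    component is written as "good" with an "n/a" description.
    """
    bad = bad or {}
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(COLUMNS)
        for idx in range(n_components):
            status = "bad" if idx in bad else "good"
            writer.writerow(
                [idx, "ica", "Independent Component", status, bad.get(idx, "n/a")]
            )
```

A call such as `write_components_tsv("components.tsv", 12, {3: "Auto-detected ECG artifact"})` would mark component 3 as bad and leave the rest as good.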

mscheltienne · 2022-05-11T20:24:04Z

That's what I had in mind as well. Maybe we can re-use some widget or stuff from mnelab (I did not look into it yet).

adam2392 · 2022-05-18T14:13:20Z

Reference to the ICA step that creates the "derived" files: https://github.com/mne-tools/mne-bids-pipeline/blob/main/scripts/preprocessing/_04a_run_ica.py

adam2392 · 2022-05-24T20:14:47Z

Now that #60 has been merged, the next steps are to:

prototype a GUI that essentially calls the API for updating the file
test the GUI on a dataset to see if it functions properly

Ideally we can get this done by early June (IIRC that's what we said last meeting), say June 10th? We need this pipeline so that the HS student is able to annotate during the month of July.

For point 1 above, we'll need to make IO-safe calls to update a TSV file from an interactive GUI, but otherwise I don't think there are too many technical challenges. For the ICA visualization, we can maybe borrow from MNE or MNE-BIDS-Pipeline, or just embed the ICA.plot_properties() figure statically and then have a few buttons?
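One way to keep the TSV update safe against a crash mid-write is to write a temporary file next to the target and swap it in with `os.replace`. The sketch below assumes a tab-separated file with `component`, `status` and `status_description` columns; `update_component_status` is a hypothetical name and not the API added in #60.

```python
import csv
import os
import tempfile


def update_component_status(tsv_path, component, status, status_description="n/a"):
    """Rewrite one component's status without leaving a half-written file."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        fieldnames = reader.fieldnames
        rows = list(reader)

    matches = [row for row in rows if row["component"] == str(component)]
    if not matches:
        raise KeyError(f"Component {component} not found in {tsv_path}")
    for row in matches:
        row["status"] = status
        row["status_description"] = status_description

    # Write to a temporary file in the same directory, then replace the
    # original in a single step so readers never see a partial file.
    directory = os.path.dirname(os.path.abspath(tsv_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(
                f, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
            )
            writer.writeheader()
            writer.writerows(rows)
        os.replace(tmp_path, tsv_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A GUI button callback would then only need to call this function with the selected component and label.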

mscheltienne · 2022-05-25T14:09:21Z

This dataset can be used for testing: https://openneuro.org/datasets/ds004132/versions/1.0.0
It contains 64-channel ANT device recordings from 19 people (I will update it with more recordings).

I'll soon add a 200+ EGI device dataset.

adam2392 · 2022-06-01T14:23:26Z

TODO:

Confirm correctness of ANT BIDS conversion
Confirm correctness of Mara BIDS conversion
Spin up prototype of GUI and ping Mathieu

adam2392 · 2022-06-14T16:01:10Z

I've been lagging, @mscheltienne, because of some stuff that's come up. Apologies! Hoping to put up a sketch of the code at the very least tonight tho, so we can take a look together.

adam2392 · 2022-06-15T15:19:54Z

Data specs:

maximum length of 1 minute (maybe 2 minutes); we should fix an upper bound so that the GUI plots the whole recording by default. @mscheltienne maybe you can link the post on the MNE forum you mentioned
^ then save the ICA instances computed (see example below)

Workflow specs:

open up BIDS file corresponding to the ICA + the Raw/Epochs instance (at most 1 minute lengths)
show the plot_properties plots with 1 minute of the time-series by default
label components with color-coded labels
closing the window will now save back to the BIDS files.
update script to do the next file

Ideally we can embed the time-series plot using a pyQT GUI which is dynamic (e.g. mne-qt-browser does this for Raw).
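The 1-minute default from the specs above could be enforced with a small helper before anything is plotted. A sketch, assuming the input behaves like an `mne.io.Raw` (a `times` array starting at 0, plus `copy()` and `crop()`); `crop_for_display` is a hypothetical name.

```python
def crop_for_display(raw, max_seconds=60.0):
    """Return a copy of `raw` limited to its first `max_seconds` seconds."""
    last_time = float(raw.times[-1])
    if last_time <= max_seconds:
        # Already short enough: keep everything.
        return raw.copy()
    # Work on a copy so the data on disk and in memory stay untouched.
    return raw.copy().crop(tmin=0.0, tmax=max_seconds)
```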

cc: @anandsaini024 @mscheltienne

adam2392 · 2022-06-15T15:28:06Z

Dataset tasks

Mara dataset:

just need to pipe the ICA decomposition to BIDS
Unique cuz we already have "labels", so just need BIDS formatting

ANT (to preprocess and label):

filtering the Raw and labeling bad electrodes, rejecting bridged EEG channels, running pyprep
running ICA and saving to BIDS

Nihon Kohden Epilepsy EEG dataset (to preprocess and label):

if we have Raw, we prolly want to run the same preprocessing as for ANT

Let's converge on a pipeline for every dataset:

E.g. https://github.com/mscheltienne/neurotin-analysis/tree/main/neurotin/preprocessing

TODO:

@mscheltienne and @adam2392 to converge on the GUI prototype
@mscheltienne or @anandsaini024 to send over public script for the entire raw and ICA preprocessing they've done already, so we can hopefully generalize to apply to the other datasets in a similar fashion
@adam2392 to spec out basic script that runs the GUI for different datasets to label components

Lmk if I missed anything?

mscheltienne · 2022-06-15T15:29:11Z

Epilepsy EEG -> EGI (different manufacturer, sponge-based electrode)?

adam2392 · 2022-06-15T15:58:18Z

> Epilepsy EEG -> EGI (different manufacturer, sponge-based electrode)?

Epilepsy EEG is Nihon Kohden and not sure on the electrode :p.

mscheltienne · 2022-06-15T16:03:43Z

Oh OK ahah

For the preprocessing pipeline, this is what I would go for:

import itertools

import mne
import pyprep
from mne.preprocessing import ICA, compute_bridged_electrodes


def preprocess_ANT_dataset(raw):
    """Automatic preprocessing pipeline for ANT dataset."""
    # Bandpass standard filter
    # 100 Hz edge should retain muscle activity.
    bandpass = (1., 100.)  # Hz
    raw.filter(
        l_freq=bandpass[0],
        h_freq=bandpass[1],
        picks="eeg",
        method="fir",
        phase="zero-double",
        fir_window="hamming",
        fir_design="firwin",
        pad="edge",
    )
    # Set montage
    raw.set_montage("standard_1020")
    
    # Detect bad channels, on a copy
    nc = pyprep.find_noisy_channels.NoisyChannels(raw.copy())
    nc.find_all_bads()
    raw.info["bads"] = nc.get_bads()
    
    # Look for bridged electrodes
    # requires MNE >= 1.1.0
    bridged_idx, _ = compute_bridged_electrodes(raw)
    ch_names = [raw.ch_names[k] for k in itertools.chain(*bridged_idx)]
    raw.info["bads"] = list(set(raw.info["bads"] + ch_names))
    
    # CAR reference
    # The reference channel CPz is not added as it would just introduce an
    # additional dependency and is not required for the ICA.
    # The CAR reference excludes the bad channels.
    raw.set_eeg_reference("average", ch_type="eeg", projection=False)
    
    # Fit ICA decomposition on good channels
    # Fit on n_good_channels-1 because of CAR reference.
    picks = mne.pick_types(raw.info, eeg=True, exclude="bads")
    ica = ICA(n_components=picks.size - 1, method="picard")
    ica.fit(raw, picks=picks)
    
    return raw, ica

And here is the post with the rule of thumb for the number of samples to provide: https://mne.discourse.group/t/ideal-length-of-epochs-ideal-number-of-components-for-ica/4865/2
Lucky I had it bookmarked :)

adam2392 · 2022-06-15T16:23:02Z

> And here is the post with the rule of thumb for the number of samples to provide: https://mne.discourse.group/t/ideal-length-of-epochs-ideal-number-of-components-for-ica/4865/2 Lucky I had it bookmarked :)

The discourse post seems to suggest 1-2 second epochs? However, we want to do 1 minute right?

mscheltienne · 2022-06-15T16:30:21Z

That is for cleaning by rejecting epochs; the part that interests us is:

> The only thing that matters is the total number of points you provide. In general, it is best to use as much data as possible, but in practice this is of course not possible. As a rule of thumb, for N stable components you need at least kN^2 points for each channel, where N is the number of channels and k is a multiplier which depends on (and increases with) N (see Indep. Comp. Analysis - EEGLAB Wiki for more details).
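The rule of thumb quoted above can be turned into a quick check of how much data a recording needs. This is only a sketch: the multiplier `k` and the sampling frequency are inputs the thread does not fix, so the values in the usage note are purely illustrative.

```python
def min_samples_for_ica(n_channels, k):
    """Minimum number of samples per channel: k * N^2."""
    return k * n_channels**2


def min_duration_seconds(n_channels, k, sfreq):
    """Recording length in seconds needed to reach k * N^2 samples."""
    return min_samples_for_ica(n_channels, k) / sfreq
```

For example, with 64 channels and an assumed k of 20, `min_samples_for_ica(64, 20)` gives 81920 samples per channel, which at a hypothetical 500 Hz sampling rate corresponds to about 164 seconds of data.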

adam2392 · 2022-06-16T14:06:36Z

@anandsaini024 and @mscheltienne I made a new repo to house our common data processing and also final analysis scripting, so we don't have to keep sending gists to each other. I invited you guys as admins on the repo.

I added some details of our discussion to the README.md. Please take a look and lmk if you have any thoughts/comments? Thanks!

adam2392 · 2022-07-29T17:56:33Z

Can be closed now by the introduction of the GUI and API for labeling: #66

adam2392 mentioned this issue May 18, 2022

[ENH] Add API for bids pipeline annotating #60

Merged

6 tasks

adam2392 changed the title ~~Creating a highly informative example/jupyter notebook for IC labeling~~ API and GUI: Creating a highly informative example/jupyter notebook for IC labeling Jun 1, 2022

adam2392 mentioned this issue Jun 15, 2022

[ENH] Draft GUI for labeling components #66

Merged

6 tasks

adam2392 closed this as completed Jul 29, 2022
