API and GUI: Creating a highly informative example/jupyter notebook for IC labeling #12

Closed
adam2392 opened this issue Apr 6, 2022 · 26 comments

Comments

@adam2392
Member

adam2392 commented Apr 6, 2022

We'll need an example/notebook that walks any team member through the guidelines for labeling IC components.

Ideally this notebook/example can be used by the team asynchronously to:

  1. load the data
  2. run ICA
  3. manually annotate the components with their best guess and then
  4. save the labels and ICA to a centralized repo.

Reference: https://www.sciencedirect.com/science/article/pii/S0165027015000928?via%3Dihub

@mscheltienne
Member

Additional reference:

@adam2392
Member Author

adam2392 commented May 4, 2022

^ In addition to the above tutorials, and in line with discussion today on Zoom, we want a lightweight pipeline in-house to annotate IC components that has the following features:

  • a GUI that lets the user easily visualize IC components from a BIDS-formatted dataset stored on disk
  • an interface to assign labels to the different IC components (e.g. brain, eye blink, heart beat, etc.)
  • 1-click commands to save to disk, preferably in line with the BIDS format

I wonder if others have already done so. I can ask in the MNE-core dev meeting.

@mscheltienne brought up mnelab, which might have a subset of the above features.

@adam2392
Member Author

For discussion next week: I was thinking that if we want to produce an "annotation" pipeline, we want to fire up the following subplots:

  • topoplot
  • the actual IC time-series
  • power spectrum of the IC time-series
  • dipole plot (is this easy to compute?)
  • single-trial time-course (?)

Possibly also with a corresponding time-series plot of all ICA components. This set of subplots would be repeated for every ICA component, and the user would be required to create a list of ICA component labels. The question is how to make this interactive. At the very minimum, we should have a "utility" function, perhaps in mne-icalabel, that automatically produces all those subplots for a single ICA component, something like:

plot_ica_features(raw, ica, component_index=1)

Following the plots shown here: https://labeling.ucsd.edu/tutorial/labels

@mscheltienne
Member

Agree. The single-trial time-course is optional I think; personally, I'm mostly looking at the first 3 features you listed.
The utility function is a good idea, and it would be nice if it was easy to embed the figure(s) into a Qt widget.

Also, we need to decide on the label classes:
-> Brain / No brain (as I think is the case on swap4ica and on the MARA dataset)
-> Brain / Ocular / Cardiac / Muscular / ... (as e.g. ICLabel)

@adam2392
Member Author

Is there any reason to do all the sub-classes of non-brain (i.e. ocular, cardiac, etc.)?

I feel like the utility is low. On the other hand, labeling them accurately can be beneficial for power-users who want to work with a specific signal(?).

I suppose labeling heart-beat is also nice, cuz hypothetically someone can use EEG and back out "EOG" and "heart-rate" data.

@hoechenberger
Member

For me personally having EOG and ECG labeled as such would be very beneficial

@adam2392
Member Author

Okay, I was on the border, so I'm convinced :p. I suppose we should just follow the ICLabel set of labels then.

@hoechenberger do you know a good pipeline accessible in MNE that we can annotate ICA components like this? I think the current GUI just marks "include/exclude".

@mscheltienne
Member

I think ocular and heartbeat could be labeled as such, as both are super easy to spot. For the others.. I'm not sure there is much gain, and the distinction between 'channel noise' and 'muscle' might be a bit difficult to do.
Let's keep in mind that the first objective of the datasets we will label is to benchmark the models/methods. The training part requires a larger labeled dataset, e.g. MARA or swap4ica, which are labeled in a binary fashion, with brain / no brain only.
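
To compare multi-class annotations against the binary datasets, the labels could be collapsed to brain / no brain. A minimal sketch, assuming the seven class names of the ICLabel scheme (the exact strings are an assumption here) and a hypothetical helper named `to_binary`:

```python
# Class names follow the ICLabel scheme; the exact strings are an assumption.
ICLABEL_CLASSES = (
    "brain",
    "muscle artifact",
    "eye blink",
    "heart beat",
    "line noise",
    "channel noise",
    "other",
)


def to_binary(label: str) -> str:
    """Collapse a multi-class label to 'brain' / 'no brain' (hypothetical helper)."""
    if label not in ICLABEL_CLASSES:
        raise ValueError(f"Unknown label {label!r}; expected one of {ICLABEL_CLASSES}")
    return "brain" if label == "brain" else "no brain"


# Example: manual multi-class annotations for four components.
labels = ["brain", "eye blink", "heart beat", "brain"]
binary = [to_binary(lab) for lab in labels]
print(binary)
```

Keeping the fine-grained labels on disk and collapsing them only at comparison time means nothing is lost if the ocular and cardiac classes turn out to be useful later.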

@hoechenberger
Member

> @hoechenberger do you know a good pipeline accessible in MNE that we can annotate ICA components like this? I think the current GUI just marks "include/exclude".

I'm not exactly sure I fully understand the question; what I do (and what we do in the MNE-BIDS-Pipeline) is that we run ICA.find_bads_eXg() on epochs centered around heartbeat or ocular artifacts (created via create_eXg_epochs()), and then we can select the best-matching components based on the score these functions return.

@mscheltienne
Member

We are looking for a nice GUI, simple to use that displays a couple of features per component (topoplots, PSD, time-series, ...) and lets you select which category these components are; and then we want to adapt it to load and save simply our files and ICs in a format that we can use for benchmarking or training.
Nothing automatic, just some good old interactive buttons :)

@hoechenberger
Member

Honestly I think I would cook up an extremely simple Qt window that:

  • embeds the ICA.plot_properties() figure
  • places a couple buttons below or next to it
  • appends the selection (via button press) to a TSV file

In the MNE-BIDS-Pipeline we create TSVs like this:

component       type    description     status  status_description
0       ica     Independent Component   good    n/a
1       ica     Independent Component   good    n/a
2       ica     Independent Component   good    n/a
3       ica     Independent Component   bad     Auto-detected ECG artifact
4       ica     Independent Component   bad     Auto-detected ECG artifact
5       ica     Independent Component   good    n/a
6       ica     Independent Component   good    n/a
7       ica     Independent Component   good    n/a
8       ica     Independent Component   bad     Auto-detected EOG artifact
9       ica     Independent Component   good    n/a
10      ica     Independent Component   bad     Auto-detected EOG artifact
11      ica     Independent Component   good    n/a
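
A manual label could be recorded in the same layout by rewriting one row. A minimal sketch using only the standard library, assuming the five columns shown above; the helper names and the use of `status_description` to hold the manual label are illustrative assumptions, not what the MNE-BIDS-Pipeline does:

```python
import csv
import io

# Column layout copied from the TSV above.
COLUMNS = ["component", "type", "description", "status", "status_description"]


def read_tsv(text):
    """Parse TSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def write_tsv(rows):
    """Serialize row dictionaries back to TSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def mark_component(rows, component, status, status_description):
    """Return a copy of rows with the status of one component updated."""
    updated = []
    found = False
    for row in rows:
        row = dict(row)  # copy, so the input rows are left untouched
        if int(row["component"]) == component:
            row["status"] = status
            row["status_description"] = status_description
            found = True
        updated.append(row)
    if not found:
        raise KeyError(f"Component {component} is not in the table")
    return updated


example = (
    "component\ttype\tdescription\tstatus\tstatus_description\n"
    "0\tica\tIndependent Component\tgood\tn/a\n"
    "1\tica\tIndependent Component\tgood\tn/a\n"
)
rows = mark_component(read_tsv(example), 1, "bad", "eye blink")
result = write_tsv(rows)
print(result)
```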

@mscheltienne
Member

That's what I had in mind as well. Maybe we can re-use some widget or stuff from mnelab (I did not look into it yet).

@adam2392
Member Author

Reference to the ICA step that creates the "derived" files: https://github.com/mne-tools/mne-bids-pipeline/blob/main/scripts/preprocessing/_04a_run_ica.py

@adam2392
Member Author

Now that #60 has been merged, the next steps are to:

  1. prototype a GUI that essentially calls the API for updating the file
  2. test the GUI on a dataset to see if it functions properly

Ideally we can get this done by early June, as (IIRC) we said at the last meeting; say June 10th? We need this pipeline so that the hs student is able to annotate during the month of July.

For point 1 above, we'll need to make IO-safe calls to update a TSV file from an interactive GUI, but otherwise I don't expect many technical challenges. For the ICA visualization, we can maybe borrow from MNE or MNE-BIDS-Pipeline, or just embed the ICA.plot_properties() figure statically and then have a few buttons?
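
One way to make the TSV update IO-safe is to write to a temporary file in the same directory and then swap it in with `os.replace`, so a crash mid-write never leaves a truncated file behind. A minimal sketch under that assumption; `atomic_write_text` is a hypothetical helper name, and the two-column content is only a placeholder:

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path, text):
    """Write text to path so readers never see a half-written file."""
    path = Path(path)
    # The temporary file lives next to the target so the final rename
    # stays on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp_name, path)  # swap the new file in over the old one
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp_dir:
    tsv = Path(tmp_dir) / "components.tsv"
    atomic_write_text(tsv, "component\tstatus\n0\tgood\n")
    atomic_write_text(tsv, "component\tstatus\n0\tbad\n")  # overwrite in place
    final_text = tsv.read_text(encoding="utf-8")
    leftovers = [p.name for p in Path(tmp_dir).iterdir() if p.name != tsv.name]
```

A GUI button callback would then build the full new TSV text in memory and hand it to this function, rather than appending to the open file.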

@mscheltienne
Member

This dataset can be used for testing: https://openneuro.org/datasets/ds004132/versions/1.0.0
It contains 64-channel ANT device recordings from 19 people (I will update it with more recordings).

I'll soon add a 200+ EGI device dataset.

adam2392 changed the title from "Creating a highly informative example/jupyter notebook for IC labeling" to "API and GUI: Creating a highly informative example/jupyter notebook for IC labeling" on Jun 1, 2022
@adam2392
Member Author

adam2392 commented Jun 1, 2022

TODO:

  • Confirm correctness of ANT BIDS conversion
  • Confirm correctness of Mara BIDS conversion
  • Spin up prototype of GUI and ping Mathieu

@adam2392
Member Author

I've been lagging behind, @mscheltienne, because of some stuff that's come up. Apologies! Hoping to put up a sketch of the code tonight at the very least, so we can take a "look together".

@adam2392
Member Author

Data specs:

  • recordings of at most 1 minute (maybe 2 minutes); we should fix an upper bound so that the GUI plots the whole segment by default. @mscheltienne maybe you can link the post on the MNE forum you mentioned
  • ^ then save the ICA instances computed (see example below)

Workflow specs:

  • open up the BIDS file corresponding to the ICA + the Raw/Epochs instance (at most 1 minute long)
  • show the plot_properties plots with 1 minute of the time-series by default
  • label components with color-coded labels
  • closing the window will then save back to the BIDS files
  • update script to do the next file

Ideally we can embed the time-series plot in a dynamic PyQt GUI (e.g. mne-qt-browser does this for Raw).

cc: @anandsaini024 @mscheltienne

@adam2392
Member Author

adam2392 commented Jun 15, 2022

Dataset tasks

Mara dataset:

  • just need to pipe the ICA decomposition to BIDS
  • Unique cuz we already have "labels", so just need BIDS formatting

ANT (to preprocess and label):

  • filtering the Raw, labeling bad electrodes, rejecting bridged EEG channels, running pyprep
  • running ICA and saving to BIDS

Nihon Kohden Epilepsy EEG dataset (to preprocess and label):

  • if we have Raw, prolly want to run the same preprocessing as for ANT

Let's converge on one pipeline for every dataset:

E.g. https://github.com/mscheltienne/neurotin-analysis/tree/main/neurotin/preprocessing

TODO:

  • @mscheltienne and @adam2392 to converge on the GUI prototype
  • @mscheltienne or @anandsaini024 to send over public script for the entire raw and ICA preprocessing they've done already, so we can hopefully generalize to apply to the other datasets in a similar fashion
  • @adam2392 to spec out basic script that runs the GUI for different datasets to label components

Lmk if I missed anything?

@mscheltienne
Member

mscheltienne commented Jun 15, 2022

Epilepsy EEG -> EGI (different manufacturer, sponge-based electrode)?

@adam2392
Member Author

> Epilepsy EEG -> EGI (different manufacturer, sponge-based electrode)?

Epilepsy EEG is Nihon Kohden, and I'm not sure about the electrode type :p.

@mscheltienne
Member

mscheltienne commented Jun 15, 2022

Oh OK ahah

For the preprocessing pipeline, this is what I would go for:

import itertools

import mne
import pyprep
from mne.preprocessing import ICA, compute_bridged_electrodes


def preprocess_ANT_dataset(raw):
    """Automatic preprocessing pipeline for ANT dataset."""
    # Standard bandpass filter.
    # The 100 Hz upper edge should retain muscle activity.
    bandpass = (1., 100.)  # Hz
    raw.filter(
        l_freq=bandpass[0],
        h_freq=bandpass[1],
        picks="eeg",
        method="fir",
        phase="zero-double",
        fir_window="hamming",
        fir_design="firwin",
        pad="edge",
    )
    # Set montage
    raw.set_montage("standard_1020")
    
    # Detect bad channels, on a copy
    nc = pyprep.find_noisy_channels.NoisyChannels(raw.copy())
    nc.find_all_bads()
    raw.info["bads"] = nc.get_bads()
    
    # Look for bridged electrodes
    # requires MNE >= 1.1.0
    bridged_idx, _ = compute_bridged_electrodes(raw)
    ch_names = [raw.ch_names[k] for k in itertools.chain(*bridged_idx)]
    raw.info["bads"] = list(set(raw.info["bads"] + ch_names))
    
    # CAR reference
    # The reference channel CPz is not added as it would just introduce an
    # additional dependency and is not required for the ICA.
    # The CAR reference excludes the bad channels.
    raw.set_eeg_reference("average", ch_type="eeg", projection=False)
    
    # Fit ICA decomposition on good channels
    # Fit on n_good_channels-1 because of CAR reference.
    picks = mne.pick_types(raw.info, eeg=True, exclude="bads")
    ica = ICA(n_components=picks.size - 1, method="picard")
    ica.fit(raw, picks=picks)
    
    return raw, ica

And here is the post with the rule of thumb for the number of samples to provide: https://mne.discourse.group/t/ideal-length-of-epochs-ideal-number-of-components-for-ica/4865/2
Lucky I had it bookmarked :)

@adam2392
Member Author

> And here is the post with the rule of thumb for the number of samples to provide: https://mne.discourse.group/t/ideal-length-of-epochs-ideal-number-of-components-for-ica/4865/2 Lucky I had it bookmarked :)

The discourse post seems to suggest 1-2 second epochs? However, we want to use 1 minute, right?

@mscheltienne
Member

That is for cleaning by rejecting epochs; the part that interests us is:

> The only thing that matters is the total number of points you provide. In general, it is best to use as much data as possible, but in practice this is of course not possible. As a rule of thumb, for N stable components you need at least kN^2 points for each channel, where N is the number of channels and k is a multiplier which depends on (and increases with) N (see Indep. Comp. Analysis - EEGLAB Wiki for more details).
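
To see what the rule of thumb quoted above implies for the 1-minute question, here is a small worked example. It is only a sketch: the value k = 20 and the 1000 Hz sampling rate are assumptions picked for illustration, not values taken from the post or from our datasets.

```python
import math


def min_ica_samples(n_channels: int, k: float) -> int:
    """Minimum number of time points per channel under the k * N^2 rule."""
    return math.ceil(k * n_channels**2)


def min_ica_duration(n_channels: int, sfreq: float, k: float) -> float:
    """Recording length in seconds needed to reach that number of points."""
    return min_ica_samples(n_channels, k) / sfreq


# Assumed values: 64 channels, k = 20, sampling rate of 1000 Hz.
n_samples = min_ica_samples(64, 20)
duration = min_ica_duration(64, 1000.0, 20)
print(n_samples, duration)
```

With these assumed numbers the rule asks for 81920 points per channel, i.e. roughly 82 seconds, so whether 1 minute is enough depends strongly on the sampling rate, the channel count, and the chosen k.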

@adam2392
Member Author

@anandsaini024 and @mscheltienne I made a new repo to house our common data processing and also final analysis scripting, so we don't have to keep sending gists to each other. I invited you guys as admins on the repo.

I added some details of our discussion to the README.md. Please take a look and lmk if you have any thoughts/comments? Thanks!

@adam2392
Member Author

Can be closed now by the introduction of the GUI and API for labeling: #66


3 participants