Merged
39 changes: 36 additions & 3 deletions datasets/openneuro/README.md
@@ -9,9 +9,9 @@ Download all raw anatomical images, JSON sidecars, and participant tables from t
```bash
aws s3 sync --no-sign-request s3://openneuro.org/ data/openneuro \
--exclude '*' \
-  --include '*T1w.*' \
-  --include '*T2w.*' \
-  --include '*FLAIR.*' \
+  --include '*_T1w.*' \
+  --include '*_T2w.*' \
+  --include '*_FLAIR.*' \
--include '*participants.*' \
--exclude '*bidsignore*' \
--exclude '*derivatives*' \
@@ -36,3 +36,36 @@ To re-compute the indexes, run
uv run scripts/index_images.py
uv run scripts/index_participants.py
```
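
The modality `--include` patterns in the sync command gain a leading underscore in this change (`'*T1w.*'` becomes `'*_T1w.*'`). The sketch below illustrates one visible effect: the stricter pattern only matches names where the modality is a separate underscore-delimited suffix. It approximates the AWS CLI pattern matching with Python's `fnmatch`, and the file names (in particular `sub-01_inplaneT1w.nii.gz`) are hypothetical examples, not files known to exist on OpenNeuro.

```python
from fnmatch import fnmatchcase

# Hypothetical object names, for illustration only.
names = [
    "sub-01_T1w.nii.gz",
    "sub-01_T1w.json",
    "sub-01_acq-mprage_T1w.nii.gz",
    "sub-01_inplaneT1w.nii.gz",  # modality glued to another token
]

OLD_PATTERN = "*T1w.*"   # pattern before this change
NEW_PATTERN = "*_T1w.*"  # pattern after this change

old_matches = [n for n in names if fnmatchcase(n, OLD_PATTERN)]
new_matches = [n for n in names if fnmatchcase(n, NEW_PATTERN)]

# Names selected by the old pattern but no longer selected by the new one.
dropped = sorted(set(old_matches) - set(new_matches))
print(dropped)
```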

## Curation

The [`scripts/data_curation.ipynb`](scripts/data_curation.ipynb) notebook performs data exploration and curation. The final counts are:

| datasets | subjects | images | T1w | T2w | FLAIR |
|:----------:|:----------:|:--------:|:-----:|:-----:|:-------:|
| 939 | 39143 | 64287 | 51591 | 9159 | 3537 |

The curated file list is at:

- [`metadata/openneuro_include_filelist.txt`](metadata/openneuro_include_filelist.txt)

The filter criteria are:

- file size between 1 MB and 60 MB
- minimum voxel size >= 0.3 mm
- X, Y (in-plane) voxel size <= 1.5 mm
- Z axis voxel size <= 3 mm
- X, Y, Z axis length between 120 mm and 260 mm

(These criteria were chosen by eyeballing sensible outlier cutoffs for each measure.)
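
The filter criteria above can be sketched as a single predicate. This is a minimal illustration, not the notebook's actual code: the `ImageInfo` type and its field names are hypothetical, 1 MB is assumed to mean 10^6 bytes, all bounds are assumed inclusive, and "axis length" is assumed to mean voxel size times the number of voxels along that axis.

```python
from dataclasses import dataclass

MB = 1_000_000  # assumption: decimal megabytes


@dataclass
class ImageInfo:
    """Hypothetical per-image record; field names are illustrative."""
    size_bytes: int
    voxel_mm: tuple  # (x, y, z) voxel size in mm
    shape: tuple     # (x, y, z) number of voxels


def passes_filters(img: ImageInfo) -> bool:
    vx, vy, vz = img.voxel_mm
    # Assumed definition of axis length: voxel size times voxel count.
    extent_mm = [v * n for v, n in zip(img.voxel_mm, img.shape)]
    return (
        1 * MB <= img.size_bytes <= 60 * MB              # file size
        and min(img.voxel_mm) >= 0.3                     # minimum voxel size
        and max(vx, vy) <= 1.5                           # in-plane voxel size
        and vz <= 3.0                                    # Z axis voxel size
        and all(120.0 <= e <= 260.0 for e in extent_mm)  # axis length
    )


typical = ImageInfo(20 * MB, (1.0, 1.0, 1.0), (176, 256, 256))
thick_slices = ImageInfo(5 * MB, (1.0, 1.0, 5.0), (256, 256, 30))
print(passes_filters(typical), passes_filters(thick_slices))
```

Keeping each criterion on its own line makes it easy to adjust a single cutoff if the outlier thresholds are revisited.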

## Notes

OpenNeuro includes several datasets that are well-known in their own right and often used individually:

- AOMIC-ID1000: [ds003097](https://openneuro.org/datasets/ds003097)
- DLBS: [ds004856](https://openneuro.org/datasets/ds004856)
- SOOP: [ds004889](https://openneuro.org/datasets/ds004889)
- QTIM: [ds004169](https://openneuro.org/datasets/ds004169)

DLBS (ds004856) and the Neurocognitive aging data release (ds003592) are good options for hold-out test sets for brain-age prediction. Both datasets have a large number of subjects spanning a wide age range, and no pathology.
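
A minimal sketch of reserving these two datasets as a hold-out set, splitting by dataset ID so that no subject from a held-out dataset leaks into training. The record layout (dicts with a `dataset` key) is a hypothetical stand-in for whatever the image or participant index actually provides.

```python
# Dataset IDs named above as hold-out candidates.
HOLDOUT_DATASETS = {"ds004856", "ds003592"}


def split_by_dataset(records, holdout=HOLDOUT_DATASETS):
    """Split records into (train, test) by dataset ID.

    `records` is assumed to be an iterable of dicts with a "dataset" key;
    this layout is illustrative, not the repository's actual schema.
    """
    train, test = [], []
    for record in records:
        (test if record["dataset"] in holdout else train).append(record)
    return train, test


records = [
    {"dataset": "ds003097", "subject": "sub-0001"},
    {"dataset": "ds004856", "subject": "sub-0002"},
    {"dataset": "ds003592", "subject": "sub-0003"},
]
train, test = split_by_dataset(records)
print(len(train), len(test))
```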
89 changes: 89 additions & 0 deletions datasets/openneuro/metadata/openneuro_exclude_datasets.yaml
@@ -0,0 +1,89 @@
# non-human species
# https://docs.google.com/spreadsheets/d/1rsVlKg0vBzkx7XUGK4joky9cM8umtkQRpJ2Y-5d6x7c/edit
- ds001981
- ds002134
- ds002307
- ds002374
- ds002551
- ds002868
- ds002870
- ds002885
- ds003325
- ds003463
- ds003646
- ds003647
- ds003830
- ds003928
- ds003929
- ds003959
- ds003989
- ds004092
- ds004114
- ds004116
- ds004125
- ds004127
- ds004145
- ds004161
- ds004254
- ds004265
- ds004305
- ds004402
- ds004441
- ds004465
- ds004509
- ds004598
- ds004620
- ds004632
- ds004644
- ds004738
- ds004784
- ds004797
- ds004819
- ds004913
- ds004959
- ds004962
- ds005077
- ds005093
- ds005137
- ds005186
- ds005233
- ds005236
- ds005402
- ds005424
- ds005431
- ds005467
- ds005496
- ds005497
- ds005521
- ds005534
- ds005605
- ds005635
- ds005636
- ds005687
- ds005688
- ds005839
- ds005895
- ds006123
- ds006218
- ds006269
- ds006366
- ds006402
- ds006407
- ds006613
- ds006663
- ds006670
- ds006691
- ds006721
- ds006746
- ds007028
- ds007195
- ds007392

# bad/missing data
- ds003710 # empty
- ds001365 # negative ages
- ds006224 # empty

# duplicated datasets
- ds001966 # duplicate of ds003012
- ds005595 # subset of HCP-A
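
A minimal sketch of how an exclusion list in this shape could be read and applied. It assumes the file stays a flat list of `- dsNNNNNN` entries with optional trailing comments, and uses a small regex in place of a real YAML parser to keep the example dependency-free; the function names are hypothetical and this is not the repository's actual loading code.

```python
import re

# Matches a list entry such as "- ds001966 # duplicate of ds003012"
# and captures only the leading dataset ID, not IDs inside comments.
ENTRY = re.compile(r"^-\s*(ds\d{6})\b")


def parse_exclude_list(text):
    """Return the dataset IDs listed in a flat YAML-style exclusion list."""
    ids = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            ids.append(match.group(1))
    return ids


def drop_excluded(dataset_ids, excluded):
    """Keep only the dataset IDs that are not on the exclusion list."""
    excluded = set(excluded)
    return [d for d in dataset_ids if d not in excluded]


sample = """\
# bad/missing data
- ds003710 # empty
- ds001365 # negative ages

# duplicated datasets
- ds001966 # duplicate of ds003012
"""
excluded = parse_exclude_list(sample)
kept = drop_excluded(["ds003012", "ds001966", "ds003097"], excluded)
print(excluded, kept)
```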
162,128 changes: 81,064 additions & 81,064 deletions datasets/openneuro/metadata/openneuro_images.csv

Large diffs are not rendered by default.
