improvements to XNAT data fetching #153

@laurencejackson

Currently the XNAT data is fetched using a set of functions that are specific to csc-mlops.

It would be good to investigate whether we can create a class that inherits from the base torch Dataset type to facilitate integration with other torch tools. This would remove a lot of boilerplate and let us roll validation functions into the dataset object.

For example, something like the following would let us use multiple XNAT projects easily. We can inherit most of the dataset functionality from MONAI's CacheDataset (the cache can be disabled by setting cache_rate to 0.0):

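from monai.data import CacheDataset

class XNATDataset(CacheDataset):
    def __init__(self, xnat_configuration, **kwargs):
        self.xnat_configuration = xnat_configuration
        # remaining kwargs (data, transform, cache_rate, ...) are forwarded to CacheDataset
        super().__init__(**kwargs)  # etc.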

The dataset could include functions for validating data (for example, checking that every subject returns an appropriate data object). It could then be used like this:

from torch.utils.data import DataLoader
from mlops.data import XNATDataset

training_data = XNATDataset(project_name, actions, xnat_configuration, transforms, workers, etc)

test_data = XNATDataset(holdout_data_project_name, actions, xnat_configuration, transforms, workers, etc)

train_dl = DataLoader(training_data)
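As a rough illustration of the validation idea mentioned above (checking that every subject returns an appropriate data object), here is a minimal sketch. It is not existing csc-mlops code: `find_invalid_items` and `has_image_and_label` are hypothetical names, and the `"image"`/`"label"` keys are an assumption about what each item looks like. It only relies on the dataset supporting `len()` and integer indexing, which any map-style torch dataset does, so a plain list stands in for the dataset here.

```python
from typing import Any, Callable, List, Sequence, Tuple


def find_invalid_items(
    dataset: Sequence[Any],
    is_valid: Callable[[Any], bool],
) -> List[Tuple[int, str]]:
    """Return (index, reason) pairs for items that fail to load or fail the check."""
    problems = []
    for index in range(len(dataset)):
        try:
            item = dataset[index]
        except Exception as exc:  # a failed fetch or transform counts as invalid
            problems.append((index, f"raised {type(exc).__name__}: {exc}"))
            continue
        if not is_valid(item):
            problems.append((index, "failed validity check"))
    return problems


def has_image_and_label(item: Any) -> bool:
    # Hypothetical check: each item is a dict with non-None "image" and "label".
    return (
        isinstance(item, dict)
        and item.get("image") is not None
        and item.get("label") is not None
    )


# Stand-in for a dataset: a plain list behaves like a map-style dataset here.
example_items = [
    {"image": [0.1, 0.2], "label": 1},
    {"image": None, "label": 0},
    {"label": 1},
]
problems = find_invalid_items(example_items, has_image_and_label)
```

If this approach were adopted, a function like this could become a method on the dataset class and be run once before training, so that subjects with missing data are reported up front instead of failing mid-epoch.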

This would require some exploratory work to check that it all looks good and works with PyTorch Lightning, MONAI, etc., but it would be really useful in simplifying the DataModule structure.
