-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inconsistencies with R SummarizedExperiment and RangedSummarizedExperiment #77
Comments
This behavior is managed differently in the R package through a dedicated constructor method. However, in the Python package, there is no equivalent constructor function. Users are expected to directly instantiate objects by invoking the appropriate class constructors.
Given the complicated relationship between row_data and row_ranges in RangedSummarizedExperiment, we decided not to replicate this behavior in the Python versions of these classes. There is an open issue in the Bioconductor package’s GitHub repository that addresses the inconsistent behavior during mutations of these objects.
it's not immediately clear why they require validation. I don't believe the R implementations enforce validation either. These names are primarily human-readable identifiers for intervals or regions; they are not supposed to be indices.
|
The However, since the |
Making SummarizedExperiments
In Bioconductor there is no RangedSummarizedExperiment that does not have row ranges, if you try to build a RangedSummarizedExperiment without rowRanges in Bioconductor you get a SummarizedExperiment back instead. I think in BiocPy the best way to handle this issue is to make row_ranges required in RangedSummarizedExperiment and/or a SummarizedExperiment/RangedSummarizedExperiment constructor function.
Row data in RangedSummarizedExperiment is incorrect:
In Bioconductor the rowData accessor calls the mcols accessor on the SummarizedExperiment: https://github.com/Bioconductor/SummarizedExperiment/blob/c5b7ca2f8d975af13a18b3b5931449f092657f5c/R/SummarizedExperiment-class.R#L121
In Bioconductor the mcols accessor of RangedSummarizedExperiment calls the mcols accessor of the rowRanges : https://github.com/Bioconductor/SummarizedExperiment/blob/c5b7ca2f8d975af13a18b3b5931449f092657f5c/R/RangedSummarizedExperiment-class.R#L215
Thus in Bioconductor the rowData of a RangedSummarizedExperiment is the mcols of the rowRanges.
In BiocPy the get_row_data method accesses the _rows BiocFrame attribute of the RangedSummarizedExperiment. It is likely necessary to remove the row_data init parameter of RangedSummarizedExperiment as part of fixing this issue, alternatively you could have any supplied row_data be column concatenated to the mcols of the required row_ranges.
There are 4+ row indices in RangedSummarizedExperiment
This relates to the issue raised about GenomicRanges: BiocPy/GenomicRanges#121
The IRanges of the GenomicRanges has a names index, the GenomicRanges object has a names index, the mcols of the GenomicRanges has a row_names index, the row_data of the RangedSummarizedExperiment has a row_names index. None of these are cross validated to ensure they have the same values in the same order, this will likely lead to confusion by users in the future. Which one of these is the correct one? It would be good to resolve this upon object construction.
The text was updated successfully, but these errors were encountered: