Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect start positions #4

Open
Mchicken1988 opened this issue Mar 8, 2023 · 2 comments
Open

Incorrect start positions #4

Mchicken1988 opened this issue Mar 8, 2023 · 2 comments

Comments

@Mchicken1988
Copy link

Hi guys,

I would like to use megadepth for quantifying intron retention events in TCGA patients. When using get_coverage(), I noticed that the start position of all ranges is shifted by 1 nucleotide, which is most likely due to the BED file being 0-based. However, I would expect that in the final GRanges object, the start positions are not 0-based, as these represent true genomic coordinates.

Here is some example where I create a GRanges object with two introns, save them as a BED-File and use the BED-File together with a BigWig-File as input for get_coverage().

library(GenomicRanges)
library(rtracklayer)
library(megadepth)

gr <- GRanges(Rle(c("chr2", "chr15")),
              IRanges(c(74528490,84860073), c(74528750, 84862634)),
              Rle(strand(c("-", "+"))))

export(gr, "/Users/mariokeller/projects/IR_Normoxia_Hypoxia/qmd/introns.bed")

get_coverage("http://duffel.rail.bio/recount3/human/data_sources/tcga/base_sums/SO/MESO/D6/tcga.base_sums.MESO_beb37a2d-e8b0-4674-89ac-dc0f34e0afd6.ALL.bw",
             "mean",
             "/Users/mariokeller/projects/IR_Normoxia_Hypoxia/qmd/introns.bed")

As you can see below the start position of the ranges is shifted by 1 nucleotide, which is not the behavior I would expect.

GRanges object with 2 ranges and 1 metadata column:
      seqnames            ranges strand |     score
         <Rle>         <IRanges>  <Rle> | <numeric>
  [1]     chr2 74528489-74528750      * |      5.01
  [2]    chr15 84860072-84862634      * |      1.40

I can easily adjust this on my own but was wondering whether the additional nucleotide is used for the calculation of the mean coverage.

Best,
Mario

@Mchicken1988 Mchicken1988 changed the title [BUG] Your bug or feature request Incorrect start positions Mar 8, 2023
@lcolladotor
Copy link
Member

Hi,

Thank you for your interest in our work.

In R, and particularly in GRanges objects from the GenomicRanges package, all genomic positions are 1 based and follow R rules. Aka: [1, x]. See how rtracklayer::import() on a BED file internally changes BED position coordinates to R ones.

Best,
Leo

@Mchicken1988
Copy link
Author

Hi Leo,

thanks for the super quick reply. I'm still a little bit confused.

As you said, rtracklayer::import() adjusts the coordinates. Shouldn't I expect megadepth to do the same? Right now, the ranges of the GRanges object reported by get_coverage() have an extra nucleotide at the left end, which is not present in my original ranges.

Best,
Mario

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants