Data release/add v5 release (AlexsLemonade#127)
* 📝 update release notes for v3

* 🔧 download script for v3-release

* 🔧 update release-notes.md for v5

* 🔧 update download-data.sh for v5

* ✨ add header for pbta-fusion-arriba.tsv.gz

* ✨ add header for pbta-fusion-starfusion.tsv.gz

* 🔧 update release-notes.md for v5, update folder structure

* 🔧 update release-notes.md; add rsem-isoform counts
yuankunzhu authored and jaclyn-taroni committed Sep 25, 2019
1 parent 510b76e commit a3a77ad
Showing 4 changed files with 116 additions and 2 deletions.
29 changes: 29 additions & 0 deletions doc/format/arriba-tsv-header.md
@@ -0,0 +1,29 @@
```
1 gene1
2 gene2
3 strand1.gene.fusion.
4 strand2.gene.fusion.
5 breakpoint1
6 breakpoint2
7 site1
8 site2
9 type
10 direction1
11 direction2
12 split_reads1
13 split_reads2
14 discordant_mates
15 coverage1
16 coverage2
17 confidence
18 closest_genomic_breakpoint1
19 closest_genomic_breakpoint2
20 filters
21 fusion_transcript
22 reading_frame
23 peptide_sequence
24 read_identifiers
25 tumor_id
26 gene1--gene2
27 annots
```
28 changes: 28 additions & 0 deletions doc/format/starfusion-tsv-header.md
@@ -0,0 +1,28 @@
```
1 FusionName
2 JunctionReadCount
3 SpanningFragCount
4 SpliceType
5 LeftGene
6 LeftBreakpoint
7 RightGene
8 RightBreakpoint
9 LargeAnchorSupport
10 FFPM
11 LeftBreakDinuc
12 LeftBreakEntropy
13 RightBreakDinuc
14 RightBreakEntropy
15 annots
16 CDS_LEFT_ID
17 CDS_LEFT_RANGE
18 CDS_RIGHT_ID
19 CDS_RIGHT_RANGE
20 PROT_FUSION_TYPE
21 FUSION_MODEL
22 FUSION_CDS
23 FUSION_TRANSL
24 PFAM_LEFT
25 PFAM_RIGHT
26 tumor_id
```
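The two header listings above give the column order for `pbta-fusion-arriba.tsv.gz` and `pbta-fusion-starfusion.tsv.gz`. As a minimal sketch, assuming the first line of each file is a tab-separated header row carrying these names, a column can be looked up by name instead of by a hard-coded position. The function name `column_index` is hypothetical and not part of the repository.

```shell
# Hypothetical helper: print the 1-based position of a named column
# in the header row (first line) of a gzipped, tab-separated file.
# Usage: column_index <column-name> <file.tsv.gz>
column_index() {
  # awk reads the whole stream rather than exiting early, so the
  # decompressor never hits a broken pipe under `set -o pipefail`.
  gzip -dc "$2" | awk -F'\t' -v name="$1" '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == name && !found) { print i; found = 1 }
      }
    }
    END { exit !found }   # non-zero status when the column is absent
  '
}
```

If the released file matches the Arriba listing, `column_index annots data/release-v5-20190924/pbta-fusion-arriba.tsv.gz` should report the position shown there for `annots`; a non-zero exit status means the name was not found in the header.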
59 changes: 58 additions & 1 deletion doc/release-notes.md
@@ -1,5 +1,57 @@
# release notes
## current release
### release-v5-20190924
- release date: 2019-09-24
- status: available
- changes:
  - [Separated RNA-Seq files](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/121):
    - Created separate RDS files for stranded and polyA RNA-Seq samples
  - [new RNA-Seq counts files](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/14):
    - Added RSEM count matrices for genes and transcripts
  - [new ARRIBA file](https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/92#discussion_r324873300):
    - Add `annots` column header which was removed during FusionAnnotator run
  - [new SNV files](https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/114):
    - Add Lancet VEP-annotated MAF
    - Add VarDict VEP-annotated MAF
  - [new BED interval files](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/3):
    - Add WXS BED (same file used for each variant caller)
    - Add WGS BED files for each variant caller
  - Methods described [here](https://github.com/AlexsLemonade/OpenPBTA-manuscript/blob/master/content/03.methods.md).

- folder structure:
```
data
└── release-v5-20190924
    ├── CHANGELOG.md
    ├── md5sum.txt
    ├── WGS.hg38.lancet.300bp_padded.bed
    ├── WGS.hg38.lancet.unpadded.bed
    ├── WGS.hg38.mutect2.unpadded.bed
    ├── WGS.hg38.strelka2.unpadded.bed
    ├── WGS.hg38.vardict.100bp_padded.bed
    ├── WXS.hg38.100bp_padded.bed
    ├── pbta-cnv-cnvkit.seg.gz
    ├── pbta-cnv-controlfreec.seg.gz
    ├── pbta-fusion-arriba.tsv.gz
    ├── pbta-fusion-starfusion.tsv.gz
    ├── pbta-histologies.tsv
    ├── pbta-snv-mutect2.vep.maf.gz
    ├── pbta-snv-strelka2.vep.maf.gz
    ├── pbta-sv-lumpy.tsv.gz
    ├── pbta-sv-manta.tsv.gz
    ├── pbta-gene-expression-kallisto.polya.rds
    ├── pbta-gene-expression-kallisto.stranded.rds
    ├── pbta-gene-expression-rsem-fpkm.polya.rds
    ├── pbta-gene-expression-rsem-fpkm.stranded.rds
    ├── pbta-gene-counts-rsem-expected_count.polya.rds
    ├── pbta-gene-counts-rsem-expected_count.stranded.rds
    ├── pbta-isoform-counts-rsem-expected_count.polya.rds
    ├── pbta-isoform-counts-rsem-expected_count.stranded.rds
    ├── pbta-snv-lancet.vep.maf.gz
    └── pbta-snv-vardict.vep.maf.gz
```
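Because the release folder ships an `md5sum.txt` next to the data files, a local copy can be checked against it. The sketch below assumes `md5sum.txt` is in the standard `md5sum` output format with file names relative to the release folder, and that the `md5sum` tool from GNU coreutils is installed; neither is stated in these notes. The function name `verify_release` is hypothetical.

```shell
# Hypothetical helper: verify every file listed in md5sum.txt
# inside the given release folder.
# Usage: verify_release <release-folder>
verify_release() {
  # Run in a subshell so the caller's working directory is unchanged.
  # md5sum -c exits non-zero if any listed file is missing or differs.
  ( cd "$1" && md5sum -c md5sum.txt )
}
```

For example, `verify_release data/release-v5-20190924` would print one `OK` or `FAILED` line per listed file.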

## archived release
### release-v4-20190909
- release date: 2019-09-10
- status: available
@@ -28,7 +80,8 @@ data
├── pbta-snv-mutect2.vep.maf.gz
├── pbta-snv-strelka2.vep.maf.gz
├── pbta-sv-lumpy.tsv.gz
-└── pbta-sv-manta.tsv.gz
+├── pbta-sv-manta.tsv.gz
+└── README.md
```

## archived release
@@ -104,3 +157,7 @@ data
├── strelka2.maf.gz
└── tumor-normal-pair.tsv
```




2 changes: 1 addition & 1 deletion download-data.sh
@@ -4,7 +4,7 @@ set -o pipefail

# Use the OpenPBTA bucket as the default.
URL=${URL:-https://s3.amazonaws.com/kf-openaccess-us-east-1-prd-pbta/data}
-RELEASE=${RELEASE:-release-v4-20190909}
+RELEASE=${RELEASE:-release-v5-20190924}

# The md5sum file provides our single point of truth for which files are in a release.
curl --create-dirs $URL/$RELEASE/md5sum.txt -o data/$RELEASE/md5sum.txt -z data/$RELEASE/md5sum.txt
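The `${RELEASE:-...}` form applies the default only when `RELEASE` is unset or empty, so changing the default in this commit does not stop callers from requesting an older release through the environment. A minimal sketch of that behavior in isolation; the function `resolve_release` is hypothetical and exists only to demonstrate the expansion:

```shell
# Hypothetical helper mirroring the script's default handling.
resolve_release() {
  # ${RELEASE:-default} expands to $RELEASE when it is set and non-empty,
  # otherwise to the default written after ":-".
  printf '%s\n' "${RELEASE:-release-v5-20190924}"
}
```

Under the same rule, running the script as `RELEASE=release-v4-20190909 bash download-data.sh` would point it at the v4 folder instead of the new default.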