Repartition gene model release tables #1672

Merged 4 commits on Jan 15, 2025
22 changes: 17 additions & 5 deletions browser/help/topics/v4-browser-hts.md
@@ -182,12 +182,15 @@ Row fields:
   - `xstart`: Transcript genomic start position (format: chromosomeposition).
   - `xstop`: Transcript genomic stop position (format: chromosomeposition).
   - `exons`: Array containing transcript exon information.
-  - `feature_type`: Exon type (e.g., CDS).
-  - `start`: Exon genomic start position (position only).
-  - `stop`: Exon genomic stop position (position only).
-  - `xstart`: Exon genomic start position (format: chromosomeposition).
-  - `xstop`: Exon genomic stop position (format: chromosomeposition).
+    - `feature_type`: Exon type (e.g., CDS).
+    - `start`: Exon genomic start position (position only).
+    - `stop`: Exon genomic stop position (position only).
+    - `xstart`: Exon genomic start position (format: chromosomeposition).
+    - `xstop`: Exon genomic stop position (format: chromosomeposition).
   - `reference_genome`: Reference genome associated with this transcript.
+  - `gtex_tissue_expression`: Array containing [GTEx](https://gtexportal.org/home/) v10 information.
+    - `tissue`: The tissue type, e.g. 'brain_cerebellum'.
+    - `value`: The Transcripts Per Million (TPM) value associated with the tissue.
   - `refseq_id`: Transcript RefSeq ID.
   - `refseq_version`: RefSeq version.
 - `hgnc_id`: HGNC gene ID.
@@ -208,6 +211,15 @@ Row fields:
   - `ensembl_version`: Ensembl version.
   - `refseq_id`: Transcript RefSeq ID.
   - `refseq_version`: RefSeq version.
+- `pext`: Struct containing [pext](https://gnomad.broadinstitute.org/help/pext) information.
+  - `regions`: Array containing pext information by region.
+    - `chrom`: The chromosome in which the region is located.
+    - `start`: Region genomic start position (position only).
+    - `stop`: Region genomic stop position (position only).
+    - `mean`: Mean expression across all tissues for the region.
+    - `tissues`: Array containing tissue information.
+      - `tissue`: The tissue type, e.g. 'brain_cerebellum'.
+      - `value`: The pext score for the tissue in the region.
 - `preferred_transcript_id`: Transcript shown on the gene page by default. Field contains MANE Select transcript ID if it exists, otherwise contains Ensembl canonical transcript ID.
 - `preferred_transcript_source`: Source of transcript ID used for `preferred_transcript_id` field; either "`mane_select`" or "`ensembl_canonical`".
 - `gnomad_constraint`: Struct containing gnomAD constraint information for gene. Struct is only present on the GRCh37 Hail Table.
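The nesting of the new `pext` field (struct, then `regions`, then per-region `tissues`) may be easier to follow with a concrete shape. The sketch below is not Hail code and is not taken from the PR: it uses plain Python dictionaries as a stand-in for one row, the numeric values and the second tissue are invented, and the helper name `top_tissue_per_region` is hypothetical. It assumes the nesting implied by the field list above.

```python
# Hypothetical row fragment shaped like the documented `pext` struct.
# All numbers are invented sample values, not real gnomAD data.
row = {
    "pext": {
        "regions": [
            {
                "chrom": "1",
                "start": 100,
                "stop": 200,
                "mean": 0.5,
                "tissues": [
                    {"tissue": "brain_cerebellum", "value": 0.8},
                    {"tissue": "liver", "value": 0.2},
                ],
            },
            {
                "chrom": "1",
                "start": 300,
                "stop": 350,
                "mean": 0.1,
                "tissues": [
                    {"tissue": "brain_cerebellum", "value": 0.05},
                    {"tissue": "liver", "value": 0.15},
                ],
            },
        ]
    }
}


def top_tissue_per_region(pext):
    """Return (start, stop, tissue, value) for the highest-scoring tissue in each region."""
    results = []
    for region in pext["regions"]:
        # Each region carries its own list of per-tissue pext scores.
        best = max(region["tissues"], key=lambda t: t["value"])
        results.append((region["start"], region["stop"], best["tissue"], best["value"]))
    return results


for start, stop, tissue, value in top_tissue_per_region(row["pext"]):
    print(f"{start}-{stop}: {tissue} ({value})")
```

With a real Hail Table the same traversal would go through Hail expressions rather than dictionaries; the point here is only the three levels of nesting.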
2 changes: 1 addition & 1 deletion browser/src/DataPage/GnomadV2Downloads.tsx
@@ -303,7 +303,7 @@ const GnomadV2Downloads = () => {
           <ListItem>
             <GetUrlButtons
               label="Browser GRCh37 gene models Hail Table"
-              path="/resources/grch37/browser/gnomad.genes.GRCh37.GENCODEv19.ht"
+              path="/resources/grch37/browser/gnomad.genes.GRCh37.GENCODEv19.pext.ht"
             />
           </ListItem>
         </FileList>
2 changes: 1 addition & 1 deletion browser/src/DataPage/GnomadV4Downloads.tsx
@@ -293,7 +293,7 @@ const GnomadV4Downloads = () => {
           <ListItem>
             <GetUrlButtons
               label="Browser GRCh38 gene models Hail Table"
-              path="/resources/grch38/browser/gnomad.genes.GRCh38.GENCODEv39.ht"
+              path="/resources/grch38/browser/gnomad.genes.GRCh38.GENCODEv39.pext.ht"
             />
           </ListItem>
         </FileList>
1 change: 1 addition & 0 deletions browser/src/GenePage/GenePage.tsx
@@ -574,6 +574,7 @@ const GenePage = ({ datasetId, gene, geneId }: Props) => {
               includeNonCodingTranscripts={includeNonCodingTranscripts}
               includeUTRs={includeUTRs}
               zoomRegion={zoomRegion}
+              hasOnlyNonCodingTranscripts={!hasCodingExons && hasNonCodingTranscripts}
             />
           )}
         </RegionViewer>
11 changes: 8 additions & 3 deletions browser/src/GenePage/VariantsInGene.tsx
@@ -82,6 +82,7 @@ type OwnVariantsInGeneProps = {
     start: number
     stop: number
   }
+  hasOnlyNonCodingTranscripts?: boolean
 }

 // @ts-expect-error TS(2456) FIXME: Type alias 'VariantsInGeneProps' circularly refere... Remove this comment to see the full error message
@@ -97,6 +98,7 @@ const VariantsInGene = ({
   includeUTRs,
   variants,
   zoomRegion,
+  hasOnlyNonCodingTranscripts,
 }: VariantsInGeneProps) => {
   const datasetLabel = labelForDataset(datasetId)

@@ -134,9 +136,12 @@ const VariantsInGene = ({
         <Badge level={includeNonCodingTranscripts || includeUTRs ? 'warning' : 'info'}>
           {includeNonCodingTranscripts || includeUTRs ? 'Warning' : 'Note'}
         </Badge>{' '}
-        Only variants located in or within 75 base pairs of a coding exon are shown here. To see
-        variants in UTRs or introns, use the{' '}
-        <Link to={`/region/${gene.chrom}-${gene.start}-${gene.stop}`}>region view</Link>.
+        {hasOnlyNonCodingTranscripts && <>This gene has no coding transcripts. </>}
+        Only variants located in or within 75 base pairs of{' '}
+        {!hasOnlyNonCodingTranscripts ? <>a coding exon (CDS)</> : <>an exon</>} are shown here.
+        To see variants {!hasOnlyNonCodingTranscripts ? <>in UTRs or introns</> : <>in introns</>}
+        , use the <Link to={`/region/${gene.chrom}-${gene.start}-${gene.stop}`}>region view</Link>
+        .
       </p>
       <p>
         The table below shows the HGVS consequence and VEP annotation for each variant&apos;s most
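The branching in the updated note is spread across several JSX fragments, so the resulting text can be hard to read off the diff. As a rough restatement (in Python rather than TSX, and not code from this PR), the sketch below builds the same sentence from the two flags that `GenePage.tsx` combines into `hasOnlyNonCodingTranscripts`. The function name `variants_note` is hypothetical, and the link to the region view is reduced to plain text.

```python
def variants_note(has_coding_exons: bool, has_non_coding_transcripts: bool) -> str:
    """Build the note text the way the updated JSX branches on the new prop."""
    # Mirrors `!hasCodingExons && hasNonCodingTranscripts` from GenePage.tsx.
    has_only_non_coding = (not has_coding_exons) and has_non_coding_transcripts

    # Each fragment corresponds to one conditional in the JSX.
    prefix = "This gene has no coding transcripts. " if has_only_non_coding else ""
    exon_kind = "an exon" if has_only_non_coding else "a coding exon (CDS)"
    elsewhere = "in introns" if has_only_non_coding else "in UTRs or introns"

    return (
        f"{prefix}Only variants located in or within 75 base pairs of {exon_kind} "
        f"are shown here. To see variants {elsewhere}, use the region view."
    )


print(variants_note(has_coding_exons=True, has_non_coding_transcripts=True))
print(variants_note(has_coding_exons=False, has_non_coding_transcripts=True))
```

Under this reading, a gene with neither coding exons nor non-coding transcripts falls back to the original wording, because the new prop is only true when non-coding transcripts are present.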
2 changes: 1 addition & 1 deletion data-pipeline/src/data_pipeline/data_types/gene.py
@@ -234,7 +234,7 @@ def prepare_gene_table_for_release(genes_path, keep_mane_version_global_annotati
     else:
         ds = ds.select_globals()

-    ds = ds.repartition(50)
+    ds = ds.repartition(100)
     return ds


4 changes: 2 additions & 2 deletions data-pipeline/src/data_pipeline/pipelines/genes.py
@@ -401,7 +401,7 @@ def annotate_with_preferred_transcript(table_path):
 pipeline.add_task(
     "prepare_grch37_genes_table_for_public_release",
     prepare_gene_table_for_release,
-    f"/{genes_subdir}/gnomad.browser.GRCh37.GENCODEv19.ht",
+    f"/{genes_subdir}/gnomad.browser.GRCh37.GENCODEv19.pext.ht",
     {
         "genes_path": pipeline.get_task("annotate_grch37_genes_step_5"),
     },
@@ -489,7 +489,7 @@ def annotate_with_constraint(genes_path, constraint_path):
 pipeline.add_task(
     "prepare_grch38_genes_table_for_public_release",
     prepare_gene_table_for_release,
-    f"/{genes_subdir}/gnomad.browser.GRCh38.GENCODEv39.ht",
+    f"/{genes_subdir}/gnomad.browser.GRCh38.GENCODEv39.pext.ht",
     {
         "genes_path": pipeline.get_task("remove_grch38_genes_constraint_for_release"),
     },