sashimiDataConstants()
errorverbose
not found, fixed by repairing the variable after assigning to the custom environment.
-
Moved functions into separate files for better code review.
-
getGRcoverageFromBw()
- When it fails to download a bigwig file, it prints the error which is not always informative. Now it also prints the warning, which can tell whether the URL was available, or another error was caused by this step.
- Removed
shinyWidgets::setShadow()
to prevent the warning message saying this function is being deprecated. - Fixed bug
"width() not found"
caused by missing package prefix.
-
Help docs for
launchSashimiApp()
now include Troubleshooting- It includes two potential workarounds for errors caused by failure to download the coverage file from a remote server.
-
Updates to address Shiny app error:
"object 'covNameL' not found"
-
This bug occurs when there is no coverage, and seems to occur when the connection to the server coverage files is not available.
-
combineGRcoverage()
was updated to sidestep thecovNameL
error, although it may not fully resolve the underlying problem of missing coverage data. Tests are ongoing, since the underlying problem is difficult to reproduce. Theories:- Server file cache reached its maximum and no longer permits new cache files from being stored.
- Previously saved cache file may be corrupted, although errors are intended to cause the cache to be invalidated, therefore refreshing the data from the server.
- Disconnection from the server that provides coverage and junction data.
-
Miscellaneous minor updates to ensure
data.frame
subsets all includedrop=FALSE
where relevant. Minor, but may help with odd cases where server data is not as expected.
-
-
grl2df()
- Finally fixed the junction stacking for negative strands, see Nectin3 using default farrisdata.
The method of labeling exons was updated to enforce placement below
exons, and to expand the ggplot2 whitespace below the exons using
ggplot2::expansion()
which uses a multiple of plot height.
-
With ggplot2 version 3.5 and higher,
trans
objects are deprecated in favor oftransform
objects. The conversionas.transform()
is buggy, it fails when provided atrans
object to convert totransformation
- which can happen when using R-shiny cached data withlaunchSashimiApp()
, which stores objects in subfoldersashimi_memoise/
. These objects may have storedtrans
objects, which s -
Many functions were modified not to use
require(package)
- Major changes to underlying R code which could cause errors for any un-prefixed package function call.
- Numerous changes to add package prefixing, packages including
jamba
,GenomicRanges
,IRanges
,S4Vectors
,shiny
, others. I was really lax originally, usingrequire()
to avoid proper prefixing. - Full tests were completed, mainly running the live R-shiny app
launchSashimiApp()
which touches nearly every aspect of this package.
-
make_ref2compressed()
returns the output fromscales::trans_new()
which changed output fromtrans
totransform
in newer versions ofscales
, consistent withggplot2
version 3.5 and higher, see previous point.- The output of
make_ref2compressed()
with newer versions ofscales
andggplot2
includes atransform
object. - Fortunately, all
splicejam
functions can make equivalent use oftrans
ortransform
objects, provided they were created with the same R package versions forscales
andggplot2
. - When in doubt, if observing errors
"covNamesL not found"
or similar, try deleting memoise cache subfolders, update R packages, then try again.
- The output of
-
gene2gg()
- The method for rendering exon labels was updated:- It uses
expand=ggplot2::expansion()
to extend the y-axis by a multiple of the plot height, making it more reliable to add enough whitespace to fit exon labels. - Exon labels are pushed slightly lower than before (-0.2 instead of -0.1),
and use
ylim
to constrain all labels below the exon. Previously with many exon labels, they would be pushed up or down, making a tangled mess. - Exons labels use
min.segment.length=0
and always draws a line segment. - In principle, even for genes like
Zbtb20
, the exon labels should be much more visible, positioned below the exon plot, and will use two rows to minimize overlapping labels.
- It uses
-
plotSashimi()
- New optional argument
junc_nudge_pct
to control thenudge_y
placement of junction labels relative to the top edge of the junction ribbon. - Previously, labels may randomly be placed below or above the top edge of the ribbon.
- The new default
junc_nudge_pct=0.05
slightly moves labels up by 5% the max y-axis junction label position, which makes all labels appear consistently above the ribbon. - Negative values will place the label below the top edge of the ribbon.
- A vector with multiple values can be supplied, in order to control the position of each individual label.
- New optional argument
-
gene2gg()
- Added
nudge_y=-0.2
by default (not configurable) to help position exon labels below the gene structure. - New default
hjust=0.5
changed fromhjust=-0.2
which caused labels to be slightly shifted left. (I do not think the previous effect was actually intended.)
- Added
readGtf()
is a small helper function to load GTF/GFF3 files, called bymakeTx2geneFromGtf()
anddescribeGtfAttrNames()
.describeGtfAttrNames()
is a simple helper function to print a quickdata.frame
for eachsource
(column 2) in a GTF/GFF3 file, the attribute names and values observed in column 9.
-
makeTx2geneFromGtf()
- Updated to handle
gff3
format in addition togtf
format. - new argument
zcat_command
used to specify the call tozcat
orgzcat
, only used when input is gzipped andR.utils
is not installed. - Fixed issue when transcript attribute names are not consistently represented on all rows in gtf/gff3 output. Formerly the output fields would contain the full semicolon-delimited values from column 9, instead of replacing unaltered entries with '' empty string.
- Updated to handle
-
bgaPlotly3d()
- A more complete refactor of this function is in progress,
to simplify the code, to allow exporting individual
data.frame
objects for the visual components displayed in plotly. - The original function used
rgl
, and was ported toplotly
to help render on Rmarkdown documents in HTML form. However, thergl
package can also create HTML export, and makes slightly nicer figures, at the expense of not having customized mouseover text labels. (Even though labels are incredible difficult to customize in plotly.) Thergl
package has additional 3D shapes that could help when rendering the cross-group block arrows, and could be useful to apply shape to experiment design factors. - A plotly error occurred when
drawSegments="genes"
due to mismatch in expected values in column"textposition"
during plotly rendering. This error may have occurred during a recent plotly update, the workaround is simply not to include"textposition"
during display of gene vectors, which causes them to default to usetextposition="top center"
. geneColor
can now be a named vector, to allow colorizing individual genes by name.
- A more complete refactor of this function is in progress,
to simplify the code, to allow exporting individual
- Issue #3 reported
jamGeomean()
was not honoringna.rm=TRUE
, this issue has been resolved. Alsogeomean()
was updated to ensure thatna.rm=TRUE
also results in returning a non-NA value, except in cases where all input values areNA
in which case it returnsNaN
consistent withmean.default()
.
-
Bug was reported on edge cases with positive strand CB samples. It appears to be an edge case derived from rare duplicated
names()
in splice junctionGRanges
, fortuitous for certain genes. The duplicate names are repaired inprepareSashimi()
hopefully resolving downstream effects. The duplicate names appear to originate fromimport_juncs_from_bed()
, unclear why the issue was not seen before now, could be subtle change that added rownames that previously did not exist somewhere upstream in the process.- Aside: The bug is intriguing partly by the way is was observed: only
when all CB samples were shown, and only on the positive strand. I think
the cause is related to an upstream step adding numeric names to splice
junctions in order they appear, and they happen to be loaded one
sample_id
at a time. When selective a particular gene, it combines junctions from each requestedsample_id
, after subsetting each for the requestedgene
. In principle it would be rare that the same range of entries is used from multiple samples - for example entries 9500 through 9575 fromCA1_CB
and entries 9570 through 9800 fromDG_CB
. But it tends to happen acrossCB
samples and not betweenCB
andDE
samples, which suggests theCB
samples are more similar to each other than they are toDE
samples.
- Aside: The bug is intriguing partly by the way is was observed: only
when all CB samples were shown, and only on the positive strand. I think
the cause is related to an upstream step adding numeric names to splice
junctions in order they appear, and they happen to be loaded one
Added package R.utils
to Imports, since it is required by data.table
to import gzip
files.
Added @import data.table
to import_juncs_from_bed()
and other functions,
sometimes it is required for full functionality from data.table
.
import_juncs_from_bed()
was updated to force printing the actual R error message, in addition to internal error reporting where the error occurred. We suspectdata.table::fread()
oropenssl
may not be reading the remote gzipped junction file over the https connection. It appears to be a machine-specific error not seen on multiple test machines until now.
sashimiDataConstants()
now propagates"verbose"
into the data environment so the verbose flag will be persistent through the Shiny app functions.getGRcoverageFromBw()
prints more error information when retrieving coverage fails. Errors seem to happen when openssl is not enabled for "https" URLs as opposed to "http" URLs. Unclear exactly why.import_juncs_from_bed()
was updated to force printing error messages when retrieving data from cache fails, or when retrieving the junction data itself fails.
-
Two new config options adjust the junction arc height. We noticed for some genes, the junctions overlap the intermediate coverage, so these options allow raising the arcs accordingly.
- "Junction arc factor" adjusts the relative height of junction arcs.
- "Junction arc minimum" defines the minimum junction arc height.
-
Changed "Junction transparency" to "Junction non-transparency".
-
Moved "Interactive Plot" option to the top.
import_juncs_from_bed()
was updated to correct an issue where verbose output was required for the cache repair step.sashimiAppServer()
was updated to remove all%>%
pipes. Ah well, adding a package prefix was not the answer to that particular issue.
sashimiAppServer()
was updated to reorganize the overall series of Shiny reactive events, cleaning up some dependencies.sashimiAppUI()
new query device to control the gene-transcript-exon panel height independent of the sashimi coverage plot panels.
Wrapping Shiny functions in an environment was a clever way to pass variables directly from an active R session. It also means the data can remain inside the environment without copying it into each function - while also allowing data to be updated and saved in that environment.
However it appears to break the R package import
process,
and caused %>%
to be unknown even though the dplyr
package was supposed to imported. I cannot understand the
import scope - it works sometimes but not other times.
Maybe because the environment does not have a parent, it
causes imported functions to be lost?
-
sashimiAppServer()
was updated to fix missing%>%
, and other issues that arose with the new data environment workflow. -
sashimiAppServer()
briefly broke the ggplotly workflow for interactive plots, becauseplotly::subplot()
absolutely choked and failed when the numeric vectorheights
had names. So that was fun... removing names fromheights
fixed the error. -
Another bug appears to be causing splice junctions to be invisible on the public site, despite being present in all internal test instances.
sashimiDataConstants()
is a subset ofsashimiAppConstants()
that includes only the required data, and not the R-shiny app components such as the help text and widgets. It was updated to store all data inside an environment, which updates data in that environment during processing as needed.get_fn_envir()
is a helper function that returns a variable value from the calling function, or from one or more provided environments. The use case is to allow a user to define function variable values directly, then to use the environment values otherwise.
-
launchSashimiApp()
was changed so that it uses a user-supplied environment to store intermediate data for the R-shiny app. By default this environment isglobalenv()
for backward compatibility, however it is recommended to create a new environment to insulate data from theglobalenv()
. -
sashimiAppConstants()
was updated to use environment as input and the new functionget_fn_envir()
. Note that previous input would ultimately use values inglobalenv()
even when using a specific environment. The default was to useglobalenv()
so that behavior was not problematic. The new behavior will strictly only use either the function argument, or the value in the specific environment provided. -
import_juncs_from_bed()
was refactored completely, so it is able to handle BED12 input as well as"SJ.out.tab"
junction output directly from STAR alignments. This refactor will cause all previous memoise cache for junction files to be invalidated, and thus downloaded again. These files are small and fast to download, but it is worth noting. The underlying method now usesdata.table::fread()
then it decides whether to convert from BED format withas(bed, "GRanges")
or converts from"SJ.out.tab"
when there are 9 columns. -
Several functions had the
verbose
output reduced, instead whenverbose > 1
the more verbose output is printed.
getGRcoverageFromBw()
was updated to fix a bug that really originated fromrtracklayer::import.bw()
, which returns output sorted by the bigWig file chromosome sort order, and not by the order ofrtracklayer::BigWigSelection()
as documented inrtracklayer::import.bw()
. The workaround implemented ingetGRcoverageFromBw()
is to return coverage in the exact order requested. Note this issue will not affect sashimi plots since each gene is contained within one chromosome, and in this case the coverage is returned byrtracklayer
in the order it is requested within a chromsome, butrtracklayer
bug returns results for each chromosome together regardless the order it was requested. Also this update was performed outside the memoise call, so the cache will store data in the order received fromrtracklayer::import.bw()
but will re-order those results consistent with the order of input GRanges.
sashimiAppUI()
was updated to change "Sample Selection" to usedataTable
output.sashimiAppServer()
was updated to handledataTable
widget for "Sample Selection" and sample ordering, removing the previous method of drag-and-drop. The reasons: first, the drag-and-drop was not mobile-friendly; second, the widget apparently broke after the recent updates. The error was reproduced by changing the sample selection, then creating a new sashimi figure - in that case no sample selections were retained and the figure failed due to no selected samples.sashimiAppConstants()
was updated to include a text box with fullsessionInfo()
output, inside a box that is collapsed and hidden by default. This output is on the"Guides"
tab below the section"Relevant R version info"
.
sashimiAppUI()
was updated to correct some visual glitches in the R-shiny app, using theshinydashboard::box()
instead ofshinydashboardPlus::box()
except where required. TheshinyWidgets::sliderTextInput()
appears to detect the wrong color for title text and tick marks, so the sidebar background was changed to medium grey to try to account for those oddly inconsistent choices. It currently cannot be customized.
gene2gg()
new argumentgeneSymbolColname
allows using a specific gene symbol column name, instead of the previous default"gene_name"
. The correspondingggplot2::aes()
was changed toggplot2::aes_()
to allow using a variable as a name.gene2gg()
argumenthjust
was changed tohjust=-0.5
because somehow the apparent behavior ofhjust
inggrepel
has changed in recent versions. The previous valuehjust=2
is roughly the same as the new valuehjust=-0.5
which is... unexpected.
Several updates were implemented to correct behavior seen only with long-running R-shiny apps, such as https://splicejam.vtc.vt.edu unfortunately. These changes should help make the R-shiny app more robust for all users, but are particularly targeted at the longer-running R-shiny servers.
The shinydashboardPlus
package version 2.0.0 included
numerous of "breaking changes" that also required
numerous updates to the R-shiny sashimi app.
Functions were renamed (dashboardHeaderPlus()
, boxPlus()
)
and introduced name conflicts with the existing
shinydashboard
package - and so required package prefixing
to ensure the correct function is being called.
flattenExonsBy()
was updated to warn on disjoint exons correctly this time.flattenExonsBy()
several updates to ensure proper handling ofdetectedTx
in edge cases. Code was somewhat cleaned up to make each step more clear.sashimiAppServer()
was updated to force several instances of default behavior for some form components that apparently do not return values upon the initial start-up of the R-shiny app. This behavior may be fixed properly in future, so that the values are initially available. For now, the fallback to use default values is chosen.sashimiAppServer()
fixed bug wheredetectedTx
was not showing even when check box "Show transcripts" and "Detected only" were both checked. In future unchecking "Detected only" should include more transcripts for genes that have a subset of detected transcripts, thus widening the field of view so to speak.sashimiAppConstants()
was updated to print more detail regardingdetectedTx
entries during the preparation steps. When there are nodetectedTx
entries that matchtx2geneDF$transcript_id
then it prints the first 20 lines oftx2geneDF
for visual review. In these cases the resolution might be to remove the local file".tx2geneDF.txt"
which will force the file to be re-created from the GTF source file, and may resolve the mismatch.
Apparently for .tx2geneDF.txt
files that existed in older
versions of splicejam, the stored file included rownames
that were ignored upon load. When splicejam switched to
use data.table::fread()
and data.table::fwrite()
it
briefly broke the ability to keep the header line because
header=TRUE
was not working as expected. Removing
header=TRUE
causes data.table
to detect the rownames
and add a new column which is ignored. All future versions
of splicejam should not be affected.
- Added
"testthat"
package dependency, which requires version 3.0.0 or higher to avoid the bug associated withtestthat_print() not found
. assignGRLexonNames()
was updated to include examples, and to remove a redundant step when checking for disjoint ranges. The process was updated to clarify how geneSymbol values are obtained, fromvalues(GRL)[[geneSymbolColname]]
,values(GRL@unlistData)[[geneSymbolColname]]
, thennames(GRL)
in order of the first one with valid values. It now also handles the case wherenames(GRL)
is not defined, but where values are present viageneSymbolColname
.jam_isDisjoint()
was updated to return alogical
vector forGRangesList
input, representing eachGRanges
element in theGRangesList
. This change is consistent withGenomicRanges::isDisjoint()
without sacrificing speed.
- Fixed jampack #1 bug with
splicejam::gene2gg()
when onlyflatExonsByGene
is supplied withoutflatExonsByTx
theGRangesList
coersion no longer allows one object to beNULL
. Workaround is to ignore the missing object which keeps everything asGRangesList
without needing coersion. - Fixed bug in
assignGRLexonNames()
that appears to be caused by a new Bioconductor data typeFactorList
, which requires explicit conversion tolist
for this function to work as expected. - Cleaned up some errors in function examples, most just needed to have the examples re-evaluated to clear the cached error.
makeTx2geneFromGtf()
was updated to prevent edge cases in parsing GTF files, conversion todata.frame
withstringsAsFactors=FALSE
just to make sure, for R versions lower than4.0
.
defineDetectedTx()
was updated with slight change to the percent max isoform expression. The previous calculation used the higher of max expression or1
in the percent maximum, to prevent divide by zero. However, it assumed expression should be1
or higher, making the resulting percentages lower than 100% max for genes whose isoforms all had values less than1
. Originally this was a feature, since genes with expression less than1
were not beneficial, however when using TPM values it is useful to allow values less than1
. New arguments:floorTPM
andfloorCounts
allow custom minimum values, the defaults are 0.001.
annotateGRfromGR()
was optimized for edge cases where there were a low number of multi-overlaps and large number of single-overlaps.grl2df()
was not properly stacking junctions in sashimi plots, the problem was caused byclosestExonToJunctions()
which annotates junctions by the nearest compatible exon boundary, with no regard to distance from the exon. ThestackJunctions()
function simply stacks based upon"nameFrom"
and"nameTo"
, so junctions that land between exons were being stacked together with junctions that properly align those exons. NowclosestExonToJunctions()
has optional argumentspliceBuffer
, when supplied a distance, exons beyond that distance are annotated byexonName.distance
. Novel junctions will stack only when they share the same distance from an annotated exon.
jam_isDisjoint()
is a rapid alternative toGenomicRanges::isDisjoint()
, that testsGRanges
orGRangesList
for disjoint (non-overlapping) ranges. In one test with largeGRangesList
, the duration was reduced from 100+ seconds to 0.5 seconds.
-
makeTx2geneFromGtf()
was optimized again, substantially reducing the time and memory required. The rownames returned are also defined by values in colname"transcript_id"
if it or similar colname exists, passed throughjamba::makeNames()
to ensure they are unique. -
Updated help documentation for
assignGRLexonNames()
. -
Added
\preformatted
tags to help docs forcurateVtoDF()
andcurateDFtoDF()
, otherwise the examples are not properly formatted with whitespace. -
internal_junc_score()
new argumentminScore=0
will return this minimum score, especially useful if for some reason the input data contains no valid score colname. Otherwise theminScore
is applied to the absolute value of the score, keeping the original sign. (Zero becomes positive.) -
detectedTxInfo()
was updated to handle being supplied bothGene
andTx
values, and a new optional argumentGroups
to allow returning a subset of group colnames. -
defineDetectedTx()
new argumentapplyTxPctTo
defines which expression data to use (count data or TPM abundance data) for the percent max calculation. The purpose of this new argument is to specify whether to apply the percent max threshold to TPM, counts, or a combination of the two:"TPM"
uses only TPM data for percent max thresholding"counts"
uses only count data"either"
uses the higher percentage from TPM and count"both"
uses the lower percentage from TPM and count
As a result, the output is also expanded to include list elements:
"txPctMaxTxGrpAll"
"txPctMaxTxTPMGrpAll"
, in addition to"txPctMaxGrpAll"
still represents the data used for filtering, modified relative toapplyTxPctTo
.
Several changes were intended to help set up custom R-shiny apps, with specific default settings different than the Farris et al defaults.
-
sashimiAppConstants()
was updated to enhance the logic flow, and to print more verbose and hopefully more helpful output during the process. -
sashimiAppConstants()
usesdata.table::fread()
anddata.table::fwrite()
. Fixed inconsistency in row.names and col.names. -
sashimiAppServer()
was updated to clarify some verbose output, and define default gene more robustly. -
launchSashimiApp()
,sashimiAppServer()
,sashimiAppUI()
now respect"gene"
from the environment to use as the first gene loaded, instead of using"Gria1"
which is intended only for the Farris et al data. -
launchSashimiApp()
now properly handles server-configuredoptions
such as width, server, port, and other R-shiny app options. Most important are server and port, which allows the R-shiny app to be run manually on ports above 8000 if necessary. -
launchSashimiApp()
now properly initializes with variables from the active R environment. More variables will be converted, along with robust error checking. For now, these variables are available:gene
which defines the first gene displayedsample_id
which defined the set of sample_id values to displaymin_junction_reads
- minimum junction read thresholduse_exon_names
- one ofc("coordinates", "exon names")
exon_range_selected
- two gene_nameExon valuesexon_range_choices_default
- exon labels available for default genelayout_ncol
- number of columns in layoutinclude_strand
- one ofc("+","-","both")
strands to displayshare_y_axis
- one ofc(TRUE, FALSE)
whether to use shared y-axis range
-
assignGRLexonNames()
was updated to be more tolerant of "problematic GTF file" data, specifically to allow optionally keeping genes that are found on multiple strands, which previously were removed by default. The idea of keeping a multi-strand gene is not ideal when thinking about sashimi plots, but can be useful when considering gene exons, especially when genome assembly may also be unfinished. Tools such as featureCounts will generate counts across all exons for a gene, regardless of strand and chromosome, so the output fromassignGRLexonNames()
will be compatible. The argumentfilterTwoStrand=FALSE
will keep features present on multiple strands (or chromosomes.) -
assignGRLexonNames()
now calls new functionjam_isDisjoint()
to test for disjoin features. For a large testGRangesList
object, the duration was reduced from 60 seconds to 0.5 second. -
flattenExonsBy()
was also updated with new argumentfilterTwoStrand
which is passed toassignGRLexonNames()
after exons have been flattened. UsefilterTwoStrand=FALSE
to keep all gene features even when they are present on multiple strands or chromosomes. -
flattenExonsBy()
new argumentexon_method
which has two options:exon_method="disjoin"
is the default, which combines transcript exons per gene and maintains distinct boundaries, in case an exon is longer in one isoform than another, it is subdivided into something likec("GENE_exon1a", "GENE_exon1b")
.exon_method="reduce"
is a new alternative, it usesGenomicRanges::reduce()
to combine exons across transcripts, and therefore ignores any subdivision of exons. In the same example above it would produce only"GENE_exon1"
. The main benefit from this option is that it is markedly faster for genome-wide data preparation, where"disjoin"
may take 2 to 5 minutes,"reduce"
may only take several seconds.
Setting up a new sashimi plot is not crystal clear, and most preparatory steps can be done once and never again (for each version of Gencode.) Ideally, install an R package that contains:
- exonsByTx
- cdsByTx
- flatExonsByGene (all genes?)
- flatExonsByTx (all transcripts?)
- tx2geneDF (with fix for mis-matched transcript_id between GTF and FASTA from Gencode)
txdb
(optional, but not provided by Bioconductor currently)
groups2contrasts()
updated a bug that was not maintaining factor order in two-way contrasts. Slightly enhanced logic used in argumentremovePairs
so that it works with single factor contrasts (where that factor does not change.)defineDetectedTx()
now uses decimal values for the mean count and mean abundance group values, rounded to the 0.1 decimal place.
defineDetectedTx()
default argument valuecutoffTxTPMExpr=0.1
which was previouslycutoffTxTPMExpr=2
. It appears that this threshold is somewhat dependent upon the input data, and that2
was too stringent for the majority of datasets testes since the Farris et al publication. Therefore, a more suitable default value is less restrictive of low-TPM abundance values. This change also makes the count-based threshold the dominant filter for technical detection above background noise, which makes some sense because count noise is somewhat more of a hard boundary.
Overall, the changes allow some coverage data to be "missing", though it will generally try twice to retrieve coverage. When coverage is "missing" it is rendered empty, with no other error or message. This change may not be ideal, if the coverage URL is mis-typed for example, but it beneficial during a network outage.
-
sashimiAppServer()
was updated to repeat the call toprepareSashimi()
when the returned object contained a list elementsome_null=TRUE
, which indicates that one or more underlying elements of the data was returned asNULL
, indicating a failure. In this case,prepareSashimi()
is called in a way that does not re-use the existing cache. Note thatprepareSashimi()
calls other functions, each of which caches data. If those functions returnNULL
, they also assign attributeattr(x, "some_null") <- NULL
, which helps cascade this failure up the chain to calling functions. -
import_juncs_from_bed()
was updated to usetryCatch()
to catch errors when the requested file cannot be retrieved. When memoise cache is being used, it will try to clear the cache then try again, or will try again without using memoise. -
getGRcoverageFromBw()
was updated to enhance the memoise cache repair logic: If coverage is zero-length, it tries again. Zero-length coverage can happen from a zero-length memoise cache file, or when thertracklayer::import()
function returns zero-length coverage. That tends to happen when the file is not accessible, network is down, or the path is invalid, etc. Nonetheless, memoise will happily store a zero-size cache file (!) with no clear way to remove or ignore it. So we try to callmemoise::drop_cache()
which only exists in memoise version1.1.0.9000
, currently on Github with no expected CRAN release timeline. If we cannot remove the cache file, we try to get coverage without using memoise. The overall summary:- If coverage is empty, try remove the empty memoise cache file.
- If we removed the empty memoise cache file, we try again to retrieve coverage using memoise. The new memoise cache file will either be empty (if the source failed again), or non-empty (the second attempt was successful). If the file is non-empty, the repair worked, and future calls should use the memoise cache file without problem.
- If it could not remove the empty memoise cache file, it tries to retrieve coverage again without using memoise. In this case, the empty memoise file still exists, which means all future calls will not use the cache file.
-
combineGRcoverage()
was modified to be tolerant to missing coverage data. Typically when coverage is missing, the data returned fromgetGRcoverageFromBw()
returns a vector of0
values, however sometimes there is no data returned for a single URL, perhaps due to network outage. Nonetheless,combineGRcoverage()
will now ignore any samples that are not present in the colnames of theGRanges
object supplied.
- The
launchSashimiApp()
function which starts the R-shiny app sometimes encounters errors when the required files are not available, as has been the case with server or network outages. Since each requested file is stored in amemoise
local file cache, a previously-accessed gene should pull from the cache and avoid these errors. But newly accessed genes result in corrupted (or zero-size) cache files. Splicejam tries to detect corrupt cache files and reload from the source (referring tofilesDF
) however this process is proving to be imperfect, and the sort of thing that is difficult to debug. - The remedy for weird errors in the R-shiny app is usually to remove all *_memoise subdirectories, which forces the cache to be rebuilt.
- Some reference numbers regarding speed: In our testing, the typical new gene request takes about 10-20 seconds to retrieve data, assemble the ggplot object, then display the plot -- for 8 covergae files and 8 junction files. A cached gene takes about 3-5 seconds to display the plot, almost all of that time is ggplot rendering its own plot object. In principle, R-shiny has the ability to cache rendered ggplot objects, which results in response of less than 1 second for cached rendered plots. However that process only works when the cached rendered plot has identical dimensions to the newly requested plot. Splicejam currently scales the plot to size of the web browser, which means the benefit would only help each browser size. For example, splicejam works really well on a phone web browser (!), somewhat surprising to me, but it turns out to be great. I use it quite a lot when I'm on the go, between meetings, and admittedly during some meetings.
prepareSashimi()
now positions junction labels at the maximum edge of the junction arc, in stranded fashion. Might consider adding text adjustment so the label is not centered at the edge.plotSashimi()
now appliesscales::comma(..., accuracy=1)
by default, which removed the decimal values from junction labels. Some recent update toscales::comma()
seems to have caused decimal values to appear by default. A new argumentjunc_accuracy
is passed toscales::comma(..., accuracy=junc_accuracy)
to allow customization.plotSashimi()
fixed issue wherejunc_alpha
was only being applied when data contained junctions without coverage, now thejunc_alpha
is applied in all cases.sashimiAppServer()
now displays the strand of the gene found in the top search pane, which allows someone to set the coverage strand consistent with the gene if desired.
- Fixed issue with plotly rendering in R-shiny context, it
was throwing an error
"TypeError: postRenderHandlers is undefined"
. Upon deeper inspection, theuiOutput()
element being used to display both ggplot and ggplotly output, was somehow not providing the plotly javascript dependencies to include relevantpostRenderHandlers
. The workaround was to add an emptyplotlyOutput("plotly_blank")
to the UI, which appears to cause the required javascript dependencies to be loaded. Stackoverflow threads and Github issues suggest there exists a method to load dependencies when embedding htmlwidgets inside tagList context -- for example when embedding a custom htmlwidget inside a table cell. Perhaps the proper remedy is related.
- Fixed rare error in
annotateGRfromGR()
with numeric columns, which failed to initialize a new GRanges column with numeric(0) instead of using a numeric form of NA, which was accomplished withc(0, NA)[2]
.
- Changed
plotSashimi()
argument defaultylab="read depth"
, previously it was"score"
. In future, normalized data might useylab="normalized read depth"
but that would be a custom option for the analyst. plotSashimi()
arguments"xlabel"
and"xlabel_ref"
control the x-axis label, which by defaultxlabel_ref=TRUE
will use the reference (chromosome) as the x-axis label, which makes sense because the axis shows chromosome coordinates. The"xlabel"
argument is intended to allow a fully custom label.
- Fixed bug in
compressPolyM()
edge case, polygons with only two x values and y values all zero; threw an errorError in x[idx1, ] : only 0's may be mixed with negative subscripts
. New rule requires at least 5 x-axis values, otherwise compression doesn't seem necessary anyway.
- New argument
compress_introns
added toexoncov2polygon()
andprepareSashimi()
, which determines whether to callcompressPolygonM()
when creating polygon coordiantes for coverage data. Settingcompress_introns=FALSE
saves about 20% of the time it takes to prepare sashimi data. - Changed default
compressPolyM()
argument fromminRatio=3
tominRatio=5
, the threshold of compression above which polygon coordinates are adjusted. - Changed
sashimiAppServer()
to create the memoise version ofprepareSashimi()
only once, instead of re-creating it each time. Probably no noticeable effect, but it feels better.
- New README.Rmd file with a visual example and description
of a sashimi plot. New small data object
"demodata"
which contains the minimum required to produce the Gria1 plot. - Added splicejam hexsticker with an embedded sashimi plot. An R package isn't an R package without a hexsticker.
- Slight enhancement to
prepareSashimi()
to change the"name"
column to a factor, in order to force the drawing order of junction arcs, so junctions are drawn in order of most dominant, to least dominant, then shorter spans to longer spans. Smaller score, wider junctions are drawn last, since they are typically the most difficult to see otherwise. "Dominant"" refers to thejunction_rank
calculated by having the highest score among junctions that start or end at the same coordinate:junction_rank=3
means the junction has the highest score on both sides, and has the darkest color;junction_rank=2
has the highest score on one side but not both sides, and is lighter in color; andjunction_rank=1
does not have the highest score on either side of the junction, and has the lightest color. Thejunction_rank=1
entries are now drawn last, since they typically have the lowest scores, and are small and thus easily obscured.
splicejam-sashimi-server()
was updated to handle changes in the gene search field properly, avoiding edge cases with un-detected genes that have no sashimi data.prepareSashimi()
was updated to handle absence of splice junction data, in a more robust way.spliceGR2junctionDF()
was updated to handle missing junction data, ultimately returning NULL which is easier handle consistently by other downstream functions. Also made small update to handlesample_id
as a factor when input data is supplied only as a GRanges object.
bgaPlotly3d()
new argumentsampleGroups
allows data to be re-grouped to define custom group centroids. Main driver is when running BGA on un-grouped data, for example where each replicate is its own group. UsingsampleGroups
allows the proper sample group to be highlighted on the plot. This scenario is mainly only advised when needing to run standard PCA or COA, or when there are fewer than four groups, since BGA producesn-1
dimensions. Code was also slightly update to prepare for broader use outside of BGA.bgaPlotly3d()
was updated to handle the"textposition"
argument to plotly labels, allowing individual adjustment of each centroid and sample label.bgaPlotly3d()
new argumentgeneLabels
allows replacing gene identifier with a custom label, for example replacing assay ID with gene symbol.gene2gg()
updated to maintain a minimum y-axis range whenlabelExons=FALSE
. Previously the exons could be cropped.
bgaPlotly3d()
bug fixed which prevented display of scaled coordinates from thebgaInfo
object, mainly because the scaled coordinates are not supplied for individual samples. Instead the scaling factor (adjustment used to convert centroid coordinates from raw to scaled) is applied to sample coordinates to produce equivalent scaled coordinates. That said, I recommend using un-scaled coordinates, which keeps the x, y, and z axis scales proportional to their relative contribution, which visually reinforces the relative strength of separation in each dimension.
- Fixed regression in
stat_diagonal_wide_arc()
which callsggforce::StatBezier
. Version 3.1.0 of ggforce changed the required call to that function, breaking the dependency and causing junction arcs to be invisible as a result. The new interface mimics howggforce::stat_diagonal_wide()
callsggforce::StatBezier
. I hope the interface does not change in future. - Added package version dependency for
ggforce
version 0.3.1, which is the point where the API changed withggforce::StatBezier
. - Fixed vignettes which assumed optional R packages were installed,
for example
org.Mm.eg.db
. Now if the package is not available, it skips the corresponding sections.
- Updated the
"Guides"
tab of the R-shiny Sashimi app. It now also prints the R version, and the package versions of jam packages. - Updated
DESCRIPTION
to include higher specific version numbers for several packages, to force an update if needed. - Updated vignette to remove hard dependency on the
tximportData
package, making it optional. This change prevents the 250 MB package from being required only for the example vignette. In future, a random generated expression data matrix may be created.
- Fixed small bug in
getGRcoverageFromBw()
that used default memoise path"memoise_coverage"
instead of"coverage_memoise"
as with other functions. Only affected directly callinggetGRcoverageFromBw()
since other functions passed the directory as an argument, and thus probably only me.
- The default plotly does not enable highlighting. It is currently too slow and laggy, subject of future optimization.
- Font sizing is reworked, to start with 14 point font and
add/subtract from that font size. It converts to ggplot2
convention of
"mm"
units, since it appears to ignore grid"pt"
units. The bast plot font size can be adjusted, affecting everything including exon labels. The exon labels can separately be adjusted relative to the base font size. - The coordinate label on the R-shiny app includes the gene, to make it clear when a new gene search results in a new coordinate range.
- The
scale_factor
values are applied outside the memoise cache strategy, which should allow modifying thescale_factor
values without invalidating the cache. It seems correct to cache the unadjusted data, then separately apply any adjustment. - The
"farrisdata"
package was added to the"Guides"
tab which lists the versions of relevant packages. - The plot config side menu icon was changed from
"wrench"
to"gear"
, thanks to @DivadNojnarg of the"RinteRface/shinydashboardPlus"
package on Github! - Updated R package dependencies for minimum versions for
"shinydashboardPlus"
and"farrisdata"
.
- In near future,
sashimiAppConstants()
may become the standard way to prepare the dependencies for sashimi plots, including the gene-transcript-exon models, the filesDFdata.frame
, the flattened exons, and color substitutions. This function will be able to take a GTF file, and prepare all the tx2geneDF cross-references, the exon and CDS exons, the flattened exons, usingdetectedTx
if available for enhanced exon structures. - Package dependency on memoise, using Github version
1.1.0.9000
which allows invalidating a single problem cache file. It has been available since October 2018 but not released to CRAN yet. This changes is a step toward recovering from network outages, which currently create empty cache files which cannot be recovered without emptying the entire cache.
getGRcoverageFromBw()
now checks if memoise cached data isNULL
, which happens during network outage. If the data returned from the cache isNULL
it clears the cache for that key, then tries again.import_juncs_from_bed()
now checks memoise cached data forNULL
as described above.sashimiAppConstants()
was refactored, to define variables in a systematic way, by default updating global environment for convenience, but optionally creating a new environment inside which the required variables are populated. In future this technique may become the default here along with downstream functions, which may reference the environment rather than data objects themselves.
- Updated
groups2contrasts()
to detect when any factor level contains"-"
, and converts the"-"
to"."
prior to detecting contrasts. Previously the"-"
interfered with proper contrast names, and resulted in some contrasts not being returned by this function. Note that this function avoids usingbase::make.names()
because that function aggressively converts other valid characters to"."
. However, if downstream functions requirebase::make.names()
compliant names, run that function prior to callinggroups2contrasts()
. sortSamples()
was updated to match"wildtype"
as a control term.groups2contrasts()
was updated to check for specific scenarios where iFactors and iSamples may be missing, to cover a variety of common scenarios. Error messages were updated to be specific to the expected input.flattenExonsBy()
removed the...
dots argument, to try to appease the memoise caching logic, which is suspected to be comparing the...
data when invalidating the cached files. Ironically, this change itself will invalidate existing cached files.internal_junc_score()
was updated to handle data with nosampleColname
column, which fixed an issue with the example code.stackJunctions()
was updated to handle multiple sample_id in the same operation, keeping each sample_id independent during stacking. This update should fix the edge cases where junctions appeared to be improperly stacked.to_basic.GeomShape()
is now exported, since it was causing problems during the call toplotly::ggplotly()
when convertingggforce::geom_shape()
to plotly format. Somehow it worked for the Sashimi coverageggforce::geom_shape()
but not for gene exonggforce::geom_shape()
. I am not always here to understand fully, but to make things work.
prepareSashimi()
now returns list item"df"
which contains the merged coordinates for coverage, junctions, and labels. Thisdata.frame
overrides the need forplotSashimi()
to merge this data on the fly, and takes some extra logic away regarding junction rank, label positions, etc. It also returns"ref2c"
as an attribute to the"df"
, to make sure it is available even when requesting only the"df"
format. The"ref2c"
contains the functions used to compress the introns on the x-axis scale.plotSashimi()
now expects the inputsashimi
object to contain list element named"df"
to include the full coordinatedata.frame
used for plotting. Note thatplotSashimi()
will now require output fromprepareSashimi()
in version 46 or higher. Fortunately, memoise already invalidates the cache for even the slightest change in any molecule in the world, so caching should not be problematic.
- Made Sashimi plot the default tab. Fixed regression caused by page load with partially initialized input values.
- Added
aboutExtra
as optional text to describe the data used in the R-shiny app. - Added R package versions to the Guides tab.
- Added option to show legend in plotly interactive plots, which allows for some interesting filtering options, like hiding coverage, or junctions, or subsets of junctions based on predominance (.1 is minor, .2 is mixed, and .3 is major predominance.) Junction ranks are based upon having the highest score at each junction end, typically the predominant junction represents the predominant transcript isoform.
plotSashimi()
new argumentjunc_alpha
to control the alpha transparency of junctions, to allow transparency in cases where the intron coverage may be obscured by the junction arc.- Increased default minimum junction arc height from 100 to 200, the effect should be slightly higher junction arcs. In future, will consider inspecting intervening coverage max height as perhaps better estimate of a minimum junction arc, plus some buffer height.
- Fixed regression in R-shiny, updating the progress bar using gene in the caption, but the function did not need to know the gene.
- Improved overall handling of reactive gene, sample_id values.
plotSashimi()
andgene2gg()
have argumentlabel_coords
for optional x-axis range to subset labels before displaying withggrepel::geom_text_repel()
. This change solves the issue where zooming the x-axis range withcoord_cartesian()
kept all labels which were then arranged at the edges of the plot. This change also necessitates re-creating the ggplot object, since it has to change the label coordinate data.- Fixed some legacy code that assigned
gene
andsample_id
to the global environment. - Removed the
"Samples and Data"
tab from R-shiny for now. - Improved level of detail in shiny progress bars, now shows each file being loaded. Not sure it is actually better than having less detail.
- The order of
sample_id
items in R-shiny"Sample Selection"
is now maintained, allowing custom sorting of samples. - Fixed error when junctions had no intervening junctions to use when determining the minimum height for a junction arc.
- Fixed errors when part of the Sashimi plot data did not exist, for example no junction data or no coverage data.
prepareSashimiData()
argumentuse_memoise
enables memoise caching of individual bigWig files per gene, and individual junction files per gene. The file paths are defined bymemoise_coverage_path
andmemoise_junction_path
. These options should greatly enhance the success rate of caching, at the expense of creating more cache files.getGRcoverageFromBw()
now has argumentsuse_memoise
andmemoise_coverage_path
to cache at the level of each bigWig file and genomic range.- R-shiny by default caches sashimi data, so repeated calls for the
same
gene
and same set ofsample_id
will retrieve the full cached sashimi data. However, if any smaller argument changes, including changes to any bigWig coverage or junction BED file, each individual step is also cached to prevent retrieving the same coverage or junction data repeatedly. For practical R-shiny usage, wheresample_id
is frequently changed, this update should be a substantial improvement. - Changed default R-shiny to non-interactive. One day will switch it back, but need plotly ninja skills meanwhile.
- Changed
sashimiAppConstants()
to improve handling of memoise flatExonsByTx data. - Updated the Guides tab with a description of Sashimi plots, and a how-to for creating a Sashimi plot.
- Added
shinyjs
package dependency.
- Fixed regression in junction scale factors, not consistently applied
in
prepareSashimi()
.
- Dependency added for
shinyjqui
to useorderInput()
for drag-and-drop selection and ordering ofsample_id
values. - Dependency added for
shinycssloaders
.
- New tab "Sample Selection" which focuses on selecting and ordering
the unique sample_id entries found in
filesDF
. - "Samples and Data" tab allows editing columns, and will
include updated
"scale_factor"
values in normalizing coverage and junction scores. - "Samples and Data" tab uses
color_sub
to colorize the table, and will create colors for any undefined values in the"sample_id"
column. Other columns are colorized when all values matchnames(color_sub)
otherwise are not colorized. Columns are arranged so any extra columns (such as group, subgroup, batch, etc.) would appear besidesample_id
, and typically would be colorized also usingcolor_sub
. - Added spinning loader indicator to the plot panel, which covers the time after data is prepared, but before ggplot has created the actual visualization.
- The "Sample Selection" tab now allows setting the number of plot panel columns. However, it breaks the synchrony with the gene-exon model, since the gene panel is still full-width. Already thinking of alternatives.
- Plot settings like panel height, font relative sizing, and exon label size (for non-interactive) are available.
- Added panel height to R-shiny UI, controlling the allocated height for each facet panel.
- Numerous updates to the plotly highlighting methods.
- Reorganized the R-shiny UI, still in progress.
- Added R-shiny tab "Sample and Data" intended to customize and select sample_id entries to display or hide in the Sashimi plots. Purely aesthetic and non-functional, still testing out the many javascript table options available.
- Fixed regression in use of
color_sub
to pre-define categorical colors in sashimi plots. - Updated plotly highlighting, increasing the success rate in most test cases.
grl2df()
now returns seqnames, and adjusts yBaseline within each seqname (chromosome). To plot multi-chromosome GRangesList, use+facet_wrap(~seqnames)
- R-shiny enabled some plotly highlighting features, currently in testing phases.
- Major refactor of
plotSashimi()
to merge together the junction, coverage, and label coordinate data.frames to enable plotly and crosstalk to highlight features. The end result looks amazing.
- R-shiny new options: minimum junctions; show by strand.
- R-shiny now defines each to the global environment if they do not already exists: exonsByTx, cdsByTx, flatExonsByGene, flatExonsByTx, tx2geneDF, detectedTx, detectedGenes. This mechanism is used for now, both to help define custom input data, but also to help define the data values needed to produce plots manually outside R-shiny.
stackJunctions()
now also returns "junction_rank" scored from 1 to 3, where 1=minor, 2=partial, 3=major, based upon the rank of junctions entering and leaving exons.- Junctions are drawn in order from dominant to minor,
which has the effect of ensuring smaller junctions are
displayed. The
junc_fill
is converted to gradient so a darker color is used for dominant junctions, to try to highlight the major isoform splice junctions per sample. Unfortunately, plotly does not honor the polygon render order (yet).
- Added
flatExonsByTx
to R-shiny app, to be able to display transcript isoform exon structures alongside the flattened gene-exon model. - Added R-shiny option to display transcript isoforms alongside the gene-exon model. Also option to show all or detected transcripts.
- Splice junctions now require at least one junction end within the range being displayed, to avoid junctions that do not connect with the visible gene model.
- The x-axis range of multiple plots are more consistently controlled, to avoid one panel autoscaling to display a wider junction than other panels.
- Plotly views contain custom tooltip text, using the sample_id, the name of the exon or feature, and the track (referring to the name of the coverage).
- New ggplot2 geom
geom_diagonal_wide_arc()
, extendsggforce::geom_diagonal_wide()
by connecting the left and right halves of a diagonal into one continuous ribbon arc.
plotSashimi()
now usesgeom_diagonal_wide_arc()
to display junction arcs instead of using twoggforce::geom_diagonal_wide()
. For now,prepareSashimi()
still provides coordinates to create two polygons, which are combined in the geom. Still todo: figure out how to add a label; and make the middle x position immune to x-axis rescaling.
- R-shiny app now properly keeps interactive plot x-axis ranges in sync, when zooming the plot.
- Fixed bug in
prepareSashimi()
that occurred when no junctions overlapped another, resulting in error "invalid 'type' (S4) of argument".
defineDetectedTx()
has a new argumentzeroAsNA
to handle the special case where some expression values reported as zero should be treated asNA
(no data obtained) and therefore will not be included in group mean calculations. As transcript-exon models get more "comprehensive", kmer quantitation tools such as Salmon and Kallisto sometimes need to assign abundance to one of several nearly identical isoforms, and in low count scenarios all counts may be assigned to one or another isoform, leaving zero in the alternate position. Excluding zero ensures that group mean values represent only the assigned quantitation.
- Packages now imported for
launchSashimiApp()
:shiny
,shinydashboard
,shinydashboardPlus
,shinyWidgets
. getGRcoverageFromBw()
now returns NULL when a BigWig file is not accessible. Previously any error caused"Error in seqinfo(con) : UCSC library operation failed"
which could mean the file does not exist, or any number of other validation checks failed. SincegetGRcoverageFromBw()
used a vector of BigWig files, any error caused the entire set to fail.getFirstStrandedFromGRL()
added package prefix to the use ofIRanges::heads()
which was not imported directly.
- R-shiny plots now set the plot height for ggplot2 or plotly, depending upon the number of samples, and presence of gene-exon model. Future versions will allow R-shiny user customization.
bgaPlotly3d()
removed some unnecessary print functions.
- This version is an incremental update, while we evaluate some display options of Sashimi plots in the R-shiny app.
- Minor aesthetic changes to R-shiny app, including plotly interactive options, and evaluating some UI elements in sidebar.
- Added option to display gene-exon model in R-shiny.
makeTx2geneFromGtf()
help docs recommend installing theR.utils
package when.gz
files are required for import, since this process usesdata.table::fread()
.- Package dependencies were added for shiny, shinydashboard.
to_basic.GeomShape()
is a hidden function intended only to provide basic support ofggforce::geom_shape()
when converting ggplot objects to plotly.
- R-shiny Sashimi app now allows zooming into coordinate ranges, defined per gene.
- First working Sashimi R-shiny app, still has issues with
certain genes that needs debugging. Defaults to using
Farris et al data from
farrisdata
package, but can be overridden with global environment variables.
- Initial R-shiny
launchSashimiApp()
function. - Suggests the
farrisdata
package for example data used for the Farris et all RNA-seq mouse hippocampus manuscript.
- Fixed examples that did not explicitly call library() for required libraries.
- Fixed error in
codonUsage2df()
example data file path.
- Added specific TODO items.
- Added two function categories:
"jam ALE-specific RNA-seq functions"
, and"jam codon-usage RNA-seq functions"
to help organize the large list of accessory functions.
- Minor fixes to examples.
- Added data
test_exon_gr
,test_junc_gr
,test_cov_gr
for easy example data for exons, junctions, and coverage, respectively. - Added "wide" variants of the above test data, with introns about 100x larger than exons, consistent with mammalian gene structures.
- Added examples to each data, showing how one would use the raw data to generate different visualizations used in Sashimi plots.
- Added sample data to the vignettes and some examples.
- Renamed
flattenExonsByGene()
toflattenExonsBy()
to reflect that the function handlesby="gene"
andby="tx"
.
- Added
create-a-sashimi-plot.Rmd
which walks through a full example showing how to create a Sashimi plot. - Added examples to
stackJunctions()
with schematics, including example on plotting junctions by themselves.
- Removed
compressGRgaps()
for now, since the methods now try to keep GRanges intact, and instead transform the x-axis scale to compress visible gaps.
bgaPlotly3d()
now properly handles ellipsoid colors, previously the colors were assigned but not honored byplotly::add_trace()
.combineGRcoverage()
determines strandedness by requiring all values to be at or below zero, with at least one negative value. Otherwise, data can have position and negative values and will be considered positive stranded.plotSashimi()
replacesjamSashimi()
because it is just more intuitive... Ah well.- Fixed small typo in plotSashimi.Rd help that included an unmatched quote.
jamSashimi()
is used to plot Sashimi data prepared byprepareSashimi()
in order to separate the download and preparation of Sashimi data from the visualization.
prepareSashimi()
is refactored to remove the plot functions, moving them to the newjamSashimi()
.flattenExonsByGene()
can now handle Tx data, mainly useful to add CDS regions to existing exon models.gene2gg()
is more robust to edge input cases.
prepareSashimi()
is a workhorse function that prepares several types of Sashimi-like data output, including ggplot2-based Sashimi plots. This plot uses vanilla ggplot2 with custom x-axis scaling to compress genomic coordinates, while keeping data in proper genomic coordinate space.gene2gg()
is a lightweight function that creates gene and transcript exons models, and returns a ggplot2 object for plotting. It can optionally return the data.frame used by ggplot2.grl2df()
is similar tofortify()
from ggplot2, or thebroom::tidy()
functions, that take a custom object and return a data.frame for downstream use. Here the function currently works with "rectangle" data (like exons, introns, peaks, etc.), and "junction" (like junction regions to be represented byggforce::geom_diagonal_wide()
). In future it may also handle sequence coverage polygons.exoncov2polygon()
converts a GRanges object containing coverage in the form of NumericList for each exon or intron, into a data.frame suitable for use bygeom_polygon()
orggforce::geom_shape()
.getGRcoverageFromBw()
loads bigWig coverage data for a GRanges of exons and introns, returning a GRanges object with columns representing each sample_id. It callscombineGRcoverage()
to combine coverages by strand within eachsample_id
.flattenExonsByGene()
is intended to provide a set of unique exons per gene, using all or only detected transcript exon models. It numbers exons by contiguous segment, then labels subsections of each exon with a letter suffix, for example "exon1", "exon2a", "exon2b", "exon3". Lastly, it can annotate exons as CDS or non-CDS, if given a set ofcdsByTx
data.make_ref2compressed()
takes a GRanges object of exons (or any useful feature like ChIP-seq peaks, etc) and returns functions needed to compress x-axis coordinates, in order to shrink gaps/introns to a fixed width, suitable for use by ggplot2.spliceGR2junctionDF()
takes junctions GRanges data, and flatExonsByGene, and summarizes and annotates each junction by the gene_exon connected, and combines scores when multiple junctions are within "spliceBuffer" distance of an exon boundary, by default 3 bases. It can accept a GRanges object that was derived from multiple sample_id, and will return data summarized within each sample_id. It callsstackJunctions()
to combine junctions per replicate.simplifyXY()
takes a vector of coordinates, and simplifies them to the minimal set of points to represent the polygon. For sequence coverage data, that process can reduce individual points by 90%, but it works with any coordinate data. When two or more consecutive line segments have identical angle (with non-zero distance), they are combined using only the first and last points.getGRgaps()
,getGRLgaps()
,addGRgaps()
,addGRLgaps()
are functions that take GRanges input, and return either just the gaps between non-overlapping regions, or the original features with gaps inserted between them. Useful to convert a set of exons, to a set of exons and introns.
- Added package dependencies to ggplot2, ggforce, ggrepel
- Need a way of annotating GRangesList gaps by the parent feature name.
- In future, allow samples to have replicates, optionally allow each
replicate to be independently plotted, while being part of the
parent
sample_id
. - Allow alternative input for
prepareSashimi()
, namely BAM alignment files, from which coverage and junctions can be derived. - Consider making
ref2c
chromosome-wide, however it would require x-axis coordinates to know their context, in terms of chromosome/seqname. - Multiple vignettes to demonstrate workflows: Sashimi plots; preparing
exon GRangesList; using
annotateGRLfromGRL()
to add annotations, etc.
curateVtoDF()
,curateDFtoDF()
are data curation functions to curate a vector, or a data.frame, into a data.frame with consistent, usable nomenclature for downstream analysis. They use a flexible yaml format that should help automate analysis pipelines that start with raw data file import.exoncov2polygon()
andcompressPolygonM()
are basic functions for sashimi plots, efficiently converting exon/intron coverages to multi-polygons, then optionally compressing introns and reducing the information content to roughly similar resolution as uncompressed regions.spliceGR2junctionDF()
is the summary function to convert a set of splice junction read counts to gene-annotated junctions, grouped and summed where needed.closestExonToJunctions()
called byspliceGR2junctionDF()
is used to annotate junction ends near compatible exon boundaries, with some buffer distance allowed to "snap" junctions to the edge.
- Fixed issue where
numTxs
was not getting populated inrunDiffSplice()
, otherwise the stats summary is not changed.
- Added "openxlsx" to Suggests, for exporting Rmarkdown stats tables to Excel format.
runDiffSplice()
includes examples usinggroups2contrasts()
, also allowstxColname,geneColname
to be defined and therefore custom.- Changed all
verbose=TRUE
toverbose=FALSE
by default. - Changed
makeTx2geneFromGtf()
to usedata.table::fread()
ability to uncompress .gz files, hopefully making it cross-platform. groups2contrasts()
was updated to handle single-factor experiments, and be more robust to two-factor experiments with missing groups in the full design table.
- Added basic RNA-seq workflow to vignettes.
runDiffSplice()
is a wrapped aroundlimma::diffSplice()
intended to capture several steps of pre- and post-processing, optionally applyinglimma::voom()
.groups2contrasts()
takes a vector or data.frame of sample groups, and determines the relevant pairwise and two-way contrasts, returning a design matrix and contrast matrix.sortSamples()
sorts biological samples so that known patterns of control group terms are returned before non-control groups, in order to help provide useful defaults when setting up sample group contrasts.strsplitOrdered()
providesbase::strsplit()
but returns factor output whose factor levels are influenced either by factor input, or by other arguments.
limma
package was added to Imports.runDiffSplice()
has an argumentspliceTest
which definestest
when callinglimma::topSplice()
. Only the "t" (t-test) returns fold change, therefore hits are not filtered by fold change for "F" or "simes".
ale2violin()
takes output fromtx2ale()
with some arguments, and produces a ggplot2 violin plot object, as well as the underlying data. It allows a custom filtering function, which allows filtering gene lists for relevant regions of expression.
shrinkMatrix()
minor fix to remove the shadowx
in the resulting colnames.tx2ale()
modified to be tolerant of NA values in the expression matrix.
sortGRL()
sorts GRangesList objects by chromosome and position.getFirstStrandedFromGRL()
return the first GRanges feature per GRangesList, ordered properly by strand.annotateGRfromGR()
which annotates one GRanges object based upon overlaps with another GRanges object.annotateGRLfromGRL()
which annotates one GRangesList object based upon overlaps only with the same index entries in a second GRangesList object.findOverlapsGRL()
runsGenomicRanges::findOverlaps()
for the case of two GRangesList objects, matching at the GRanges level but restricting overlaps to those matching the same GRangesList index.assignGRLexonNames()
assigns exon names and numbers to a disjoint set of exons per gene model.
- Added "S4Vectors" to package imports, since some functions like
lengths()
have been moved there. Another reason using a package prefix is clunky at best. If methods have dispatch, and if they did not conflict between S3 and S4 methods, that would be the better strategy. It is not a solution to hardcode package names into function bodies, since every package maintainer has to stay updated on the source package of all other dependent functions. Those details should be irrelevant to other package maintainers.
- The "arules" package was moved to "Imports" since it is required
for the
list2im()
function. A slower workaround could be written, but ultimately the arules package is preferred. - Configured the package to use pkgdown for function references.
- Added "#' @imports jamba" to import all jamba R functions.
factor2label()
will convert a factor to a factor label, with the same order of levels as the input factor, but including summary stats like the number of items for each factor level. Useful for ggplot2 visualizations, to include counts in the color legend for example.
- Added reference file "Mouse_codon_usage.txt" to the "extdata"
folder, as example input both for farrisSeq.Rmd, and for other
species. Access the file path with this command:
system.file("extdata", "Mouse_codon_usage.txt", package="farrisdata")
codonUsage2df()
imports a text codon usage file, and returns a data.frame. Support functiondna2codon()
simply takes a character vector and returns vector whose elements all have three characters, e.g. all(nchar(x) == 3).- Two geometric mean functions:
jamGeomean()
is preferred by the Jam packages, butgeomean()
is provided for direct comparison to the classical approach. detectedTxInfo()
summarizes the data used to define detected transcripts for a given gene, or for a given set of transcripts.
- Minor fix to
makeTx2geneFromGtf()
to remove requirement forcol.names
during import. - Several minor documentation updates.
- Updated the DESCRIPTION file to include proper "Remotes" entries and depencies for jamba, colorjam, and jamma packages.
- Fixed issue where the
data.table
package required a specific#' @import data.table
entry in the roxygen2 entry for theshrinkMatrix
function. This issue preventeddefineDetectedTx()
from working within an R package.
defineDetectedTx()
is a core RNA-seq function to determine which transcript isoforms are considered "detected" based upon counts (or pseudocounts), TPM values if available, and the relative abundance of isoforms per gene. When "everything with over 10 counts" is not sufficient.tx2ale()
takes a set of transcript exon models, and returns a set of alternate 3'UTR regions, similar to ALE (alternative last exon.) It differs slightly from other methods in that it aggregates transcript quantities by shared ALE regions, producing a matrix based upon unique ALEs. It does not use counts in the ALE region itself, but the aggregate abundance of all isoforms that contain each ALE. This method allows tools like Salmon or Kallisto to quantify isoforms using the best available kmers per isoform, without being restricted to the ALE regions which have a very wide distribution of lengths.makeTx2geneFromGtf()
takes a GTF file, and produces adata.frame
with tx (transcript) to gene associations.
list2im()
converts a list to an incidence matrix. It uses thearules
package fast methods for creatingtransactions
objects, which are sparse binary matrices with linked data.frames used to describe rows and columns. Similar toSummarizedExperiment
but with optimization specifically for list-to-matrix and matrix-to-list conversion. Their implementation is memory efficient as well.bgaPlotly3d()
takes a"bga"
class, as produced by themade4::bga()
function, and produces a 3-D plotly visualization. It adds ability to group centroids into "supergroups" which are connected by a spline 3-D curve.spline3d()
is the supporting function to draw interpolated 3-D curves through a set of coordinates.shrinkMatrix()
shrinks a numeric matrix by groups of rows, essentially a wrapper function for thedata.table
package functions.