-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathREADME.Rmd
300 lines (248 loc) · 11.6 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
---
output:
github_document:
fig_width: 5
fig_height: 3
df_print: kable
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r knitr_init, echo=FALSE}
knitr::opts_chunk$set(
collapse=TRUE,
warning=FALSE,
message=FALSE,
comment="#>",
fig.path="man/figures/README-"
);
options(knitr.table.format='markdown')
```
# jamba
The goal of jamba is to provide useful custom functions for R data
analysis and visualization.
## Package Reference
A full online function reference is available via the pkgdown
documentation:
[Full jamba command reference](https://jmw86069.github.io/jamba)
Functions are categorized, some examples are listed below:
## Background
The R functions in `jamba` have been built up, used, tested, revised
over several years. They are immediately useful for day-to-day work,
and efficient and robust enough for production pipelines.
Many were inspired by discussion from Stackoverflow, R-help, or
Bioconductor, with citations thanking principal author(s).
Many thanks to the original authors! The R community is built
upon the collective greatness of its contributors!
Most of the functions are designed around workflows for Bioinformatics
analyses, where functions need to be efficient when operating over
10,000 to 100,000 elements. (They work quite well with millions as well.)
Usually the speed gains are obvious with about 100 elements, then
scale linearly (or worse) as the number increases. I and others
use these functions *all the time*.
One example function `writeOpenxlsx()` is a simple wrapper around
very useful `openxlsx::write.xlsx()`, which also applies column formatting
for column types: P-values, fold changes, log2 fold changes, numeric,
and integer values. Columns use conditional Excel formatting to
apply color-shading to cells for each type.
Similarly, `readOpenxlsx()` is a wrapper function to `openxlsx::read.xlsx()`
which reads each worksheet and returns a `list` of `data.frame` objects.
It can detect multi-row column headers, for which it returns combined
column names. It also applies equivalent of `check.names=FALSE` so
column names are returned without change.
Small and large efficiencies are used wherever possible.
The `mixedSort()` functions are based upon `gtools::mixedsort()`,
with additional optimizations for speed and custom needs. It sorts
chromosome names, gene names, micro-RNA names, etc.
## Example R functions
### Efficient alphanumeric sort
* `mixedSort()` - highly efficient alphanumeric sort, for example chr1, chr2, chr3, chr10, etc.
* `mixedSortDF()` - as above, applied to columns in a `data.frame`
(or `matrix`, `tibble`, `DataFrame`, etc.)
* `mixedSorts()` - as above, applied to a list of vectors with no speed loss.
Example:
```{r, mixedSort, echo=FALSE}
x <- sort(c(
"miR-12","miR-1","miR-122","miR-1b",
"miR-1a","miR-2", "miR-22",
"ABCA2", "ABCA12"));
df1 <- data.frame(
miRNA=x,
sort_rank=seq_along(x),
mixedSort_rank=order(jamba::mixedOrder(x)),
check.names=FALSE,
stringsAsFactors=FALSE);
df2 <- jamba::mixedSortDF(df1);
df2;
```
### Base R plotting
These functions help with base R plots, in all those little cases when
the amazing `ggplot2` package is not a smooth fit.
* `nullPlot()` - convenient "blank" base R plot, optionally displays margins
* `plotSmoothScatter()` - smooth scatter `plot()` for point density, enhanced
over `smoothScatter()`
```{r, plotSmoothScatter, echo=FALSE}
require(jamba);
set.seed(123);
x <- matrix(ncol=2, data=rnorm(40000*2));
x[,2] <- x[,1] + rnorm(40000)*0.15;
x[1:3000,] <- t(t(x[1:3000,,drop=FALSE])+c(0.6,-0.7));
x[1:2000,2] <- x[1:2000,1] + rnorm(1000)*0.5;
opar <- par("mfrow"=c(1,2), "mar"=c(2.5, 2, 3, 0.5));
smoothScatter(x, main="smoothScatter()");
plotSmoothScatter(x, main="plotSmoothScatter()");
par(opar);
```
* `plotPolygonDensity()` - fast density/histogram plot for vector or matrix
```{r, plotPolygonDensity, echo=FALSE}
plotPolygonDensity(x[,1] - x[,2]);
```
* `imageDefault()` - enhanced `image()` that enables raster output with
consistent pixel aspect ratio.
* `imageByColors()` - wrapper to `image()` for a matrix or data.frame of colors,
with optional labels
```{r, imageByColors, echo=FALSE}
set.seed(23);
opar <- par("mar"=c(1,1,1,1));
m <- matrix(sample(colors(), 9), ncol=3)
imageByColors(m, cellnote=m);
par(opar);
```
* `minorLogTicksAxis()` - log-transformed axis labels, flexible log base, and
option for properly adjusted `log2(1 + x)` format.
* `sqrtAxis()` - draw a square-root transformed axis, with proper labels.
* `drawLabels()` - draw square colorized text labels
* `shadowText()` - replacement for `text()` that draws shadows or outlines.
```{r, labels, echo=FALSE}
opar <- par("mar"=c(1,1,1,1));
nullPlot(fill="navy", plotAreaTitle="", doMargins=FALSE);
text <- shadowText;
drawLabels(txt="shadowText() label",
boxColor=alpha2col("palegoldenrod", 0.6),
labelCol=setTextContrastColor(alpha2col("palegoldenrod", 0.6),
bg="navy"),
x=1.6, y=1.3, labelCex=2);
rm(text);
drawLabels(txt="text() label",
boxColor=alpha2col("palegoldenrod", 0.6),
labelCol=setTextContrastColor(alpha2col("palegoldenrod", 0.6),
bg="navy"),
x=1.3, y=1.7, labelCex=2);
par(opar);
```
* `groupedAxis()` - grouped axis labels to show regions/ranges
* `decideMfrow()` - determine appropriate value for `par("mfrow")` for multipanel
output in base R plotting.
* `getPlotAspect()` - determine visible plot aspect ratio.
### Excel export
Every Bioinformatician/statistician needs to write data to Excel,
the `writeOpenxlsx()` function is consistent and makes it look pretty.
You can save numerous worksheets in a single
Excel file, without having to go back and custom-format everything.
* `writeOpenxlsx()` - flexible Excel exporter, with categorical and conditional
colors.
* `applyXlsxCategoricalFormat()` - apply categorical colors to Excel
* `applyXlsxConditionalFormat()` - apply conditional colors to Excel
### Color
Almost everything uses color somewhere, especially on R console,
and in every R plot.
* `getColorRamp()` - flexible to create or retrieve color gradients
* `warpRamp()` - "bend" a color gradient to enhance the visual range
* `color2gradient()` - convert a color to gradient of n colors; or do the
same for a vector
* `makeColorDarker()` - adjust darkness and saturation
* `showColors()` - display a vector or list of colors
* `fixYellow()` - adjust the weird green-yellow, by personal preference
* `printDebug()` - pretty colorized text output using `crayon` package.
```{r, colorshow, echo=FALSE}
opar=par("mar"=c(1,1,1,1));
rainbowv <- c("red","yellow","green","cyan","blue","magenta");
colorlist <- list(
viridis_lens5=rep(each=2, getColorRamp("viridis", n=12, lens=5)),
viridis=rep(each=2, getColorRamp("viridis", n=12)),
`viridis_lens-5`=rep(each=2, getColorRamp("viridis", n=12, lens=-5)),
RdBu_r_lens5=rep(each=2, getColorRamp("RdBu_r", n=11, lens=5)),
RdBu_r=rep(each=2, getColorRamp("RdBu_r", n=11)),
`RdBu_r_lens-5`=rep(each=2, getColorRamp("RdBu_r", n=11, lens=-5)),
rainbow=rep(rainbowv, each=4),
rainbow_gradient2=rep(each=2, color2gradient(rainbowv, n=2, gradientWtFactor=1/3)),
rainbow_gradient4=color2gradient(rainbowv, n=4, gradientWtFactor=1/3)
)
showColors(colorlist, xaxt="n", labelCells=FALSE);
par(opar);
```
### List
Cool methods to operate on super-long lists in one call, to avoid looping
through the list either with `for()` loops, `lapply()` or `map()` functions.
* `cPaste()` - highly efficient `paste()` over a large list of vectors
* `cPasteS()` - as above but using `mixedSort()` before `paste()`.
* `cPasteU()` - as above but using `uniques()` before `paste()`.
* `cPasteSU()` - as above but using `mixedSort()` and `uniques()` before `paste()`.
* `uniques()` - efficient `unique()` over a list of vectors
* `sclass()` - runs `class()` on a list
* `sdim()`, `ssdim()` - dimensions of list objects, or nested list of lists
* `rbindList()` - efficient `do.call(rbind, ...)` to bind rows into a matrix or data.frame,
useful when following `strsplit()`.
* `mergeAllXY()` - merge a list of `data.frame` objects
* `rmNULL()` - remove NULL or empty elements from a list, with optional replacement
### Names
We use R names as an additional method to make sure everything is
kept in the proper order. Many R functions return results using input
names, so it helps to have a really solid naming strategy.
For the R functions that remove names -- I highly recommend
adding them back yourself!
* `makeNames()` - make unique names, using flexible logic
* `nameVector()` - add names to a vector, using its own value, or supplied names
* `nameVectorN()` - make named vector using the names of a vector (useful inside `lapply()`)
or any function that returns data using names of the input vector.
### data.frame/matrix/tibble
* `pasteByRow()` - fast, flexible row-paste with delimiters, optionally remove blanks
* `pasteByRowOrdered()` - as above but returns ordered factor, using existing
factor orders from each column when present
* `rowGroupMeans()`, `rowRmMadOutliers()` - efficient grouped row functions
* `mergeAllXY()` - merge a list of `data.frame` into one
* `renameColumn()` - rename columns `from` and `to`.
* `kable_coloring()` - flexible colorized `data.frame` output in Rmarkdown.
### String / grep
* `gsubOrdered()` - gsub that returns ordered factor, maintians the previous factor order
* `grepls()` - grep the environment (including attached packages) for object names
* `vgrep()`, `vigrep()` - value-grep shortcut
* `unvgrep()`, `unvigrep()` - un-grep -- remove matched results from the output.
* `provigrep()` - progressive grep, searches each pattern in order, returning
results in that order
* `igrepHas()` - rapid case-insensitive grep presence/absense test
* `ucfirst()` - upper-case the first letter of each word.
* `padString()`, `padInteger()` - produce strings from numeric values with
consistent leading zeros.
### Numeric
* `noiseFloor()` - apply noise floor (and ceiling) with flexible replacement values
* `warpAroundZero()` - warp a numeric vector symmetrically around zero
* `rowGroupMeans()`, `rowRmMadOutliers()` - efficient grouped row functions
* `deg2rad()`, `rad2deg()` - convert degrees to radians
* `rmNA()` - remove NA values, with optional replacement
* `rmInfinite()` - remove infinite values, with optional replacement.
* `formatInt()` - convenient `format()` for integer output, with comma-delimiter by default
### Practical / helpful
* `jargs()` - pretty function arguments, optional pattern search argument name
```{r, jargs}
jargs(plotSmoothScatter)
```
* `sdim()`, `ssdim()` - dimensions of list objects, or nested list of lists
* `sdima()` - runs `sdim()` on the attributes of an object.
* `isTRUEV()`, `isFALSEV()` - vectorized test for TRUE or FALSE values,
since `isTRUE()` only operates on single values, and does not allow `NA`.
### R console
* `printDebug()` - pretty colorized text output using `crayon` package.
* `setPrompt()` - pretty colorized R console prompt with project name and R version
* `newestFile()` - most recently modified file from a vector of files
## Other related Jam packages
* `jamma` -- MA-plots (also known as "mean-variance", "Bland-Altman",
or "mean-difference" plots), relies upon `jamba::plotSmoothScatter()`;
`centerGeneData()` to apply flexible row-centering with optional
groups and control samples;
`jammanorm()` - normalize data based upon MA-plot output
* `colorjam` -- `colorjam::rainbowJam()`
for scalable categorical colors using alternating luminance and chroma values.
* `genejam` -- fast, consistent conversion of gene symbols to the most current
gene nomenclature
* `splicejam` -- Sashimi plots for RNA-seq data
* `multienrichjam` -- multiple gene set enrichment analysis and visualization
* `platjam` -- platform technology functions, importers for NanoString