* Update code for R package upgrades, affecting ggplot2 and igraph
* Remove empty subsections and some sections that are not in the print edition
XiangyunHuang committed May 21, 2024
1 parent 0cfcc5d commit 048de57
Showing 7 changed files with 23 additions and 89 deletions.
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -20,6 +20,7 @@ Imports:
ggpointdensity,
gifski,
heatmaply,
+ hexbin,
igraph,
leaflet,
magick,
2 changes: 1 addition & 1 deletion _bookdown.yml
@@ -28,7 +28,7 @@ rmd_files:
  - "programming.Rmd"
  - "tricks.Rmd"
  - "gui.Rmd"
- - "msg.Rmd"
+ - "msg-pkgs.Rmd"
  - "postscript.Rmd"

  - "references.Rmd"
16 changes: 9 additions & 7 deletions data.Rmd
@@ -136,13 +136,15 @@ summary(canabalt)
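What we care about most is, of course, the score, so with this data in hand we can first look at the distribution of scores, for example with a histogram. Next we might ask whether the score is related to the platform, what causes high-scoring players to die, and so on. These are all comparisons of a continuous variable across the levels of a discrete variable, and a natural choice is to draw one plot per level of the discrete variable. Figure \@ref(fig:canabalt-boxplot) shows boxplots by level of the discrete variables. The figure shows that iPad players score higher on average, perhaps because the iPad's screen is larger than that of the iPhone or iPod touch and the game is easier to control, or perhaps because an iPad has to be deliberately switched on, unlike the other two platforms, which can be picked up at any moment, so iPad players concentrate more when they play. As for the causes of death, the author is no expert at this game: after a few attempts, every run ended by failing to jump high enough, hitting a wall, and falling to death, after a few hundred meters at most, so the scenarios behind the other causes of death are unfamiliar. Many of the players who died by hitting a wall have very high scores, so this obstacle appears to be far from easy. Note that when drawing the boxplots we reordered the causes of death by the median score. This makes the figure easier to read; otherwise readers would have to sort the boxplots by eye, which is an unnecessary burden. Plotting in the order of the raw data is a common problem, especially in bar charts and pie charts; sorting costs the plot maker almost nothing, yet it is a great convenience to the reader.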

```{r canabalt-boxplot,fig.cap="(ref:canabalt-boxplot)",fig.scap="Comparison of game scores across platforms and causes of death",message=FALSE}
- canabalt_g1 <- qplot(device, score, data = canabalt, geom = "boxplot") +
+ canabalt_g1 <- ggplot(aes(device, score), data = canabalt) +
+ geom_boxplot() +
coord_flip()
- canabalt_g2 <- qplot(reorder(death, score, median), score,
- data = canabalt,
- geom = "boxplot", xlab = "death"
+ canabalt_g2 <- ggplot(aes(reorder(death, score, median), score),
+ data = canabalt
) +
- coord_flip()
+ geom_boxplot() +
+ coord_flip() +
+ labs(xlab = "death")
library(cowplot)
plot_grid(canabalt_g1, canabalt_g2, ncol = 1)
```
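The reordering step can be tried on its own. The following is a minimal sketch on a made-up data frame (`toy`, with hypothetical columns `cause` and `score` standing in for `death` and `score` in `canabalt`); it only shows how `reorder()` changes the factor levels that a boxplot would follow.

```r
# Hypothetical toy data standing in for the canabalt data set
toy <- data.frame(
  cause = factor(rep(c("fall", "wall", "bomb"), each = 3)),
  score = c(10, 20, 30, 500, 600, 700, 100, 200, 300)
)

# Default factor levels are alphabetical and unrelated to the scores
default_levels <- levels(toy$cause)
print(default_levels)
# "bomb" "fall" "wall"

# reorder() sorts the levels by a summary (here the median) of score per level
by_median <- reorder(toy$cause, toy$score, median)
median_levels <- levels(by_median)
print(median_levels)
# "fall" "bomb" "wall"  (medians 20, 200, 600)
```

Because plotting functions place the boxes in level order, passing the reordered factor is enough to sort the boxes by median.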
@@ -478,10 +480,10 @@ $$R_{i,j}=\frac{\mbox{number of ci poems containing both high-frequency words }i\mbox{ and }{j}
```{r song-high-freq,fig.cap="(ref:song-high-freq)",fig.scap="Network of relationships among the 100 most frequent words in Song ci poetry"}
library(igraph, warn.conflicts = FALSE)
load(system.file("extdata", "HighFreq100.rda", package = "MSG"))
- g <- graph.adjacency((HighFreq100 > 0.05) * HighFreq100,
+ g <- graph_from_adjacency_matrix((HighFreq100 > 0.05) * HighFreq100,
mode = "undirected", weighted = TRUE, diag = FALSE
)
- cg <- clusters(g)
+ cg <- components(g)
colbar <- as.numeric(as.factor(cg$csize[cg$membership + 1]))
V(g)$color <- rev(heat.colors(9))[colbar]
20 changes: 8 additions & 12 deletions gallery.Rmd
@@ -73,7 +73,7 @@ library(ggplot2)
library(cowplot)
p <- ggplot(aes(waiting), data = geyser)
p1 <- p + geom_histogram(breaks = seq(40, 110, by = 5))
- p2 <- p + geom_histogram(breaks = seq(40, 110, by = 5), aes(y = ..density..))
+ p2 <- p + geom_histogram(breaks = seq(40, 110, by = 5), aes(y = after_stat(density)))
p3 <- p + geom_histogram(breaks = seq(40, 110, by = 10))
p4 <- p + geom_histogram(breaks = seq(42, 108, by = 2), fill = "red", color = "black")
plot_grid(p1, p2, p3, p4, labels = c(
@@ -109,7 +109,6 @@ p2 + geom_density(fill = "lightgray", color = "black") +

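Because a histogram has to discretize continuous data into bins, it has an obvious drawback: its shape depends on the bin endpoints. For example, if several identical values fall exactly on a bin endpoint, then shifting the endpoints slightly to the left or right moves these points into a different bin and changes the heights of the bars. @Scott92 proposed a remedy for this instability called the Average Shifted Histogram (ASH). The idea is to bin the data with a series of shifted sets of breakpoints, such as $(b_1+ih/n,b_2+ih/n,\ldots,b_n+ih/n)$, $i=0,\cdots,n-1$, and then to average the frequencies from these $n$ binnings to obtain the ASH plot, which effectively sidesteps the question of which bin a boundary point belongs to. However, now that the theory of kernel density estimation is so well developed, there is little need for this trick to overcome the original problem; after all, ASH is still rather crude compared with kernel density estimation. The kernel density curve in Figure \@ref(fig:hist-density) is computed by the function `density()`, whose arguments include the kernel function and the bandwidth. In practice we may need to try different kernels and bandwidths; Section 5.6 of @Venables02 offers some practical guidance on the choice.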
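The averaging idea behind ASH can be sketched in a few lines of base R. This is a naive illustration under stated assumptions, not the book's code: the function name `ash_estimate`, the bin width `h = 5`, the number of shifts `m = 8`, and the use of the built-in `faithful` data as a stand-in are all choices made here for illustration.

```r
# Naive Average Shifted Histogram evaluated on a grid of points.
# h: bin width; m: number of shifted histograms to average (assumed values).
ash_estimate <- function(x, grid, h, m = 8) {
  n <- length(x)
  est <- numeric(length(grid))
  for (i in seq_len(m) - 1) {
    origin <- min(x) - h + i * h / m      # shift the bin origin by i * h / m
    bin_x <- floor((x - origin) / h)      # bin index of each observation
    bin_g <- floor((grid - origin) / h)   # bin index of each grid point
    # number of observations that share a bin with each grid point
    counts <- vapply(bin_g, function(b) sum(bin_x == b), integer(1))
    est <- est + counts / (n * h)         # histogram density for this shift
  }
  est / m                                 # average over the m shifts
}

x <- faithful$waiting                     # built-in data, used as a stand-in
step <- 0.05
grid <- seq(min(x) - 10, max(x) + 10, by = step)
ash <- ash_estimate(x, grid, h = 5, m = 8)
area <- sum(ash) * step                   # Riemann sum, should be close to 1
```

Plotting `ash` against `grid` next to `density(x)` gives a quick way to see how much coarser the averaged histogram is than a kernel estimate.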

- [An extension of density curves: ridgeline plots with ggridges <https://github.com/clauswilke/ggridges>]{.todo}

## Stem-and-leaf plots

@@ -468,7 +467,7 @@ ggplot(contour_grid_tidy, aes(x, y)) +
scale_x_continuous(limits = c(0.5, 4.5), labels = function(x) paste("x =", x)) +
scale_y_continuous(limits = c(0.5, 3.5), labels = function(x) paste("x =", x)) +
geom_polygon(data = contour_grid_polygon, fill = NA, color = "black", lty = 2) +
- geom_segment(aes(x = 2, y = 2, xend = 3, yend = 2)) +
+ geom_segment(aes(x = 2, y = y, xend = 3, yend = y), data = data.frame(y = 2)) +
theme_bw()
```

@@ -597,7 +596,7 @@ coplot(lat ~ long | depth,
(ref:fig-curve) Curve of the function $f(x)=\mathrm{sin}(\mathrm{cos}(x)*\mathrm{exp}(-x/2))$ (top) and the characteristic function of the uniform distribution $U(-1,1)$ (bottom).

```{r curve,fig.width=4.8,fig.height=5,fig.cap="(ref:fig-curve)",fig.scap="(ref:fig-curve-s)",dev='tikz',fig.process=to_png,fig.showtext=FALSE,small.mar=FALSE}
- par(par(mar = c(4.5, 4, 0.2, 0.2)), mfrow = c(2, 1))
+ par(mar = c(4.5, 4, 0.2, 0.2), mfrow = c(2, 1))
chippy <- function(x) sin(cos(x) * exp(-x / 2))
curve(chippy, -8, 7, n = 2008, xlab = "$x$", ylab = "$\\mathrm{chippy}(x)$")
curve(sin(x) / x, from = -20, to = 20, n = 200,
@@ -939,9 +938,6 @@ usage(graphics:::pairs.formula)
```{r define-pairs-panel}
# See how hist() does the computation and rect() does the drawing
panel.hist <- function(x, ...) {
- usr <- par("usr")
- on.exit(par(usr))
- par(usr = c(usr[1:2], 0, 1.5))
h <- hist(x, plot = FALSE)
nB <- length(breaks <- h$breaks)
y <- h$counts / max(h$counts)
@@ -956,10 +952,10 @@ panel.hist <- function(x, ...) {
```{r pairs,fig.width=4.8,fig.height=4.8,fig.cap="(ref:fig-pairs)",fig.scap="(ref:fig-pairs-s)"}
idx <- as.integer(iris[["Species"]])
pairs(iris[1:4],
- upper.panel = function(x, y, ...)
- points(x, y, pch = c(17, 16, 6)[idx], col = idx),
- pch = 20, oma = c(2, 2, 2, 2),
- lower.panel = panel.smooth, diag.panel = panel.hist
+ upper.panel = function(x, y, ...) {
+ points(x, y, pch = c(17, 16, 6)[idx], col = idx)
+ },
+ lower.panel = panel.smooth, diag.panel = panel.hist
)
```

@@ -1763,7 +1759,7 @@ usage(ggparcoord)

```{r ggparcoord,fig.width=4.8,fig.height=2.5,fig.cap="(ref:fig-ggparcoord)",fig.scap="(ref:fig-ggparcoord-s)"}
ggparcoord(iris, columns = 1:4, groupColumn = 5, scale = "uniminmax") +
- geom_line(size = 1.2)
+ geom_line(linewidth = 1.2)
```


File renamed without changes.
69 changes: 2 additions & 67 deletions system.Rmd
@@ -80,12 +80,12 @@ ggplot(aes(x = carat, y = price), data = diamonds) +

```{r ggplot2-violin}
ggplot(diamonds, aes(x = price)) +
- stat_density(aes(ymax = ..density.., ymin = -..density..),
+ stat_density(aes(ymax = after_stat(density), ymin = -after_stat(density)),
geom = "ribbon", position = "identity"
)
```

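- Note that the variable density must be wrapped in `..` on both sides. This is a syntax rule of ggplot2: it indicates that the variable is computed by the statistic function rather than being part of the original data. A ribbon is a band-shaped geom, essentially a polygon, usually with a fill color.
+ Note that the variable density must be passed to the function `after_stat()`. This is a syntax rule of ggplot2: it indicates that the variable is computed by the statistic function rather than being part of the original data. A ribbon is a band-shaped geom, essentially a polygon, usually with a fill color.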
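To see that `density` really is a computed variable, one can inspect what the statistic produces. The sketch below is an illustration under assumptions rather than code from the book: it uses the built-in `faithful` data and a histogram instead of the violin example, and reads the computed values back with `layer_data()`.

```r
library(ggplot2)

# faithful ships with R; it stands in for the book's data here
p <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(aes(y = after_stat(density)), breaks = seq(40, 100, by = 5))

# layer_data() returns the data frame computed by the stat for the first layer;
# it has a density column even though faithful has no such variable
built <- layer_data(p)

# density is normalised, so bar areas (density * bin width) sum to 1
total_area <- sum(built$density * (built$xmax - built$xmin))
```

The column `density` exists only in `built`, not in `faithful`, which is why it has to be referenced through `after_stat()` inside `aes()`.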

### Scales

@@ -162,12 +162,6 @@ ggplot(aes(x = carat), data = diamonds) +

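Besides the row and column arrangement described above, facets can also be laid out from left to right and top to bottom. When faceting produces too many subsets to fit in a single row or column, this sequential layout is worth considering; see the help page of `facet_wrap()` (the former arrangement is `facet_grid()`).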
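The difference between the two layouts is easiest to see side by side. The following sketch assumes the `diamonds` data that ships with ggplot2 and an arbitrary choice of three columns; the object names are made up for illustration.

```r
library(ggplot2)

base <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(bins = 30)

# facet_grid(): one row per level of cut, stacked in a single column
p_grid <- base + facet_grid(cut ~ .)

# facet_wrap(): the same panels filled left to right, top to bottom, 3 per row
p_wrap <- base + facet_wrap(~cut, ncol = 3)

# Either way there is one panel per level of cut
n_panels <- length(unique(layer_data(p_wrap)$PANEL))
```

Printing `p_grid` and `p_wrap` shows the same five histograms; only the arrangement on the page changes.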

- ### Grouping {#subsec:group}

- [Explain the group argument and how it differs from faceting]{.todo}



### Position adjustment

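Position adjustment mainly concerns how the bars of a bar chart are placed. In Section \@ref(sec:barplot) we covered bar charts in the base graphics system, where the beside argument controls whether the bars are placed side by side or stacked; position adjustment in ggplot2 is similar. Bar charts are not the only plots with rectangular bars, of course: histograms have them too, so we can draw stacked histograms as well. Scatterplots have their own important kind of position adjustment, namely jittering, which was mentioned in Section \@ref(sec:stripchart) and Figure \@ref(fig:discrete-var). Slightly jittering the positions of the points in a scatterplot reduces overplotting, especially when many points share the same position, where the overlap can mislead us into thinking there is only one point. Jittering can also be added to a plot as a geom, for example: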
@@ -178,65 +172,6 @@ ggplot(aes(x = Petal.Length, y = Petal.Width), data = iris) +
geom_jitter(color = "red") # jittered points, for comparison
```
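How much overplotting there is in this example can be checked numerically. The sketch below uses base R's `jitter()` rather than `geom_jitter()`, so the amount of noise is base R's default and not necessarily what ggplot2 applies; the variable names are made up for illustration.

```r
# Petal measurements are recorded to one decimal place, so many points coincide
pts <- iris[, c("Petal.Length", "Petal.Width")]
n_hidden <- sum(duplicated(pts))           # points drawn exactly on top of another

set.seed(1)                                # make the random offsets reproducible
jittered <- data.frame(
  Petal.Length = jitter(pts$Petal.Length), # add a small uniform random offset
  Petal.Width  = jitter(pts$Petal.Width)
)
n_hidden_after <- sum(duplicated(jittered))

# The offsets are small relative to the measurement scale
max_shift <- max(abs(jittered$Petal.Length - pts$Petal.Length))
```

Comparing `n_hidden` with `n_hidden_after` shows how many observations a plain scatterplot would hide behind other points.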

- [Introduce ggrepel <https://github.com/slowkow/ggrepel> and ggbeeswarm <https://github.com/eclarke/ggbeeswarm>]{.todo}

- ### Legends {#subsec:legend}

- Adding, removing, and modifying legends

- Two-dimensional legends: [biscale](https://github.com/slu-openGIS/biscale), [multiscales](https://github.com/clauswilke/multiscales), [ggnewscale](https://github.com/eliocamp/ggnewscale)

- ### Annotations {#subsec:annotation}

- ### Fonts {#subsec:font}

- Compared with the base R graphics system, ggplot2 allows finer control over the font of each plot element

- ```{r font-in-ggplot,fig.cap="(ref:font-in-ggplot)",fig.scap="(ref:font-in-ggplot-s)",message=FALSE,fig.showtext=FALSE}
- p1 <- ggplot(pressure, aes(x = temperature, y = pressure)) +
- geom_point()
- p2 <- p1 + theme(
- axis.title = element_text(family = "sans"),
- axis.text = element_text(family = "serif")
- )
- p3 <- p1 + labs(x = "温度", y = "压力") +
- theme(
- axis.title = element_text(family = "GB1"),
- axis.text = element_text(family = "serif")
- )
- p4 <- p1 + labs(
- x = "温度", y = "压力", title = "散点图",
- subtitle = "Vapor Pressure of Mercury as a Function of Temperature",
- caption = paste("Data on the relation between temperature in degrees Celsius and",
- "vapor pressure of mercury in millimeters (of mercury).",
- sep = "\n"
- )
- ) +
- theme(
- axis.title = element_text(family = "GB1"),
- axis.text.x = element_text(family = "serif"),
- axis.text.y = element_text(family = "sans"),
- title = element_text(family = "GB1"),
- plot.subtitle = element_text(family = "sans", size = rel(0.7)),
- plot.caption = element_text(family = "sans", size = rel(0.6))
- )
- library(cowplot)
- plot_grid(p1, p2, p3, p4,
- labels = c(
- "默认字体设置", "英文字体设置",
- "中文字体设置", "任意字体设置"
- ), label_fontfamily = "GB1", ncol = 2,
- label_x = 0.1, label_y = 0.6
- )
- ```

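- (ref:font-in-ggplot-s) Setting Chinese and English fonts in the ggplot2 graphics system

- (ref:font-in-ggplot) In the ggplot2 graphics system, fonts can easily be set in fine detail: axis titles, axis labels, and the plot title and subtitle can each use a different font family, size, and face

- ### Colors {#subsec:color}

- RColorBrewer and colorspace

### Themes {#subsec:theme}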

4 changes: 2 additions & 2 deletions tools.Rmd
@@ -81,12 +81,12 @@ The official R website <https://www.R-project.org> describes R in detail, and we

(ref:fig-ggplot2-minard-s) Recreating the map of Napoleon's march with the **ggplot2** package in R

- ```{r ggplot2-minard, fig.cap='(ref:fig-ggplot2-minard)', fig.scap='(ref:fig-ggplot2-minard-s)', fig.width=4.9, fig.height=2.5}
+ ```{r ggplot2-minard, fig.cap='(ref:fig-ggplot2-minard)', fig.scap='(ref:fig-ggplot2-minard-s)', fig.width=6, fig.height=3}
troops <- read.table(system.file("extdata", "troops.txt", package = "MSG"), header = TRUE)
cities <- read.table(system.file("extdata", "cities.txt", package = "MSG"), header = TRUE)
library(ggplot2)
p <- ggplot(cities, aes(x = long, y = lat)) # base plot
- p <- p + geom_path(aes(size = survivors, colour = direction, group = group),
+ p <- p + geom_path(aes(linewidth = survivors, colour = direction, group = group),
data = troops, lineend = "round") # troop routes
p <- p + geom_point() # city points
p <- p + geom_text(aes(label = city), hjust = 0, vjust = 1, size = 2.5) # city names