* Update code to follow R package upgrades, affecting ggplot2 and igraph

* Remove empty subsections and some chapters that are not in the print edition
XiangyunHuang · May 21, 2024 · 048de57
1 parent 0cfcc5d
commit 048de57
Showing 7 changed files with 23 additions and 89 deletions.
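
Note on the igraph part of this commit: `graph.adjacency()` and `clusters()` are replaced by `graph_from_adjacency_matrix()` and `components()`. The following is a minimal sketch of the new calls, not the book's code. It assumes a small made-up weight matrix `w` in place of `HighFreq100` (which ships with the MSG package) and reuses the 0.05 threshold from the diff. It also assumes a current igraph release, where component membership is 1-based.

```r
library(igraph, warn.conflicts = FALSE)

# Hypothetical 4 x 4 symmetric weight matrix standing in for HighFreq100
w <- matrix(0, 4, 4, dimnames = list(letters[1:4], letters[1:4]))
w["a", "b"] <- w["b", "a"] <- 0.30 # strong link, kept
w["c", "d"] <- w["d", "c"] <- 0.02 # weak link, dropped by the threshold

# Zero out weights at or below 0.05, then build a weighted undirected graph
g <- graph_from_adjacency_matrix((w > 0.05) * w,
  mode = "undirected", weighted = TRUE, diag = FALSE
)

# components() returns membership, csize and no (number of components)
cg <- components(g)
cg$no

# Size of the component each vertex belongs to (membership is 1-based here)
cg$csize[cg$membership]
```

Only the a-b edge survives the threshold, so `c` and `d` end up as singleton components.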
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -20,6 +20,7 @@ Imports:
     ggpointdensity,
     gifski,
     heatmaply,
+    hexbin,
     igraph,
     leaflet,
     magick,

diff --git a/_bookdown.yml b/_bookdown.yml
@@ -28,7 +28,7 @@ rmd_files:
 - "programming.Rmd"
 - "tricks.Rmd"
 - "gui.Rmd"
-- "msg.Rmd"
+- "msg-pkgs.Rmd"
 - "postscript.Rmd"
 
 - "references.Rmd"
diff --git a/data.Rmd b/data.Rmd
@@ -136,13 +136,15 @@ summary(canabalt)
 我们关心的重点当然是得分，因此拿到这批数据我们可以先看一下得分的分布，例如用直方图；其次我们会考虑游戏得分和平台是否有关，高分玩家会因为什么原因死亡，等等，这都是基于离散变量的连续变量比较，一个自然而然的选择就是对离散变量的每一分类分别画图。图 \@ref(fig:canabalt-boxplot) 是基于离散变量的不同分类的箱线图，从图中可以看出，iPad 玩家的平均得分较高，这可能是因为 iPad 相比起 iPhone 或者 iPod touch 来说屏幕较大，玩家易于控制，也可能是因为 iPad 需要专门开机，不像另外两个平台随时都能打开玩，因此 iPad 玩家玩起来会更集中精力。至于死因，由于作者对这款游戏并不在行，玩了几次，得到的结果都是因为跳得不够高而撞墙坠落摔死，最多能跑几百米，因此不了解其它死因的场景。因为撞墙摔死的玩家中有很多人得分超高，看来这种障碍的难度并不小。注意我们画箱线图时，对死因做了重新排序 — 按照得分的中位数排序，这样能方便读者阅读这幅图，否则，读者需要额外花费功夫用眼睛对箱线图排序，对读者来说是不必要的阅读负担。按照原始数据的顺序画图尤其是条形图和饼图中常见的问题，其实排序对于制图者只是举手之劳，对读者却能带来很大的方便。
 
 ```{r canabalt-boxplot,fig.cap="(ref:canabalt-boxplot)",fig.scap="游戏得分在不同游戏平台以及死因下的比较",message=FALSE}
-canabalt_g1 <- qplot(device, score, data = canabalt, geom = "boxplot") +
+canabalt_g1 <- ggplot(aes(device, score), data = canabalt) +
+  geom_boxplot() +
   coord_flip()
-canabalt_g2 <- qplot(reorder(death, score, median), score,
-  data = canabalt,
-  geom = "boxplot", xlab = "death"
+canabalt_g2 <- ggplot(aes(reorder(death, score, median), score),
+  data = canabalt
 ) +
-  coord_flip()
+  geom_boxplot() +
+  coord_flip() +
+  labs(x = "death")
 library(cowplot)
 plot_grid(canabalt_g1, canabalt_g2, ncol = 1)
 ```
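
Note on the hunk above: it rewrites `qplot()` calls as explicit `ggplot()` + `geom_boxplot()` layers. The sketch below shows the same pattern on a tiny made-up data frame `toy`, a hypothetical stand-in for `canabalt`, and is not the book's code. In `labs()` the axis title is set through the aesthetic name `x`.

```r
library(ggplot2)

# Hypothetical stand-in for the canabalt data (the real data ship with the MSG package)
toy <- data.frame(
  death = c("fall", "fall", "wall", "wall", "bomb", "bomb"),
  score = c(100, 300, 900, 1100, 400, 600)
)

# Order the categories by median score, draw boxplots, and flip the axes
p <- ggplot(toy, aes(reorder(death, score, median), score)) +
  geom_boxplot() +
  coord_flip() +
  labs(x = "death") # labs() is keyed by aesthetic name, so the argument is x

# The label stored on the plot object
p$labels$x
```

Without the `labs()` call, the axis title would default to the full `reorder(death, score, median)` expression.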
@@ -478,10 +480,10 @@ $$R_{i,j}=\frac{\mbox{同时出现高频词}i\mbox{和}{j}\mbox{的词的数目}
 ```{r song-high-freq,fig.cap="(ref:song-high-freq)",fig.scap=" 宋词前 100 高频词的关系网络图 "}
 library(igraph, warn.conflicts = FALSE)
 load(system.file("extdata", "HighFreq100.rda", package = "MSG"))
-g <- graph.adjacency((HighFreq100 > 0.05) * HighFreq100,
+g <- graph_from_adjacency_matrix((HighFreq100 > 0.05) * HighFreq100,
   mode = "undirected", weighted = TRUE, diag = FALSE
 )
-cg <- clusters(g)
+cg <- components(g)
 colbar <- as.numeric(as.factor(cg$csize[cg$membership + 1]))
 V(g)$color <- rev(heat.colors(9))[colbar]
 

diff --git a/gallery.Rmd b/gallery.Rmd
@@ -73,7 +73,7 @@ library(ggplot2)
 library(cowplot)
 p <- ggplot(aes(waiting), data = geyser)
 p1 <- p + geom_histogram(breaks = seq(40, 110, by = 5))
-p2 <- p + geom_histogram(breaks = seq(40, 110, by = 5), aes(y = ..density..))
+p2 <- p + geom_histogram(breaks = seq(40, 110, by = 5), aes(y = after_stat(density)))
 p3 <- p + geom_histogram(breaks = seq(40, 110, by = 10))
 p4 <- p + geom_histogram(breaks = seq(42, 108, by = 2), fill = "red", color = "black")
 plot_grid(p1, p2, p3, p4, labels = c(
@@ -109,7 +109,6 @@ p2 + geom_density(fill = "lightgray", color = "black") +
 
 由于直方图需要对连续型数据做离散分组，因此它有一个明显的缺点，就是它的形状依赖于分组的端点，例如若有好几个相同的数值正好处在分组端点上，那么我们只要稍微向左或向右移动一下分组端点，这些数据点就会被划分入不同的区间，导致矩形条的高度变化。@Scott92 提出了一种解决这种直方图不稳定性问题的办法叫“移动平均直方图”（Average Shifted Histogram，简称 ASH），它的思想是使用一系列移动的区间去划分数据，比如 $(b_1+ih/n,b_2+ih/n,\ldots,b_n+ih/n)$，$i=0,\cdots,n-1$，最后将这 $n$ 种划分方法的频数结果"平均"起来，就得到了 ASH 图，这样有效避免了边界点的归属问题。然而，在核密度估计理论已经非常完备的今天，我们几乎没有必要再用这种技巧去克服原来的问题了，毕竟 ASH 与核密度估计比起来显得还是太粗糙。图 \@ref(fig:hist-density) 的核密度曲线基于函数 `density()` 计算而来，它的参数包括核函数和窗宽等，实际应用中我们可能需要尝试不同的核函数以及窗宽值，@Venables02 第 5.6 小节介绍了一些选择的经验可供参考。
 
-[密度曲线的延伸 --- 岭线 ggridges <https://github.com/clauswilke/ggridges>]{.todo}
 
 ## 茎叶图
 
@@ -468,7 +467,7 @@ ggplot(contour_grid_tidy, aes(x, y)) +
   scale_x_continuous(limits = c(0.5, 4.5), labels = function(x) paste("x =", x)) +
   scale_y_continuous(limits = c(0.5, 3.5), labels = function(x) paste("x =", x)) +
   geom_polygon(data = contour_grid_polygon, fill = NA, color = "black", lty = 2) +
-  geom_segment(aes(x = 2, y = 2, xend = 3, yend = 2)) +
+  geom_segment(aes(x = 2, y = y, xend = 3, yend = y), data = data.frame(y = 2)) +
   theme_bw()
 ```
 
@@ -597,7 +596,7 @@ coplot(lat ~ long | depth,
 (ref:fig-curve) 函数 $f(x)=\mathrm{sin}(\mathrm{cos}(x)*\mathrm{exp}(-x/2))$ 的曲线图（上）和均匀分布 $U(-1,1)$ 的特征函数图（下）。
 
 ```{r curve,fig.width=4.8,fig.height=5,fig.cap="(ref:fig-curve)",fig.scap="(ref:fig-curve-s)",dev='tikz',fig.process=to_png,fig.showtext=FALSE,small.mar=FALSE}
-par(par(mar = c(4.5, 4, 0.2, 0.2)), mfrow = c(2, 1))
+par(mar = c(4.5, 4, 0.2, 0.2), mfrow = c(2, 1))
 chippy <- function(x) sin(cos(x) * exp(-x / 2))
 curve(chippy, -8, 7, n = 2008, xlab = "$x$", ylab = "$\\mathrm{chippy}(x)$")
 curve(sin(x) / x, from = -20, to = 20, n = 200, 
@@ -939,9 +938,6 @@ usage(graphics:::pairs.formula)
 ```{r define-pairs-panel}
 # 观察如何使用 hist() 做计算并用 rect() 画图
 panel.hist <- function(x, ...) {
-  usr <- par("usr")
-  on.exit(par(usr))
-  par(usr = c(usr[1:2], 0, 1.5))
   h <- hist(x, plot = FALSE)
   nB <- length(breaks <- h$breaks)
   y <- h$counts / max(h$counts)
@@ -956,10 +952,10 @@ panel.hist <- function(x, ...) {
 ```{r pairs,fig.width=4.8,fig.height=4.8,fig.cap="(ref:fig-pairs)",fig.scap="(ref:fig-pairs-s)"}
 idx <- as.integer(iris[["Species"]])
 pairs(iris[1:4],
-      upper.panel = function(x, y, ...)
-        points(x, y, pch = c(17, 16, 6)[idx], col = idx),
-      pch = 20, oma = c(2, 2, 2, 2),
-      lower.panel = panel.smooth, diag.panel = panel.hist
+  upper.panel = function(x, y, ...) {
+    points(x, y, pch = c(17, 16, 6)[idx], col = idx)
+  },
+  lower.panel = panel.smooth, diag.panel = panel.hist
 )
 ```
 
@@ -1763,7 +1759,7 @@ usage(ggparcoord)
 
 ```{r ggparcoord,fig.width=4.8,fig.height=2.5,fig.cap="(ref:fig-ggparcoord)",fig.scap="(ref:fig-ggparcoord-s)"}
 ggparcoord(iris, columns = 1:4, groupColumn = 5, scale = "uniminmax") + 
-  geom_line(size = 1.2)
+  geom_line(linewidth = 1.2)
 ```
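
Note on the ggplot2 renames in this file: `..density..` becomes `after_stat(density)` and the `size` argument for lines becomes `linewidth`. The sketch below combines both and is not the book's code. It assumes ggplot2 3.4.0 or later and uses the built-in `faithful` data as a stand-in for `geyser`.

```r
library(ggplot2)

bin_width <- 5

# after_stat(density) refers to a variable computed by the stat,
# not a column of the input data; linewidth sets the line thickness
p <- ggplot(faithful, aes(waiting)) +
  geom_histogram(aes(y = after_stat(density)),
    breaks = seq(40, 100, by = bin_width)
  ) +
  geom_density(linewidth = 1.2)

# Inspect the computed histogram layer: bar heights are densities,
# so height times bin width summed over all bins is the total area
built <- ggplot_build(p)$data[[1]]
sum(built$density * bin_width)
```

Because the bars are on the density scale, the kernel density curve can be overlaid on the same y axis.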
 
 

diff --git a/msg.Rmd b/msg-pkgs.Rmd
rename from msg.Rmd
rename to msg-pkgs.Rmd
diff --git a/system.Rmd b/system.Rmd
@@ -80,12 +80,12 @@ ggplot(aes(x = carat, y = price), data = diamonds) +
 
 ```{r ggplot2-violin}
 ggplot(diamonds, aes(x = price)) +
-  stat_density(aes(ymax = ..density.., ymin = -..density..),
+  stat_density(aes(ymax = after_stat(density), ymin = -after_stat(density)),
     geom = "ribbon", position = "identity"
   )
 ```
 
-注意其中 density 变量的两边都需要用 `..` 围起来，这是 ggplot2 的语法规定，这种写法表示变量从统计量函数中计算而来，并非原始数据自带的。 ribbon 是带状的几何形状，本质上是多边形，通常带有填充色。
+注意其中 density 变量需要传递给函数 `after_stat()`，这是 ggplot2 的语法规定，这种写法表示变量从统计量函数中计算而来，并非原始数据自带的。 ribbon 是带状的几何形状，本质上是多边形，通常带有填充色。
 
 ### 标度
 
@@ -162,12 +162,6 @@ ggplot(aes(x = carat), data = diamonds) +
 
 切片的版面设置除了上面介绍的行列排列之外，还有一种从左到右、从上到下的排列方式，有时候切片生成的子集数目如果太多的话，无论按行或按列摆放可能都摆不下，这时候可以考虑这种顺序排列的方式，参见 `facet_wrap()` 的帮助文档（前一种排列叫 `facet_grid()`）。
 
-### 分组 {#subsec:group}
-
-[介绍 group 参数和切片/分面的区别]{.todo}
-
-
-
 ### 位置调整
 
 位置调整主要针对条形图中的矩形条的位置摆放。在 \@ref(sec:barplot) 小节中我们讲到了基础图形系统中的条形图，里面有个 beside 参数可以指定矩形条是并排排列还是堆砌排列，ggplot2 系统中的位置调整也类似。当然，不仅条形图中有矩形条，直方图中也有，所以我们同样可以画堆砌直方图。另外在散点图中也有一类重要的位置调整，即随机打乱，这一点在 \@ref(sec:stripchart) 小节和图 \@ref(fig:discrete-var) 中都提到过。略微随机打乱散点图中的点的位置，能减轻图的重叠程度，尤其是有很多个点都在同一个位置上时，由于重叠的原因，我们可能会被误导（以为该处只有 1 个点）。随机打乱也可以作为一种几何形状添加到图中，如：
@@ -178,65 +172,6 @@ ggplot(aes(x = Petal.Length, y = Petal.Width), data = iris) +
   geom_jitter(color = "red") # 对比随机打乱的散点
 ```
 
-[介绍 ggrepel <https://github.com/slowkow/ggrepel> 和 ggbeeswarm <https://github.com/eclarke/ggbeeswarm>]{.todo}
-
-### 图例 {#subsec:legend}
-
-添加、取消、修改图例
-
-二维的图例 [biscale](https://github.com/slu-openGIS/biscale) 和 [multiscales](https://github.com/clauswilke/multiscales) 和 [ggnewscale](https://github.com/eliocamp/ggnewscale)
-
-### 注释 {#subsec:annotation}
-
-### 字体 {#subsec:font}
-
-相比于 Base R 绘图系统，ggplot2 系统可以更加精细地设置各处字体
-
-```{r font-in-ggplot,fig.cap="(ref:font-in-ggplot)",fig.scap="(ref:font-in-ggplot-s)",message=FALSE,fig.showtext=FALSE}
-p1 <- ggplot(pressure, aes(x = temperature, y = pressure)) +
-  geom_point()
-p2 <- p1 + theme(
-  axis.title = element_text(family = "sans"),
-  axis.text = element_text(family = "serif")
-)
-p3 <- p1 + labs(x = "温度", y = "压力") +
-  theme(
-    axis.title = element_text(family = "GB1"),
-    axis.text = element_text(family = "serif")
-  )
-p4 <- p1 + labs(
-  x = "温度", y = "压力", title = "散点图",
-  subtitle = "Vapor Pressure of Mercury as a Function of Temperature",
-  caption = paste("Data on the relation between temperature in degrees Celsius and",
-    "vapor pressure of mercury in millimeters (of mercury).",
-    sep = "\n"
-  )
-) +
-  theme(
-    axis.title = element_text(family = "GB1"),
-    axis.text.x = element_text(family = "serif"),
-    axis.text.y = element_text(family = "sans"),
-    title = element_text(family = "GB1"),
-    plot.subtitle = element_text(family = "sans", size = rel(0.7)),
-    plot.caption = element_text(family = "sans", size = rel(0.6))
-  )
-library(cowplot)
-plot_grid(p1, p2, p3, p4,
-  labels = c(
-    "默认字体设置", "英文字体设置",
-    "中文字体设置", "任意字体设置"
-  ), label_fontfamily = "GB1", ncol = 2,
-  label_x = 0.1, label_y = 0.6
-)
-```
-
-(ref:font-in-ggplot-s) 在 ggplot2 绘图系统中设置中英文字体
-
-(ref:font-in-ggplot) 在 ggplot2 绘图系统中，字体可以非常容易地做到精细的设置，轴的标题、标签和图的主、副标题等都可以设置不同的字体、字号和字样
-
-### 配色 {#subsec:color}
-
-RColorBrewer 和 colorspace
 
 ### 主题 {#subsec:theme}
 

diff --git a/tools.Rmd b/tools.Rmd
@@ -81,12 +81,12 @@ R 的官方网站 <https://www.R-project.org> 中对 R 有详细介绍，我们
 
 (ref:fig-ggplot2-minard-s) 在 R 中用 **ggplot2** 包重制拿破仑远征图
 
-```{r ggplot2-minard, fig.cap='(ref:fig-ggplot2-minard)', fig.scap='(ref:fig-ggplot2-minard-s)', fig.width=4.9, fig.height=2.5}
+```{r ggplot2-minard, fig.cap='(ref:fig-ggplot2-minard)', fig.scap='(ref:fig-ggplot2-minard-s)', fig.width=6, fig.height=3}
 troops <- read.table(system.file("extdata", "troops.txt", package = "MSG"), header = TRUE)
 cities <- read.table(system.file("extdata", "cities.txt", package = "MSG"), header = TRUE)
 library(ggplot2)
 p <- ggplot(cities, aes(x = long, y = lat)) # 框架
-p <- p + geom_path(aes(size = survivors, colour = direction, group = group), 
+p <- p + geom_path(aes(linewidth = survivors, colour = direction, group = group), 
                    data = troops, lineend = "round") # 军队路线
 p <- p + geom_point() # 城市点
 p <- p + geom_text(aes(label = city), hjust = 0, vjust = 1, size = 2.5) # 城市名称