-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path线性学习机.html
357 lines (316 loc) · 17.1 KB
/
线性学习机.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
<!DOCTYPE html>
<html lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<title>Chapter 3 线性学习机 | 机器学习(2022 秋,80250993)</title>
<meta name="description" content="test" />
<meta name="generator" content="bookdown 0.26 and GitBook 2.6.7" />
<meta property="og:title" content="Chapter 3 线性学习机 | 机器学习(2022 秋,80250993)" />
<meta property="og:type" content="book" />
<meta property="og:description" content="test" />
<meta name="twitter:card" content="summary" />
<meta name="twitter:title" content="Chapter 3 线性学习机 | 机器学习(2022 秋,80250993)" />
<meta name="twitter:description" content="test" />
<meta name="author" content="Jim Zhang" />
<meta name="date" content="2022-09-24" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black" />
<link rel="prev" href="错误.html"/>
<link rel="next" href="参考资料.html"/>
<script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/fuse.min.js"></script>
<link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-table.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-bookdown.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-highlight.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-search.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-fontsettings.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-clipboard.css" rel="stylesheet" />
<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
<style type="text/css">
/* Used with Pandoc 2.11+ new --citeproc when CSL is used */
div.csl-bib-body { }
div.csl-entry {
clear: both;
}
.hanging div.csl-entry {
margin-left:2em;
text-indent:-2em;
}
div.csl-left-margin {
min-width:2em;
float:left;
}
div.csl-right-inline {
margin-left:2em;
padding-left:1em;
}
div.csl-indent {
margin-left: 2em;
}
</style>
<link rel="stylesheet" href="style.css" type="text/css" />
</head>
<body>
<div class="book without-animation with-summary font-size-2 font-family-1" data-basepath=".">
<div class="book-summary">
<nav role="navigation">
<ul class="summary">
<li><a href="./">Machine Learning (2022 fall)</a></li>
<li class="divider"></li>
<li class="chapter" data-level="1" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i><b>1</b> 前言</a>
<ul>
<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#课程大纲"><i class="fa fa-check"></i><b>1.1</b> 课程大纲</a></li>
<li class="chapter" data-level="1.2" data-path="index.html"><a href="index.html#课程参考书"><i class="fa fa-check"></i><b>1.2</b> 课程参考书</a></li>
</ul></li>
<li class="chapter" data-level="2" data-path="错误.html"><a href="错误.html"><i class="fa fa-check"></i><b>2</b> 错误</a></li>
<li class="chapter" data-level="3" data-path="线性学习机.html"><a href="线性学习机.html"><i class="fa fa-check"></i><b>3</b> 线性学习机</a>
<ul>
<li class="chapter" data-level="3.1" data-path="线性学习机.html"><a href="线性学习机.html#本章基础知识"><i class="fa fa-check"></i><b>3.1</b> 本章基础知识</a>
<ul>
<li class="chapter" data-level="3.1.1" data-path="线性学习机.html"><a href="线性学习机.html#二次型的导数"><i class="fa fa-check"></i><b>3.1.1</b> 二次型的导数</a></li>
</ul></li>
<li class="chapter" data-level="3.2" data-path="线性学习机.html"><a href="线性学习机.html#线性判别分析-fisher-的线性判别法"><i class="fa fa-check"></i><b>3.2</b> 线性判别分析 —— Fisher 的线性判别法</a></li>
<li class="chapter" data-level="3.3" data-path="线性学习机.html"><a href="线性学习机.html#感知机"><i class="fa fa-check"></i><b>3.3</b> 感知机</a>
<ul>
<li class="chapter" data-level="3.3.1" data-path="线性学习机.html"><a href="线性学习机.html#感知机学习策略"><i class="fa fa-check"></i><b>3.3.1</b> 感知机学习策略</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="" data-path="参考资料.html"><a href="参考资料.html"><i class="fa fa-check"></i>参考资料</a></li>
<li class="divider"></li>
<li><a href="https://github.com/rstudio/bookdown" target="blank">Published with bookdown</a></li>
<li><a href="https://jimzhang.me" target="blank">Jim Zhang 的博客</a></li>
</ul>
</nav>
</div>
<div class="book-body">
<div class="body-inner">
<div class="book-header" role="navigation">
<h1>
<i class="fa fa-circle-o-notch fa-spin"></i><a href="./">机器学习(2022 秋,80250993)</a>
</h1>
</div>
<div class="page-wrapper" tabindex="-1" role="main">
<div class="page-inner">
<section class="normal" id="section-">
<div id="线性学习机" class="section level1 hasAnchor" number="3">
<h1><span class="header-section-number">Chapter 3</span> 线性学习机<a href="线性学习机.html#线性学习机" class="anchor-section" aria-label="Anchor link to header"></a></h1>
<div id="本章基础知识" class="section level2 hasAnchor" number="3.1">
<h2><span class="header-section-number">3.1</span> 本章基础知识<a href="线性学习机.html#本章基础知识" class="anchor-section" aria-label="Anchor link to header"></a></h2>
<div id="二次型的导数" class="section level3 hasAnchor" number="3.1.1">
<h3><span class="header-section-number">3.1.1</span> 二次型的导数<a href="线性学习机.html#二次型的导数" class="anchor-section" aria-label="Anchor link to header"></a></h3>
<p>假如现在有一个二次型 <span class="math inline">\(f(\boldsymbol{x})=\boldsymbol{x}^T P \boldsymbol{x}\)</span>,其中 <span class="math inline">\(P=(a_{ij})_{n\times n}\)</span> 是一个对称阵,那么其对 <span class="math inline">\(\boldsymbol{x}\)</span> 求导有:</p>
<p><span class="math display">\[
\frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}}
=\left(\frac{\partial(\sum_{i=1}^n\sum_{j=1}^n a_{ij} x_i x_j)}{\partial x_i}\right)_n
=2P\boldsymbol{x}
\]</span></p>
</div>
</div>
<div id="线性判别分析-fisher-的线性判别法" class="section level2 hasAnchor" number="3.2">
<h2><span class="header-section-number">3.2</span> 线性判别分析 —— Fisher 的线性判别法<a href="线性学习机.html#线性判别分析-fisher-的线性判别法" class="anchor-section" aria-label="Anchor link to header"></a></h2>
<p>先讨论一些线性可分的二分类问题:</p>
<div class="figure"><span style="display:block;" id="fig:unnamed-chunk-1"></span>
<img src="machine-learning-2022fall_files/figure-html/unnamed-chunk-1-1.png" alt="线性可分的二分类问题" width="90%" />
<p class="caption">
Figure 3.1: 线性可分的二分类问题
</p>
</div>
<p>这些点显然线性可分。我们知道,一个线性判别器的形式如下:</p>
<p><span class="math display">\[
f(\boldsymbol{x})=\boldsymbol{w}^T\boldsymbol{x}+w_0
\]</span></p>
<p>其中,<span class="math inline">\(y=f(\boldsymbol{x})\in\mathbb{R}, \boldsymbol{w}, \boldsymbol{x}\in\mathbb{R}^n, w_0\in\mathbb{R}\)</span>。</p>
<p>决策边界为:<span class="math inline">\(f(\boldsymbol{x})=0\)</span>,即:</p>
<p><span class="math display">\[
\begin{cases}
f(\boldsymbol{x})>0\rightarrow\boldsymbol{x}\in\omega_1 \\
f(\boldsymbol{x})<0\rightarrow\boldsymbol{x}\in\omega_2
\end{cases}
\]</span></p>
<p>或者使用 <span class="math inline">\(y=\text{sgn}(f(\boldsymbol{x}))=\text{sgn}(\boldsymbol{w}^T\boldsymbol{x}+w_0)\)</span> 会更方便一点:</p>
<p><span class="math display">\[
y = \text{sgn}(f(\boldsymbol{x})) =
\begin{cases}
1&\rightarrow\boldsymbol{x}\in\omega_1 \\
-1&\rightarrow\boldsymbol{x}\in\omega_2
\end{cases}
\]</span></p>
<p>那问题来了,在这个例子中线性可分会存在一定的操作空间(改变 <span class="math inline">\(\boldsymbol{w}^T\)</span> 和 <span class="math inline">\(w_0\)</span>,即调整方向和平移,这个定义我们之后再说),那么最好的判别器是什么呢?<span class="citation">Fisher (<a href="#ref-Fisher-1936" role="doc-biblioref">1936</a>)</span> 给出了一种方法:最大化类内散度和类间散度的比值。</p>
<p>首先定义了这样一个映射:</p>
<p><span class="math display">\[
\mathcal{X}\rightarrow\mathcal{Y}:\quad y_i=\boldsymbol{w}^T\boldsymbol{x}_i
\]</span></p>
<p>对 <span class="math inline">\(\boldsymbol{x}\)</span> 来说,定义类内散度:</p>
<p><span class="math display">\[
\boldsymbol{S}_i=\sum_{\boldsymbol{x}_j\in \mathcal{X}_i}(\boldsymbol{x}_j-\boldsymbol{m}_i)(\boldsymbol{x}_j-\boldsymbol{m}_i)^T,\quad i=1,2
\]</span></p>
<p>其中 <span class="math inline">\(\boldsymbol{m}_i\)</span> 是类 <span class="math inline">\(\mathcal{X}_i\)</span> 的均值向量,<span class="math inline">\(S_i\)</span> 表示类 <span class="math inline">\(\mathcal{X}_i\)</span> 的类内散度。</p>
<p>总类内散度:</p>
<p><span class="math display">\[
\boldsymbol{S}_w=\sum S_i,\quad i=1,2
\]</span></p>
<p>类间散度:</p>
<p><span class="math display">\[
\boldsymbol{S}_b = (\boldsymbol{m}_1-\boldsymbol{m}_2)(\boldsymbol{m}_1-\boldsymbol{m}_2)^T
\]</span></p>
<p>而对于 <span class="math inline">\(y=f(\boldsymbol{x})\)</span> 来说,均值是:</p>
<p><span class="math display">\[
\tilde{m}_i=\frac{1}{N_i}\sum_{y_j\in\mathcal{Y}_i} y_j,\quad i=1,2
\]</span></p>
<p>其中 <span class="math inline">\(N_i\)</span> 是类 <span class="math inline">\(\mathcal{Y}_i\)</span> 的样本数,<span class="math inline">\(\tilde{m}_i\)</span> 是类 <span class="math inline">\(\mathcal{Y}_i\)</span> 的均值。</p>
<p>类内散度:</p>
<p><span class="math display">\[
\tilde{S}_i=\sum_{y_j\in\mathcal{Y}_i}(y_j-\tilde{m}_i)^2,\quad i=1,2
\]</span></p>
<p>总类内散度:</p>
<p><span class="math display">\[
\tilde{S}_w=\sum\tilde{S}_i,\quad i=1,2
\]</span></p>
<p>类间散度:</p>
<p><span class="math display">\[
\tilde{S}_b=(\tilde{m}_1-\tilde{m}_2)^2
\]</span></p>
<p>那么,Fisher 的准则就是:</p>
<p><span class="math display">\[
\max_{\boldsymbol{w}} J_{F}(\boldsymbol{w})=\frac{\tilde{S}_b}{\tilde{S}_w}=\frac{\boldsymbol{w}^T\boldsymbol{S}_b\boldsymbol{w}}{\boldsymbol{w}^T\boldsymbol{S}_w\boldsymbol{w}}
\]</span></p>
<p>然后我们要解出 <span class="math inline">\(\boldsymbol{w}^\ast=\underset{\boldsymbol{w}}{\operatorname{arg\,max}}\,J_F(\boldsymbol{w})\)</span>,怎么办呢?一种办法是假设分母是常数,问题就变成了:</p>
<p><span class="math display">\[
\max \boldsymbol{w}^T\boldsymbol{S}_b\boldsymbol{w} \quad \text{s.t.}\,\boldsymbol{w}^T\boldsymbol{S}_w\boldsymbol{w}=c
\]</span></p>
<p>这时候我们可以使用 Lagrange 乘子法,得到:</p>
<p><span class="math display">\[
\mathcal{L}(\boldsymbol{w},\lambda)=\boldsymbol{w}^T\boldsymbol{S}_b\boldsymbol{w}-\lambda(\boldsymbol{w}^T\boldsymbol{S}_w\boldsymbol{w}-c)
\]</span></p>
<p>对 <span class="math inline">\(\boldsymbol{w}\)</span> 求导,得到:</p>
<p><span class="math display">\[
\frac{\partial\mathcal{L}}{\partial\boldsymbol{w}}=2\boldsymbol{S}_b\boldsymbol{w}-2\lambda\boldsymbol{S}_w\boldsymbol{w}
\]</span></p>
<p>令其为 <span class="math inline">\(0\)</span>,得到:</p>
<p><span class="math display">\[
\boldsymbol{S}_b\boldsymbol{w}^\ast=\lambda\boldsymbol{S}_w\boldsymbol{w}^\ast
\]</span></p>
<p>即:</p>
<p><span class="math display">\[
\boldsymbol{S}_w^{-1}\boldsymbol{S}_b\boldsymbol{w}^\ast=\lambda\boldsymbol{w}^\ast
\]</span></p>
<p>显然,<span class="math inline">\(\boldsymbol{w}^\ast\)</span> 是 <span class="math inline">\(\boldsymbol{S}_b^{-1}\boldsymbol{S}_w\)</span> 的特征向量,而 <span class="math inline">\(\lambda\)</span> 是特征值。</p>
<p>然后我们再将原来的 <span class="math inline">\(\boldsymbol{S}_{b}=(\boldsymbol{m}_1-\boldsymbol{m}_2)(\boldsymbol{m}_1-\boldsymbol{m}_2)^T\)</span> 代入,得到:</p>
<p><span class="math display">\[
\lambda\boldsymbol{w}^\ast=\boldsymbol{S}_w^{-1}(\boldsymbol{m}_1-\boldsymbol{m}_2)(\boldsymbol{m}_1-\boldsymbol{m}_2)^T\boldsymbol{w}^\ast
\]</span></p>
<p>因为 <span class="math inline">\((\boldsymbol{m}_1-\boldsymbol{m}_2)^T\boldsymbol{w}^\ast\)</span> 是标量,而我们的问题关注的是 <span class="math inline">\(\boldsymbol{w}^\ast\)</span> 所代表的方向,那么我们可以说</p>
<p><span class="math display">\[
\boldsymbol{w}^\ast\propto\boldsymbol{S}_w^{-1}(\boldsymbol{m}_1-\boldsymbol{m}_2)
\]</span></p>
<p>好了,<span class="math inline">\(\boldsymbol{w}^\ast\)</span> 确定好了,那 <span class="math inline">\(w_0\)</span> 怎么办呢?</p>
<p>一般来说,有很多种办法,比如:</p>
<p><span class="math display">\[
\begin{align}
w_0&=-\frac{1}{2}(\tilde{m}_1+\tilde{m}_2),\\
w_0&=-\frac{1}{2}(\tilde{m}_1+\tilde{m}_2)+\frac{1}{N_1+N_2-2}\ln\frac{P(\omega_1)}{P(\omega_2)},
\end{align}
\]</span></p>
<p>等等。</p>
</div>
<div id="感知机" class="section level2 hasAnchor" number="3.3">
<h2><span class="header-section-number">3.3</span> 感知机<a href="线性学习机.html#感知机" class="anchor-section" aria-label="Anchor link to header"></a></h2>
<p>所谓感知机,就是一个线性分类器,其输入是实例的特征向量,输出是实例的类别,取 <span class="math inline">\(+1\)</span> 和 <span class="math inline">\(-1\)</span> 两个值,即:</p>
<p><span class="math display">\[
f(\boldsymbol{x})=\operatorname{sgn}(\boldsymbol{w}^T\boldsymbol{x}+w_0)
\]</span></p>
<div id="感知机学习策略" class="section level3 hasAnchor" number="3.3.1">
<h3><span class="header-section-number">3.3.1</span> 感知机学习策略<a href="线性学习机.html#感知机学习策略" class="anchor-section" aria-label="Anchor link to header"></a></h3>
<p>感知机学习的策略是极小化损失函数 <span class="math inline">\(\nabla J(\boldsymbol{w})\)</span>,最早的时候还在使用基本梯度下降法:</p>
<ol style="list-style-type: decimal">
<li><span class="math inline">\(\boldsymbol{w}^{\{n+1\}}=\boldsymbol{w}^{\{n\}}-\eta\nabla J(\boldsymbol{w}^{\{n\}})\)</span>;</li>
<li>重复 1 直到收敛,<span class="math inline">\(\nabla J(\boldsymbol{w})<\text{threshold}\)</span>。</li>
</ol>
</div>
</div>
</div>
<h3>参考资料<a href="参考资料.html#参考资料" class="anchor-section" aria-label="Anchor link to header"></a></h3>
<div id="refs" class="references csl-bib-body hanging-indent">
<div id="ref-Fisher-1936" class="csl-entry">
Fisher, R. A. 1936. <span>“The Use of Multiple Measurements in Taxonomic Problems.”</span> <em>Annals of Eugenics</em> 7 (2): 179–88. <a href="https://doi.org/10.1111/j.1469-1809.1936.tb02137.x">https://doi.org/10.1111/j.1469-1809.1936.tb02137.x</a>.
</div>
</div>
</section>
</div>
</div>
</div>
<a href="错误.html" class="navigation navigation-prev " aria-label="Previous page"><i class="fa fa-angle-left"></i></a>
<a href="参考资料.html" class="navigation navigation-next " aria-label="Next page"><i class="fa fa-angle-right"></i></a>
</div>
</div>
<script src="libs/gitbook-2.6.7/js/app.min.js"></script>
<script src="libs/gitbook-2.6.7/js/clipboard.min.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-search.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-sharing.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-fontsettings.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-bookdown.js"></script>
<script src="libs/gitbook-2.6.7/js/jquery.highlight.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-clipboard.js"></script>
<script>
gitbook.require(["gitbook"], function(gitbook) {
gitbook.start({
"sharing": {
"github": false,
"facebook": true,
"twitter": true,
"linkedin": false,
"weibo": false,
"instapaper": false,
"vk": false,
"whatsapp": false,
"all": ["facebook", "twitter", "linkedin", "weibo", "instapaper"]
},
"fontsettings": {
"theme": "white",
"family": "sans",
"size": 2
},
"edit": {
"link": null,
"text": null
},
"history": {
"link": null,
"text": null
},
"view": {
"link": null,
"text": null
},
"download": ["machine-learning-2022fall.pdf", "machine-learning-2022fall.epub"],
"search": {
"engine": "fuse",
"options": null
},
"toc": {
"collapse": "subsection"
}
});
});
</script>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement("script");
script.type = "text/javascript";
var src = "true";
if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
if (location.protocol !== "file:")
if (/^https?:/.test(src))
src = src.replace(/^https?:/, '');
script.src = src;
document.getElementsByTagName("head")[0].appendChild(script);
})();
</script>
</body>
</html>