
<!DOCTYPE html>
<html lang="en">
<!-- Produced from a LaTeX source file.  Note that the production is done -->
<!-- by a very rough-and-ready (and buggy) script, so the HTML and other  -->
<!-- code is quite ugly!  Later versions should be better.                -->
    <meta charset="utf-8">
    <meta name="citation_title" content="Neural Networks and Deep Learning">
    <meta name="citation_author" content="Nielsen, Michael A.">
    <meta name="citation_publication_date" content="2014">
    <meta name="citation_fulltext_html_url" content="http://neuralnetworksanddeeplearning.com">
    <meta name="citation_publisher" content="Determination Press">
    <link rel="icon" href="nnadl_favicon.ICO" />
    <title>Neural Networks and Deep Learning</title>
    <script src="assets/jquery.min.js"></script>
    <script type="text/x-mathjax-config">
      MathJax.Hub.Config({
        tex2jax: {inlineMath: [['$','$']]},
        "HTML-CSS":
          {scale: 92},
        TeX: { equationNumbers: { autoNumber: "AMS" }}});
    </script>
    <script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>


    <link href="assets/style.css" rel="stylesheet">
    <link href="assets/pygments.css" rel="stylesheet">

<style>
/* Adapted from */
/* https://groups.google.com/d/msg/mathjax-users/jqQxrmeG48o/oAaivLgLN90J, */
/* by David Cervone */

@font-face {
    font-family: 'MJX_Math';
    src: url('http://cdn.mathjax.org/mathjax/latest/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); /* IE9 Compat Modes */
    src: url('http://cdn.mathjax.org/mathjax/latest/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot?iefix') format('eot'),
    url('http://cdn.mathjax.org/mathjax/latest/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff')  format('woff'),
    url('http://cdn.mathjax.org/mathjax/latest/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf')  format('opentype'),
    url('http://cdn.mathjax.org/mathjax/latest/fonts/HTML-CSS/TeX/svg/MathJax_Math-Italic.svg#MathJax_Math-Italic') format('svg');
}

@font-face {
    font-family: 'MJX_Main';
    src: url('http://cdn.mathjax.org/mathjax/latest/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); /* IE9 Compat Modes */
    src: url('http://cdn.mathjax.org/mathjax/latest/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot?iefix') format('eot'),
    url('http://cdn.mathjax.org/mathjax/latest/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff')  format('woff'),
    url('http://cdn.mathjax.org/mathjax/latest/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf')  format('opentype'),
    url('http://cdn.mathjax.org/mathjax/latest/fonts/HTML-CSS/TeX/svg/MathJax_Main-Regular.svg#MathJax_Main-Regular') format('svg');
}
</style>

  </head>
  <body><div class="header"><h1 class="chapter_number">
  <a href="">CHAPTER 4</a></h1>
  <h1 class="chapter_title"><a href="">A visual proof that neural nets can compute any function</a></h1></div><div class="section"><div id="toc">
<p class="toc_title"><a href="index.html">Neural Networks and Deep Learning</a></p><p class="toc_not_mainchapter"><a href="about.html">What this book is about</a></p><p class="toc_not_mainchapter"><a href="exercises_and_problems.html">On the exercises and problems</a></p><p class='toc_mainchapter'><a id="toc_using_neural_nets_to_recognize_handwritten_digits_reveal" class="toc_reveal" onMouseOver="this.style.borderBottom='1px solid #2A6EA6';" onMouseOut="this.style.borderBottom='0px';"><img id="toc_img_using_neural_nets_to_recognize_handwritten_digits" src="images/arrow.png" width="15px"></a><a href="chap1.html">Using neural nets to recognize handwritten digits</a><div id="toc_using_neural_nets_to_recognize_handwritten_digits" style="display: none;"><p class="toc_section"><ul><a href="chap1.html#perceptrons"><li>Perceptrons</li></a><a href="chap1.html#sigmoid_neurons"><li>Sigmoid neurons</li></a><a href="chap1.html#the_architecture_of_neural_networks"><li>The architecture of neural networks</li></a><a href="chap1.html#a_simple_network_to_classify_handwritten_digits"><li>A simple network to classify handwritten digits</li></a><a href="chap1.html#learning_with_gradient_descent"><li>Learning with gradient descent</li></a><a href="chap1.html#implementing_our_network_to_classify_digits"><li>Implementing our network to classify digits</li></a><a href="chap1.html#toward_deep_learning"><li>Toward deep learning</li></a></ul></p></div>
<script>
$('#toc_using_neural_nets_to_recognize_handwritten_digits_reveal').click(function() {
   var src = $('#toc_img_using_neural_nets_to_recognize_handwritten_digits').attr('src');
   if(src == 'images/arrow.png') {
     $("#toc_img_using_neural_nets_to_recognize_handwritten_digits").attr('src', 'images/arrow_down.png');
   } else {
     $("#toc_img_using_neural_nets_to_recognize_handwritten_digits").attr('src', 'images/arrow.png');
   };
   $('#toc_using_neural_nets_to_recognize_handwritten_digits').toggle('fast', function() {});
});</script><p class='toc_mainchapter'><a id="toc_how_the_backpropagation_algorithm_works_reveal" class="toc_reveal" onMouseOver="this.style.borderBottom='1px solid #2A6EA6';" onMouseOut="this.style.borderBottom='0px';"><img id="toc_img_how_the_backpropagation_algorithm_works" src="images/arrow.png" width="15px"></a><a href="chap2.html">How the backpropagation algorithm works</a><div id="toc_how_the_backpropagation_algorithm_works" style="display: none;"><p class="toc_section"><ul><a href="chap2.html#warm_up_a_fast_matrix-based_approach_to_computing_the_output_from_a_neural_network"><li>Warm up: a fast matrix-based approach to computing the output  from a neural network</li></a><a href="chap2.html#the_two_assumptions_we_need_about_the_cost_function"><li>The two assumptions we need about the cost function</li></a><a href="chap2.html#the_hadamard_product_$s_\odot_t$"><li>The Hadamard product, $s \odot t$</li></a><a href="chap2.html#the_four_fundamental_equations_behind_backpropagation"><li>The four fundamental equations behind backpropagation</li></a><a href="chap2.html#proof_of_the_four_fundamental_equations_(optional)"><li>Proof of the four fundamental equations (optional)</li></a><a href="chap2.html#the_backpropagation_algorithm"><li>The backpropagation algorithm</li></a><a href="chap2.html#the_code_for_backpropagation"><li>The code for backpropagation</li></a><a href="chap2.html#in_what_sense_is_backpropagation_a_fast_algorithm"><li>In what sense is backpropagation a fast algorithm?</li></a><a href="chap2.html#backpropagation_the_big_picture"><li>Backpropagation: the big picture</li></a></ul></p></div>
<script>
$('#toc_how_the_backpropagation_algorithm_works_reveal').click(function() {
   var src = $('#toc_img_how_the_backpropagation_algorithm_works').attr('src');
   if(src == 'images/arrow.png') {
     $("#toc_img_how_the_backpropagation_algorithm_works").attr('src', 'images/arrow_down.png');
   } else {
     $("#toc_img_how_the_backpropagation_algorithm_works").attr('src', 'images/arrow.png');
   };
   $('#toc_how_the_backpropagation_algorithm_works').toggle('fast', function() {});
});</script><p class='toc_mainchapter'><a id="toc_improving_the_way_neural_networks_learn_reveal" class="toc_reveal" onMouseOver="this.style.borderBottom='1px solid #2A6EA6';" onMouseOut="this.style.borderBottom='0px';"><img id="toc_img_improving_the_way_neural_networks_learn" src="images/arrow.png" width="15px"></a><a href="chap3.html">Improving the way neural networks learn</a><div id="toc_improving_the_way_neural_networks_learn" style="display: none;"><p class="toc_section"><ul><a href="chap3.html#the_cross-entropy_cost_function"><li>The cross-entropy cost function</li></a><a href="chap3.html#overfitting_and_regularization"><li>Overfitting and regularization</li></a><a href="chap3.html#weight_initialization"><li>Weight initialization</li></a><a href="chap3.html#handwriting_recognition_revisited_the_code"><li>Handwriting recognition revisited: the code</li></a><a href="chap3.html#how_to_choose_a_neural_network's_hyper-parameters"><li>How to choose a neural network's hyper-parameters?</li></a><a href="chap3.html#other_techniques"><li>Other techniques</li></a></ul></p></div>
<script>
$('#toc_improving_the_way_neural_networks_learn_reveal').click(function() {
   var src = $('#toc_img_improving_the_way_neural_networks_learn').attr('src');
   if(src == 'images/arrow.png') {
     $("#toc_img_improving_the_way_neural_networks_learn").attr('src', 'images/arrow_down.png');
   } else {
     $("#toc_img_improving_the_way_neural_networks_learn").attr('src', 'images/arrow.png');
   };
   $('#toc_improving_the_way_neural_networks_learn').toggle('fast', function() {});
});</script><p class='toc_mainchapter'><a id="toc_a_visual_proof_that_neural_nets_can_compute_any_function_reveal" class="toc_reveal" onMouseOver="this.style.borderBottom='1px solid #2A6EA6';" onMouseOut="this.style.borderBottom='0px';"><img id="toc_img_a_visual_proof_that_neural_nets_can_compute_any_function" src="images/arrow.png" width="15px"></a><a href="chap4.html">A visual proof that neural nets can compute any function</a><div id="toc_a_visual_proof_that_neural_nets_can_compute_any_function" style="display: none;"><p class="toc_section"><ul><a href="chap4.html#two_caveats"><li>Two caveats</li></a><a href="chap4.html#universality_with_one_input_and_one_output"><li>Universality with one input and one output</li></a><a href="chap4.html#many_input_variables"><li>Many input variables</li></a><a href="chap4.html#extension_beyond_sigmoid_neurons"><li>Extension beyond sigmoid neurons</li></a><a href="chap4.html#fixing_up_the_step_functions"><li>Fixing up the step functions</li></a><a href="chap4.html#conclusion"><li>Conclusion</li></a></ul></p></div>
<script>
$('#toc_a_visual_proof_that_neural_nets_can_compute_any_function_reveal').click(function() {
   var src = $('#toc_img_a_visual_proof_that_neural_nets_can_compute_any_function').attr('src');
   if(src == 'images/arrow.png') {
     $("#toc_img_a_visual_proof_that_neural_nets_can_compute_any_function").attr('src', 'images/arrow_down.png');
   } else {
     $("#toc_img_a_visual_proof_that_neural_nets_can_compute_any_function").attr('src', 'images/arrow.png');
   };
   $('#toc_a_visual_proof_that_neural_nets_can_compute_any_function').toggle('fast', function() {});
});</script><p class='toc_mainchapter'><a id="toc_why_are_deep_neural_networks_hard_to_train_reveal" class="toc_reveal" onMouseOver="this.style.borderBottom='1px solid #2A6EA6';" onMouseOut="this.style.borderBottom='0px';"><img id="toc_img_why_are_deep_neural_networks_hard_to_train" src="images/arrow.png" width="15px"></a><a href="chap5.html">Why are deep neural networks hard to train?</a><div id="toc_why_are_deep_neural_networks_hard_to_train" style="display: none;"><p class="toc_section"><ul><a href="chap5.html#the_vanishing_gradient_problem"><li>The vanishing gradient problem</li></a><a href="chap5.html#what's_causing_the_vanishing_gradient_problem_unstable_gradients_in_deep_neural_nets"><li>What's causing the vanishing gradient problem?  Unstable gradients in deep neural nets</li></a><a href="chap5.html#unstable_gradients_in_more_complex_networks"><li>Unstable gradients in more complex networks</li></a><a href="chap5.html#other_obstacles_to_deep_learning"><li>Other obstacles to deep learning</li></a></ul></p></div>
<script>
$('#toc_why_are_deep_neural_networks_hard_to_train_reveal').click(function() {
   var src = $('#toc_img_why_are_deep_neural_networks_hard_to_train').attr('src');
   if(src == 'images/arrow.png') {
     $("#toc_img_why_are_deep_neural_networks_hard_to_train").attr('src', 'images/arrow_down.png');
   } else {
     $("#toc_img_why_are_deep_neural_networks_hard_to_train").attr('src', 'images/arrow.png');
   };
   $('#toc_why_are_deep_neural_networks_hard_to_train').toggle('fast', function() {});
});</script><p class='toc_mainchapter'><a id="toc_deep_learning_reveal" class="toc_reveal" onMouseOver="this.style.borderBottom='1px solid #2A6EA6';" onMouseOut="this.style.borderBottom='0px';"><img id="toc_img_deep_learning" src="images/arrow.png" width="15px"></a>Deep learning<div id="toc_deep_learning" style="display: none;"><p class="toc_section"><ul><li>Convolutional neural networks</li><li>Pretraining</li><li>Recurrent neural networks, Boltzmann machines, and other  models</li><li>Is there a universal thinking algorithm?</li><li>On the future of neural networks</li></ul></p></div>
<script>
$('#toc_deep_learning_reveal').click(function() {
   var src = $('#toc_img_deep_learning').attr('src');
   if(src == 'images/arrow.png') {
     $("#toc_img_deep_learning").attr('src', 'images/arrow_down.png');
   } else {
     $("#toc_img_deep_learning").attr('src', 'images/arrow.png');
   };
   $('#toc_deep_learning').toggle('fast', function() {});
});</script><p class="toc_not_mainchapter"><a href="acknowledgements.html">Acknowledgements</a></p><p class="toc_not_mainchapter"><a href="faq.html">Frequently Asked Questions</a></p>
<hr>
<span class="sidebar_title">Sponsors</span>
<br/>
<a href='http://www.ersatz1.com/'><img src='assets/ersatz.png' width='140px' style="padding: 0px 0px 10px 8px; border-style: none;"></a>

<a href='http://gsquaredcapital.com/'><img src='assets/gsquared.png' width='150px' style="padding: 0px 0px 10px 10px; border-style: none;"></a>

<a href='http://www.tineye.com'><img src='assets/tineye.png' width='150px'
style="padding: 0px 0px 10px 8px; border-style: none;"></a>

<a href='http://www.visionsmarts.com'><img
src='assets/visionsmarts.png' width='160px' style="padding: 0px 0px
0px 0px; border-style: none;"></a> <br/>


<p class="sidebar">Thanks to all the <a
href="supporters.html">supporters</a> who made the book possible.
Thanks also to all the contributors to the <a
href="bugfinder.html">Bugfinder Hall of Fame</a>.
For the Japanese edition, deep thanks also go to the <a
href="translators.html">translators</a>.
</p>

<p class="sidebar">The book is currently a beta release, and is still
under active development.  Please send error reports to
mn@michaelnielsen.org, and questions about the Japanese edition to muranushi@gmail.com.
For other enquiries, please see the <a
href="faq.html">FAQ</a> first.</p>


<hr>
<span class="sidebar_title">Resources</span>

<p class="sidebar">
<a href="https://github.com/mnielsen/neural-networks-and-deep-learning">Code repository</a></p>

<p class="sidebar">
<a href="http://eepurl.com/BYr9L">Mailing list for book announcements</a>
</p>

<p class="sidebar">
<a href="http://eepurl.com/0Xxjb">Michael Nielsen's project announcement mailing list</a>
</p>

<hr>
<a href="http://michaelnielsen.org"><img src="assets/Michael_Nielsen_Web_Small.jpg" width="160px" style="border-style: none;"/></a>

<p class="sidebar">
  By <a href="http://michaelnielsen.org">Michael Nielsen</a> / Sep-Dec 2014 <br >  Translated by the <a href="https://github.com/nnadl-ja/nnadl_site_ja">"Neural Networks and Deep Learning" translation project</a>
</p>
</div>
</p>
<p>
One of the most striking facts about neural networks is that they can
compute any function at all.  That is, suppose someone hands you some
complicated, wiggly function, $f(x)$:
</p>
<p><center><canvas id="function" width="300" height="300"></canvas></center></p>
<p>
<a id="basic_network_precursor"></a> No matter what the
function, there is guaranteed to be a neural network so that for every
possible input, $x$, the value $f(x)$ (or some close approximation) is
output from the network, e.g.:
</p>
<p><center><canvas id="basic_network" width="350" height="220"></canvas></center></p>
<p>
This result holds even if the function has many inputs, $f = f(x_1,
\ldots, x_m)$, and many outputs.  For instance, here's a network
computing a function with $m = 3$ inputs and $n = 2$ outputs:
</p>
<p><center><canvas id="vector_valued_network" width="450" height="370"></canvas></center></p>
<p>
This result tells us that neural networks have a kind of
<em>universality</em>.  No matter what function we want to compute, we
know that there is a neural network which can do the job.
</p>
<p>
What's more, this universality theorem holds even if we restrict our
networks to have just a single layer intermediate between the input
and the output neurons, a so-called single hidden layer.  So even
very simple network architectures can be extremely powerful.
</p>
<p>
The universality theorem is well known by people who use neural
networks.  But why it's true is not so widely understood.  Most of the
explanations available are quite technical.  For instance, one of the
original papers proving the result*<span class="marginnote">
*<a href="http://www.dartmouth.edu/&#126;gvc/Cybenko_MCSS.pdf">Approximation by superpositions of a sigmoidal function</a>, by George Cybenko (1989).
  The result was very much in the air at the time, and
  several groups proved closely related results.
  Cybenko's paper contains a useful discussion of much of that work.
  Another important early paper is
  <a href="http://www.sciencedirect.com/science/article/pii/0893608089900208">Multilayer feedforward networks are universal approximators</a>, by Kurt Hornik, Maxwell Stinchcombe, and Halbert White (1989).
  This paper uses the Stone-Weierstrass theorem to arrive at similar results.</span>
did so using the Hahn-Banach theorem, the Riesz Representation theorem, and some Fourier analysis.
If you're a mathematician the argument is not
difficult to follow, but it's not so easy for most people.  That's a
pity, since the underlying reasons for universality are simple and
beautiful.
</p>
<p>
In this chapter I give a simple and mostly visual explanation of the universality theorem.
We'll go step by step through the underlying ideas.  You'll understand
why it's true that neural networks can compute any function.  You'll
understand some of the limitations of the result.  And you'll
understand how the result relates to deep neural networks.
</p>
<p>
To follow the material in the chapter, you do not need to have read
earlier chapters in this book.  Instead, the chapter is structured to
be enjoyable as a self-contained essay.  Provided you have just a
little basic familiarity with neural networks, you should be able to
follow the explanation.  I will, however, provide occasional links to
earlier material, to help fill in any gaps in your knowledge.
</p>
<p></p>
<p></p>
<p></p>
<p>
Universality theorems are a commonplace in computer science, so much
so that we sometimes forget how astonishing they are.  But it's worth
reminding ourselves: the ability to compute an arbitrary function is
truly remarkable.  Almost any process you can imagine can be thought
of as function computation.  Consider the problem of naming a piece of
music based on a short sample of the piece.  That can be thought of as
computing a function.  Or consider the problem of translating a
Chinese text into English.  Again, that can be thought of as computing
a function*<span class="marginnote">
*Actually, computing one of many functions, since
  there are often many acceptable translations of a given piece of
  text.</span>.
Or consider the problem of taking an mp4 movie file and
generating a description of the plot of the movie, and a discussion of
the quality of the acting.  Again, that can be thought of as a kind of
function computation*<span class="marginnote">
*Ditto the remark about translation and
  there being many possible functions.</span>.
Universality means that, in principle, neural networks can do all
these things and many more.
</p>
<p>
Of course, just because we know a neural network exists that can (say)
translate Chinese text into English, that doesn't mean we have good
techniques for constructing or even recognizing such a network.  This
limitation applies also to traditional universality theorems for
models such as Boolean circuits.  But, as we've seen earlier in the
book, neural networks have powerful algorithms for learning functions.
That combination of learning algorithms + universality is an
attractive mix.  Up to now, the book has focused on the learning
algorithms.  In this chapter, we focus on universality, and what it
means.
</p>
<p><h3><a name="two_caveats"></a><a href="#two_caveats">Two caveats</a></h3></p>
<p>
Before explaining why the universality theorem is true, I want to
mention two caveats to the informal statement "a neural network can
compute any function".
</p>
<p>
First, this doesn't mean that a network can be used to <em>exactly</em>
compute any function.  Rather, we can get an <em>approximation</em>
that is as good as we want.  By increasing the number of hidden
neurons we can improve the approximation.  For instance,
<a href="#basic_network_precursor">earlier</a> I illustrated a network
computing some function $f(x)$ using three hidden neurons.  For most
functions only a low-quality approximation will be possible using
three hidden neurons.  By increasing the number of hidden neurons
(say, to five) we can typically get a better approximation:</p>
<p><center><canvas id="bigger_network" width="350" height="380"></canvas></center></p>
<p>
And we can do still better by further increasing the number of hidden neurons.
</p>
<p>
To make this statement more precise, suppose we're given a function
$f(x)$ which we'd like to compute to within some desired accuracy
$\epsilon > 0$.  The guarantee is that by using enough hidden neurons
we can always find a neural network whose output $g(x)$ satisfies
$|g(x) - f(x)| < \epsilon$, for all inputs $x$.  In other words, the
approximation will be good to within the desired accuracy for every
possible input.
</p>
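<p>
Written with explicit quantifiers, the guarantee reads roughly as follows. This restatement assumes, as in the plots in this chapter, that inputs are restricted to a bounded interval such as $0 \leq x \leq 1$; the symbol $N$ for the number of hidden neurons is notation introduced here for illustration only:
$$
\forall\, \epsilon > 0 \;\; \exists\, N \text{ and a network with } N \text{ hidden neurons and output } g \text{ such that } \;\; |g(x) - f(x)| < \epsilon \;\; \text{ for all } x \in [0, 1].
$$
The order of the quantifiers matters: the accuracy $\epsilon$ is fixed first, and the network (in particular its size $N$) is then allowed to depend on it.  A smaller $\epsilon$ will typically call for a larger $N$.
</p>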
<p>
The second caveat is that the class of functions which can be
approximated in the way described are the <em>continuous</em> functions.
If a function is discontinuous, i.e., makes sudden, sharp jumps, then
it won't in general be possible to approximate using a neural net.
This is not surprising, since our neural networks compute continuous
functions of their input.  However, even if the function we'd really
like to compute is discontinuous, it's often the case that a
continuous approximation is good enough.  If that's so, then we can
use a neural network.  In practice, this is not usually an important
limitation.
</p>
<p>
Summing up, a more precise statement of the universality theorem is
that neural networks with a single hidden layer can be used to
approximate any continuous function to any desired precision.  In this
chapter we'll actually prove a slightly weaker version of this result,
using two hidden layers instead of one.  In the problems I'll briefly
outline how the explanation can, with a few tweaks, be adapted to give
a proof which uses only a single hidden layer.
</p>
<h3>
<a name="universality_with_one_input_and_one_output"></a><a href="#universality_with_one_input_and_one_output">Universality with one input and one output</a>
</h3>
<p>
To understand why the universality theorem is true, let's start by
understanding how to construct a neural network which approximates a
function with just one input and one output:
</p>
<p><center><canvas id="function_2" width="300" height="300"></canvas></center></p>
<p>
It turns out that this is the core of the problem of universality.
Once we've understood this special case it's actually pretty easy to
extend to functions with many inputs and many outputs.
</p>
<p>
To build insight into how to construct a network to compute $f$, let's
start with a network containing just a single hidden layer, with two
hidden neurons, and an output layer containing a single output neuron:
</p>
<p><center><canvas id="two_hidden_neurons" width="350" height="220"></canvas></center></p>
<p>
To get a feel for how components in the network work, let's focus on
the top hidden neuron.  In the diagram below, click on the weight,
$w$, and drag the mouse a little way to the right to increase $w$.
You can immediately see how the function computed by the top hidden
neuron changes:
</p>
<p><center><canvas id="basic_manipulation" width="600" height="285"></canvas></center></p>
<p>
As we learnt <a href="chap1.html#sigmoid_neurons">earlier in the book</a>,
what's being computed by the hidden neuron is $\sigma(wx + b)$, where
$\sigma(z) \equiv 1/(1+e^{-z})$ is the sigmoid function.  Up to now,
we've made frequent use of this algebraic form.  But for the proof of
universality we will obtain more insight by ignoring the algebra
entirely, and instead manipulating and observing the shape shown in
the graph.
This won't just give us a better feel for what's going on,
it will also give us a proof*<span class="marginnote">
*Strictly speaking, the visual
  approach I'm taking isn't what's traditionally thought of as a
  proof.  But I believe the visual approach gives more insight into
  why the result is true than a traditional proof.  And, of course,
  that kind of insight is the real purpose behind a proof.
  Occasionally, there will be small gaps in the reasoning I present:
  places where I make a visual argument that is plausible, but not
  quite rigorous.  If this bothers you, then consider it a challenge
  to fill in the missing steps.  But don't lose sight of the real
  purpose: to understand why the universality theorem is true.</span> of
universality that applies to activation functions other than the
sigmoid function.
</p>
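<p>For readers who prefer to see the quantity in code, here is a minimal Python sketch of what a single hidden neuron computes, $\sigma(wx + b)$.  The function names and the sample values $w = 8$, $b = -4$ are chosen here purely for illustration and are not taken from the book's code.  The sigmoid is written in a two-branch form so that it does not overflow when $|z|$ is large, which becomes relevant once the weight is made very large.</p>

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0 here, so exp(z) cannot overflow
    return ez / (1.0 + ez)


def hidden_output(x, w, b):
    """Output of one sigmoid hidden neuron with weight w and bias b."""
    return sigmoid(w * x + b)


# Sample the curve on [0, 1] for illustrative (hypothetical) parameters.
xs = [i / 10 for i in range(11)]
ys = [hidden_output(x, w=8.0, b=-4.0) for x in xs]
for x, y in zip(xs, ys):
    print(f"x = {x:.1f}  output = {y:.4f}")
```

<p>With these sample values the output passes through $0.5$ at $x = 0.5$, since that is where $wx + b = 0$, and it increases steadily from left to right.</p>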
<p>
To get started on this proof, try clicking on the bias, $b$, in the
diagram above, and dragging to the right to increase it.  You'll see
that as the bias increases the graph moves to the left, but its shape
doesn't change.
</p>
<p>
Next, click and drag to the left in order to decrease the bias.
You'll see that as the bias decreases the graph moves to the right,
but, again, its shape doesn't change.
</p>
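<p>The effect of dragging the bias can also be checked numerically.  Since $\sigma(wx + b + \Delta) = \sigma(w(x + \Delta/w) + b)$, adding $\Delta$ to the bias should move the whole curve to the left by $\Delta/w$ (for $w > 0$) and leave its shape alone; a negative $\Delta$ moves it to the right.  The sketch below tests this on a grid.  The helper name <code>horizontal_shift</code> and the parameter values are assumptions made for this illustration.</p>

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hidden_output(x, w, b):
    """Output of one sigmoid hidden neuron."""
    return sigmoid(w * x + b)


def horizontal_shift(w, delta_b):
    """Leftward shift of the curve caused by adding delta_b to the bias."""
    return delta_b / w


# Hypothetical values: start from w = 8, b = -4 and raise the bias by 2.
w, b, delta_b = 8.0, -4.0, 2.0
shift = horizontal_shift(w, delta_b)

# The curve with the larger bias should equal the old curve read off
# at a point moved `shift` to the right, i.e. the graph moved left.
xs = [i / 10 for i in range(11)]
max_gap = max(
    abs(hidden_output(x, w, b + delta_b) - hidden_output(x + shift, w, b))
    for x in xs
)
print(f"shift = {shift}, largest mismatch = {max_gap:.2e}")
```

<p>The mismatch is at the level of floating-point rounding, which is the numerical counterpart of "the graph moves, but its shape doesn't change".</p>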
<p>
Next, decrease the weight to around $2$ or $3$.  You'll see that as
you decrease the weight, the curve broadens out.  You might need to
change the bias as well, in order to keep the curve in-frame.
</p>
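<p>One way to put a number on "the curve broadens out" is to measure how far $x$ must travel for the output to climb from $0.1$ to $0.9$.  Solving $\sigma(z) = p$ gives $z = \ln(p/(1-p))$, so that distance is $2 \ln 9 / w$, independent of the bias.  The 10%-to-90% convention and the function name below are choices made for this sketch, not something the text prescribes.</p>

```python
import math


def transition_width(w, lo=0.1, hi=0.9):
    """Distance in x over which sigma(w*x + b) rises from lo to hi (w > 0)."""
    def logit(p):
        # Inverse of the sigmoid: the z at which sigma(z) == p.
        return math.log(p / (1.0 - p))

    return (logit(hi) - logit(lo)) / w


for w in [2.0, 3.0, 8.0, 100.0]:
    print(f"w = {w:6.1f}  width = {transition_width(w):.4f}")
```

<p>For weights around $2$ or $3$ the width exceeds the unit interval shown in the diagram, which is why the curve may drift out of frame, while for $w = 100$ the rise is confined to a few hundredths of a unit.</p>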
<p>
Finally, increase the weight up past $w = 100$.  As you do, the curve
gets steeper, until eventually it begins to look like a step function.
Try to adjust the bias so the step occurs near $x = 0.3$.  The
following short clip shows what your result should look like.  Click
on the play button to play (or replay) the video:
</p>
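<p>If you would rather compute the bias than find it by dragging, note that $\sigma(wx + b)$ equals $0.5$ exactly where $wx + b = 0$, that is at $x = -b/w$.  Placing the step at a chosen position $s$ therefore amounts to setting $b = -ws$.  The sketch below does this for $w = 100$ and $s = 0.3$; the helper name <code>bias_for_step</code> is hypothetical.</p>

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def bias_for_step(w, s):
    """Bias that centres the rise of sigma(w*x + b) at x = s."""
    return -w * s


w = 100.0
b = bias_for_step(w, 0.3)

# Just left of, at, and just right of the intended step position.
for x in [0.25, 0.30, 0.35]:
    print(f"x = {x:.2f}  output = {sigmoid(w * x + b):.4f}")
```

<p>The output is close to $0$ a short distance to the left of $x = 0.3$, equal to one half at the step, and close to $1$ a short distance to the right.</p>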
<p><!-- Based on http://worrydream.com/ScrubbingCalculator/, with minor changes -->
      <script type="text/javascript">
        // Hide the overlay (play icon and tint) of video "name" and start playback.
        function playVideo(name) {
          var div = document.getElementById(name);
          div.style.backgroundColor = "transparent";
          div.style.cursor = "default";
          div.getElementsByTagName("img")[0].style.display = "none";
          document.getElementById("v" + name).play();
        }
        // Restore the overlay once the video has ended, so that it can be replayed.
        function videoEnded(name) {
          var div = document.getElementById(name);
          div.getElementsByTagName("img")[0].style.display = "block";
          div.style.backgroundColor = "white";
          div.style.opacity = 0.6;
          div.style.cursor = "pointer";
        }
      </script>
      <div>
	<div id="a" class="videoOverlay"
	     style="width: 560px; height: 280px; opacity: 0.8"
	     onclick="playVideo('a');">
	  <img style="left: 210px; top: 75px;"
	       src="images/play.png" width="128px">
	</div>
	  <video id="va" width="560" height="280" preload
		 onended="videoEnded('a');">
	    <source type="video/mp4"
		    src="movies/create_step_function.mp4">
	    <source type="video/webm"
		    src="movies/create_step_function.webm">
	  </video>
</div></p>
<p>
<!--We can simplify our analysis quite a bit by increasing the weight so
much that the output really is a step function, to a very good
approximation.  Below I've plotted the output from the top hidden
neuron when the weight is $w = 999$.  Note that this plot is static,
and you can't change parameters such as the weight.-->
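We can simplify our analysis quite a bit by increasing the weight so much that the output really is a step function, to a very good approximation.
Below I've plotted the output from the top hidden neuron when the weight is $w = 999$.
Note that this plot is static, and you can't change parameters such as the weight.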
</p>
<p><img src="images/high_weight_function.jpg"></p>
<p>
<!--It's actually quite a bit easier to work with step functions than
general sigmoid functions.  The reason is that in the output layer we
add up contributions from all the hidden neurons.  It's easy to
analyse the sum of a bunch of step functions, but rather more
difficult to reason about what happens when you add up a bunch of
sigmoid shaped curves.  And so it makes things much easier to assume
that our hidden neurons are outputting step functions.  More
concretely, we do this by fixing the weight $w$ to be some very large
value, and then setting the position of the step by modifying the
bias.  Of course, treating the output as a step function is an
approximation, but it's a very good approximation, and for now we'll
treat it as exact.  I'll come back later to discuss the impact of
deviations from this approximation.-->
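It's actually quite a bit easier to work with step functions than with general sigmoid functions.
The reason is that in the output layer we add up contributions from all the hidden neurons.
It's easy to analyse the sum of a bunch of step functions, but rather more difficult to reason about what happens when you add up a bunch of sigmoid-shaped curves.
And so it makes things much easier to assume that our hidden neurons are outputting step functions.
More concretely, we do this by fixing the weight $w$ to be some very large value, and then setting the position of the step by modifying the bias.
Of course, treating the output as a step function is an approximation.
But it's a very good approximation, and for now we'll treat it as exact.
I'll come back later to discuss the impact of deviations from this approximation.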
</p>
<p>
<!--At what value of $x$ does the step occur?  Put another way, how does
the position of the step depend upon the weight and bias?-->
At what value of $x$ does the step occur?
Put another way, how does the position of the step depend upon the weight and bias?
</p>
<p>
<!--To answer this question, try modifying the weight and bias in the
diagram above (you may need to scroll back a bit).  Can you figure out
how the position of the step depends on $w$ and $b$?  With a little
work you should be able to convince yourself that the position of the
step is <em>proportional</em> to $b$, and <em>inversely proportional</em>
to $w$.-->
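To answer this question, try modifying the weight and bias in the diagram above (you may need to scroll back a bit).
Can you figure out how the position of the step depends on $w$ and $b$?
With a little work you should be able to convince yourself that the position of the step is <em>proportional</em> to $b$, and <em>inversely proportional</em> to $w$.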
</p>
<p>
<!--In fact, the step is at position $s = -b/w$, as you can see by
modifying the weight and bias in the following diagram:-->
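In fact, the step is at position $s = -b/w$: this is the value of $x$ at which $wx + b = 0$, so the output there is $\sigma(0) = 1/2$.
You can see this by modifying the weight and bias in the following diagram: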
</p>
<p><canvas id="step" width="600" height="285"></canvas></p>
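<p>
For readers who prefer numbers to pictures, here is a minimal sketch of the same observation in Python.
The function names and the particular values of $w$ and $b$ are illustrative assumptions, chosen so that the step sits at $s = 0.4$; they are not taken from the interactive diagram.
</p>

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hidden_output(x, w, b):
    """Output sigma(w*x + b) of a hidden neuron with a single input x."""
    return sigmoid(w * x + b)


def step_position(w, b):
    """Position s = -b/w at which the output crosses 1/2."""
    return -b / w


w, b = 999.0, -399.6   # illustrative values, chosen so that s = 0.4
print("step position:", step_position(w, b))
for x in (0.0, 0.39, 0.4, 0.41, 1.0):
    print(f"x = {x:.2f}  output = {hidden_output(x, w, b):.4f}")
```

<p>
With a weight this large the output is already very close to $0$ a short distance to the left of $s$, and very close to $1$ a short distance to the right.
</p>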
<p>
<!--It will greatly simplify our lives to describe hidden neurons using
just a single parameter, $s$, which is the step position, $s = -b/w$.
Try modifying $s$ in the following diagram, in order to get used to
the new parameterization:-->
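It will greatly simplify our lives to describe hidden neurons using just a single parameter, $s$, which is the step position, $s = -b/w$.
Try modifying $s$ in the following diagram, in order to get used to the new parameterization: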
</p>
<p><canvas id="step_parameterization" width="600" height="285"></canvas></p>
<p>
<!--As noted above, we've implicitly set the weight $w$ on the input to be
some large value - big enough that the step function is a very good
approximation.  We can easily convert a neuron parameterized in this
way back into the conventional model, by choosing the bias $b = -w s$.-->
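As noted above, we've implicitly set the weight $w$ on the input to be some large value, big enough that the step function is a very good approximation.
We can easily convert a neuron parameterized in this way back into the conventional model, by choosing the bias $b = -w s$.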
</p>
<p>
<!--Up to now we've been focusing on the output from just the top hidden
neuron.  Let's take a look at the behavior of the entire network.  In
particular, we'll suppose the hidden neurons are computing step
functions parameterized by step points $s_1$ (top neuron) and $s_2$
(bottom neuron).  And they'll have respective output weights $w_1$ and
$w_2$.  Here's the network:-->
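Up to now we've been focusing on the output from just the top hidden neuron.
Let's take a look at the behavior of the entire network.
In particular, we'll suppose the hidden neurons are computing step functions parameterized by step points $s_1$ (top neuron) and $s_2$ (bottom neuron).
And they'll have respective output weights $w_1$ and $w_2$. Here's the network: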
</p>
<p><canvas id="two_hn_network" width="600" height="285"></canvas></p>
<p>
<!--What's being plotted on the right is the <em>weighted output</em> $w_1
a_1 + w_2 a_2$ from the hidden layer.  Here, $a_1$ and $a_2$ are the
outputs from the top and bottom hidden neurons, respectively*<span class="marginnote">
*Note, by the way, that the output from the whole
  network is $\sigma(w_1 a_1+w_2 a_2 + b)$, where $b$ is the bias on
  the output neuron.  Obviously, this isn't the same as the weighted
  output from the hidden layer, which is what we're plotting here.
  We're going to focus on the weighted output from the hidden layer
  right now, and only later will we think about how that relates to
  the output from the whole network.</span>.  These outputs are denoted with
$a$s because they're often known as the neurons' <em>activations</em>.-->
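What's being plotted on the right is the <em>weighted output</em> $w_1 a_1 + w_2 a_2$ from the hidden layer.
Here, $a_1$ and $a_2$ are the outputs from the top and bottom hidden neurons, respectively*<span class="marginnote">
*Note, by the way, that the output from the whole network is $\sigma(w_1 a_1+w_2 a_2 + b)$, where $b$ is the bias on the output neuron.
Obviously, this isn't the same as the weighted output from the hidden layer, which is what we're plotting here.
We're going to focus on the weighted output from the hidden layer right now, and only later will we think about how that relates to the output from the whole network.
</span>.
These outputs are denoted with $a$s because they're often known as the neurons' <em>activations</em>.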
</p>
<p>
<!--Try increasing and decreasing the step point $s_1$ of the top hidden
neuron.  Get a feel for how this changes the weighted output from the
hidden layer.
It's particularly worth understanding what happens when $s_1$ goes
past $s_2$.  You'll see that the graph changes shape when this
happens, since we have moved from a situation where the top hidden
neuron is the first to be activated to a situation where the bottom
hidden neuron is the first to be activated.-->
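Try increasing and decreasing the step point $s_1$ of the top hidden neuron. Get a feel for how this changes the weighted output from the hidden layer.
It's particularly worth understanding what happens when $s_1$ goes past $s_2$.
You'll see that the graph changes shape when this happens, since we have moved from a situation where the top hidden neuron is the first to be activated to a situation where the bottom hidden neuron is the first to be activated.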
</p>
<p>
<!--Similarly, try manipulating the step point $s_2$ of the bottom hidden
neuron, and get a feel for how this changes the combined output from
the hidden neurons.-->
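Similarly, try manipulating the step point $s_2$ of the bottom hidden neuron, and get a feel for how this changes the combined output from the hidden neurons.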
</p>
<p>
<!--Try increasing and decreasing each of the output weights.  Notice how
this rescales the contribution from the respective hidden neurons.
What happens when one of the weights is zero?-->
Try increasing and decreasing each of the output weights.
Notice how this rescales the contribution from the respective hidden neurons.
What happens when one of the weights is zero?
</p>
<p>
<!--Finally, try setting $w_1$ to be $0.8$ and $w_2$ to be $-0.8$.  You
get a "bump" function, which starts at point $s_1$, ends at point
$s_2$, and has height $0.8$.  For instance, the weighted output might
look like this:-->
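Finally, try setting $w_1$ to be $0.8$ and $w_2$ to be $-0.8$.
You get a "bump" function, which starts at point $s_1$, ends at point $s_2$, and has height $0.8$.
For instance, the weighted output might look like this: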
</p>
<p><img src="images/bump_function.jpg"></p>
<p>
<!--Of course, we can rescale the bump to have any height at all.  Let's
use a single parameter, $h$, to denote the height.  To reduce clutter
I'll also remove the "$s_1 = \ldots$" and "$w_1 = \ldots$" notations.-->
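Of course, we can rescale the bump to have any height at all.
Let's use a single parameter, $h$, to denote the height.
To reduce clutter I'll also remove the "$s_1 = \ldots$" and "$w_1 = \ldots$" notations.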
</p>
<p><canvas id="bump_fn" width="600" height="285"></canvas></p>
<p>
<!--Try changing the value of $h$ up and down, to see how the height of
the bump changes.  Try changing the height so it's negative, and
observe what happens.  And try changing the step points to see how
that changes the shape of the bump.-->
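Try changing the value of $h$ up and down, to see how the height of the bump changes.
Try changing the height so it's negative, and observe what happens.
And try changing the step points to see how that changes the shape of the bump.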
</p>
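<p>
The bump construction can also be written down in a few lines of Python.
This is only a sketch under the step-function idealisation used above: the helper names <tt>step</tt> and <tt>bump</tt>, and the step points $0.4$ and $0.6$, are assumptions made for illustration.
</p>

```python
def step(x, s):
    """Idealised hidden neuron: 0 to the left of the step point s, 1 from s onwards."""
    return 1.0 if x >= s else 0.0


def bump(x, s1, s2, h):
    """Weighted output w1*a1 + w2*a2 with output weights w1 = h and w2 = -h."""
    a1 = step(x, s1)   # activation of the top hidden neuron
    a2 = step(x, s2)   # activation of the bottom hidden neuron
    return h * a1 - h * a2


# Illustrative values: a bump of height 0.8 between 0.4 and 0.6.
for x in (0.2, 0.5, 0.8):
    print(x, bump(x, 0.4, 0.6, 0.8))

# Swapping the step points, so that s1 lies to the right of s2, flips the sign.
print(bump(0.5, 0.6, 0.4, 0.8))
```

<p>
Between the two step points only the first neuron is active, so the weighted output is $h$; past the second step point the two contributions cancel.
</p>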
<p>
<!--You'll notice, by the way, that we're using our neurons in a way that
can be thought of not just in graphical terms, but in more
conventional programming terms, as a kind of <tt>if-then-else</tt>
statement, e.g.:-->
You'll notice, by the way, that we're using our neurons in a way that can be thought of not just in graphical terms, but in more conventional programming terms,
as a kind of <tt>if-then-else</tt> statement, e.g.:
</p>
<p><div class="highlight"><pre>
    <span class="k">if </span>input &gt;<span class="o">=</span> step point:
        add 1 to the weighted output
    <span class="k">else</span>:
        add 0 to the weighted output
</pre></div>
</p>
<p>
<!--For the most part I'm going to stick with the graphical point of view.
But in what follows you may sometimes find it helpful to switch points
of view, and think about things in terms of <tt>if-then-else</tt>.-->
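For the most part I'm going to stick with the graphical point of view.
But in what follows you may sometimes find it helpful to switch points of view, and think about things in terms of <tt>if-then-else</tt>.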
</p>
<p>
<!--We can use our bump-making trick to get two bumps, by gluing two pairs
of hidden neurons together into the same network:-->
We can use our bump-making trick to get two bumps, by gluing two pairs of hidden neurons together into the same network:
</p>
<p><canvas id="double_bump" width="600" height = "280"></canvas></p>
<p>
<!--I've suppressed the weights here, simply writing the $h$ values for
each pair of hidden neurons.  Try increasing and decreasing both $h$
values, and observe how it changes the graph.  Move the bumps around
by changing the step points.-->
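I've suppressed the weights here, simply writing the $h$ values for each pair of hidden neurons.
Try increasing and decreasing both $h$ values, and observe how it changes the graph.
Move the bumps around by changing the step points.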
</p>
<p>
<!--More generally, we can use this idea to get as many peaks as we want,
of any height.  In particular, we can divide the interval $[0, 1]$ up
into a large number, $N$, of subintervals, and use $N$ pairs of hidden
neurons to set up peaks of any desired height.  Let's see how this
works for $N = 5$.  That's quite a few neurons, so I'm going to pack
things in a bit.  Apologies for the complexity of the diagram: I could
hide the complexity by abstracting away further, but I think it's
worth putting up with a little complexity, for the sake of getting a
more concrete feel for how these networks work.-->
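More generally, we can use this idea to get as many peaks as we want, of any height.
In particular, we can divide the interval $[0, 1]$ up into a large number, $N$, of subintervals,
and use $N$ pairs of hidden neurons to set up peaks of any desired height.
Let's see how this works for $N = 5$. That's quite a few neurons, so I'm going to pack things in a bit.
Apologies for the complexity of the diagram: I could hide the complexity by abstracting away further,
but I think it's worth putting up with a little complexity, for the sake of getting a more concrete feel for how these networks work.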
</p>
<p><canvas id="five_bumps" width="600" height = "620"></canvas></p>
<p>
<!--You can see that there are five pairs of hidden neurons.  The step
points for the respective pairs of neurons are $0, 1/5$, then $1/5,
2/5$, and so on, out to $4/5, 5/5$.  These values are fixed - they
make it so we get five evenly spaced bumps on the graph.-->
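You can see that there are five pairs of hidden neurons.
The step points for the respective pairs of neurons are $(0, 1/5), (1/5, 2/5), \ldots , (4/5, 5/5)$.
These values are fixed, and they give us five evenly spaced bumps on the graph.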
</p>
<p>
<!--Each pair of neurons has a value of $h$ associated to it.  Remember,
the connections output from the neurons have weights $h$ and $-h$ (not
marked).  Click on one of the $h$ values, and drag the mouse to the
right or left to change the value.  As you do so, watch the function
change.  By changing the output weights we're actually
<em>designing</em> the function!-->
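Each pair of neurons has a value of $h$ associated to it.
Remember, the connections output from the neurons have weights $h$ and $-h$ (not marked).
Click on one of the $h$ values, and drag the mouse to the right or left to change the value.
As you do so, watch the function change.
By changing the output weights we're actually <em>designing</em> the function!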
</p>
<p>
<!--Contrariwise, try clicking on the graph, and dragging up or down to
change the height of any of the bump functions.  As you change the
heights, you can see the corresponding change in $h$ values.  And,
although it's not shown, there is also a change in the corresponding
output weights, which are $+h$ and $-h$.-->
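Contrariwise, try clicking on the graph, and dragging up or down to change the height of any of the bump functions.
As you change the heights, you can see the corresponding change in $h$ values.
And, although it's not shown, there is also a change in the corresponding output weights, which are $+h$ and $-h$.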
</p>
<p>
<!--In other words, we can directly manipulate the function appearing in
the graph on the right, and see that reflected in the $h$ values on
the left.  A fun thing to do is to hold the mouse button down and drag
the mouse from one side of the graph to the other.  As you do you this you draw out a function,
and get to watch the parameters in the neural network adapt.-->
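In other words, we can directly manipulate the function appearing in the graph on the right, and see that reflected in the $h$ values on the left.
A fun thing to do is to hold the mouse button down and drag the mouse from one side of the graph to the other.
As you do this you draw out a function, and get to watch the parameters in the neural network adapt.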
</p>
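<p>
As a rough sketch of what the diagram computes, the following Python adds up $N$ bumps on $[0, 1]$, with pair $j$ stepping up at $j/N$ and down at $(j+1)/N$.
It again treats the hidden neurons as exact step functions; the function names and the example heights are assumptions made for illustration.
</p>

```python
def step(x, s):
    """Idealised hidden neuron: 0 to the left of the step point s, 1 from s onwards."""
    return 1.0 if x >= s else 0.0


def weighted_output(x, heights):
    """Sum of N bumps on [0, 1]; pair j has step points j/N and (j+1)/N."""
    n = len(heights)
    total = 0.0
    for j, h in enumerate(heights):
        # Output weights +h and -h on the two hidden neurons of pair j.
        total += h * step(x, j / n) - h * step(x, (j + 1) / n)
    return total


heights = [0.3, -0.2, 0.5, 0.1, -0.4]   # arbitrary example values of h
for x in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(x, weighted_output(x, heights))
```

<p>
Evaluated at the middle of each subinterval, the weighted output simply reads back the corresponding $h$, which is what makes it possible to design the function by hand.
</p>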
<p>
<!--Time for a challenge.-->
Time for a challenge.
</p>
<p>
<!--Let's think back to the function I plotted at the beginning of the
chapter:-->
Let's think back to the function I plotted at the beginning of the chapter:
</p>
<p><center> <canvas id="function_3" width="300" height="300"></canvas>
</center></p>
<p>
<!--I didn't say it at the time, but what I plotted is actually the function
<a class="displaced_anchor" name="eqtn106"></a>\begin{eqnarray}
f(x) = 0.2+0.4 x^2+0.3 \sin(15 x) + 0.05 \cos(50 x),
\tag{106}\end{eqnarray}
plotted over $x$ from $0$ to $1$, and with the $y$ axis taking
values from $0$ to $1$.-->
I didn't say it at the time, but what I plotted is actually the function
<a class="displaced_anchor" name="eqtn106"></a>\begin{eqnarray}
f(x) = 0.2+0.4 x^2+0.3 \sin(15 x) + 0.05 \cos(50 x),
\tag{106}\end{eqnarray}
plotted over $x$ from $0$ to $1$, and with the $y$ axis taking values from $0$ to $1$.
</p>
<p>
<!--That's obviously not a trivial function.-->
That's obviously not a trivial function.
</p>
<p>
<!--You're going to figure out how to compute it using a neural network.-->
You're going to figure out how to compute it using a neural network.
</p>
<p>
<!--In our networks above we've been analysing the weighted combination
$\sum_j w_j a_j$ output from the hidden neurons.  We now know how to
get a lot of control over this quantity.  But, as I noted earlier,
this quantity is not what's output from the network.  What's output
from the network is $\sigma(\sum_j w_j a_j + b)$ where $b$ is the bias
on the output neuron.  Is there some way we can achieve control over
the actual output from the network?-->
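In our networks above we've been analysing the weighted combination $\sum_j w_j a_j$ output from the hidden neurons.
We now know how to get a lot of control over this quantity.
But, as I noted earlier, this quantity is not what's output from the network.
What's output from the network is $\sigma(\sum_j w_j a_j + b)$,
where $b$ is the bias on the output neuron.
Is there some way we can achieve control over the actual output from the network?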
</p>
<p>
<!--The solution is to design a neural network whose hidden layer has a
weighted output given by $\sigma^{-1} \circ f(x)$, where $\sigma^{-1}$
is just the inverse of the $\sigma$ function.  That is, we want the
weighted output from the hidden layer to be:-->
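The solution is to design a neural network whose hidden layer has a weighted output given by $\sigma^{-1} \circ f(x)$,
where $\sigma^{-1}$ is just the inverse of the $\sigma$ function.
That is, we want the weighted output from the hidden layer to be: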
</p>
<p><center> <canvas id="inverted_function" width="340"
height="300"></canvas> </center></p>
<p>
<!--If we can do this, then the output from the network as a whole will be
a good approximation to $f(x)$*<span class="marginnote">
*Note that I have set the bias on the output neuron to $0$.</span>.-->
If we can do this, then the output from the network as a whole will be a good approximation to $f(x)$*<span class="marginnote">
*Note that I have set the bias on the output neuron to $0$.</span>.
</p>
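<p>
To make the role of $\sigma^{-1}$ concrete, here is a small Python sketch.
The inverse of the sigmoid is $\sigma^{-1}(y) = \ln\bigl(y/(1-y)\bigr)$, which is only defined for $y$ strictly between $0$ and $1$.
For that reason the sketch uses a hypothetical stand-in target (named <tt>target</tt> below) that stays inside that range; it is an assumption for illustration, not the goal function of the challenge.
</p>

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_inverse(y):
    """Inverse of the sigmoid; defined only for y strictly between 0 and 1."""
    return math.log(y / (1.0 - y))


def target(x):
    """Hypothetical stand-in for the goal function, with values between 0.2 and 0.8."""
    return 0.5 + 0.3 * math.sin(6.0 * x)


for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    z = sigmoid_inverse(target(x))   # what the hidden layer's weighted output should be
    print(f"x = {x:.2f}  hidden-layer target = {z:+.4f}  sigmoid of it = {sigmoid(z):.4f}")
```

<p>
Applying the output neuron's sigmoid (with zero bias) to the hidden-layer target recovers the stand-in function, which is why it is enough to make the weighted output approximate $\sigma^{-1} \circ f$.
</p>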
<p>
<!--Your challenge, then, is to design a neural network to approximate the
goal function shown just above.  To learn as much as possible, I want
you to solve the problem twice.  The first time, please click on the
graph, directly adjusting the heights of the different bump functions.
You should find it fairly easy to get a good match to the goal
function.  How well you're doing is measured by the <em>average
  deviation</em> between the goal function and the function the network is
actually computing.  Your challenge is to drive the average deviation
as <em>low</em> as possible.  You complete the challenge when you drive
the average deviation to $0.40$ or below.-->
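Your challenge, then, is to design a neural network to approximate the goal function shown just above.
To learn as much as possible, I want you to solve the problem twice.
The first time, please click on the graph, directly adjusting the heights of the different bump functions.
You should find it fairly easy to get a good match to the goal function.
How well you're doing is measured by the <em>average deviation</em>
between the goal function and the function the network is actually computing.
Your challenge is to drive the average deviation as <em>low</em> as possible.
You complete the challenge when you drive the average deviation to $0.40$ or below.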
</p>
<p>
<!--Once you've done that, click on "Reset" to randomly re-initialize
the bumps.  The second time you solve the problem, resist the urge to
click on the graph.  Instead, modify the $h$ values on the left-hand
side, and again attempt to drive the average deviation to $0.40$ or
below.-->
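Once you've done that, click on "Reset" to randomly re-initialize the bumps.
The second time you solve the problem, resist the urge to click on the graph.
Instead, modify the $h$ values on the left-hand side, and again attempt to drive the average deviation to $0.40$ or below.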
</p>
<p><canvas id="design_function" width="600" height = "620"></canvas></p>
<p>
<!--You've now figured out all the elements necessary for the network to
approximately compute the function $f(x)$!  It's only a coarse
approximation, but we could easily do much better, merely by
increasing the number of pairs of hidden neurons, allowing more bumps.-->
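You've now figured out all the elements necessary for the network to approximately compute the function $f(x)$!
It's only a coarse approximation, but we could easily do much better, merely by increasing the number of pairs of hidden neurons, allowing more bumps.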
</p>
<p>
<!--In particular, it's easy to convert all the data we have found back
into the standard parameterization used for neural networks.  Let me
just recap quickly how that works.-->
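In particular, it's easy to convert all the data we have found back into the standard parameterization used for neural networks.
Let me just recap quickly how that works.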
</p>
<p>
<!--The first layer of weights all have some large, constant value, say $w
= 1000$.-->
The first layer of weights all have some large, constant value, say $w = 1000$.
</p>
<p>
<!--The biases on the hidden neurons are just $b = -w s$.  So, for
instance, for the second hidden neuron $s = 0.2$ becomes $b = -1000
\times 0.2 = -200$.-->
The biases on the hidden neurons are just $b = -w s$. So, for instance, for the second hidden neuron $s = 0.2$ becomes $b = -1000 \times 0.2 = -200$.
</p>
<p>
<!--The final layer of weights are determined by the $h$ values.  So, for
instance, the value you've chosen above for the first $h$, $h = $
<span id="h" style="font-family: MJX_Main;"></span>, means that
the output weights from the top two hidden neurons are
<span id="w1" style="font-family: MJX_Main;"></span>
and
<span id="w2" style="font-family: MJX_Main;"></span>
, respectively.  And so on, for the entire layer of output weights.-->
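The final layer of weights are determined by the $h$ values. So, for instance, the value you've chosen above for the first $h$, $h = $
<span id="h" style="font-family: MJX_Main;"></span>
, means that the output weights from the top two hidden neurons are
<span id="w1" style="font-family: MJX_Main;"></span>
and
<span id="w2" style="font-family: MJX_Main;"></span>
, respectively.
And so on, for the entire layer of output weights.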
</p>
<p>
<!--Finally, the bias on the output neuron is $0$.-->
Finally, the bias on the output neuron is $0$.
</p>
<p>
<!--That's everything: we now have a complete description of a neural
network which does a pretty good job computing our original goal
function.  And we understand how to improve the quality of the
approximation by improving the number of hidden neurons.-->
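That's everything: we now have a complete description of a neural network which does a pretty good job computing our original goal function.
And we understand how to improve the quality of the approximation by increasing the number of hidden neurons.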
</p>
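<p>
The recap above can be sketched as a short Python program that turns a list of $h$ values into conventional weights and biases, and then evaluates the resulting network.
It is a sketch under the assumptions stated in the recap ($w = 1000$, $b = -ws$, output weights $\pm h$, output bias $0$); the function names and the example $h$ values are hypothetical.
</p>

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_network(heights, w=1000.0):
    """Return (hidden_weights, hidden_biases, output_weights, output_bias)."""
    n = len(heights)
    hidden_weights, hidden_biases, output_weights = [], [], []
    for j, h in enumerate(heights):
        # Pair j steps up at j/n with output weight +h, and down at (j+1)/n with -h.
        for s, out_w in ((j / n, h), ((j + 1) / n, -h)):
            hidden_weights.append(w)
            hidden_biases.append(-w * s)   # b = -w s
            output_weights.append(out_w)
    return hidden_weights, hidden_biases, output_weights, 0.0


def network_output(x, params):
    """Full network output: sigmoid of the weighted hidden output plus the output bias."""
    hidden_weights, hidden_biases, output_weights, output_bias = params
    z = output_bias
    for w, b, v in zip(hidden_weights, hidden_biases, output_weights):
        z += v * sigmoid(w * x + b)
    return sigmoid(z)


params = build_network([-1.0, 0.5, 2.0, 0.0, -0.5])   # arbitrary example h values
print(params[1][:4])                  # biases of the first four hidden neurons
print(network_output(0.5, params))    # x = 0.5 lies in the third subinterval
```

<p>
Because the hidden neurons here are real sigmoids rather than ideal steps, the output is only approximately $\sigma(h)$ in the middle of each subinterval, and it is less accurate close to the step points.
</p>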
<p>
<!--What's more, there was nothing special about our original goal
function, $f(x) = 0.2+0.4 x^2+0.3 \sin(15 x) + 0.05 \cos(50 x)$.  We
could have used this procedure for any continuous function from $[0,
1]$ to $[0, 1]$.  In essence, we're using our single-layer neural
networks to build a lookup table for the function.  And we'll be able
to build on this idea to provide a general proof of universality.-->
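What's more, there was nothing special about our original goal function, $f(x) = 0.2+0.4 x^2+0.3 \sin(15 x) + 0.05 \cos(50 x)$.
We could have used this procedure for any continuous function from $[0, 1]$ to $[0, 1]$.
In essence, we're using our neural networks with a single hidden layer to build a lookup table for the function.
And we'll be able to build on this idea to provide a general proof of universality.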
</p>
<p><h3><a name="many_input_variables"></a><a href="#many_input_variables">
<!--Many input variables-->
Many input variables
</a></h3></p>
<p>
<!--Let's extend our results to the case of many input variables.  This
sounds complicated, but all the ideas we need can be understood in the
case of just two inputs.  So let's address the two-input case.-->
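Let's extend our results to the case of many input variables.
This sounds complicated,
but all the ideas we need can be understood in the case of just two inputs. So let's address the two-input case.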
</p>
<p>
<!--We'll start by considering what happens when we have two inputs to a neuron:-->
We'll start by considering what happens when we have two inputs to a neuron:
</p>
<p><center> <canvas id="two_inputs" width="350" height="220"></canvas>
</center></p>
<p>
<!--Here, we have inputs $x$ and $y$, with corresponding weights $w_1$ and
$w_2$, and a bias $b$ on the neuron.  Let's set the weight $w_2$ to
$0$, and then play around with the first weight, $w_1$, and the bias,
$b$, to see how they affect the output from the neuron:-->
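Here, we have inputs $x$ and $y$, with corresponding weights $w_1$ and $w_2$, and a bias $b$ on the neuron.
Let's set the weight $w_2$ to $0$, and then play around with the first weight, $w_1$, and the bias, $b$, to see how they affect the output from the neuron: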
</p>
<p><script src="js/three.min.js"></script></p>
<p><canvas id="ti_graph" width="200" height="220"></canvas>
<span id="ti_graph_3d" style="position: absolute; left: 260px;"></span></p>
<p>
<!--As you can see, with $w_2 = 0$ the input $y$ makes no difference to
the output from the neuron.  It's as though $x$ is the only input.-->
As you can see, with $w_2 = 0$ the input $y$ makes no difference to the output from the neuron.
It's as though $x$ is the only input.
</p>
<p>
<!--Given this, what do you think happens when we increase the weight
$w_1$ to $w_1 = 100$, with $w_2$ remaining $0$?  If you don't
immediately see the answer, ponder the question for a bit, and see if
you can figure out what happens.  Then try it out and see if you're
right.  I've shown what happens in the following movie:-->
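Given this, what do you think happens when we increase the weight $w_1$ to $w_1 = 100$, with $w_2$ remaining $0$?
If you don't immediately see the answer, ponder the question for a bit, and see if you can figure out what happens. Then try it out and see if you're right.
I've shown what happens in the following movie: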
</p>
<p><div>
	<div id="b" class="videoOverlay"
	     style="width: 460px; height: 252px; opacity: 0.8"
	     onclick="playVideo('b');">
	  <img style="left: 160px; top: 70px;"
	       src="images/play.png" width="128px">
	</div>
	  <video id="vb" width="460" height="252" preload
		 onended="videoEnded('b');">
	    <source type="video/mp4"
		    src="movies/step_3d.mp4">
	    <source type="video/webm"
		    src="movies/step_3d.webm">
	  </video>
</div></p>
<p>
<!--Just as in our earlier discussion, as the input weight gets larger the
output approaches a step function.  The difference is that now the
step function is in three dimensions.  Also as before, we can move the
location of the step point around by modifying the bias.  The actual
location of the step point is $s_x \equiv -b / w_1$.-->
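Just as in our earlier discussion, as the input weight gets larger the output approaches a step function.
The difference is that now the step function is in three dimensions.
Also as before, we can move the location of the step point around by modifying the bias.
The actual location of the step point is $s_x \equiv -b / w_1$.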
</p>
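<p>
Here is a minimal Python sketch of the two-input neuron, checking that with $w_2 = 0$ the output ignores $y$ and steps at $s_x = -b/w_1$.
The function name <tt>neuron</tt> and the value $b = -50$ are assumptions chosen for illustration, so that the step sits at $s_x = 0.5$.
</p>

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron(x, y, w1, w2, b):
    """Output sigma(w1*x + w2*y + b) of a neuron with inputs x and y."""
    return sigmoid(w1 * x + w2 * y + b)


w1, w2, b = 100.0, 0.0, -50.0   # illustrative values; step at s_x = -b/w1 = 0.5
for x in (0.3, 0.7):
    for y in (0.0, 1.0):
        print(f"x = {x}, y = {y}: output = {neuron(x, y, w1, w2, b):.6f}")
```

<p>
The two rows for each $x$ agree, and the output jumps from nearly $0$ to nearly $1$ as $x$ crosses $s_x$: a step in the $x$ direction that extends unchanged along $y$.
</p>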
<p>
<!--Let's redo the above using the position of the step as the parameter:-->
Let's redo the above using the position of the step as the parameter:
</p>
<p><canvas id="ti_graph_redux" width="200" height="220"></canvas> <span
id="ti_graph_redux_3d" style="position: absolute; left:
260px;"></span></p>
<p>
<!--Here, we assume the weight on the $x$ input has some large value
- I've used $w_1 = 1000$ - and the weight $w_2 = 0$.  The
number on the neuron is the step point, and the little $x$ above the
number reminds us that the step is in the $x$ direction.
Of course, it's also possible to get a step function in the $y$
direction, by making the weight on the $y$ input very large (say, $w_2
= 1000$), and the weight on the $x$ equal to $0$, i.e., $w_1 = 0$:-->
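Here, we assume the weight on the $x$ input has some large value (I've used $w_1 = 1000$) and the weight $w_2 = 0$.
The number on the neuron is the step point, and the little $x$ above the number reminds us that the step is in the $x$ direction.
Of course, it's also possible to get a step function in the $y$ direction,
by making the weight on the $y$ input very large (say, $w_2 = 1000$), and the weight on the $x$ input equal to $0$, i.e., $w_1 = 0$: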
</p>
<p><canvas id="y_step" width="200" height="220"></canvas> <span
id="y_step_3d" style="position: absolute; left: 260px;"></span></p>
<p>
<!--The number on the neuron is again the step point, and in this case the
little $y$ above the number reminds us that the step is in the $y$
direction.  I could have explicitly marked the weights on the $x$ and
$y$ inputs, but decided not to, since it would make the diagram rather
cluttered.  But do keep in mind that the little $y$ marker implicitly
tells us that the $y$ weight is large, and the $x$ weight is $0$.-->
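The number on the neuron is again the step point,
and in this case the little $y$ above the number reminds us that the step is in the $y$ direction.
I could have explicitly marked the weights on the $x$ and $y$ inputs, but decided not to, since it would make the diagram rather cluttered.
But do keep in mind that the little $y$ marker implicitly tells us that the $y$ weight is large, and the $x$ weight is $0$.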
</p>
<p>
<!--We can use the step functions we've just constructed to compute a
three-dimensional bump function.  To do this, we use two neurons, each
computing a step function in the $x$ direction.  Then we combine those
step functions with weight $h$ and $-h$, respectively, where $h$ is
the desired height of the bump.  It's all illustrated in the following
diagram:-->
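We can use the step functions we've just constructed to compute a three-dimensional bump function.
To do this, we use two neurons, each computing a step function in the $x$ direction. Then we combine those step functions with weight $h$ and $-h$, respectively,
where $h$ is the desired height of the bump.
It's all illustrated in the following diagram: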
</p>
<p><canvas id="bump_3d" width="300" height="220"></canvas> <span
id="bump_3d_graph" style="position: absolute; left: 360px;"></span></p>
<p>
<!--Try changing the value of the height, $h$. Observe how it relates to
the weights in the network.  And see how it changes the height of the
bump function on the right.-->
Try changing the value of the height, $h$. Observe how it relates to the weights in the network.
And see how it changes the height of the bump function on the right.
</p>
<p>
<!--Also, try changing the step point $0.30$ associated to the top hidden
neuron.  Witness how it changes the shape of the bump.  What happens
when you move it past the step point $0.70$ associated to the bottom
hidden neuron?-->
Also, try changing the step point $0.30$ associated to the top hidden neuron.
Witness how it changes the shape of the bump.
What happens when you move it past the step point $0.70$ associated to the bottom hidden neuron?
</p>
<p>
<!--We've figured out how to make a bump function in the $x$ direction.
Of course, we can easily make a bump function in the $y$ direction, by
using two step functions in the $y$ direction.  Recall that we do this
by making the weight large on the $y$ input, and the weight $0$ on the
$x$ input.  Here's the result:-->
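We've figured out how to make a bump function in the $x$ direction.
Of course, we can easily make a bump function in the $y$ direction, by using two step functions in the $y$ direction.
Recall that we do this by making the weight large on the $y$ input, and the weight $0$ on the $x$ input.
Here's the result: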
</p>
<p><canvas id="bump_3d_y" width="300" height="220"></canvas> <span
id="bump_3d_y_graph" style="position: absolute; left: 360px;"></span></p>
<p>
<!--This looks nearly identical to the earlier network!  The only thing
explicitly shown as changing is that there's now little $y$ markers on
our hidden neurons.  That reminds us that they're producing $y$ step
functions, not $x$ step functions, and so the weight is very large on
the $y$ input, and zero on the $x$ input, not vice versa.  As before,
I decided not to show this explicitly, in order to avoid clutter.-->
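This looks nearly identical to the earlier network!
The only thing explicitly shown as changing is that there are now little $y$ markers on our hidden neurons.
That reminds us that they're producing $y$ step functions, not $x$ step functions,
and so the weight is very large on the $y$ input, and zero on the $x$ input, not vice versa.
As before, I decided not to show this explicitly, in order to avoid clutter.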
</p>
<p>
<!--Let's consider what happens when we add up two bump functions, one in
the $x$ direction, the other in the $y$ direction, both of height $h$:-->
Let's consider what happens when we add up two bump functions, one in the $x$ direction, the other in the $y$ direction, both of height $h$:
</p>
<p><canvas id="xy_bump" width="300" height="270"></canvas> <span
id="xy_bump_3d" style="position: absolute; left: 360px;"></span></p>
<p>
<!--To simplify the diagram I've dropped the connections with zero weight.
For now, I've left in the little $x$ and $y$ markers on the hidden
neurons, to remind you in what directions the bump functions are being
computed.  We'll drop even those markers later, since they're implied
by the input variable.-->
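To simplify the diagram I've dropped the connections with zero weight.
For now, I've left in the little $x$ and $y$ markers on the hidden neurons, to remind you in what directions the bump functions are being computed.
We'll drop even those markers later, since they're implied by the input variable.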
</p>
<p>
<!--Try varying the parameter $h$.  As you can see, this causes the output
weights to change, and also the heights of both the $x$ and $y$ bump
functions.-->
Try varying the parameter $h$. As you can see, this causes the output weights to change, and also the heights of both the $x$ and $y$ bump functions.
</p>
<p>
<!--What we've built looks a little like a <em>tower</em> function:-->
What we've built looks a little like a <em>tower</em> function:
</p>
<p><center> <span id="tower" style="position: absolute;"></span> <div
style="height: 230px"></div> </center></p>
<p>
<!--If we could build such tower functions, then we could use them to
approximate arbitrary functions, just by adding up many towers of
different heights, and in different locations:-->
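If we could build such tower functions, then we could use them to approximate arbitrary functions, just by adding up many towers of different heights, and in different locations: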
</p>
<p><center> <span id="many_towers" style="position: absolute;"></span>
<div style="height: 230px"></div> </center></p>
<p>
<!--
Of course, we haven't yet figured out how to build a tower function.
What we have constructed looks like a central tower, of height $2h$,
with a surrounding plateau, of height $h$.-->
Of course, we haven't yet figured out how to build a tower function.
What we have constructed looks like a central tower, of height $2h$, with a surrounding plateau, of height $h$.
</p>
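<p>
The heights $2h$ and $h$ are easy to check with a short Python sketch that adds an $x$ bump and a $y$ bump, again treating the hidden neurons as ideal step functions.
The function names, and the choice of step points $0.3$ and $0.7$ for both directions, are assumptions made for illustration.
</p>

```python
def step(t, s):
    """Idealised hidden neuron: 0 to the left of the step point s, 1 from s onwards."""
    return 1.0 if t >= s else 0.0


def bump(t, s1, s2, h):
    """One-dimensional bump of height h between the step points s1 and s2."""
    return h * step(t, s1) - h * step(t, s2)


def xy_weighted_output(x, y, h, lo=0.3, hi=0.7):
    """Sum of an x-direction bump and a y-direction bump, both of height h."""
    return bump(x, lo, hi, h) + bump(y, lo, hi, h)


h = 1.0
print(xy_weighted_output(0.5, 0.5, h))   # inside both bumps: the central tower
print(xy_weighted_output(0.5, 0.1, h))   # inside the x bump only: the plateau
print(xy_weighted_output(0.1, 0.1, h))   # outside both bumps
```

<p>
Where the two bumps overlap their heights add to $2h$; where only one is present the output is $h$; everywhere else it is $0$.
</p>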
<p>
<!--But we can make a tower function.  Remember that earlier we saw
neurons can be used to implement a type of <tt>if-then-else</tt>
statement:-->
But we can make a tower function.
Remember that earlier we saw neurons can be used to implement a type of <tt>if-then-else</tt> statement:
</p>
<p>
<!--
<div class="highlight"><pre>
    <span class="k">if </span>input &gt;<span class="o">=</span> threshold:
        output 1
    <span class="k">else</span>:
        output 0
</pre></div>-->
<div class="highlight"><pre>
    <span class="k">if </span>input &gt;<span class="o">=</span> threshold:
        output 1
    <span class="k">else</span>:
        output 0
</pre></div>
</p>
<p>
<!--That was for a neuron with just a single input.  What we want is to
apply a similar idea to the combined output from the hidden neurons:-->
That was for a neuron with just a single input.
What we want is to apply a similar idea to the combined output from the hidden neurons:
</p>
<p>
<!--
<div class="highlight"><pre>
    <span class="k">if </span>combined output from hidden neurons &gt;<span class="o">=</span> threshold:
        output 1
     <span class="k">else</span>:
        output 0
</pre></div>-->
<div class="highlight"><pre>
    <span class="k">if </span>combined output from hidden neurons &gt;<span class="o">=</span> threshold:
        output 1
    <span class="k">else</span>:
        output 0
</pre></div>
</p>
<p>
<!--If we choose the <tt>threshold</tt> appropriately - say, a value of
$3h/2$, which is sandwiched between the height of the plateau and the
height of the central tower - we could squash the plateau down to
zero, and leave just the tower standing.-->
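If we choose the <tt>threshold</tt> appropriately (say, a value of $3h/2$, which is sandwiched between the height of the plateau and the height of the central tower),
we could squash the plateau down to zero, and leave just the tower standing.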
</p>
<p>
<!--Can you see how to do this?  Try experimenting with the following
network to figure it out.  Note that we're now plotting the output
from the entire network, not just the weighted output from the hidden
layer.  This means we add a bias term to the weighted output from the
hidden layer, and apply the sigma function.  Can you find values for
$h$ and $b$ which produce a tower?  This is a bit tricky, so if you
think about this for a while and remain stuck, here's two hints: (1)
To get the output neuron to show the right kind of <tt>if-then-else</tt>
behaviour, we need the input weights (all $h$ or $-h$) to be large;
and (2) the value of $b$ determines the scale of the
<tt>if-then-else</tt> threshold.-->
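Can you see how to do this?
Try experimenting with the following network to figure it out.
Note that we're now plotting the output from the entire network, not just the weighted output from the hidden layer.
This means we add a bias term to the weighted output from the hidden layer, and apply the sigma function.
Can you find values for $h$ and $b$ which produce a tower?
This is a bit tricky, so if you think about this for a while and remain stuck, here are two hints:
 (1) To get the output neuron to show the right kind of <tt>if-then-else</tt> behaviour, we need the input weights (all $h$ or $-h$) to be large; and
 (2) the value of $b$ determines the scale of the <tt>if-then-else</tt> threshold.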
</p>
<p><canvas id="tower_construction" width="300" height="270"></canvas>
<span id="tower_construction_3d" style="position: absolute; left:
350px;"></span></p>
<p>
<!--With our initial parameters, the output looks like a flattened version
of the earlier diagram, with its tower and plateau.  To get the
desired behaviour, we increase the parameter $h$ until it becomes
large.  That gives the <tt>if-then-else</tt> thresholding
behaviour.  Second, to get the threshold right, we'll choose $b
\approx -3h/2$.  Try it, and see how it works!-->
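With our initial parameters, the output looks like a flattened version
of the earlier diagram, with its tower and plateau.  To get the
desired behaviour, we first increase the parameter $h$ until it becomes
large.  That gives the <tt>if-then-else</tt> thresholding behaviour.
Second, to get the threshold right, we choose $b \approx -3h/2$.
Try it, and see how it works!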
</p>
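<p>
The recipe above can be sketched in a few lines of Python.  This is only
an illustrative sketch, not the code behind the interactive figure: the
function names, the step points <tt>s1, t1, s2, t2</tt> and the
first-layer weight <tt>w</tt> are all assumptions chosen for the example,
and the plateau and tower heights are taken to be $h$ and $2h$, as in the
earlier diagram.
</p>
<div class="highlight"><pre>
```python
import math

def sigmoid(z):
    # Same function as 1 / (1 + exp(-z)), written with tanh so that
    # large |z| cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * z))

def tower(x, y, h=10.0, w=1000.0, s1=0.4, t1=0.6, s2=0.3, t2=0.7):
    """Output of the whole network for inputs x and y (hypothetical parameters)."""
    # Four hidden neurons with a large weight w act as near-steps
    # at s1 and t1 (in x) and at s2 and t2 (in y).
    x_up, x_down = sigmoid(w * (x - s1)), sigmoid(w * (x - t1))
    y_up, y_down = sigmoid(w * (y - s2)), sigmoid(w * (y - t2))
    # Output weights alternate +h, -h; the output bias is b = -3h/2.
    weighted = h * x_up - h * x_down + h * y_up - h * y_down
    b = -1.5 * h
    return sigmoid(weighted + b)

print(tower(0.5, 0.5))  # inside the tower: close to 1
print(tower(0.5, 0.9))  # on the plateau: close to 0
print(tower(0.9, 0.9))  # away from both: closer still to 0
```
</pre></div>
<p>
Raising <tt>h</tt> while keeping the bias at $-3h/2$ pushes the first
value towards $1$ and the other two towards $0$.
</p>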
<p>
<!--Here's what it looks like, when we use $h = 10$:-->
Here's what it looks like when we use $h = 10$:
</p>
<p><div>
	<div id="c" class="videoOverlay"
	     style="width: 556px; height: 284px; opacity: 0.8"
	     onclick="playVideo('c');">
	  <img style="left: 210px; top: 80px;"
	       src="images/play.png" width="128px">
	</div>
	  <video id="vc" width="556" height="284" preload
		 onended="videoEnded('c');">
	    <source type="video/mp4"
		    src="movies/tower_construction.mp4">
	    <source type="video/webm"
		    src="movies/tower_construction.webm">
</video></div></p>
<p>
<!--Even for this relatively modest value of $h$, we get a pretty good
tower function.  And, of course, we can make it as good as we want by
increasing $h$ still further, and keeping the bias as $b = -3h/2$.-->
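Even for this relatively modest value of $h$, we get a pretty good
tower function.  And, of course, we can make it as good as we want by
increasing $h$ still further, and keeping the bias as $b = -3h/2$.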
</p>
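<p>
As a rough check on how good "pretty good" is, assume ideal steps in the
hidden layer, a plateau of height $h$ and a tower of height $2h$, as in
the earlier diagram.  With $b = -3h/2$ the weighted input to the output
neuron is then $h/2$ on the tower, $-h/2$ on the plateau, and $-3h/2$
elsewhere.  For $h = 10$ the corresponding outputs are
$$\sigma(5) \approx 0.9933, \qquad \sigma(-5) \approx 0.0067, \qquad
\sigma(-15) \approx 3.1 \times 10^{-7}.$$
Doubling $h$ to $20$ gives $\sigma(10) \approx 0.99995$ on the tower, so
the output approaches an exact $0$-or-$1$ tower quickly as $h$ grows.
</p>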
<p>
<!--Let's try gluing two such networks together, in order to compute two
different tower functions.  To make the respective roles of the two
sub-networks clear I've put them in separate boxes, below: each box
computes a tower function, using the technique described above.  The
graph on the right shows the weighted output from the <em>second</em>
hidden layer, that is, it's a weighted combination of tower functions.-->
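Let's try gluing two such networks together, in order to compute two
different tower functions.  To make the respective roles of the two
sub-networks clear I've put them in separate boxes, below: each box
computes a tower function, using the technique described above.  The
graph on the right shows the weighted output from the <em>second</em>
hidden layer, that is, it's a weighted combination of tower functions.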
</p>
<p><canvas id="the_two_towers" width="320" height="580"></canvas> <span
id="the_two_towers_3d" style="position: absolute; left: 370px;
margin-top: 180px;"></span></p>
<p>
<!--In particular, you can see that by modifying the weights in the final
layer you can change the height of the output towers.-->
In particular, you can see that by modifying the weights in the final
layer you can change the height of the output towers.
</p>
<p>
<!--The same idea can be used to compute as many towers as we like.  We
can also make them as thin as we like, and whatever height we like.
As a result, we can ensure that the weighted output from the second
hidden layer approximates any desired function of two variables:-->
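The same idea can be used to compute as many towers as we like.  We
can also make them as thin as we like, and whatever height we like.
As a result, we can ensure that the weighted output from the second
hidden layer approximates any desired function of two variables: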
</p>
<p><center> <span id="many_towers_2" style="position: absolute;"></span>
<div style="height: 230px"></div> </center></p>
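<p>
To make the "many towers" idea concrete, here is a minimal Python sketch
that tiles the unit square with an $n \times n$ grid of towers and gives
each tower a final-layer weight equal to the target function's value at
the centre of its cell.  The target function, the grid layout and the
parameter values <tt>h</tt> and <tt>w</tt> are assumptions made for the
illustration; they are not taken from the figure above.
</p>
<div class="highlight"><pre>
```python
import math

def sigmoid(z):
    # Overflow-safe form of 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + math.tanh(0.5 * z))

def tower(x, y, s1, t1, s2, t2, h=40.0, w=1e4):
    # One sub-network: near-steps at s1, t1, s2, t2, then a neuron
    # with weights +h, -h, +h, -h and bias -3h/2.
    steps = (sigmoid(w * (x - s1)) - sigmoid(w * (x - t1))
             + sigmoid(w * (y - s2)) - sigmoid(w * (y - t2)))
    return sigmoid(h * steps - 1.5 * h)

def target(x, y):
    # Hypothetical function of two variables to approximate.
    return x * y + 0.5

def weighted_output(x, y, n=10):
    """Weighted output from the second hidden layer: a sum of n * n towers."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            s1, t1 = i / n, (i + 1) / n
            s2, t2 = j / n, (j + 1) / n
            # Final-layer weight = desired tower height for this cell.
            weight = target((s1 + t1) / 2, (s2 + t2) / 2)
            total += weight * tower(x, y, s1, t1, s2, t2)
    return total

print(weighted_output(0.33, 0.71), target(0.33, 0.71))
```
</pre></div>
<p>
Increasing <tt>n</tt> makes the towers thinner, so the piecewise-constant
surface follows the target more closely, at the cost of more neurons.
</p>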
<p>
<!--In particular, by making the weighted output from the second hidden
layer a good approximation to $\sigma^{-1} \circ f$, we ensure the
output from our network will be a good approximation to any desired
function, $f$.-->
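In particular, by making the weighted output from the second hidden
layer a good approximation to $\sigma^{-1} \circ f$, we ensure the
output from our network will be a good approximation to any desired
function, $f$.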
</p>
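<p>
To spell out why this works: for the sigmoid $\sigma(z) = 1/(1+e^{-z})$
the inverse is
$$\sigma^{-1}(y) = \ln \frac{y}{1-y},$$
which is defined when $y$ lies strictly between $0$ and $1$, so the
argument applies to functions $f$ taking values in that range.  If the
weighted output is some function $g$ with
$|g(x) - \sigma^{-1}(f(x))| \leq \delta$, then, because the slope of
$\sigma$ never exceeds $1/4$,
$$|\sigma(g(x)) - f(x)| \leq \frac{\delta}{4}.$$
Here $g$ and $\delta$ are symbols introduced only for this remark.
</p>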
<p>
<!--What about functions of more than two variables?-->
What about functions of more than two variables?
</p>
<p>
<!--Let's try three variables $x_1, x_2, x_3$.  The following network can
be used to compute a tower function in four dimensions:-->
Let's try three variables $x_1, x_2, x_3$.  The following network can
be used to compute a tower function in four dimensions:
</p>
<p><canvas id="tower_n_dim" width="300" height="410"></canvas></p>
<p>
<!--Here, the $x_1, x_2, x_3$ denote inputs to the network.  The $s_1,
t_1$ and so on are step points for neurons - that is, all the
weights in the first layer are large, and the biases are set to give
the step points $s_1, t_1, s_2, \ldots$.  The weights in the second
layer alternate $+h, -h$, where $h$ is some very large number.  And
the output bias is $-5h/2$.-->
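Here, the $x_1, x_2, x_3$ denote inputs to the network.  The $s_1,
t_1$ and so on are step points for neurons; that is, all the
weights in the first layer are large, and the biases are set to give
the step points $s_1, t_1, s_2, \ldots$.  The weights in the second
layer alternate $+h, -h$, where $h$ is some very large number.  And
the output bias is $-5h/2$.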
</p>
<p>
<!--This network computes a function which is $1$ provided three
conditions are met: $x_1$ is between $s_1$ and $t_1$; $x_2$ is between
$s_2$ and $t_2$; and $x_3$ is between $s_3$ and $t_3$.  The network is
$0$ everywhere else.  That is, it's a kind of tower which is $1$ in a
little region of input space, and $0$ everywhere else.-->
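This network computes a function which is $1$ provided three
conditions are met: $x_1$ is between $s_1$ and $t_1$; $x_2$ is between
$s_2$ and $t_2$; and $x_3$ is between $s_3$ and $t_3$.  The network is
$0$ everywhere else.  That is, it's a kind of tower which is $1$ in a
little region of input space, and $0$ everywhere else.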
</p>
<p>
<!--By glueing together many such networks we can get as many towers as we
want, and so approximate an arbitrary function of three variables.
Exactly the same idea works in $m$ dimensions.  The only change needed
is to make the output bias $(-m+1/2)h$, in order to get the right kind
of sandwiching behavior to level the plateau.-->
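By gluing together many such networks we can get as many towers as we
want, and so approximate an arbitrary function of three variables.
Exactly the same idea works in $m$ dimensions.  The only change needed
is to make the output bias $(-m+1/2)h$, in order to get the right kind
of sandwiching behaviour to level the plateau.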
</p>
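<p>
The $m$-dimensional recipe can be sketched as follows.  This is a
hypothetical helper written for illustration: the name <tt>tower_m</tt>,
the argument names and the default values of <tt>h</tt> and <tt>w</tt>
are assumptions.  For $m = 3$ the bias $(-m+1/2)h$ reduces to the
$-5h/2$ used above.
</p>
<div class="highlight"><pre>
```python
import math

def sigmoid(z):
    # Overflow-safe form of 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + math.tanh(0.5 * z))

def tower_m(x, lows, highs, h=40.0, w=1e4):
    """Tower over the box lows[i] to highs[i] in each of m = len(x) inputs."""
    m = len(x)
    steps = 0.0
    for xi, s, t in zip(x, lows, highs):
        # Pair of first-layer neurons: step up at s, step down at t.
        steps += sigmoid(w * (xi - s)) - sigmoid(w * (xi - t))
    # Second-layer weights are +h and -h; the bias sits between
    # (m - 1) * h (one condition fails) and m * h (all conditions hold).
    bias = (-m + 0.5) * h
    return sigmoid(h * steps + bias)

lows, highs = (0.4, 0.4, 0.4), (0.6, 0.6, 0.6)
print(tower_m((0.5, 0.5, 0.5), lows, highs))  # all three conditions hold
print(tower_m((0.5, 0.5, 0.9), lows, highs))  # third condition fails
```
</pre></div>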
<p>
<!--Okay, so we now know how to use neural networks to approximate a
real-valued function of many variables.  What about vector-valued
functions $f(x_1, \ldots, x_m) \in R^n$?  Of course, such a function
can be regarded as just $n$ separate real-valued functions, $f^1(x_1,
\ldots, x_m), f^2(x_1, \ldots, x_m)$, and so on.  So we create a
network approximating $f^1$, another network for $f^2$, and so on.
And then we simply glue all the networks together.  So that's also
easy to cope with.-->
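Okay, so we now know how to use neural networks to approximate a
real-valued function of many variables.  What about vector-valued
functions $f(x_1, \ldots, x_m) \in R^n$?  Of course, such a function
can be regarded as just $n$ separate real-valued functions, $f^1(x_1,
\ldots, x_m), f^2(x_1, \ldots, x_m)$, and so on.  So we create a
network approximating $f^1$, another network for $f^2$, and so on.
And then we simply glue all the networks together.  So that's also
easy to cope with.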
</p>
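<p>
In code, "gluing" amounts to evaluating each real-valued network on the
same input and collecting the results.  The sketch below uses two plain
functions as stand-ins for trained networks; <tt>glue</tt>, <tt>f1</tt>
and <tt>f2</tt> are hypothetical names used only here.
</p>
<div class="highlight"><pre>
```python
def glue(networks):
    """Combine n real-valued networks into one vector-valued function."""
    def combined(x):
        # Every component network sees the same input vector x.
        return [net(x) for net in networks]
    return combined

# Stand-ins for networks approximating the components f^1 and f^2.
def f1(x):
    return x[0] + x[1]

def f2(x):
    return x[0] * x[1]

f = glue([f1, f2])
print(f((2.0, 3.0)))
```
</pre></div>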
<p><h4><a name="problem_863961"></a><a href="#problem_863961">
<!--Problem-->
Problem
</a></h4><ul></p>
<p><li>
<!--We've seen how to use networks with two hidden layers to
  approximate an arbitrary function.  Can you find a proof showing
  that it's possible with just a single hidden layer?  As a hint, try
  working in the case of just two input variables, and showing that:
  (a) it's possible to get step functions not just in the $x$ or $y$
  directions, but in an arbitrary direction; (b) by adding up many of
  the constructions from part (a) it's possible to approximate a tower
  function which is circular in shape, rather than rectangular; (c)
  using these circular towers, it's possible to approximate an
  arbitrary function.  To do part (c) it may help to use ideas from a
  <a href="#fixing_up_the_step_functions">bit later in this
    chapter</a>.-->
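We've seen how to use networks with two hidden layers to
approximate an arbitrary function.  Can you find a proof showing
that it's possible with just a single hidden layer?  As a hint, try
working in the case of just two input variables, and showing that:
 (a) it's possible to get step functions not just in the $x$ or $y$
 directions, but in an arbitrary direction;
 (b) by adding up many of the constructions from part (a) it's possible
 to approximate a tower function which is circular in shape, rather than rectangular;
 (c) using these circular towers, it's possible to approximate an arbitrary function.
 To do part (c) it may help to use ideas from a
 <a href="#fixing_up_the_step_functions">bit later in this chapter</a>.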
</p>
<p></ul></p>
<p><h3><a name="extension_beyond_sigmoid_neurons"></a><a href="#extension_beyond_sigmoid_neurons">Extension beyond sigmoid neurons</a></h3></p>
<p>
<!--We've proved that networks made up of sigmoid neurons can compute any
function.  Recall that in a sigmoid neuron the inputs $x_1, x_2,
\ldots$ result in the output $\sigma(\sum_j w_j x_j + b)$, where $w_j$
are the weights, $b$ is the bias, and $\sigma$ is the sigmoid
function:-->
We've proved that networks made up of sigmoid neurons can compute any
function.  Recall that in a sigmoid neuron the inputs $x_1, x_2,
\ldots$ result in the output $\sigma(\sum_j w_j x_j + b)$, where $w_j$
are the weights, $b$ is the bias, and $\sigma$ is the sigmoid
function:
</p>
<p><center> <canvas id="sigmoid" width="500" height="200"></canvas>
</center></p>
<p>
<!--What if we consider a different type of neuron, one using some other
activation function, $s(z)$:-->
What if we consider a different type of neuron, one using some other
activation function, $s(z)$:
</p>
<p><center> <canvas id="sigmoid_like" width="500" height="200"></canvas>
</center></p>
<p>
<!--That is, we'll assume that if our neurons has inputs $x_1, x_2,
\ldots$, weights $w_1, w_2, \ldots$ and bias $b$, then the output is
$s(\sum_j w_j x_j + b)$.-->
That is, we'll assume that if our neuron has inputs $x_1, x_2,
\ldots$, weights $w_1, w_2, \ldots$ and bias $b$, then the output is
$s(\sum_j w_j x_j + b)$.
</p>
<p>
<!--We can use this activation function to get a step function, just as we
did with the sigmoid.  Try ramping up the weight in the following, say
to $w = 100$:-->
We can use this activation function to get a step function, just as we
did with the sigmoid.  Try ramping up the weight in the following, say
to $w = 100$:
</p>
<p><canvas id="ramping" width="600" height="200"></canvas></p>
<p>
<!--Just as with the sigmoid, this causes the activation function to
contract, and ultimately it becomes a very good approximation to a
step function.  Try changing the bias, and you'll see that we can set
the position of the step to be wherever we choose.  And so we can use
all the same tricks as before to compute any desired function.-->
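Just as with the sigmoid, this causes the activation function to
contract, and ultimately it becomes a very good approximation to a
step function.  Try changing the bias, and you'll see that we can set
the position of the step to be wherever we choose.  And so we can use
all the same tricks as before to compute any desired function.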
</p>
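<p>
Here is a small Python sketch of the same experiment.  It uses
$\tanh$ as a stand-in for $s(z)$, which is an assumption made for the
example; the activation plotted above may be a different function.  The
step is placed at $x = -b/w$, so choosing the bias moves the step.
</p>
<div class="highlight"><pre>
```python
import math

def s(z):
    # Hypothetical activation: tanh, whose limits at minus and plus
    # infinity are -1 and +1.
    return math.tanh(z)

def neuron(x, w, b):
    # Single-input neuron with activation s.
    return s(w * x + b)

step_at = 0.3        # where we want the step to sit
w = 100.0            # large weight, as suggested in the text
b = -w * step_at     # the step position is -b / w

for x in (0.0, 0.25, 0.29, 0.31, 0.35, 1.0):
    print(x, round(neuron(x, w, b), 4))
```
</pre></div>
<p>
The printed values sit near $-1$ to the left of $0.3$ and near $+1$ to
the right, with a narrow transition in between; a larger <tt>w</tt>
narrows the transition further.
</p>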
<p>
<!--What properties does $s(z)$ need to satisfy in order for this to work?
We do need to assume that $s(z)$ is well-defined as $z \rightarrow
-\infty$ and $z \rightarrow \infty$.  These two limits are the two
values taken on by our step function.  We also need to assume that
these limits are different from one another.  If they weren't, there'd
be no step, simply a flat graph!  But provided the activation function
$s(z)$ satisfies these properties, neurons based on such an activation
function are universal for computation.-->
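What properties does $s(z)$ need to satisfy in order for this to work?
We do need to assume that $s(z)$ is well-defined as $z \rightarrow
-\infty$ and $z \rightarrow \infty$.  These two limits are the two
values taken on by our step function.  We also need to assume that
these limits are different from one another.  If they weren't, there'd
be no step, simply a flat graph!  But provided the activation function
$s(z)$ satisfies these properties, neurons based on such an activation
function are universal for computation.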
</p>
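<p>
Stated as a formula, and writing $s_{-}$ and $s_{+}$ for the two limits
(notation introduced only for this remark), a single-input neuron with
weight $w \gt 0$ and bias $b = -w x_0$ satisfies
$$\lim_{w \rightarrow \infty} s\bigl(w (x - x_0)\bigr) =
\begin{cases}
s_{+} & \mbox{if } x \gt x_0, \\
s_{-} & \mbox{if } x \lt x_0.
\end{cases}$$
When $s_{+} \neq s_{-}$, the rescaled output
$(s - s_{-})/(s_{+} - s_{-})$ is a step from $0$ to $1$ at $x_0$, and
the rescaling can be absorbed into the weights and bias of the next
layer.  When $s_{+} = s_{-}$ the limit is the same on both sides and no
step appears.
</p>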
<p><h4><a name="problems_963556"></a><a href="#problems_963556">
<!--Problems-->
Problems
</a></h4><ul>
<li>
<!--Earlier in the book we met another type of neuron known as a <a
  href="chap3.html#other_models_of_artificial_neuron">rectified linear
  unit</a>.  Explain why such neurons don't satisfy the conditions
  just given for universality.  Find a proof of universality showing
  that rectified linear units are universal for computation.-->
Earlier in the book we met another type of neuron known as a
<a href="chap3.html#other_models_of_artificial_neuron">rectified linear unit</a>.
Explain why such neurons don't satisfy the conditions just given for universality.
Nonetheless, networks of rectified linear units are universal for computation.
Find a proof of this.
</p>
<p><li>
<!--Suppose we consider linear neurons, i.e., neurons with the
  activation function $s(z) = z$.  Explain why linear neurons don't
  satisfy the conditions just given for universality.  Show that such
  neurons can't be used to do universal computation.-->
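Suppose we consider linear neurons, i.e., neurons with the
activation function $s(z) = z$.  Explain why linear neurons don't
satisfy the conditions just given for universality.  Show that such
neurons can't be used to do universal computation.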
</ul></p>
<p><h3><a name="fixing_up_the_step_functions"></a><a href="#fixing_up_the_step_functions">
<!--Fixing up the step functions-->
Fixing up the step functions
</a></h3></p>
<p>
<!--Up to now, we've been assuming that our neurons can produce step
functions exactly.  That's a pretty good approximation, but it is only
an approximation.  In fact, there will be a narrow window of failure,
illustrated in the following graph, in which the function behaves very
differently from a step function:-->
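Up to now, we've been assuming that our neurons can produce step
functions exactly.  That's a pretty good approximation, but it is only
an approximation.  In fact, there will be a narrow window of failure,
illustrated in the following graph, in which the function behaves very
differently from a step function: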
</p>
<p><canvas id="failure" width="220" height="200"></canvas></p>
<p>
<!--In these windows of failure the explanation I've given for
universality will fail.-->
In these windows of failure the explanation I've given for
universality will fail.
</p>
<p>
<!--Now, it's not a terrible failure.  By making the weights input to the
neurons big enough we can make these windows of failure as small as we
like.  Certainly, we can make the window much narrower than I've shown
above - narrower, indeed, than our eye could see.  So perhaps we
might not worry too much about this problem.-->
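Now, it's not a terrible failure.  By making the weights input to the
neurons big enough we can make these windows of failure as small as we
like.  Certainly, we can make the window much narrower than I've shown
above; narrower, indeed, than our eye could see.  So perhaps we
might not worry too much about this problem.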
</p>
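<p>
To put a number on "as small as we like", here is a short sketch.  It
assumes we call a point part of the window of failure when the sigmoid
output $\sigma(w(x - s))$ differs from the ideal $0$-or-$1$ step by more
than a tolerance <tt>eps</tt>; that definition and the name
<tt>window_width</tt> are assumptions made for this example.  Solving
$\sigma(z) = 1 - \epsilon$ gives $z = \ln((1-\epsilon)/\epsilon)$, so
the window has width $2 \ln((1-\epsilon)/\epsilon) / w$.
</p>
<div class="highlight"><pre>
```python
import math

def window_width(w, eps=0.01):
    """Width of the region where sigmoid(w * (x - s)) is more than eps
    away from an ideal 0-or-1 step located at s."""
    return 2.0 * math.log((1.0 - eps) / eps) / w

for w in (10.0, 100.0, 1000.0):
    print(w, window_width(w))
```
</pre></div>
<p>
The width is inversely proportional to the weight: multiplying $w$ by
ten shrinks the window by a factor of ten.
</p>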
<p>
<!--Nonetheless, it'd be nice to have some way of addressing the problem.-->
Nonetheless, it'd be nice to have some way of addressing the problem.
</p>
<p>
<!--In fact, the problem turns out to be easy to fix.  Let's look at the
fix for neural networks computing functions with just one input and
one input.  The same ideas work also to address the problem when there
are more inputs and outputs.-->
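In fact, the problem turns out to be easy to fix.  Let's look at the
fix for neural networks computing functions with just one input and
one output.  The same ideas work also to address the problem when there
are more inputs and outputs.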
</p>
<p>
<!--In particular, suppose we want our network to compute some function,
$f$.  As before, we do this by trying to design our network so that
the weighted output from our hidden layer of neurons is $\sigma^{-1}
\circ f(x)$:-->
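In particular, suppose we want our network to compute some function,
$f$.  As before, we do this by trying to design our network so that
the weighted output from our hidden layer of neurons is $\sigma^{-1}
\circ f(x)$: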
</p>
<p><center> <canvas id="inverted_function_2" width="340"
height="300"></canvas> </center></p>
<p>
<!--If we were to do this using the technique described earlier, we'd use
the hidden neurons to produce a sequence of bump functions:-->
If we were to do this using the technique described earlier, we'd use
the hidden neurons to produce a sequence of bump functions:
</p>
<p><center> <canvas id="series_of_bumps" width="340"
height="300"></canvas> </center></p>
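<p>
A sequence of bumps like this can be written down directly.  The sketch
below is a minimal illustration under several assumptions: the target
function <tt>f</tt>, the number of bumps, the weight <tt>w</tt>, and a
zero bias on the output neuron are all choices made for the example.
Each bump comes from a pair of hidden neurons whose output weights are
plus and minus the bump height, and the height is $\sigma^{-1} \circ f$
evaluated at the centre of the bump.
</p>
<div class="highlight"><pre>
```python
import math

def sigmoid(z):
    # Overflow-safe form of 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + math.tanh(0.5 * z))

def sigmoid_inverse(y):
    # Only defined for y strictly between 0 and 1.
    return math.log(y / (1.0 - y))

def f(x):
    # Hypothetical target function with values between 0 and 1.
    return 0.2 + 0.4 * x * x

def network(x, n_bumps=20, w=1e4):
    """One input, one hidden layer of 2 * n_bumps neurons, one output."""
    weighted = 0.0
    for j in range(n_bumps):
        left, right = j / n_bumps, (j + 1) / n_bumps
        height = sigmoid_inverse(f((left + right) / 2))
        # Step up at the left edge, step down at the right edge.
        weighted += height * (sigmoid(w * (x - left)) - sigmoid(w * (x - right)))
    # Output neuron applies the sigmoid (bias assumed to be zero).
    return sigmoid(weighted)

print(network(0.52), f(0.52))
```
</pre></div>
<p>
Away from the bump edges the two printed values agree closely; the
remaining difference comes from using a constant height across each
bump.
</p>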
<p>
<!--Again, I've exaggerated the size of the windows of failure, in order
to make them easier to see.  It should be pretty clear that if we add
all these bump functions up we'll end up with a reasonable
approximation to $\sigma^{-1} \circ f(x)$, except within the windows
of failure.-->
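Again, I've exaggerated the size of the windows of failure, in order
to make them easier to see.  It should be pretty clear that if we add
all these bump functions up we'll end up with a reasonable
approximation to $\sigma^{-1} \circ f(x)$, except within the windows
of failure.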
</p>
<p>
<!--Suppose that instead of using the approximation just described, we use
a set of hidden neurons to compute an approximation to <em>half</em> our
original goal function, i.e., to $\sigma^{-1} \circ f(x) / 2$.  Of
course, this looks just like a scaled down version of the last graph:-->
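Suppose that instead of using the approximation just described, we use
a set of hidden neurons to compute an approximation to <em>half</em> our
original goal function, i.e., to $\sigma^{-1} \circ f(x) / 2$.  Of
course, this looks just like a scaled down version of the last graph: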
</p>
<p><center> <canvas id="half_bumps" width="340" height="300"></canvas>
</center></p>
<p>
<!--And suppose we use another set of hidden neurons to compute an
approximation to $\sigma^{-1} \circ f(x)/ 2$, but with the bases of
the bumps <em>shifted</em> by half the width of a bump:-->
And suppose we use another set of hidden neurons to compute an
approximation to $\sigma^{-1} \circ f(x)/ 2$, but with the bases of
the bumps <em>shifted</em> by half the width of a bump:
</p>
<p><center> <canvas id="shifted_bumps" width="340" height="300"></canvas>
</center></p>
<p>
<!--Now we have two different approximations to $\sigma^{-1} \circ f(x) /
2$.  If we add up the two approximations we'll get an overall
approximation to $\sigma^{-1} \circ f(x)$.  That overall approximation
will still have failures in small windows.  But the problem will be
much less that before.  The reason is that points in a failure window
for one approximation won't be in a failure window for the other.  And
so the approximation will be a factor roughly $2$ better in those
windows.-->
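Now we have two different approximations to $\sigma^{-1} \circ f(x) /
2$.  If we add up the two approximations we'll get an overall
approximation to $\sigma^{-1} \circ f(x)$.  That overall approximation
will still have failures in small windows.  But the problem will be
much less than before.  The reason is that points in a failure window
for one approximation won't be in a failure window for the other.  And
so the approximation will be a factor roughly $2$ better in those
windows.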
</p>
<p>
<!--We could do even better by adding up a large number, $M$, of
overlapping approximations to the function $\sigma^{-1} \circ f(x) /
M$. Provided the windows of failure are narrow enough, a point will
only ever be in one window of failure.  And provided we're using a
large enough number $M$ of overlapping approximations, the result will
be an excellent overall approximation.-->
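We could do even better by adding up a large number, $M$, of
overlapping approximations to the function $\sigma^{-1} \circ f(x) /
M$.  Provided the windows of failure are narrow enough, a point will
only ever be in one window of failure.  And provided we're using a
large enough number $M$ of overlapping approximations, the result will
be an excellent overall approximation.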
</p>
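<p>
One way to make this argument quantitative is the following sketch.  The
symbols $g$, $g_k$, $\epsilon$ and $\Delta$ are introduced here and do
not appear elsewhere in the chapter.  Write $g = \sigma^{-1} \circ f$,
and let $g_1, \ldots, g_M$ be the $M$ shifted approximations to $g/M$.
Suppose each satisfies $|g_k(x) - g(x)/M| \leq \epsilon/M$ outside its
windows of failure and $|g_k(x) - g(x)/M| \leq \Delta/M$ inside them.
If the point $x$ lies in a window of failure of at most one of the
approximations, then
$$\Bigl| \sum_{k=1}^{M} g_k(x) - g(x) \Bigr|
\leq \frac{(M-1)\,\epsilon}{M} + \frac{\Delta}{M}
\leq \epsilon + \frac{\Delta}{M}.$$
For $M = 2$ and small $\epsilon$ the error in a window is about
$\Delta/2$, which is the factor of roughly $2$ mentioned above, and the
contribution from the windows shrinks like $1/M$ as $M$ grows.
</p>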
<p><h3><a name="conclusion"></a><a href="#conclusion">
<!--Conclusion-->
Conclusion
</a></h3></p>
<p>
<!--The explanation for universality we've discussed is certainly not a
practical prescription for how to compute using neural networks!  In
this, it's much like proofs of universality for <tt>NAND</tt> gates and
the like.  For this reason, I've focused mostly on trying to make the
construction clear and easy to follow, and not on optimizing the
details of the construction.  However, you may find it a fun and
instructive exercise to see if you can improve the construction.-->
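The explanation for universality we've discussed is certainly not a
practical prescription for how to compute using neural networks!  In
this, it's much like proofs of universality for <tt>NAND</tt> gates and
the like.  For this reason, I've focused mostly on trying to make the
construction clear and easy to follow, and not on optimizing the
details of the construction.  However, you may find it a fun and
instructive exercise to see if you can improve the construction.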
</p>
<p>
<!--Although the result isn't directly useful in constructing networks,
it's important because it takes off the table the question of whether
any particular function is computable using a neural network.  The
answer to that question is always "yes".  So the right question to
ask is not whether any particular function is computable, but rather
what's a <em>good</em> way to compute the function.-->
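Although the result isn't directly useful in constructing networks,
it's important because it takes off the table the question of whether
any particular function is computable using a neural network.  The
answer to that question is always "yes".  So the right question to
ask is not whether any particular function is computable, but rather
what's a <em>good</em> way to compute the function.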
</p>
<p>
<!--The universality construction we've developed uses just two hidden
layers to compute an arbitrary function.  Furthermore, as we've
discussed, it's possible to get the same result with just a single
hidden layer.  Given this, you might wonder why we would ever be
interested in deep networks, i.e., networks with many hidden layers.
Can't we simply replace those networks with shallow, single hidden
layer networks?-->
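The universality construction we've developed uses just two hidden
layers to compute an arbitrary function.  Furthermore, as we've
discussed, it's possible to get the same result with just a single
hidden layer.  Given this, you might wonder why we would ever be
interested in deep networks, i.e., networks with many hidden layers.
Can't we simply replace those networks with shallow, single hidden
layer networks?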
</p>
<p><span class="marginnote"><strong>
<!--Chapter acknowledgements:-->
Chapter acknowledgements:
</strong>
<!--Thanks to <a href ="http://jendodd.com">Jen Dodd</a> and <a
  href="http://colah.github.io/about.html">Chris Olah</a> for many
  discussions about universality in neural networks.  My thanks, in
  particular, to Chris for suggesting the use of a lookup table to
  prove universality.  The interactive visual form of the chapter is
  inspired by the work of people such as <a
  href="http://bost.ocks.org/mike/algorithms/">Mike Bostock</a>, <a
  href="http://www-cs-students.stanford.edu/&#126;amitp/">Amit
  Patel</a>, <a href="http://worrydream.com">Bret Victor</a>, and <a
  href="http://acko.net/">Steven Wittens</a>. -->
Thanks to <a href ="http://jendodd.com">Jen Dodd</a> and <a href="http://colah.github.io/about.html">Chris Olah</a> for many
discussions about universality in neural networks.  My thanks, in
particular, to Chris for suggesting the use of a lookup table to
prove universality.  The interactive visual form of the chapter is
inspired by the work of people such as
<a href="http://bost.ocks.org/mike/algorithms/">Mike Bostock</a>,
<a href="http://www-cs-students.stanford.edu/&#126;amitp/">Amit Patel</a>,
<a href="http://worrydream.com">Bret Victor</a>, and
<a href="http://acko.net/">Steven Wittens</a>.
</span>
</p>
<p>
<!--While in principle that's possible, there are good practical reasons to use deep networks.
As argued in
<a href="chap1.html#toward_deep_learning">Chapter 1</a>, deep networks
have a hierarchical structure which makes them particularly well
adapted to learn the hierarchies of knowledge that seem to be useful
in solving real-world problems.  Put more concretely, when attacking
problems such as image recognition, it helps to use a system that
understands not just individual pixels, but also increasingly more
complex concepts: from edges to simple geometric shapes, all the way
up through complex, multi-object scenes. In later chapters, we'll see
evidence suggesting that deep networks do a better job than shallow
networks at learning such hierarchies of knowledge.  To sum up:
universality tells us that neural networks can compute any function;
and empirical evidence suggests that deep networks are the networks
best adapted to learn the functions useful in solving many real-world
problems.-->
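While in principle that's possible, there are good practical reasons
to use deep networks.  As argued in
<a href="chap1.html#toward_deep_learning">Chapter 1</a>, deep networks
have a hierarchical structure which makes them particularly well
adapted to learn the hierarchies of knowledge that seem to be useful
in solving real-world problems.  Put more concretely, when attacking
problems such as image recognition, it helps to use a system that
understands not just individual pixels, but also increasingly more
complex concepts: from edges to simple geometric shapes, all the way
up through complex, multi-object scenes.  In later chapters, we'll see
evidence suggesting that deep networks do a better job than shallow
networks at learning such hierarchies of knowledge.  To sum up:
universality tells us that neural networks can compute any function;
and empirical evidence suggests that deep networks are the networks
best adapted to learn the functions useful in solving many real-world
problems.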
</p>
<p><!-- Seems to be necessary to ensure the font loads --> <span style="font-family: MJX_Math; color: #fff;">.</span> <span style="font-family: MJX_Main; color: #fff;">.</span></p>
<p><script src="js/chap4.js"></script>
<script src="js/canvas.js"></script>
<script src="js/neuron.js"></script>
<script src="js/scrubbable.js"></script>
<script src="js/button.js"></script></p>
<p>
</div><div class="footer"> <span class="left_footer"> In academic work,
please cite this book as: Michael A. Nielsen, "Neural Networks and
Deep Learning", Determination Press, 2014

<br/>
<br/>

This work is licensed under a <a rel="license"
href="http://creativecommons.org/licenses/by-nc/3.0/deed.en_GB"
style="color: #eee;">Creative Commons Attribution-NonCommercial 3.0
Unported License</a>.  This means you're free to copy, share, and
build on this book, but not to sell it.  If you're interested in
commercial use, please <a
href="mailto:mn@michaelnielsen.org">contact me</a>.
</span>
<span class="right_footer">
Last update: Tue Sep  2 09:19:44 2014
<br/>
<br/>
<br/>
<a rel="license" href="http://creativecommons.org/licenses/by-nc/3.0/deed.en_GB"><img alt="Creative Commons Licence" style="border-width:0" src="http://i.creativecommons.org/l/by-nc/3.0/88x31.png" /></a>
</span>
</div>
<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-44208967-1', 'neuralnetworksanddeeplearning.com');
  ga('send', 'pageview');

</script>
</body>
</html>