Atcold · hiragaatsuya · May 17, 2021 · May 17, 2021 · Jun 1, 2021 · Jul 4, 2021
diff --git a/docs/ja/week02/02-1.md b/docs/ja/week02/02-1.md
@@ -31,7 +31,7 @@ The parametrised model (function) takes in an input, has a parameter vector and
 
 <!-- | <center><img src="{{site.baseurl}}/images/week02/02-1/Figure1.jpg" alt="Figure1" style="zoom: 33%;" /></center> |
 | <center>Figure 1: Computation Graph representation for a Parametrised Model </center>| -->
-<center><img src="{{site.baseurl}}/images/week02/02-1/Figure1.jpg" alt="Figure1" style="zoom: 33%;" /></center> |
+| <center><img src="{{site.baseurl}}/images/week02/02-1/Figure1.jpg" alt="Figure1" style="zoom: 33%;" /></center> |
 | <center>図1: パラメトリックモデルを表現した計算グラフ </center>|
 
 <!-- Examples of parametrised functions -
@@ -222,7 +222,7 @@ If we do this on a single sample, we will get a very noisy trajectory as shown i
 <!-- 
 | <center><img src="{{site.baseurl}}/images/week02/02-1/Figure2.png" alt="Figure2" style="zoom:80%;" /></center> |
 | <center>Figure 3: Stochastic Gradient Descent trajectory for per sample update </center>| -->
- <center><img src="{{site.baseurl}}/images/week02/02-1/Figure2.png" alt="Figure2" style="zoom:80%;" /></center> |
+| <center><img src="{{site.baseurl}}/images/week02/02-1/Figure2.png" alt="Figure2" style="zoom:80%;" /></center> |
 | <center>図3: サンプルごとの更新の確率的勾配降下法の軌跡 </center>|
 
 <!-- In practice, we use batches instead of doing stochastic gradient descent on a single sample. We compute the average of the gradient over a batch of samples, not a single sample, and then take one step. The only reason for doing this is that we can make more efficient use of the existing hardware  (i.e. GPUs, multicore CPUs) if we use batches since it's easier to parallelize. Batching is the simplest way to parallelize. -->
@@ -334,7 +334,7 @@ $$
 
 <!-- | <center><img src="{{site.baseurl}}/images/week02/02-1/Figure6.png" alt="Figure6" style="zoom: 25%;" /></center> |
 |        <center>Figure 7: Backpropagation through weighted sum        </center>| -->
- <center><img src="{{site.baseurl}}/images/week02/02-1/Figure6.png" alt="Figure6" style="zoom: 25%;" /></center> |
+| <center><img src="{{site.baseurl}}/images/week02/02-1/Figure6.png" alt="Figure6" style="zoom: 25%;" /></center> |
 |        <center>図7: 重み付き和を通った誤差逆伝播       </center>|
 
 
@@ -431,7 +431,7 @@ out = model(image)
 
 <!-- | <center><img src="{{site.baseurl}}/images/week02/02-1/Figure9.png" alt="Figure9" style="zoom:33%;" /></center> |
 |    <center>Figure 8: Backpropagation through a functional module     </center>| -->
- <center><img src="{{site.baseurl}}/images/week02/02-1/Figure9.png" alt="Figure9" style="zoom:33%;" /></center> |
+| <center><img src="{{site.baseurl}}/images/week02/02-1/Figure9.png" alt="Figure9" style="zoom:33%;" /></center> |
 |    <center>図8: 関数モジュールを通した誤差逆伝播     </center>|
 
 

diff --git a/docs/ja/week02/02-2.md b/docs/ja/week02/02-2.md
@@ -20,7 +20,7 @@ We next consider a concrete example of backpropagation assisted by a visual grap
 
 ### 例
 
-次に、グラフを用いて誤差逆伝播の具体例を考えます。任意の関数 $G(w)$ をコスト関数 $C$ に入力すると、グラフとして表現できます。ヤコビ行列を乗算する操作によって、このグラフを勾配の逆伝播を計算するグラフに変換することができます(PyTorch と TensorFlow は、ユーザーのために自動的にこれを行うことに注意してください。つまり、順伝播のグラフを自動的に「反転」させて、勾配を逆伝播する微分グラフを作成します)。
+次に、グラフを用いて誤差逆伝播の具体例を考えます。任意の関数 $G(w)$ をコスト関数 $C$ に入力することを、グラフとして表現できます。ヤコビ行列を乗算する操作によって、このグラフを勾配の逆伝播を計算するグラフに変換することができます(PyTorch と TensorFlow は、ユーザーのために自動的にこれを行うことに注意してください。つまり、順伝播のグラフを自動的に「反転」させて、勾配を逆伝播する微分グラフを作成します)。
 
 <center><img src="{{site.baseurl}}/images/week02/02-2/02-2-1.png" alt="Gradient diagram" style="zoom:40%;" /></center>
 

diff --git a/docs/ja/week02/02-3.md b/docs/ja/week02/02-3.md
@@ -249,7 +249,7 @@ $$
 
 <!-- What might an example configuration for the case above look like? In this case, one has input of dimension two ($n=2$), the single hidden layer could have dimensionality of 1000 ($d = 1000$), and we have 3 classes ($C=3$). There are good practical reasons to not have so many neurons in one hidden layer, so it could make sense to split that single hidden layer into 3 with 10 neurons each ($1000 \rightarrow 10 \times 10 \times 10$). -->
 
-上記の場合の設定例は、どのように見えるでしょうか？この場合、2次元の入力があり($n=2$)、1つの隠れ層の次元数は1000($d = 1000$)で、3つのクラスがあります($C=3$)。1つの隠れ層にそれほど多くのニューロンを入れたくないという実用的な理由があるので、その1つの隠れ層を10個ずつのニューロンで3つに分割するのは理にかなっているかもしれません ($1000 \rightarrow 10 \times 10 \times 10$)。
+上記の場合の設定例は、どのように見えるでしょうか？この場合、2次元の入力があり($n=2$)、1つの隠れ層の次元数は1000($d = 1000$)で、3つのクラスがあります($K=3$)。1つの隠れ層にそれほど多くのニューロンを入れたくないという実用的な理由があるので、その1つの隠れ層を10個ずつのニューロンで3つに分割するのは理にかなっているかもしれません ($1000 \rightarrow 10 \times 10 \times 10$)。
 
 
 <!-- ## [Neural network (training I)](https://www.youtube.com/watch?v=WAn6lip5oWk&t=822s) -->
@@ -309,7 +309,7 @@ $$
 
 For the case of *nearly perfect prediction* ($\sim$ means *circa*): -->
 
-インスタンスごとの損失はどうなるでしょうka
+インスタンスごとの損失はどうなるでしょうか
 ？
 
 ほぼ*完璧な予測*の場合には、（$\sim$ は *おおよそ* の意味）
@@ -417,7 +417,7 @@ As it can be seen in **Fig. 9**, when trying to separate the spiral data with li
 
 アフィン変換とは、回転、反射、平行移動、スケーリング、剪断の5つであることを覚えておいてください。
 
-**図9**にあるように、線形決定境界で螺旋データを分離しようとした場合、つまり`nn.linear()` モジュールのみを使用しそれらの間に非線形性を持たせな買った場合、最高の分類精度は50%です。
+**図9**にあるように、線形決定境界で螺旋データを分離しようとした場合、つまり`nn.linear()` モジュールのみを使用しそれらの間に非線形性を持たせなかった場合、最高の分類精度は50%です。
 
 <!-- <center>
 <img src="{{site.baseurl}}/images/week02/02-3/3-linear.png" style="zoom: 60%; background-color:#DCDCDC;" /><br>

diff --git a/docs/ja/week02/02.md b/docs/ja/week02/02.md
@@ -26,7 +26,7 @@ We give a brief introduction to supervised learning using artificial neural netw
 パラメトリックモデルとは何かを理解することから始め、損失関数とは何かを議論します。次に、伝統的なニューラルネットワークにおいて、勾配に基づく方法が誤差逆伝播アルゴリズムでどのように使用されているかを見ていきます。最後に、PyTorchでニューラルネットワークを実装する方法を学び、誤差逆伝播法のより一般的な形について議論して、このセクションを締めくくります。
 
 
-##　レクチャーパートB
+## レクチャーパートB
 
 誤差逆伝播法の具体例から始め、ヤコビ行列の次元について議論します。次に、様々な基本的なニューラルネットモジュールを見て、その勾配を計算し、softmaxとlogsoftmaxについて簡単に議論します。このパートのもう一つのトピックは、誤差逆伝播法のための実践的なコツです。
 

diff --git a/docs/ja/week03/03-1.md b/docs/ja/week03/03-1.md
@@ -61,7 +61,7 @@ This provides us with some insight into why the 2-neuron hidden layers are harde
 この結果は、2つのニューロンからなる隠れ層が訓練しにくい理由についての、いくつかの洞察を与えてくれます。この6層ネットワークは、各隠れ層ごとに1つのバイアスを持っています。したがって、これらのバイアスのうちの1つが右上の象限から点を移動させた場合、その点の値はReLU演算子によってゼロになります。値が一度ゼロになると、後の層がどのようにデータを変換しても、値はゼロのままです。ニューラルネットワークを「太く」する - 具体的には隠れ層により多くのニューロンを追加する - あるいはより多くの隠れ層を追加するか、またはその両方を行うことによって - 訓練しやすくすることができます。このコースでは、与えられた問題に対して最適なネットワークアーキテクチャを決定する方法を探っていきます。
 
 <!-- ## [Parameter transformations](https://www.youtube.com/watch?v=FW5gFiJb-ig&t=477s) -->
-## [パラーメータ変換](https://www.youtube.com/watch?v=FW5gFiJb-ig&t=477s)
+## [パラメータ変換](https://www.youtube.com/watch?v=FW5gFiJb-ig&t=477s)
 
 <!-- General parameter transformation means that our parameter vector $w$ is the output of a function. By this transformation, we can map original parameter space into another space. In Figure 5, $w$ is actually the output of $H$ with the parameter $u$. $G(x,w)$ is a network and $C(y,\bar y)$ is a cost function. The backpropagation formula is also adapted as follows, -->
 
@@ -77,7 +77,7 @@ $$
 
 <!-- These formulas are applied in a matrix form. Note that the dimensions of the terms should be consistent. The dimension of $u$,$w$,$\frac{\partial H}{\partial u}^\top$,$\frac{\partial C}{\partial w}^\top$ are $[N_u \times 1]$,$[N_w \times 1]$,$[N_u \times N_w]$,$[N_w \times 1]$, respectively. Therefore, the dimension of our backpropagation formula is consistent. -->
 
-これらの式は行列演算として適用されます。なお、各項の次元は一致している必要があります。$u$,$w$,$\frac{\partial H}{\partial u}^\top$,$\frac{\partial C}{\partial w}^\top$の次元は、それぞれ、$[N_u \times 1]$,$[N_w \times 1]$,$[N_u \times N_w]$,$[N_w \times 1]$,$[N_w \times 1]$となります。したがって、バックプロパゲーションの次元は一致していることになります。
+これらの式は行列演算として適用されます。なお、各項の次元は一致している必要があります。$u$,$w$,$\frac{\partial H}{\partial u}^\top$,$\frac{\partial C}{\partial w}^\top$の次元は、それぞれ、$[N_u \times 1]$,$[N_w \times 1]$,$[N_u \times N_w]$,$[N_w \times 1]$となります。したがって、バックプロパゲーションの次元は一致していることになります。
 
 
 <!-- <center><img src="{{site.baseurl}}/images/week03/03-1/PT.png" alt="Network" style="zoom:35%;" /><br>
@@ -98,7 +98,7 @@ $$
 
 <!-- We force shared parameters to be equal, so the gradient w.r.t. to shared parameters will be summed in the backprop. For example the gradient of the cost function $C(y, \bar y)$ with respect to $u_1$ will be the sum of the gradient of the cost function $C(y, \bar y)$ with respect to $w_1$ and the gradient of the cost function $C(y, \bar y)$ with respect to $w_2$. -->
 
-共有パラメーターが等しくなるように強制するため、共有されたパラメータの勾配はバックプロパゲーション時に足し合わされます。たとえば、$u_1$に関するコスト関数$C(y, \bar y)$の勾配は、$w_1$に関するコスト関数$C(y, \bar y)$の勾配と、$w_2$に関するコスト関数$C(y, \bar y)$の勾配との合計となります。
+共有パラメータが等しくなるように強制するため、共有されたパラメータの勾配はバックプロパゲーション時に足し合わされます。たとえば、$u_1$に関するコスト関数$C(y, \bar y)$の勾配は、$w_1$に関するコスト関数$C(y, \bar y)$の勾配と、$w_2$に関するコスト関数$C(y, \bar y)$の勾配との合計となります。
 
 
 <!-- ### Hypernetwork -->
@@ -196,7 +196,7 @@ $$y_i = \sum_j w_j x_{i+j}$$
 
 画像のような2次元の入力に対しては，2次元での畳み込みを利用します。
 
-$$y_{ij} = \sum_{kl} w_{kl} x_{i+k, j+l}$$
+$$y_{i, j} = \sum_{k, l} w_{k, l} x_{i+k, j+l}$$
 
 <!-- This definition can easily be extended beyond two dimensions to three or four dimensions. Here $w$ is called the *convolution kernel* -->
 

diff --git a/docs/ja/week03/03-2.md b/docs/ja/week03/03-2.md
@@ -60,7 +60,7 @@ The next year, some changes were made: separate pooling was introduced. Separate
 ベル研究所に移った後、LeCunnの研究は、より大きなCNNを訓練するために、米国郵政公社の手書きの郵便番号を使用することにシフトしました。
 
 * サイズ256 (16$\times$16)の入力層。
-* サイズ12 5$\times$5のカーネル。ただし、ストライドが2 (2ピクセルで止まる)、すなわち次のレイヤーは解像度が低減。
+* 5$\times$5のカーネルを12カーネル。ただし、ストライドが2 (2ピクセルで止まる)、すなわち次のレイヤーは解像度が低減。
 * セパレートプーリング**なし**。
 
 
@@ -314,7 +314,7 @@ Visual neural scientists and computer vision people have the problem of defining
 
 ### 特徴結合問題とは？
 
-視覚神経科学者やコンピュータビジョンの研究者たちは、オブジェクトをオブジェクトとして定義する問題を抱えています。オブジェクトは特徴の集合体です。このオブジェクトを形成するために、すべての特徴を結合するにはどうすればよいのでしょうか？
+視覚神経科学者やコンピュータビジョンの研究者たちは、オブジェクトをオブジェクトとして定義する問題を抱えています。オブジェクトは特徴の集合体です。このオブジェクトを形成するために、すべての特徴を結合するにはどうすればよいのでしょうか？
 
 
 <!-- ### How to solve it?
@@ -354,8 +354,8 @@ We can build a CNN with 2 convolution layers with stride 1 and 2 pooling layers
     <b>図5</b> 様々な入力サイズのバインディングについてのConvNetアーキテクチャ
 </center>
 
-<!-- Let’s assume we add 4 units at the input layer (pink units above), so that we can get 4 more units after the first convolution layer, 2 more units after the first pooling layer, 2 more units after the second convolution layer, and 1 more output. Therefore, window size to generate a new output is 4 (2 stride $\times$2)<!--the overall subsampling we have shown from input to output is 4 (2x2)-->
-<!-- . Moreover, this is a demonstration of the fact that if we increase the size of the input, we will increase the size of every layer, proving CNNs' capability in handling dynamic length inputs. --> -->
+<!-- Let’s assume we add 4 units at the input layer (pink units above), so that we can get 4 more units after the first convolution layer, 2 more units after the first pooling layer, 2 more units after the second convolution layer, and 1 more output. Therefore, window size to generate a new output is 4 (2 stride $\times$2) the overall subsampling we have shown from input to output is 4 (2x2)-->
+<!-- . Moreover, this is a demonstration of the fact that if we increase the size of the input, we will increase the size of every layer, proving CNNs' capability in handling dynamic length inputs. -->
 ここで、入力層に4個のユニット（上のピンク色のユニット）を追加すると、第1の畳み込み層の後にさらに4個、第1のプーリング層の後にさらに2個、第2の畳み込み層の後にさらに2個、の隠れ層の出力が、そして1個の最終的な出力が得られることになります。したがって、新たな出力を生成するためのウィンドウサイズは4 (2 ストライド $\times$2)<!--入力から出力までの全体的なサブサンプリングは4 (2x2)-->となります。さらに、これは、入力のサイズを大きくすれば、すべての層のサイズも大きくなることを示しています。すなわち、CNNが可変長の入力を扱う能力を持っていることを証明しています。
 
 <!-- ## What are CNN good for

diff --git a/docs/ja/week03/03-3.md b/docs/ja/week03/03-3.md
@@ -59,7 +59,7 @@ If our data exhibits locality, each neuron needs to be connected to only a few l
 ## [不変性と等価性を生み出すために自然の信号の性質を活用する](https://www.youtube.com/watch?v=kwPWpVverkw&t=1074s)
 
 
-### 局所性 $Rightarrow$ スパース性
+### 局所性 $\Rightarrow$ スパース性
 
 図1は、5層の全結合ネットワークを示しています。各矢印は、入力に乗算される重みを表しています。見ての通り、このネットワークは非常に計算量が多いです。
 
@@ -96,7 +96,7 @@ The choice of kernel size is empirical. 3 * 3 convolution seems to be the minima
 |<b>Figure 4(a):</b> Kernels on 1D Data | <b>Figure 4(b):</b> Data with Zero Padding| -->
 
 
-### 定常性　\Rightarrow$ パラメータ共有
+### 定常性　$\Rightarrow$ パラメータ共有
 
 データが定常性を示す場合には、ネットワークアーキテクチャー全体で小さなパラメータのセットを複数回使用することができます。例えば、図3(a)のスパースネットワークでは、3つの共有パラメータ（黄色、オレンジ、赤）のセットを使用することができます。そうすると、パラメータの数は9個から3個に減ってしまいます。新しいアーキテクチャは、これらの特定の重みを訓練するためのデータがより多くあるので、より良く機能するかもしれません。
 スパース性とパラメータ共有を適用した後の重みは、畳み込みカーネルと呼ばれます。

diff --git a/docs/ja/week03/03.md b/docs/ja/week03/03.md
@@ -10,7 +10,7 @@ translator: Takashi Shinozaki
 ## レクチャー　パート　A
 
 <!-- We first see a visualization of a 6-layer neural network. Next we begin with the topic of Convolutions and Convolution Neural Networks (CNN). We review several types of parameter transformations in the context of CNNs and introduce the idea of a kernel, which is used to learn features in a hierarchical manner. Thereby allowing us to classify our input data which is the basic idea motivating the use of CNNs. -->
-まず最初に6層のニューラルネットワークの可視化の結果について概観し、その後、畳み込みと畳み込みニューラルネットワーク(CNN)について見てゆきます。CNNを理解するために必要ないくつかのタイプのパラメータ変換について解説し、階層的な方法で特徴を学習するためにカーネルを用いる手法ついて紹介します。これによって様々な入力データを分類することができるようになりますが、これがCNNを用いる動機となる基本的な考え方となります。
+まず最初に6層のニューラルネットワークの可視化の結果について概観し、その後、畳み込みと畳み込みニューラルネットワーク(CNN)について見ていきます。CNNを理解するために必要ないくつかのタイプのパラメータ変換について解説し、階層的な方法で特徴を学習するためにカーネルを用いる手法ついて紹介します。これによって様々な入力データを分類することができるようになりますが、これがCNNを用いる動機となる基本的な考え方となります。
 
 
 <!-- ## Lecture part B -->
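
Note on the `docs/ja/week03/03-1.md` hunk: the corrected formula $y_{i, j} = \sum_{k, l} w_{k, l} x_{i+k, j+l}$ can be checked with a small sketch. This is only an illustration, not code from the course. The function name `cross_correlate_2d`, the nested-list representation, and the restriction to "valid" output positions (no padding, stride 1) are assumptions made here.

```python
def cross_correlate_2d(x, w):
    """Compute y[i][j] = sum over k, l of w[k][l] * x[i + k][j + l].

    Assumption: only positions where the kernel fits entirely inside
    the input are evaluated (no padding, stride 1).
    """
    kernel_h, kernel_w = len(w), len(w[0])
    out_h = len(x) - kernel_h + 1
    out_w = len(x[0]) - kernel_w + 1
    return [
        [
            sum(
                w[k][l] * x[i + k][j + l]
                for k in range(kernel_h)
                for l in range(kernel_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 3x3 input and 2x2 kernel, chosen only for illustration.
x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
w = [[1, 0],
     [0, 1]]

y = cross_correlate_2d(x, w)
print(y)  # [[6, 8], [12, 14]]
```

The separated indices in $y_{i, j}$ and $w_{k, l}$ map directly onto the two loop levels: `i, j` index the output position and `k, l` index the kernel entry.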