# -*- coding: utf-8 -*-
r"""
Sequence Models and Long Short-Term Memory Networks
===================================================
**Translation**: `convin305 <https://github.com/convin305>`_

At this point, we have seen various feed-forward networks. That is,
there is no state maintained by the network at all. This might not be
the behavior we want. Sequence models are central to NLP: they are
models where there is some sort of dependence through time between your
inputs. The classical example of a sequence model is the Hidden Markov
Model for part-of-speech tagging. Another example is the conditional
random field.

A recurrent neural network is a network that maintains some kind of
state. For example, its output could be used as part of the next input,
so that information can propagate along as the network passes over the
sequence. In the case of an LSTM, for each element in the sequence,
there is a corresponding *hidden state* :math:`h_t`, which in principle
can contain information from arbitrary points earlier in the sequence.
We can use the hidden state to predict words in a language model,
part-of-speech tags, and a myriad of other things.


LSTMs in Pytorch
~~~~~~~~~~~~~~~~~

Before getting to the example, note a few things. Pytorch's LSTM expects
all of its inputs to be 3D tensors. The semantics of the axes of these
tensors is important. The first axis is the sequence itself, the second
indexes instances in the mini-batch, and the third indexes elements of
the input. We haven't discussed mini-batching, so let's just ignore that
and assume we will always have just 1 dimension on the second axis. If
we want to run the sequence model over the sentence "The cow jumped",
our input should look like

.. math::

   \begin{bmatrix}
   \overbrace{q_\text{The}}^\text{row vector} \\
   q_\text{cow} \\
   q_\text{jumped}
   \end{bmatrix}

Except remember there is an additional 2nd dimension with size 1.

In addition, you could go through the sequence one at a time, in which
case the 1st axis will have size 1 also.

Let's see a quick example.
"""

# Author: Robert Guthrie

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

######################################################################

lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [torch.randn(1, 3) for _ in range(5)]  # make a sequence of length 5

# initialize the hidden state.
hidden = (torch.randn(1, 1, 3),
          torch.randn(1, 1, 3))
for i in inputs:
    # Step through the sequence one element at a time.
    # after each step, hidden contains the hidden state.
    out, hidden = lstm(i.view(1, 1, -1), hidden)

# alternatively, we can do the entire sequence all at once.
# the first value returned by LSTM is all of the hidden states throughout
# the sequence. the second is just the most recent hidden state
# (compare the last slice of "out" with "hidden" below, they are the same)
# The reason for this is that:
# "out" will give you access to all hidden states in the sequence
# "hidden" will allow you to continue the sequence and backpropagate,
# by passing it as an argument to the lstm at a later time
# Add the extra 2nd dimension
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))  # clean out hidden state
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)
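
# Not part of the original tutorial: a quick check of the claim above that the
# last slice of "out" matches the final hidden state hidden[0] returned by the
# all-at-once call (this holds for a single-layer, unidirectional LSTM).
print(torch.allclose(out[-1], hidden[0].squeeze(0)))  # expected: True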


######################################################################
# Example: An LSTM for Part-of-Speech Tagging
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# In this section, we will use an LSTM to get part of speech tags. We will
# not use Viterbi or Forward-Backward or anything like that, but as a
# (challenging) exercise to the reader, think about how Viterbi could be
# used after you have seen what is going on. In this example, we also refer
# to embeddings. If you are unfamiliar with embeddings, you can read up
# about them `here <https://tutorials.pytorch.kr/beginner/nlp/word_embeddings_tutorial.html>`__.
#
# The model is as follows: let our input sentence be
# :math:`w_1, \dots, w_M`, where :math:`w_i \in V`, our vocab. Also, let
# :math:`T` be our tag set, and :math:`y_i` the tag of word :math:`w_i`.
# Denote our prediction of the tag of word :math:`w_i` by
# :math:`\hat{y}_i`.
#
# This is a structure prediction model, where our output is a sequence
# :math:`\hat{y}_1, \dots, \hat{y}_M`, where :math:`\hat{y}_i \in T`.
#
# To do the prediction, pass an LSTM over the sentence. Denote the hidden
# state at timestep :math:`i` as :math:`h_i`. Also, assign each tag a
# unique index (like how we had word\_to\_ix in the word embeddings
# section). Then our prediction rule for :math:`\hat{y}_i` is
#
# .. math:: \hat{y}_i = \text{argmax}_j \  (\log \text{Softmax}(Ah_i + b))_j
#
# That is, take the log softmax of the affine map of the hidden state,
# and the predicted tag is the tag that has the maximum value in this
# vector. Note this implies immediately that the dimensionality of the
# target space of :math:`A` is :math:`|T|`.
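
# A minimal numeric sketch of the prediction rule above (not part of the
# original tutorial). The names affine_map and h_i and the sizes (hidden
# dimension 6, |T| = 3 tags) are illustrative assumptions.
affine_map = nn.Linear(6, 3)                       # computes A h_i + b
h_i = torch.randn(1, 6)                            # a stand-in hidden state at timestep i
log_probs = F.log_softmax(affine_map(h_i), dim=1)  # log Softmax(A h_i + b)
y_hat_i = torch.argmax(log_probs, dim=1)           # \hat{y}_i: index of the largest entry
print(log_probs, y_hat_i)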

######################################################################
# Prepare data:

def prepare_sequence(seq, to_ix):
    idxs = [to_ix[w] for w in seq]
    return torch.tensor(idxs, dtype=torch.long)


training_data = [
    # Tags are: DET - determiner; NN - noun; V - verb
    # For example, the word "The" is a determiner
    ("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
    ("Everybody read that book".split(), ["NN", "V", "DET", "NN"])
]
word_to_ix = {}
# For each words-list (sentence) and tags-list in each tuple of training_data
for sent, tags in training_data:
    for word in sent:
        if word not in word_to_ix:  # word has not been assigned an index yet
            word_to_ix[word] = len(word_to_ix)  # Assign each word with a unique index
print(word_to_ix)
tag_to_ix = {"DET": 0, "NN": 1, "V": 2}  # Assign each tag with a unique index

# These will usually be more like 32 or 64 dimensional.
# We will keep them small, so we can see how the weights change as we train.
EMBEDDING_DIM = 6
HIDDEN_DIM = 6


######################################################################
# Create the model:


class LSTMTagger(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim

        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)

        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores


######################################################################
# Train the model:


model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix), len(tag_to_ix))
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# See what the scores are before training
# Note that element i,j of the output is the score for tag j for word i.
# Here we don't need to train, so the code is wrapped in torch.no_grad()
with torch.no_grad():
    inputs = prepare_sequence(training_data[0][0], word_to_ix)
    tag_scores = model(inputs)
    print(tag_scores)

for epoch in range(300):  # again, normally you would NOT do 300 epochs, it is toy data
    for sentence, tags in training_data:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance
        model.zero_grad()

        # Step 2. Get our inputs ready for the network, that is, turn them into
        # Tensors of word indices.
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = prepare_sequence(tags, tag_to_ix)

        # Step 3. Run our forward pass.
        tag_scores = model(sentence_in)

        # Step 4. Compute the loss, gradients, and update the parameters by
        # calling optimizer.step()
        loss = loss_function(tag_scores, targets)
        loss.backward()
        optimizer.step()

# See what the scores are after training
with torch.no_grad():
    inputs = prepare_sequence(training_data[0][0], word_to_ix)
    tag_scores = model(inputs)

    # The sentence is "the dog ate the apple". i,j corresponds to score for tag j
    # for word i. The predicted tag is the maximum scoring tag.
    # Here, we can see the predicted sequence below is 0 1 2 0 1
    # since 0 is index of the maximum value of row 1,
    # 1 is the index of maximum value of row 2, etc.
    # Which is DET NOUN VERB DET NOUN, the correct sequence!
    print(tag_scores)
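
    # Not part of the original tutorial: a small sanity check that maps the
    # argmax of each row of tag_scores back to a tag string. The helper dict
    # ix_to_tag is an assumed name introduced here for illustration.
    ix_to_tag = {ix: tag for tag, ix in tag_to_ix.items()}
    print([ix_to_tag[i.item()] for i in tag_scores.argmax(dim=1)])
    # expected: ['DET', 'NN', 'V', 'DET', 'NN']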


######################################################################
# Exercise: Augmenting the LSTM part-of-speech tagger with character-level features
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# In the example above, each word had an embedding, which served as the
# inputs to our sequence model. Let's augment the word embeddings with a
# representation derived from the characters of the word. We expect that
# this should help significantly, since character-level information like
# affixes has a large bearing on part-of-speech. For example, words with
# the affix *-ly* are almost always tagged as adverbs in English.
#
# To do this, let :math:`c_w` be the character-level representation of
# word :math:`w`. Let :math:`x_w` be the word embedding as before. Then
# the input to our sequence model is the concatenation of :math:`x_w` and
# :math:`c_w`. So if :math:`x_w` has dimension 5, and :math:`c_w`
# dimension 3, then our LSTM should accept an input of dimension 8.
#
# To get the character level representation, do an LSTM over the
# characters of a word, and let :math:`c_w` be the final hidden state of
# this LSTM. Hints:
#
# * There are going to be two LSTM's in your new model.
#   The original one that outputs POS tag scores, and the new one that
#   outputs a character-level representation of each word.
# * To do a sequence model over characters, you will have to embed characters.
#   The character embeddings will be the input to the character LSTM.
#
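
# A minimal skeleton for this exercise is sketched below. It is not part of the
# original tutorial and is only one possible layout: the class and parameter
# names (CharAugmentedTagger, char_embedding_dim, char_hidden_dim) are
# illustrative assumptions, and the forward pass is left to the reader.

class CharAugmentedTagger(nn.Module):

    def __init__(self, word_embedding_dim, char_embedding_dim, char_hidden_dim,
                 hidden_dim, vocab_size, charset_size, tagset_size):
        super(CharAugmentedTagger, self).__init__()
        self.word_embeddings = nn.Embedding(vocab_size, word_embedding_dim)
        self.char_embeddings = nn.Embedding(charset_size, char_embedding_dim)

        # LSTM #1: runs over the characters of a single word; its final hidden
        # state is the character-level representation c_w.
        self.char_lstm = nn.LSTM(char_embedding_dim, char_hidden_dim)

        # LSTM #2: runs over the sentence; each input is the concatenation of
        # x_w and c_w (e.g. 5 + 3 = 8 dimensions, as in the text above).
        self.lstm = nn.LSTM(word_embedding_dim + char_hidden_dim, hidden_dim)
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence, sentence_chars):
        # Exercise: embed the characters of each word, run char_lstm and take
        # its final hidden state as c_w, concatenate c_w with the word
        # embedding x_w, then run self.lstm and score tags as in LSTMTagger.
        raise NotImplementedError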