
Commit 4658d55

beginner_source/nlp/sequence_models_tutorial.py translation (#780)

* beginner_source/nlp/sequence_models_tutorial.py translation

1 parent 0970896 commit 4658d55

1 file changed: +122 -121 lines changed
@@ -1,37 +1,37 @@
 # -*- coding: utf-8 -*-
 r"""
-Sequence Models and Long Short-Term Memory Networks
+Sequence Models and LSTM Networks
 ===================================================
+**Translation**: `박수민 <https://github.com/convin305>`_

-At this point, we have seen various feed-forward networks. That is,
-there is no state maintained by the network at all. This might not be
-the behavior we want. Sequence models are central to NLP: they are
-models where there is some sort of dependence through time between your
-inputs. The classical example of a sequence model is the Hidden Markov
-Model for part-of-speech tagging. Another example is the conditional
-random field.
-
-A recurrent neural network is a network that maintains some kind of
-state. For example, its output could be used as part of the next input,
-so that information can propagate along as the network passes over the
-sequence. In the case of an LSTM, for each element in the sequence,
-there is a corresponding *hidden state* :math:`h_t`, which in principle
-can contain information from arbitrary points earlier in the sequence.
-We can use the hidden state to predict words in a language model,
-part-of-speech tags, and a myriad of other things.
-
-
-LSTMs in Pytorch
+So far we have seen various feed-forward networks.
+That is, there is no state maintained by the network at all.
+This might not be the behavior we want.
+Sequence models are central to NLP: they are models with some kind of temporal dependence between the inputs.
+The classical example of a sequence model is the Hidden Markov Model for part-of-speech tagging.
+Another example is the conditional random field.
+
+A recurrent neural network is a network that maintains some kind of state.
+For example, its output could be used as part of the next input,
+so that information can propagate along as the network passes over the sequence.
+In the case of an LSTM, each element of the sequence has a corresponding *hidden state* :math:`h_t`,
+which in principle can contain information from arbitrary earlier points in the sequence.
+We can use the hidden state to predict words in a language model,
+part-of-speech tags, and a myriad of other things.
+
+
+LSTMs in Pytorch
 ~~~~~~~~~~~~~~~~~

-Before getting to the example, note a few things. Pytorch's LSTM expects
-all of its inputs to be 3D tensors. The semantics of the axes of these
-tensors is important. The first axis is the sequence itself, the second
-indexes instances in the mini-batch, and the third indexes elements of
-the input. We haven't discussed mini-batching, so let's just ignore that
-and assume we will always have just 1 dimension on the second axis. If
-we want to run the sequence model over the sentence "The cow jumped",
-our input should look like
+Before getting to the example, note a few things.
+Pytorch's LSTM expects all of its inputs to be 3D tensors.
+The semantics of the axes of these tensors is important.
+The first axis is the sequence itself, the second indexes instances in the mini-batch,
+and the third indexes elements of the input.
+We haven't discussed mini-batching, so we will ignore it
+and assume the second axis always has just 1 dimension.
+If we want to run the sequence model over the sentence "The cow jumped.",
+our input should look like

 .. math::

@@ -42,12 +42,12 @@
    q_\text{jumped}
    \end{bmatrix}

-Except remember there is an additional 2nd dimension with size 1.
+Just remember that there is an additional 2nd dimension with size 1.

-In addition, you could go through the sequence one at a time, in which
-case the 1st axis will have size 1 also.
+You can also go through the sequence one element at a time,
+in which case the 1st axis will have size 1 as well.

-Let's see a quick example.
+Let's look at a quick example.
 """

 # Author: Robert Guthrie
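
To make the axis convention described above concrete, here is a minimal sketch (an editorial illustration, not part of the file being changed) that builds the input for "The cow jumped" out of made-up 4-dimensional word vectors; axis 0 is the sequence, axis 1 the mini-batch of size 1, and axis 2 the features::

    import torch

    # Hypothetical 4-dimensional vectors standing in for q_the, q_cow, q_jumped.
    q_the, q_cow, q_jumped = torch.randn(4), torch.randn(4), torch.randn(4)

    # Shape (seq_len=3, batch=1, features=4), as described above.
    inputs = torch.stack([q_the, q_cow, q_jumped]).view(3, 1, 4)
    print(inputs.shape)  # torch.Size([3, 1, 4])
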
@@ -61,95 +61,96 @@

 ######################################################################

-lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
-inputs = [torch.randn(1, 3) for _ in range(5)]  # make a sequence of length 5
+lstm = nn.LSTM(3, 3)  # input is 3 dimensional, output is 3 dimensional
+inputs = [torch.randn(1, 3) for _ in range(5)]  # make a sequence of length 5

-# initialize the hidden state.
+# initialize the hidden state.
 hidden = (torch.randn(1, 1, 3),
           torch.randn(1, 1, 3))
 for i in inputs:
-    # Step through the sequence one element at a time.
-    # after each step, hidden contains the hidden state.
+    # Step through the sequence one element at a time.
+    # After each step, hidden contains the hidden state.
     out, hidden = lstm(i.view(1, 1, -1), hidden)

-# alternatively, we can do the entire sequence all at once.
-# the first value returned by LSTM is all of the hidden states throughout
-# the sequence. the second is just the most recent hidden state
-# (compare the last slice of "out" with "hidden" below, they are the same)
-# The reason for this is that:
-# "out" will give you access to all hidden states in the sequence
-# "hidden" will allow you to continue the sequence and backpropagate,
-# by passing it as an argument to the lstm at a later time
-# Add the extra 2nd dimension
+# Alternatively, we can do the entire sequence all at once.
+# The first value returned by the LSTM is the hidden states over the whole sequence.
+# The second is just the most recent hidden state.
+# (Compare "hidden" with the last slice of "out" below; the two are the same.)
+# The reason for this is:
+# "out" gives you access to all hidden states in the sequence, and
+# "hidden" lets you continue the sequence and backpropagate
+# by passing it to the lstm as an argument at a later time.
+# Add the extra 2nd dimension.
 inputs = torch.cat(inputs).view(len(inputs), 1, -1)
-hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))  # clean out hidden state
+hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))  # clear out the hidden state
 out, hidden = lstm(inputs, hidden)
 print(out)
 print(hidden)
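
The claim in the comments above, that the last slice of "out" matches "hidden", can be checked directly. A small self-contained sketch, assuming a single-layer, unidirectional LSTM exactly as in the example::

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(3, 3)
    inputs = torch.randn(5, 1, 3)                           # (seq_len, batch, input_size)
    hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))   # (h_0, c_0)
    out, (h_n, c_n) = lstm(inputs, hidden)

    # out[-1] is the hidden state at the final timestep, which is exactly what h_n holds.
    print(torch.allclose(out[-1], h_n[0]))                  # True
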


 ######################################################################
-# Example: An LSTM for Part-of-Speech Tagging
+# Example: An LSTM for Part-of-Speech Tagging
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #
-# In this section, we will use an LSTM to get part of speech tags. We will
-# not use Viterbi or Forward-Backward or anything like that, but as a
-# (challenging) exercise to the reader, think about how Viterbi could be
-# used after you have seen what is going on. In this example, we also refer
-# to embeddings. If you are unfamiliar with embeddings, you can read up
-# about them `here <https://tutorials.pytorch.kr/beginner/nlp/word_embeddings_tutorial.html>`__.
+# In this section, we will use an LSTM to get part-of-speech tags.
+# We will not use Viterbi or Forward-Backward or anything like that.
+# But as a (challenging) exercise, once you have seen what is going on,
+# think about how Viterbi could be used.
+# This example also refers to embeddings. If you are unfamiliar with embeddings,
+# you can read up about them
+# `here <https://tutorials.pytorch.kr/beginner/nlp/word_embeddings_tutorial.html>`__.
 #
-# The model is as follows: let our input sentence be
-# :math:`w_1, \dots, w_M`, where :math:`w_i \in V`, our vocab. Also, let
-# :math:`T` be our tag set, and :math:`y_i` the tag of word :math:`w_i`.
-# Denote our prediction of the tag of word :math:`w_i` by
-# :math:`\hat{y}_i`.
+# The model is as follows: with each word :math:`w_i \in V`, our vocab,
+# let the input sentence be :math:`w_1, \dots, w_M`. Also, let
+# :math:`T` be our tag set, and :math:`y_i` the tag of word :math:`w_i`.
+# Denote the predicted tag for word :math:`w_i` by :math:`\hat{y}_i`.
+#
 #
-# This is a structure prediction, model, where our output is a sequence
-# :math:`\hat{y}_1, \dots, \hat{y}_M`, where :math:`\hat{y}_i \in T`.
+# This is a structure prediction model whose output is the sequence
+# :math:`\hat{y}_1, \dots, \hat{y}_M`, where :math:`\hat{y}_i \in T`.
 #
-# To do the prediction, pass an LSTM over the sentence. Denote the hidden
-# state at timestep :math:`i` as :math:`h_i`. Also, assign each tag a
-# unique index (like how we had word\_to\_ix in the word embeddings
-# section). Then our prediction rule for :math:`\hat{y}_i` is
+# To make the prediction, pass the sentence through the LSTM. Denote the hidden
+# state at timestep :math:`i` by :math:`h_i`. Also assign each tag
+# a unique index (similar to word\_to\_ix in the word embeddings section).
+# Then the prediction rule for :math:`\hat{y}_i` is
 #
 # .. math:: \hat{y}_i = \text{argmax}_j \  (\log \text{Softmax}(Ah_i + b))_j
 #
-# That is, take the log softmax of the affine map of the hidden state,
-# and the predicted tag is the tag that has the maximum value in this
-# vector. Note this implies immediately that the dimensionality of the
-# target space of :math:`A` is :math:`|T|`.
+# That is, take the log softmax of the affine map of the hidden state,
+# and the predicted tag is the tag with the maximum value in this vector.
+# Note that this immediately implies that the dimensionality of the
+# target space of :math:`A` is :math:`|T|`.
 #
 #
-# Prepare data:
+# Prepare the data:

 def prepare_sequence(seq, to_ix):
     idxs = [to_ix[w] for w in seq]
     return torch.tensor(idxs, dtype=torch.long)


 training_data = [
-    # Tags are: DET - determiner; NN - noun; V - verb
-    # For example, the word "The" is a determiner
+    # The tags are: DET - determiner; NN - noun; V - verb
+    # For example, the word "The" is a determiner.
     ("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
     ("Everybody read that book".split(), ["NN", "V", "DET", "NN"])
 ]
 word_to_ix = {}
-# For each words-list (sentence) and tags-list in each tuple of training_data
+# For each word list (sentence) and tag list in each tuple of training_data
 for sent, tags in training_data:
     for word in sent:
-        if word not in word_to_ix:  # word has not been assigned an index yet
-            word_to_ix[word] = len(word_to_ix)  # Assign each word with a unique index
+        if word not in word_to_ix:  # the word has not been assigned an index yet
+            word_to_ix[word] = len(word_to_ix)  # assign each word a unique index
 print(word_to_ix)
-tag_to_ix = {"DET": 0, "NN": 1, "V": 2}  # Assign each tag with a unique index
+tag_to_ix = {"DET": 0, "NN": 1, "V": 2}  # assign each tag a unique index

-# These will usually be more like 32 or 64 dimensional.
-# We will keep them small, so we can see how the weights change as we train.
+# These are usually closer to 32 or 64 dimensional.
+# We keep them small so that we can see how the weights change as we train.
 EMBEDDING_DIM = 6
 HIDDEN_DIM = 6

 ######################################################################
-# Create the model:
+# Create the model:


 class LSTMTagger(nn.Module):
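
The prediction rule :math:`\hat{y}_i = \text{argmax}_j \ (\log \text{Softmax}(Ah_i + b))_j` quoted above can also be spelled out numerically. An illustrative sketch with made-up sizes (hidden dimension 6, three tags), showing the affine map followed by log softmax and argmax::

    import torch
    import torch.nn.functional as F

    A = torch.randn(3, 6)   # maps hidden space (dim 6) to tag space (|T| = 3)
    b = torch.randn(3)
    h_i = torch.randn(6)    # hidden state at timestep i

    log_probs = F.log_softmax(A @ h_i + b, dim=0)
    y_hat_i = torch.argmax(log_probs).item()   # index of the predicted tag
    print(y_hat_i)
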
@@ -160,11 +161,11 @@ def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):

         self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

-        # The LSTM takes word embeddings as inputs, and outputs hidden states
-        # with dimensionality hidden_dim.
+        # The LSTM takes word embeddings as inputs,
+        # and outputs hidden states of dimension hidden_dim.
         self.lstm = nn.LSTM(embedding_dim, hidden_dim)

-        # The linear layer that maps from hidden state space to tag space
+        # The linear layer that maps from hidden state space to tag space.
         self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

     def forward(self, sentence):
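
The body of ``forward`` lies outside this hunk, but the way the three layers above fit together shape-wise can be sketched on its own. A rough, self-contained illustration assuming the toy sizes used in this file (9 vocabulary words, embedding and hidden dimension 6, 3 tags); the word indices are made up::

    import torch
    import torch.nn as nn

    word_embeddings = nn.Embedding(9, 6)   # vocab_size=9, embedding_dim=6
    lstm = nn.LSTM(6, 6)                   # embedding_dim -> hidden_dim
    hidden2tag = nn.Linear(6, 3)           # hidden_dim -> tagset_size

    sentence = torch.tensor([0, 1, 2, 0, 3])             # 5 word indices (made up)
    embeds = word_embeddings(sentence).view(5, 1, -1)    # (seq_len, batch=1, 6)
    lstm_out, _ = lstm(embeds)                           # (5, 1, 6)
    tag_space = hidden2tag(lstm_out.view(5, -1))         # (5, 3): one row of tag scores per word
    print(tag_space.shape)
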
@@ -175,79 +176,79 @@ def forward(self, sentence):
         return tag_scores

 ######################################################################
-# Train the model:
+# Train the model:


 model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix), len(tag_to_ix))
 loss_function = nn.NLLLoss()
 optimizer = optim.SGD(model.parameters(), lr=0.1)

-# See what the scores are before training
-# Note that element i,j of the output is the score for tag j for word i.
-# Here we don't need to train, so the code is wrapped in torch.no_grad()
+# See what the scores are before training.
+# Element i,j of the output is the score for tag j for word i.
+# Here we don't need to train, so the code is wrapped in torch.no_grad().
 with torch.no_grad():
     inputs = prepare_sequence(training_data[0][0], word_to_ix)
     tag_scores = model(inputs)
     print(tag_scores)

-for epoch in range(300):  # again, normally you would NOT do 300 epochs, it is toy data
+for epoch in range(300):  # again, you would normally not do 300 epochs; this is toy data
     for sentence, tags in training_data:
-        # Step 1. Remember that Pytorch accumulates gradients.
-        # We need to clear them out before each instance
+        # Step 1. Remember that Pytorch accumulates gradients.
+        # We need to clear them out before each instance.
         model.zero_grad()

-        # Step 2. Get our inputs ready for the network, that is, turn them into
-        # Tensors of word indices.
+        # Step 2. Get our inputs ready for the network,
+        # that is, turn them into tensors of word indices.
         sentence_in = prepare_sequence(sentence, word_to_ix)
         targets = prepare_sequence(tags, tag_to_ix)

-        # Step 3. Run our forward pass.
+        # Step 3. Run the forward pass.
         tag_scores = model(sentence_in)

-        # Step 4. Compute the loss, gradients, and update the parameters by
-        # calling optimizer.step()
+        # Step 4. Compute the loss and gradients, and update the parameters
+        # by calling optimizer.step().
         loss = loss_function(tag_scores, targets)
         loss.backward()
         optimizer.step()

-# See what the scores are after training
+# See what the scores are after training.
 with torch.no_grad():
     inputs = prepare_sequence(training_data[0][0], word_to_ix)
     tag_scores = model(inputs)

-    # The sentence is "the dog ate the apple". i,j corresponds to score for tag j
-    # for word i. The predicted tag is the maximum scoring tag.
-    # Here, we can see the predicted sequence below is 0 1 2 0 1
-    # since 0 is index of the maximum value of row 1,
-    # 1 is the index of maximum value of row 2, etc.
-    # Which is DET NOUN VERB DET NOUN, the correct sequence!
+    # The sentence is "the dog ate the apple". i,j is the score for tag j for word i.
+    # The predicted tag is the tag with the highest score.
+    # Here, we can see that the predicted sequence below is 0 1 2 0 1,
+    # since 0 is the index of the maximum value of row 1,
+    # 1 is the index of the maximum value of row 2, and so on.
+    # That is DET NOUN VERB DET NOUN, the correct sequence!
     print(tag_scores)
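
To read a prediction such as 0 1 2 0 1 back as tag names, the tag_to_ix mapping can simply be inverted. An illustrative sketch with a stand-in score tensor (in practice you would use the tag_scores printed above)::

    import torch

    tag_to_ix = {"DET": 0, "NN": 1, "V": 2}
    ix_to_tag = {ix: tag for tag, ix in tag_to_ix.items()}

    tag_scores = torch.log_softmax(torch.randn(5, 3), dim=1)   # stand-in for the model output
    predicted = [ix_to_tag[i.item()] for i in tag_scores.argmax(dim=1)]
    print(predicted)   # with the trained model above this comes out as ['DET', 'NN', 'V', 'DET', 'NN']
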


 ######################################################################
-# Exercise: Augmenting the LSTM part-of-speech tagger with character-level features
+# Exercise: Augmenting the LSTM part-of-speech tagger with character-level features
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #
-# In the example above, each word had an embedding, which served as the
-# inputs to our sequence model. Let's augment the word embeddings with a
-# representation derived from the characters of the word. We expect that
-# this should help significantly, since character-level information like
-# affixes have a large bearing on part-of-speech. For example, words with
-# the affix *-ly* are almost always tagged as adverbs in English.
+# In the example above, each word had an embedding, which served as the input to the sequence model.
+# Let's augment the word embeddings with a representation derived from the characters of the word.
+# Since character-level information such as affixes has a large bearing on part-of-speech,
+# we expect this to help significantly.
+# For example, words with the affix *-ly* are almost always
+# tagged as adverbs in English.
 #
-# To do this, let :math:`c_w` be the character-level representation of
-# word :math:`w`. Let :math:`x_w` be the word embedding as before. Then
-# the input to our sequence model is the concatenation of :math:`x_w` and
-# :math:`c_w`. So if :math:`x_w` has dimension 5, and :math:`c_w`
-# dimension 3, then our LSTM should accept an input of dimension 8.
+# To do this, let :math:`c_w` be the character-level representation of word :math:`w`,
+# and, as before, let :math:`x_w` be the word embedding.
+# Then the input to our sequence model is the concatenation of :math:`x_w` and
+# :math:`c_w`. So if :math:`x_w` has dimension 5 and :math:`c_w`
+# has dimension 3, then our LSTM should accept an input of dimension 8.
 #
-# To get the character level representation, do an LSTM over the
-# characters of a word, and let :math:`c_w` be the final hidden state of
-# this LSTM. Hints:
+# To get the character-level representation, run an LSTM over the characters of a word,
+# and let :math:`c_w` be the final hidden state of this LSTM.
+# Hints:
 #
-# * There are going to be two LSTM's in your new model.
-#   The original one that outputs POS tag scores, and the new one that
-#   outputs a character-level representation of each word.
-# * To do a sequence model over characters, you will have to embed characters.
-#   The character embeddings will be the input to the character LSTM.
+# * There will be two LSTMs in your new model:
+#   the original one that outputs POS tag scores, and a new one that
+#   outputs a character-level representation of each word.
+# * To do a sequence model over characters, you will have to embed characters.
+#   The character embeddings will be the input to the character LSTM.
 #