6.5 Further Improvements to the RNNLM

We use the GPU backend to speed up training.

from ivory.common.context import np
np.context = 'gpu'

[1] 2019-06-20 20:37:10 (192ms) python3 (192ms)

We load the PTB dataset.

from ivory.common.dataset import TimeDataset
from ivory.utils.repository import import_module

ptb = import_module("scratch2/dataset/ptb")
corpus, _, _ = ptb.load_data("train")
corpus_val, _, _ = ptb.load_data("val")
corpus_test, _, _ = ptb.load_data("test")
vocab_size = int(max(corpus) + 1)
x, t = corpus[:-1], corpus[1:]
data = TimeDataset((x, t), time_size=35, batch_size=20)
data.epochs = 1
data

[2] 2019-06-20 20:37:11 (529ms) python3 (721ms)

TimeDataset(time_size=35, batch_size=20, epochs=1, len=1327, column=0, size=(929588,))
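As a rough sanity check (my own arithmetic, not part of the library), the reported len=1327 is consistent with splitting the 929,588-token corpus into 20 parallel streams and stepping through each in 35-token windows:

```python
corpus_len = 929_588           # token count reported in the output above
batch_size, time_size = 20, 35

# Each of the 20 batch streams receives corpus_len // batch_size tokens;
# one iteration consumes a 35-step window from every stream.
iters_per_epoch = (corpus_len // batch_size) // time_size
print(iters_per_epoch)  # 1327, matching len=1327 above
```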

We set the hyperparameters.

wordvec_size = 650
hidden_size = 650
lr = 20.0
max_epoch = 40
max_grad = 0.25
dropout = 0.5

[3] 2019-06-20 20:37:11 (4.00ms) python3 (725ms)

We build the model.

from ivory.core.trainer import sequential

net = [
    ("input", vocab_size),
    ("embedding", wordvec_size, "dropout"),
    ("lstm", hidden_size, "dropout"),
    ("lstm", hidden_size, "dropout"),
    ("affine", vocab_size, "softmax_cross_entropy"),
]
trainer = sequential(net, optimizer="sgd", metrics=["loss"])
trainer.optimizer.learning_rate = lr
trainer.max_grad = max_grad
model = trainer.model
for layer in model.layers:
    print(layer)

[4] 2019-06-20 20:37:11 (223ms) python3 (949ms)

<Embedding('Embedding.1', (10000, 650)) at 0x1cf823acba8>
<Dropout('Dropout.1', (650,)) at 0x1cf823accc0>
<LSTM('LSTM.1', (650, 650)) at 0x1cf823acdd8>
<Dropout('Dropout.2', (650,)) at 0x1cf823acf98>
<LSTM('LSTM.2', (650, 650)) at 0x1cf82368160>
<Dropout('Dropout.3', (650,)) at 0x1cf82368400>
<Affine('Affine.1', (650, 10000)) at 0x1cf823685f8>
<SoftmaxCrossEntropy('SoftmaxCrossEntropy.1', (10000,)) at 0x1cf823688d0>

We set the initial weight values and the dropout ratio.

from ivory.common.context import np

model.init(std="xavier", dropout_ratio=dropout)
for p in model.weights:
    if p.name != "b":
        std1 = f"{float(p.d.std()):.03f}"
        std2 = f"{float(np.sqrt(1/p.d.shape[0])):.03f}"
        print(p.layer.name, p.name, std1, std2, type(p.d), p.d.dtype)

for layer in model.layers:
    if layer.name.startswith('Dropout'):
        print(layer.name, layer.dropout_ratio.d)

[5] 2019-06-20 20:37:11 (114ms) python3 (1.06s)

Embedding.1 W 0.010 0.010 <class 'cupy.core.core.ndarray'> float32
LSTM.1 W 0.039 0.039 <class 'cupy.core.core.ndarray'> float32
LSTM.1 U 0.039 0.039 <class 'cupy.core.core.ndarray'> float32
LSTM.2 W 0.039 0.039 <class 'cupy.core.core.ndarray'> float32
LSTM.2 U 0.039 0.039 <class 'cupy.core.core.ndarray'> float32
Affine.1 W 0.039 0.039 <class 'cupy.core.core.ndarray'> float32
Dropout.1 0.5
Dropout.2 0.5
Dropout.3 0.5
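The second standard-deviation column is the Xavier value sqrt(1/fan_in). A quick sketch, independent of Ivory, reproduces the printed figures:

```python
import numpy as np

# Xavier initialization draws weights with std = sqrt(1 / fan_in).
for fan_in in (10_000, 650):
    print(f"{np.sqrt(1.0 / fan_in):.03f}")
# 0.010 for the (10000, 650) embedding, 0.039 for the 650-input layers
```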

We share the weights between the embedding layer and the output affine layer (weight tying).

em = model.layers[0]
affine = model.layers[-2]
affine.W.share_variable(em.W, transpose=True)
trainer.build()
for v in trainer.optimizer.variables:
    print(v)

[6] 2019-06-20 20:37:12 (15.5ms) python3 (1.08s)

<Variable(['Embedding.1.W', 'Affine.1.W'], (10000, 650)) at 0x1cf82368ba8>
<Variable(['LSTM.1.W'], (650, 2600)) at 0x1cf82368be0>
<Variable(['LSTM.1.U'], (650, 2600)) at 0x1cf82368cf8>
<Variable(['LSTM.1.b'], (2600,)) at 0x1cf82368c18>
<Variable(['LSTM.2.W'], (650, 2600)) at 0x1cf82368dd8>
<Variable(['LSTM.2.U'], (650, 2600)) at 0x1cf82368c88>
<Variable(['LSTM.2.b'], (2600,)) at 0x1cf82368e48>
<Variable(['Affine.1.b'], (10000,)) at 0x1cf82368f28>
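As the first Variable above shows, `Embedding.1.W` and `Affine.1.W` now refer to one (10000, 650) array, saving 6.5 million parameters. A minimal NumPy sketch of the idea (not Ivory's actual implementation):

```python
import numpy as np

vocab_size, wordvec_size = 10_000, 650
# One shared parameter matrix for both ends of the network.
W_embed = (np.random.randn(vocab_size, wordvec_size) * 0.01).astype(np.float32)

def embed(word_ids):
    # Embedding lookup: (N,) int ids -> (N, 650) vectors.
    return W_embed[word_ids]

def output_scores(h):
    # Output layer reuses the same matrix transposed: (N, 650) -> (N, 10000).
    return h @ W_embed.T

h = embed(np.array([0, 1, 2]))
print(output_scores(h).shape)  # (3, 10000)
```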

We run training, printing the perplexity at the start and then every 20 iterations.

trainer.fit(data)
it = iter(trainer)
loss = next(it)[1]
print(data.iteration, int(np.exp(loss)))

for i in range(8):
    loss = 0.0
    for _ in range(20):
        loss += next(it)[1]
    loss /= 20.0
    print(data.iteration, int(np.exp(loss)))

[7] 2019-06-20 20:37:12 (28.2s) python3 (29.3s)

0 9989
20 3680
40 1958
60 1303
80 1091
100 837
120 808
140 698
160 695

For comparison, here is the beginning of the output of ch06/train_better_rnnlm.py from "ゼロから作るDeep Learning ❷":

| epoch 1 |  iter 1 / 1327 | time 2[s] | perplexity 9999.86
| epoch 1 |  iter 21 / 1327 | time 60[s] | perplexity 4233.17
| epoch 1 |  iter 41 / 1327 | time 116[s] | perplexity 1645.35
| epoch 1 |  iter 61 / 1327 | time 172[s] | perplexity 1346.09
| epoch 1 |  iter 81 / 1327 | time 227[s] | perplexity 1022.61
| epoch 1 |  iter 101 / 1327 | time 283[s] | perplexity 845.07
| epoch 1 |  iter 121 / 1327 | time 339[s] | perplexity 810.82
| epoch 1 |  iter 141 / 1327 | time 395[s] | perplexity 749.34
| epoch 1 |  iter 161 / 1327 | time 451[s] | perplexity 685.36

The actual training is carried out by writing and running a separate script file. We examine the results in the next section.