#### Implementation results

```
782/782 [==============================] - 154s 197ms/step - loss: 0.8437 - val_loss: 0.1883
782/782 [==============================] - 154s 196ms/step - loss: 0.0702 - val_loss: 0.0111
782/782 [==============================] - 153s 195ms/step - loss: 0.0053 - val_loss: 0.0038
782/782 [==============================] - 154s 196ms/step - loss: 0.0035 - val_loss: 0.0027
782/782 [==============================] - 153s 196ms/step - loss: 0.0030 - val_loss: 0.0065
782/782 [==============================] - 151s 193ms/step - loss: 0.0027 - val_loss: 0.0018
782/782 [==============================] - 152s 194ms/step - loss: 0.0025 - val_loss: 0.0036
782/782 [==============================] - 153s 196ms/step - loss: 0.0024 - val_loss: 0.0018
782/782 [==============================] - 152s 194ms/step - loss: 0.0023 - val_loss: 0.0016
782/782 [==============================] - 152s 194ms/step - loss: 0.0014 - val_loss: 3.7456e-04
782/782 [==============================] - 153s 196ms/step - loss: 9.4740e-04 - val_loss: 7.0205e-04
782/782 [==============================] - 152s 194ms/step - loss: 6.9630e-04 - val_loss: 3.7180e-04
```
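For reference, adding-problem batches like the ones behind these logs are commonly generated along the following lines. This is a minimal NumPy sketch, not the repository's actual generator: the function name, `seq_len` default, and the two-channel layout (values in channel 0, the two position markers in channel 1) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def adding_problem_batch(batch_size=1000, seq_len=600):
    """Hypothetical generator: each sample is a (seq_len, 2) array where
    channel 0 holds random decimals and channel 1 marks two positions;
    the target is the sum of the two marked decimals."""
    values = rng.uniform(0.0, 1.0, size=(batch_size, seq_len))
    markers = np.zeros((batch_size, seq_len))
    targets = np.zeros((batch_size, 1))
    for i in range(batch_size):
        # Pick two distinct positions whose values must be summed.
        a, b = rng.choice(seq_len, size=2, replace=False)
        markers[i, a] = markers[i, b] = 1.0
        targets[i, 0] = values[i, a] + values[i, b]
    x = np.stack([values, markers], axis=-1)  # shape: (batch_size, seq_len, 2)
    return x, targets
```

A regression head with mean-squared-error loss (matching the `loss`/`val_loss` curves above) is the usual fit for this target.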
### Copy Memory Task
The idea is to copy the content of the vector x to the end of the large array.

#### Implementation results (first epochs)

```
118/118 [==============================] - 17s 143ms/step - loss: 1.1732 - accuracy: 0.6725 - val_loss: 0.1119 - val_accuracy: 0.9796
118/118 [==============================] - 15s 125ms/step - loss: 0.0645 - accuracy: 0.9831 - val_loss: 0.0402 - val_accuracy: 0.9853
118/118 [==============================] - 15s 125ms/step - loss: 0.0393 - accuracy: 0.9856 - val_loss: 0.0372 - val_accuracy: 0.9857
118/118 [==============================] - 15s 125ms/step - loss: 0.0361 - accuracy: 0.9858 - val_loss: 0.0344 - val_accuracy: 0.9860
118/118 [==============================] - 15s 125ms/step - loss: 0.0345 - accuracy: 0.9860 - val_loss: 0.0335 - val_accuracy: 0.9864
118/118 [==============================] - 15s 125ms/step - loss: 0.0325 - accuracy: 0.9867 - val_loss: 0.0268 - val_accuracy: 0.9886
118/118 [==============================] - 15s 125ms/step - loss: 0.0268 - accuracy: 0.9885 - val_loss: 0.0206 - val_accuracy: 0.9908
118/118 [==============================] - 15s 125ms/step - loss: 0.0228 - accuracy: 0.9900 - val_loss: 0.0169 - val_accuracy: 0.9933
```
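A copy-memory batch can be sketched as follows. This assumes the standard formulation of the task (a short symbol pattern, a long run of blanks, a trigger symbol, then the pattern repeated at the end); the function name, defaults, and reserved symbol ids are illustrative, not the repository's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

def copy_memory_batch(batch_size=32, mem_len=10, blank_len=100, n_symbols=8):
    """Hypothetical generator: the network sees `mem_len` random symbols,
    then blanks, then a trigger symbol; it must reproduce the symbols
    at the very end of the sequence."""
    blank, trigger = n_symbols, n_symbols + 1      # two reserved symbol ids
    total_len = mem_len + blank_len + mem_len
    pattern = rng.integers(0, n_symbols, size=(batch_size, mem_len))
    x = np.full((batch_size, total_len), blank)
    x[:, :mem_len] = pattern                       # symbols to memorize
    x[:, mem_len + blank_len - 1] = trigger        # "start copying" marker
    y = np.full((batch_size, total_len), blank)
    y[:, -mem_len:] = pattern                      # expected output at the end
    return x, y
```

Since the targets are symbol ids at every timestep, a per-timestep softmax with categorical cross-entropy matches the `loss`/`accuracy` columns in the logs above.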
### Sequential MNIST
The idea here is to consider MNIST images as 1-D sequences and feed them to the network.

#### Implementation results

```
1875/1875 [==============================] - 46s 25ms/step - loss: 0.0949 - accuracy: 0.9706 - val_loss: 0.0763 - val_accuracy: 0.9756
1875/1875 [==============================] - 46s 25ms/step - loss: 0.0831 - accuracy: 0.9743 - val_loss: 0.0656 - val_accuracy: 0.9807
1875/1875 [==============================] - 46s 25ms/step - loss: 0.0752 - accuracy: 0.9763 - val_loss: 0.0604 - val_accuracy: 0.9802
1875/1875 [==============================] - 46s 25ms/step - loss: 0.0685 - accuracy: 0.9785 - val_loss: 0.0588 - val_accuracy: 0.9813
1875/1875 [==============================] - 46s 25ms/step - loss: 0.0624 - accuracy: 0.9801 - val_loss: 0.0545 - val_accuracy: 0.9822
1875/1875 [==============================] - 46s 25ms/step - loss: 0.0603 - accuracy: 0.9812 - val_loss: 0.0478 - val_accuracy: 0.9835
1875/1875 [==============================] - 46s 25ms/step - loss: 0.0566 - accuracy: 0.9821 - val_loss: 0.0546 - val_accuracy: 0.9826
1875/1875 [==============================] - 46s 25ms/step - loss: 0.0503 - accuracy: 0.9843 - val_loss: 0.0441 - val_accuracy: 0.9853
1875/1875 [==============================] - 46s 25ms/step - loss: 0.0486 - accuracy: 0.9840 - val_loss: 0.0572 - val_accuracy: 0.9832
1875/1875 [==============================] - 46s 25ms/step - loss: 0.0453 - accuracy: 0.9858 - val_loss: 0.0424 - val_accuracy: 0.9862
```
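The "images as 1-D sequences" step amounts to flattening each 28x28 image into a length-784 sequence with one feature per timestep. A minimal sketch, with random arrays standing in for the real MNIST data (which Keras users would typically load via `keras.datasets.mnist`):

```python
import numpy as np

# Stand-in for real MNIST pixel data: 64 grayscale 28x28 images.
images = np.random.randint(0, 256, size=(64, 28, 28)).astype("uint8")

# Flatten each image into a (784, 1) sequence of pixel intensities
# scaled to [0, 1], the usual input shape for a sequence model.
sequences = images.reshape(len(images), 28 * 28, 1).astype("float32") / 255.0
```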
## Testing