Tyler McEntee edited this page May 19, 2017 · 6 revisions

This is an active project.

Before reading further, run git pull to update to the latest version and check whether that fixes your issue.


Cannot Load Checkpoint or Sample Data

symptoms

luajit fails with a "cannot open" error mentioning mode r, similar to issue #46

solutions

  • double-check the directory you are loading the file from
  • check that the current user account has read permission on that file
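Both checks can be run together with a small Python sketch like the one below. Run it from the same working directory you launch luajit from, since relative paths are resolved against that directory. The helper name check_readable and any path you pass it are hypothetical; they are not part of the project.

```python
import os


def check_readable(path):
    """Return a one-line diagnosis for a checkpoint or data file path."""
    # Relative paths resolve against the current working directory.
    abs_path = os.path.abspath(path)
    if not os.path.exists(abs_path):
        return f"missing: {abs_path} does not exist (check the directory)"
    if not os.path.isfile(abs_path):
        return f"not a file: {abs_path} is a directory or special file"
    # os.access checks permissions for the user running this script.
    if not os.access(abs_path, os.R_OK):
        return f"unreadable: current user lacks read permission on {abs_path}"
    return f"ok: {abs_path}"
```

For example, print(check_readable("cv/checkpoint.t7")) (hypothetical path) reports which of the two problems applies, if either.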

Cannot Generate Text

symptoms

luajit fails with bad argument #1 to 'size', similar to issue #44

luajit fails with bad argument #1 to '?' (empty tensor at /.../torch/generic/Tensor.c:851)

solutions

  • if you specified primetext, verify that every character in it appears somewhere in the training set
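To find the offending characters, compare the primetext against the set of characters in the training file. The Python sketch below does that. unseen_characters is a hypothetical helper, and it assumes the vocabulary is simply the set of distinct characters in the training text. If the loader builds its vocabulary byte by byte, results for non-ASCII text may differ.

```python
def unseen_characters(primetext, training_text):
    """Return characters of primetext that never occur in training_text.

    Each missing character is listed once, in the order it first appears.
    """
    vocab = set(training_text)
    missing = []
    for ch in primetext:
        if ch not in vocab and ch not in missing:
            missing.append(ch)
    return missing
```

Pass it the full contents of your training file (for example the input.txt in your data directory). Any character it returns has to be removed from primetext, or the model retrained on data that contains it.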

Fails to start, reporting that the cunn or cutorch package was not found

symptoms

cunn/cutorch package not found

solutions

  • verify that NVIDIA's CUDA runtime is installed
  • install the missing package with 'luarocks install cunn' or 'luarocks install cutorch'
  • if the installation fails with a permissions error, run the luarocks command as root with sudo

CPU training executes with only one thread

symptoms

only a single core of a multicore system is being heavily used

solutions

  • compile OpenBLAS with multithreading support, then build Torch and link it against that OpenBLAS.

loss is exploding, aborting.

symptoms

the train_loss value decreases slowly, then suddenly jumps to a much higher value
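The message comes from a divergence check in train.lua that compares the current loss with the loss at the start of training. The sketch below assumes a threshold of three times the initial loss; verify that factor against your copy of train.lua. first_explosion is a hypothetical helper for scanning a list of logged train_loss values.

```python
def first_explosion(losses, factor=3.0):
    """Return the index of the first loss above factor * losses[0], or None.

    Assumes a divergence rule of the form "abort when the loss exceeds
    factor times the initial loss"; the default factor is an assumption.
    """
    if not losses:
        return None
    baseline = losses[0]
    for i, loss in enumerate(losses):
        if loss > factor * baseline:
            return i
    return None
```

On a log with the shape described above, it returns the position of the sudden jump, which tells you how far training got before diverging.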

solutions

cuda runtime error (2) : out of memory

symptoms

train.lua crashes shortly after starting, with the error above

solutions

  • lower the num_layers parameter (defaults to 2)
  • lower the rnn_size parameter (defaults to 128)
  • lower the batch_size parameter (defaults to 50)
  • use nvidia-smi (available on Linux and some Windows setups) to check how much GPU memory is free
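nvidia-smi can also report memory in machine-readable form through its --query-gpu and --format options. The Python sketch below wraps that query. The helper names are hypothetical, and it assumes nvidia-smi is on PATH and reports memory in MiB when nounits is given.

```python
import subprocess

# Ask nvidia-smi for per-GPU memory figures as bare CSV numbers.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.total,memory.used,memory.free",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV rows into one dict of MiB values per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, total, used, free = (int(field.strip()) for field in line.split(","))
        gpus.append({"index": index, "total_mib": total,
                     "used_mib": used, "free_mib": free})
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and return parsed memory figures (needs an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

If free_mib is already low before train.lua starts, another process is holding GPU memory. Otherwise, reduce the parameters above until training fits.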