Experimenting with LSTMs for text generation. I trained a 1-layer LSTM with softmax, as in most tutorials, and it learned pretty quickly to make English-looking words, but a 2-layer network was taking much longer to produce anything but gibberish; I started out with losses of 3.10+.
So, I pretrained for only 1 epoch on a 1-layer network, transferred those embryonic weights to the 1st layer in the 2-layer, and the loss is already under 2. Very impressive kick-start to training!
#neuralnetworks #datascience
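For anyone curious, the weight transfer is roughly this. A PyTorch-style sketch, not my exact code; CharLSTM, vocab_size and hidden_size are all illustrative:

```python
import torch.nn as nn

vocab_size, hidden_size = 100, 256  # illustrative sizes

class CharLSTM(nn.Module):
    """Char-level LSTM with a softmax head, 1 or 2 layers."""
    def __init__(self, num_layers):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)  # softmax lives in the loss/sampler

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.head(out), state

one_layer = CharLSTM(num_layers=1)   # pretrain this for ~1 epoch first
two_layer = CharLSTM(num_layers=2)

# Copy every pretrained tensor whose name and shape match into the 2-layer
# model: the layer-0 LSTM weights (lstm.weight_ih_l0, lstm.weight_hh_l0 and
# their biases) plus the embedding and output layer, which is my guess at
# what "transferring to the 1st layer" covers. Layer 1 keeps its random init.
target = two_layer.state_dict()
for name, tensor in one_layer.state_dict().items():
    if name in target and target[name].shape == tensor.shape:
        target[name] = tensor.clone()
two_layer.load_state_dict(target)
```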
Update: it continues to train much, much faster than an LSTM without pretraining. Previous models without pretraining hovered above a loss of 3 for many epochs; this one is dropping nicely. It's on its fifth epoch now and has fallen from ~2 into the 1.4 range.
Also, the 2-layer network without pretraining couldn't generate anything in the first few epochs, while this one produced word-like output in its first epoch. I only pretrained it as a single layer for one epoch, and it's made a huge difference.
#neuralnetworks #datascience
@cathal I use something like that for a password generator, but your words are better
Aaaand we're back up to a loss of 9, with the generator creating long sequences of low-entropy words like "it it it it it" and "eneaenenenenent enene tet in teentent evenent". Perhaps it's vaulted out of a local minimum and will find a deeper well of meaning? Perhaps this is the end for this iteration of Tiny Nietzsche? Stay tuned. Or don't.
@cathal
> eneaenenenenent enene tet in teentent evenent
Chris Waddle
@iona That's letterwang!
One of the things I like best about training char-rnns is the compelling fake words they invent from their developing statistical models during training (rough sampling sketch after the list):
- 'pariefation'
- 'diffict'
- 'enfitily'
- 'beliexcabrections'
- 'grat asphyeish'
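Words like those fall out of the sampling loop over the char model's softmax while it's still half-trained. A rough sketch, assuming the model interface from the earlier snippet; the seed, length and temperature are made-up defaults, not my actual settings:

```python
import torch

def sample_chars(model, idx_to_char, char_to_idx,
                 seed="the ", length=200, temperature=0.8):
    """Draw one character at a time from the model's softmax.
    Early in training the distribution is still diffuse, which is
    where the invented not-quite-words come from."""
    model.eval()
    ids = torch.tensor([[char_to_idx[c] for c in seed]])
    out = list(seed)
    state = None
    with torch.no_grad():
        logits, state = model(ids, state)              # warm up on the seed text
        for _ in range(length):
            probs = torch.softmax(logits[:, -1] / temperature, dim=-1)
            next_id = torch.multinomial(probs, num_samples=1)
            out.append(idx_to_char[next_id.item()])
            logits, state = model(next_id, state)      # feed the sampled char back in
    return "".join(out)
```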