Alright, it's round 2 of making overcomplicated models to generate fake Nietzsche!
This time I'm experimenting with a 2-layer LSTM, but the second layer is smaller. The interesting bit is that I'm concatenating the final states of both LSTMs, so the next-letter-prediction layer receives both a 'short' and a 'long' view over the text.
It's still very possible that the model is too complex for the task/dataset, though. We'll see!
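For the curious, a minimal NumPy sketch of the idea: two stacked LSTMs, with the final hidden state of each concatenated before the next-character projection. The hidden sizes (128/64), vocab size, and sequence length here are made-up placeholders, not the actual model's settings, and the weights are random, so this only shows the shapes/wiring:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_params(n_in, n_hidden):
    # one fused weight matrix for the four gates (i, f, o, g), plus bias
    return (rng.normal(0, 0.1, (4 * n_hidden, n_in + n_hidden)),
            np.zeros(4 * n_hidden))

def run_lstm(xs, params, n_hidden):
    """Run an LSTM over a sequence; return (all step outputs, final hidden state)."""
    W, b = params
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    outs = []
    for x in xs:
        z = W @ np.concatenate([x, h]) + b
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outs.append(h)
    return np.stack(outs), h

VOCAB, BIG, SMALL, SEQ_LEN = 57, 128, 64, 40  # placeholder sizes

# fake one-hot character sequence standing in for a text window
seq = np.eye(VOCAB)[rng.integers(0, VOCAB, SEQ_LEN)]

lstm1 = lstm_params(VOCAB, BIG)    # first, larger layer
lstm2 = lstm_params(BIG, SMALL)    # second, smaller layer stacked on top

outs1, h1 = run_lstm(seq, lstm1, BIG)
_, h2 = run_lstm(outs1, lstm2, SMALL)

# concatenate both final states: the prediction layer sees both views
features = np.concatenate([h1, h2])                 # shape (BIG + SMALL,)
W_out = rng.normal(0, 0.1, (VOCAB, BIG + SMALL))
logits = W_out @ features                           # next-char logits
```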
#datascience #neuralnetworks
An interesting effect here, and I wonder if it's common to all Seq2Seq networks: as the shitty-Nietzsche network trains and the loss decreases, the lower-temperature* text samples seem to get better, but the higher-temperature* ones get dumber and more chaotic. My intuition is still grasping at why, but it kinda makes sense.
*temperature: randomness applied to the predicted characters to perturb the outputs. It can force the network to be more creative by making it "recover" from the randomness.
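Concretely, the usual trick is to divide the logits by the temperature before the softmax: low T sharpens the distribution toward the most likely character, high T flattens it so unlikely characters get picked more often. A small sketch (the example logits are arbitrary, not from the model):

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    # T < 1 sharpens the distribution; T > 1 flattens it
    scaled = logits / temperature
    scaled -= scaled.max()                         # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

rng = np.random.default_rng(42)
logits = np.array([2.0, 1.0, 0.1, -1.0])           # toy next-char logits

# low temperature: nearly always picks the argmax character
cold = [sample_with_temperature(logits, 0.1, rng) for _ in range(100)]

# high temperature: samples spread across the whole vocabulary
hot = [sample_with_temperature(logits, 5.0, rng) for _ in range(100)]
```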
#neuralnetworks
"precious too
wrilt so
bewarding yonsiging,
unpeeds wond-wind throths."
Preach it, shitty partially-trained Nietzsche network