Long short-term memory recurrent neural networks, or LSTM RNNs for short, are neural networks that can memorize and regurgitate sequential data. They’ve become very popular these days, primarily because they can be used to create bots that generate articles, stories, music, poems, screenplays - you name it! How? Well, it’s because a lot of things humans do involve sequences.
To make things clearer, let me give you a few examples. An LSTM can be trained on a novel to generate sentences that look very similar to those present in the novel. If you train it with multiple novels written by the same author, the LSTM will start sounding like the author. Similarly, an LSTM trained using a collection of songs that belong to a specific genre of music will be able to generate “songs” that belong to the same genre. I hope you get the idea.
In this tutorial, I’ll show you how to use Deeplearning4J to create an LSTM that can generate sentences similar to those written by the prolific 19th-century author Emma Leslie. We’ll be using her novel Hayslope Grange as our training data, so, before we begin, I suggest you download the novel as plain text and store it somewhere on your computer.
The first thing you need to do is, of course, convert the text of the novel into a string you can use in your program. The easiest way to do so is to use the IOUtils.toString() method, which is available in the Apache Commons IO library.
You are free to use all the text that’s available in the novel. To speed things up, however, I will be using only the first 50,000 characters.
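Here’s a minimal sketch of that step. The filename hayslope_grange.txt is my own placeholder; use whatever name you saved the novel under:

```java
import org.apache.commons.io.IOUtils;

import java.io.FileInputStream;
import java.nio.charset.StandardCharsets;

// Read the entire novel into a string...
String novel = IOUtils.toString(
        new FileInputStream("hayslope_grange.txt"), StandardCharsets.UTF_8);

// ...and keep only the first 50,000 characters to speed up training.
String inputData = novel.substring(0, Math.min(novel.length(), 50_000));
```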
Our neural network will have an input LSTM layer, a hidden layer, and an output RNN layer. How many neurons should be present in these layers? Well, that depends on how many distinct characters your network can handle.
For now, let’s say our network can handle all the letters of the English alphabet, in both uppercase and lowercase, all the digits from 0-9, and a few special characters.
The following string has all the characters I will be using:
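The exact string isn’t reproduced here, but based on the description above (uppercase and lowercase letters, the digits 0-9, and a few special characters), it would look something like this; the variable name validCharacters is my own:

```java
// A plausible character set matching the description above:
// lowercase and uppercase letters, digits, and a few special characters.
String validCharacters =
        "abcdefghijklmnopqrstuvwxyz"
        + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        + "0123456789"
        + " \n.,!?'\"();:-";
```

Any character the network encounters must appear in this string; everything else should be stripped from the training text beforehand.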
Our input and output layers must have one neuron for each character that’s present in the above string. As for the hidden layer, I’ll use 30 neurons. You are free to change that number.
To create an input LSTM layer with DL4J, you must use the GravesLSTM class. Similarly, to create an output RNN layer, you must use the RnnOutputLayer class. While creating these layers, you must remember to specify the activation functions they should use. For best results, using TANH for the input layer and SOFTMAX for the output layer is recommended.
Accordingly, add the following code:
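The original listing isn’t shown above, so here’s a sketch of what it can look like with the pre-1.0 DL4J API. The names inputLayer and outputLayer match the ones used later in this tutorial; validCharacters is assumed to be the character-set string, and MCXENT (multi-class cross entropy) is my choice of loss function:

```java
import org.deeplearning4j.nn.conf.layers.GravesLSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

int nCharacters = validCharacters.length();  // one neuron per character
int hiddenLayerSize = 30;                    // the 30 hidden neurons mentioned above

// Input LSTM layer: one input neuron per character, TANH activation
GravesLSTM inputLayer = new GravesLSTM.Builder()
        .nIn(nCharacters)
        .nOut(hiddenLayerSize)
        .activation(Activation.TANH)
        .build();

// Output RNN layer: one output neuron per character, SOFTMAX activation
RnnOutputLayer outputLayer = new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
        .nIn(hiddenLayerSize)
        .nOut(nCharacters)
        .activation(Activation.SOFTMAX)
        .build();
```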
Well, now that our neuron layers are ready, we can use them to create a MultiLayerNetwork object. You must, however, also create a NeuralNetConfiguration.Builder in order to configure the network. As always, during the configuration, you must specify details such as which optimization algorithm and updater to use, how to initialize the weights, and what the learning rate should be.
Here’s the configuration I used:
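The exact configuration isn’t reproduced above; the following is a plausible reconstruction for the pre-1.0 DL4J API, with placeholder hyperparameters of my own choosing (the ADAM updater, Xavier weight initialization, and a learning rate of 0.01):

```java
import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.Updater;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;

MultiLayerConfiguration configuration = new NeuralNetConfiguration.Builder()
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .updater(Updater.ADAM)
        .weightInit(WeightInit.XAVIER)
        .learningRate(0.01)
        .list()
        .layer(0, inputLayer)   // the GravesLSTM layer created earlier
        .layer(1, outputLayer)  // the RnnOutputLayer created earlier
        .build();

MultiLayerNetwork network = new MultiLayerNetwork(configuration);
network.init();
```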
As you might already know, DL4J expects you to place all your training data inside INDArray objects. That means we must now create two INDArray objects: one for the input values and one for the expected output values, or labels.
INDArray inputArray = Nd4j.zeros(1, inputLayer.getNIn(), inputData.length());
INDArray inputLabels = Nd4j.zeros(1, outputLayer.getNOut(), inputData.length());
While using an LSTM, the label for an input value is nothing but the next input value. For example, if your training data is the string “abc”, for the input value “a”, the label will be “b”. Similarly, for the input value “b”, the label will be “c”.
Keeping that in mind, you can populate the INDArray objects using the following code:
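The original loop isn’t shown above; here’s one way to write it, assuming validCharacters holds the character-set string. Each character is one-hot encoded at its time step, with the character that follows it as its label:

```java
// For each position in the training text, set the input to the current
// character and the label to the character that follows it.
for (int i = 0; i < inputData.length() - 1; i++) {
    int currentCharIndex = validCharacters.indexOf(inputData.charAt(i));
    int nextCharIndex = validCharacters.indexOf(inputData.charAt(i + 1));
    inputArray.putScalar(new int[]{0, currentCharIndex, i}, 1);
    inputLabels.putScalar(new int[]{0, nextCharIndex, i}, 1);
}
```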
Using the two INDArray objects, you can now create a DataSet that can be directly used by your neural network.
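Creating it is a one-liner, assuming inputArray and inputLabels are the arrays populated above:

```java
import org.nd4j.linalg.dataset.DataSet;

DataSet dataSet = new DataSet(inputArray, inputLabels);
```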
At this point, all you need to do is call the fit() method and pass the data set to the neural network. However, trying to fit the data set just once is usually not enough. You must fit it several times before the neural network becomes accurate. I suggest you do it a thousand times at least.
Obviously, fitting the data a thousand times is going to take quite a long time. To be able to see the intermediate results the network generates for every iteration, you can pass test data to the network after each call to the fit() method.
The test data, of course, will be another INDArray object whose size is equal to the size of the network’s input layer. It will contain the index of just one character, which will also be the first character of the network’s generated text.
Using that character, our LSTM will generate a new character. By passing that generated character back to the LSTM as the next test data, you can generate another character. By repeating these steps again and again, you can generate strings that are arbitrarily long.
The following code shows you how to generate a string that’s 200 characters long for every iteration:
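The original listing isn’t reproduced above; here’s a sketch of the training-and-sampling loop it describes, assuming the network, dataSet, and validCharacters objects built earlier. The seed character ('T') and the thousand iterations are illustrative choices of mine:

```java
for (int iteration = 0; iteration < 1000; iteration++) {
    network.fit(dataSet);

    // Reset the LSTM's internal state before generating new text.
    network.rnnClearPreviousState();

    // Test data: a single time step containing one seed character.
    INDArray testInput = Nd4j.zeros(1, validCharacters.length(), 1);
    testInput.putScalar(new int[]{0, validCharacters.indexOf('T'), 0}, 1);
    INDArray output = network.rnnTimeStep(testInput);

    StringBuilder generated = new StringBuilder("T");
    for (int i = 0; i < 200; i++) {
        // Pick the output neuron with the highest value...
        int bestNeuron = Nd4j.argMax(output, 1).getInt(0);
        generated.append(validCharacters.charAt(bestNeuron));

        // ...and feed the generated character back in as the next test data.
        testInput = Nd4j.zeros(1, validCharacters.length(), 1);
        testInput.putScalar(new int[]{0, bestNeuron, 0}, 1);
        output = network.rnnTimeStep(testInput);
    }
    System.out.println(generated);
}
```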
As you can see in the above code, we are finding the output neuron with the highest value and using it to determine the next character. Also note that you must always remember to call the rnnClearPreviousState() method in every iteration.
And that’s all there is to creating an LSTM. Go ahead and run the program to see it generate sentences and paragraphs that are eerily similar to valid English prose.
Here are some paragraphs my LSTM generated after being trained for nearly an hour:
As you can see, the paragraphs still contain quite a lot of gibberish. By increasing the number of iterations, and by using more of the novel, you can get better results.
You now know how to create a simple LSTM using Deeplearning4J. Although we created a character-based LSTM, it is possible to create word-based LSTMs, which generate sentences that are more natural. Doing so, however, would be somewhat more complex.