This week I experimented with using different kinds of neural nets to generate text and music (listen to my music generation results here). I grabbed the Kaggle Wine Reviews dataset and used that text to train a character prediction model.
I first used a plain RNN. Each letter feeds into a linear layer, and the output of that layer loops back: it is concatenated with the next letter and given as input to that same layer. I was also curious to try a “skip” connection, taking inspiration from ResNet, which gives the data two parallel paths: one direct, and one through a linear-ReLU combination. I take the output from one character and feed it not only into the processing of the next character, but also along a shortcut path, so that it becomes part of the input several characters later. I also tried a variation where the last 10 characters were fed into a small neural net, whose output I treated as the hidden state passed to the next timestep.
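As a rough illustration of the recurrence and the delayed shortcut described above, here is a minimal forward-pass sketch in plain Python. It is not the code used for these experiments: the function names, the `tanh` nonlinearity, the random initialisation, and the way the shortcut state is simply concatenated onto the input are all assumptions, and training and the output layer that would map the hidden state to letter probabilities are left out.

```python
import math
import random

def make_linear(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (illustrative init)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def linear(layer, x):
    """Apply the dense layer: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def run_rnn(char_ids, vocab_size, hidden_size, skip=0, seed=0):
    """Return the hidden state after each character.

    skip=0 is the plain RNN. skip=k (an assumed design) also concatenates
    the hidden state from k steps back, or zeros if there is none yet.
    """
    rng = random.Random(seed)
    n_in = vocab_size + hidden_size * (2 if skip else 1)
    layer = make_linear(n_in, hidden_size, rng)
    zeros = [0.0] * hidden_size
    history = []
    for idx in char_ids:
        prev = history[-1] if history else zeros
        # Current letter concatenated with the looped-back output.
        x = one_hot(idx, vocab_size) + prev
        if skip:
            # Shortcut path: the state from `skip` characters earlier.
            x += history[-skip] if len(history) >= skip else zeros
        history.append([math.tanh(v) for v in linear(layer, x)])
    return history
```

Calling `run_rnn([0, 1, 2, 1], vocab_size=3, hidden_size=4, skip=2)` returns four hidden states of four values each; with `skip=0` the same loop reduces to the plain RNN.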
All of these variations trained well, but I didn’t find a combination that reliably trained more quickly or reached better accuracy. (I tested the models on their ability to predict the next letter of the wine review corpus after being fed 24 characters in a row.) I also tested a GRU layer on this task; its accuracy was similar, but consistently a little better.
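To make the evaluation setup concrete, here is a small sketch of how next-letter accuracy over 24-character windows could be measured. The names `make_windows` and `next_char_accuracy` are hypothetical, `predict` stands in for any trained model that maps a context string to its top guess, and scoring by exact match on the top guess is an assumption about what "accuracy" means here.

```python
def make_windows(text, context_len=24):
    """Yield (context, target): `context_len` characters and the one that follows."""
    for i in range(len(text) - context_len):
        yield text[i:i + context_len], text[i + context_len]

def next_char_accuracy(predict, text, context_len=24):
    """Fraction of windows where predict(context) equals the true next character."""
    pairs = list(make_windows(text, context_len))
    if not pairs:
        # Text too short to form even one window.
        return 0.0
    hits = sum(1 for context, target in pairs if predict(context) == target)
    return hits / len(pairs)
```

Because `predict` is just a function, the same helper can score an RNN, a GRU, or a trivial baseline on the same held-out text.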
As a fun aside from trying out these models, I was able to create some entertaining wine reviews. I start the models off with an initial seed (perhaps “Drink now through 2025. ”) and then ask the model to predict the next letter. I then feed this back into the model to ask for the next letter after that. When the model makes a prediction, you can either take its top guess or sample randomly, based on the likelihood it assigns to each letter. I found the latter approach yielded much better results (when taking only the top prediction, there was far too much repetition, and I ended up with reviews like “cherries and fruit and cherries and fruit and cherries and fruit…”).
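The generation loop and the two ways of choosing a letter can be sketched as follows. This is only an illustration: a toy bigram counter (`fit_bigram`, a hypothetical helper) stands in for the trained network, and all function names are assumptions. The loop itself is the part that matters: predict, pick a letter, append it, repeat.

```python
import random
from collections import Counter, defaultdict

def fit_bigram(text):
    """Toy stand-in for the trained model: next-character counts per character."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def next_char_probs(counts, context):
    """Return {char: probability} for the character following `context`."""
    options = counts[context[-1]]
    total = sum(options.values())
    return {ch: n / total for ch, n in options.items()}

def generate(counts, seed_text, length, sample=True, rng=None):
    """Extend a non-empty `seed_text` one character at a time."""
    rng = rng or random.Random(0)
    text = seed_text
    for _ in range(length):
        probs = next_char_probs(counts, text)
        if not probs:
            # The last character was never seen with a successor.
            break
        if sample:
            # Sample in proportion to the predicted likelihoods.
            chars = list(probs)
            ch = rng.choices(chars, weights=[probs[c] for c in chars])[0]
        else:
            # Greedy: always take the top guess.
            ch = max(probs, key=probs.get)
        text += ch
    return text
```

With `sample=False` the output is fully deterministic, which is why it tends to fall into loops; with `sample=True` the less likely letters get picked occasionally, which breaks the repetition.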
At each step, the model is predicting only the next character. It initially has no knowledge of words or grammar, and it only learns to predict the likelihood of each letter (a-z, A-Z, 0-9, and a few special letters with accents). Impressively, it learned to create fairly plausible reviews. Occasionally, it would create new words (including my new favorite – “winemanically”).