Generating Violin/Piano Duets
Multiple Instrument Generation
This week I worked on expanding my music generation model to include violin and piano duets. Unfortunately, MIDI violin sounds awful, so you'll have to imagine a decent-sounding violinist playing these clips, but I'm excited that the model is learning to create violin and piano parts that intertwine. As these generations get better, I'm planning to record live musicians playing a few of them as part of my final project.
I also passed the LSTM-generated MIDI file to MuseScore, a free music notation program, so that I can see it written out as a page of sheet music. The output could use some hand formatting, but you can see the LSTM generation as a musical score here.
I initially thought of splitting the generation process in half: first creating the piano accompaniment, then passing it to a second network that would generate a violin melody on top. However, I really wanted the violin and piano parts to be more interconnected. In great chamber music, the violinist will begin a melodic thought and pass it to the pianist, who will finish it, and vice versa. To capture this, I decided to ask the LSTM to generate both parts simultaneously.
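To make this concrete, here's a minimal sketch (not my actual preprocessing code; the names and values are illustrative) of how both parts can be woven into a single chronological token stream that one LSTM consumes:

```python
from dataclasses import dataclass

@dataclass
class NoteEvent:
    instrument: str   # "violin" or "piano"
    pitch: int        # MIDI pitch number, 0-127
    offset: float     # beats since the previous event

# Both parts live in one chronologically ordered stream, so the model
# can learn phrases that start in one instrument and finish in the other.
events = [
    NoteEvent("piano", 60, 0.0),   # piano begins the phrase...
    NoteEvent("piano", 64, 0.5),
    NoteEvent("violin", 67, 0.5),  # ...and the violin picks it up
    NoteEvent("violin", 72, 0.5),
]

# Collapse each event into a single vocabulary token for the LSTM,
# so "violin:67" and "piano:67" are distinct symbols.
tokens = [f"{e.instrument}:{e.pitch}" for e in events]
print(tokens)  # ['piano:60', 'piano:64', 'violin:67', 'violin:72']
```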
My other concern was that the LSTM would never suggest violin notes. In chamber music, a pianist might average ten notes or more for every violin note, so I worried that violin notes would be so unlikely that the LSTM would never propose them. This did turn out to be a problem at first, but I was able to overcome it by changing the embedding size and adjusting the way I sampled the output.
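I won't reproduce my exact fix here, but one simple way to counteract this kind of class imbalance at sampling time is to add a small log-space bonus to the rare instrument's tokens before sampling. The boost value and mask below are illustrative:

```python
import numpy as np

def sample_with_boost(logits, violin_mask, boost=1.5, temperature=1.0):
    """Sample the next token, nudging up the probability of violin tokens.

    logits:      model outputs over the vocabulary (1-D array)
    violin_mask: boolean array, True where the token is a violin note
    boost:       additive bonus (in log space) for violin tokens
    """
    adjusted = logits / temperature
    adjusted = adjusted + boost * violin_mask  # raise the rare violin tokens
    probs = np.exp(adjusted - adjusted.max())  # stable softmax
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

# Toy example: a 4-token vocabulary where tokens 2 and 3 are violin notes.
logits = np.array([2.0, 1.5, -1.0, -1.2])
violin_mask = np.array([False, False, True, True])
next_token = sample_with_boost(logits, violin_mask)
```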
Preprocessing and Data Augmentation
Taking the data from MIDI format to LSTM input proved more difficult than I expected. For solo piano repertoire, I only needed to collect each note's pitch and timing offset. The duration of a piano note does matter, but it isn't essential; the music sounds decent even if every note is played as a quarter note.
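As a rough illustration of how simple the solo piano case is, here's a sketch using the pretty_midi library (this isn't my exact pipeline, but the idea is the same): each note is reduced to a pitch and the time elapsed since the previous note, with durations deliberately ignored.

```python
import pretty_midi

def piano_events(path):
    """Extract (pitch, offset) pairs from a solo piano MIDI file.

    Offset is the gap in seconds since the previous note's start;
    note durations are deliberately dropped, per the simplification above.
    """
    pm = pretty_midi.PrettyMIDI(path)
    notes = sorted(
        (n for inst in pm.instruments if not inst.is_drum for n in inst.notes),
        key=lambda n: n.start,
    )
    events, prev_start = [], 0.0
    for n in notes:
        events.append((n.pitch, n.start - prev_start))
        prev_start = n.start
    return events
```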
The violin, on the other hand, frequently holds notes for a long time. Additionally, the multiple-instrument files were often in slightly different formats: some had many tracks, others listed a flute (or another melody instrument) instead of a violin. It took some work to write code that could handle the general case.
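The normalization ended up looking roughly like the sketch below (again using pretty_midi; the set of General MIDI program numbers treated as "melody instruments" is illustrative, not my exact list):

```python
import pretty_midi

# General MIDI program numbers (0-indexed). Any melody instrument's
# notes get mapped onto the violin part of the training data.
MELODY_PROGRAMS = {40, 41, 42, 71, 73}  # violin, viola, cello, clarinet, flute
PIANO_PROGRAMS = set(range(8))          # the acoustic/electric piano family

def split_parts(path):
    """Collapse an arbitrary multi-track MIDI file into (melody, piano) note lists."""
    pm = pretty_midi.PrettyMIDI(path)
    melody, piano = [], []
    for inst in pm.instruments:
        if inst.is_drum:
            continue
        if inst.program in MELODY_PROGRAMS:
            melody.extend(inst.notes)  # flute, cello, etc. all become "violin"
        elif inst.program in PIANO_PROGRAMS:
            piano.extend(inst.notes)
    melody.sort(key=lambda n: n.start)
    piano.sort(key=lambda n: n.start)
    return melody, piano
```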
Lastly, I had to reconsider the way I was augmenting the data. For solo piano, I was taking every piece and transposing it into every possible key (since I'd already translated everything into an input array, this was simply a matter of shifting the array up or down). I think this is a decent approximation for the piano: even though a few pieces in the training set probably become physically unplayable, for the most part a pianist can play a piece in a different key.
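Because each piece is already an array of integer pitches, the transposition itself is nearly a one-liner. A sketch:

```python
import numpy as np

def transpositions(pitches):
    """Yield the piece in all 12 keys by shifting every pitch together."""
    pitches = np.asarray(pitches)
    for shift in range(-6, 6):  # down a tritone through up a fourth: 12 keys
        yield pitches + shift
```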
For the violin, this is not automatically the case. First, the violin can't go below the open G of its lowest string. Second, there are much more specific constraints on which pairs of notes the violin can play at once: to play two notes simultaneously, the violinist must be able to play one note on one string and the other on a neighboring string. Oftentimes these double stops (or even triple stops, when three notes on three different strings are played at once) depend on one of the strings being open (i.e., the sound of the string when the violinist doesn't use any fingers).
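The range constraint, at least, is easy to check: any transposition that pushes the violin part below the open G can be discarded outright. A sketch (the upper bound here is an illustrative guess, and double-stop feasibility is much harder to verify automatically):

```python
OPEN_G = 55          # MIDI number of the violin's open G string (G3)
PRACTICAL_TOP = 100  # illustrative ceiling; the real limit depends on the player

def violin_shift_ok(violin_pitches, shift):
    """Keep only transpositions that stay on the instrument.

    This catches the range constraint; double- and triple-stop
    feasibility is not handled here.
    """
    lo = min(violin_pitches) + shift
    hi = max(violin_pitches) + shift
    return lo >= OPEN_G and hi <= PRACTICAL_TOP
```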
However, while I had a very large set of solo piano MIDI files, the violin/piano set is much smaller. I'm currently working out the best way to deal with this. Possibly I'll train on a large set of data (including cello/piano, flute/piano, and violin/piano transposed into unrealistic keys) and then fine-tune on the smaller set of actual violin/piano pieces. Or I'll need to think of a new way to augment the data.
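If I go the fine-tuning route, the training schedule itself is straightforward. A hypothetical PyTorch-style sketch of the two-stage plan (all hyperparameters illustrative; this isn't code I've run yet):

```python
import torch

def two_stage_train(model, big_mixed_loader, violin_piano_loader, loss_fn):
    """Pretrain on the large mixed set, then fine-tune on real violin/piano
    data at a lower learning rate so earlier learning isn't wiped out."""
    stages = [(big_mixed_loader, 1e-3, 10),    # pretraining
              (violin_piano_loader, 1e-4, 3)]  # fine-tuning
    for loader, lr, epochs in stages:
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                loss = loss_fn(model(x), y)
                loss.backward()
                opt.step()
    return model
```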
I'm including this audio as an example of how frustrating the violin/piano preprocessing was (and mostly because this audio file cracks me up; I've played piano for many different violinists in the past, but never anyone quite this bad!) After a while spent thinking I'd accidentally allowed a transposing clarinet into the training data, I finally tracked down a bug in the way I was doing the data augmentation. Lesson learned: always double-check the results of data augmentation, even when you think you're just doing something small and obvious that you couldn't possibly mess up!
Upcoming…
We're down to our last few weeks here in the OpenAI Scholars program, and we're now all focused on our final projects. I'll be continuing to focus on music generation. Next week I plan to add a music critic to help select only the best generations. I'm also looking into creating metrics to measure the creativity and musicality of the generations. After that, I'll polish up the code and get ready to open-source everything.