Integrated Music Composition with MuseScore and Neural Networks
Project Overview
In the realm of music composition, this groundbreaking project seamlessly merges the creative instincts of human composers with the advanced capabilities of AI music generation models. My team and I developed a unique plugin for the MuseScore composition application, a leading platform used by composers worldwide. This plugin integrates a Recurrent Neural Network (RNN) that autonomously generates new measures of music based on an initial seed melody. The integration within the MuseScore environment allows composers to effortlessly utilize advanced machine composition tools, fostering innovation and expanding creative boundaries.
Objective
Music composition is a delicate balance of artistry and technical precision. Composers often seek fresh and innovative sounds, a pursuit that can be hindered by creative blocks and the limits of human imagination. As a composer myself, I constantly struggle with writer's block and have no solution other than to wait for inspiration to strike. Our project aims to break down these barriers by offering composers a robust, intuitive, and practical tool for generating new musical ideas. By integrating a neural network with the open-source composition application MuseScore, we’re making cutting-edge AI music composition accessible and straightforward for composers across the globe. This approach not only amplifies the diversity and richness of music creation but also democratizes advanced composition tools.
Related Work
Recent advancements in music generation leverage various innovative methods, achieving remarkable results. Taketo Akama's work integrates a unique long short-term memory (LSTM) architecture and musical domain knowledge to create symbolic music. Meanwhile, Google's MusicLM artfully crafts multi-instrumental sound from simple text descriptions. Similarly, the Theme Transformer employs a sophisticated process to compose full-length symbolic music pieces from brief theme melodies, closely emulating the human approach to music composition.
Furthermore, efforts like those by Sulun enhance the emotional depth in music generation by training models to compose based on specific emotions, using a specially augmented dataset. However, despite these significant strides, there's a noticeable gap: current technologies have not been designed to augment the human composition process, which raises concerns about the potential displacement of human composers.
We also studied the MidiNet model, a convolutional generative adversarial network (CGAN) that learns from MIDI tabs of pop songs to generate melodies. This model not only competes well with MelodyRNN in generating realistic tunes but also excels at producing interesting melodies.
In a different vein, the Google Brain Team's Magenta project introduced MelodyRNN, a family of recurrent neural network (RNN) models designed to instill long-term structure in generated music. This work produced the Lookback RNN and the Attention RNN, both designed to address the difficulty conventional RNNs have maintaining structure over extended musical pieces.
For our music generation tasks, we employed Magenta's MelodyRNN models, thanks to their accessibility and comprehensive library. Additionally, to help visualize the melodies created, we utilized the MusPy library, a valuable resource for data management in symbolic music generation. This tool significantly simplifies preprocessing, turning raw files into network-ready data, and offers seamless integration with popular deep learning libraries like PyTorch and TensorFlow; a small usage sketch is shown below.
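The following is a minimal sketch of how MusPy can be used to inspect and visualize a generated melody; the filename is a placeholder, and this is not our project's exact code.

```python
# Minimal sketch (placeholder filename, not our project's exact code) of using
# MusPy to inspect and visualize a generated melody.
import matplotlib.pyplot as plt
import muspy

music = muspy.read_midi("generated.mid")              # load a MIDI file into a Music object
pianoroll = muspy.to_pianoroll_representation(music)  # network-ready array (time steps x 128 pitches)
print(pianoroll.shape)

muspy.show_pianoroll(music)                           # quick piano-roll visualization
plt.show()
```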
Approach
Our project harnesses several RNN models from Google's Magenta project, which are well established in music generation. The selected model integrates with our plugin (as seen below) and generates continuations of the seed melodies the plugin provides.
The user begins by selecting, in their score, the seed melody they wish to send to the model. The plugin takes these notes and converts them into a format the model can interpret (MIDI) using csvmidi. After the continuation is generated, it is converted back into CSV using midicsv and displayed on the score; a sketch of this round trip is shown below.
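The sketch below outlines that round trip in Python, assuming the midicsv tools and Magenta's melody_rnn_generate script are installed and on the PATH and that a pretrained attention_rnn.mag bundle is available locally; the filenames and helper function are illustrative rather than the plugin's actual code.

```python
# Illustrative sketch of the MIDI round trip described above (not the plugin's
# actual code); assumes csvmidi, midicsv, and melody_rnn_generate are on PATH.
import subprocess
from pathlib import Path

def generate_continuation(seed_csv, out_dir, measures=4, temperature=1.0):
    # 1. csvmidi: turn the CSV exported from the score into a MIDI primer.
    subprocess.run(["csvmidi", seed_csv, "seed.mid"], check=True)

    # 2. melody_rnn_generate: continue the primer with the Attention RNN.
    #    16 steps per measure assumes 4/4 and Magenta's default 16th-note
    #    quantization; exact primer/continuation accounting is simplified here.
    subprocess.run([
        "melody_rnn_generate",
        "--config=attention_rnn",
        "--bundle_file=attention_rnn.mag",   # pretrained bundle, assumed local
        "--primer_midi=seed.mid",
        f"--num_steps={measures * 16}",
        f"--temperature={temperature}",
        "--num_outputs=1",
        f"--output_dir={out_dir}",
    ], check=True)

    # 3. midicsv: convert the generated MIDI (written into out_dir) back to CSV
    #    so the plugin can parse it and place the notes on the score.
    generated_mid = sorted(Path(out_dir).glob("*.mid"))[-1]
    subprocess.run(["midicsv", str(generated_mid), "continuation.csv"], check=True)
    return "continuation.csv"
```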
The Interface
The plugin window pops up after the user runs the plugin. From it, the user can select how many measures they wish to generate, set the complexity (or temperature) of the generation, and trigger the generation itself.
Results
We tested the plugin ourselves to see whether it accomplished the desired behavior of outputting a note sequence to the score. These tests were generally successful: the plugin functioned as desired. Below, we show the first four measures of Twinkle, Twinkle, Little Star, a commonly used input sequence for music generation.
Once the generate button was pushed, the following sequence appeared on the page. We can see that the model has clearly generated a melody and that the input/output process between the generator and the plugin is working.
The Attention RNN model also has a parameter called temperature, which controls how conservative or adventurous the model’s output will be; higher values encourage it to take more risks (a small illustration of the idea follows below). Given the first four bars of Twinkle, Twinkle, Little Star, the model generated the following four measures of music with a higher temperature. We can see that the generator’s output in this case is much less controlled and more exciting than the previously generated sequence.
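The snippet below illustrates the general idea behind temperature sampling, where logits are divided by the temperature before the softmax; it is a generic illustration, not Magenta's internal code.

```python
# Generic illustration of temperature sampling (not Magenta's internal code):
# logits are divided by the temperature before softmax, so higher temperatures
# flatten the distribution and make riskier choices more likely.
import numpy as np

def sample_with_temperature(logits, temperature=1.0):
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())     # numerically stable softmax
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.2]                      # toy scores over three candidate notes
print([sample_with_temperature(logits, t) for t in (0.5, 1.0, 1.5)])
```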
The model also responded quite well to different time signatures such as 3/4 or 7/8, and produced the results shown below in a longer generated sequence. In this example, the model appears to be completing four-measure phrases, a hallmark of long-term musical structure.
We also tested a few other MelodyRNN models, including the Basic RNN and the Lookback RNN, but found the Attention RNN produced the best results; all the results above therefore come from the Attention RNN. The configuration names for switching between these models are listed below.
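For reference, the three variants we compared correspond to the following Magenta configuration names and the pretrained bundle files Magenta distributes (assumed downloaded locally); swapping these values in the earlier sketch selects the model.

```python
# Magenta configuration names and commonly distributed bundle files for the
# three MelodyRNN variants we compared (bundles assumed downloaded locally).
MELODY_RNN_VARIANTS = {
    "basic":     ("basic_rnn",     "basic_rnn.mag"),
    "lookback":  ("lookback_rnn",  "lookback_rnn.mag"),
    "attention": ("attention_rnn", "attention_rnn.mag"),
}
```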
Limitations
Only notes a 16th note or longer can be generated by the model; shorter values such as 32nd notes are not supported.
Triplets, or any other kind of tuplet, cannot be read from the model's output.
With the current model, only one staff/melody can be generated at a time. However, the plugin still supports multi-staff input and output.
Future Work
There are many directions in which this plugin could be improved, but some of these advancements depend on external software updates before progress can be made.
Improving how notes are added back onto the page
Currently, the generated notes are added to the page by converting the MIDI back to CSV, manually determining note lengths, and adding each note one at a time (a simplified sketch of the length-recovery step is shown below). An improved version would keep the model output as MIDI, import that MIDI as a new file into MuseScore, copy the contents of that file, and paste it back onto the original score. This would allow for more flexibility, including tuplets and 32nd notes.
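As an illustration of that length-recovery step (not the plugin's actual code, which runs inside MuseScore's scripting environment), note durations can be recovered from midicsv output by pairing Note_on_c and Note_off_c events per pitch:

```python
# Simplified illustration (not the plugin's actual code) of recovering note
# lengths from midicsv output by pairing Note_on_c / Note_off_c events.
import csv

def notes_from_midicsv(path):
    starts, notes = {}, []
    with open(path, newline="") as f:
        for row in csv.reader(f, skipinitialspace=True):
            if len(row) < 6 or row[2] not in ("Note_on_c", "Note_off_c"):
                continue
            time, pitch, velocity = int(row[1]), int(row[4]), int(row[5])
            if row[2] == "Note_on_c" and velocity > 0:
                starts[pitch] = time                   # note starts sounding
            elif pitch in starts:                      # Note_off_c, or Note_on_c with velocity 0
                notes.append((pitch, starts.pop(pitch), time))  # (pitch, start_tick, end_tick)
    return notes
```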
Adding different models
In its current state, the plugin only supports Google Magenta's RNN models. Since the front end can stay the same, it is possible to add different back-end architectures, and perhaps more models such as transformers.
Adding multi-instrument support
This requires a different model, as the current one can only support one instrument at a time. However, the front end has already been built to handle multiple instruments, so only the model would need to change to gain this functionality.
Adding support for MuseScore 4
As of December 2023, this is not yet possible, as the development team at MuseScore has yet to figure out a way to get WriteScore(), a function necessary for this plugin, working correctly. Until that happens, this plugin can only work with version 3.6.x.
Adding support for Dorico
Dorico has yet to release an API for its software, so adding support would completely change the structure of this plugin and require a significantly different approach, as notes cannot be accessed directly from the page.
Adding custom prompt support
This feature would take the most time and would require bleeding-edge LLM and music generation technology. It would allow the user to enter a text prompt describing what they would like the generated music to sound like, e.g. joyous or lamenting, which would be fed to the model for a more nuanced generation.