I read a couple of articles in the past few weeks: one on OpenAI’s Jukebox and another on computer generated art in Art in America, “(artistically) Creative AI poses problems to art criticism”. Both discuss how AI is starting to have an impact on music and the arts.
I can recall, back when I was in college (a very long time ago), talking about computer generated artwork. The creative AI article covers some of the history of computer art, which in those days used computers to generate random patterns, some of which would be considered art.
More recent attempts at AI-created artworks use deep learning neural networks together with generative adversarial networks (GANs). These essentially involve two different neural networks.
- The first is an Art deep neural network (Art DNN) discriminator (a classification neural network) that is trained on an art genre such as classical, medieval or modern art paintings. This Art DNN is used to grade a new piece of art on how well it conforms to the genre it has been trained on. For example, an Art DNN could be trained on Monet’s body of work and would then be able to grade any new art on how well it conforms to Monet’s style.
- The second is an Art GAN, which is used to generate random artworks that are then fed to the Art DNN to determine if they’re any good. The grade is then used as reinforcement to modify the Art GAN so that it generates a better match over time.
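The grade-and-adjust loop between the two networks can be sketched in a few lines of Python. This is only a toy, not a real GAN: the "Art DNN" here is just the average of some made-up feature vectors, and the "Art GAN" is a random hill climber rather than a neural network trained with gradients. The names `train_critic`, `grade`, `generate` and `run_feedback_loop`, and the three-number "paintings", are all hypothetical.

```python
import random

def train_critic(paintings):
    """Toy 'Art DNN': learn the average feature vector of the training paintings."""
    n = len(paintings)
    dims = len(paintings[0])
    return [sum(p[d] for p in paintings) / n for d in range(dims)]

def grade(style, artwork):
    """Score in (0, 1]; 1.0 means the artwork matches the learned style exactly."""
    dist_sq = sum((a - s) ** 2 for a, s in zip(artwork, style))
    return 1.0 / (1.0 + dist_sq)

def generate(params, rng, noise=0.1):
    """Toy 'Art GAN': the current parameters plus a little random variation."""
    return [p + rng.gauss(0.0, noise) for p in params]

def run_feedback_loop(paintings, steps=2000, seed=0):
    rng = random.Random(seed)
    style = train_critic(paintings)
    params = [rng.uniform(-1.0, 1.0) for _ in style]  # random starting point
    best = grade(style, params)
    history = [best]
    for _ in range(steps):
        candidate = generate(params, rng)
        score = grade(style, candidate)
        if score > best:  # the grade is the only feedback the generator gets
            params, best = candidate, score
        history.append(best)
    return params, history

# Made-up feature vectors standing in for one artist's paintings.
monet_like = [[0.9, 0.1, 0.5], [0.8, 0.2, 0.4], [1.0, 0.0, 0.6]]
final_params, history = run_feedback_loop(monet_like)
print(f"first grade {history[0]:.3f}, last grade {history[-1]:.3f}")
```

The generator never sees the training paintings directly. It only sees the grades, which is the point of the arrangement.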
The use of these two types of networks has proved to be very useful in current AI game playing as well as in many other DNNs that don’t start with a classified data set.
However, in this case, a human artist does perform useful additional work during the process. An artist selects the paintings to be used to train the Art DNN. And the artist is active in tweaking/tuning the Art GAN to generate the (random) artwork that approximates the targeted artist.
And it’s in these two roles that there is a place for a (human) artist in creative art generation activities.
Using AI to generate songs is a bit more complex and requires at least 3 different DNNs to generate the music and another couple for the lyrics:
- First, a song tokenizer DNN, which is trained to compress an artist’s songs into, for lack of a better word, musical phrases or tokens. That way they can take the raw audio of an artist’s song and split it up into tokens, each of which takes a value from 0 to 2047. They actually compress (encode) the artist’s songs at 3 different resolutions, which apparently lose some information at each level but retain musical attributes such as pitch, timbre and volume.
- Second, a musical token generative DNN, which is trained to generate musical tokens that follow the same distribution as a selected artist’s songs. This is used to generate a sequence of musical tokens that matches an artist’s musical work. They use a technique based on sparse transformers that can generate (long) sequences of tokens based on a training dataset.
- Third, a song de-tokenizer DNN, which is trained to take the generated musical tokens (in the three resolutions) and convert them into musical compositions.
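To get a feel for the tokenize/de-tokenize round trip, here is a rough Python sketch. It assumes a plain uniform quantizer that averages each window of samples, whereas Jukebox learns its encoder and decoder with neural networks. The window sizes (8, 32, 128), the 8 kHz sample rate and the 50 Hz test wave are illustrative choices, and the function names are hypothetical. Only the 0 to 2047 token range comes from the description above.

```python
import math

CODEBOOK_SIZE = 2048  # token values run from 0 to 2047

def tokenize(samples, window):
    """Average each window of samples (range -1..1) and map it to a token 0..2047."""
    tokens = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        mean = sum(chunk) / len(chunk)
        level = round((mean + 1.0) / 2.0 * (CODEBOOK_SIZE - 1))
        tokens.append(min(CODEBOOK_SIZE - 1, max(0, level)))
    return tokens

def detokenize(tokens, window, length):
    """Turn each token back into a sample value and hold it for one window."""
    samples = []
    for token in tokens:
        value = token / (CODEBOOK_SIZE - 1) * 2.0 - 1.0
        samples.extend([value] * window)
    return samples[:length]

def reconstruction_error(samples, window):
    """Mean squared difference between the original and the round-tripped audio."""
    rebuilt = detokenize(tokenize(samples, window), window, len(samples))
    return sum((a - b) ** 2 for a, b in zip(samples, rebuilt)) / len(samples)

# One second of a 50 Hz sine wave at an assumed 8 kHz sample rate.
tone = [math.sin(2 * math.pi * 50 * t / 8000) for t in range(8000)]
errors = {window: reconstruction_error(tone, window) for window in (8, 32, 128)}
for window, err in errors.items():
    print(f"window {window:3d}: {len(tokenize(tone, window)):4d} tokens, error {err:.4f}")
```

The coarser the resolution, the fewer tokens there are to generate, but the more of the original signal is lost on the way back. That matches the trade-off described for the three levels.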
These three pretty much constitute the bulk of the work for AI to generate song music. They use data augmented with information from LyricWiki, which has the lyrics for 600K recorded songs in English. LyricWiki also has song metadata, which includes the artist, the genre, keywords associated with the song, etc. When training the music generator they add the artist’s name and genre information so that the musical token generator DNN can construct a song specific to an artist and a genre.
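Conditioning on artist and genre can be illustrated with a much simpler model than a sparse transformer. The sketch below swaps in a bigram (Markov chain) model that keeps a separate next-token table for each (artist, genre) label. The artists, genres and token sequences are made up, and `train` and `generate` are hypothetical names.

```python
import random
from collections import defaultdict

def train(songs):
    """songs: list of (artist, genre, tokens). Count next-token frequencies
    separately for each (artist, genre) label."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for artist, genre, tokens in songs:
        table = counts[(artist, genre)]
        for prev, nxt in zip(tokens, tokens[1:]):
            table[prev][nxt] += 1
    return counts

def generate(counts, artist, genre, start, length, seed=0):
    """Sample tokens one at a time, using only the table for this artist and genre."""
    rng = random.Random(seed)
    table = counts[(artist, genre)]
    tokens = [start]
    while len(tokens) < length:
        followers = table.get(tokens[-1])
        if not followers:  # nothing ever followed this token in training
            break
        choices, weights = zip(*followers.items())
        tokens.append(rng.choices(choices, weights=weights)[0])
    return tokens

# Made-up training data: the same starting token leads somewhere different
# depending on the artist and genre label.
songs = [
    ("Artist A", "blues", [5, 17, 5, 17, 5]),
    ("Artist B", "rock", [5, 900, 5, 900, 5]),
]
model = train(songs)
blues_tokens = generate(model, "Artist A", "blues", start=5, length=6)
rock_tokens = generate(model, "Artist B", "rock", start=5, length=6)
print(blues_tokens)
print(rock_tokens)
```

Both sequences start from token 5, but the label decides which continuation the model draws from.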
The lyrics take another couple of steps. They have the lyrics from LyricWiki for every recorded song of an artist. They use a number of techniques to generate the lyrics for each song and to time the lyrics to the music, including a lexical text generator trained on the artist’s lyrics. I suggest you check out the explanation on OpenAI Jukebox’s website to learn more.
As part of the music generation process, the models learn how to classify songs to a genre. They have taken the body of work for a number of artists and placed them in genre categories which you can see below.
The OpenAI Jukebox website has a number of examples on its home page as well as a complete catalog behind the home page. The catalog has over 7000 songs under a number of genres, from Acoustic to Rock and everything in between, in the fashion of a number of artists in each genre, both with and without lyrics. For the (100%) blues category they have over 75 songs, with songs similar to artists from B.B. King to Taj Mahal, including Fats Domino, Muddy Waters, Johnny Winter and more.
OpenAI Jukebox calls the songs “re-renditions” of the artist, and the process of adding lyrics to the songs “lyric conditioning”.
Source code for the song generator DNNs is available on GitHub. You can use the code to train on your own music and have it generate songs in your own musical style.
The songs sound OK but not great. The tokenizer/de-tokenizer process introduces noise into the generated music. I suppose tokenizing at a finer time resolution might reduce this somewhat, but maybe not.
The AI song generator is ok but they need more work on the lyrics and to reduce noise. The fact that they have generated so many re-renditions means to me the process at this point is completely automated.
I’m also impressed with the AI painter. Yes, there’s human interaction involved (atm), but it does generate some interesting pictures that follow the style of a targeted artist. I really wanted to see a Picasso generated painting or even a Jackson Pollock generated painting. Now that would be interesting.
So now we have AI song generators and AI painting generators but there’s a lot more to artworks than paintings and songs, such as sculpture, photography, videography, etc. It seems that many of the above approaches to painting and music could be applied to some of these as well.
And then there are plays, fiction and non-fiction works. The songs are ~3 minutes in length and the lyrics are not very long, so anything longer may represent a serious hurdle for any AI generator. For now, these are still safe.