Recent discussions about creative AI have focused on how these tools enable us to do what we already do — but faster, more inexpensively, and for a wider group of people than ever before. But there are a large number of creative pioneers who aren’t interested in simply replicating what’s been done before. They have already moved on to explore how they can use these tools to create something truly unique.
Building an AI DJ to surf music’s latent space
Nao Tokui’s goals with AI are nothing if not ambitious. “My ultimate goal is to create a new genre of music,” he told me earlier this spring over a video call. It was early on a Friday evening in Los Angeles for me but already Saturday morning for Nao in Tokyo — he had just woken up when we connected. “You’re in the future,” I joked.
And it feels that way when you talk with Nao. In addition to being a DJ, he’s an associate professor at Keio University and founder of Qosmo Labs, a sort of R&D laboratory for experiments in creative AI. I was eager to discuss the development of his AI DJ project, which he has been working on for the better part of a decade.
“I started this project, the AI DJ Project, back in 2015. When I started, my DJ friends got angry,” he recalls. “They thought I was trying to automate everything, but that’s not [my] intention, of course. My intention was to use AI to extend my capability as a DJ. Through interaction with the AI, I get new ideas, what I can play, and how to be human in a creative process. In the future, with AI I think there will be a new form of music, or new form of art expression, which has never been possible without new technology.”
Early versions of the project featured a “back-to-back” style where the human DJ plays one track and then the AI DJ selects the next.
In these early sets, you’ll notice an adorable robot head bobbing along to the beat next to Nao. “We also found that the audience still needs [a] physical embodiment of AI DJ,” Nao wrote at the time, “to which they can project their emotional bond with the music. We have introduced a very minimalistic head-only robot, which keeps nodding to the beats. A GoPro camera attached to the robot provided the first-person view of the robot and gave an impression of ‘autonomy’ to the audience. The design of physical artifacts embodying the existence of AI is an open and intriguing question to investigate in the future.”
The results were impressive and captivating. Visual representations of the data being pulled into the model in real time, including a camera view of the audience that the system evaluates to gauge how well the crowd is responding to the music, give the performance a sort of otherworldly feel.
Ironically, one of the challenges of this sort of feedback points to AI’s limited ability to accurately interpret human emotion. As you might expect, audiences at these performances have a lot to take in, between listening to the music and witnessing the technological feat playing out before them. The result is a crowd that is a bit more stoic and staid than at your typical electronic show. “They don’t want to miss anything, so they just stare at the screen like this,” Nao laughs, arms crossed, miming a quiet (if impressed) observer. “And so the AI reacts like, ‘Oh, my music selection wasn’t good,’ so AI starts selecting different kinds of music, or a different genre.”
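In the abstract, the system Nao describes reduces to a simple feedback loop: pick a track, read the room through the camera, and adjust the next selection accordingly. Here is a rough sketch of that loop; the function names, scoring, and thresholds are hypothetical stand-ins, not Qosmo’s actual code.

```python
import random

# Hypothetical stand-ins for the camera feed, crowd-analysis model, and track
# selector in an AI DJ system of the kind described above.
def estimate_crowd_energy(camera_frame):
    """Pretend crowd-response score in [0, 1]; a real system might use motion or pose analysis."""
    return random.random()

def pick_track(library, genre_bias):
    candidates = [t for t in library if t["genre"] == genre_bias]
    return random.choice(candidates or library)

library = [{"title": f"track {i}", "genre": g}
           for i, g in enumerate(["techno", "house", "ambient"] * 5)]
genre_bias = "techno"

for _ in range(8):                                    # one iteration per track in the set
    track = pick_track(library, genre_bias)
    energy = estimate_crowd_energy(camera_frame=None)
    if energy < 0.4:                                  # a still, staring crowd scores low...
        genre_bias = random.choice(["house", "ambient", "techno"])  # ...so try a different genre
```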
The result is an amusing loop: the AI keeps trying new music, evaluating the audience, determining that the crowd isn’t vibing, and trying something else. It essentially interprets the audience’s awe as disinterest. The project has evolved over time, driven by Nao’s desire to push the envelope and create something truly unique. The most recent iteration, released last year, is a performance called “Emergent Rhythm,” in which the DJ system leverages a variety of models to generate raw rhythms and melodies live, which the human DJ then spontaneously mixes in real time. As Nao describes it on his website, “The human DJ is expected to become an AJ, or ‘AI Jockey,’ rather than a ‘Disk Jockey’, taming and riding the AI-generated audio stream in real-time.”
As for how artists should think about these tools, Nao says he struggles to find the right metaphor. “I always struggle to define the wording. Maybe ‘collaborate’ is wrong. I don’t want to anthropomorphize AI. I also oscillate between this ‘tool, paintbrush or collaborator’ [metaphor]. It feels like surfing — riding a wave or something. AI can push you in some direction, but of course I can ignore this wave, or I can take and apply it.”
For other artists to follow in his footsteps and push the boundaries of music, they need to be able to understand and manipulate the underlying AI systems. “I still see a big gap between AI engineers, AI practitioners, and actual artists,” Nao says. That worries him because, as he sees it, great art often comes from misusing technology. “The history of music is full of examples of the misuse of technology.”
As these tools become more corporatized, Nao worries that could put them even further beyond the reach of artists. “It’s easy to misuse a paintbrush or a piano, but it’s getting more and more difficult to misuse new tools. AI models are getting bigger and bigger, and more and more complex. Nowadays, I cannot train ChatGPT or Stable Diffusion model by myself.” As a result, “many artists are getting more and more dependent on these tools. Your creative process can be defined by what Adobe provides as an AI tool.” Changing that requires making the tools more accessible to artists, something he is working on. “It’s super important that this process is not governed only by big companies like Adobe or OpenAI.”
“It’s super important that this process is not governed only by big companies like Adobe or OpenAI.”
NAO TOKUI
Ultimately, Nao’s goal remains the same. At its core, it’s the deeply human search for novelty that animates so much creative discovery. “AI can find good music from a very big data set of music, but AI is not good at finding unconventional music selections. You can use AI to generate Bach-like music or Beatles-like music,” Nao says. “But at this moment, we cannot use AI to invent or create new Beatles or new Bachs.”
“I’m not interested in imitating/reproducing existing music or automating the process of DJing,” he wrote at the release of “Emergent Rhythm.” “Rather, I’m interested in creating a unique musical experience. I want to make something different.”
And he’s not alone. Another group of part-musicians, part-engineers called Dadabots are also building new tools to explore how AI can help music evolve.
These “musicians seduced by math” are hoping to push the boundaries of music and discover fusions between genres.
Dadabots was founded by musical technologists CJ Carr and Zack Zukowski. They have a reputation as sort of the charming scofflaws of the AI music world. The pair met at Berklee College of Music and started collaborating after attending a hackathon at MIT in 2012. “Zack and I are both, we just have this satirical, absurdist sense of humor and I think it’s why we connected so much as friends in college,” explains Carr.
As for how to understand what Dadabots is, well, that can be hard to define. On their website, they describe themselves as “a cross between a band, a hackathon team, and an ephemeral research lab.” Put another way, they write, “We’re musicians seduced by math.”
Carr, often the public face of the project in media interviews, is a tour de force. When he jumped on a video call with Freethink last month, he showed up full of energy, computer screen overloaded with dozens of tabs full of white papers and audio samples from previous projects. As he talks, you get the sense that his thoughts are moving faster than his mouth can keep up with. It’s clear he’s passionate about his research and the work he and Zukowski are doing together.
They are unapologetic in their approach to generative music and their willingness to explore ideas that make many working musicians uncomfortable. While they admit they can be, at times, a lightning rod for controversy — “mischief is a huge motivator for us,” Carr acknowledges with a smile — you get the sense that they care deeply about what they are doing and about supporting a healthy and thriving artistic community, even while acknowledging that this research could have big implications for the industry at large. “We do the science, we engineer the software, we make the music. All in one project,” they explain on their website. “Don’t need nobody else. Except we do, because we’re standing on the shoulders of giants, and because the whole point is to collaborate with more artists. And in the future, if musicians lose their jobs, we’re a scapegoat. jk. Please don’t burn us to death. We’ll fight for the right side of history.. We swear…”
Their technique is focused on using neural networks trained on large volumes of music. “We focus on this type of audio synthesis called neural synthesis,” explains Carr. “Audio is just a sequence of amplitudes. Basically you’re just giving it, okay, this happened, and then predict what happens next. And this very, very simple concept of predict-what-happens-next is the thing that’s powering ChatGPT right now, these massive language models that are blowing everyone’s minds with their ability to write code, to write sonnets, to give you recipes. [It’s] almost the same principle, just applying it to audio, and it works.”
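To make that concrete, here is a minimal sketch of “predict the next sample” training on raw audio, in the spirit of the autoregressive neural synthesis Carr describes. The architecture, sizes, and data below are illustrative assumptions, not Dadabots’ actual models or code.

```python
# Minimal sketch of autoregressive "predict the next sample" audio modeling.
# Illustrative only; not Dadabots' architecture or training code.
import torch
import torch.nn as nn

QUANT_LEVELS = 256   # 8-bit quantization: raw audio becomes a sequence of discrete tokens
CONTEXT = 1024       # how many past samples the model sees at once

class NextSampleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(QUANT_LEVELS, 64)
        self.rnn = nn.GRU(64, 256, batch_first=True)
        self.head = nn.Linear(256, QUANT_LEVELS)   # distribution over the next sample

    def forward(self, x):                # x: (batch, time) of quantized samples
        h, _ = self.rnn(self.embed(x))
        return self.head(h)              # (batch, time, QUANT_LEVELS)

model = NextSampleModel()
loss_fn = nn.CrossEntropyLoss()

# One training step: given samples [t0 .. tN-1], predict [t1 .. tN].
batch = torch.randint(0, QUANT_LEVELS, (8, CONTEXT + 1))   # stand-in for real audio clips
logits = model(batch[:, :-1])
loss = loss_fn(logits.reshape(-1, QUANT_LEVELS), batch[:, 1:].reshape(-1))
loss.backward()
```

Generation then runs the same model in a loop: sample a value from the predicted distribution, append it to the sequence, and predict again, thousands of times for every second of audio.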
One of their early projects was with a model trained on Kurt Cobain’s vocals. “We didn’t know if there was going to be noise or silence or some bug,” Carr remembers. “The first thing it did was scream and then it screamed. ‘Jesus.’ I had one of those spooky moments where I’m like, ‘Should I be doing this? This is too good. Am I crossing some lines here?’ I mean, that’s usually when you know you’re doing good art.”
Another early experiment came after OpenAI released Jukebox, a generative model that could produce singing in the style of various artists, which they promptly used to have Frank Sinatra sing the lyrics from Britney Spears’ Toxic. “And of course Frank Sinatra died long before the song was written. So it’s kind of like an impossible cover song.”
The result is equal parts amusing and uncanny — but it initially resulted in a copyright claim. Echoing the debates we are having today around vocal synthesis, it was initially unclear what should happen. “We were like, ‘I don’t think a copyright claim like this has ever existed before. Maybe we should try to fight it and see what happens.’ So we got help from the EFF. And they were like, ‘Yeah, yeah, no, this is fair use. It doesn’t actually sample from any known Frank Sinatra recording. And no one owns the style. So yeah, fair use.’ But obviously Britney Spears owns Britney Spears’ stuff, so this is more like a cover song. And then YouTube agreed, which is really cool.”
The group’s efforts have evolved over time. They are perhaps best known for a neural network trained on the work of the death metal band Archspire, which has been livestreaming AI-generated metal music non-stop, 24 hours a day, for four years (with one brief break “when the server crashed because of an out of memory buffer error” that was quickly addressed).
While they are widely regarded as pioneers in this space, Carr doesn’t seem afraid to admit that they are as unsure as everyone else about what the future holds. The only difference is that they aren’t as uncomfortable navigating the controversy while we figure it out. “It’s a shifting landscape,” Carr admits. “I think people’s feelings are constantly changing and a lot of people don’t fully understand even how the systems work. I think it’s more interesting to try to test the limits of the world and see what happens.”
But this urge to experiment doesn’t mean they disregard artists. In fact, more often than not, they are working directly with artists. “I think one of the best parts of what we do is collaborating with artists,” says Carr. “As these tools are being developed and as this new art form is being developed, it’s important to have musicians in on the conversations around the development around it.”
“I think it’s more interesting to try to test the limits of the world and see what happens.”
CJ CARR, DADABOTS
In fact, by and large, they’ve found that many artists are excited about working with them to explore these new tools. “It’s like way easier to collaborate with the artists as a technologist than as a musician,” laughs Carr. “This is kind of the secret that we found. If I reach out to a band and I’m like, ‘Hey, I’m a session bassist, can I be on your next record?’ [they say] ‘Get in line.’ But I’m like, ‘Hey, I’m an AI researcher. I have this state-of-the-art technology.’ Most people that we reach out to cold say yes. So it’s been really fun.”
One of these unique collaborations was with UK beatbox champion Reeps One. A six-part documentary series on YouTube followed the project and explored the full range of themes that technologists and artists have been discussing these last several years with regard to AI. It was a true collaboration between Dadabots and the beatbox artist to see what they could create together, and, as these things often do, it got into some equally uncomfortable and enlightening territory. “For Reeps, hearing his essence distilled and replicated by a machine was met initially with fear,” according to a recap of the project on the Dadabots website. “It’s creepy hearing your own voice like this. [But] that fear turned into excitement as he saw his bot doppelganger more as a collaborator. It produces strange beatbox patterns he’s never made before, inspiring him to push his craft further.”
Ultimately, Dadabots is focused on pushing boundaries and discovering fusions between genres. “Beyond just recreating styles that already exist, we’re really, really interested in making fusions. And this is actually a really, really hard problem, but some genres just seem to fuse really nicely,” explains Carr. “So like punk, deathcore and djent, because they’re all at the same tempo, they have similar backbeat grooves, they’re just different color palettes. They actually fuse together pretty well.”
Their latest project, which Carr hopes they can realize this year, aims to provide an interface — inspired by Every Noise at Once, an evolving map of musical genres — that would allow users to explore combinations of genres and even discover new genres in the spaces in between. “Now picture with me that you’re a DJ, and instead of using a CDJ or Pioneer or a TRAKTOR, you’re using this. This is your interface,” Carr says excitedly. “So you’re not just picking one of these genres, but you can pick a spot anywhere between the genres. You can mix genres together. You could even subtract genres from it. Like, all right, give me country minus folk — what does that sound like? Or minus blues? Maybe the biggest art project that I’m most excited about is making something like this and releasing it to people: a genre knob, or in this case, a spaceship through the genre constellation.”
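Mechanically, an interface like that implies that every genre lives as a vector in a learned embedding space, so blending or subtracting genres becomes simple vector arithmetic whose result can steer a generative model. Here is a toy sketch of the idea; the genre list and random embeddings are made up for illustration and have nothing to do with Dadabots’ actual system or the Every Noise at Once data.

```python
import numpy as np

# Hypothetical genre embeddings. In a real system these vectors would come from a
# model trained on audio, not from a random number generator.
rng = np.random.default_rng(0)
genres = ["country", "folk", "blues", "punk", "deathcore", "djent"]
embeddings = {g: rng.normal(size=128) for g in genres}

def nearest_genres(query, k=3):
    """Return the k genres whose embeddings are closest (by cosine similarity) to the query."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(genres, key=lambda g: -cos(embeddings[g], query))[:k]

# "Country minus folk": start from country and remove some of the folk direction.
query = embeddings["country"] - 0.5 * embeddings["folk"]
print(nearest_genres(query))

# A point halfway between punk and djent: a candidate fusion you could hand to a
# generative model as a conditioning vector.
fusion = 0.5 * (embeddings["punk"] + embeddings["djent"])
```

In a real system, the resulting vector would not just retrieve nearby genres; it would condition what the generative model synthesizes, which is what makes the points in between the named genres interesting.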
It’s a pretty radical and exciting vision, but Dadabots are on the frontiers of an emerging technology, and the frontiers are where all the interesting things are happening, where the rules haven’t yet been written. In the documentary with Reeps One, Carr compared AI to a nuclear bomb: it contains all this incredible energy that can be harnessed to power the world, and at the same time it has the potential for unimaginable destruction. Ultimately, says Carr, “it’s up to humans to be conscious and compassionate enough to do one and not the other.”