There has been an incredible explosion in new AI tools over the past year, and the effects are being felt throughout the artistic world. Virtually every creative industry — from music and photography to illustration and filmmaking to writing and beyond — is starting to grapple with this revolution and the disruption that will follow.
And, as with any major change, it has elicited its fair share of both excitement and handwringing about the potential upsides and downsides. Lawsuits have already started flying against image generators. Powerful trade groups and labor organizations, including the Recording Industry Association of America (RIAA) and SAG-AFTRA, have launched campaigns to spark public debate about the use of these tools, and the Writers Guild of America (WGA) is reportedly pushing for a total ban of AI-generated content in film and television in their contract negotiations. World governments have begun reviews of their copyright frameworks and in some cases even implemented temporary bans on new AI tools. In short, there is an emerging consensus that change is coming and a good deal of fear about what that might mean for the future.
It’s easy to get stuck in the morass of fearful speculation, but, interestingly, when you talk to creatives who are actively experimenting with these new tools, you hear a lot of earnest excitement about what they are building and about the potential they see, even as they wrestle with the challenges and disruptions that are likely to follow their arrival.
For these folks, the concerns are legitimate but shouldn’t keep us from moving forward. “I think neither euphoria nor terror is sort of helpful in this context,” offers Felix Simon, a researcher at the University of Oxford who has been studying how AI tools are being adopted inside major newsrooms. “I don't think it's very likely that one could sort of put it back into the box and stop it from happening. This is not how technological development historically works and in the cases where we've tried it's been very difficult.”
Simon’s research suggests that the advance of AI in journalism is following a fairly traditional path for new technologies. Most media companies are moving slowly, experimenting first with powering analytics tools and personalization; as these tools move into the newsroom, the focus has been on AI-powered systems that can help replace or augment some of the more mundane (yet critical) tasks of reporting. “Take, for instance, interview transcription; that's something AI is doing and it's doing it fairly well,” observes Simon. “Most people I know and I've talked to, they don't do it by hand unless absolutely necessary.”
But it's increasingly likely that these tools will find their way into more substantive roles inside the newsroom to assist human reporters in their work. “All the work that has been done around things like the Panama Papers, the Paradise Papers, pretty much any major investigation where you have a document dump in the sort of scale of terabytes and then you have to find stories within that. That's something where AI is incredibly useful because it allows you to extract entities and detect links,” says Simon.
While this disruption is coming, it can still be shaped. And that’s why Simon believes it is critical that creatives and the public at large get involved at an early stage to help decide how and where these tools are adopted. “It's not something that drops down from the heavens and suddenly everything changes. It very much depends on organizational incentives and missions and ideas. It depends on individuals, how they want to use it.”
He believes we have an opportunity to steer their development. “I think I generally have a skeptical take. Neither the sort of utopian promises. It will automate everything. It will take away all these jobs, it will make our lives so much easier, blah blah blah. Nor the dystopian view, it's going to end truth, we will be dominated by AI overlords in 10 years' time. That's fun, that's entertaining. But it's not necessarily helping us understand the technology better and making sure that the way it is going to be used is ethical and in line with human rights broadly considered.”
And so, in many ways, progress requires us to push through the discomfort and explore these technologies. “I guess the more nervous you are, the more I think you should investigate it,” says filmmaker Paul Trillo. “I think people should scare themselves a little bit and just try to dip their toe in and create something that they never would have created before. Maybe something comes out of that fear and maybe you can tap into that fear and make something beautiful out of it.”
Trillo has been following the development of generative technologies for years and has gained a following online for his impressive demonstrations of the power of these new AI-powered tools. “I've always been curious to just try things I don't know anything about,” he explains, “but it was really when DALL-E first released their announcement video. I immediately knew that was going to be a game changer from a visual effects standpoint.”
He joined other early pioneers in this space, like fellow filmmaker Karen Cheng, and began experimenting in public with this new suite of tools, sharing his experiments with his online audience and demonstrating creative executions that sometimes even the tools’ creators didn’t anticipate. “I just found myself experimenting in a way that I had never really experimented before.”
Cut to December of this past year, when he released a video in partnership with GoFundMe, created with an impressive fusion of live-action filming and AI-generated animation. The project was one of the first advertisements to heavily leverage AI-generated art and, despite the technical feat, still managed to tell a compelling human story that relayed the brand’s message. As he pointed out when it was released, the project “wouldn’t have been possible just 6 months ago. The scope of something like this is beyond anything a small team could [otherwise] handle. This shows promise in how AI can enable independent filmmakers.”
For Mexican architect Michel Rojkind, these tools open up new avenues of creativity. At his firm, Rojkind Arquitectos, the team has been leveraging ideas from popular image generators alongside traditional tools like his decidedly analog sketchbook.
Ironically, given his interest in using these tools for ideation, his concern lately has been that they are getting too good and, as a result, becoming less useful for generating ideas. “The first versions of Midjourney, the iterations were beautiful because it created mistakes,” he explains. “So it interpreted a column that would blend into a slab that would become something else. And it didn't have the logic. Now that it is becoming a little bit more logical, it's losing a little bit of the playfulness. The most interesting projects come out of that [sense of play].”
That’s led some creatives to try building their own tools. During a livestreamed discussion in April, Daniel Bolojan, a professor and senior architect at Coop Himmelb(l)au, described how his research team is building its own AI tools for ideation, trained on the firm’s unique style. The team believes these tools could aid and augment its human designers throughout the development process.
For VFX artist Wren Weichman, this kind of AI augmentation represents a tremendous opportunity to open up new avenues for creativity and further reduce the barriers between our imagination and what we can create. “The democratization is incredible because it allows for more people to put out their ideas, for more people to tell their stories, to express their artistic visions.”
Weichman works at the production company Corridor Digital and is one of the public faces of their popular YouTube channels Corridor and Corridor Crew. These channels regularly post short films with impressive visual effects, as well as breakdowns of the tools and techniques (increasingly, machine-learning programs) that enabled them.
Members of the Corridor team have regularly been posting their experiments with these new AI-powered tools, including an incredible anime-style video led by team member Niko Pueringer, which received some pushback online from a number of illustrators and critics. One particularly harsh critic claimed the piece “isn’t just callous and craven — it's also dangerous.” Pueringer published a lengthy and thoughtful response to these criticisms in the comments of his behind-the-scenes video, which broke down how the team created it.
Late last year, Weichman himself released a video provocatively titled “Why THIS is the Future of Imagery (and Nobody Knows it Yet),” all about NeRFs.
NeRFs — “neural radiance fields” — are neural networks that take a collection of 2D photos and use them to generate a 3D model of a scene. As Weichman explains, what makes them so compelling for VFX artists is that they’re exceptionally easy to capture; they render accurate reflections and lighting in conditions that can be difficult for traditional workflows; they capture rudimentary geometry, allowing you to add other 3D objects into the environment; and they let you move around inside a scene, setting up the ability to edit camera moves in post-production.
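For readers curious about what’s under the hood, here is a minimal conceptual sketch in Python (PyTorch). It is not the code behind Weichman’s video or any production NeRF pipeline; the class name, layer sizes, and structure are illustrative assumptions meant only to show the shape of the idea.

```python
# A toy NeRF-style network (illustrative sketch, not a production implementation).
# The core idea: a small neural network maps a 3D position and a viewing direction
# to a color and a density, and is trained so that volume-rendering those outputs
# reproduces a set of ordinary 2D photos of the scene.

import torch
import torch.nn as nn


class TinyNeRF(nn.Module):
    def __init__(self, hidden: int = 256):
        super().__init__()
        # Input: a 3D point (x, y, z) concatenated with a 3D viewing direction.
        self.backbone = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)  # how "solid" space is at that point
        self.color_head = nn.Linear(hidden, 3)    # RGB color as seen from that direction

    def forward(self, points: torch.Tensor, directions: torch.Tensor):
        features = self.backbone(torch.cat([points, directions], dim=-1))
        density = torch.relu(self.density_head(features))  # non-negative density
        color = torch.sigmoid(self.color_head(features))   # RGB values in [0, 1]
        return color, density


# Rendering a pixel means sampling points along the camera ray through that pixel,
# querying the network at each sample, and blending the colors weighted by density.
# Once trained against the captured photos, the same procedure can render the scene
# from entirely new camera positions, which is what enables the post-production
# camera moves Weichman describes.
```

Real implementations add positional encodings, hierarchical sampling, and heavy optimization on top of this, but the basic recipe is why a handful of phone or drone photos can stand in for an elaborate 3D capture rig.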
And the technology is only going to improve. As Weichman says in the video, “I have a very strong belief that it's just a matter of time before no one will be able to tell the difference between a NeRF and actual video.”
At the end of the video, Weichman shows how it could be used to replicate the iconic scene from Inception where Paris folds on top of itself. “I'm doing a poor man's interpretation of that very advanced effect,” he says in the video. “But it's going to require a 1,000th of the effort.”
As promised, the result isn’t production-ready as-is, but it’s hard not to be impressed by what he was able to reproduce with a simple drone and his iPhone. In fact, one person impressed by the result was none other than two-time Academy Award-winning VFX supervisor Paul Franklin. A legend in the VFX community, he’s known for his boundary-pushing visual effects in films like Inception, Interstellar, and the Dark Knight trilogy.
“It was fun watching that video that the guys did, where they were using the NeRF to try and recreate the Inception shot. They did that extremely quickly with a small team of people,” marvels Franklin. “We couldn't possibly have done it with that level of resource when we did Inception back in 2010, 13 years ago. The folding city sequence — that was months of effort with a very substantial team of about 20 or 30 people just on that part of the film, capturing all the data. It took two weeks to LiDAR scan all the buildings in Paris. We had a separate photography team. The photographic team shot a quarter of a million stills on location to capture the textures and what have you. And then the actual constructing the asset out of that was months of work.”
And despite all of those resources, they were still quite limited. “So there wasn't a huge amount of room for experimentation, and certainly not with something which represented the final version of the shot. There are no alternative versions of that shot. So when I saw the Corridor guys playing with the NeRF and flying the cameras around, and the fact that now you could take the camera off this preset path, and now I think you can start putting things into these NeRF environments. The speed of that just opens up the creativity [and] the creativity becomes the most important thing.”
The democratizing potential of new technologies
Franklin’s long and successful career in this industry has given him a sense of perspective on how things change and the importance of being willing to adapt. “I'm always excited by these things because there's always something new to be learned,” he says. “I think it's important to keep abreast of some of the new things that are going on and not become just some ossified old fool.”
The potential, as these creatives see it, is massive democratization driven by these technologies: more people will have the ability to build and create than ever before, and what is possible will expand as creation becomes easier. In short, these tools lower the barriers to entry while raising the ceiling on what can be made. And there are plenty of historical analogies that support the argument. Technology has long had a democratizing effect on art — expanding who could create and, in turn, influencing what they create.
“It's rather like when synthesizers began to appear,” Franklin argues. “Cheap synthesizers began to appear in the 1970s and ‘80s, and [all of a sudden] indie bands could start to afford to buy these instruments. If you wanted a string section, you no longer had to have a whole chamber orchestra with you. You could do it with a keyboard, and make these very rich, lush sounds. It transformed the way that modern music sounded. You've got this explosion of creativity that always comes out of that sort of thing. And that's what I mean by democratizing. It's giving it to the people. It's giving it to the creatives who don't necessarily have access to these incredibly expensive tool sets.”
And yet those periods of change and transition are often rife with skeptics.
In 1979, Disney released The Black Hole. The movie Westworld, released just six years earlier, had been the first feature film to use computer-generated imagery. The technology used in The Black Hole was still rudimentary, and CGI in general was still in its infancy, but the critics didn’t hold back. A New York Times review at the time was less than impressed by the advent of this new era of computer-generated visual effects: “The first feature completely animated by computer is probably some time off. To many of the older Hollywood producers and production designers, images generated by computers look spiritless compared to animation done by hand.”
Computer-generated VFX advanced a lot in the following years (aided in no small part by traditional VFX artists who realized that their skills in models and puppets transferred over into computers), and a little more than 15 years later, Toy Story, the first feature film animated entirely by computers, was released in theaters.
Paul Franklin remembers it well: “I remember first seeing a clip of Toy Story at SIGGRAPH in 1995, and we were all in the Shrine Auditorium. The Disney logo comes up and everyone starts booing because it's Disney. [They were] the Death Star of the feature animation world. And then it's the opening scene of Toy Story when Buzz Lightyear first appears, and he's just been brought out of the box and everything. By the end of it, everybody was on their feet cheering because it was just such an amazing thing because this new world had been opened up. It was a confirmation to everybody in the room that computer graphics had really come of age as a medium in its own right.”
“Those of us who love films like Tron, Terminator 2, and The Last Starfighter — we'd always believed that it was a medium in its own way. But then this film came and confirmed it. So it led to a whole renaissance of the animation industry. And suddenly, there are more animators working today than ever before. And that's because computer graphics allowed us to do that and extended the reach of animators.”
Artists can’t stop technology, but they can shape it
Perhaps the biggest lesson we can draw from the past to inform our present situation is the importance of artists getting involved to help shape the development of these tools — and potentially birth something new and exciting that couldn’t be created before. “There is great value there, and I think artists should be engaging with these technologies in order to stake some claim on them before they get away from us,” says Claire L. Evans, lead singer of the pop group YACHT.
Several years ago, the members of YACHT — Evans, Jona Bechtolt, and Rob Kieswetter — set out on an ambitious project to produce an entire album — from the lyrics to the melodies to even the album title and album art — with the help of machine learning. They leveraged a patchwork of AI tools to painstakingly build their own system and train it on their back catalog. The album, which was released in 2019, was met with mixed reviews but would go on to be nominated for a Grammy. But those accolades can obscure how hard it was to leverage those early precursors to the tools many of us are just starting to explore today. As Evans recalls, “A lot of the tools we were using are essentially obsolete now, or else they’ve been incorporated, under the hood, into consumer software.”
The goal, as Evans explained at the time, was to go beyond the gimmicky and create something inventive and new that couldn’t have been created the same way before. “We didn’t set out to produce algorithmically-generated music that could ‘pass’ as human. We set out to make something meaningful. Something entirely our own.”
The process of making that album — with all its technical hurdles, frustrations, elations, and moral and philosophical questions — was captured in the documentary The Computer Accent, which began touring the festival circuit last year.
When Evans looks back at the project, she says she’s still unpacking how to feel about these new technologies and how artists should integrate them into their work. “Morally, I’m conflicted. I find that there are ways of engaging with these tools that are really interesting and useful to me, as a working artist — they’ve helped me to destabilize my process, interrogate things, and make music in ways that I wouldn’t have tried before. But that has to be a choice,” she says. “There are valuable ways of critically engaging with this technology, and I’m afraid that we lose something when we get distracted by frictionless tools that allow us to just type in a prompt and get something out and be done with it. I would rather use AI to make a creative process more complicated and interesting than to make it easier and more streamlined.”
Ultimately, for Evans, the pace of change is one of the most difficult parts of trying to make sense of this new reality. “I think the artist’s role,” she argues, “has been to metabolize new technology and make sense of what’s coming down the pipeline. We don’t even have time to understand the implications of these tools before a new one comes along. That’s my biggest concern at the moment, really.”
“Now it feels like you're drowning and always trying to keep your head above water with this creative FOMO that's happening,” Trillo agrees. “I'm trying not to get too caught up in that. I think it's okay to pause and just focus. Focus on ideas. I think the tools are going to be so rampant now that if you're trying to be in this rat race of just showing off the functionality of a new tool, you're going to get caught up in it, in an endless loop.”
The steady proliferation of new tools will only continue
And the pace of change is only increasing as the tools continue to be developed and become more commercially available. The explosion is perhaps most evident in film and the visual effects world. Take Runway, the hot video editing startup whose researchers contributed to the development of the foundational Stable Diffusion model, which has given rise to an entire class of generative image tools. Runway users can leverage more than 30 AI-powered tools, ranging from ones that handle the more mundane tasks of modern editing (like removing background noise and generating subtitles) to truly generative tools that let you create video entirely from text prompts. And these tools aren’t just for hobbyists; they are increasingly being adopted into professional workflows. In fact, this year's award season darling Everything Everywhere All At Once, which won seven Academy Awards including Best Picture and Best Director, leveraged Runway’s rotoscoping tools in parts of the film.
The same is happening in acting. For years, a growing number of actors have required that their contracts include digital touch-ups, known in the industry as “digital makeup” or “beauty work,” as a condition of signing on to a project. And deepfake technology is only getting better and more routine: it has appeared in a recent Kendrick Lamar music video, was used to de-age actor Mark Hamill on camera in the second season of The Mandalorian, helped recreate the voice of James Earl Jones as a younger Darth Vader, and was even used by comedian Neal Brennan in a recent Netflix standup special to redo a take that had been flubbed.
Gone could be the days of bad dubbing in movies — actors will be able to speak fluently in any language and in their own voice, with their mouths moving convincingly in tandem. Expect the great (and often hilarious) tradition of English-speaking celebrities starring in commercials abroad to only increase as a result. In fact, just last year, deepfake technology company Brask (previously Deepcake) made headlines when it was revealed that its technology helped actor Bruce Willis star in an ad for a Russian telecom company without ever appearing on set or needing to learn Russian.
And in the VFX world, startup Wonder Dynamics has grabbed a lot of attention in recent months with the closed beta of its new studio product, which claims to leverage AI to “automatically animate, light and compose CG characters into a live-action scene.”
For Runway CEO Cristóbal Valenzuela, the goal is all about providing tools that help storytellers expand what is possible. “Runway exists to build tools for storytellers,” he told me this spring at the company’s first AI film festival. The event featured shorts selected by a panel of judges — each of which leveraged different AI tools to aid in the storytelling. “I think these are tools for imagination. They're tools to eventually help you take your ideas to completion faster than ever before.” But for Valenzuela, while the new tools and technologies are exciting, the focus needs to be on what humans can create. “I think that there's a fixation of thinking of these tools as AI first or AI power tools,” he said. “I think I'm really looking forward to the time where we stop referring to this as AI tools.”
Ultimately, the goal of any creative project is to translate an idea from our heads into the world. The tools we have at any given moment mediate those ideas as best they can, and how the tools evolve is perhaps less important than whether they reduce the barriers between thought and expression. “I think the biggest challenge is to get rid of previous preconceptions or assumptions of how things work,” says Valenzuela. “Inertia bias is you're always thinking about what you know, and that what you know is the good way of doing things, right? [If you speak] with filmmakers now, they're like, well, how film is done, this is the way. But if you go 50 years in the past, the person who is using an analog camera in a film will tell you that the way that we're using it, that's the way you do film. If you go 100 years in the past, someone else is going, the way you do film is like this, right? … The interfaces that we'll have are going to be totally different from the interfaces we've had in the past. It's normal that we feel some form of attachment to the tools we've used. But once you get used to it, it's the same thing I said before, you stop referring to it as AI tools, just the tools. They're tools to eventually help you take your ideas to completion faster than ever before.”
And even with all of these innovations, the new world we’re headed toward will still likely look similar to the one we’re in: as in ages past, we can expect a fusion of old and new. After all, as another contributor to this special is fond of exclaiming, “Everything is a remix.”
“It doesn't mean that the old stuff is completely invalid,” Franklin agrees. “I worked with Christopher Nolan for over 10 years. And Chris, one of the things he's famous for, aside from not allowing chairs on set” — he chuckles, before clarifying that rumor isn’t actually true — “is that he's a dedicated fan of shooting on film, and then grading his movie on film. There's a reason for that, which is that it works for his process. He loves the way that American independent cinema of the 1970s — films like Taxi Driver and Serpico and Dog Day Afternoon — the way that they look and feel. The use of film is inherent, is intrinsic to the way that those films look. Also, it puts the emphasis in its creative process in different places. So it means that the cinematographer needs to light the set and then call the printer lights. That's what the film looks like. He rarely stands in front of the monitor. He's always by the camera looking directly at the cast because he wants to look at the performance. But at the same time, Chris was always asking me, ‘Well, where can we go with this? What can we do to create Limbo City in Inception? What's the story we want to tell? Are there new ways of approaching this?’ He allowed us to push some pretty radical approaches to creating those worlds, which gave us something unique, which then fitted with his film as well.”
So for Franklin and many other creatives, the future is full of expanding possibilities for the arts, not limited ones. “Photography did not invalidate portrait painting, and people still paint,” he said. “When cinematography came along and then later on television, television didn't mean that filmmaking disappeared. And filmmaking didn't mean that still photography disappeared. So they're new things. All these things are all ways of expressing ourselves. It's just more ways to express ourselves, more ways to explore the world, more ways to expand that space that stories can exist in.”