The image above was created via Stable Diffusion with the prompt “lawyers in suits fighting robots with lasers in a futuristic, superhero style.”
Special thanks to Yisong Yue, professor of machine learning at Caltech, for providing me with valuable technical feedback on this post!
Looks like Matthew Butterick and the Joseph Saveri Law Firm are going to have a busy year! The same folks who filed the class action against GitHub and Microsoft related to Copilot and Codex a couple of months ago have filed another one against Stability AI, DeviantArt, and Midjourney related to Stable Diffusion. The crux of the complaint is around Stability AI and their Stable Diffusion product, but Midjourney and DeviantArt enter the picture because they have generative AI products that incorporate Stable Diffusion. DeviantArt also has some claims lobbed directly at them via a subclass because they allowed the nonprofit Large-Scale Artificial Intelligence Open Network (LAION) to incorporate the artwork submitted to their service into a large public dataset of 400 million images and captions. According to the complaint, at the time this was the largest freely available dataset of its kind, and Stable Diffusion was trained on it. Like the Copilot case, this one includes claims for:
- Violation of the Digital Millennium Copyright Act’s (DMCA) sections 1201-1205 related to stripping images of copyright-related information
- Unfair competition, in this case stemming from copyright law and DMCA violations
Unlike the Copilot case, this one includes additional claims for:
- Direct copyright infringement for training Stable Diffusion on the class’s images, including the images in the Stable Diffusion model, and reproducing and distributing derivative works of those images
- Vicarious copyright infringement for allowing users to create and sell fake works of well-known artists (essentially impersonating the artists)
- Violation of the statutory and common law rights of publicity related to users’ ability to ask Stable Diffusion for art in the style of a specific artist
Like the Copilot case, this one has similar potential flaws in defining the class. The definitions of the injunctive relief class and the damages class in the complaint don’t actually condition participation on injury. The class is defined as all persons or entities with a copyright interest in any work that was used to train Stable Diffusion. But simply having work that is part of the training set doesn’t mean the work is 1) actually part of the model, 2) actually outputted by the model (or that a derivative of it is), or 3) outputted by the model in sufficient detail to still be subject to copyright. As in the Copilot case, the complaint explicitly states that it’s difficult or impossible for the plaintiffs to identify their work in Stable Diffusion’s output.
Since this case involves copyright infringement claims, it’s also surprising that the class is not limited to people with registered copyrights but extends to anyone “with a copyright interest,” since people who don’t have registered copyrights cannot enforce their copyrights in court. Additionally, litigator Noorjahan Rahman has pointed out that some courts extend the registration requirement to DMCA enforcement as well, further weakening the plaintiffs’ chances of succeeding in defining the class this way and/or bringing either copyright or DMCA claims as a class action.
The Copyright Claims
The complaint includes a section attempting to explain how Stable Diffusion works. It argues that the Stable Diffusion model is basically just a giant archive of compressed images (similar to MP3 compression, for example) and that when Stable Diffusion is given a text prompt, it “interpolates” or combines the images in its archives to provide its output. The complaint literally calls Stable Diffusion nothing more than a “collage tool” throughout the document. It suggests that the output is just a mash-up of the training data.
This is a fascinating take because the exact way the interpolation is done and how Stable Diffusion responds to text prompts are parameters that can range widely. The text prompt interpretation could potentially be very responsive to natural human speech with all its nuances, or it could be awful. And interpolation, especially of 3 or 12 or 100 or 1000 images, can be done in an unlimited number of combinations, some better, some worse. But Stable Diffusion doesn’t interpolate a subset of training images to create its output: it interpolates ALL the images in the training data to create its output. Carefully calibrating the interpolation parameters to lead to useful, realistic, aesthetic, non-disturbing images is itself an art form, as the Internet abounds with both excellent and horrible generative AI output images. It’s relatively easy to see improvements in Stable Diffusion’s output across its releases, and those improvements are the result of tweaking the model, not adding more images to the training data. Even the various companies using Stable Diffusion as a library and training it with the LAION dataset appear to be producing results with markedly different qualities. So, the claim that the output is nothing more than the input is deeply specious.1
I think this is a little bit like arguing that there is no difference between, say, listening to a randomly created mashup of the Beatles’ White Album with Jay-Z’s Black Album (creating abominable noise), and listening to Danger Mouse’s powerful and evocative Grey Album, which is a creative mashup of the two. Even if the output is a “mere” mashup, the exact manner of mashup still matters a lot and makes the difference between something humans recognize with joy as art, and something humans view as nothing more than noise. Danger Mouse may have made a mashup, but he also made a significant contribution of his own artistry to the album, creating an entirely different tone, sound, style, and message from the original artists, worth listening to on its own merits and not simply to catch pieces of the original works.
This question of how much of the output should be credited to the training data versus to the model’s processing of the training data should be at the heart of the debate over whether Stable Diffusion’s use of the various images as training data is truly transformative and thus eligible for copyright’s fair use defense (or perhaps even to the question of whether the output is eligible for copyright protection at all). I think it’s going to be easy for the defense to present alternative analogies and narratives to those presented by the plaintiffs here. The output represents the model’s understanding of what is useful, aesthetic, pleasing, etc., and that, together with the data filtering and cleaning that general image generating AI companies do,2 is what the companies consider most valuable, not the training data.3 Except in corner cases where the output is very tightly constrained (like “show me dogs in the style of Picasso”), it may well be argued that the “use” of any one image from the training data is de minimis and/or not substantial enough to call the output a derivative work of any one image. Of course, none of this even begins to touch on the user’s contribution to the output via the specificity of the text prompt.4 There is some sense in which it’s true that there is no Stable Diffusion without the training data, but there is equally some sense in which there is no Stable Diffusion without users pouring their own creative energy into its prompts.
Stability AI has already announced that it is removing users’ ability to request images in a particular artist’s style and further, that future releases of Stable Diffusion will comply with any artist’s requests to remove their images from the training dataset. With that removal, the most outrage-inducing and troublesome output examples disappear from this case, leaving a much more complex and muddled set of facts for the jury to wade through. The publicity claims and vicarious copyright infringement claims, at least as stated in this complaint, also fall away.5 It’s not clear if the lawsuit that remains is one the plaintiffs still want to litigate, particularly since the class is likely to be narrowed as well.
1. This doesn’t cover the fact that Stability AI didn’t use the dataset in raw form. They have said they removed illegal content and have otherwise filtered it. Depending on the extent of this manipulation, they might be eligible for a thin copyright on the compilation that resulted, which would also erode the argument that the output is 100% work copyrighted by the plaintiffs.
2. Note that this is highly context-specific. Images for general-scope image generating AIs are widely available and a giant subset of them is in the public domain. In other contexts, the data can be extremely valuable if it’s difficult to collect, requires human annotation or interpretation, etc. I think it’s really worthwhile to distinguish generative AIs that essentially draw on all of humanity’s knowledge in a certain domain (which we could also call our culture) from generative AIs that draw on more narrow sources of data which cannot be said to belong to all of us in the same way.
3. Despite their self-proclaimed penchant for “open,” Stability AI didn’t hesitate to insist on a take-down when one of their partners made the model public.
4. The Copyright Office currently does not allow works created with the assistance of generative AI to receive copyright registration. However, top IP attorney Van Lindberg is working on a case to reverse this position and a considerable number of IP experts believe that he may ultimately succeed. The idea that no part of a work can be copyrighted just because some of the work was created with the help of generative AI tools doesn’t seem like it will stand the test of time, especially as such tools make their way into the standard set of tools artists are already using, like Photoshop. Such a victory would cast further doubt on the plaintiffs’ broader claims that every Stable Diffusion output is merely the sum of its inputs, and therefore the images used here were merely “stolen” (i.e. copyright infringement) rather than “transformed” (i.e. fair use).
5. There is of course nothing prohibiting the plaintiffs from suing over claims that arose under prior releases of Stable Diffusion. But because this is a class action and the lawyers are likely getting paid a portion of the damages awarded to the plaintiffs, the damages for the prior claims alone may not justify the legal expense of proceeding with the case.