Generative AI legal battles heat up

There are more developments in the copyright case between artists and generative AI companies: the previous lawsuit has been amended and updated.

After a first round in which the judge rejected a few arguments, the complaint has been tightened up a bit.

  1. New artists, from photographers to game artists, have joined the lawsuit
  2. New arguments have been added:
    • In an effort to expand what counts as copyrighted by the artists, the complaint claims that even non-copyrighted works may automatically be eligible for copyright protection if they include the artist’s “distinctive mark,” such as a signature, which many do.
    • AI companies that relied on the widely used LAION-400M and LAION-5B datasets (which were made available for research purposes and contain not the copyrighted works themselves, only links to them and other metadata about them) would have had to download the actual images to train their models, and thus made “unauthorized copies.”
    • The suit claims that the very architecture of diffusion models (in which an AI adds visual “noise,” random perturbations of the pixel values, to an image in multiple steps, then tries to reverse the process to get back close to the initial image) is itself designed to come as close as possible to replicating the initial training material. The lawsuit cites several papers about diffusion models and claims that the models are simply ‘reconstructing the (possibly copyrighted) training set’.
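
To make the diffusion description above a bit more concrete, here is a minimal Python sketch of the noising step and its reversal. It is only an illustration under my own assumptions: the function names, the linear noise schedule, and the four-value "image" are all made up for the example, and the true noise stands in for what a trained model would have to predict. It shows the mechanics, not whether real models reproduce their training images.

```python
import math
import random

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal that survives after each step.

    Assumes a simple linear noise schedule (an illustrative choice).
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(pixels, alpha_bar, rng):
    """Forward process: blend the image with Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [math.sqrt(alpha_bar) * p + math.sqrt(1.0 - alpha_bar) * n
             for p, n in zip(pixels, noise)]
    return noisy, noise

def remove_noise(noisy, predicted_noise, alpha_bar):
    """Reverse estimate: undo the blend, given a guess at the noise."""
    return [(x - math.sqrt(1.0 - alpha_bar) * n) / math.sqrt(alpha_bar)
            for x, n in zip(noisy, predicted_noise)]

rng = random.Random(0)
image = [0.1, 0.5, 0.9, 0.3]           # hypothetical 4-pixel "image"
alpha_bars = alpha_bar_schedule(1000)

# Noise the image up to the halfway point of the schedule.
noisy, noise = add_noise(image, alpha_bars[499], rng)

# Stand-in for a trained model: we hand back the exact noise that was added.
recovered = remove_noise(noisy, noise, alpha_bars[499])
```

With a perfect noise estimate the original comes back exactly, which is the intuition behind the "reconstruction" argument. In a real model the estimate comes from a network trained across a huge number of images, so how close its output can get to any single training image is precisely the open question; this toy example doesn't settle it.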

This third point is likely the real meat of the suit, but they haven’t spelled it out as fully as I think they should have. To me, the questions at the crux of the case are:

  1. Do large-scale models generate novel output, or do they just copy and interpolate between individual training examples?
  2. Is training on copyrighted art covered by fair use, or does it qualify as a copyright violation?

Even if the generative AI companies lose all of these arguments, it doesn’t mean generative AI is going away. Models can still be trained on huge volumes of non-copyrighted images and data, or on data that is purchased and licensed for the purpose. Beyond that, companies have already been training models on data collected from the use of their products (which you give them for free by using things like the iPhone’s Siri, Amazon Alexa, and Google) and on generated synthetic training data.
