
PLI’s Artificial Intelligence Law Program 2024

Image courtesy of PLI. From left to right: Van Lindberg, Peter Schildkraut, me, and Ron Eritano

Glynna Christian and Van Lindberg co-chaired a terrific and very comprehensive two-day program “Artificial Intelligence Law” at PLI this January in New York. I served as faculty for two of the panels, “Update on the Regulation of AI,” and “Contracting Around AI: Important Considerations.”

The first panel covered the EU AI Act, the Executive Order on AI, state-level AI legislation, the prospects of federal legislation, and briefly touched on legal regimes in other countries. We didn’t have slides for this panel, but I can share some of the related resources here:

  • Co-panelist Ron Eritano’s Normandy Group put together a summary of the federal AI-related legislation proposed in the 118th Congress
  • Everything I covered with respect to the EU AI Act is also available in Part 2 of my post “Choose Your Own Adventure: The EU AI Act and Openish AI”
  • The Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence
  • Co-panelist Kristen Johnston pointed us to a great resource maintained by EPIC, tracking AI-related law across all the states

The second panel covered unique AI-related risks, contractual risk-mitigation measures, vendor screening, and vendor management. Glynna Christian and her firm, Holland & Knight, were kind enough to allow me to reprint a suggested clause Glynna included as part of the slide deck for this panel. It is a sample customer-favorable clause on the use of AI technologies:

[Except as otherwise described in each SOW,] Vendor represents and warrants that it will not perform any Services that uses or incorporates, in whole or in part, any AI Tools (or depends in any way upon any AI Tools), including without limitation, any collection or processing of any [Customer Data or Personal Information] using any AI Tools. “AI Tools” means any and all deep learning, machine learning, and other artificial intelligence technologies, including any and all (i) algorithms, heuristics, models, and methodologies, whether in source code, object code, human readable form or other form, (ii) proprietary algorithms, software or other IT Systems that make use of or employ expert systems, natural language processing, computer vision, automated speech recognition, automated planning and scheduling, neural networks, statistical learning algorithms (like linear and logistic regression, support vector machines, random forests, k-means clustering), or reinforcement learning, and (iii) proprietary embodied artificial intelligence and related hardware or equipment.

With respect to any and all AI Tools described in an SOW approved by Customer, Vendor further represents and warrants that:

(a) each applicable SOW accurately identifies and fully describes all AI Tools;

(b) the AI Tools will (i) perform with a high degree of accuracy in accordance with the Specifications and (ii) not produce materially inaccurate results when used in accordance with the Documentation;

(c) Vendor will monitor the performance of the AI Tools to ensure continued accuracy in accordance with the Specifications, including processes and policies for the regular assessment and validation of the AI Tools’ outputs;

(d) Vendor has obtained, and is in compliance with, all rights and licenses necessary to use all AI Tools as described in the applicable SOW;

(e) Vendor has complied with all the Laws [and industry standards] applicable to (i) Vendor’s development and provision of all AI Tools as described in the applicable SOW and Customer’s use of all of the AI Tools as described in the applicable SOW;

(f) [Vendor will comply with all Customer policies and procedures relating to the use of AI Tools];

(g) Vendor will notify Customer at least [X] days’ prior to any [material] changes pertaining to any of the AI Tools (in whole or in part);

(h) Vendor will cooperate and comply with Customer’s privacy, security, and proprietary rights questionnaires and assessments concerning all such AI Tools and all proposed changes thereto;

(i) Vendor will, upon Customer’s request, allow Customer (or its agent) to audit or review all Services for usage of AI Tools and will provide Customer with all related necessary assistance;

(j) there have been no interruptions in use of any such AI Tool in the past [X] months;

(k) Vendor (i) retains and maintains information in human-readable form that explains or could be used to explain the decisions made or facilitated by the AI Tools, and Vendor maintains such information in a form that can readily be provided to Customer or Governmental Authorities upon request;

(l) Vendor maintains or adheres to [industry standard] policies and procedures relating to the ethical or responsible use of AI Tools at and by Vendor, including policies, protocols and procedures for (i) developing and implementing AI Tools in a way that promotes transparency, accountability and human interpretability; (ii) identifying and mitigating bias in training data or in the algorithmic model used in AI Tools, including without limitation, implicit racial, gender, or ideological bias; and (iii) management oversight and approval of employees’ use or implementation of AI Tools (collectively, “Vendor AI Policies”);

(m) there has been (i) no actual or alleged non-compliance with any such Vendor AI Policies; (ii) no actual or alleged failure of any AI Tool to satisfy the requirements or guidelines specified in any such Vendor AI Policies; (iii) no Claim alleging that any training data used in the development, training, improvement or testing of any AI Tool was falsified, biased, untrustworthy or manipulated in an unethical or unscientific way; and no report, finding or impact assessment by any employee, contractor, or third party that makes any such allegation; and (iv) no request from any Governmental Authority concerning any AI Tool.

Co-panelist Jason Mark Anderman also contributed a customer-friendly clause to the slide deck related to training, confidentiality and privacy:

Training and Instance Confidentiality Limits. Company will ensure that the Services and Software, provided via a third-party cloud (“Cloud Service Provider”) and AI environment (“Cloud AI Service Environment”), shall maintain strict confidentiality and security of Customer Data constituting personal data or confidential information of Customer’s clients (“Personal/Client Data”). The Personal/Client Data will be securely retained within the specific, dedicated Cloud AI Service Environment allocated for the Company, and will not contribute to the training of the Company’s, or the Cloud Service Provider’s AI models, nor be utilized by any third party outside of the Customer’s expressly nominated clients (in writing). Upon receipt of a notice from Customer, Company will remove all Customer Data from the Cloud AI Service Environment. Company will ensure that the governing contractual terms (e.g., terms of service) issued by the Cloud Service Provider include provisions materially consistent with this provision, and will identify the foregoing to Customer. Company will allow Customer to first approve in writing a given Cloud Service Provider and its Cloud AI Service Environment, such approval not to be unreasonably withheld or delayed. If there is any conflict or ambiguity between this provision and the rest of the Agreement, this provision governs and controls.

Here are some other related resources:

  • My panel chair from the first panel, Peter Schildkraut, is the lead author on a great article describing the FTC’s settlement with Rite Aid regarding its use of facial recognition technology. It’s relevant because the settlement essentially spells out what the FTC thinks a good AI vendor management policy needs to include in order to avoid FTC charges of unfair and deceptive practices. The short of it is that I think it’s nearly impossible to do what the FTC requires unless a company either has staff with fairly deep AI expertise or hires consultants who do.
  • Microsoft’s AI Security Risk Assessment

Choose Your Own Adventure: The EU AI Act and Openish AI

A copy of the EU AI Act leaked on January 22, 2024.1 The Act has since been unanimously approved by the ambassadors of each EU member country and is likely to officially go into effect in April. The Act exempts certain freely available AI/ML models2 from some of its obligations if they are under “free and open source licenses.” The Act only governs models put into real-world use and does not apply to AI models used or shared purely for scientific research or development. It therefore does not affect anyone’s ability to merely post models on public repositories. This post will examine the Act’s potential effects on providers of AI models, with a focus on “openish” AI models. Skip below to Part 2 if you don’t provide openish AI models, but want to better understand model providers’ obligations under the Act.

Part 1: Is My Model Under a “Free and Open Source License” Under the Act?

The TL;DR here is that the Act doesn’t actually bother to define exactly what it means for models to be under “free and open source licenses” and the Act is using this term in an idiosyncratic way. As you read the Act, you can silently replace every instance of “Free and Open Source License” with “Mystery License” in your head and you will have lost nothing by doing so. As best as I can tell just from the language in the Act, a model is under the Mystery License if the provider:

  • Doesn’t monetize it, including by charging for hosting or for support. Putting it up on HuggingFace or similar is fine, though
  • Releases it under a license* that allows for access, usage, study**, modification, and distribution** 
  • Makes its weights available
  • Provides information on the model architecture and model usage

* It’s unclear whether or not such licenses can have “field of use” restrictions that prohibit using the model for specific uses (like development of nuclear weapons or biometric identification).

** It’s possible that more AI-related artifacts (like training methodologies) may be explicitly required in the future. More on this below.

Read on if you’d like some colorful commentary around this term in the Act. Otherwise, just skip ahead to Part 2: What Does the EU AI Act Require of My Model?.

Review of the Plain Meaning and Course of Usage of the Term “Free and Open Source License”

Is There a “License”?

It’s not obvious that models, by themselves, are copyrightable. There is no major precedent or legislation in the EU (or in any other major market, as far as I know) that says one way or another. My personal take is that the models are, more or less, just numbers, and contain no copyrightable human expression. The training protocols might qualify as patentable processes and the software used for training might be eligible for both patent and copyright protection, but the models are mere computer output. If the models aren’t copyrightable, then the legal documents attached to them aren’t licenses at all – they’re contracts.3 All of which is to say that it’s not clear there are ANY AI model “licenses” out there today, and also unclear whether there might be any in the future.
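To make the “just numbers” point concrete, here is a minimal sketch in Python using PyTorch (chosen purely for illustration; nothing below comes from any statute or case) showing what a model artifact contains once you set aside the code around it:

```python
# A model "by itself" is a mapping from parameter names to arrays of numbers.
# Minimal illustration using PyTorch; any ML framework shows the same thing.
import torch.nn as nn

model = nn.Linear(4, 2)        # a toy "model": y = Wx + b
state = model.state_dict()     # the artifact that actually gets shared and "licensed"

for name, tensor in state.items():
    print(name, tuple(tensor.shape), tensor.dtype)
    print(tensor.flatten().tolist())   # ...which is literally just a list of floats
```

Whether a court would find any copyrightable human expression in those arrays of floats is, of course, exactly the open question.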

Yet it’s strange to imagine that whether models are copyrightable, or whether they are dedicated to the public domain, should have any bearing on what transparency and safety steps providers of openish models need to take. My conclusion is that this is unlikely to be a dispositive term one way or another.

Is There “Source”?

Nope. Models aren’t code and therefore don’t contain any source code, which is generally what people mean by “source” in the context of either “open source” or software. Taken more broadly, the term “open source” might refer to the concept of publicly making available the underlying technology or artifacts used to build a final product (like design schematics for hardware), but there is no industry-wide consensus on what that might include with respect to AI models. In fact, the Open Source Initiative is still working on defining what “open” might mean in the AI domain and what would constitute the equivalent of source code for AI/ML models.

Is the License “Free and Open”?

Most people in the open source community would guess that this phrase refers to the definitions of free and open source software promulgated by the FSF and the OSI, respectively, or to a license approved by one or both of the organizations. However, none of the licenses currently approved by the FSF or OSI were intended for use with AI/ML models and they aren’t suited for the purpose. They’re open source software (OSS) licenses.

A number of openish AI-specific licenses have emerged, but none of the notable ones would meet the definitions of free or open source software (notwithstanding that models aren’t software) because they contain field of use restrictions which prohibit the models from being used for certain purposes (such as for the creation of biological weapons) or prohibit certain types of users (such as the military). Other licenses, like that for Llama 2, are really just free commercial licenses (“shareware,” if you’ll take a stroll down memory lane with me) and not “open” or “free” as traditionally understood by the open source community for a multitude of reasons. 

To the extent that OSI and FSF continue to categorically reject field of use restrictions, plenty of people are going to choose non-approved licenses anyway because the field of use restrictions are important to their ethics and/or because limiting downstream use can also limit their own liability with respect to the models,4 and protect their reputations. In the software realm, OSS licenses generally contain a disclaimer of warranties and a limitation of liability provision that applies to anyone exercising any of the rights granted in the license. When the software fails, the potential harm is generally borne by the licensee using the software,5 so those disclaimers and limitations are generally enough to immunize an OSS developer from liability. However, in the AI realm, AIs can cause serious harm to people who are not users or licensees of the AI provider – people who are denied loans based on AI-enhanced assessments by banks, for example. Since those individuals are not licensees, no disclaimers of warranty or limitations of liability apply to them. The only way an AI provider can attempt to limit liability with respect to those individuals is by prohibiting licensees from applying their AI models to risky uses in the first place.

AI developers also have a desire to release models under something like a “beta” or “eval” license so that others can test them before they are forced to decide if the model can be made available under a broader license or if they need to go back to the drawing board; that desire is more acute with AI than with software because the potential harms are so much greater and less predictable (no accounting software, for example, has accidentally tried to convince a user to divorce his wife). So, it’s not clear to me that, even if the OSI and FSF managed to define what “free and open” might mean in the AI domain in the near future, they would be seen as the vanguard for this definition. Few AI providers will be inclined to take on global liability for human deaths (and any number of lesser harms) just to suit the principles of these organizations.

Is It Desirable to Use a License Already Approved by the OSI or FSF?

If a model provider has reason to put a model out under a license instead of dedicating it to the public domain, it’s a gamble whether or not any of the licenses currently approved by the FSF or OSI6 are likely to help them achieve their goals. They would be better off using an AI-specific license to give users clear restrictions and obligations, particularly if they wanted to add AI-specific transparency obligations. Further, from a policy perspective, I strongly suspect that the EU would prefer to see models licensed under something like the BigScience RAIL License than under Apache 2.0.

OSS licenses make reference to terminology that is not applicable to models (like “source code,” “binaries,” “build instructions,” “linking,” “macros,” etc.). Perhaps most importantly, copyleft open source licenses require that “modifications”  (as in the Mozilla Public License 2.0) or “derivative works” (as in the GNU General Public License 2.0) of the OSS code also be licensed under the same or similar license, but it’s anyone’s guess how a court might interpret these terms in the AI model context. Recognizing this is extremely important for any model developer who is drawn to copyleft licensing because fostering collaboration is important to them, or who really wants to ensure that anyone using their models only does so in conjunction with products and services that are provided under similar terms. 

“Modifications” often mean something like “additions or deletions to code.” That’s not a definition that works for AI models. “Derivative work” is a term that has a specific meaning in copyright law,7 is fundamentally inapplicable to a work that is not copyrightable, and in the software domain, depends on exactly how one piece of code interacts with another piece of code. That analysis takes for granted that the copylefted work and the larger or other work at issue are both pieces of software. The generally recognized consensus in the open source community is that if a software product uses or incorporates the output of a copyleft OSS package (output that is not code), but not the OSS package itself, the copyleft license of the OSS package will not extend to the product and the product is not considered a derivative work of the OSS package. If it were otherwise, it would be difficult to sustain copyleft text editors, for example. 

In other words, there is something of a blood-brain barrier between output and software when discussing the reach of a strong copyleft license. Applied to the AI model context, it would mean that copyleft training software doesn’t necessarily yield a copyleft model (though perhaps copyleft training data might) and that software receiving output from an AI model that is under a traditional copyleft license would not necessarily be affected by the license of the model either. These outcomes probably run counter to the goals that AI developers may have when placing their models under copyleft licenses.

If/when it is determined that models are copyrightable, the above-mentioned consensus in the open source community may or may not be relevant to any particular judge or jury, and that consensus only answers some of the possible questions that may arise when deciding what is and isn’t a derivative work of a model under a traditional open source software license:

  • Are fine-tuned models derivative works of the models they were tuned from? (Starting with a softball!)
  • What if you just publish a set of weights to swap out of the original model (a diff), but you don’t publish any of the weights in the original model? (See the sketch after this list.)
  • Does a product or service that uses a particular model constitute a derivative work of the model? Does it matter how important the model is for the product (if so, what is that bar and who sets it?)? 
  • What if the product has, say, three features, and uses a different model for each feature and each model is under a different license? 
  • What if multiple models under different licenses are all used for just one feature? 
  • What about models trained on the output of another model? 
  • A model trained by another model? 
  • A model trained using the same training methodology as another model? 
  • A model trained using the same training methodology and the same training data as another model? 
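On that weight-diff question, here is a minimal, hypothetical sketch in Python/PyTorch of what publishing “only a diff” can look like in practice. The model, names, and numbers are all made up for illustration; the point is simply that the published artifact need not contain any of the original model’s weights:

```python
# Hypothetical sketch: distributing only a "diff" of weights, not the base model.
import torch
import torch.nn as nn

base = nn.Linear(4, 2)                      # stands in for the original model
tuned = nn.Linear(4, 2)                     # stands in for a fine-tuned copy
tuned.load_state_dict(base.state_dict())
with torch.no_grad():
    tuned.weight.add_(0.01)                 # pretend fine-tuning nudged the weights

# The published artifact: only the deltas, none of the original weights.
diff = {k: tuned.state_dict()[k] - base.state_dict()[k] for k in base.state_dict()}

# A downstream user who already has the base model reconstructs the tuned one.
rebuilt = {k: base.state_dict()[k] + diff[k] for k in diff}
assert all(torch.allclose(rebuilt[k], tuned.state_dict()[k]) for k in rebuilt)
```

Whether those deltas are a derivative work of the original weights is exactly the kind of question a traditional OSS license was never drafted to answer.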

Takeaway

The phrase “free and open source license” and the constituent words in this phrase have no relation to AI. If this was supposed to reference licenses specifically approved by the FSF and OSI, it’s a strange reference since they haven’t approved any AI-specific licenses or published any definitions related to “open” AI, and it doesn’t make sense to push people to use the existing approved licenses for models. Since many, if not most, of the most popular freely available models out there aren’t licensed under true free and open source software licenses,8 yet there’s every indication that this language was intended to refer to most of them, one can only conclude that the EU’s understanding of what constitutes a “free and open source license” is unique to the EU legislators drafting this Act.

Review of the Term “Free and Open Source Licenses” Solely Within the Context of the Act

Article 2, which critically addresses the scope of the Act, simply refers to models under “free and open source licenses.” Article 52c of Title VIII, addressing the need for authorized representatives in the EU, refers to models with:

 “…a free and open source licence that allows for the access, usage, modification, and distribution of the model, and whose parameters, including the weights, the information on the model architecture, and the information on model usage, are made publicly available.” 

Recital 60(f)9 similarly refers to AI models that:

“…are released under a free and open source license, and whose parameters, including the weights, the information on the model architecture, and the information on model usage, are made publicly available…” 

Recital 60i adds that:

 “The licence should be considered free and open- source also [emphasis mine] when it allows users to run, copy, distribute, study, change and improve software and data, including models under the condition that the original provider of the model is credited, the identical or comparable terms of distribution are respected.”10

So far, this term isn’t too convoluted. It’s more or less asking for licenses that grant broad use rights (like all OSI-approved licenses) and for providers to make model parameters, weights, model architectures, and model usage info publicly available. 

There is some ambiguity here, though, about what it really means to be able to modify or study a model if you are only provided with the items specified in the Act (and don’t have information on the training methodologies, for example). A coalition of entities with an interest in openish AI wrote:

“…to understand how a given AI system works in practice, it is necessary to understand how it was trained. This stands in contrast to traditional software systems, where access to the source code is typically sufficient to explain and reproduce behaviors. The training dataset, training algorithm, code used to train the model, and evaluation datasets used to validate the development choices and quantify the model performance all impact how the system operates and are not immediately apparent from inspecting the final AI system.” 

They identify critical artifacts necessary to study and modify a model that are not specified in the Act. Does the right to modify and study imply that model providers actually need to provide more than that which is specifically listed in the Act to be eligible for the “free and open source” exceptions? Some legal experts both inside and outside of OSI have voiced this position. To my mind, a standard of openness that requires providing everything necessary to rebuild a model mirrors the GPL’s requirement that those who receive GPL code also receive everything necessary to modify the code and put it back into use. But, I also think “openness” in the AI context should have gradations (just like open source has a variety of licenses), and that this broad approach is just one conceivable and valid interpretation of “open.” 

There is also ambiguity as to whether or not the license must be free of field of use restrictions in order to be a “free and open source license” under the Act. On the one hand, Recital 60i does open the door for licenses with certain conditions and nothing in the Act explicitly forbids field of use restrictions. It would be rather strange for an Act focused on mitigating harm from AIs to disincentivize people from licensing their AIs in ways that prevent them from being used for risky or dangerous purposes. Many providers are keen to limit their liability with respect to openish models (see more discussion of this above) and would not want to make their models publicly available if the only way to do so would be to accept unlimited liability with respect to harms suffered by individuals impacted by an AI’s activities or results. In the long run, such an interpretation probably puts a nail in the coffin for the possibility of an “open” AI in the image of open source software.11 But on the other hand, at least some lawmakers and regulators may want exactly that: it may be their intent to significantly shrink the “open” AI ecosystem.

 Recital (60i+1)12 is where it gets really confusing:

“Free and open-source AI components covers the software and data, including models and general purpose AI models, tools, services or processes of an AI system. Free and open-source AI components can be provided through different channels, including their development on open repositories. For the purpose of this Regulation, AI components that are provided against a price or otherwise monetised, including through the provision of technical support or other services, including through a software platform, related to the AI component, or the use of personal data for reasons other than exclusively for improving the security, compatibility or interoperability of the software, with the exception of transactions between micro enterprises, should not benefit from the exceptions provided to free and open source AI components. The fact of making AI components available through open repositories should not, in itself, constitute a monetisation.”

This set of requirements isn’t supported by any definitions or practices in the open source software domain. Of course, open source software must be provided freely (though developers can charge for the media it’s on and shipping), but open source software doesn’t cease being open source simply because someone offers support services or hosting services for it. The point of open source isn’t to foreclose private profits; it’s to ensure an end user’s rights to use and modify the code and to exchange that code and those modifications with others. Somebody’s offer of support or hosting services doesn’t hinder user rights at all. In the AI domain, offering such services also would not take away from the transparency, safety, and innovation benefits that are otherwise foreseen by the Act from open AI models. 

The bit about personal data is inscrutable to me – it looks like a drafting error. Do they mean to not exempt otherwise “open” models if they’re trained on data that includes personal data? If they accept input that might include personal data? Either would ban every well-known openish AI model out there from taking advantage of the free and open source exception. I have no idea what this is supposed to mean.

Part 2: What Does the EU AI Act Require of My Model?

All models need to comply with the transparency requirements in Title IV, Article 52, to the extent that they are applicable. Here’s a summary:

  • To the extent that human users interact with the AI system, they need to know they’re interacting with an AI
  • Outputs (if any) generated or modified by AI must be marked as such (there are some nuances here)
  • Deployers of emotion recognition systems or biometric categorisation systems that aren’t prohibited by the Act have to notify people that they are using such systems

Keep reading to see what additional obligations may apply. All the obligations discussed below are additive.

Is Your Model Designed for a Prohibited Use or Are You (Personally) Using a General Purpose Model for a Prohibited Use?

The EU AI Act prohibits uses of artificial intelligence that fall under Title II, Article 5, regardless of the type of model in question (“general purpose AI” or not, “free and open source” or not). Such uses may only occur for the sole purpose of scientific research and development. Here is a brief summary of the unacceptable uses:

  • Deploying subliminal techniques or purposefully manipulative or deceptive techniques
  • Exploiting vulnerabilities of people due to age, disability or specific social or economic situation in a way that causes harm
  • Biometric categorisation to deduce race, political opinion, and other sensitive characteristics
  • Social scoring
  • Real-time biometric identification in public spaces (with a bunch of caveats)
  • Predicting future crimes of individuals
  • Creating or expanding a facial recognition database by scraping images from the internet or CCTV footage
  • Inferring emotions in a workplace or educational institution, except for safety reasons

Beyond this inquiry, the Act bifurcates between “general purpose AI models” and models designed for specific use cases. The bifurcation happened because the Act was first drafted in 2022, before general purpose AIs made a big splash in the tech world, and at a time when the dominant idea of an AI was one that was specifically trained at one or a handful of narrow tasks. In 2023, numerous politicians (particularly from Germany and France) demanded that providers of general purpose AI models (aka foundation models) be exempted from the Act entirely because they didn’t want to stifle the growth of domestic AI companies. The bifurcation happened as a compromise, lowering the number and scope of obligations applicable to such providers. In particular, general purpose AI models do not need to be approved by regulators before they are put on the market, even very powerful ones, unless the provider itself puts the model towards a high-risk use on behalf of itself or a customer. In that case, the provider would be subject to all the regulations attendant to both general purpose AI models as well as “high-risk” uses. This is true both for models that are and aren’t “free and open source licensed.”

Is Your Model Designed for a High-Risk Use or Are You Personally Using a General Purpose Model for a High-Risk Use?

With certain exceptions and nuances as expressed in Article 6, Annex III of the Act lists a number of AI uses categorized as “high-risk.” Generally speaking, these are uses of AI where the decisions they make or help others make have a significant impact on the course of an individual’s life and include decision-making regarding things like access to employment, education, asylum, essential private and public services, law enforcement, etc. High-risk use cases also include ones that pose physical risks, like the use of AI for critical infrastructure or as part of a safety system for a physical product or system. “Free and open source licensed” models designed for a “high-risk” use have to comply with all the same requirements as other models if providers want to put them on the market (commercially) or use them for their own benefit or that of a customer (including by using a general purpose AI model for a high-risk use13); in such case, this category of AI models carries the most onerous obligations under the Act: 

  • Put the model through a “conformity assessment” and obtain approval before it can be placed on the market
  • Affix a marking to approved systems to indicate they have passed the conformity assessment
  • Maintain a risk management system, including testing
  • Implement a data governance policy
  • Maintain technical documentation (small companies can use a simplified form), which must include the level of accuracy, robustness and security; possibility of misuse and possible risks; explainability info; human oversight measures; etc.
  • Provide deployers with the information necessary for them to use the system and comply with the Act
  • Provide human oversight
  • Maintain a post-market monitoring system
    • High-risk AI systems that continue to learn after being placed on the market or put into service must be developed in such a way as to eliminate or reduce, as far as possible, the risk of possibly biased outputs influencing input for future operations (‘feedback loops’)
  • Register the model in an EU database
  • Meet accessibility requirements
  • Maintain a “quality management system” – a policy for complying with all obligations in the Act
  • Keep all documentation for 10 years
  • Ensure the system generates automatic logs
  • Report incidents of noncompliance and corrective action taken to regulators
  • Appoint a representative to perform tasks under the Act if the company is outside the Union
  • Create written agreements with vendors to ensure the provider can meet its obligations under the Act

Additional obligations will apply if your model is both used for a “high-risk” activity and is a general purpose AI model. However, I believe that the drafters of the Act imagined that it would be rare for a general purpose AI model provider to also be using their own model for a high-risk use. That might be true today, but it might not always be true. In particular, a general purpose AI doesn’t necessarily cease being a general purpose AI just because it is fine-tuned to perform better in a certain domain, so it seems possible that an AI provider may offer different flavors of their models, with some flavors specifically designed to perform high-risk activities. In the interest of completeness and because I enjoy being technically correct, I will frame this as a possibility.

Is Your Model a “General Purpose AI Model”?

All general purpose AI models need to comply with a subset of the transparency requirements in Article 52c, summarized below:

  • Put in place a copyright compliance policy
  • Provide a detailed summary of the content used for training (whether copyrighted or not), using a template to be provided by the AI Office

Is Your Model Under a “Free and Open-Source License”?

See Part 1.

Does Your General Purpose AI Model Pose “Systemic Risk”?

General purpose AI models are treated differently under the Act depending on whether or not they pose “systemic risk” due to their high impact capabilities. Models whose cumulative training compute, measured in floating point operations (FLOPs), exceeds 10^25 are deemed models with “systemic risk” under the Act by default, unless the provider can demonstrate otherwise (a back-of-the-envelope sketch for estimating this appears at the end of this section). The Act also allows regulators to add alternative criteria for determining whether a general purpose AI might cause “systemic risk.” Models in the “systemic risk” category are subject to a number of additional requirements:

  • Article 52d:
    • Perform model evaluations
    • Assess and mitigate systemic risks
    • Report serious incidents to the AI Office
    • Ensure cybersecurity
  • Appoint an authorized representative in the EU to coordinate and correspond with the AI Office, etc., if your organization isn’t established in the Union

Further, the rest of the obligations under Article 52c related to transparency would also apply:

  • Create and keep up-to-date very detailed technical documentation, including training and testing processes and results of evaluation.
    • Notably this needs to include the model’s energy consumption
  • Create and keep up-to-date info and documentation for deployers of such AI systems to use

The obligations above related to general purpose AI models with “systemic risk” will be further spelled out in “codes of practice” to be developed within 9 months of the Act going into effect. These will be developed via collaboration by the AI Office, the Advisory Board, and the providers of such models. That’s where a lot of the real action will take place.
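For anyone wondering how the 10^25 FLOP threshold above might shake out for a given model, here is a rough back-of-the-envelope sketch. It relies on the common “training compute ≈ 6 × parameters × training tokens” rule of thumb from the ML literature, which is my assumption and not part of the Act, and the parameter and token counts are hypothetical:

```python
# Rough estimate of training compute versus the Act's 10^25 FLOP default threshold.
# The 6 * params * tokens approximation is a common rule of thumb, not from the Act;
# PARAMS and TOKENS are hypothetical numbers chosen purely for illustration.
PARAMS = 70e9        # hypothetical: a 70-billion-parameter model
TOKENS = 2e12        # hypothetical: trained on 2 trillion tokens
THRESHOLD = 1e25     # the Act's default "systemic risk" trigger

training_flops = 6 * PARAMS * TOKENS
print(f"Estimated training compute: {training_flops:.2e} FLOPs")
if training_flops > THRESHOLD:
    print("Presumed to pose 'systemic risk' unless the provider demonstrates otherwise")
else:
    print("Below the default threshold (regulators may still add other criteria)")
```

By that rough math, a 70-billion-parameter model trained on 2 trillion tokens comes out around 8.4 × 10^23 FLOPs, more than an order of magnitude below the default trigger.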

You’re Still Here?

Congratulations, you have made it through your chosen adventure. As you can see, while the concessions granted to general purpose AI models relative to other models designed for “high-risk” uses are fairly wide-ranging, the exceptions for openish models are actually relatively slim in comparison, especially because of the requirement that openish models not be monetized in any way. The lighter regulatory load for “free and open source licensed” models is likely to only be enjoyed by researchers at universities and non-profits, who truly don’t monetize the models in any way, and to a lesser extent by individuals. Companies that want to utilize openish models as part of their business strategy are unlikely to benefit from any regulatory leeway by doing so. 


The New York Times Launches a Very Strong Case Against Microsoft and OpenAI

It seems that The New York Times Company (“The Times”) got fed up with the pace of its negotiations with Microsoft and OpenAI over their use of The Times’ content for training and running their LLMs. So much so that The Times filed a post-Christmas complaint against the two, likely knowing full well they’d lay waste to the winter vacations of hundreds of people working for OpenAI and Microsoft. It might be the most well-known AI-related case to date because the case isn’t a class action and the plaintiff is globally recognized.

The complaint alleges:

  • Copyright infringement against all defendants (related to handling of the datasets containing content from The Times, handling of models allegedly derivative of the datasets, and the ultimate output)
  • Vicarious copyright infringement (the idea that Microsoft and various OpenAI affiliates directed, controlled and profited from infringement committed by OpenAI OpCo LLC and OpenAI, LLC)
  • Contributory copyright infringement by all defendants (the idea that the defendants contribute to any infringement perpetrated by end users of the models)
  • DMCA Section 1202 violations by all defendants regarding removal of copyright management information from items in the datasets
  • Common law unfair competition by misappropriation by all defendants (related to training AI models on The Times’ content and offering AI services that reproduce The Times’ content in identical or substantially similar form (and without citing The Times or linking to the underlying content))
  • Trademark dilution by all defendants (arguing that the AIs dilute the quality associated with The Times’ trademarks by falsely claiming certain content originates from The Times)

Unlike other complaints, this one doesn’t spend too much time explaining how AI models work or teeing up the analogies it plans to use in court. Instead, the complaint includes multiple extremely clear-cut examples of the LLMs spitting out The Times’ content nearly verbatim or stating bald-faced lies about The Times’ content. Many of the other complaints admitted they weren’t able to find clear-cut examples of infringing output, nebulously resting their claims on the idea that all output is, by definition, infringing. Here, Microsoft and OpenAI haven’t just used The Times’ content to teach the AI how to communicate; they’ve launched news-specific services and features that ingest both archived content and brand new articles from The Times. The other plaintiffs also weren’t able to argue that their specific content, out of the trillions of pieces of training data in the datasets, was particularly important for creating quality AIs. Here, The Times convincingly argues that its content was extremely valuable for training the AIs, both because of the quantity involved as well as the fact that the training process involved instructing the AI to prioritize The Times’ content.

This is probably the strongest AI-related complaint out there. I think it’s quite possible that a jury or judge angry at Microsoft and OpenAI for offering services that compete with and undercut The Times is more likely to also find that the training activities constituted copyright infringement and that the model itself is a derivative work of the training data, without thinking too hard about a scenario where the ultimate model doesn’t supplant the business or livelihood of the copyright holders in the training data. It’s definitely a case where “bad facts invite bad law.”

This case is also notable for the fact that it explicitly goes after the defendants for their AIs’ hallucinations. An AI summarizing a news event based on one or more news articles opens a Pandora’s box of debate about the line between uncopyrightable facts and copyrightable expression, as well as how/if those same standards should be applied to a computer “reading” the news. But the hallucinations aren’t facts; they’re lies. And even if the defendants prevail in arguing that the AIs are mostly just providing people with unprotectable facts, there’s very little to shield them from liability for the lies, both with respect to trademark dilution claims and with respect to potential libel or privacy-related claims that might be brought by other individuals. Copyright law can forgive a certain amount of infringement under certain circumstances, but these other areas of law are far less flexible.

The other really interesting thing about this complaint is the extent to which it describes the business of The Times – how much work the journalists put in to create the articles, the physical risks they take during reporting, the value of good journalism in general, and The Times’ struggle with adjusting to an online world. The complaint paints a picture of an honorable industry repeatedly pants-ed by the tech industry, which historically has only come to heel under enormous public pressure, and of the Herculean efforts The Times has made to continue to survive. It’s interesting because US copyright law decisively rejects the idea that copyright protection is due for what is commonly referred to as “sweat of the brow.” In other words, the fact that it takes great effort or resources to compile certain information (like a phonebook) doesn’t entitle that work to any copyright protection – others may use it freely. And where there is copyrightable expression, the difficulty in creating it is irrelevant. So, is all this background aimed solely at supporting the unfair competition claim? Is it a quiet way of asking the court to ignore the “sweat of the brow” precedent, to the extent that it’s ultimately argued by the defendants, in favor of protecting the more sympathetic party? Maybe they’re truly concerned that the courts no longer recognize the value of journalism and need a history lesson? No other AI-related complaint has worked so hard to justify the very existence, needs, and frustrations of its plaintiffs.

Unless Microsoft and OpenAI hustle to strike a deal with The New York Times, this is definitely going to be the case to watch in the next year or two. Not only does it embody some of the strongest legal arguments related to copyright, it is likely to become a lightning rod for many interests who will use it to wage a proxy war on their behalf. The case, and especially the media coverage of the case, will likely embitter the public and politicians even further against big tech, framing its success as a zero-sum game vis-à-vis journalists and creators more broadly. It’s the kind of case that ultimately results in federal legislation, either codifying a judgment or statutorily reversing it.

Yes, GitHub Finally Offered to Indemnify for Copilot Suggestions, But…

Back in September, Microsoft made a big splash announcing that it would be offering a copyright indemnity to all paying customers of its various Copilot services, including GitHub Copilot. But, Microsoft didn’t update any of its contracts to reflect this new copyright indemnity. Lawyers everywhere were mystified (ok, maybe just my friends…). It was very strange for such a public official announcement, coming straight from the Chief Legal Officer himself, to not also be accompanied by new contracts which would instantiate the commitment being offered. Claiming to offer an indemnity is one thing, but indemnities can be written broadly or narrowly and can include exceptions big and small. Given that GitHub already had a history of publishing misleading information about its legal protections, which I detailed here, I was curious to see what Microsoft came up with.

Close to a month later, GitHub finally published some new indemnity-related language, updating Section 4 of the GitHub Copilot Product Specific Terms (the before-and-after screenshots of the old and new Section 4, and of the unchanged General Terms, accompany this post).

As before, the IP indemnity only applies to paying customers, but now it explicitly covers not just use of GitHub Copilot but also any IP claims related to its Suggestions. But this update of Section 4 seems a bit hasty. When GitHub says the Suggestions are “included,” does that also mean that the Suggestions are subject to the “unmodified as provided by GitHub and not combined with anything else” carveout? As discussed before, in the context of how developers use GitHub Copilot, those exceptions are so large they threaten to swallow the indemnity whole. It’s entirely up for debate whether those exceptions are meant to apply to Suggestions as well. Further, with respect to GitHub’s mitigation measures, is GitHub also offering to replace the Suggestions with a functional equivalent? Again, I think it’s entirely unclear.

But, let’s say for the sake of argument that GitHub is being generous and we should read these ambiguities as being resolved in favor of the customer. The big elephant in the room is that a lawsuit against a customer may or may not specify exact Suggestions that infringe the copyright holder’s IP rights. The plaintiffs in many of the existing class action lawsuits against various generative AI companies don’t allege any specific infringing output; they allege infringement generally, solely on the basis of their works being used to train the models. One alleges that the model itself ends up being a derivative work of the training data and goes so far as to say that all output infringes the copyright of the author of each piece of training data. 

Lawsuits with that posture are much harder to bring against Copilot customers since the customers didn’t train the model and didn’t handle any of the training data, but there is some uncertainty around whether the courts will accept that a model is a derivative work of the training data (and therefore, to the extent models are offered for physical distribution, making copies of the model also infringes the copyrights of the training data authors) or that all model output is a de facto infringement of the training data’s copyrights. So, if the plaintiff doesn’t specify infringing output and the customer has no way to track what was and what wasn’t a Suggestion, what would it mean for the customer to stop infringing if they lose the case or GitHub settles it on their behalf? Would the customer just have to stop shipping its product entirely and rewrite it so that there’s certainty it doesn’t include any Copilot Suggestions? The likelihood of such a claim against a customer or its ultimate success is probably low, but it’s not zero, and the cost of such an outcome is extremely high. It’s likely higher than the related copyright statutory damages. 

The issue here is that an injunction or agreement to stop infringing is an equitable remedy; it’s not monetary damages. That means that any revenue losses resulting from a customer needing to discontinue a product aren’t damages covered by the indemnity and are losses the customer would have to bear alone. Such damages would be consequential damages, for which Microsoft fully disclaims any liability. That potentially puts GitHub in a position where they make a settlement on a customer’s behalf that effectively ends the customer’s business, with no or limited financial repercussions for GitHub. Worse, GitHub would still be able to tell reporters that they “fulfilled their obligations to defend their customers against IP claims related to Copilot,” and unless the customer is well-known, the customer’s ultimate immiseration may never become publicly known, especially if the settlement’s terms include confidentiality.

It’s also worth noting that the new indemnity provision is strictly for IP claims and does not cover other types of claims, like those that might arise from security vulnerabilities introduced by the Suggestions. It also doesn’t cover some of the claims already brought against GitHub such as those under the DMCA’s Section 1202 related to deleting copyright management information or related to violations of the California Consumer Privacy Act. Either of those could potentially be brought against GitHub customers as well. 

Conclusion

Microsoft and GitHub’s new indemnity offer is an improvement over their previous offer, but the drafting leaves a lot of open questions about how it would apply in practice. Is that ambiguity intentional or just the result of drafting quickly under pressure? Overall, the specter of lawsuits against customers is likely overblown, but the worst-case scenario I described here is quite bad. One remedy is obviously to ask GitHub for final approval rights over any settlement, or at least over any settlement that includes conditions other than monetary damages. However, if that fails, customers might also consider something unusual: ban GitHub from making settlements subject to confidentiality obligations without the customer’s written consent, so that GitHub will have its reputation to consider if it chooses to throw a customer under the bus in the name of a quick and cheap settlement.

Co-Chairing PLI’s Annual “Open Source Software – From Compliance to Cooperation” Program + Slides to New Presentation “Licensing ‘Open’ AI Models”

Our after-conference dinner. All that OSS knowledge at one table!

I co-chaired the Practising Law Institute’s annual “Open Source Software 2023 – from Compliance to Cooperation” program with Heather Meeker again this September. The program is a day-long continuing legal education event with a variety of open source licensing and compliance experts covering both introductory and advanced topics as well as recent developments in OSS licensing. 

As part of the program, Luis Villa, founder and general counsel of Tidelift, and I presented a session titled “Licensing ‘Open’ AI Models” (it’s called “Open Source and Artificial Intelligence” on the PLI site). We did a deep dive on what “open” AI licenses currently entail, the legal and technical pros and cons of using “open” AI models, the applicability of open source principles to the AI domain, and how “open” AI licenses interact with traditional OSS licenses. This presentation is useful for anyone thinking of using or releasing a publicly available AI model.

Aaron Williamson of Williamson Legal (former counsel for the Software Freedom Law Center and general counsel to the Fintech Open Source Foundation) and I did a session titled “OSS in Transactions, Licensing and M&A” where we took a close look at contractual provisions related to open source software and provided some advice on where and how they should be implemented. Our presentation was loosely based on a white paper we co-authored titled “IoT and the Special Challenges of Open Source Software Licensing,” which was published in the ABA’s Landslide magazine.