Did GitHub Really Just Offer to Indemnify You for Copilot’s Suggestions?

Short answer: Strictly theoretically, yes, but only if you execute the Corporate Terms of Service (not the terms for individuals), are prepared to litigate a good deal of unclear contract language in order to enforce GitHub’s obligation, and GitHub doesn’t spend all its money on other lawsuits first.

Long answer:

If, like me, you’ve been poking around GitHub’s website to get the scoop on all of Copilot’s new features, you might have stumbled upon an FAQ with the following information:

You can reach this FAQ via https://github.com/features/copilot/#faq-privacy-copilot-for-business and https://github.com/features/copilot/#faq as of May 10, 2023. Much ink has been spilled by lawyers and non-lawyers alike arguing about whether or not Copilot’s suggestions result in copyright infringement, so a promise by GitHub to defend its users in court against claims of copyright infringement related to Copilot output would be notable, and would likely persuade many potential customers to use the service.

If you click the link and go to the GitHub Copilot Product Specific Terms, though, you’ll find this:

To be clear, GitHub ISN’T offering an indemnity here. They’re saying that if your agreement with them happens to include one, then the following things are EXCLUDED from it. However, if you go to GitHub’s default Terms of Service, you might be surprised to find that there’s no indemnity there for customers/users at all. The wording of the FAQ, and the fact that the information about the indemnity appears under the “Privacy – General” section rather than under the “Privacy – Copilot for Business” section, would lead a casual reader to expect this statement to be true of Copilot generally, not just for a subset of customers.

The Corporate Terms of Service do offer to indemnify you “against any claim brought by an unaffiliated third party to the extent it alleges Customer’s authorized use of the Service infringes a copyright, patent, or trademark or misappropriates a trade secret of an unaffiliated third party.” Let’s break that down, though. Does “use of the Service” include use of suggestions in your hosted service? That’s unclear. Does “use of the Service” include your redistribution of suggestions in your own downloadable products? That’s even less clear, since you’re not just including suggestions in your offering; you’re taking a separate step to ship them to third parties who will run the product completely separate and apart from the Copilot service that initially built it. Certainly, the suggestions themselves could have been expressly referenced in this provision, but they are not, and the definition of “Service” says only “GitHub’s hosted service and any applicable Documentation.”

Now let’s look at those exclusions. That language excludes claims based on code that differs from a suggestion provided by Copilot. Does a very slight difference void the indemnity? Potentially. It doesn’t say “materially” or “significantly” differs. And, perhaps more saliently, how is anyone supposed to tell 1. what code was written by Copilot and 2. whether it was modified? I’m not aware of Copilot maintaining this kind of editing history as of today’s date (May 10, 2023). That means that, at best, one might be able to show that a particular suggestion was created by Copilot only by managing to prompt Copilot to output the suggestion again (and that’s not necessarily evidence that the code at issue was output by Copilot back when the claim arose, only that it might have been output that way back then). Even that is dubious, since the underlying model and fine-tuning are subject to change over time and since suggestions are context-sensitive: they depend on the other contents of the file being worked on and even on related files. So those other artifacts may need to be re-presented to Copilot in their original state in order to elicit the exact same suggestion.

Moreover, it’s important to remember that the current case against GitHub does not actually allege copyright infringement. I discussed this before here, but it’s worth noting that both the DMCA Section 1202 claims and the claims related to violations of the California Consumer Privacy Act are claims that could potentially be brought against Copilot customers, not just against Copilot. DMCA Section 1202 prohibits, among other things, distributing copies of works when one knows their copyright management information has been altered or removed (and maybe customers “know” this because Copilot has already been sued over it? Certainly they’d “know” after the plaintiffs won on those claims). Parts of the CCPA apply to a data holder even if it didn’t collect the data itself, and claims under other data protection laws in other states or other countries remain possible.

It’s also instructive to remember that the claims against GitHub and the claims against Stability AI related to Stable Diffusion (which do allege copyright infringement) are brought under a very “circumstantial” argument: the models are trained on copyrighted data, therefore all output from the models is a derivative work of the training data that infringes the authors’ copyrighted works, and all the other claims more or less flow from there. The plaintiffs specifically state in both complaints that it’s impossible to pinpoint particular instances of the output that might be infringing, so they are choosing to bring their theory of the claims without resting them on any specific pieces of output. Whether or not such a strategy will be approved by the courts is pure speculation, but it suggests the possibility of generative AI lawsuits where no specific code is at issue – the accusation could more or less begin and end with whether you’ve used Copilot to build your products. In that case, it’s not clear if or how GitHub’s indemnity exclusion might apply since, to put it literally, the claim is based on “code” (presumably the customer’s product as a whole) that “differs from the suggestions,” as it must, because 1. many suggestions are likely modified, and 2. the product contains customer-written code that doesn’t contain, and therefore differs from, the suggestions. (Tech transactions lawyers have likely noticed the “interesting” way this exclusion is drafted, which sidesteps the clearer and more traditional indemnity exclusion for “code/software/materials modified by a party other than the vendor,” preferably paired with a “where such claim would not have arisen but for such modification.”)

Lastly, in the wake of Johnson & Johnson’s baby powder class actions, it’s worth pointing out that the more serious the legal actions against Copilot’s customers or Copilot itself get, the less likely GitHub is to actually honor its indemnity obligation. If, for example, a court in any major market deems the entire enterprise of creating and/or using these models to be copyright infringement (or contributory copyright infringement, etc.), it’s possible that the first few people (or classes) to win litigation get a big payout from GitHub and then there’s no money left over for indemnifying subsequent customers. And even if there is money left over, GitHub could file for bankruptcy, which would likely mean customers are stuck defending themselves upfront and then trying to recoup just a portion of those expenses in bankruptcy court.

In conclusion, the indemnity on offer is not very exciting. Given the vagueness of the provision and its exclusions, and the potential need to litigate in order to sort them out, I’d advise customers to ask for clarifying language, or at least a “prevailing party” provision stating that the loser in a lawsuit or other dispute resolution proceeding must pay all or part of the winner’s legal costs – and to make sure those fees are carved out of any limitation of liability provision. It would be unfortunate to lose the protection of the indemnity because the upfront costs of enforcing it against GitHub are just too high for a business to swallow.