OpenAI’s Massive Data Grab

I spent some time this week going through the whole suite of OpenAI agreements and policies and was quite surprised at what I found in their Terms of Use (TOU). It turns out that OpenAI’s confidentiality provision is unilateral: it includes confidentiality protection solely for OpenAI’s information. That means that neither the inputs provided to OpenAI nor the output it produces are treated as confidential by OpenAI. This is somewhat unusual in the context of SaaS vendors, who usually at least acknowledge that the data provided to them is confidential, even if they try to limit their liability with respect to keeping such data confidential over the course of the contract. Many companies are likely to be caught off guard by this provision. 

Companies the world over are looking to integrate OpenAI technologies, particularly ChatGPT, into their products and services. Microsoft has famously integrated a version of ChatGPT into its Bing search engine, with plans to integrate it with products and services throughout the rest of the Microsoft ecosystem. However, many companies will have to think twice about those integrations if OpenAI doesn’t change its tune. That’s because nearly every SaaS company has at least a subset of customers whose data they promise to treat as confidential, requiring them to pass along those confidentiality requirements to any third party who gets access to such customer data. If a vendor like OpenAI won’t treat any of a company’s inputs as confidential, then they are also not treating any input provided by a company’s customer or about a company’s customer as confidential, putting companies in violation of their own terms of use and/or non-disclosure agreements with their customers. Likewise, it means that companies cannot use OpenAI technologies to solicit or analyze certain categories of data about their staff, which the companies are bound to keep confidential either by law or by their agreements with their staff. That severely curtails the available use cases for OpenAI’s technologies. 

The unilateral nature of the confidentiality provision is easy for non-lawyers to miss. The TOU has a provision that says no input sent to, or output received from, its APIs will be used to improve its services or train its models, and further says that companies can opt out of the same for input or output sent in other ways. Many people reading the TOU are concerned about precisely this scenario because they don’t want third parties to get their hands on their data, so it’s easy to tune out the rest of the TOU after reading that. But what the TOU is also saying via the confidentiality provision is that every other use of input data and output data is still on the table for OpenAI, whether that means publishing the inputs or outputs, privately sharing them with third parties, aggregating them, analyzing them, etc. Lawyers can also draw distinctions here (some more dubious than others), like claiming that analyzing the input in order to better market the services, understand customers, or prioritize partnerships is not the same as analyzing the data to “improve the Services.” Such distinctions aren’t unusual in privacy policies, and that kind of specificity is often lauded by regulators. Keep in mind that, like the GitHub TOU discussed with respect to Copilot, the OpenAI TOU includes all of OpenAI’s affiliates, current and future, so there is no telling what the input data could be used for in the future. As we’ve seen with Copilot, people can be unpleasantly surprised by unexpected uses down the line.

The other element making the confidentiality provision easy to overlook is the fact that the TOU mentions the applicability of a Data Processing Addendum (DPA), principally used to help companies comply with the GDPR (Europe’s main set of data privacy regulations). Many reading the TOU will assume that the DPA protects their input and output data. However, the DPA only protects data that is also personally identifiable information. Again, provided that OpenAI strips the input of any personally identifiable information (a practice it explicitly states it performs), the input can be used by OpenAI without any confidentiality obligations. That should be particularly disconcerting for companies that make it their business to collect, analyze, and sell certain types of data.

So, if you’re worried about what OpenAI might learn about your company, your customers or your employees, and who they might share that information with, think twice about accepting the OpenAI TOU as-is. Call them up and negotiate. 

3 thoughts on “OpenAI’s Massive Data Grab”

  1. Kate, how should we interpret the 1st March additions to the API data usage policies, especially: “Any data sent through the API will be retained for abuse and misuse monitoring purposes for a maximum of 30 days, after which it will be deleted (unless otherwise required by law)”? Would this not exclude purposes other than abuse monitoring?

