Is Open Source Attribution Dead?

Virtually every common open source license, including the BSD, MIT, Apache, and GPL licenses, requires at a minimum that the copyright owners of the code be attributed when their code is redistributed. Licenses that merely require attribution and passing on the license are termed “permissive” licenses, and licenses that additionally require providing source code are termed “copyleft” licenses. A few licenses are very specific about the attribution and require it in prominent places like a splash screen or even in advertising, but none of the most common licenses do so, and most open source foundations and corporate entities have policies against using those licenses. Such licenses have largely been abandoned. Today, there is industry-wide consensus that open source attribution is sufficient if provided in documentation accompanying a product.

The nature of that documentation has certainly changed over time, though. With the advent of package managers and the explosion of libraries that get pulled into most projects, significant products now have open source attribution files stretching into tens of thousands of pages (here’s just a part of the attribution file for VMware’s vSphere product: it’s 15,385 pages, and I crashed my MacBook Air twice trying to output it as PDF so I could count the pages for you!). Because licenses don’t get too specific about the readability of these attributions, and the attributions are often provided in .txt or other fairly universal formats so that they are accessible by all, the documents aren’t easy to navigate. And because the attribution requirements in common OSS licenses are quite light, the attribution files typically don’t include descriptions of particular OSS packages, how they’re used in the product, what part of the product they’re used for, or how integral they are to the product. Commonly, packages are listed in alphabetical order, so a tiny JavaScript function gets just as much prominence as an entire framework. Attribution files include copyright information, but they don’t include things like project or company logos or contact information.

Long gone are the days when a product used only a handful of open source packages and attribution could easily fit into the product’s About screen, where most users could reasonably be expected to see it. The lists of packages are getting longer because of transitive dependencies, and individual OSS packages are now more likely to include other OSS subcomponents with their own license and copyright information. The bloat occurs on multiple axes.
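
To make the transitive growth concrete, here is a minimal sketch in Python. The dependency graph and every package name in it are invented for illustration and are not taken from any real product. The only point is that a product declaring two direct dependencies may owe attribution for several more.

```python
from collections import deque

# Hypothetical dependency graph: package -> packages it depends on directly.
# All names are made up for illustration.
DEPENDENCIES = {
    "my-product": ["web-framework", "left-pad-ish"],
    "web-framework": ["http-parser", "template-engine"],
    "template-engine": ["string-utils"],
    "http-parser": ["string-utils"],
    "left-pad-ish": [],
    "string-utils": [],
}


def packages_to_attribute(root, graph):
    """Return every package reachable from root (direct and transitive), sorted."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue  # shared dependencies are only counted once
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    return sorted(seen)


direct = DEPENDENCIES["my-product"]
everything = packages_to_attribute("my-product", DEPENDENCIES)
print(len(direct), "direct dependencies")
print(len(everything), "packages needing attribution:", everything)
```

A real package manager resolves versions and optional dependencies as well, so this toy traversal understates the problem rather than overstating it.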

Compliance is now more challenging because it involves digging up copyright and license information not only for the overall project but also for the project’s subcomponents. Some projects do a good job of providing this information in one document, but most don’t. Most projects have secondary licenses scattered throughout their source code that are not acknowledged in a centralized location like a LICENSE, NOTICE or COPYING file. Projects from the Apache Foundation, for example, may note that certain subcomponents are under a certain license, but they won’t reprint that copyright and license information in their NOTICE file; they satisfy the requirement of passing on that information by passing on the subcomponent’s source code when they make their own source code available. Finding this information requires either a tedious manual process or a complex scanning tool that often still requires manual checks and corrections, and neither process is 100% accurate.* For this reason, many companies choose to provide only a project’s main license and decide to just risk it when it comes to secondary licenses.

Downstream OSS consumers could likewise pass on an entire project’s source code in order to satisfy the attribution requirements of OSS licenses instead of putting together tedious attribution documents. Some do, particularly in the container context, where it is much simpler and cleaner to just pass on all of a product’s OSS source code in an accompanying container than to try to locate this info and reprint it in one document. But it’s harder to bundle a lot of source code with a product in contexts where the size of the payload still matters (like on mobile devices) and in contexts where the source code is difficult to navigate and use (like on tiny screens built into refrigerators). Companies frequently provide source code on request or on websites separate from the products that use it in order to fulfill copyleft requirements around the provision of source code upon distribution. But separating attributions from the product would technically violate a lot of licenses, and most companies still try to avoid this.

However, the fact that perfect compliance can only really be accomplished by passing on the source code would seem to defeat the purpose of having permissive licenses in the first place. The license is no longer really permissive if it can only practicably be complied with by passing on the source code. Many developers choose permissive licenses for their work because they want it to be used as widely as possible and they specifically do not want to obligate users to additional source-code-related conditions. A lot of them don’t think too hard about whether to pick a permissive license or to dedicate their work to the public domain, because permissive licenses are very common and have been used for a long time. In contrast, public domain dedications were a tricky proposition before CC0 1.0 Universal was released by Creative Commons in 2009, giving developers the right toolkit to properly make a public domain dedication and to safeguard that intention even in countries that do not recognize the concept of the public domain. If more developers understood that permissive licenses now function much more like copyleft licenses, it’s likely that many would opt to dedicate their work to the public domain rather than release it under a permissive license, because the attribution is buried deep in documentation no one reads anyway and essentially just adds to the downstream user’s overhead.

Historically, permissive licenses haven’t seen much in the way of legal enforcement. Legal enforcement has really been focused on the GPL family of licenses, although some of those enforcement claims have obviously been tied to the fact that redistributors have failed to properly attribute the GPL code owners. In large part, that’s because people who chose permissive licenses in the first place were more concerned with spreading their work far and wide than they were with ensuring that downstream users kept the code “open” (because in that case they would have chosen a copyleft license instead). There has long been speculation that this might change and we might see some “attribution trolling,” wherein copyright holders start enforcing permissive licenses as well. That hasn’t happened yet, except perhaps in the context of a claim related to Copilot’s failure to attribute OSS owners when providing output to its users. 

On the one hand, some could see the attribution requirement as good and useful leverage, as well as overhead that corporations should have to face, especially if they want to put a stick in Copilot’s wheel. But I think others are ready to concede that this isn’t really what they intended or want for the industry, especially because this overhead tax is also legally required of individuals and non-profit OSS maintainers (no one is exempt from following third-party OSS licenses). With a bigger push at the federal level for products to maintain a proper bill of materials for security purposes, we are already seeing more companies turn to upstream project maintainers and ask for better and more easily digestible information about code provenance. They, too, are likely to struggle with attributions.

The desire to get something in return for putting out quality code into the ether is understandable. Many developers care a lot about their own reputation, and they contribute to open source at least in part to signal to other developers and potential employers that they have marketable skills. Of course, larger projects run under the auspices of a non-profit foundation or even a corporation also want to burnish their brands. But it’s hard to say anymore that traditional OSS attribution requirements are doing any of that for developers. Signal about great OSS projects is now coming from stars on GitHub, number of forks, tech blogs, Hacker News, etc. I’ve never heard of a developer checking out a new OSS project on the basis of merely finding its name among a sea of other OSS projects in someone’s attribution file.

There are much better ways for developers to build their reputations and brands. Merely including their contact info in their projects (perhaps in their license.txt or in header comments) may well be sufficient since few people have any interest or incentive in proactively stripping such information from the code they’re using. It’s important to note that downstream users want to be able to easily find new useful OSS; while putting together attribution files is expensive and time-consuming, they still want to know the origin of their software – they want to see what else the same developers have written, they want updates to the code, they want to know who to call if there are security issues. And they want to know who to hire, or in the case of other companies, who to buy. Certain companies have also emerged to track OSS usage globally and report on what’s commonly used and by whom, helping to give developers credit and pointing downstream users to useful projects. Package managers could also track this data automatically and make it public. In any case, it’s long past time for this problem to be solved via non-legal means.

* The process isn’t accurate for two reasons. The first is that locating all of a project’s license information isn’t easy. Some licensing information is in the header of a file, but occasionally info is hidden away in the middle of a file that’s the length of a book. The second is that people don’t all agree on exactly what needs to be reproduced. Do we have to reproduce the exact same copyright info and licensing info if the only difference between two such blocks is the copyright year? What if there’s an additional author? What if the text is substantially similar but uses slightly different wording? Plus, some projects put copyright info in every single file while others put it all in one place. Scanners and humans have a much harder time locating unique copyright notices and license information if each and every file in a project is marked rather than just the files with licensing information that differs from the main license of the project.
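
As a rough illustration of both problems, here is a minimal Python sketch that looks for copyright lines anywhere in a file and then applies one possible policy: treat two notices as duplicates when they differ only in the year or year range, but keep a notice that names an additional author. The regular expressions, the sample file contents, the names in it, and the policy itself are all assumptions made for this example. Whether collapsing notices this way actually satisfies a given license is exactly the kind of question people disagree on.

```python
import re

# Assumed pattern: "copyright" followed by "(c)", "©", or a four-digit year.
# Real scanners use far more elaborate rules than this.
NOTICE_RE = re.compile(r"copyright\s+(?:\(c\)|©|\d{4}).*", re.IGNORECASE)


def find_notices(text):
    """Return copyright lines found anywhere in the text, not just the header."""
    notices = []
    for line in text.splitlines():
        match = NOTICE_RE.search(line)
        if match:
            notices.append(match.group(0).strip())
    return notices


def normalization_key(notice):
    """Reduce a notice to a comparison key that ignores years and the (c) marker."""
    text = notice.lower()
    text = re.sub(r"\(c\)|©", " ", text)
    text = re.sub(r"\b(?:19|20)\d{2}\b", " ", text)  # drop years
    text = re.sub(r"[\s,\-]+", " ", text)  # collapse separators left behind
    return text.strip()


def unique_notices(notices):
    """Keep the first notice seen for each normalized key."""
    kept = {}
    for notice in notices:
        kept.setdefault(normalization_key(notice), notice)
    return list(kept.values())


# Hypothetical file contents with notices in the header and further down.
SAMPLE = """\
/*
 * Copyright (c) 2009 Jane Doe
 */
int main(void) { return 0; }
// ...imagine thousands of lines here...
// Copyright 2011-2014 Jane Doe
// Copyright (c) 2009 Jane Doe and John Roe
"""

found = find_notices(SAMPLE)
print(len(found), "notices found")
for notice in unique_notices(found):
    print(notice)
```

Under this policy the two notices that differ only by year collapse into one, while the notice with the additional author is kept. A stricter reading of a license would keep all three, and the “substantially similar wording” case is not handled here at all.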