From RAIL to Open RAIL: Topologies of RAIL Licenses
Disclaimer: This post is not intended to be legal advice from any of the authors.
Background
The RAIL initiative was established in 2019 to advocate for the adoption of behavioral use-based restrictions in licenses and contracts, with the aim of mitigating the risk of harm from sharing AI technologies. The first Responsible AI Licenses (“RAILs”) were released for licensing source code and end-user software, to demonstrate how behavioral-use restrictions could be included in licenses and contracts to restrict usage. Since the development and release of these first licenses, we have been working with the AI community to collect feedback and iterate. For example, we are working with the IEEE-SA on the development of a standard that RAIL developers can adopt. We also organized an AAAI Spring Symposium in 2020 to brainstorm and discuss aspects related to the creation, development, and adoption of RAILs. A paper at this year’s ACM FAccT also discusses the framework under which RAIL Licenses operate and the challenges that may lie ahead. Despite those challenges, we are encouraged to see our efforts gaining traction in the AI community at large.
BigScience RAIL License for the BLOOM Large Language Model (“LLM”)
Recently, we co-developed a RAIL License for the BLOOM LLM and its related set of models as part of the BigScience Workshop. This RAIL License for the BLOOM LLM applies use-based restrictions to the use of the large language model and its derivatives. Note that this is a meaningful difference from licensing typical off-the-shelf software packages (e.g., those consisting of compiled binaries) or API services. Here, the model by itself is represented by weights/parameters conforming to a neural architecture implemented in source code/binaries. The RAIL License for BLOOM defines derivatives of the BLOOM models and checkpoints, and it includes aspects related to distillation and fine-tuning (see more here and here). However, the BLOOM License does not apply use-based restrictions to the underlying source code, which was previously obtained under standard open source terms.
The exercise of developing a RAIL license at the BigScience workshop opened up interesting real-world questions around (i) the nature of the artifacts being licensed, i.e., the data, source, model, and binaries/executables, (ii) what could constitute derivative works for each, and (iii) whether the artifact’s license enables permissive downstream distribution of such artifact and any derivative versions of it (e.g., with commercial terms of any kind).
RAIL Licenses - Naming Convention
In essence, we could consider licenses associated with AI-related artifacts to be RAIL Licenses if:
they include behavioral-use restrictions which disallow/restrict certain applications by the licensee; and
they require downstream use, including re-distribution, to include, at minimum, those same behavioral-use restrictions.
Collectively, we refer to these as the “Use Restrictions”.
In this blog post we outline a new naming convention for RAIL Licenses that we hope the community will find useful when conceptualizing and/or selecting their own Use Restrictions. We begin by discussing what the artifact being licensed, and subject to Use Restrictions, is: Is it the data (D)? The source code (S)? The model (M)? An application/service (A)? Combinations thereof?
Data: The dataset(s) used to pretrain or train an AI Model.
Application/service: Any executable software code or application, including API-based remote access to software.
Model: Machine-learning based assemblies (including checkpoints), consisting of learnt weights and parameters (including optimizer states), corresponding to the model architecture.
Source: The source code and scripts associated with the AI system.
In order to easily distinguish licensing types using acronyms, we use the following representative naming conventions:
RAIL-D: RAIL License includes Use Restrictions only applied to the data
RAIL-A: RAIL License includes Use Restrictions only applied to the application/executable
RAIL-M: RAIL License includes Use Restrictions only applied to the model
RAIL-S: RAIL License includes Use Restrictions only applied to the source code
The nomenclature of each RAIL identifies what artifact the Use Restrictions apply to; licensing of AI artifacts in a RAIL license may be combined in various ways and should be listed in D-A-M-S order. For example, a RAIL License applying Use Restrictions to data, source code, models and applications/services would be referred to as a “RAIL-DAMS” license. Alternatively, a RAIL license applying Use Restrictions to the model and the source code would be referred to as a “RAIL-MS” license.
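The D-A-M-S ordering rule described above can be illustrated with a small sketch. The function name `rail_name` and the single-letter input format are hypothetical choices made for this illustration; they are not part of any official RAIL tooling.

```python
# Illustrative sketch only: builds a RAIL license name from the set of
# artifacts covered by Use Restrictions, following the D-A-M-S order.
# The helper name and input format are assumptions for this example.

DAMS_ORDER = "DAMS"  # Data, Application/service, Model, Source


def rail_name(artifacts):
    """Return e.g. 'RAIL-MS' for artifacts given in any order or case."""
    letters = {a.upper() for a in artifacts}
    unknown = letters - set(DAMS_ORDER)
    if unknown:
        raise ValueError(f"unknown artifact letters: {sorted(unknown)}")
    if not letters:
        raise ValueError("at least one artifact must be restricted")
    # Emit letters in canonical D-A-M-S order, regardless of input order.
    suffix = "".join(ch for ch in DAMS_ORDER if ch in letters)
    return f"RAIL-{suffix}"


print(rail_name(["S", "M"]))            # model + source code
print(rail_name(["s", "m", "a", "d"]))  # all four artifact types
```

Passing the artifacts in any order yields the same name, which is the point of fixing a canonical order in the convention.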
Open RAIL Licenses
Does a RAIL License include open-access/free-use terms, akin to what is used with open source software?
If it does, it would be helpful for the community to know upfront that the license promotes free use and re-distribution of the applicable artifact, albeit subject to Use Restrictions. We suggest adding the prefix "Open" to each such RAIL license to clarify, on its face, that the licensor offers the licensed artifact at no charge and allows licensees to re-license such artifact or any subsequent derivative works as they choose, as long as the Use Restrictions similarly apply to the relicensed artifact and its subsequent derivatives. A RAIL license that does not offer the artifact royalty-free and/or does not permit downstream licensing of the artifact or derivative versions of it in any form would not use the “Open” prefix.
This table shows a comparison of Open Source terms, Creative Commons terms and RAIL terms, so readers can more easily discern how the former two and RAIL terms address both overlapping and orthogonal concerns:
License | Licensor permits modification and redistribution | Licensor requires source code be disclosed when re-used | Licensee must include copyright notice | Licensor includes Use Restrictions |
---|---|---|---|---|
GNU Affero General Public License v3.0 | Yes | Yes | Yes | No (OSI) |
Apache 2.0 | Yes | No | Yes | No (OSI) |
Creative Commons Attribution Share Alike 4.0 | Yes | No | Yes | No (CC) |
Creative Commons Zero 1.0 Universal | Yes | No | No | No (CC) |
MIT License | Yes | No | Yes | No (OSI) |
RAIL Licenses | May or May Not | May or May Not | Yes | Yes |
OpenRAIL-D | Yes | N/A | N/A | Yes |
OpenRAIL-A | Yes | No | N/A | Yes |
OpenRAIL-M | Yes | No | Yes | Yes |
OpenRAIL-S | Yes | No | Yes | Yes |
The table above is representative of certain OpenRAIL terms and is not meant to be exhaustive, in particular because open source requirements vary widely. If the licensor has in-licensed open source code and chooses to redistribute such code, the licensor should continue to ensure that such redistribution complies with the original, applicable open source terms.
In the table above, the licenses require downstream users to comply with the terms marked “Yes”. “OSI” refers to the Open Source Initiative, whose definition of “open source” is our reference point in this table; see https://opensource.org/osd. “CC” refers to Creative Commons; see https://creativecommons.org/
In summary, a license which includes behavioral-use restrictions on the artifact being licensed may be termed a RAIL license if Use Restrictions (as defined herein) apply both to the artifact and any derivative works.
Further, we can utilize a simple naming convention for open versions of RAIL Licenses – in DAMS order – to specify the artifact(s) being impacted by Use Restrictions:
D: for data being licensed
A: for apps/binaries/services/executables or any non-source code form of the artifact
M: for models/parameters
S: for source code, including libraries and toolkits
Lastly, the proposed naming convention requires that RAIL Licenses which offer artifacts at no charge and allow licensees to re-license such artifacts or any subsequent derivative works as they choose include the prefix “Open”.
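As a rough illustration, the full convention summarized above (the two RAIL criteria, the DAMS suffix, and the “Open” prefix) could be expressed as a single decision function. Everything in this sketch is hypothetical: the `LicenseTerms` fields and the `suggested_name` helper are names invented for the example, and the “OpenRAIL-” spelling simply follows the comparison table in this post.

```python
# Illustrative sketch only: a license's terms are reduced to a few booleans
# so the naming decision can be shown step by step. Field and function names
# are assumptions for this example, not an official schema.
from dataclasses import dataclass
from typing import Optional

DAMS_ORDER = "DAMS"  # Data, Application/service, Model, Source


@dataclass(frozen=True)
class LicenseTerms:
    artifacts: str                      # letters from "DAMS", in any order
    restricts_use: bool                 # includes behavioral-use restrictions
    restrictions_bind_downstream: bool  # same restrictions apply downstream
    royalty_free: bool                  # artifact offered at no charge
    permits_relicensing: bool           # downstream re-licensing allowed


def suggested_name(terms: LicenseTerms) -> Optional[str]:
    """Return the suggested name, or None if this is not a RAIL license."""
    # Step 1: both Use Restriction criteria must hold for a RAIL license.
    if not (terms.restricts_use and terms.restrictions_bind_downstream):
        return None
    # Step 2: list the restricted artifacts in D-A-M-S order.
    suffix = "".join(ch for ch in DAMS_ORDER if ch in terms.artifacts.upper())
    if not suffix:
        raise ValueError("no recognized artifact letters in 'artifacts'")
    # Step 3: "Open" only if royalty-free AND downstream licensing is allowed.
    is_open = terms.royalty_free and terms.permits_relicensing
    return f"{'OpenRAIL' if is_open else 'RAIL'}-{suffix}"


bloom_like = LicenseTerms("M", True, True, True, True)
print(suggested_name(bloom_like))
```

The function returns `None`, rather than a name, for licenses that lack downstream Use Restrictions, since under the criteria above such licenses would not be RAIL licenses at all.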
The following flow-chart presents a visual summary of the suggested naming convention.
Discussion
The Open Source movement has been critical to the growth of transparency and reproducibility in science, allowing people to license code easily and with understandable clauses. We introduce the idea of Open RAIL licenses as an attempt to provide practitioners with more control over how what they create is used, while also creating a simple and understandable mechanism for them to license material broadly and permissively. However, several design choices made in their development deserve deeper discussion. Data, apps, models and source code each have their own considerations when it comes to licensing, and we recognize the need for more deliberation on how RAIL licenses should be structured and organized, as well as on the tools and mechanisms needed to support developer adoption of RAIL Licenses.
We’d love to hear what you think!
Contact us here!
Acknowledgments: We would like to thank Julia Haines, Brent Hecht, Joe Lindley, and Jesse Benjamin for their feedback, which improved the clarity and structure of this blog post.