Why CC0 for data
Context and problem statement
The license attached to data can prevent or resolve many of the issues that arise when working with it. Our projects involve processing and managing data, but we aren’t always the owners of that data, so we can’t always decide which license to use. Still, we would like to have a license that we use when we do own the data and that we can recommend to others. There are many licenses available, but since we are a project that follows open principles, we want to use an open license. So the question is:
What open license should we use for our own data and recommend to others?
Decision drivers
The license should be:
- A valid open license.
- Easy to understand.
- Fairly widely used or recommended.
- Applicable to a wide range of data.
- A globally applicable license.
- Applicable to data itself, not the database architecture.
- Generally applicable or specific to scientific/research data.
Considered options
Licensing data is a complex topic, and we are not lawyers, so we limit ourselves to licenses that are well established, widely used, and simple. Part of the complexity is that facts (for instance, someone’s weight in kilograms) cannot be copyrighted. Since research/scientific data is generally fact-based, it might not even be copyrightable, depending on the jurisdiction.
For data (as opposed to database structure or organisation), there are two main “providers” of open licenses: Open Data Commons and Creative Commons. Among their licenses, there are share-alike, non-commercial, and no-derivatives variants. These variants are generally not recommended for scientific data because they can substantially limit its practical use, even for purely academic purposes and even when the license is legally valid and applies to the data. For those reasons, we won’t consider them.
From this context, we consider these options:
- Open Data Commons Attribution License (ODC-By)
- Creative Commons Attribution 4.0 International (CC-BY-4.0)
- Open Data Commons Public Domain Dedication and License (ODC-PDDL)
- Creative Commons CC0
ODC-By
The Open Data Commons Attribution License (ODC-By) is specifically designed for data and databases. It is a simple license that allows the data to be used freely, as long as attribution is given to the original source.
Benefits
- It is specifically designed for data and databases.
- It is simple: the data can be used freely, as long as attribution is given to the original source.
Drawbacks
- According to an Open Data Commons discussion, it does not seem to be actively maintained, and its maintainers recommend using the CC-BY-4.0 license instead.
- It does not seem to be as widely used as other ODC licenses. The most widely used ODC license is the ODC-ODbL.
CC-BY-4.0
Version 4.0 of this license added clauses and clarifications that make it more applicable to data.
Benefits
- Has the same benefits as ODC-By, but is more widely used and recommended.
Drawbacks
- Unlike ODC-By, it is not specifically designed for data and databases, but rather for creative works.
- It is less clear how the license applies to the data itself, which leaves some ambiguity.
ODC-PDDL
This is the public domain version of the Open Data Commons licenses. It removes all rights over, and conditions on, the use of the data.
Benefits
- Ensures very easy (re)use, as the terms are unambiguous: a researcher or user can use the data without any conditions.
Drawbacks
- Has the same maintenance drawback as the ODC-By license: the ODC licenses don’t seem to be actively maintained.
- Because there are no conditions on the use of the data, attribution is not required, and the research community relies on attribution for career and funding purposes. However, researchers generally cite what they use anyway, so this is not a major drawback.
CC0
This license places data in the public domain by waiving all rights to it.
Benefits
- It is the most widely used license for data and is recommended by many organisations and repositories. Creative Commons even has a dedicated page on using CC0 for data.
- It leaves no ambiguity about rights to the data, since they are all waived.
- It makes the data much easier to use and reuse.
Drawbacks
- Like the ODC-PDDL, it has no attribution requirement, so users or researchers may not cite the data when they use it, which can have funding and career implications for the data owners. However, researchers generally cite what they use anyway, so this might not matter much in practice.
Decision outcome
Based on the above considerations, we decided to use the Creative Commons CC0 license for our data and to recommend it to others. It is the most widely used and recommended license for data, it is simple and clear, and it ensures that the data can be used freely without any ambiguity about rights. Note, however, that using this license doesn’t remove other legal requirements that apply to data, such as privacy or data protection laws.
Consequences
- We don’t foresee any major consequences of this decision.
Resources used for this post
- Open Data Licenses
- Guide to Open Data Licensing
- List of all open licenses
- Creative Commons Wiki on data
- data.world license help
- Creative Commons Wiki on CC0 use for data
- figshare: How to choose a license for your data
- Post-publication sharing of data and tools
- Open Knowledge Foundation discussion on their Open Data Commons licenses