[Bioc-devel] License question for experimental data package
Thank you all for the useful suggestions and links. I like the idea of using a CC0 license. That's likely what I will go for.
Best,
davide
On Fri, Mar 4, 2016 at 7:42 AM Tim Triche, Jr. <tim.triche at gmail.com> wrote:
I was going to mention droit d'auteur under EU common law, but somehow that seemed more in Hervé's wheelhouse ;-).
--t
On Mar 4, 2016, at 7:17 AM, Lyle Burgoon <burgoon.lyle at gmail.com> wrote:
Also keep in mind that US copyright rules for data are different from European ones. We ran into this recently when we wanted to publish European data from a web database.
On Mar 4, 2016 10:05 AM, "Tim Triche, Jr." <tim.triche at gmail.com> wrote:
Data (facts) are not copyright worthy, but databases (collections of facts) can be. See Feist v. Rural for precedent; in short, there must be a non-obvious and creative aspect to the database for it to be elevated to copyrightable status. I doubt that a collection of datasets would clear this bar, but it's still worth noting.
--t
On Mar 4, 2016, at 6:22 AM, Robert M. Flight <rflight79 at gmail.com> wrote:
I am pretty sure that, in general, "data" is not copyrightable per se (http://www.lib.umich.edu/copyright/facts-and-data), so while I might contact the original authors as a courtesy, if the data has been released into any public database, then you should be free to do with it as you please. Providing the original accession numbers for the data and relevant citations (if they exist), so that it is easy for you and others to be given credit if the data is used, would be a good thing to do.
Also, I would personally go with CC0 (waiver of copyright, see https://wiki.creativecommons.org/wiki/CC0) for a data package, as the data is already publicly available; you have just packaged it together into a useful set.
My 2 cents.
-Robert
Robert M Flight, PhD
Bioinformatics Research Associate
Resource Center for Stable Isotope Resolved Metabolomics
Manager, Systems Biology and Omics Integration Journal Club
Markey Cancer Center
CC434 Roach Building
University of Kentucky
Lexington, KY
Twitter: @rmflight
Web: rmflight.github.io
ORCID: http://orcid.org/0000-0001-8141-7788
EM rflight79 at gmail.com
PH 502-509-1827
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. - Ronald Fisher
On Fri, Mar 4, 2016 at 8:52 AM Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
For data packages, which do not contain any code, it seems weird to use a software license such as GPL or GPL-2. It seems better to use something like Artistic-2.0 or one of the CC licenses.
On Thu, Mar 3, 2016 at 5:15 PM, davide risso <risso.davide at gmail.com> wrote:
Hi Hervé and Sean,
thanks for your help. It will indeed be interesting to hear how other people chose the license, especially for those packages that redistribute a dataset not from their lab.
I do have an experimental data package in Bioc, zebrafishRNASeq, but it's an experiment from a collaborator and at the time I didn't pay much attention to which license to use.
In this case, I'd like to redistribute data from different labs. I guess I will contact the original authors at least as a courtesy. But I'm still keen to hear opinions on which license(s) are appropriate for experimental data sharing.
Best,
davide
On Thu, Mar 3, 2016 at 12:50 PM Hervé Pagès <hpages at fredhutch.org> wrote:
Hi Davide,
On 03/01/2016 02:25 PM, davide risso wrote:
Dear Bioc developers,
I recently downloaded three publicly available single-cell RNA-seq datasets from the NCBI GEO/SRA repository and created an R package with some gene-level summaries (read counts and FPKMs).
I'm currently using the package locally for my own tests, but I'm thinking that this may be a useful resource for the community, and I'm thinking of sharing it on GitHub and eventually submitting it to Bioconductor.
I was not involved in any way with the original studies, and I'm wondering what the best practice is in terms of license / data sharing. Since there are many experimental data packages in Bioconductor, I'm guessing that I'm not the first person wondering about this.
From the NCBI website, I read (quote from https://www.ncbi.nlm.nih.gov/home/about/policies.shtml):
Databases of molecular data on the NCBI Web site include such examples as nucleotide sequences (GenBank), protein sequences, macromolecular structures, molecular variation, gene expression, and mapping data. They are designed to provide and encourage access within the scientific community to sources of current and comprehensive information. Therefore, NCBI itself places no restrictions on the use or distribution of the data contained therein. Nor do we accept data when the submitter has requested restrictions on reuse or redistribution. However, some submitters of the original data (or the country of origin of such data) may claim patent, copyright, or other intellectual property rights in all or a portion of the data (that has been submitted). NCBI is not in a position to assess the validity of such claims and since there is no transfer of rights from submitters to NCBI, NCBI has no rights to transfer to a third party. Therefore, NCBI cannot provide comment or unrestricted permission concerning the use, copying, or distribution of the information contained in the molecular databases.
Should I contact the original authors for permission? Or is the fact that the data were publicly shared enough to grant me permission to redistribute? In that case, is there a standard license that I should use?
Thanks for any feedback / thoughts!
I don't have much to offer. AFAIK we don't really have guidelines or recommendations for what license to use for experimental data packages, except for the usual "make sure you use an appropriate license" advice. So far it has really been up to each author/maintainer to make sure they pick a license that is compatible with the original license/copyright/patent of the original data they are packaging and with its redistribution through the Bioconductor channel.
FWIW here is a summary of the licenses used by the 276 experimental data packages currently in BioC devel:
License        Nb of packages
------------   --------------
GPL                       135
Artistic-2.0               96
LGPL                       41
other                       4
Would be interesting to hear from other developers about this. For example, how do people choose between GPL and Artistic-2.0? Is one license typically more appropriate for packaging and redistributing data that is already publicly available?
H.
Best,
davide
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319