
[R-meta] Open Data: preferred way of publishing

4 messages · Moritz Tobiasch, Philipp Doebler, James Pustejovsky +1 more

#
Dear colleagues,

My question is not primarily technical, but I think it is worth asking the experts:

I am currently finishing a meta-analysis and preparing it for publication. I intend to publish my set of primary data (it's been quite some work, and I guess it could be helpful for reviewing and discussing the topic). The primary data were collected in an Excel spreadsheet (I know, I apologize for it) before being imported into R, and the analysis was run in RStudio in an R Markdown file.

My question is now: based on your experience and preferences, what would be your ideal way to make the primary dataset and the calculations accessible for review and further research? Add it to the article as supplemental material, upload it to arXiv or GitHub, or just put it on my website?

Any suggestions are welcome!

Sincerely yours
M. Tobiasch

--
Dr. med. Moritz Tobiasch
Staff Physician

Universitätskliniken LKH Innsbruck
Dept. of Medicine
Division of Gastroenterology and Hepatology
Anichstr. 35
A-6020 Innsbruck
Austria






#
Dear Moritz,

from my limited experience, I would recommend the Open Science Framework (
https://osf.io/), which includes many of the ideas in GitHub (e.g. you can
clone someone's project and work with that) but could be advantageous w.r.t.
data sharing. Harvard's Dataverse has some visibility, too, and also allows
you to publish code etc.

While I have published some material in online supplements, I do not like
the fact that the publisher's copyright extends to that (which could limit
the re-use you intend for the data). I would definitely not recommend your
website, as this is typically far less stable than professional hosting.

Best wishes,
  Philipp

On Sat, Jan 27, 2018 at 4:19 PM, Moritz Tobiasch <moritztobiasch at gmail.com>
wrote:

  
    
#
I would recommend one of the following (in decreasing order of enthusiasm):

1. Post raw data and replication code to the Open Science Framework (http://osf.io). This will let you create a DOI, so that you can add a citation to the data and materials in your article. You can also attach a license to make the conditions for re-use clear. This approach gives high assurance that others will be able to find the materials even years from now.
2. Post data and materials to GitHub. Similar benefits in terms of long-term stability, but the interface is less friendly for non-programmers and the DOI and license aren't as easy to set up.
3. Post as supplementary materials on the journal website. This would seem the most intuitive and user-friendly, but depending on the journal, there's no assurance that the links and materials will be preserved and discoverable in the long term.
4. Posting only on your personal website is not ideal because you might decide three years from now to reorganize it, and then the URL from your article will no longer work. Better to do (1) or (2) and then include a link on your personal site.
5. Any of the above is better than creating a nicely formatted table of the data and turning the table into a PDF.

  
  
#
Dear Moritz,

It is very nice that you are considering making your meta-analytic
research more open by sharing data and code, posting pre-prints, etc.

Below are two examples of reproducible reports for meta-analysis
manuscripts I have published recently using R Markdown (Rmd), i.e. literate
programming.

Here are the usual steps I follow:

1) Use Excel or similar for preparing the data, but export to and work with
text files such as csv
2) Create an RStudio project (.Rproj) and perform all data munging,
visualisation, analysis, etc. in Rmd files (text + code)
3) Produce a webpage (single) or website (multiple HTML pages and a nav bar)
as output using knitr
4) Host all files in a GitHub repository
5) Store data (csv) and code in a permanent repository such as
http://www.osf.io to get a DOI and a citation
6) Post a pre-print of the manuscript
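
As a minimal sketch of step 1 (not taken from either example repository), the
following base R code writes a small table to csv, reads it back the way an
analysis script would, and runs a few sanity checks. The data frame, its
column names (study, yi, vi), the values, and the file name are all made up
for illustration; in a real project the csv would be the export from the
spreadsheet.

```r
# Hypothetical study-level data, standing in for a sheet exported from Excel.
studies <- data.frame(
  study = c("Study A", "Study B", "Study C"),
  yi    = c(0.21, -0.05, 0.40),  # effect sizes (made-up values)
  vi    = c(0.04, 0.02, 0.09),   # sampling variances (made-up values)
  stringsAsFactors = FALSE
)

# Write a plain-text copy. A temporary directory is used here so the sketch
# runs anywhere; a real project would keep the file inside the project folder.
csv_path <- file.path(tempdir(), "studies.csv")
write.csv(studies, csv_path, row.names = FALSE)

# Read it back the way the analysis script would.
dat <- read.csv(csv_path, stringsAsFactors = FALSE)

# Basic checks before any analysis: same number of rows, no missing values,
# and strictly positive sampling variances.
stopifnot(
  nrow(dat) == nrow(studies),
  !anyNA(dat),
  all(dat$vi > 0)
)
```

Keeping the checks next to the import means that a broken export fails early
instead of silently changing the results.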


Example 1
My first one uses knitr to produce a single page.

webpage: https://emdelponte.github.io/paper-white-mold-meta-analysis/
repo: https://github.com/emdelponte/paper-white-mold-meta-analysis


Example 2
The most recent one is a website + nav bar with four main Rmd files for intro,
data, code and manuscript:

website: https://emdelponte.github.io/paper-FHB-Brazil-meta-analysis/
repo: https://github.com/emdelponte/paper-FHB-Brazil-meta-analysis

This one is not a meta-analysis paper, but it follows the same structure:
https://emdelponte.github.io/paper-FGSC-fitness/index.html

I prepared a template for a research compendium (data + code + manuscript +
figures) based on this last example:
website: https://emdelponte.github.io/research-compendium-website/
repo: https://github.com/emdelponte/research-compendium-website


Finally, I post the manuscript (bioRxiv or PeerJ) with the following
addition:

"Data processing and analyses.
All data processing and analyses, as well as graphical work, were performed
running R version 3.4.3 (R Core Team, 2017). Texts and scripts were
prepared as R Markdown documents. A collection of these latter files were
rendered as a website, using the render_site function of the R package
rmarkdown (Allaire et al. 2017), where all analysis are documented,
reproducible and openly available at
https://github.com/emdelponte/paper-FGSC-fitness. The data in text format
are deposited at the Open Science Framework data repository and available
at https://osf.io/c2mbr/."
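
A statement like this names the R version, so it can help to save that
information with the analysis output. The following is a minimal sketch in
base R, not part of the workflow above; the file name session-info.txt is
only an assumption for illustration.

```r
# Save the R version and the attached packages to a plain-text file.
# A temporary directory is used so the sketch runs anywhere; a real project
# would keep the file next to the Rmd files.
info_path <- file.path(tempdir(), "session-info.txt")
writeLines(
  c(R.version.string, "", capture.output(sessionInfo())),
  info_path
)
```

The resulting text file can be committed to the repository and deposited
together with the data and code.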


One of the most important aspects of ensuring reproducible methods is clear
documentation, in addition to access to all files. I have found that this is
good not only for others but also for myself, when I need to go back to a
previous analysis and understand what I did and why!

I hope these are useful. If you have any questions, let me know.

Best wishes,

Emerson



Prof. Emerson M. Del Ponte
Departamento de Fitopatologia
Universidade Federal de Viçosa
Viçosa, MG - Brasil
+55 (31) 3899-1103
Twitter: @edelponte

