[Bioc-devel] experimental data packages and git-lfs

7 messages · Turaga, Nitesh, Levi Waldron, Hervé Pagès

#
Are a couple extra commands needed at
https://www.bioconductor.org/developers/how-to/git/maintain-github-bioc/
for experimental data packages? I followed the instructions here for
curatedOvarianData, but got GitHub warnings on push:

remote: warning: GH001: Large files detected. You may want to try Git Large
File Storage - https://git-lfs.github.com.
remote: warning: See http://git.io/iEPt8g for more information.
remote: warning: File data/TCGA_eset.rda is 53.10 MB; this is larger than
GitHub's recommended maximum file size of 50.00 MB

So I followed the instructions at https://git-lfs.github.com/:

git lfs track "*.rda"
git add .gitattributes
git commit -a

Does this seem correct, and is there anything else to know for experimental
data packages with large files?
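
The GitHub warning above can be anticipated before pushing. The following is a minimal sketch, not part of the Bioconductor instructions: `find_large_files` is a hypothetical helper name, and the default threshold assumes GitHub's "50.00 MB" means 50 x 1024 x 1024 bytes.

```shell
# Hypothetical helper: list files under a directory that exceed a size limit.
# Usage: find_large_files [dir] [limit_in_bytes]
# The default limit (52428800 bytes) is an assumption about how GitHub
# counts its recommended 50 MB maximum.
find_large_files() {
    dir="${1:-.}"
    limit="${2:-52428800}"
    # Skip the .git directory itself; print regular files larger than $limit bytes.
    find "$dir" -path "$dir/.git" -prune -o -type f -size +"${limit}"c -print
}
```

Running `find_large_files .` from the top of a package checkout would be expected to list files such as data/TCGA_eset.rda if they are above the threshold.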
#
Also, can I now remove external_data_store.txt?

$ cat external_data_store.txt
data
inst/extdata


On Tue, Sep 5, 2017 at 4:38 PM, Levi Waldron <lwaldron.research at gmail.com>
wrote:

  
    
#
Hi Levi,

The external_data_store.txt file is not needed anymore.

The current git.bioconductor.org server does not store files in the LFS mode, so you can just add a file, commit and push. But for GitHub, you might have to use LFS for such large files. I'll look into this more and get back to you with a detailed reply.

Best,

Nitesh
This email message may contain legally privileged and/or confidential information.  If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited.  If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
#
Thank you, Nitesh.  Is there any difference at all now in how experimental
data vs. software packages are tracked and checked?

On Tue, Sep 5, 2017 at 8:40 PM, Turaga, Nitesh <
Nitesh.Turaga at roswellpark.org> wrote:

            

  
    
1 day later
#
Hi Levi,

I'm not sure what you mean by tracked vs checked.

On another note, I have explored your package a little on GitHub (waldronLab/curatedOvarianData), and I've noticed that the data files there are stored as LFS files. As long as you don't add the .gitattributes file in upstream, you should be fine. You can keep the LFS tracking on GitHub and non-LFS storage on bioc-git separate.

To summarize:  No LFS support as of right now on bioc-git server, so I wouldn't use any LFS commands.

Best,

Nitesh
#
Thanks, Nitesh! See below:

On Thu, Sep 7, 2017 at 4:21 PM, Turaga, Nitesh <
Nitesh.Turaga at roswellpark.org> wrote:

            
Sorry, I meant whether there are any differences in the way software and
experimental data packages are handled by Bioconductor. It seems like from
the developer's perspective now, they are handled exactly the same, as
opposed to in the past when experimental data packages were tracked in two
separate svn repos and checked only twice a week.

> On another note, I have explored your package a little on GitHub [...]
> To summarize:  No LFS support as of right now on bioc-git server, so I [...]

So, as I understand it, I can continue using LFS on GitHub and merge my
changes upstream to git.bioconductor.org, as long as I avoid adding the
.gitattributes file upstream. I have already used LFS commands to
initiate LFS on GitHub, but these shouldn't affect git.bioconductor.org
if I avoid adding the .gitattributes file upstream.
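
The condition discussed here (no .gitattributes file going upstream) can be checked before a push. This is only a sketch under assumptions: `gitattributes_is_tracked` is a hypothetical helper name, and it tests nothing beyond whether a top-level .gitattributes is in the index of the checked-out branch. It says nothing about whether the data files themselves were committed as LFS pointers.

```shell
# Hypothetical helper: succeed (exit 0) if a top-level .gitattributes file
# is tracked in the repository at the given path, fail otherwise.
# Usage: gitattributes_is_tracked [repo_dir]
gitattributes_is_tracked() {
    # `git ls-files --error-unmatch` exits non-zero when the path is not in the index.
    git -C "${1:-.}" ls-files --error-unmatch -- .gitattributes >/dev/null 2>&1
}
```

For example, `gitattributes_is_tracked . && echo "tracked"` could be run on the branch that is about to be pushed to git.bioconductor.org.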
#
Hi Levi,
On 09/07/2017 02:52 PM, Levi Waldron wrote:
I'll try to summarize what has changed with respect to software vs
data-experiment packages:

What has changed from the old svn way is that software and
data-experiment packages are now in the same location on the new
git.bioconductor.org server. Developers also now access and maintain
both types of packages the same way, through the git client. There is
no more separation between package shell and data for data-experiment
packages, so the external_data_store.txt file is no longer needed.

What has NOT changed is that builds for software and data-experiment
packages are still separated. Both still happen every day but at
different times and we still generate separate reports. Also for
the last couple of years or so, we've been building/checking
data-experiment packages on Linux only. This is for lack of resources.
Software and data-experiment packages are still propagated to
separate public repositories:

   in release:
     https://bioconductor.org/packages/3.5/bioc/
     https://bioconductor.org/packages/3.5/data/experiment/

   in devel:
     https://bioconductor.org/packages/3.6/bioc/
     https://bioconductor.org/packages/3.6/data/experiment/

You can see this by calling biocinstallRepos() after loading the
BiocInstaller package.

Finally, our requirements for software and data-experiment packages
have not changed either. The latter can be bigger than the former and
they are not required to have a vignette. Also they should NOT have
native code in them.

Cheers,
H.