On Mar 26, 2018, at 4:03 PM, Zhu, Lihua (Julie) <Julie.Zhu at umassmed.edu> wrote:
Thanks so much, Nitesh!
FYI, I sent an email to bioc-devel (cc'ed Martin) as well. I hope you saw it. It would be great if you could reply to that email on bioc-devel with your solution, so that others can benefit.
Best regards,
Julie
On 3/26/18, 3:39 PM, "Turaga, Nitesh" <Nitesh.Turaga at RoswellPark.org> wrote:
Hi Julie,
Please send this email to the bioc-devel list. It's a very valid question and I think everyone in the community should benefit from it. I'll take a look at your problem now.
bioc-devel <bioc-devel at r-project.org>
Best,
Nitesh
On Mar 26, 2018, at 3:36 PM, Zhu, Lihua (Julie) <Julie.Zhu at umassmed.edu> wrote:
A while ago, Jim suggested replacing two big data files with two smaller ones, which I did.
Oddly, when I tried to push the GUIDEseq package to GitHub today, the push was still rejected because these two files are too big, although I do not see them in my local repository, which I checked out from the Bioconductor repository. I think the files in the older branches still affect git push origin master.
Could you please help? Thanks!
JulieZhuMac2017:GUIDEseq ZHUJ$ git push origin master
Counting objects: 515, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (180/180), done.
Writing objects: 100% (515/515), 107.80 MiB | 18.38 MiB/s, done.
Total 515 (delta 344), reused 483 (delta 320)
remote: Resolving deltas: 100% (344/344), done.
remote: warning: File inst/extdata/HEK293_site4All.bam is 87.88 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: error: Trace: faafca1a45d562b62039862ac0dfbf85
remote: error: File inst/extdata/HEK293_site4All.bed is 124.11 MB; this exceeds GitHub's file size limit of 100.00 MB
! [remote rejected] master -> master (pre-receive hook declined)
JulieZhuMac2017:GUIDEseq ZHUJ$ git rm -r inst/extdata/HEK293_site4All.bed
fatal: pathspec 'inst/extdata/HEK293_site4All.bed' did not match any files
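A note on the log above: git rm fails because the two files are no longer in the working tree, but git push transfers every object reachable from the pushed branch, including blobs that exist only in older commits. One way to confirm that the files still live in the history is to list large blobs across all refs. The following is a minimal sketch, not a fix; the function name list_large_blobs and the 50 MB default threshold are illustrative assumptions and do not come from this thread.

```shell
#!/bin/sh
# list_large_blobs [MIN_BYTES]
# Hypothetical helper: print "<size-in-bytes> <path>" for every blob
# reachable from any ref whose size is at least MIN_BYTES, largest first.
# Run it from inside the repository.
list_large_blobs() {
    min_bytes=${1:-50000000}   # assumed default, roughly GitHub's warning size
    # rev-list emits "<sha> <path>" for each object in the history;
    # cat-file looks up the type and size and echoes the path via %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk -v min="$min_bytes" '
            $1 == "blob" && $2 + 0 >= min + 0 {
                size = $2
                sub(/^[^ ]+ [^ ]+ /, "")   # drop type and size, keep the full path
                print size, $0
            }' |
        sort -rn
}
```

Running list_large_blobs with no arguments in the GUIDEseq checkout should show whether the old .bam and .bed files are still present in earlier commits, even though they are absent from the current tree.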
From: "Zhu, Lihua (Julie)" <Julie.Zhu at umassmed.edu>
Date: Monday, October 19, 2015 at 1:32 PM
To: Michael Lawrence <lawrence.michael at gene.com>, Jim Hester <james.hester at bioconductor.org>
Subject: Re: Data files in GUIDEseq
Jim, FYI, I have replaced the two large datasets with much smaller ones. Thanks for your feedback!
Michael, please feel free to check out the new dataset in the following commit. Thanks!
Julie-Zhus-MacBook-Pro-Intel-Core-i7:GUIDEseq zhuj$ svn ci -m "changed to smaller test datasets"
Deleting inst/extdata/HEK293_site4All.bam
Deleting inst/extdata/HEK293_site4All.bed
Adding (bin) inst/extdata/bowtie2.HEK293_site4_chr13.sort.bam
Adding inst/extdata/bowtie2.HEK293_site4_chr13.sort.bed
Transmitting file data ...
Committed revision 109732.
From: Lihua Julie Zhu <julie.zhu at umassmed.edu>
Date: Thursday, October 15, 2015 2:16 PM
To: Michael Lawrence <lawrence.michael at gene.com>, Jim Hester <james.hester at bioconductor.org>
Subject: Re: Data files in GUIDEseq
Yes, we need to fetch a few enriched regions.
From: Michael Lawrence <lawrence.michael at gene.com>
Date: Thursday, October 15, 2015 2:13 PM
To: Jim Hester <james.hester at bioconductor.org>
Cc: Michael Lawrence <lawrence.michael at gene.com>, Lihua Julie Zhu <julie.zhu at umassmed.edu>
Subject: Re: Data files in GUIDEseq
Since this is peak calling, we probably want to keep all the reads within a smaller region.
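Keeping all reads within a smaller region can be done with a region query in samtools view, which needs a coordinate-sorted, indexed BAM. The following is a minimal sketch under those assumptions; the function name extract_region, the SAMTOOLS override variable, and the coordinates in the usage comment are hypothetical and do not come from this thread.

```shell
#!/bin/sh
# extract_region IN_BAM REGION OUT_BAM
# Hypothetical helper: write all reads overlapping REGION to OUT_BAM.
# Assumes IN_BAM is coordinate-sorted. The SAMTOOLS variable is an
# illustrative override for the samtools executable.
extract_region() {
    in_bam=$1 region=$2 out_bam=$3
    samtools_bin=${SAMTOOLS:-samtools}
    # Region queries need an index; build IN_BAM.bai if it is missing.
    [ -e "$in_bam.bai" ] || "$samtools_bin" index "$in_bam" || return 1
    # -b writes BAM output, -o names the output file.
    "$samtools_bin" view -b -o "$out_bam" "$in_bam" "$region"
}

# Example call (placeholder coordinates, adjust to the enriched regions):
#   extract_region HEK293_site4All.bam chr13:1-1000000 subset.bam
```

Unlike uniform subsampling, this keeps the full read depth inside the chosen window, which is what a peak-calling demonstration needs. The chromosome name must match the naming used in the BAM header.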
On Thu, Oct 15, 2015 at 11:12 AM, Jim Hester <james.hester at bioconductor.org> wrote:
FWIW the hard limit that was causing the error with the git mirrors is 100
MB, but I would aim for files of around 20 MB or so at most.
An easy way, if you just want uniform sampling, is to use samtools (where the
fractional part of -s is the fraction of reads you want to keep).
samtools view -s 0.1 -b -o HEK293_site4All2.bam HEK293_site4All.bam
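To check a package against the size guidance above before committing, the data directory can be scanned for files over a byte threshold. This is a minimal sketch; the function name list_oversized, the default directory inst/extdata, and the 20000000-byte default (an approximation of the roughly 20 MB target) are illustrative assumptions.

```shell
#!/bin/sh
# list_oversized [DIR] [MAX_BYTES]
# Hypothetical helper: print every regular file under DIR that is
# larger than MAX_BYTES, one path per line, sorted by name.
list_oversized() {
    dir=${1:-inst/extdata}
    max_bytes=${2:-20000000}   # assumed stand-in for "around 20 MB"
    # "-size +Nc" matches files strictly larger than N bytes.
    find "$dir" -type f -size +"${max_bytes}c" | sort
}
```

An empty result from list_oversized means no file in the directory exceeds the threshold.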
On Thu, Oct 15, 2015 at 2:07 PM, Michael Lawrence <lawrence.michael at gene.com> wrote:
Probably best to come up with smaller files, because we want them in the
package for demonstration purposes, right? Just need to filter them,
somehow. Actually, all we need is the BAM file...
On Thu, Oct 15, 2015 at 10:43 AM, Zhu, Lihua (Julie) <
Julie.Zhu at umassmed.edu> wrote:
These files were included for Michael to test functions he is developing
Molecular, Cell and Cancer Biology (MCCB)
Head of Bioinformatics Core, MCCB
Program in Bioinformatics and Integrative Biology
Program in Molecular Medicine
University of Massachusetts Medical School