[Bioc-devel] Package download when using functions from affy and oligo

9 messages · Joris Meys, Obenchain, Valerie, James W. MacDonald +3 more

#
Dear,

I've noticed that using certain functions in affy and oligo (e.g.
oligo::read.celfiles and affy::bg.correct) starts with downloading another
package and ends with either R crashing or a warning that, after
installation succeeded, the package is not available. After which using
some functions of both packages still crashes R.

The warning I get when trying oligo::read.celfiles() on a single CEL file
right after installing it is about the pd.hugene.1.0.st.v1 package. The even
more annoying thing is that on my machine it insists on building from
source, whereas on another Windows machine without Rtools, it downloads a
binary.

The reason it frustrates the heck out of me is that both affy and oligo
crashed the R session in different ways: during installation of a package,
during use of a function, and at different points when comparing my machine
with the one of our students. The culprit seems to be in one of the
underlying packages, but I wasn't even able to detect which package is the
culprit, let alone which function crashes everything.

Is there a way around this so I can ensure that at least I have the same
setup as they have and I can try to come up with a reproducible example to
report this critical bug?

Thank you in advance
Joris
2 days later
#
Joris,

Sorry I don't have much to offer here. I've cc'd the authors of oligo and affy who may have some insight.

Valerie
On 05/02/2018 11:35 AM, Joris Meys wrote:

This email message may contain legally privileged and/or confidential information.  If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited.  If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
#
I think there are multiple complaints here, so I'll take them one at a time.

On Fri, May 4, 2018 at 3:56 PM, Obenchain, Valerie <
Valerie.Obenchain at roswellpark.org> wrote:

            
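>> I've noticed that using certain functions in affy and oligo (e.g.
>> oligo::read.celfiles and affy::bg.correct) starts with downloading another
>> package and ends with either R crashing or a warning that, after
>> installation succeeded, the package is not available.

This is true for oligo, and perhaps a bit annoying. If you don't have the
package installed already, it gets the package, installs it, and then says
it's not available. This is an easy enough fix.

>> After which using some functions of both packages still crashes R.

I don't know what to do with that. What functions?

>> The even more annoying thing is that on my machine it insists on building
>> from source, whereas on another Windows machine without Rtools, it
>> downloads a binary.

That is an options setting that gets changed when you install Rtools. The
'pkgType' option gets set to 'both' because you can now install both kinds.
And in install.packages it ends up getting switched from 'both' to
'source'. I haven't dug any further into that because I am not sure I see
why it's a problem. In the end there isn't a difference between installing
a source or a binary pdInfoPackage, and trying to get it to 'do the right
thing' might have some unforeseen consequences that I would rather not have
to worry about. This is really an 'if it ain't broke, don't fix it'
scenario, IMO.

>> The reason it frustrates the heck out of me is that both affy and oligo
>> crashed the R session in different ways [...]

I understand your frustration, but that's not enough to go on. I have
never, in like 18 years, had either oligo or affy randomly segfault on me.
I understand that it is happening for you, but unless you can come up with
a reproducible example, it's not possible for anybody to help.

>> Is there a way around this so I can ensure that at least I have the same
>> setup as they have and I can try to come up with a reproducible example to
>> report this critical bug?

Again, I am not sure what to do with that. I am not sure what 'a way around
this' pertains to, and ensuring you have the same setup as 'they have'
seems to be something only you can accomplish. Is there some reason you
cannot ensure that you have the same setup on two different computers?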

Best,

Jim
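
One way to check whether two machines really have the same setup is to snapshot the installed package versions on each and compare them. The following is only a minimal sketch in base R, not something provided by affy or oligo; `package_snapshot` and `diff_snapshots` are hypothetical helper names, and the R version itself (`R.version.string`) would have to be compared separately.

```r
# Hypothetical helper: record name and version of every installed package.
package_snapshot <- function(lib.loc = NULL) {
  ip <- installed.packages(lib.loc = lib.loc)
  data.frame(Package = unname(ip[, "Package"]),
             Version = unname(ip[, "Version"]),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: list packages that are missing on one side or
# installed in different versions. `mine` and `theirs` are data frames
# with columns Package and Version, as returned by package_snapshot().
diff_snapshots <- function(mine, theirs) {
  both <- merge(mine, theirs, by = "Package", all = TRUE,
                suffixes = c(".mine", ".theirs"))
  differs <- is.na(both$Version.mine) | is.na(both$Version.theirs) |
    both$Version.mine != both$Version.theirs
  both[differs, , drop = FALSE]
}

# Possible usage (file names are placeholders):
#   on each machine:  saveRDS(package_snapshot(), "snapshot.rds")
#   afterwards:       diff_snapshots(readRDS("mine.rds"), readRDS("theirs.rds"))
```

An empty result would suggest that the package libraries match, which narrows a crash that occurs on only one machine down to the data or the platform.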

  
    
#
Now that I think about this, most of the annotation packages (including the
pdInfo packages) are no longer generated as windows binary files. So even
if you don't have Rtools installed you shouldn't get the binary, because it
doesn't exist.

BioC_mirror: https://bioconductor.org
Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.0 (2018-04-23).
Installing package(s) 'pd.hugene.1.0.st.v1'

   package 'pd.hugene.1.0.st.v1' is available as a source package but not as a binary

So maybe you are thinking of an older R installation, from back when we
still made binary versions of annotation packages?


Best,

Jim
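
The 'pkgType' option discussed earlier in the thread can be inspected directly, which makes it easier to see why two machines behave differently. The sketch below uses only base R (`getOption`, `options`, `on.exit`); `with_pkg_type` is a hypothetical helper name, not part of any package mentioned here, and the value printed depends on the platform and R build.

```r
# Show which package type install.packages() will ask for by default.
# The value is platform-dependent (for example "both" or "source").
pkg_type <- getOption("pkgType")
print(pkg_type)

# Hypothetical helper: set pkgType for the duration of one expression and
# restore the previous value afterwards, even if the expression fails.
with_pkg_type <- function(type, expr) {
  old <- options(pkgType = type)
  on.exit(options(old), add = TRUE)
  force(expr)  # expr is evaluated lazily, so it sees the temporary option
}

# Possible usage (not run here, it would trigger an installation):
#   with_pkg_type("source", install.packages("somePackage"))
# install.packages() also accepts the type per call:
#   install.packages("somePackage", type = "source")
```

Setting the type explicitly on both machines should at least make the installation path the same, so that any remaining difference is not down to this option.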
#
Thank you for the answer.

I was trying to create a reproducible example before I vented maybe a bit
too much in my previous mail.

I managed to get closer to the problem and it is related to data that was
corrupted at download. I can send you a reproducible example that bombs R,
but I will have to send the specific data files as well. What is the best
way to send them?

Cheers
Joris
On Sat, 5 May 2018, 00:09 James W. MacDonald, <jmacdon at uw.edu> wrote:

            

  
  
#
How about a Google Drive? This problem of autodownloading should be
addressed directly. These facilities are still important, but their
maintenance is clearly a lower priority as the technologies handled have
diminished use in the field. I think we should be able to team up and
remove autoinstallation elements of these packages, and perhaps improve
general maintainability -- Joris, can you pick one, make a GitHub repo
that we can collaborate on revising, and then we can start? It will
involve a deprecation process.
On Sat, May 5, 2018 at 10:54 AM, Joris Meys <jorismeys at gmail.com> wrote:

            

  
  
8 days later
#
Dear all,

Sorry for the delayed response; due to some unfortunate events I had to
prioritize my family the past week.

You can find an RStudio project in a zipped folder at this link:
https://jorismeys.stackstorage.com/s/3ik0vMwsvueuT5a

It contains a script called testOligo.R that can be sourced and nukes my R
session in the second step of the rma() function. It also contains the
faulty .gz files. If you need more information, don't hesitate to contact
me.

Regarding improving general maintainability, I'm willing to help out on
that. Problem is that I'm rather behind with my own work, so I'm short on
time for the moment. I'll fork affy tomorrow (need to give class now) and
let's start from there then?

Cheers
Joris



On Sat, May 5, 2018 at 5:17 PM, Vincent Carey <stvjc at channing.harvard.edu>
wrote:

  
    
#
One of your CEL files is funky

 > colSums(exprs(alldata) == 0)
GSM907854.CEL.gz GSM907866.CEL.gz GSM907857.CEL.gz GSM907863.CEL.gz
                0                0                0           686388
GSM907856.CEL.gz GSM907862.CEL.gz GSM907855.CEL.gz GSM907861.CEL.gz
                0                0                0                0

and it's tickling a bug in preprocessCore

$ R -d gdb -f testOligo.R
...
0x00007fffe190418a in max_density (z=0x7fffcc0008c0, rows=0, cols=1, column=0)
     at rma_background4.c:128
128	rma_background4.c: No such file or dire

(gdb) dir /home/mtmorgan/b/git/preprocessCore/src/
Source directories searched: /home/mtmorgan/b/git/preprocessCore/src:$cdir:$cwd
(gdb) l
123	
124	  max_y = find_max(dens_y,16384);
125	
126	  i = 0;
127	  do {
128	    if (dens_y[i] == max_y)
129	      break;
130	    i++;
131	
132	  } while(1);
(gdb) p i
$1 = 1821306
(gdb) p max_y


Maybe one of the preprocessCore pros will chime in...

Martin
On 05/14/2018 09:15 AM, Joris Meys wrote:
#
One of the CEL files is truncated or otherwise corrupted. The
appropriate place to detect that is in the code that reads the CEL file
data, which should then warn or raise an error for the user.

The bug Martin describes occurs because of a massive mismatch between the
data and the assumptions of the RMA background model (essentially, the
modal spike at 0 intensity throws it completely off). I think the only
time I've seen it occur in the past was also with corrupted data. Any
which way, a segfault is a less than desirable outcome.

Ben
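
Until the CEL reader itself checks for this, a user-side guard along the lines of Martin's `colSums(exprs(alldata) == 0)` diagnostic could catch such files before background correction. The following is only a sketch under stated assumptions: `check_zero_intensities` is a hypothetical helper, it works on a plain numeric matrix with one column per array (such as the result of `exprs()`), and the 1% default threshold is an arbitrary choice, not a value taken from oligo or preprocessCore.

```r
# Hypothetical helper: stop if any array (column) has an unusually large
# share of zero intensities, which may indicate a truncated or corrupted
# CEL file. The default threshold is an arbitrary assumption.
check_zero_intensities <- function(intensities, max_zero_fraction = 0.01) {
  stopifnot(is.matrix(intensities), is.numeric(intensities))
  zero_fraction <- colMeans(intensities == 0, na.rm = TRUE)
  ids <- colnames(intensities)
  if (is.null(ids)) ids <- as.character(seq_len(ncol(intensities)))
  suspect <- ids[which(zero_fraction > max_zero_fraction)]
  if (length(suspect) > 0) {
    stop("Possibly truncated or corrupted arrays: ",
         paste(suspect, collapse = ", "), call. = FALSE)
  }
  invisible(zero_fraction)
}

# Possible usage with the objects from this thread (not run here):
#   check_zero_intensities(exprs(alldata))
#   eset <- rma(alldata)
```

With the data shown above, such a check would presumably stop on GSM907863.CEL.gz, the only array reporting zeros, and turn the segfault into an ordinary R error.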
On 2018-05-14 17:11, Martin Morgan wrote: