Problems with source() function

5 messages · Duncan Temple Lang, Al, Seth Falcon

Al
#
Hello list members!

I'm trying to enter some data in an R session using the source() function
with a URL as its argument. The data source is a PHP script located on an
Apache web server, and the data is a long list generated on the fly.
These are the initial lines:

groups<-list()
groups[['ENSMUST00000000001']]=c(52611,483683,147952,132170,297514,469248,291525,364037,469915,55472,280220,314688,415650,486875,440898,6781,497785)
groups[['ENSMUST00000000003']]=c(416911,327120,425495,72272,297529,101933,371418,139034,318872,367204,237702)
groups[['ENSMUST00000000028']]=c(199311,325400,184761,241988,376845,75052,67724,404240,439543,391057,393816)
groups[['ENSMUST00000000031']]=c(402587,352900,139030,186068,463553,328881,74942,277085,301431,256149,410846)
groups[['ENSMUST00000000033']]=c(12700,23908,11140,122358,389908,390084,383903,354007,457965,106395,131876)
groups[['ENSMUST00000000049']]=c(59336,203239,101077,382882,327374,281549,212042,275594,361523,490934,240275)
groups[['ENSMUST00000000056']]=c(409571,304584,394332,379699,13785,4260,288889,42538,304075,47734,485512,52501,328509,504846,334607,82566,250088,150240,16422,446551,314484,91878,124752,341638,379512,379890,319764,8019,59221,156508,362524,74001,149400)
groups[['ENSMUST00000000058']]=c(26511,455190,466368,358528,268486,315461,149260,422804,137641,163718,352555)

The problem:
When I execute the command it apparently finishes OK, with no errors
printed, but when I check the consistency of the loaded data with
length() I get a different figure every time.

More facts:
When I source the data from a static file instead of a URL, the data is
fully loaded and the length is always the same (20346 list elements).
It takes 30 seconds to load.

When I source the data dynamically, from a URL, it takes 2 minutes
and the data is always truncated.

Tried and miserably failed:
- Changed .Options$timeout from 60 to 300
- Using R --verbose is of no help, the data is silently truncated. 
- Changed the expression in which data is entered:
groups<-list(
'ENSMUST00000000001'=c(52611,483683,147952,132170,297514,469248,291525,364037,469915,55472,280220,314688,415650,486875,440898,6781,497785),
'ENSMUST00000000003'=c(416911,327120,425495,72272,297529,101933,371418,139034,318872,367204,237702)
...
)
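
A note on the first item in the list above: according to ?options, .Options is a visible copy kept for S compatibility, and assigning to it makes a local copy without changing the real setting, so the timeout may never have actually changed. A minimal sketch of setting it through options() instead follows. The 300-second value is the one tried above; whether this timeout is really what cuts the transfer is an assumption, not something established in this thread.

```r
# Set the connection timeout (in seconds) through options(), which updates
# the real setting; assigning to .Options$timeout only changes a local copy.
old <- options(timeout = 300)

# Confirm the value that url() and download.file() will see.
current_timeout <- getOption("timeout")
print(current_timeout)

# Restore the previous value when finished, if desired:
# options(old)
```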

Kind list members, is there some timeout I am missing? Some way to debug
the process? Some suggestion?

Sincerely, thank you!

Alberto de Luis
www.cicancer.org
#
Does
  source(textConnection(readLines(url("http://..."))))

give the correct answer? If not, what is being dropped
when you just use readLines() and look at the contents
of the download?

And how long is the longest line?
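
The checks asked for above could be gathered in one small helper, sketched here. inspect_lines() is a hypothetical name, not part of R, and the sample lines are a made-up stand-in for the result of readLines(url("http://...")), with the last line cut off on purpose.

```r
# Summarise a character vector of downloaded lines: how many lines arrived,
# how long the longest one is, and whether the last line is complete R code.
inspect_lines <- function(lines) {
  last_ok <- tryCatch({
    parse(text = lines[length(lines)])
    TRUE
  }, error = function(e) FALSE)
  list(
    n_lines      = length(lines),
    longest      = max(nchar(lines)),
    longest_at   = which.max(nchar(lines)),
    last_line_ok = last_ok
  )
}

# Stand-in for readLines(url("http://...")): a tiny in-memory download whose
# final line is cut off mid-statement.
sample_lines <- c(
  "groups<-list()",
  "groups[['ENSMUST00000000001']]=c(52611,483683,147952)",
  "groups[['ENSMUST00000000003']]=c(416911,3271"
)
report <- inspect_lines(sample_lines)
str(report)
```

A last line that fails to parse suggests the stream was cut mid-line; a truncation that happens exactly at a line boundary would not be caught this way, so the line count still needs comparing against a known-good copy.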


The RCurl package (http://www.omegahat.org/RCurl) gives you a lot of
control in performing and processing HTTP requests, allowing
you to control the request and read the body and the header of the
response.  It may be worth a try if things are getting frustrating.

 D.
--
Duncan Temple Lang                duncan at wald.ucdavis.edu
Department of Statistics          work:  (530) 752-4782
371 Kerr Hall                     fax:   (530) 752-7099
One Shields Ave.
University of California at Davis
Davis, CA 95616, USA
Al
#
Thank you for your answer :)

I've tested your suggestion but without success. The remote load is
still silently truncated when using

	source(textConnection(readLines(url("http://..."))))

Looking at the contents, there is no fixed break point; it is
different each time I execute the command, so the dropped lines
are different every time. The only constant seems to be the time of the
interruption (1 min 55 secs on my system).

Loading the file in a browser (it always loads completely) and examining
the text, there is no apparent malformation at the break points.

The longest line is 669 chars and loads correctly in both remote and
local mode:

	> lineas <- readLines("transcripts_moe430a.R")
	> length(lineas)
	[1] 20347
	> max(nchar(lineas))
	[1] 669
	> which(nchar(lineas)==669)
	[1] 3241
	> lineas <- readLines(url("http://10.10.10.3:83/probefinder/scripts/probegrouper.php?chip=moe430a&mode=transcript"))
	> length(lineas)
	[1] 7471
	> max(nchar(lineas))
	[1] 669
	> which(nchar(lineas)==669)
	[1] 3242

Apparently there's a timeout in url() or some subordinate function.
I will try the RCurl package but, for educational purposes, I would
prefer the load process to be handled in a simple way... with a
source() call, for example, so as not to overload students with tricky
methods...
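
One way to keep a single call for students while separating the download from the parsing is sketched below. source_from_url() is a hypothetical helper, not something proposed in this thread, and the demonstration uses a local file:// URL on a Unix-like system as a stand-in for the PHP script. It would not by itself cure a server-side cutoff, but it leaves a complete or incomplete file on disk that can be inspected before sourcing.

```r
# Hypothetical helper (not from the thread): fetch a script to a temporary
# file, then source() the local copy into an environment.
source_from_url <- function(u, envir = new.env(), timeout = 300) {
  old <- options(timeout = timeout)   # raise the connection timeout
  on.exit(options(old), add = TRUE)   # restore it afterwards
  tmp <- tempfile(fileext = ".R")
  on.exit(unlink(tmp), add = TRUE)
  status <- download.file(u, tmp, quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  source(tmp, local = envir)
  envir
}

# Demonstration with a local file:// URL standing in for the PHP script.
demo_script <- tempfile(fileext = ".R")
writeLines(c(
  "groups<-list()",
  "groups[['ENSMUST00000000001']]=c(52611,483683,147952)",
  "groups[['ENSMUST00000000003']]=c(416911,327120,425495)"
), demo_script)
demo_url <- paste0("file://", normalizePath(demo_script, winslash = "/"))
env <- source_from_url(demo_url)
print(length(env$groups))
```

After the call, length(env$groups) can be compared with the expected element count to confirm nothing was dropped.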

Thank you again.

.....................
Alberto de Luis
Bioinformatics and Functional Genomics Lab
Cancer Research Center
Salamanca (Spain)
.....................
On Thu, 2005-10-27 at 12:35 -0700, Duncan Temple Lang wrote:
#
On 28 Oct 2005, aldeluis at usal.es wrote:

            
I'm not sure exactly what you are trying to accomplish, but I wonder
if either of the following two ideas would help you:

1. Instead of source(), consider loading the R code on the server side
   and then using save(..., compress=TRUE) to create a binary image.
   You can then feed that to the clients and have them use load().

2. What happens if you gzip the code before sending and gunzip on the
   client side?  It may be less convenient, although supposedly there
   is a way to do the equivalent of gzfile(url(...)).
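
Both ideas can be tried locally before involving the web server. The sketch below uses temporary files in place of the server, a two-element list in place of the real data, and gzcon() as one way to decompress a binary connection on the fly; the variable names are illustrative only.

```r
# Small stand-in for the real 20346-element list.
groups <- list(
  ENSMUST00000000001 = c(52611, 483683, 147952),
  ENSMUST00000000003 = c(416911, 327120, 425495)
)

# Idea 1: the server side writes a compressed binary image...
image_file <- tempfile(fileext = ".RData")
save(groups, file = image_file, compress = TRUE)

# ...and the client restores it with load(); a url() connection could be
# passed to load() instead of a local path.
client <- new.env()
load(image_file, envir = client)

# Idea 2: gzip the R source, then decompress on the fly with gzcon().
gz_file <- tempfile(fileext = ".R.gz")
out <- gzfile(gz_file, "w")
writeLines(c("groups2<-list()",
             "groups2[['ENSMUST00000000001']]=c(52611,483683,147952)"), out)
close(out)

# gzcon() wraps a binary connection; url("http://...", "rb") would be the
# remote counterpart of file(gz_file, "rb") used here.
con <- gzcon(file(gz_file, "rb"))
code_lines <- readLines(con)
close(con)
eval(parse(text = code_lines), envir = client)
print(ls(client))
```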

HTH,

+ seth
Al
#
I'm trying to feed data generated on the fly by a PHP script to R's
source() function, passing the arguments in the URL with the GET method
("http://someserver.com/script.php?a=343&b=873"). If the data is not
generated on the fly, the user has to wait longer and fetch the data in
more than one step.

I'm after a simple one-step method, but for some reason the source()
function silently truncates the data. I will try your suggestion of a
binary file if I can generate the gzip stream on the fly...

Thank you!

.....................
Alberto de Luis
Bioinformatics and Functional Genomics Lab
Cancer Research Center
Salamanca (Spain)
.....................
On Fri, 2005-10-28 at 06:57 -0700, Seth Falcon wrote: