download.file

5 messages · Adelchi Azzalini, Barry Rowlingson, Thomas Lumley +1 more

#
Dear List-members,

to download a file from the net, the function download.file(..)
does the job.  However, before embarking on the download, I would
like to find out how large the file is.  Is there a way to know it?

Most likely, this question has been asked before, but I am new to 
the list.

Regards, with thanks in advance,

Adelchi Azzalini

----
Adelchi Azzalini  <azzalini at stat.unipd.it>
Dipart. Scienze Statistiche, Università di Padova, Italia
http://azzalini.stat.unipd.it/
#
Essentially no.  Most servers will give you the length if you start the 
download, and then R prints it out, but it may be "unknown".  As in
trying URL `http://cran.r-project.org/src/contrib/PACKAGES'
Content type `text/plain; charset=iso-8859-1' length 95407 bytes
opened URL
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ...
downloaded 93Kb

and you can (probably) interrupt during those dots.
On Tue, 4 Feb 2003, Adelchi Azzalini wrote:
#
You can send web servers a 'HEAD' request, which can give you some 
basic information about the download. I can't see a way to get this from 
the current R functions, so here's a little routine to leverage the 
'lynx' web browser:


"head.download" <-
   function (url)
{
   if (system("lynx -help > /dev/null") == 0) {
     method <- "lynx"
   }
   else {
     stop("No lynx found")
   }
   if (method == "lynx") {
     # -head -dump prints the headers returned by a HEAD request
     heads <- system(paste("lynx -head -dump '", url, "'", sep = ""),
                     intern = TRUE)
   }

# turn name: value lines into named list. prob vectorisable

   ret <- list(status=heads[1])
   for(l in seq_along(heads)[-1]){  # skips the loop if only a status line came back
     col <- regexpr(":",heads[l])
     if(col>-1){
       name <- substr(heads[l],1,(col-1))
       value <- substr(heads[l],(col+1),nchar(heads[l]))
       ret[[name]] <- value
     }else{
       ret <- c(ret,heads[l])
     }
   }
   ret
}

  This borrows bits from download.file(), but it does depend on you 
having lynx installed. The return value is a list whose names are the 
header titles and whose elements are the corresponding values. It looks 
for a : as the title: value separator, and anything that doesn't have a : 
is just added verbatim, unnamed.

  For example, how big is the R logo on the home page?

 > head.download("http://www.r-project.org/Rlogo.jpg")$"Content-Length"
[1] " 8793"

  That's bytes. Yes, I know it's character! I don't think web servers are 
under any obligation to provide accurate Content-Length values. Many 
dynamic web servers have pages that change length every time. This will 
also not work for ftp:// URLs or local file:// URLs (or gopher:// URLs?).
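
  Since the value comes back as character (and may be missing), a small 
conversion step may help. The following is only a sketch: content_length 
is a hypothetical helper name, not an existing R function, and it assumes 
a list shaped like the one head.download() returns.

```r
# Hypothetical helper: extract Content-Length (in bytes) from a named list
# of headers. Returns NA when the header is absent or not a number.
content_length <- function(headers) {
  # HTTP header names are case-insensitive, so compare in lower case
  idx <- match("content-length", tolower(names(headers)))
  if (is.na(idx)) return(NA_real_)             # header not present
  suppressWarnings(as.numeric(headers[[idx]])) # " 8793" becomes 8793
}

content_length(list(status = "HTTP/1.1 200 OK", "Content-Length" = " 8793"))
```

  An NA result should be read as "the server did not say", not as an 
empty file.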

  Perhaps HEAD-getting functionality can be put in the next release of 
R? It would probably have a better "name: value -> named list" routine 
than the one I just hacked up in two minutes above. Oops. Shame.
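
  For what it is worth, the "name: value -> named list" step can probably 
be vectorised along these lines. This is a rough sketch, not a proposal 
for R itself: parse_headers is a hypothetical name, it works on header 
lines already captured as a character vector, and unlike the loop above 
it trims leading whitespace from the values and puts lines without a 
colon at the end of the list.

```r
# Hypothetical helper: turn raw header lines into a named list.
parse_headers <- function(heads) {
  heads <- heads[nzchar(heads)]            # drop empty lines
  ret <- list(status = heads[1])           # first line is the status line
  rest <- heads[-1]
  col <- regexpr(":", rest, fixed = TRUE)  # first colon per line, -1 if none
  has_colon <- col > 0
  values <- substring(rest[has_colon], col[has_colon] + 1)
  values <- sub("^[ \t]+", "", values)     # strip leading whitespace
  names(values) <- substring(rest[has_colon], 1, col[has_colon] - 1)
  # named headers first, then any lines without a colon, unnamed
  c(ret, as.list(values), as.list(rest[!has_colon]))
}

parse_headers(c("HTTP/1.1 200 OK",
                "Content-Type: image/jpeg",
                "Content-Length: 8793"))[["Content-Length"]]
```

  Splitting on the first colon only keeps values such as dates, which 
contain colons themselves, in one piece.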

Baz
#
On Tue, 4 Feb 2003, Barry Rowlingson wrote:
The HTTP protocol says that a content length SHOULD be provided and MUST
be accurate if it is provided.

	-thomas
#
On Tue, 4 Feb 2003, Thomas Lumley wrote:

            
Most proxies of my acquaintance will report unknown unless they are asked
to actually get the file or have it already cached.  Further, the IE 
internals used under Windows with --internet2 usually seems to get the 
wrong length (far too short) when talking to a proxy.
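
Given that a reported length may be unknown or wrong, one cautious option 
is to compare it with the size on disk once the download has finished. 
A minimal sketch, assuming the reported length is already a number of 
bytes (or NA); length_matches is a hypothetical helper name:

```r
# Hypothetical helper: does a downloaded file have the reported length?
# Returns NA when no length was reported.
length_matches <- function(path, reported) {
  if (is.na(reported)) return(NA)   # length was "unknown"
  actual <- file.info(path)$size    # size on disk in bytes, NA if no file
  isTRUE(actual == reported)
}

tmp <- tempfile()
writeBin(as.raw(1:10), tmp)         # a 10-byte stand-in for a download
length_matches(tmp, 10)
```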

Why is this of interest: there are lots of internet download tools 
available apart from R?