
Downloading tab separated data from internet

3 messages · HC, Brian Ripley

HC
Hi all,

I am trying to download some tab-separated data from the internet. The data
are not available at a URL that can be known a priori: there is an
intermediate form where start and end dates have to be entered to get to
the required page.

For example, I want to download data for a station 03015795. The form for
this station is at:

http://ida.water.usgs.gov/ida/available_records.cfm?sn=03015795

I can get the start and end dates from this form page using:

# Specify the station and read the page that holds the form
stn <- "03015795"
myurl <- paste("http://ida.water.usgs.gov/ida/available_records.cfm?sn=", stn, sep = "")
mypage1 <- readLines(myurl)

# Get the start and end dates from the centred table cells
# (assumes the row holding them is line 124 of the page)
mypattern <- '<td align="center">([^<]*)</td>'
datalines <- grep(mypattern, mypage1[124], value = TRUE)
# Helper: return the substrings of s at the match positions in g
getexpr <- function(s, g) substring(s, g, g + attr(g, "match.length") - 1)
gg <- gregexpr(mypattern, datalines)
matches <- mapply(getexpr, datalines, gg)
# Strip the tags, keeping only the captured cell text
result <- gsub(mypattern, "\\1", matches)
names(result) <- NULL
mydates <- result[1:2]
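The fixed index `mypage1[124]` stops working as soon as the page gains or loses a line. A minimal sketch that searches every line instead, using base R's `regmatches()` (available from R 2.14.0). Like the code above, it assumes the first two centred cells on the page hold the start and end dates; the HTML fragment and the dates in it are made-up placeholders standing in for `readLines(myurl)`:

```r
# Made-up stand-in for readLines(myurl); the real page has many more lines
page <- c("<html><body><table>",
          "<tr><td align=\"center\">2000-01-01</td><td align=\"center\">2000-12-31</td></tr>",
          "</table></body></html>")

pattern <- '<td align="center">([^<]*)</td>'

# Search all lines rather than a hard-coded line number
hits <- grep(pattern, page, value = TRUE)

# Every full <td ...>...</td> match, across all matching lines
cells <- unlist(regmatches(hits, gregexpr(pattern, hits)))

# Keep only the captured cell text; the first two are taken as start and end
mydates <- sub(pattern, "\\1", cells)[1:2]
print(mydates)
```

If other centred cells appear earlier on the real page, `[1:2]` would pick the wrong ones, so it is worth inspecting `cells` once by eye.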

I want to know how I can feed these start and end dates to the form, submit
it (press its button) to reach the data page, and then download the data,
either as displayed in the browser or by saving it as a file.
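To fill in the form from a script you need two things from the page's HTML: the URL in the `<form action=...>` attribute (where the browser actually sends the data, which need not be the page itself) and the `name` of each `<input>`. A rough regex sketch on a made-up fragment: the action `results.cfm` is hypothetical, and `fromdate`, `todate` and `submit1` are the names HC uses later in this thread, not verified against the live page. Regular expressions are fragile on real HTML, so treat this as a quick look rather than a parser:

```r
# Made-up stand-in for the <form> part of readLines(myurl);
# "results.cfm" is a hypothetical action, not the real one
form_html <- c('<form action="results.cfm" method="post">',
               '<input type="text" name="fromdate" value="">',
               '<input type="text" name="todate" value="">',
               '<input type="submit" name="submit1" value="Retrieve Data">',
               '</form>')

# The URL the form is submitted to (may be relative to the page URL)
form_line <- grep("<form[^>]*action=", form_html, value = TRUE)
action <- sub('.*action="([^"]*)".*', "\\1", form_line)

# The name attribute of every <input> element
inputs <- grep("<input", form_html, value = TRUE)
name_attrs <- unlist(regmatches(inputs, gregexpr('name="[^"]*"', inputs)))
field_names <- sub('name="([^"]*)"', "\\1", name_attrs)

print(action)
print(field_names)
```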

Any help on this is most appreciated.

Thanks.
HC







--
View this message in context: http://r.789695.n4.nabble.com/Downloading-tab-separated-data-from-internet-tp4152318p4152318.html
Sent from the R help mailing list archive at Nabble.com.
Brian Ripley
AFAICS what you mean is 'how can I fill in an HTML form using R'.
Answer: use package RCurl.

Do study the posting guide: none of the 'at a minimum' information was 
given here.
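For reference, RCurl's `postForm()` takes the form fields as named arguments, and its `style` argument controls the encoding: the default `"HTTPPOST"` sends `multipart/form-data`, while `style = "POST"` sends `application/x-www-form-urlencoded`, the default encoding of an HTML form. The base R sketch below only builds the urlencoded body such a submission would carry, to show what the server receives; the field names and date values are hypothetical placeholders that must match the real form's `<input name=...>` attributes:

```r
# Hypothetical form fields; real names come from the form's <input name="..."> tags
fields <- list(fromdate = "2000-01-01", todate = "2000-12-31",
               rtype = "Save to File", submit1 = "Retrieve Data")

# Percent-encode each value (reserved = TRUE also escapes '&', '=' and friends)
encoded <- vapply(fields, URLencode, character(1), reserved = TRUE)

# Join as name=value pairs separated by '&': the urlencoded request body
body <- paste(names(fields), encoded, sep = "=", collapse = "&")
print(body)
```

With RCurl the submission itself would look like `postForm(action_url, fromdate = ..., todate = ..., style = "POST")`, where `action_url` is a placeholder for the form's action URL resolved against the page URL.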
On 03/12/2011 04:47, HC wrote:

  
    
HC
Thanks for your reply.

I tried to use the postForm function of RCurl as below, but I do not really
understand what it does or how to go further with it.

library(RCurl)
stn <- "03015795"
myurl <- paste("http://ida.water.usgs.gov/ida/available_records.cfm?sn=", stn, sep = "")
mypage1 <- readLines(myurl)

# Getting the start and end dates (assumes they are on line 124 of the page)
mypattern <- '<td align="center">([^<]*)</td>'
datalines <- grep(mypattern, mypage1[124], value = TRUE)

getexpr <- function(s, g) substring(s, g, g + attr(g, "match.length") - 1)
gg <- gregexpr(mypattern, datalines)
matches <- mapply(getexpr, datalines, gg)
result <- gsub(mypattern, "\\1", matches)
names(result) <- NULL
mydates <- result[1:2]

# Submit the form fields; "Save to File" must stay one string on one line.
# NOTE: a form is submitted to the URL in its action attribute, which may
# differ from myurl (the page that displays the form).
datapage <- postForm(myurl, fromdate = mydates[1], todate = mydates[2],
                     rtype = "Save to File", submit1 = "Retrieve Data")
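`postForm()` returns the server's reply as text. Assuming the data page answers with plain tab-separated lines (an assumption: if it returns an HTML page instead, the table must first be pulled out of the HTML), the reply can be read straight into a data frame with `read.delim(text = ...)`. The sample reply below is made up, with hypothetical column names:

```r
# Made-up server reply: one header line plus tab-separated rows
reply <- "date\tflow\n2000-01-01\t1.5\n2000-01-02\t2.5\n"

# read.delim() defaults to sep = "\t" and header = TRUE;
# text = reads from the string instead of a file
flows <- read.delim(text = reply, stringsAsFactors = FALSE)

# Convert the first column to Date for later analysis
flows$date <- as.Date(flows$date)
str(flows)
```

To save the data to disk as well, `writeLines(reply, "somefile.txt")` (file name hypothetical) writes the raw reply unchanged.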


I tried to read RCurl's documentation but do not know which functions I
should be using. Are there any examples available that could help? Could
you point me to them, please?

Thanks for the help.
HC
