
grep txt file names from html

4 messages · Sarah Goslee, David Winsemius, chuck.01

#
Sorry, I know I should read up on this a little first, but I'm just helping
somebody out quickly and need help too.

I want to grep the names of all the .txt files mentioned on this HTML web
page:

http://www.epa.gov/emap/remap/html/three/data/index.html

Thanks ahead of time.



--
View this message in context: http://r.789695.n4.nabble.com/grep-txt-file-names-from-html-tp4648037.html
Sent from the R help mailing list archive at Nabble.com.
#
On Oct 31, 2012, at 9:56 AM, chuck.01 wrote:

            
This code identifies the lines in that source page that contain URLs ending in '.txt"'.
Warning message:
In readLines(con = url("http://www.epa.gov/emap/remap/html/three/data/index.html")) :
  incomplete final line found on 'http://www.epa.gov/emap/remap/html/three/data/index.html'
# You can generally ignore that warning.
[1] 11
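The code that produced the output above is not shown in this copy of the thread. A minimal sketch of the same idea (not the original code) uses base R's `readLines()` and `grep()`. So that it runs offline, it writes a few hypothetical lines of HTML (the markup is an assumption, not copied from the EPA page) to a temporary file with no final newline, which is the condition behind the "incomplete final line" warning; passing `warn = FALSE` to `readLines()` silences that warning.

```r
# Hypothetical stand-in for the page source (markup assumed, not copied from the EPA page).
# With network access the real page could be read instead with:
#   page <- readLines(url("http://www.epa.gov/emap/remap/html/three/data/index.html"), warn = FALSE)
sample_html <- c(
  '<li><a href="http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/benthic/benmet.txt">benmet</a></li>',
  '<p>Some text without a data link</p>',
  '<li><a href="http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/fish/fshcnt.txt">fshcnt</a></li>'
)
tf <- tempfile(fileext = ".html")
# Write the lines without a trailing newline, the case that triggers the warning
cat(paste(sample_html, collapse = "\n"), file = tf)

page <- readLines(tf, warn = FALSE)  # warn = FALSE suppresses "incomplete final line"
hits <- grep('\\.txt"', page)        # indices of lines holding a URL that ends in .txt"
length(hits)                         # 2 for this sample
page[hits]
```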

It should be fairly straightforward to remove the preceding and trailing material:
 [1] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/benthic/benmet.txt"  
 [2] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/benthic/bencnt.txt"  
 [3] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/location/watchr.txt" 
 [4] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/location/habbest.txt"
 [5] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/design/sdesign.txt"  
 [6] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt"    
 [7] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/fish/fshmet.txt"     
 [8] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/fish/fshcnt.txt"     
 [9] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/fish/fshnam.txt"     
[10] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/tissue/ftmet.txt"    
[11] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/tissue/ftorg.txt"
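One way to remove that preceding and trailing material, sketched under the assumption that the page writes its links as `href="..."` with double quotes, is to pull out only the matching `href` values with `gregexpr()` and `regmatches()`, then strip the `href="` prefix and the closing quote with `sub()`. Unlike substituting on whole lines, this also copes with a line that holds more than one link. The sample lines are hypothetical.

```r
# Hypothetical page lines: the first holds two links, the second none
page <- c(
  paste0('<li><a href="http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/benthic/benmet.txt">benmet</a> ',
         '<a href="http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/benthic/bencnt.txt">bencnt</a></li>'),
  '<p>No data link here</p>'
)
# Find every href="..." whose quoted value ends in .txt; [^"]* keeps each match inside one pair of quotes
m <- regmatches(page, gregexpr('href="[^"]*\\.txt"', page))
# unlist() flattens the per-line matches; lines with no match contribute nothing
links <- unlist(m)
# Strip the leading href=" and the trailing double quote, leaving bare URLs
links <- sub('"$', "", sub('^href="', "", links))
links
```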
David Winsemius, MD
Alameda, CA, USA
#
Sorry, Sarah.
I want to store them as a vector for later use.

So, something like this:

links <-
c("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/benthic/benmet.txt",
"http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/location/watchr.txt",
"http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt")

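With the URLs held in a character vector such as `links`, base R's `basename()` returns just the file names, which is what the original question asked for. This is a small sketch; the download lines are commented out because they need network access, and the `emap_data` folder name is hypothetical.

```r
links <- c("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/benthic/benmet.txt",
           "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/location/watchr.txt",
           "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt")
# basename() keeps everything after the last "/": "benmet.txt" "watchr.txt" "chmval.txt"
files <- basename(links)
files
# To fetch each file later (requires network access; "emap_data" is a hypothetical folder):
# dir.create("emap_data", showWarnings = FALSE)
# for (i in seq_along(links)) download.file(links[i], file.path("emap_data", files[i]))
```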



Sarah Goslee wrote

            
--
View this message in context: http://r.789695.n4.nabble.com/grep-txt-file-names-from-html-tp4648037p4648043.html