Navigating web pages using R

4 messages · Erik Gregory, Mike Marchywka, lcn +1 more

#
R-Help,

I'm trying to obtain some data from a webpage which masks the URL from the user, 
so an explicit URL will not work.  For example, when one navigates to the web 
page the URL looks something like:
http://137.113.141.205/rpt34s.php?flags=1 (changed for privacy, though I doubt 
you could access it anyway, since it is internal to the agency I work for).
The site has three drop-down menus for "Site", "Month", and "Year".  When a 
combination of these is selected, the resulting URL is 
always http://137.113.141.205/rpt34s (nothing changes, except that "flags=1" is 
dropped).  So what I need is to write something that will navigate to the 
original URL, select some combination of "Site", "Month", and "Year", and then 
submit the query to the site to reach the page with the data.
Is this something R is capable of as a language?  Unfortunately, I'm unfamiliar 
with HTML or PHP programming, so if this question belongs in a forum on those 
topics, I apologize.  I'm trying to centralize all of the code for my analysis 
in R!

Thank you,
-Erik Gregory
Student Assistant, California EPA
CSU Sacramento, Mathematics
#
LOL, presuming you are not a disgruntled employee, it is always amusing to
see some entity with a fancy cryptic web design drink their own Kool-Aid :) 
This is the most annoying kind of code to write, especially when there is
no reason, such as a revenue model, to make the data hard to get. I've posted
in other forums about the general need for an API if you are providing data to
others in a non-hostile setting.
I'm sure that ultimately you can code this in R, but for digging out what
you need there may be better approaches.
First I would try to contact the page author or determine whether there is
a better way to get the same data. Failing that, you may be able to find
a "form" section in the HTML and copy that. Firefox is supposed to have an
add-on called "Firebug" that lets you see what the page does, but I've never
actually used it. Generally I use Linux or Cygwin command-line tools (curl, for
example) to diagnose this junk. R may support some of these features, but this
is a common issue outside of R too, so it may be worthwhile learning the other
tools. If all else fails (downloading a local copy of the page, etc.), you may
be able to do a packet capture and just see what it does by brute force.
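
To make the "find the form and copy it" idea concrete, here is a rough sketch in
Python (standard library only) rather than R. It reads the first form on a saved
copy of the page, collects the drop-down names and values, and builds the request
the browser would send. Everything about the page itself is an assumption: the
field names "site", "month", "year", and "go", their values, and the form's
action and method in SAMPLE_PAGE are made up for illustration, since the real
page's HTML is not shown in this thread. Treat it as a starting point, not as
what the site actually expects.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class FormFieldParser(HTMLParser):
    """Collect the action, method, and fields of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "get"
        self.fields = {}      # field name -> list of possible/default values
        self._in_form = False
        self._done = False    # True once the first form has been closed
        self._select = None   # name of the <select> currently being read

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = a.get("action") or ""
            self.method = (a.get("method") or "get").lower()
        elif not self._in_form:
            return
        elif tag == "select" and a.get("name"):
            self._select = a["name"]
            self.fields[self._select] = []
        elif tag == "option" and self._select is not None:
            # Options without an explicit value attribute are skipped here.
            if a.get("value") is not None:
                self.fields[self._select].append(a["value"])
        elif tag == "input" and a.get("name"):
            self.fields[a["name"]] = [a.get("value") or ""]

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None
        elif tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


# Hypothetical stand-in for the real page source (all names are assumptions).
SAMPLE_PAGE = """
<form action="rpt34s.php" method="post">
  <select name="site"><option value="A1">Site A1</option>
                      <option value="B2">Site B2</option></select>
  <select name="month"><option value="1">Jan</option>
                       <option value="2">Feb</option></select>
  <select name="year"><option value="2010">2010</option></select>
  <input type="submit" name="go" value="Submit">
</form>
"""


def build_request(page_url, page_html, choices):
    """Build (but do not send) the request that submitting the form would make."""
    parser = FormFieldParser()
    parser.feed(page_html)
    target = urljoin(page_url, parser.action)
    # Start from each field's first listed value, then apply the caller's choices.
    data = {name: values[0] for name, values in parser.fields.items() if values}
    data.update(choices)
    body = urlencode(data)
    if parser.method == "post":
        return Request(target, data=body.encode("ascii"))
    separator = "&" if "?" in target else "?"
    return Request(target + separator + body)


if __name__ == "__main__":
    req = build_request("http://137.113.141.205/rpt34s.php?flags=1",
                        SAMPLE_PAGE, {"site": "B2", "month": "2"})
    print(req.get_method(), req.full_url)
    print(req.data)
```

Once the real field names are known, passing the request to
urllib.request.urlopen(req) would actually send it. On the R side, the RCurl
package has postForm() and getForm() for the same kind of submission, so the
field names dug out this way should carry over directly.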