R-Help,

I'm trying to obtain some data from a webpage that masks the URL from the user, so an explicit URL will not work. For example, when one navigates to the web page, the URL looks something like http://137.113.141.205/rpt34s.php?flags=1 (changed for privacy, but I'm not sure you could access it anyway, since it's internal to the agency I work for). The site has three drop-down menus for "Site", "Month", and "Year". When a combination of these is selected, the resulting URL is always http://137.113.141.205/rpt34s (nothing changes except that "flags=1" is dropped). So what I need to be able to do is write something that will navigate to the original URL, select some combination of "Site", "Month", and "Year", and then submit the query to the site to navigate to the page with the data. Is this a capability that R has as a language?

Unfortunately, I'm unfamiliar with HTML or PHP programming, so if this question belongs in a forum on those, I apologize. I'm trying to centralize all of my code for my analysis in R!

Thank you,
-Erik Gregory
Student Assistant, California EPA
CSU Sacramento, Mathematics
Navigating web pages using R
4 messages · Erik Gregory, Mike Marchywka, lcn +1 more
> Date: Tue, 4 Jan 2011 10:54:19 -0800
> From: egregory2007 at yahoo.com
> To: r-help at r-project.org
> Subject: [R] Navigating web pages using R
>
> R-Help,
>
> I'm trying to obtain some data from a webpage that masks the URL from the user, so an explicit URL will not work. For example, when one navigates to the web page, the URL looks something like http://137.113.141.205/rpt34s.php?flags=1 (changed for privacy, but I'm not sure you could access it anyway, since it's internal to the agency I work for).
LOL, presuming you are not a disgruntled employee, it is always amusing to see some entity with a fancy, cryptic web design drink its own Kool-Aid :) This is the most annoying kind of code to write, especially when there is no reason, such as a revenue model, to make the data hard to get. I've posted in other forums about the general need for an API if you are providing data to others in a non-hostile setting.
> The site has three drop-down menus for "Site", "Month", and "Year". When a combination of these is selected, the resulting URL is always http://137.113.141.205/rpt34s (nothing changes except that "flags=1" is dropped). So what I need to be able to do is write something that will navigate to the original URL, select some combination of "Site", "Month", and "Year", and then submit the query to the site to navigate to the page with the data. Is this a capability that R has as a language?
>
> Unfortunately, I'm unfamiliar with HTML or PHP programming, so if this question belongs in a forum on those, I apologize. I'm trying to centralize all of my code for my analysis in R!
I'm sure that ultimately you can code this in R, but for digging out what you need there may be better approaches. First, I would try to contact the page author or find out whether there is a better way to get the same data. Failing that, you may be able to find the "form" section in the HTML and copy what it submits. Firefox has an add-on called Firebug that is supposed to let you see what the page does, but I've never actually used it. Generally I use Linux or Cygwin command-line tools to diagnose this junk; R may support some of these features, but this is a common issue outside of R too, so it may be worthwhile learning the other tools. If all else fails, including downloading a local copy of the page and so on, you may be able to do a packet capture and see by brute force what the browser actually sends.
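To make the "form" suggestion concrete: the three drop-downs are most likely `<select>` elements inside a `<form>`. The form's `action`, its `method`, and its field names are what a script has to reproduce. Below is a minimal sketch that reads them out of saved page source using only Python's standard-library `html.parser`. The real page is internal, so `SAMPLE_HTML` is entirely hypothetical, as is every field name and value in it (`site`, `month`, `year`, `A01`, `go`, ...). In practice you would feed the parser the source of `rpt34s.php?flags=1` as saved from your browser.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real page is internal, so this action URL and
# these field names/values are invented purely for illustration.
SAMPLE_HTML = """
<form action="rpt34s" method="post">
  <select name="site">
    <option value="A01">Site A</option>
    <option value="B02">Site B</option>
  </select>
  <select name="month">
    <option value="1">January</option>
    <option value="2">February</option>
  </select>
  <select name="year">
    <option value="2010">2010</option>
    <option value="2011">2011</option>
  </select>
  <input type="submit" name="go" value="Submit">
</form>
"""


class FormInspector(HTMLParser):
    """Collect a form's action/method and the option values of each <select>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "get"          # HTML's default when method is omitted
        self.selects = {}            # select name -> list of option values
        self.inputs = {}             # other named <input> fields -> value
        self._current_select = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "select":
            self._current_select = attrs.get("name")
            self.selects[self._current_select] = []
        elif tag == "option" and self._current_select is not None:
            # None here means the option has no value attribute; a browser
            # would submit its visible text instead (not captured here).
            self.selects[self._current_select].append(attrs.get("value"))
        elif tag == "input" and attrs.get("name"):
            self.inputs[attrs["name"]] = attrs.get("value", "")

    def handle_endtag(self, tag):
        if tag == "select":
            self._current_select = None


inspector = FormInspector()
inspector.feed(SAMPLE_HTML)
print(inspector.action, inspector.method)
for name, values in inspector.selects.items():
    print(name, values)
```

Run against the sample, this prints the action (`rpt34s`), the method (`post`), and the allowed values for each drop-down. Those are exactly the pieces a script needs in order to submit the query without a browser.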
From what I have seen, the R tools are pretty much named after the Linux tools; curl, for example, is available in R through the RCurl package.
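Once the field names are known, submitting the query is a single HTTP request. That is what `curl -d "field=value&..." URL` does from the command line, and what RCurl's `postForm()` does from R. The sketch below builds the equivalent form-encoded request with Python's `urllib`. The report URL is the one quoted in the thread. The field names and values are hypothetical placeholders. POST is also an assumption: the URL showing no query string after submission suggests POST, but if the form's `method` turns out to be GET, the encoded fields belong after a `?` in the URL instead. If the site relies on a session cookie set by the first page, that page would have to be loaded first with a cookie-aware opener, which this sketch does not do.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# URL quoted in the thread; the field names/values used below are
# hypothetical and should be replaced with the ones in the page's <form>.
REPORT_URL = "http://137.113.141.205/rpt34s"


def build_report_request(site, month, year):
    """Build (but do not send) a form-encoded POST, like `curl -d ...`."""
    body = urlencode({"site": site, "month": month, "year": year}).encode("ascii")
    return Request(
        REPORT_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_report(site, month, year, timeout=30):
    """Send the request and return the response body (the report page) as text."""
    with urlopen(build_report_request(site, month, year), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


req = build_report_request("A01", 1, 2010)
print(req.get_method(), req.full_url)
print(req.data.decode())
```

Building the request separately from sending it makes it easy to check, off the agency network, that the body matches what the browser sends (for example, against a packet capture). From inside the network, `fetch_report("A01", 1, 2010)` would return the report page's HTML for further parsing.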
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110105/9b25dc06/attachment.pl>
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110105/8d0c765d/attachment.pl>