Web-scraping newbie - dynamic table into R?
Hi Julio, I am just working on my first cup of tea of the morning so I am not functioning all that well but I finally noticed that we have dropped the R-help list. I have put it back as a recipient as there are a lot of people that know about 99%+ more than I do about the topic. I'll keep poking around and see what I can find.
On Sun, 19 Apr 2020 at 22:34, Julio Farach <jfarach at gmail.com> wrote:
John,
I again thank you for the reply and continued support. After a few hours, I arrived at the point you describe below; namely extracting elements, but from a different tab than the Last 10 Draws, or Winning Numbers tab. On the website, there are 5 tabs. The elements you describe below are from the 3rd tab, "Odds & Prizes." Instead of results, that tab describes the general odds of the Keno game. But I'm seeking the last 10 draws shown on the "Winning Numbers," or 4th, tab. I've played around with a CSS Selector tool, but I'm unable to extract any details (e.g., a draw number or Keno number) from the 4th tab. I could extract elements of other tabs, like you did below, from the 3rd tab. Please let me know if you learn more or if you have other ideas for me to consider.
Regards,
Julio
On Sun, Apr 19, 2020 at 7:00 PM John Kane <jrkrideau at gmail.com> wrote:
I am a complete newbie too, but try this:
library(rvest)
Kenopage <- "https://www.galottery.com/en-us/games/draw-games/keno.html#tab-winningNumbers"
Keno <- read_html(Kenopage)
tt <- html_table(Keno, fill = TRUE)
This should give you a list with 10 elements, each of which should be a data.frame.
Example:
ken1 <- tt[[1]]
str(ken1)
'data.frame': 12 obs. of 4 variables:
 $ Numbers Matched         : chr "10" "9" "8" "7" ...
 $ Base Keno! Prize        : chr "$100,000*" "$5,000" "$500" "$50" ...
 $ + Bulls-Eye Prize       : chr "$200,000*" "$20,000" "$1,500" "$100" ...
 $ Keno! w/ Bulls-Eye Prize: chr "$300,000" "$25,000" "$2,000" "$150" ...
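Rather than opening each element of the list by hand, one option is to summarise every table in one pass and then pick the ones with the columns you care about. The following is only a minimal sketch: the inline HTML and the column name "Draw" are made-up stand-ins for the real page, and `fill` is left out because recent rvest versions treat it as deprecated.

```r
library(rvest)

# Hypothetical stand-in for the downloaded page: two small static tables.
page <- read_html('<html><body>
  <table><tr><th>Numbers Matched</th><th>Prize</th></tr>
         <tr><td>10</td><td>$100,000</td></tr></table>
  <table><tr><th>Draw</th><th>Numbers</th></tr>
         <tr><td>1</td><td>3 7 12</td></tr></table>
</body></html>')

# html_table() on a whole document returns a list with one entry per <table>.
tables <- html_table(page)

# One row per table: its position in the list, its size, and its column names.
overview <- data.frame(
  index   = seq_along(tables),
  n_rows  = vapply(tables, nrow, integer(1)),
  columns = vapply(tables, function(t) paste(names(t), collapse = " | "),
                   character(1))
)
print(overview)

# Keep only the tables that contain a column of interest ("Draw" is assumed).
has_draw <- vapply(tables, function(t) "Draw" %in% names(t), logical(1))
draw_tables <- tables[has_draw]
```

If no table in the overview has the columns you expect, that is a hint that the table is not part of the HTML the server sends.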
I figured this out a little while ago and just manually stepped through the data.frames to get what I wanted. Brute force and stupidity, but it worked. Someday I may figure out how to use things like SelectorGadget!
On Sun, 19 Apr 2020 at 17:46, Julio Farach <jfarach at gmail.com> wrote:
John - I corrected my email below for typos.
On Sun, Apr 19, 2020 at 5:42 PM Julio Farach <jfarach at gmail.com> wrote:
John,
Yes, while I can execute the line of code that I provided, I am still unable to capture the table shown in the browser. The last 10 draws are shown in a table if you view the page:
https://www.galottery.com/en-us/games/draw-games/keno.html#tab-winningNumbers
But despite using CSS and XPath combinations of
html_nodes(x, CSS or XPath)
I am unable to copy that table into R. One commenter on another forum received an error and suggested that perhaps bots lack permission to access the page. But I've used the robotstxt package to confirm that bots are indeed permitted. Any thoughts?
Regards,
Julio
On Sun, Apr 19, 2020 at 4:38 PM John Kane <jrkrideau at gmail.com> wrote:
Keno <- read_html(Kenopage) ?
Or am I misunderstanding the problem?
On Sun, 19 Apr 2020 at 15:10, Julio Farach <jfarach at gmail.com> wrote:
How do I scrape the last 10 Keno draws from the Georgia lottery into R?
I'm trying to pull the last 10 draws of a Keno lottery game into R. I've read several tutorials on how to scrape websites using the rvest package, Chrome's Inspect Element, and CSS or XPath, but I'm likely stuck because the table I seek is dynamically generated using JavaScript. I started with:
install.packages("rvest")
library(rvest)
Kenopage <- "
Keno <- Read.hmtl(Kenopage)
From there, I've been unable to progress, despite hours spent on
combinations of CSS and XPath calls with html_nodes().
Failed example:
DrawNumber <- Keno %>% rvest::html_nodes("body") %>%
xml2::xml_find_all("//span[contains(@class,'Draw Number')]") %>%
rvest::html_text()
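One way to see why a selector like this can come back empty is to check whether the element exists at all in the HTML that the server sends, before any JavaScript has run. The following is a minimal sketch on made-up HTML: the id "winning-numbers" and the class "draw-number" are assumptions for illustration only, not names taken from the real page.

```r
library(rvest)

# Hypothetical stand-in for the HTML a server sends before any JavaScript runs.
raw_html <- '<html><body>
  <div id="winning-numbers"></div>
  <script>/* a browser would fill the div above */</script>
</body></html>'
page <- read_html(raw_html)

# The empty container is in the served HTML ...
container <- html_nodes(page, "#winning-numbers")

# ... but the spans a browser would display are not, so this matches nothing.
spans <- html_nodes(page, xpath = "//span[contains(@class, 'draw-number')]")

length(container)  # 1
length(spans)      # 0

# A plain text search of the served source agrees with the selector result.
grepl("draw-number", raw_html, fixed = TRUE)  # FALSE
```

When the container is present but its contents are missing, no CSS or XPath expression will help on its own, because read_html() does not execute the page's scripts.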
Someone mentioned using the V8 package in R, but it's new to me.
How do I get started?
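As a possible starting point with V8: the package gives R a JavaScript engine, but not a browser, so it has no DOM and does not download anything by itself. It can help when a page embeds its data as a JavaScript variable inside a script tag. The sketch below assumes exactly that; the variable name "draws" and the inline HTML are hypothetical, and whether the Keno page is built this way is not established in this thread.

```r
library(rvest)
library(V8)

# Hypothetical page: the draws are embedded as a JavaScript variable.
page <- read_html('<html><body>
  <script>var draws = [{"draw": 101, "numbers": [3, 7, 12]},
                       {"draw": 102, "numbers": [5, 9, 40]}];</script>
</body></html>')

# Pull the script source out of the page as plain text.
js_source <- html_text(html_nodes(page, "script"))

# Evaluate it in a fresh V8 context and copy the variable back into R.
ctx <- v8()
ctx$eval(js_source)
draws <- ctx$get("draws")

# Inspect what came back (V8 converts JavaScript values to R via JSON).
str(draws)
```

Scripts that touch `document` or `window` will fail in V8. In that case the usual alternatives are to find the data request in the browser's Network tab and read that response directly, or to drive a real browser from R.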
--
Julio Farach
https://www.linkedin.com/in/farach
cell phone: 804/363-2161
email: JFarach at gmail.com
-- John Kane Kingston ON Canada