web scraping tables generated in multiple server pages

Hey David,

I'm on a Mac as well but have never had to tweak anything to get
[R]Selenium to work (but this is one reason I try to avoid solutions
involving RSelenium as they are pretty fragile IMO).

The site itself has "Página 1 de 69" at the top, which is where I got
the "69" from, and I just re-ran the code in a 100% clean env (on a
completely different Mac) and it worked fine.

I did neglect to put my session info up before (apologies):

    Session info ------------------------------------------------------------------------------------
     setting  value
     version  R version 3.3.0 RC (2016-05-01 r70572)
     system   x86_64, darwin13.4.0
     ui       RStudio (0.99.1172)
     language (EN)
     collate  en_US.UTF-8
     tz       America/New_York
     date     2016-05-11

    Packages ----------------------------------------------------------------------------------------
     package    * version  date       source
     assertthat   0.1      2013-12-06 CRAN (R 3.3.0)
     bitops     * 1.0-6    2013-08-17 CRAN (R 3.3.0)
     caTools      1.17.1   2014-09-10 CRAN (R 3.3.0)
     DBI          0.4      2016-05-02 CRAN (R 3.3.0)
     devtools   * 1.11.1   2016-04-21 CRAN (R 3.3.0)
     digest       0.6.9    2016-01-08 CRAN (R 3.3.0)
     dplyr      * 0.4.3    2015-09-01 CRAN (R 3.3.0)
     httr         1.1.0    2016-01-28 CRAN (R 3.3.0)
     magrittr     1.5      2014-11-22 CRAN (R 3.3.0)
     memoise      1.0.0    2016-01-29 CRAN (R 3.3.0)
     pbapply    * 1.2-1    2016-04-19 CRAN (R 3.3.0)
     R6           2.1.2    2016-01-26 CRAN (R 3.3.0)
     Rcpp         0.12.4   2016-03-26 CRAN (R 3.3.0)
     RCurl      * 1.95-4.8 2016-03-01 CRAN (R 3.3.0)
     RJSONIO    * 1.3-0    2014-07-28 CRAN (R 3.3.0)
     RSelenium  * 1.3.5    2014-10-26 CRAN (R 3.3.0)
     rvest      * 0.3.1    2015-11-11 CRAN (R 3.3.0)
     selectr      0.2-3    2014-12-24 CRAN (R 3.3.0)
     stringi      1.0-1    2015-10-22 CRAN (R 3.3.0)
     stringr      1.0.0    2015-04-30 CRAN (R 3.3.0)
     withr        1.0.1    2016-02-04 CRAN (R 3.3.0)
     XML        * 3.98-1.4 2016-03-01 CRAN (R 3.3.0)
     xml2       * 0.1.2    2015-09-01 CRAN (R 3.3.0)

(and, wow, does that tiny snippet of code end up using a lot of pkgs)

I had actually started with smaller snippets to test. The code got
uglier due to the way the site paginates (it loads 10 entries' worth of
data onto a single page but requires a server call for the next 10).
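
To make the pagination point concrete, here is a minimal base-R sketch of
the "one server call per 10 rows" pattern. It is not the code from earlier
in this thread: `fetch_page()` is a hypothetical stand-in that fabricates
row ids instead of talking to the site, and the assumption that every page
holds exactly 10 entries is mine (the last real page may well be shorter).

```r
# Hypothetical stand-in for the real server call: returns the rows that
# page number `page` would hold. A real version would request that page
# and parse the HTML table out of the response.
fetch_page <- function(page, per_page = 10L) {
  ids <- seq.int((page - 1L) * per_page + 1L, page * per_page)
  data.frame(page = page, id = ids)
}

n_pages <- 69L  # taken from the "Página 1 de 69" banner on the site

# One call per page, then stack the per-page pieces into one data frame.
pieces   <- lapply(seq_len(n_pages), fetch_page)
all_rows <- do.call(rbind, pieces)

nrow(all_rows)
```

Keeping the per-page fetch in its own function means a single page can be
tested in isolation before looping over all 69.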

I also keep Firefox scarily out-of-date (back in the 33-series) b/c I
only use it with RSelenium (not a big fan of the browser). Let me
update to the 46-series and see if I can replicate.

-Bob
On Wed, May 11, 2016 at 1:48 PM, David Winsemius <dwinsemius at comcast.net> wrote: