How to access https page
On Tue, Mar 10, 2015 at 12:56 PM, Hui <hui.du at savvyrookies.com> wrote:
Thanks. However, I got HTTP error 999.
There is an additional complication here: LinkedIn doesn't want you to
scrape the website and denies requests from non-browser clients. To get
around this you need to set the "User-Agent" header to something that looks
like a browser. Try this:
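devtools::install_github("jeroenooms/curl")
library(curl)
h <- new_handle()
# Keep the User-Agent string on one line; a line break inside the quotes
# would become part of the header value.
handle_setheaders(h, "User-Agent" = "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0")
txt <- readLines(curl("https://www.linkedin.com/in/huidu", handle = h))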
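If you want to look at the HTTP status yourself instead of having readLines() fail, something along these lines may help. This is only a sketch: fetch_page() is a hypothetical helper name, not something the curl package provides, and it assumes a curl version that includes curl_fetch_memory(). Whether LinkedIn accepts the request still depends on the server.

```r
# Hypothetical helper (not part of the curl package): fetch a page with a
# browser-like User-Agent and stop with the HTTP status if it is not 200.
fetch_page <- function(url,
                       user_agent = "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0") {
  h <- curl::new_handle()
  curl::handle_setheaders(h, "User-Agent" = user_agent)

  # curl_fetch_memory() returns the response (status_code, headers, content)
  # without raising an error for HTTP error statuses.
  res <- curl::curl_fetch_memory(url, handle = h)
  if (res$status_code != 200) {
    stop("Request failed with HTTP status ", res$status_code)
  }

  # Split the body into lines, similar to what readLines() would return.
  strsplit(rawToChar(res$content), "\r?\n")[[1]]
}

# Usage (needs network access):
# txt <- fetch_page("https://www.linkedin.com/in/huidu")
```

This way a status such as 999 shows up in the error message rather than as a failed connection.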
Hui
Sent from my iPhone
On Mar 10, 2015, at 12:07 PM, Jeroen Ooms <jeroen.ooms at stat.ucla.edu> wrote:
On Mon, Mar 9, 2015 at 3:39 PM, Hui Du <hui.du at savvyrookies.com> wrote:
readLines(url)
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") : unsupported URL scheme
Try:
library(curl)
readLines(curl(url))