
How to access https page

On Tue, Mar 10, 2015 at 12:56 PM, Hui <hui.du at savvyrookies.com> wrote:

            
There is an additional complication here: LinkedIn doesn't want you to
scrape the website and denies requests from non-browser clients. To get
around this, you need to set the "User-Agent" header to something that
looks like a browser. Try this:

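devtools::install_github("jeroenooms/curl")
library(curl)

# Create a handle that sends a browser-like User-Agent header
h <- new_handle()
handle_setheaders(h, "User-Agent" = "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0")

# Read the page through that handle, one element per line
txt <- readLines(curl("https://www.linkedin.com/in/huidu", handle = h))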
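
If the site still refuses the request, it can help to look at the HTTP status code before reading the body. The following is only a sketch under assumptions: it uses curl_fetch_memory() from the same curl package, reuses the User-Agent string from the message above, and treats any status other than 200 as a refusal, which may not match how the site actually responds.

```r
library(curl)

# Browser-like User-Agent string (same value as in the example above)
ua <- "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"

h <- new_handle()
handle_setheaders(h, "User-Agent" = ua)

# Fetch the whole response into memory so the status code can be inspected
res <- curl_fetch_memory("https://www.linkedin.com/in/huidu", handle = h)

if (res$status_code == 200) {
  # Split the body into lines, similar to what readLines() returns
  txt <- strsplit(rawToChar(res$content), "\n", fixed = TRUE)[[1]]
} else {
  # Assumption: anything other than 200 means the request was not served
  warning("Request was not served, HTTP status: ", res$status_code)
}
```

Unlike readLines(curl(...)), this does not stop with an error on an HTTP failure status, so the code can decide what to do with a refused request.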

  
  
Message-ID: <CABFfbXvWVVwe5gLjzynfZMeo-y0g8no4wXBvPErWfQURzMUJ+Q@mail.gmail.com>
In-Reply-To: <7EAE98EE-9247-4542-A956-ABD04BE29D30@savvyrookies.com>