RCurl and Google Scholar's EndNote references

3 messages · Jarno Tuimala, Duncan Temple Lang

Hi Jarno

You've only told us half the story. You didn't show how you
i) performed the original query
ii) retrieved the URL you used in subsequent queries


But I can suggest two possible problems and a likely fix.

a) specifying the cookiejar option tells libcurl where to write the
   cookies that the particular curl handle has collected during its life.
   These are written only when the curl handle is destroyed, so the option
   does not change the getURL() operation itself.

b) You probably mean to use cookiefile rather than cookiejar so that
   the curl request would read existing cookies from a file.
   But in that case, how did that file get created with the correct cookies?

c) libcurl will collect cookies in a curl handle as it receives them from a server
   as part of a response. And it will use these in subsequent requests to that server.
   But you must be using the same curl handle.  Different curl handles are entirely
   independent (unless one is copied from another).
   So a possible solution is to do the initial query with the same
   curl handle you use for the later requests.
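
To make a) to c) concrete, here is a minimal sketch of one handle that both reads and writes a cookie file. The file name is made up for the example, and I am only assuming that you want the cookies kept on disk at all; for a single R session the shared handle on its own should be enough.

```r
library(RCurl)

# Hypothetical file name, used only for illustration.
cookies <- file.path(tempdir(), "scholar-cookies.txt")

# cookiefile: read any existing cookies from this file (this also switches
#             on libcurl's cookie handling for the handle).
# cookiejar:  write the cookies the handle has collected to this file
#             when the handle is destroyed.
curl <- getCurlHandle(cookiefile = cookies, cookiejar = cookies)

# Every request made with `curl` now shares one set of cookies, e.g.
#   page <- getURL("http://scholar.google.com/", curl = curl)

# The jar is written only when the handle is cleaned up, so release it
# and let the garbage collector run its finalizer.
rm(curl)
invisible(gc())
```

A second handle created with getCurlHandle() would start with no cookies unless it is given the same cookiefile.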


So I would try something like

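library(RCurl)
library(XML)

# One handle for the whole session, so cookies carry over between requests
curl = getCurlHandle()
z = getForm("http://scholar.google.com/scholar", q = 'Frank Harrell', hl = 'en', btnG = 'Search',
            .opts = list(verbose = TRUE), curl = curl)

# z holds the page content as a string, so parse it as text
dd = htmlParse(z, asText = TRUE)
links = getNodeSet(dd, "//a[@href]")

# do something to identify the link you want and assign it to linkIWant

tmp = getURL(linkIWant, curl = curl)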


Note that we are using the same curl object in both requests.
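
For the "identify the link you want" step, something along these lines might work. It is only a sketch: the HTML below is invented so the example runs without a network connection, and the "scholar.enw" pattern is my guess at what the EndNote link looks like, so check it against the real page.

```r
library(XML)

# Stand-in for the page returned by getForm(); this markup is invented.
html <- '<html><body>
  <a href="/scholar?q=related:abc">Related articles</a>
  <a href="/scholar.enw?q=info:abc&amp;output=citation">Import into EndNote</a>
</body></html>'

dd <- htmlParse(html, asText = TRUE)

# Collect every href, then keep the ones matching the assumed pattern.
hrefs <- xpathSApply(dd, "//a[@href]", xmlGetAttr, "href")
wanted <- grep("scholar\\.enw", hrefs, value = TRUE)

# The links are relative, so prepend the host before fetching.
linkIWant <- paste0("http://scholar.google.com", wanted[1])
linkIWant
```

The resulting linkIWant is what you would pass to getURL() together with the original curl handle.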


This may not do what you want, but if you let us know the details
about how you are doing the preceding steps, we should be able to sort
things out.

  D.
Jarno Tuimala wrote: