Externalptr class to character class from (web) scrape

On 26/07/2013 12:43 PM, Nick McClure wrote:
You should use str() in cases like this. When I look at 
str(website.doc[[1]]) (after producing website.doc with scrape(), not 
parse()), I see

 > str(website.doc[[1]])
Classes 'HTMLInternalDocument', 'HTMLInternalDocument', 'XMLInternalDocument', 'XMLAbstractDocument' <externalptr>
 - attr(*, "headers")= Named chr [1:2] "<HTML><HEAD><meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<TITLE>302 Moved</TITLE></HEAD><BODY>\n<H1>"| __truncated__ "</BODY></HTML>"
  ..- attr(*, "names")= chr [1:2] "<HTML><HEAD><meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<TITLE>302 Moved</TITLE></HEAD><BODY>\n<H1>"| __truncated__ "</BODY></HTML>"

So it is an external pointer with several classes, and at least one of 
those should have a print method. Calling methods(print) lists all the 
print methods, and among them I see a hidden (unexported) 
print.XMLInternalDocument method. Then

 > getAnywhere("print.XMLInternalDocument")
A single object matching 'print.XMLInternalDocument' was found
It was found in the following places
registered S3 method for print from namespace XML
namespace:XML
with value

function (x, ...)
{
    cat(as(x, "character"), "\n")
}
<environment: namespace:XML>

shows that as() should work even though as.character() doesn't, since 
the print method itself calls as(x, "character"). Indeed, 
as(website.doc[[1]], "character") does return the document's contents 
as text.
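
The steps above can be sketched end to end as follows. This is only an 
illustration under stated assumptions: it builds a small document with 
htmlParse() from a made-up HTML string instead of using scrape(), so the 
names html, doc, pm and txt are hypothetical stand-ins for 
website.doc[[1]] and friends, and it assumes the XML package is installed.

```r
# Minimal sketch; assumes the XML package is installed.
# The HTML string is invented for illustration, it is not the scraped page.
library(XML)

html <- "<html><head><title>302 Moved</title></head><body><h1>302 Moved</h1></body></html>"
doc  <- htmlParse(html, asText = TRUE)

# 1. Inspect the object: the underlying type is an external pointer,
#    and the class vector shows which S3 methods may apply.
typeof(doc)   # "externalptr"
class(doc)

# 2. Look up the print method registered for one of those classes.
#    optional = TRUE returns NULL instead of raising an error if none exists.
pm <- getS3method("print", "XMLInternalDocument", optional = TRUE)
pm

# 3. Use the same coercion that the print method relies on.
txt <- as(doc, "character")
is.character(txt)
cat(txt, "\n")
```

Once the text is in an ordinary character vector, the usual string tools 
(grepl(), regmatches(), writeLines() and so on) can be applied to it.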

Duncan Murdoch