Message-ID: <7800973b-6799-4360-bd69-8317d9cbd5dc@e27g2000yqd.googlegroups.com>
Date: 2009-11-25T17:12:41Z
From: Tony
Subject: XML package example code?
In-Reply-To: <366c6f340911250821h1c8a1212h5a4c85ad6e7be294@mail.gmail.com>

Not sure whether my code was attached to my last post, so here it is:

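library(RCurl)  # getURL()
library(XML)    # htmlTreeParse()
html <- getURL("http://www.omegahat.org/RSXML/index.html")  # page as one string
# HTML parser (not the strict XML one); the no-op handler silences tag-mismatch errors
html.tree <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE,
                           error = function(...) {})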
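A minimal offline sketch of why the HTML parser copes where `xmlInternalTreeParse` fails in the quoted message below, assuming the XML package is installed. The string `snippet` is hypothetical: it leaves `<dt>`/`<dd>` tags unclosed, which HTML permits but well-formed XML does not. The XPath queries through `xpathSApply` and `xmlGetAttr` are just one illustrative way to pull data out of the parsed tree. It parses a string instead of fetching the omegahat.org page, so it runs without network access.

```r
library(XML)  # htmlTreeParse(), xmlInternalTreeParse(), xpathSApply()

# Hypothetical markup: <dt>/<dd> are never closed. That is valid HTML but not
# well-formed XML, similar to the tag mismatches in the quoted error
snippet <- paste0("<html><body><dl>",
                  "<dt><a href='a.html'>A</a><dd>first",
                  "<dt><a href='b.html'>B</a><dd>second",
                  "</dl></body></html>")

# The strict XML parser rejects the unclosed tags and signals an R error
strict_ok <- tryCatch({
  xmlInternalTreeParse(snippet, asText = TRUE)
  TRUE
}, error = function(e) FALSE)

# The HTML parser infers the missing end tags; the no-op handler keeps it quiet
doc <- htmlTreeParse(snippet, asText = TRUE, useInternalNodes = TRUE,
                     error = function(...) {})

# Query the parsed tree with XPath: link targets and definition texts
links <- xpathSApply(doc, "//a", xmlGetAttr, "href")
items <- xpathSApply(doc, "//dd", xmlValue)

print(strict_ok)
print(links)
print(items)
```

With `useInternalNodes = TRUE` the result is an internal (C-level) document that XPath helpers can query directly, so the same `xpathSApply` calls should also work on `html.tree` above.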


On 25 Nov, 16:21, Peng Yu <pengyu... at gmail.com> wrote:
> On Wed, Nov 25, 2009 at 12:19 AM, cls59 <ch... at sharpsteen.net> wrote:
>
> > Peng Yu wrote:
>
> >> I'm interested in parsing an HTML page. I should use XML, right? Could
> >> somebody show me some example code? Is there a tutorial for this
> >> package?
>
> > Did you try looking through the help pages for the XML package or browsing
> > the Omegahat website?
>
> > Look at:
>
> > help(package = "XML")
> > ??htmlTreeParse
>
> > And the relevant web page for documentation and examples is:
>
> > http://www.omegahat.org/RSXML/
>
> http://www.omegahat.org/RSXML/shortIntro.html
>
> I'm trying the example on the above web page, but I'm not sure why I
> got the following error. Could you help me take a look?
>
> $ Rscript main.R
> > library(XML)
>
> > download.file('http://www.omegahat.org/RSXML/index.html','index.html')
>
> trying URL 'http://www.omegahat.org/RSXML/index.html'
> Content type 'text/html; charset=ISO-8859-1' length 3021 bytes
> opened URL
> ==================================================
> downloaded 3021 bytes
>
> > doc = xmlInternalTreeParse("index.html")
>
> Opening and ending tag mismatch: dd line 68 and dl
> Opening and ending tag mismatch: li line 67 and body
> Opening and ending tag mismatch: dt line 66 and html
> Premature end of data in tag dd line 64
> Premature end of data in tag li line 63
> Premature end of data in tag dt line 62
> Premature end of data in tag dl line 61
> Premature end of data in tag body line 5
> Premature end of data in tag html line 1
> Error: 1: Opening and ending tag mismatch: dd line 68 and dl
> 2: Opening and ending tag mismatch: li line 67 and body
> 3: Opening and ending tag mismatch: dt line 66 and html
> 4: Premature end of data in tag dd line 64
> 5: Premature end of data in tag li line 63
> 6: Premature end of data in tag dt line 62
> 7: Premature end of data in tag dl line 61
> 8: Premature end of data in tag body line 5
> 9: Premature end of data in tag html line 1
> Execution halted
>
> ______________________________________________
> R-h... at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.