
html into R

4 messages · Nick Wray, Thierry Onkelinx, Rui Barradas

#
Hello, I need to download flow data for Scottish river catchments.  The
data is available from the Scottish Environment Protection Agency (SEPA),
and getting access to it isn't a problem.  For example, the API call below
returns the 15-minute flow recordings (96 per day) for one station on the
River Tweed, starting on Jan 1st 2020:

https://timeseries.sepa.org.uk/KiWIS/KiWIS?service=kisters&type=queryServices&datasource=0&request=getTimeseriesValues&ts_path=1/14972/Q/15m.Cmd&from=2020-01-01&to=2020-01-07&returnfields=Timestamp,Value,Quality%20Code


But this data comes as HTML.  I can copy and paste it into a text doc which
can then be read into R but that's slow and time-consuming.  I have tried
using the package "rvest" to import the HTML into R but I have got nowhere.

Can anyone give me any pointers as to how to do this?


Thanks,
Nick Wray
#
Dear Nick,

A better solution is to add "&format=json" to the URL. Then the query
returns the data in JSON format.
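
A minimal sketch of that suggestion in R, assuming the jsonlite package is installed (any JSON parser would do). The helper names with_format() and fetch_flow_json() are made up for illustration, and the shape of the parsed result is not documented in this thread, so inspect it with str() before relying on any field names.

```r
# Same query as in the original post, split across lines for readability.
base_url <- paste0(
  "https://timeseries.sepa.org.uk/KiWIS/KiWIS?service=kisters",
  "&type=queryServices&datasource=0&request=getTimeseriesValues",
  "&ts_path=1/14972/Q/15m.Cmd&from=2020-01-01&to=2020-01-07",
  "&returnfields=Timestamp,Value,Quality%20Code"
)

# Hypothetical helper: append "&format=<fmt>" to an existing query URL.
with_format <- function(url, fmt = "json") paste0(url, "&format=", fmt)

json_url <- with_format(base_url)

# Hypothetical helper: download and parse the JSON reply.
# jsonlite::fromJSON() accepts a URL directly.
fetch_flow_json <- function(url) jsonlite::fromJSON(url)

# Only hit the network in an interactive session.
if (interactive()) {
  flow <- fetch_flow_json(json_url)
  str(flow, max.level = 2)  # look at the structure before extracting values
}
```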

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be



On Fri, 26 Aug 2022 at 10:44, Nick Wray <nickmwray at gmail.com> wrote:
#
Hello,

You can try the following; it worked for me.
Read the page from the link, extract the "table" element from the HTML,
and then convert it to a data frame.

The table has three rows before the actual data, so the lapply below
drops them and uses the third one as the header.


library(rvest)  # rvest also provides read_html(), so httr is not needed here


link <- 
"https://timeseries.sepa.org.uk/KiWIS/KiWIS?service=kisters&type=queryServices&datasource=0&request=getTimeseriesValues&ts_path=1/14972/Q/15m.Cmd&from=2020-01-01&to=2020-01-07&returnfields=Timestamp,Value,Quality%20Code"

page <- read_html(link)
page |>
   html_elements("table") |>
   html_table(header = TRUE) |>
   lapply(\(x) {
     hdr <- unlist(x[3, ])   # the third row holds the column names
     y <- x[-(1:3), ]        # drop the three rows above the data
     names(y) <- hdr
     y
   })


Hope this helps,

Rui Barradas

At 09:43 on 26/08/2022, Nick Wray wrote:
#
Sorry, there's simpler code. I used html_elements (plural) and the 
result is a list. Use html_element (singular) and the output is a tibble.


page |>
   html_element("table") |>
   html_table(header = TRUE) |>
   (\(x) {
     hdr <- unlist(x[3, ])   # the third row holds the column names
     y <- x[-(1:3), ]        # drop the three rows above the data
     names(y) <- hdr
     y
   })()
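
The header fix above could also be pulled out into a small reusable function, which makes it easy to check on toy data before pointing it at the real page. This is only a sketch: promote_header() is a made-up name, and the little data frame below is an invented stand-in for the scraped table, not real SEPA output.

```r
# Hypothetical helper: use row `header_row` as the column names and
# drop every row up to and including it.
promote_header <- function(x, header_row = 3) {
  hdr <- unlist(x[header_row, ], use.names = FALSE)
  y <- x[-seq_len(header_row), , drop = FALSE]
  names(y) <- as.character(hdr)
  y
}

# Invented stand-in for the scraped table: three leading rows
# (the third being the header), then two rows of data.
raw <- data.frame(
  X1 = c("meta_a", "meta_b", "Timestamp", "t1", "t2"),
  X2 = c("", "", "Value", "1.5", "2.5"),
  stringsAsFactors = FALSE
)

flow <- promote_header(raw)

# Scraped columns arrive as character, so convert the numeric ones.
flow$Value <- as.numeric(flow$Value)
```

With the real page, the same function should slot into the pipe in place of the anonymous function, e.g. html_table(header = TRUE) |> promote_header().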


Hope this helps,

Rui Barradas

At 11:53 on 26/08/2022, Rui Barradas wrote: