Extract CRU data

4 messages · Barry Rowlingson, Grzegorz Sapijaszko, Miluji Sb

#
Greetings everyone,

I have a question on extracting country-level data from CRU (
https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.06/crucy.2205251923.v4.06/countries/tmp/).
The data for each variable are available for individual countries and I am
struggling to download all of them. Can I extract all the files in R then
merge? Thanks so much.

Best,

Milu
#
Are you asking if there's a way to automate downloading the list of links
from that page? You could write an R script to fetch the HTML, find all
the HTML <A> tags, and extract the URLs from the link addresses; there
are packages for doing this kind of web scraping.
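The scraping approach described above can be sketched with rvest; `file_links` is a hypothetical helper name, and the URL is the one from the question:

```r
library(rvest)

# Hypothetical helper: pull the href of every <a> tag out of a parsed
# page and keep only the .per data files (dropping the "Parent
# Directory" and column-sort links).
file_links <- function(page) {
  hrefs <- page |>
    html_elements("a") |>
    html_attr("href")
  grep("\\.per$", hrefs, value = TRUE)
}

# page <- read_html("https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.06/crucy.2205251923.v4.06/countries/tmp/")
# file_links(page)
```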

But for this kind of thing it might be easier to use a web browser add-on -
I have "DownThemAll!" set up on Firefox, and with a click or two I can get
a list of all the link URLs and hit a button that downloads everything to a
single folder. Once done, I can use standard R functions to list all the
downloaded files and read them. It took about 20 seconds for this page,
and now I have a folder of 292 .tmp.per files.
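The "list and read" step might look like the following sketch; the folder name and the choice of readLines are placeholders, not part of the message above:

```r
# List every .per file the browser add-on saved into one folder
# ("downloads" is a placeholder name) and read them in one pass.
read_all_per <- function(dir) {
  f <- list.files(dir, pattern = "\\.per$", full.names = TRUE)
  lapply(f, readLines)  # swap in read.csv/read.table once the layout is known
}
```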

Barry
On Tue, Jan 24, 2023 at 11:13 AM Miluji Sb <milujisb at gmail.com> wrote:

#
On Tue, 2023-01-24 at 12:13 +0100, Miluji Sb wrote:
Something like:

To get all links/filenames in one table:

a <- rvest::read_html("https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.06/crucy.2205251923.v4.06/countries/tmp/")

# html_table() returns a list of tables; this index page has only one
tbl <- rvest::html_table(a)[[1]] |>
  as.data.frame()

# drop the header/"Parent Directory" rows at the top
tbl <- tbl[-c(1, 2), ]

To download them all to a specific directory:

my_download_function <- function(myurl = "", output_dir = "data") {
  if (!dir.exists(output_dir)) dir.create(output_dir)
  base_url <- "https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.06/crucy.2205251923.v4.06/countries/tmp/"
  destfile <- file.path(output_dir, myurl)
  # method = "wget" needs wget on the PATH; "-c" resumes interrupted downloads
  download.file(url = paste0(base_url, myurl), destfile = destfile,
                method = "wget", extra = "-c --progress=bar:force")
  invisible(NULL)
}

invisible(lapply(seq_len(nrow(tbl)),
                 function(i) my_download_function(tbl[i, 1], "data")))

Now, having them locally, you can read them one by one with read.csv,
like:

f <- list.files(path = "data", pattern = "^crucy", full.names = TRUE)
dat <- lapply(f, read.csv, skip = 3, header = TRUE)

It doesn't make much sense without adding information about which
country/territory each file covers, but at least you have a starting point.
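One possible way to add that information and get to the merged table the original question asked for: keep the source filename as a column so the country can be recovered later. `read_with_source` and the `source_file` column are made-up names, and the skip = 3 header assumption follows the snippet above:

```r
# Hypothetical helper: read one CRU country file and remember which
# file (and therefore which country) each row came from.
read_with_source <- function(path, skip = 3) {
  d <- read.csv(path, skip = skip, header = TRUE)
  d$source_file <- basename(path)
  d
}

# Stack everything into one data frame for merging
f <- list.files(path = "data", pattern = "^crucy", full.names = TRUE)
all_dat <- do.call(rbind, lapply(f, read_with_source))
```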

Regards,
Grzegorz
#
That indeed was the most efficient solution. Thank you!

On Tue, Jan 24, 2023 at 1:45 PM Barry Rowlingson <b.rowlingson at gmail.com>
wrote: