
reading text files directly into program from net

3 messages · Nick Wray, Ivan Krylov, Rui Barradas

#
Hello, I am working with daily rainfall data for the UK from 1915 onwards
from the Met Office.  The data is on this site:

https://data.ceda.ac.uk/badc/ukmo-midas-open/data/uk-daily-rain-obs/dataset-version-201901

There are many files for the different counties.  For example there are
seven station files for Berwickshire in this site:

https://data.ceda.ac.uk/badc/ukmo-midas-open/data/uk-daily-rain-obs/dataset-version-201901/berwickshire


The first station in this dataset has the name 00265_mertoun
<https://data.ceda.ac.uk/badc/ukmo-midas-open/data/uk-daily-rain-obs/dataset-version-201901/berwickshire/00265_mertoun>,
which is a code and location name (again, just as an example), and inside
is a text file.

It's easy but time-consuming to download the files one by one.  I am sure
that it is possible to read them directly into an R program, but whatever
code or path I try, I get an error.  I can't find an example online
which I can tweak to allow me to do this.

Can anyone help?  Thanks, Nick Wray
#
On Sun, 25 Sep 2022 09:53:39 +0100
Nick Wray <nickmwray at gmail.com> wrote:

            
Following the link to the file
<https://dap.ceda.ac.uk/badc/ukmo-midas-open/data/uk-daily-rain-obs/dataset-version-201901/berwickshire/00265_mertoun/midas-open_uk-daily-rain-obs_dv-201901_00265_mertoun_capability.csv?download=1>,
I get a login prompt. Same thing probably happens to R when it tries to
download those files.

Does CEDA Archive have an API for programmatic access? If not, you'll
either have to export the cookies from your browser and use the curl
package to send HTTP requests with those included, or use the developer
toolbar in your browser to find out how the login request is sent and
use the curl package to (1) send the login request, (2) receive cookies
and (3) use those cookies to download files. This is called "website
scraping" and may be brittle, depending on how much the website
administrators dislike bots.
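
The exported-cookie route could be sketched roughly as follows. This is
only a sketch under assumptions: the cookie name "sessionid" and its
value are hypothetical placeholders (the real names have to be read from
your browser after logging in), and the helper names cookie_header() and
download_with_cookies() are made up for illustration. The curl calls
used are new_handle(), handle_setopt() and curl_download().

```r
# Turn a named character vector of cookies into the "name=value; name=value"
# string that is sent in the HTTP Cookie header.
cookie_header <- function(cookies) {
  paste(names(cookies), cookies, sep = "=", collapse = "; ")
}

# Download one file, sending the given cookies with the request.
# Returns the path of the local copy.
download_with_cookies <- function(url, cookies,
                                  destfile = tempfile(fileext = ".csv")) {
  if (!requireNamespace("curl", quietly = TRUE)) {
    stop("Please install the 'curl' package first")
  }
  h <- curl::new_handle()
  curl::handle_setopt(h, cookie = cookie_header(cookies))
  curl::curl_download(url, destfile, handle = h)
  destfile
}

# Usage (not run). The cookie name and value below are placeholders:
# f <- download_with_cookies(
#   paste0("https://dap.ceda.ac.uk/badc/ukmo-midas-open/data/",
#          "uk-daily-rain-obs/dataset-version-201901/berwickshire/",
#          "00265_mertoun/midas-open_uk-daily-rain-obs_dv-201901_",
#          "00265_mertoun_capability.csv?download=1"),
#   cookies = c(sessionid = "value-copied-from-browser")
# )
# head(readLines(f))  # inspect the file before deciding how to parse it
```

Session cookies expire, so a script like this may stop working until
fresh values are copied from the browser again.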

Looking at the documentation, it seems that the datasets may be
available via FTP: https://help.ceda.ac.uk/article/280-ftp

It should be possible to use the curl package to download the files.
Depending on how R is built, it could also be possible to feed the FTP
URL directly to read.csv, if you put the username and the password
inside it: ftp://username:password@ftp-server.hostname/path/to/file.csv
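
A minimal sketch of the FTP route with the curl package might look like
this. Several things here are assumptions rather than facts from the
CEDA documentation: the host name "ftp.ceda.ac.uk", the idea that the
FTP path mirrors the web path, and the environment variable names
CEDA_USER and CEDA_PASS. The helpers build_ftp_url() and
fetch_ftp_file() are hypothetical names introduced for illustration.

```r
# Build an FTP URL with the credentials embedded. Reserved characters in
# the username and password (such as "@" or ":") are percent-encoded so
# that they cannot be mistaken for URL delimiters.
build_ftp_url <- function(user, password, host, path) {
  paste0("ftp://",
         utils::URLencode(user, reserved = TRUE), ":",
         utils::URLencode(password, reserved = TRUE), "@",
         host, "/", sub("^/+", "", path))
}

# Download one file over FTP and return the path of the local copy.
fetch_ftp_file <- function(url, destfile = tempfile(fileext = ".csv")) {
  if (!requireNamespace("curl", quietly = TRUE)) {
    stop("Please install the 'curl' package first")
  }
  curl::curl_download(url, destfile)
  destfile
}

# Usage (not run). Host and path are assumptions, see above:
# url <- build_ftp_url(
#   Sys.getenv("CEDA_USER"), Sys.getenv("CEDA_PASS"),
#   host = "ftp.ceda.ac.uk",
#   path = paste0("/badc/ukmo-midas-open/data/uk-daily-rain-obs/",
#                 "dataset-version-201901/berwickshire/00265_mertoun/",
#                 "midas-open_uk-daily-rain-obs_dv-201901_",
#                 "00265_mertoun_capability.csv")
# )
# f <- fetch_ftp_file(url)
# head(readLines(f))  # look at the layout before calling read.csv()
```

Reading the credentials from environment variables keeps the password
out of the script itself.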
#
Hello,

Inline.

At 09:53 on 25/09/2022, Nick Wray wrote:
The file is a CSV file, and there are also two directories:

qc-version-0  empty
qc-version-1  has 10 CSV files spanning the years 1961-1970


Which files do you want, these last ones?

Hope this helps,

Rui Barradas