[R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ?
Looking into one particular example,

https://github.com/seabbs/idmodelr/blob/master/DESCRIPTION

this appears to be the authors' fault:

Authors@R: c(
    person(given = "Sam Abbott",
           role = c("aut", "cre"),
           email = "contact at samabbott.co.uk",
           comment = c(ORCID = "0000-0001-8057-8037")),
    person(given = "Akira Endo",
           role = c("aut"),
           email = "akira.endo at lshtm.ac.uk",
           comment = c(ORCID = "0000-0001-6377-7296")))

Maybe CRAN should start checking for missing 'family' fields in Authors@R ... ???

cheers
Ben Bolker
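A check for missing 'family' fields could look roughly like the sketch below. This is only an illustration and not an existing CRAN check: the helper name missing_family() is hypothetical, and it assumes the Authors@R text comes from a trusted source, since it is evaluated as R code.

```r
# Hypothetical helper (not part of R or of any CRAN check): for the raw text
# of an Authors@R field, return TRUE for each person without a 'family' name.
# The text is evaluated as R code, so only use this on trusted input.
missing_family <- function(authors_at_r) {
  p <- eval(parse(text = authors_at_r))
  stopifnot(inherits(p, "person"))
  vapply(seq_along(p),
         function(i) is.null(p[i]$family),
         logical(1))
}

# Illustrative input: the first entry puts the full name into 'given',
# the second splits it into given and family name.
field <- 'c(person(given = "Sam Abbott", role = c("aut", "cre")),
            person(given = "Akira", family = "Endo", role = "aut"))'
flags <- missing_family(field)
```

Here flags marks the first entry only, because its full name sits in 'given' and 'family' is left empty.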
On 2024-08-20 9:47 a.m., Kurt Hornik wrote:
Kurt Hornik writes:
The variant attached drops the URL and does unique().

Hmm, the ones in

    head(with(a, sort_by(a, ~ family + given)), 100)

without a family look suspicious ...

Best
-k
Dirk Eddelbuettel writes:
On 20 August 2024 at 07:57, Dirk Eddelbuettel wrote:
|
| Hi Kurt,
|
| On 20 August 2024 at 14:29, Kurt Hornik wrote:
| | I think for now you could use something like what I attach below.
| |
| | Not ideal: I had not too long ago started adding orcidtools.R to tools,
| | which e.g. has .persons_from_metadata(), but that works on the unpacked
| | sources and not the CRAN package db. Need to think about that ...
|
| We need something like that too as I fat-fingered the string 'ORCID'. See
| fortunes::fortune("Dirk can type").
|
| Will try the function below later. Many thanks for sending it along.
Very nice. Resisted my common impulse to make it a data.table for easy sorting via keys etc. After running your code the line
head(with(a, sort_by(a, ~ family + given)), 100)
shows that we need a bit more QA: person entries are not properly split between 'family' and 'given', some ORCID values use the URL form, and we have repeats. Excluding those is next.
Right. One should canonicalize the ORCID (having the URLs is from being nice) and then do unique() ...
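As a rough sketch of that canonicalize-then-unique step: the helper name canonicalize_orcid() is hypothetical, the sample rows are invented placeholders rather than real CRAN data, and it assumes the only non-canonical form is an http(s)://orcid.org/ URL prefix.

```r
# Hypothetical helper: reduce an ORCID given either as a bare iD or as an
# https://orcid.org/ URL to the bare iD. Other variants are not handled.
canonicalize_orcid <- function(oid) {
  oid <- trimws(oid)
  sub("^https?://orcid\\.org/", "", oid, ignore.case = TRUE)
}

# Invented placeholder rows with the same columns as the data frame 'a'
# built in the thread (given, family, oid).
a <- data.frame(
  given  = c("Given1", "Given1", "Given2"),
  family = c("Family1", "Family1", "Family2"),
  oid    = c("0000-0000-0000-0001",
             "https://orcid.org/0000-0000-0000-0001",
             "0000-0000-0000-0002")
)

a$oid <- canonicalize_orcid(a$oid)  # strip the URL prefix where present
a <- unique(a)                      # drop rows that are now exact repeats
```

After this, the two rows for the first placeholder person collapse into one, since they differ only in whether the ORCID was written as a URL.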
Best -k
Dirk
| Dirk
|
| |
| | Best
| | -k
| |
| | ********************************************************************
| | x <- tools::CRAN_package_db()
| | ## Parse each non-missing Authors@R field into a person object.
| | a <- lapply(x[["Authors@R"]],
| |             function(a) {
| |                 if(!is.na(a)) {
| |                     a <- tryCatch(utils:::.read_authors_at_R_field(a),
| |                                   error = identity)
| |                     if (inherits(a, "person"))
| |                         return(a)
| |                 }
| |                 NULL
| |             })
| | a <- do.call(c, a)
| | ## Keep given name, family name and ORCID for entries with an ORCID.
| | a <- lapply(a,
| |             function(e) {
| |                 if(is.null(o <- e$comment["ORCID"]) || is.na(o))
| |                     return(NULL)
| |                 cbind(given = paste(e$given, collapse = " "),
| |                       family = paste(e$family, collapse = " "),
| |                       oid = unname(o))
| |             })
| | a <- as.data.frame(do.call(rbind, a))
| | ********************************************************************
| |
| | > Salut Thierry,
| |
| | > On 20 August 2024 at 13:43, Thierry Onkelinx wrote:
| | > | Happy to help. I'm working on a new version of the checklist package. I could
| | > | export the function if that makes it easier for you.
| |
| | > Would be happy to help / iterate. Can you take a stab at making the
| | > per-column split more robust so that we can bulk-process all non-NA entries
| | > of the returned db?
| |
| | > Best, Dirk
| |
| | > --
| | > dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
|
| --
| dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
-- dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
______________________________________________ R-package-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
> E-mail is sent at my convenience; I don't expect replies outside of working hours.