extracting characters from a string
6 messages · Biau David, arun, Bert Gunter +1 more
Hi,
You could try this:
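# Split each publication string at the commas; fill = TRUE pads the shorter
# rows with empty fields so every row has the same number of columns.
dat1 <- read.table(text = pub, sep = ",", fill = TRUE, stringsAsFactors = FALSE)
# For every column: drop the leading space and the trailing initials,
# then drop the space that was left in front of the initials.
dat2 <- as.data.frame(
  do.call(cbind, lapply(dat1, function(x) gsub(" $", "", gsub("^ |\\w+$", "", x)))),
  stringsAsFactors = FALSE)
dat2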
#       V1            V2       V3       V4
#1   Brown        Santos     Rome Don Juan
#2 Benigni
#3  Arstra Van den Hoops lamarque
A.K.
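Since David asked for an explanation, here is a small illustration (an editorial sketch, not part of arun's reply) of what the two nested gsub() calls do. The sample strings are assumed to look like the fields read.table(sep = ",") produces, i.e. with the space that followed each comma still in front of the name.

```r
# Fields as they look after read.table(sep = ","): every field except the
# first still starts with the space that followed the comma.
fields <- c("Brown DK", " Santos R", " Don Juan X", " Van den Hoops DD")

# Inner gsub(): "^ " matches a space at the very start of the string and
# "\\w+$" matches the run of word characters (the initials) at the very end;
# "|" means "either one", and gsub() deletes every match it finds.
step1 <- gsub("^ |\\w+$", "", fields)
print(step1)  # values: "Brown ", "Santos ", "Don Juan ", "Van den Hoops "

# Outer gsub(): " $" deletes the single space that was left in front of
# the initials, so only the last names remain.
step2 <- gsub(" $", "", step1)
print(step2)  # values: "Brown", "Santos", "Don Juan", "Van den Hoops"
```

Because "\\w+$" only removes the last word, multi-word last names such as "Don Juan" and "Van den Hoops" survive intact.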
----- Original Message -----
From: Biau David <djmbiau at yahoo.fr>
To: r help list <r-help at r-project.org>
Cc:
Sent: Wednesday, January 23, 2013 12:38 PM
Subject: [R] extracting characters from a string
Dear All,
I have a set of author strings, one per publication, such as 'pub' (a one-column character matrix):
pub1 <- c('Brown DK, Santos R, Rome DF, Don Juan X')
pub2 <- c('Benigni D')
pub3 <- c('Arstra SD, Van den Hoops DD, lamarque D')
pub <- rbind(pub1, pub2, pub3)
I would like to construct a data frame containing only the authors' last names, with the last names in columns and the publications in rows. Basically, I want to get rid of the initials (at most two, always before a comma) and the spaces surrounding each last name. I would like to avoid a loop.
PS: Even a short explanation of the code that extracts the values from the character string would be great!
David
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
1. Study a regular expression tutorial on the web to learn how to do this.
2. ?regex in R summarizes (tersely, but clearly!) R's regular expressions.
3. ?grep tells you about R's regular expression manipulation functions.
-- Bert
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
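To make Bert's pointers concrete, the sketch below runs a few of the functions documented in ?grep on the thread's data. It is an editorial illustration, not Bert's code. The pattern " [[:upper:]]{1,2}$" (a space followed by one or two capital letters at the end of an entry) is an assumption: it relies on David's statement that there are at most two initials, and additionally assumes they are always capitals.

```r
pub1 <- c('Brown DK, Santos R, Rome DF, Don Juan X')
pub2 <- c('Benigni D')
pub3 <- c('Arstra SD, Van den Hoops DD, lamarque D')
pub <- rbind(pub1, pub2, pub3)

# Assumed pattern: a space plus one or two capital letters at the end of an
# entry ([[:upper:]] and the {1,2} repetition are both described in ?regex).
initials <- " [[:upper:]]{1,2}$"

# strsplit(): one character vector of "Lastname Initials" entries per publication
authors <- strsplit(pub, ", ")

# grepl(): TRUE for every entry in which the pattern occurs
has_initials <- grepl(initials, authors[[3]])   # TRUE TRUE TRUE

# sub(): replace the first match in each entry by "", keeping the last name
last_names <- lapply(authors, function(a) sub(initials, "", a))
# last_names[[3]] is c("Arstra", "Van den Hoops", "lamarque")

# regexpr() + regmatches(): extract the matched text instead of deleting it
found <- regmatches(pub1, regexpr("[[:upper:]]{1,2}$", pub1))   # "X"
```

Note that strsplit() and sub() are vectorized, so the only iteration is the lapply() over publications, which avoids an explicit loop as David asked.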
Hello,
Try the following.
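fun <- function(x, sep = ", "){
    # split every publication into its "Lastname Initials" entries
    s <- unlist(strsplit(x, sep))
    # keep the first run of letters of each entry
    regmatches(s, regexpr("[[:alpha:]]*", s))
}
fun(pub)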
Hope this helps,
Rui Barradas
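For David's request for an explanation, the sketch below (editorial, not from Rui's message) shows what regexpr() hands to regmatches(), and why "[[:alpha:]]*" stops at the first space. The two sample strings are taken from the thread.

```r
s <- c("Brown DK", "Van den Hoops DD")

# regexpr() does not return text: it returns the start position of the first
# match in each element, with the match lengths stored as an attribute.
m <- regexpr("[[:alpha:]]*", s)
starts    <- as.integer(m)                        # 1 1
match_len <- as.integer(attr(m, "match.length"))  # 5 3

# regmatches() uses those positions to cut the matched text out of s.
# "[[:alpha:]]*" cannot cross a space, so only the first word survives.
first_words <- regmatches(s, m)                   # "Brown" "Van"
```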
Hello,
I've just noticed that my first solution would only return the first set
of alphabetic characters, such as "Van", not "Van den Hoops".
The following will solve that problem.
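fun2 <- function(x, sep = ", "){
    # one character vector of "Lastname Initials" entries per publication
    x <- strsplit(x, sep)
    # locate the last space plus trailing letters (the initials) in each entry
    m <- lapply(x, function(y) gregexpr(" [[:alpha:]]*$", y))
    # invert = TRUE keeps the text outside the matches, i.e. the last names
    res <- lapply(seq_along(x), function(i)
        regmatches(x[[i]], m[[i]], invert = TRUE))
    res <- lapply(res, unlist)
    # drop the empty strings left where the initials were cut off
    lapply(res, function(y) y[nchar(y) > 0])
}
fun2(pub)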
Hope this helps,
Rui Barradas
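David asked for a data frame, while fun2() returns a list with one vector of last names per publication. Below is a minimal sketch of that last step, assuming missing authors should be filled with NA. The helper list_to_df() is hypothetical (not from the thread), and the input list is typed in by hand to match what fun2(pub) should return for the sample data; fun2(pub) could be passed in directly instead. lengths() needs R >= 3.2.0.

```r
# Hypothetical helper: turn a list of character vectors of unequal length
# into a data frame with one row per publication, padding with NA.
list_to_df <- function(lst) {
  n <- max(lengths(lst))                      # widest publication
  padded <- lapply(lst, function(v) c(v, rep(NA_character_, n - length(v))))
  out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(out) <- paste0("V", seq_len(n))
  out
}

# Hand-typed stand-in for fun2(pub) on the thread's data
authors <- list(c("Brown", "Santos", "Rome", "Don Juan"),
                "Benigni",
                c("Arstra", "Van den Hoops", "lamarque"))
df <- list_to_df(authors)
df
```

Padding with NA rather than "" (as in arun's version) lets is.na() tell a missing author apart from an empty name.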