
extracting characters from a string

6 messages · Biau David, arun, Bert Gunter +1 more

#
Hi,
You could try this:
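# read.table() splits each string at the commas; fill = TRUE pads short rows with ""
dat1 <- read.table(text = pub, sep = ",", fill = TRUE, stringsAsFactors = FALSE)
# per column: drop a leading space and the trailing initials, then the leftover trailing space
dat2 <- as.data.frame(do.call(cbind, lapply(dat1, function(x) gsub(" $", "", gsub("^ |\\w+$", "", x)))), stringsAsFactors = FALSE)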


dat2
#       V1            V2       V3       V4
#1   Brown        Santos     Rome Don Juan
#2 Benigni
#3  Arstra Van den Hoops lamarque
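Since the question also asked for an explanation, here is a hedged walk-through of the two gsub() calls above, applied to a few sample fields. The vector `fields` is hypothetical illustration data that mimics what read.table(sep = ",") produces: every field after the first keeps the space that followed the comma, and missing authors become "".

```r
# Hypothetical sample fields, shaped like read.table(sep = ",", fill = TRUE) output
fields <- c("Brown DK", " Santos R", " Don Juan X", "")

# Step 1: "^ |\\w+$" matches either a leading space ("^ ") or the
# trailing run of word characters ("\\w+$"), i.e. the initials.
# Both are replaced by "", giving "Brown ", "Santos ", "Don Juan ", "".
step1 <- gsub("^ |\\w+$", "", fields)

# Step 2: " $" removes the single space left where the initials were,
# giving "Brown", "Santos", "Don Juan", "".
step2 <- gsub(" $", "", step1)
print(step2)
```

Because "\\w+$" removes whatever word comes last, a field without initials (say, a hypothetical "Benigni") would be emptied completely; anchoring on a preceding space, as the " [[:alpha:]]*$" pattern in Rui's second reply does, avoids that.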
A.K.

----- Original Message -----
From: Biau David <djmbiau at yahoo.fr>
To: r help list <r-help at r-project.org>
Cc: 
Sent: Wednesday, January 23, 2013 12:38 PM
Subject: [R] extracting characters from a string

Dear All,

I have a character matrix of publication author lists, such as 'pub':

pub1 <- c('Brown DK, Santos R, Rome DF, Don Juan X')
pub2 <- c('Benigni D')
pub3 <- c('Arstra SD, Van den Hoops DD, lamarque D')

pub <- rbind(pub1, pub2, pub3)


I would like to construct a data frame containing only the authors' last names, with the publications in rows and one last name per column. Basically, I want to get rid of the initials (at most 2, always before a comma) and the spaces surrounding each last name. I would like to avoid a loop.

PS: A short explanation of the code that extracts the values from the character string would also be great!

David



______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
1. Study a regular expression tutorial on the web to learn how to do this.

2. ?regex in R summarizes R's regular expressions (tersely, but clearly).

3. ?grep tells you about R's regular expression manipulation functions.
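As a hedged illustration of the functions that ?grep documents, here they are applied to author strings from the question. The pattern " [[:upper:]]{1,2}$" (a space followed by one or two capital letters at the end of the string) is an assumption based on the "max 2 initials" rule in the original post, not something Bert proposed.

```r
x <- "Van den Hoops DD"
pattern <- " [[:upper:]]{1,2}$"  # assumed: space + 1 or 2 capitals at the end

# grepl(): does the string end in initials at all?
has_initials <- grepl(pattern, x)

# regexpr() + regmatches(): extract the matched text (space plus initials)
initials <- regmatches(x, regexpr(pattern, x))

# sub(): replace the first match with "", leaving the last name
last_name <- sub(pattern, "", x)

# strsplit(): break a whole author list into one element per author,
# then sub() is vectorised over all of them at once
author_list <- strsplit("Brown DK, Santos R, Rome DF", ", ")[[1]]
cleaned <- sub(pattern, "", author_list)

print(has_initials)
print(initials)
print(last_name)
print(cleaned)
```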

-- Bert
On Wed, Jan 23, 2013 at 9:38 AM, Biau David <djmbiau at yahoo.fr> wrote:

#
Hello,

Try the following.

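fun <- function(x, sep = ", "){
	# split each string into one element per author, then flatten
	s <- unlist(strsplit(x, sep))
	# keep the first run of letters in each element (the start of the last name)
	regmatches(s, regexpr("[[:alpha:]]*", s))
}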

fun(pub)
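A small sketch of why this first version needed a follow-up: regexpr() returns only the first match per element, so "[[:alpha:]]*" stops at the first space inside a multi-word surname. Note also that unlist() merges all publications into one vector, so the rows the question asked for are lost. The vector `s` is a hypothetical sample taken from 'pub'; the sub() line is shown only for comparison.

```r
s <- c("Brown DK", "Don Juan X", "Van den Hoops DD")

# first run of letters only: multi-word surnames are cut short
first_run <- regmatches(s, regexpr("[[:alpha:]]*", s))

# removing a trailing space-plus-letters keeps the whole surname instead
whole <- sub(" [[:alpha:]]*$", "", s)

print(first_run)
print(whole)
```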


Hope this helps,

Rui Barradas

On 23-01-2013 17:38, Biau David wrote:
#
Hello,

I've just noticed that my first solution only returns the first run
of alphabetic characters in each name, such as "Van" instead of "Van den Hoops".
The following solves that problem.


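fun2 <- function(x, sep = ", "){
	# one character vector of "Lastname Initials" entries per publication
	x <- strsplit(x, sep)
	# locate the trailing " Initials" in every entry
	m <- lapply(x, function(y) gregexpr(" [[:alpha:]]*$", y))
	# invert = TRUE returns the text around each match: the last name and ""
	res <- lapply(seq_along(x), function(i)
		regmatches(x[[i]], m[[i]], invert = TRUE))
	res <- lapply(res, unlist)
	# drop the empty strings left behind by the removed initials
	lapply(res, function(y) y[nchar(y) > 0])
}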
fun2(pub)
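fun2() returns a list with one vector of last names per publication. The question asked for a data frame, so here is a hedged sketch of one way to get there: pad each vector with NA to the longest author count and rbind the results. It swaps fun2's regmatches(invert = TRUE) step for a plain sub() with the same trailing " [[:alpha:]]*$" pattern; that substitution, the names `last_names`, `padded` and `authors`, and the V1, V2, ... column names are illustrative choices, not part of the original replies. 'pub' is rebuilt so the block runs on its own.

```r
pub <- rbind(pub1 = "Brown DK, Santos R, Rome DF, Don Juan X",
             pub2 = "Benigni D",
             pub3 = "Arstra SD, Van den Hoops DD, lamarque D")

# one vector of last names per publication (drop the trailing " Initials")
last_names <- lapply(strsplit(pub, ", "), function(y) sub(" [[:alpha:]]*$", "", y))

# pad every publication to the same number of authors with NA
n <- max(lengths(last_names))
padded <- lapply(last_names, function(y) { length(y) <- n; y })

# publications in rows, authors in columns (V1, V2, ...)
authors <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
rownames(authors) <- rownames(pub)
print(authors)
```

No explicit for loop is needed; lapply() and the vectorised sub() do the per-publication work.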


Hope this helps,

Rui Barradas

On 23-01-2013 18:33, Rui Barradas wrote: