Message-ID: <51003292.1060607@sapo.pt>
Date: 2013-01-23T18:57:22Z
From: Rui Barradas
Subject: extracting characters from a string
In-Reply-To: <51002CF5.3070908@sapo.pt>

Hello,

I've just noticed that my first solution returns only the first run of
alphabetic characters in each name, so it gives "Van" rather than
"Van den Hoops". The following fixes that by removing the trailing
initials instead of extracting the leading letters.


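# Split each publication on sep, then drop the trailing initials from every author
fun2 <- function(x, sep = ", "){
	x <- strsplit(x, sep)
	# locate the final space plus the letters after it (the initials)
	m <- lapply(x, function(y) gregexpr(" [[:alpha:]]*$", y))
	# invert = TRUE keeps everything that was NOT matched, i.e. the last name
	res <- lapply(seq_along(x), function(i)
		regmatches(x[[i]], m[[i]], invert = TRUE))
	res <- lapply(res, unlist)
	# discard the empty strings left behind after each match
	lapply(res, function(y) y[nchar(y) > 0])
}
fun2(pub)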
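
The result above is a list with one character vector per publication, while
the question asked for a data frame with publications in rows and last names
in columns. One possible way to get there is sketched below. The helper name
last_names_df and the column names author1, author2, ... are made up for this
illustration, and sub() is swapped in for the regmatches() approach because
only one match per author has to be removed. Publications with fewer authors
are padded with NA.

```r
# Example input, copied from the original question
pub1 <- c('Brown DK, Santos R, Rome DF, Don Juan X')
pub2 <- c('Benigni D')
pub3 <- c('Arstra SD, Van den Hoops DD, lamarque D')
pub <- rbind(pub1, pub2, pub3)

# Hypothetical helper (not from the thread): last names as a padded data frame
last_names_df <- function(x, sep = ", ") {
  # split each publication into its authors
  authors <- strsplit(as.vector(x), sep)
  # " [[:alpha:]]*$" is the last space plus the letters that end the string,
  # i.e. the initials; sub() replaces that match with nothing
  last <- lapply(authors, function(a) sub(" [[:alpha:]]*$", "", a))
  # pad every publication with NA up to the longest author list
  n <- max(lengths(last))
  padded <- lapply(last, function(a) c(a, rep(NA_character_, n - length(a))))
  out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(out) <- paste0("author", seq_len(n))
  out
}

last_names_df(pub)
```

Like fun2, this assumes every author ends in a space followed by initials. A
multi-word name with no initials would lose its final word.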


Hope this helps,

Rui Barradas

On 23-01-2013 18:33, Rui Barradas wrote:
> Hello,
>
> Try the following.
>
> fun <- function(x, sep = ", "){
>      s <- unlist(strsplit(x, sep))
>      regmatches(s, regexpr("[[:alpha:]]*", s))
> }
>
> fun(pub)
>
>
> Hope this helps,
>
> Rui Barradas
>
> On 23-01-2013 17:38, Biau David wrote:
>> Dear All,
>>
>> I have a data frame of vectors of publication names such as 'pub':
>>
>> pub1 <- c('Brown DK, Santos R, Rome DF, Don Juan X')
>> pub2 <- c('Benigni D')
>> pub3 <- c('Arstra SD, Van den Hoops DD, lamarque D')
>>
>> pub <- rbind(pub1, pub2, pub3)
>>
>>
>> I would like to construct a data frame containing only the authors'
>> last names, with one last name per column and one publication per row.
>> Basically I want to get rid of the initials (max 2, always before a
>> comma) and the spaces surrounding each last name. I would like to avoid
>> a loop.
>>
>> PS: Even a short explanation of the code that extracts the values from
>> the character string would be great!
>>
>>
>> David
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>