
extracting characters from a string

Hi David,


It could be related to spaces in the data, or to something else.
Suppose, for example, that the data has some spaces at the beginning or the end:
pub1 <- c('Brown DK, Santos R, Rome DF, Don Juan X')
pub2 <- c('Benigni D')
pub3 <- c('Arstra SD, Van den Hoops DD, lamarque D ')

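pubnew <- rbind(pub1, pub2, pub3)
# dat1 is assumed to be the data frame of comma-split names from earlier in
# this thread (not shown here).
# Innermost to outermost: drop one trailing space, drop the trailing initials,
# then trim any remaining leading/trailing space.
res <- as.data.frame(do.call(cbind, lapply(dat1, function(x)
  gsub("^ | $", "", gsub("[A-Za-z]+$", "", gsub(" $", "", x))))),
  stringsAsFactors = FALSE)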
str(res)
#'data.frame':   3 obs. of  4 variables:
# $ V1: chr  "Brown" "Benigni" "Arstra"
# $ V2: chr  "Santos" "" "Van den Hoops"
# $ V3: chr  "Rome" "" "lamarque"
# $ V4: chr  "Don Juan" "" ""
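For what it's worth, the same cleanup could also be sketched with base R's trimws() (available since R 3.2.0), which strips whitespace at both ends before the trailing initials are dropped. This is only a minimal sketch, not the solution from the thread: drop_initials() is a hypothetical helper, and it assumes the initials are always the last word of each name. It splits the example strings itself, so it does not depend on dat1.

```r
# Hypothetical helper (not from the thread): trim stray whitespace first,
# then drop the final word (assumed to be the initials) of each name.
drop_initials <- function(x) {
  x <- trimws(x)                     # remove leading/trailing spaces
  x <- sub("\\s+[A-Za-z]+$", "", x)  # drop the last word, e.g. " DK"
  trimws(x)
}

pubs <- c('Brown DK, Santos R, Rome DF, Don Juan X',
          'Benigni D',
          'Arstra SD, Van den Hoops DD, lamarque D ')

# Split each citation on commas, then clean every name
authors <- lapply(strsplit(pubs, ","), drop_initials)
authors
```

Because the pattern requires whitespace before the final word, a name that has no initials at all (e.g. "Benigni") is left intact rather than reduced to "".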



# With the previous solution:
as.data.frame(do.call(cbind, lapply(dat1, function(x)
  gsub(" $", "", gsub("^ |\\w+$", "", x)))), stringsAsFactors = FALSE)
       V1            V2         V3       V4
1   Brown        Santos       Rome Don Juan
2 Benigni
3  Arstra Van den Hoops lamarque D  # initial present
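The initial survives here because "\\w+$" only matches a word that ends exactly at the end of the string. In "lamarque D " the trailing space sits between the initial and the end, so nothing is removed until the outer gsub() strips the space. A small illustration using two of the example names; allowing optional whitespace ("\\s*") before the anchor is one possible fix, not necessarily the one used in the thread:

```r
x <- c(" Van den Hoops DD", "lamarque D ")

# "\\w+$" fails on the second name because of its trailing space
grepl("\\w+$", x)

# Allowing optional trailing whitespace before the anchor removes the
# initials in both cases; trimws() then cleans up the leftover spaces
trimws(gsub("\\w+\\s*$", "", x))
```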

I tried this case with Rui's solution:
fun2(pubnew)
#[[1]]
#[1] " Brown"   "Santos"   "Rome"     "Don Juan"
#
#[[2]]
#[1] "Benigni"
#
#[[3]]
#[1] "Arstra"        "Van den Hoops" "lamarque D"    # initials present

Since Rui's solution works for you, the problem may lie elsewhere.
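To check whether stray spaces are really present in your own data, one quick diagnostic is to flag the entries that begin or end with whitespace and print them between brackets so the spaces become visible. find_padded() is a hypothetical helper, and pubs simply repeats the example data above:

```r
# Hypothetical diagnostic: return the positions of entries with leading or
# trailing whitespace, printing those entries wrapped in brackets.
find_padded <- function(x) {
  padded <- grepl("^\\s|\\s$", x)
  if (any(padded)) {
    print(sprintf("[%s]", x[padded]))  # brackets expose the stray spaces
  }
  which(padded)
}

pubs <- c('Brown DK, Santos R, Rome DF, Don Juan X',
          'Benigni D',
          'Arstra SD, Van den Hoops DD, lamarque D ')
find_padded(pubs)
```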
A.K.


Message-ID: <1359009454.42272.YahooMailNeo@web142606.mail.bf1.yahoo.com>
In-Reply-To: <1359006003.64736.YahooMailNeo@web172403.mail.ir2.yahoo.com>