
match first consecutive list of capitalized words in string

4 messages · Richter-Dumke, Jonas, Peter Alspach, Gabor Grothendieck

1 day later
#
Tena koe Jonas

Something like the following may help, although you should probably read the help on regexpr regarding locales.

Names <- c("filia Maria", "vidua Joh Dirck Kleve (oo 02.02.1732)", "Bernardus Engelb Franciscus Linde j.u.Doktor referendarius sereniss Judex et gograven Rheinensis")
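# Drop a leading word made only of digits or lower case letters (e.g. "filia", "vidua")
Names1 <- sub('^[0-9a-z]* ', '', Names)
Names1
# Position of the first space that is not followed by a capital letter (-1 if none)
ttReg <- regexpr(' [^A-Z]', Names1)
# Keep everything before that space; if there is no such space, keep the whole string
ifelse(ttReg > 0, substring(Names1, 1, ttReg - 1), Names1)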

Incidentally, it is not good practice to call your objects 'names' since that is a function in R.
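
On the locale point: ?regex notes that ranges such as [A-Z] are interpreted in a locale-dependent way and suggests the predefined character classes instead. A minimal sketch of the same two-step idea using [[:upper:]] and [[:lower:]] follows. The object names stripped, cut_at and first_caps are made up for illustration, and the as.integer() call is only there to drop the attributes that regexpr() attaches to its result.

```r
Names <- c("filia Maria",
           "vidua Joh Dirck Kleve (oo 02.02.1732)",
           "Bernardus Engelb Franciscus Linde j.u.Doktor referendarius sereniss Judex et gograven Rheinensis")

# Step 1: drop one leading word made only of digits and lower case letters.
stripped <- sub("^[0-9[:lower:]]* ", "", Names)

# Step 2: position of the first space followed by anything other than an
# upper case letter; regexpr() returns -1 where there is no such space.
cut_at <- as.integer(regexpr(" [^[:upper:]]", stripped))

# Keep the text before that space, or the whole string if nothing was found.
first_caps <- ifelse(cut_at > 0, substring(stripped, 1, cut_at - 1), stripped)
print(first_caps)
```

Like the version above, this only strips a single leading lower case word, so a string that starts with two of them would need the first step repeated.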

HTH ....

Peter Alspach
#
On Tue, Nov 8, 2011 at 7:48 AM, Richter-Dumke, Jonas
<Richter at demogr.mpg.de> wrote:
Try this. It matches a word boundary followed by zero or more of the
parenthesized expression.  That expression is an upper case letter
followed by zero or more lower case letters followed by one or more
spaces.  Finally we match the last word which consists of an upper
case letter followed by zero or more lower case letters and a word
boundary.  Note that it assumes R 2.14.0 or later:
[1] "Maria"                             "Joh Dirck Kleve"
[3] "Bernardus Engelb Franciscus Linde"
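
The code that goes with this description is not shown in this copy of the thread. What follows is only a sketch that fits the wording above, not necessarily what was originally posted: it assumes regmatches(), which was added in R 2.14.0, and perl = TRUE is my own choice for predictable handling of \\b. The names pattern and first_caps are made up for illustration.

```r
Names <- c("filia Maria",
           "vidua Joh Dirck Kleve (oo 02.02.1732)",
           "Bernardus Engelb Franciscus Linde j.u.Doktor referendarius sereniss Judex et gograven Rheinensis")

# Word boundary, then zero or more of (capital letter, lower case letters,
# one or more spaces), then a final capitalized word and a word boundary.
pattern <- "\\b([A-Z][a-z]* +)*[A-Z][a-z]*\\b"

# regexpr() locates the first match in each string;
# regmatches() extracts the matched text.
first_caps <- regmatches(Names, regexpr(pattern, Names, perl = TRUE))
print(first_caps)
```

One thing to watch: regmatches() drops elements that have no match at all, so the result can be shorter than the input if some strings contain no capitalized word.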
1 day later