How to capitalize leading letters & else of personal names?

4 messages · RockO, Ben Bolker, David Winsemius

#
Dear R world,

Do you know of a function that would correctly capitalize first and
family names?
In the cwhmisc package I found only the CapLeading function, but it does not
do the job: it only capitalizes the first letter of a word.

I am looking for a function that would recognize " |'|Mc|-" and capitalize
the first letter following these characters.

An example:
names<-c("jean-francois st-john","helene o'donnel", "joe mcintyre")

Desired result:
"Jean-Francois St-John", "Helene O'Donnel", "Joe McIntyre"

Thanks,

Rock
#
RockO <rock.ouimet <at> gmail.com> writes:
This is pretty tricky.  gsub() can do some pretty slick things,
including replace with capitalized versions, so you could probably
write a gsub string to capitalize letters appearing at the beginning
of words OR after non-alphabetic characters.  (See the end of the
examples in ?gsub ...) 

"McIntyre" represents
a whole other class of difficulty.  Some Scots capitalize after "Mc",
others don't.  And what about all the rules about capitalization (or not)
after de/du/van/von?  What would you do with a Dutch name like "'t Hooft" ... ?
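
A minimal sketch of the single-pattern idea described above, assuming a word boundary (\\b) is an acceptable stand-in for "beginning of a word OR after a non-alphabetic character". The vector name nms and the added "'t hooft" entry are illustrative, not from the original post (nms is used so that base::names is not masked).

```r
# Sample input from the question, plus a Dutch name as a counterexample
nms <- c("jean-francois st-john", "helene o'donnel", "joe mcintyre", "'t hooft")

# \\b matches a word boundary, so a lower-case letter that starts a word,
# or that follows a non-word character such as "-" or "'", is upper-cased.
# \\U upper-cases the back-reference that follows it (requires perl = TRUE).
res <- gsub("\\b([a-z])", "\\U\\1", nms, perl = TRUE)
print(res)
```

This should give "Jean-Francois St-John", "Helene O'Donnel", "Joe Mcintyre" and "'T Hooft": the hyphen and apostrophe cases come out as wanted, but "Mc" is left alone and the Dutch particle is wrongly capitalized, which is exactly the class of difficulty mentioned above.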
#
On Dec 14, 2010, at 9:00 PM, RockO wrote:

            
Here are four individually crafted gsub() calls that could be
applied one after another:

 > gsub("^([a-z])", "\\U\\1", names, perl=TRUE)
[1] "Jean-francois st-john" "Helene o'donnel"       "Joe mcintyre"

 > gsub(" ([a-z])", " \\U\\1", names, perl=TRUE)
[1] "jean-francois St-john" "helene O'donnel"       "joe Mcintyre"

 > gsub("\\-([a-z])", "-\\U\\1", names, perl=TRUE)
[1] "jean-Francois st-John" "helene o'donnel"       "joe mcintyre"

 > gsub("\\'([a-z])", "'\\U\\1", names, perl=TRUE)
[1] "jean-francois st-john" "helene o'Donnel"       "joe mcintyre"


 > t2 <- gsub("^([a-z])", "\\U\\1", names, perl=TRUE)
 > t2 <- gsub(" ([a-z])", " \\U\\1", t2, perl=TRUE)
 > t2 <- gsub("\\-([a-z])", "-\\U\\1", t2, perl=TRUE)
 > t2 <-  gsub("\\'([a-z])", "'\\U\\1", t2, perl=TRUE)
 > t2
[1] "Jean-Francois St-John" "Helene O'Donnel"       "Joe Mcintyre"

Oops, forgot the "Mc":
 > gsub("Mc([a-z])", "Mc\\U\\1", t2, perl=TRUE)
[1] "Jean-Francois St-John" "Helene O'Donnel"       "Joe McIntyre"
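
The serial steps above could be folded into one small helper. This is only a sketch: capitalize_name is a hypothetical name, not a function from cwhmisc or any other package, and it merges the first four patterns into a single alternation rather than reproducing the calls one by one.

```r
# Hypothetical helper (not part of any package)
capitalize_name <- function(x) {
  # Upper-case a lower-case letter at the start of the string,
  # or right after a space, an apostrophe, or a hyphen
  x <- gsub("(^|[ '-])([a-z])", "\\1\\U\\2", x, perl = TRUE)
  # Upper-case the letter that follows a word-initial "Mc"
  gsub("\\bMc([a-z])", "Mc\\U\\1", x, perl = TRUE)
}

nms <- c("jean-francois st-john", "helene o'donnel", "joe mcintyre")
out <- capitalize_name(nms)
print(out)
```

On the three sample names this should reproduce the final result shown above. It applies the "Mc" rule to every name, so families who write "Mcintyre" would still need to be handled by hand, as would particles such as de/du/van/von.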
#
David,

Thank you very much! Indeed, capitalizing names is very tricky, particularly
for people whose mother tongue is not English (like me). Hopefully, using
your script will be much better than simply having the names in uppercase.

Happy Holidays!

Rock