Creating a dataframe from a vector of character strings

4 messages · Cliff Clive, Tóth Dénes, Rolf Turner +1 more

Cliff Clive

Thu, Apr 14, 2011 2:04 PM

I have a vector of character strings that I would like to split in two, and
place in columns of a dataframe.

So for example, I start with this:

beatles <- c("John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr")

and I want to end up with a data frame that looks like this:

data.frame(firstName=c("John", "Paul", "George", "Ringo"),
           lastName=c("Lennon", "McCartney", "Harrison", "Starr"))

  firstName  lastName
1      John    Lennon
2      Paul McCartney
3    George  Harrison
4     Ringo     Starr


I tried string-splitting the first vector on the spaces between first and
last names, and it returned a list:

[[1]]
[1] "John"   "Lennon"

[[2]]
[1] "Paul"      "McCartney"

[[3]]
[1] "George"   "Harrison"

[[4]]
[1] "Ringo" "Starr"


Is there a fast way to convert this list into a data frame?  Right now all I
can think of is using a for loop, which I would like to avoid, since the
real application I am working on involves a much larger dataset.


Tóth Dénes

Thu, Apr 14, 2011 3:33 PM

You could use ?unlist:

structure(data.frame(
    matrix(unlist(strsplit(beatles, " ")),
           nrow=length(beatles), ncol=2, byrow=TRUE)),
    names=c("FirstName", "LastName"))

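Note that this compact code does not guard you against irregular input,
that is, names that split into more or fewer than 2 elements. Such names
shift every later value in the matrix out of its column.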
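
The caveat above can be made concrete by counting the pieces per name before building the matrix. This is only a minimal sketch in base R; the names `parts`, `n_parts` and `beatles_df` are illustrative and not from the original post.

```r
beatles <- c("John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr")

# Split each name on single spaces; the result is a list of character vectors.
parts <- strsplit(beatles, " ")

# Count how many pieces each name was split into.
n_parts <- sapply(parts, length)

# Stop early if any name does not split into exactly two pieces.
if (any(n_parts != 2)) {
    stop("names that do not split into exactly 2 parts: ",
         paste(beatles[n_parts != 2], collapse = ", "))
}

# Every name has two pieces, so the matrix can be filled row by row.
beatles_df <- structure(
    data.frame(matrix(unlist(parts), nrow = length(parts), ncol = 2,
                      byrow = TRUE)),
    names = c("FirstName", "LastName"))
```

With a three-word name in the input, the check stops with an error message naming the offending entries instead of silently misaligning the columns.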

Hope that helps,
Denes

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Rolf Turner

Thu, Apr 14, 2011 3:41 PM

On 15/04/11 09:04, Cliff Clive wrote:

Whenever you think of using a for loop, stop and think about using
some flavour of apply() instead:

melvin <- strsplit(beatles, " ")
clyde <- data.frame(firstName=sapply(melvin, function(x){x[1]}),
                    lastName=sapply(melvin, function(x){x[2]}))
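
A possible variant of the same idea uses vapply(), which works like sapply() but also checks that every result has the declared type and length. This is a sketch rather than part of the original reply; `first` and `last` are illustrative names, and stringsAsFactors=FALSE is an added assumption to keep the columns as character vectors in older R versions.

```r
beatles <- c("John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr")

melvin <- strsplit(beatles, " ")

# character(1) declares that each call must return exactly one string;
# vapply() raises an error otherwise.
first <- vapply(melvin, function(x) x[1], character(1))
last  <- vapply(melvin, function(x) x[2], character(1))

clyde <- data.frame(firstName = first, lastName = last,
                    stringsAsFactors = FALSE)
```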

     cheers,

             Rolf Turner

Brian Diggs

Thu, Apr 14, 2011 3:55 PM

On 4/14/2011 2:04 PM, Cliff Clive wrote:

Another approach, in addition to the ones you have already been given, 
is to use the colsplit function in the reshape package.  This is the 
sort of thing it is designed to do.

library("reshape")
colsplit(beatles, " ", names=c("firstName", "lastName"))

Similar caveats apply, though, in that it assumes only 2 names that are 
separated by one space (and will give a warning if that is not the case).
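
If names with more or fewer than two words do occur, one option is to split on the first space only. This is a rough sketch in base R, not something proposed in the thread; the `people` vector is made-up example data, and treating everything after the first space as the last name is an assumption that may not suit every dataset.

```r
# Hypothetical input with a three-word name and a one-word name.
people <- c("George Harrison", "Ludwig van Beethoven", "Cher")

# Everything before the first space (the whole string if there is no space).
first_name <- sub(" .*$", "", people)

# Everything after the first space, or NA when the name has no space at all.
last_name <- ifelse(grepl(" ", people),
                    sub("^[^ ]* ", "", people),
                    NA_character_)

people_df <- data.frame(firstName = first_name, lastName = last_name,
                        stringsAsFactors = FALSE)
```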

Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University