Dear all, how can I perform a string operation like strsplit(x," ") on a column of a dataframe, and put the first or the second item of the split into a new dataframe column? (so that on each row it is consistent) Thanks Boris
dataframe: string operations on columns
9 messages · boris pezzatti, Hadley Wickham, Peter Ehlers +4 more
Hi,
I guess it's not the nicest way to do it, but it should work for you:
# create some sample data
df <- data.frame(a=c("A B", "C D", "A C", "A D", "B D"),
                 stringsAsFactors=FALSE)
# split the column by space
df_split <- strsplit(df$a, split=" ")
# place the first element into column a1 and the second into a2
# (the number of new columns is taken from the first row's split)
for (i in seq_along(df_split[[1]])) {
  df[i+1] <- unlist(lapply(df_split, FUN=function(x) x[i]))
  names(df)[i+1] <- paste("a", i, sep="")
}
I hope people will give you more compact solutions.
HTH,
Ivan
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Ivan CALANDRA PhD Student University of Hamburg Biozentrum Grindel und Zoologisches Museum Abt. Säugetiere Martin-Luther-King-Platz 3 D-20146 Hamburg, GERMANY +49(0)40 42838 6231 ivan.calandra at uni-hamburg.de ********** http://www.for771.uni-bonn.de http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
how can I perform a string operation like strsplit(x," ") on a column of a dataframe, and put the first or the second item of the split into a new dataframe column? (so that on each row it is consistent)
Have a look at str_split_fixed in the stringr package. Hadley
Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
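For reference, a minimal sketch of the str_split_fixed suggestion. It assumes the stringr package is installed and that every value contains a single space; the sample data is the one from the loop example earlier in the thread, and the column names a1 and a2 are only illustrative.

```r
library(stringr)

# sample data from the thread
df <- data.frame(a = c("A B", "C D", "A C", "A D", "B D"),
                 stringsAsFactors = FALSE)

# str_split_fixed() returns a character matrix with exactly n columns,
# one row per input string
parts <- str_split_fixed(df$a, " ", n = 2)

# copy the first and second piece into new columns
df$a1 <- parts[, 1]
df$a2 <- parts[, 2]
```

Because the result is always a matrix with n columns, every row ends up with the same number of pieces, which is the consistency asked for in the question.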
You can replace the loop with
df <- transform(df, a1 = sapply(df_split, "[[", 1),
                    a2 = sapply(df_split, "[[", 2))
Peter Ehlers
df <- cbind(df, do.call(rbind, df_split))
seems to do the same (up to column names) but faster. However, all the solutions rely on there being exactly two strings when you split. The solutions behave differently if this assumption is violated, and none of them really checks it. You can, for instance, check it with
all(sapply(df_split, length) == 2)
Best,
Niels R. Hansen
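A small sketch of how that check could be placed in front of the rbind approach, so that a bad row stops the script instead of passing silently. With rows of unequal length, rbind() recycles the shorter ones (warning only when the lengths are not multiples of each other). The names n_pieces and parts, and the column names a1 and a2, are assumptions made for this illustration.

```r
# sample data from the thread
df <- data.frame(a = c("A B", "C D", "A C", "A D", "B D"),
                 stringsAsFactors = FALSE)
df_split <- strsplit(df$a, split = " ")

# stop early if any row does not split into exactly two pieces,
# reporting the offending row numbers
n_pieces <- sapply(df_split, length)
if (!all(n_pieces == 2)) {
  stop("rows not splitting into two pieces: ",
       paste(which(n_pieces != 2), collapse = ", "))
}

# safe to bind now: every list element has length 2
parts <- do.call(rbind, df_split)
colnames(parts) <- c("a1", "a2")
df <- cbind(df, parts)
```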
Assuming every row is split into exactly two values by whatever string
you choose as split, one fancy exercise in R data structures is
dfsplit = function(df, split)
    as.data.frame(
        t(
            structure(dim=c(2, nrow(df)),
                unlist(
                    strsplit(split=split,
                        as.matrix(df))))))
so that if your data frame is
df = data.frame(c('1 2', '3 4', '5 6'))
then
dfsplit(df, ' ')
#   V1 V2
# 1  1  2
# 2  3  4
# 3  5  6
renaming the columns left as an exercise.
vQ
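One possible way to do the renaming, sketched with setNames() from base R. The dfsplit definition is repeated from the message above so the snippet runs on its own; the names a1 and a2 are only illustrative.

```r
# dfsplit as posted in the thread
dfsplit = function(df, split)
    as.data.frame(
        t(
            structure(dim=c(2, nrow(df)),
                unlist(
                    strsplit(split=split,
                        as.matrix(df))))))

df = data.frame(c('1 2', '3 4', '5 6'))

# setNames() returns the data frame with its column names replaced
result = setNames(dfsplit(df, ' '), c('a1', 'a2'))
```

The split values are stored as text (character or factor, depending on the R version), so wrap them in as.numeric(as.character(...)) if numbers are needed.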
Well, my solution with the loop might be slower (even though I don't see
any difference on my system, at least with up to 100 rows and 3
strings to separate), but it works whatever the number of strings.
But I should have renamed the columns outside of the loop:
names(df)[2:3] <- paste("a", 1:2, sep="")  ## or a more general solution for the indexes
Ivan
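A sketch of what a more general version might look like, with the column names built from the indexes in one step. The helper split_into_columns and its arguments are hypothetical, not something posted in the thread. It takes the number of columns from the longest split rather than from the first row, and pads shorter rows with NA.

```r
# Hypothetical helper: split one column and append the pieces as new columns
split_into_columns <- function(df, column, split = " ", prefix = "a") {
  pieces <- strsplit(as.character(df[[column]]), split = split)
  n_max <- max(sapply(pieces, length))
  # indexing past the end of a character vector yields NA,
  # which pads the rows that have fewer pieces
  padded <- lapply(pieces, function(x) x[seq_len(n_max)])
  parts <- do.call(rbind, padded)
  # name all new columns at once, outside of any loop
  colnames(parts) <- paste(prefix, seq_len(n_max), sep = "")
  data.frame(df, parts, stringsAsFactors = FALSE)
}

# rows with two, three and one piece
df <- data.frame(a = c("A B", "C D E", "F"), stringsAsFactors = FALSE)
result <- split_into_columns(df, "a")
```

Padding with NA is a design choice made here; stopping with an error on uneven rows would be an equally reasonable alternative.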