Dear expeRts,
What is the best approach to create a third data frame from two given ones, when
the last column of the new (third) data frame is computed from the last columns of the two given
data frames?
## Okay, sounds complicated, so here is an example. Assume we have the two data frames:
df1 <- data.frame(Year=rep(2001:2010, each=2), Group=c("Group 1","Group 2"), Value=1:20)
df2 <- data.frame(Year=rep(2001:2010, each=2), Group=c("Group 1","Group 2"), Value=21:40)
## To make this a bit more fun, let's say the order of elements is different...
(df1 <- df1[sample(1:nrow(df1)),])
(df2 <- df2[sample(1:nrow(df2)),])
## Now I would like to create a third data frame that has "Year" in column one,
## "Group" in column two, and each entry of column three should consist of the
## corresponding entry in df1 divided by the one in df2.
## To achieve this, one could do:
df3 <- df1[with(df1, order(Year,Group)),]
df3$Value <- df3$Value/df2[with(df2, order(Year,Group)),]$Value
colnames(df3)[3] <- "New Value" # typically, the column name changes
## or one could do:
df3 <- df1[with(df1, order(Year,Group)), -ncol(df1)]
df3 <- cbind(df3, "New Value"=df1[with(df1, order(Year,Group)),]$Value/df2[with(df2, order(Year,Group)),]$Value)
## Is there a more elegant solution? (maybe with ddply?)
## By the way:
df1[,"Value"] # works
df1[,-"Value"] # does not work
## Is there a way to exclude columns by name? That would make the code more readable.
## I know one could use...
subset(df1, select=c("Year","Group"))
## ... but it seems a bit tedious if you have lots of columns to first remove the
## column name that should be dropped and then put the remaining column names in "select"
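For reference, a minimal sketch of a few base R idioms that drop columns by name without listing the ones to keep. The variable name drop_cols is made up for illustration, and the data frame is rebuilt here so that the snippet runs on its own:

```r
# Same toy data frame as in the example above
df1 <- data.frame(Year=rep(2001:2010, each=2), Group=c("Group 1","Group 2"), Value=1:20)

# Names of the columns to exclude (hypothetical helper variable)
drop_cols <- c("Value")

# (1) keep every column whose name is not in drop_cols
a <- df1[, setdiff(names(df1), drop_cols), drop=FALSE]

# (2) the same via a logical index on the column names
b <- df1[, !(names(df1) %in% drop_cols), drop=FALSE]

# (3) subset() accepts a negated, unquoted column name in 'select'
d <- subset(df1, select=-Value)

names(a)  # "Year" "Group"
```

Variants (1) and (2) work with a character vector of names, which is convenient when the names are computed; variant (3) is shorter for interactive use.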
Cheers,
Marius
Best way/practice to create a new data frame from two given ones with last column computed from the two data frames?
3 messages · Marius Hofert, Daniel Malter
The "problem" with your first solution is that it relies on each
'year x group' combination being present in both data frames. To avoid this, I
would recommend using merge():
df3 <- merge(df1, df2, by=c("Year","Group"))
df3$ratio <- with(df3, Value.x/Value.y)
df3
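To illustrate the point about missing combinations, here is a small sketch. The gap in df2 (Year 2005, "Group 2") and the names df2_gap, inner and left are made up for the example; they are not part of the original data:

```r
df1 <- data.frame(Year=rep(2001:2010, each=2), Group=c("Group 1","Group 2"), Value=1:20)
df2 <- data.frame(Year=rep(2001:2010, each=2), Group=c("Group 1","Group 2"), Value=21:40)

# Hypothetical gap: remove one 'Year x Group' combination from df2
df2_gap <- df2[!(df2$Year == 2005 & df2$Group == "Group 2"), ]

# Default (inner join): the unmatched combination is dropped from the result
inner <- merge(df1, df2_gap, by=c("Year","Group"))
inner$ratio <- with(inner, Value.x / Value.y)
nrow(inner)   # 19

# all.x=TRUE keeps every row of df1; the missing partner shows up as NA
left <- merge(df1, df2_gap, by=c("Year","Group"), all.x=TRUE)
left$ratio <- with(left, Value.x / Value.y)
left[is.na(left$ratio), c("Year","Group")]
```

With the order-based approach the two Value vectors would have different lengths in this situation, so the division would either recycle or pair up the wrong rows.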
HTH,
Daniel
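Dear all,
okay, I found a one-liner based on mutate() (from the plyr package):
(df3 <- mutate(df1[with(df1, order(Year,Group)),], Value=Value / df2[with(df2, order(Year,Group)),"Value"]))
## Note: the rows of df1 have to be sorted as well; sorting only the Value vector
## would put the sorted ratios next to the unsorted Year/Group entries.
Cheers,
Marius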
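A quick way to sanity-check an order-based one-liner like this is to compare it against the merge() result, which matches rows on the keys themselves. The sketch below uses base R's transform() in place of mutate() so that it runs without extra packages; the seed and the names o1, o2, by_order and by_merge are arbitrary choices for the example:

```r
set.seed(1)  # arbitrary seed, only to make the shuffle reproducible
df1 <- data.frame(Year=rep(2001:2010, each=2), Group=c("Group 1","Group 2"), Value=1:20)
df2 <- data.frame(Year=rep(2001:2010, each=2), Group=c("Group 1","Group 2"), Value=21:40)
df1 <- df1[sample(nrow(df1)), ]
df2 <- df2[sample(nrow(df2)), ]

# Order-based approach: sort the rows of both data frames, then divide
o1 <- with(df1, order(Year, Group))
o2 <- with(df2, order(Year, Group))
by_order <- transform(df1[o1, ], Value = Value / df2[o2, "Value"])

# Reference result via merge(), matched on Year and Group
by_merge <- merge(df1, df2, by=c("Year","Group"))
by_merge$Value <- with(by_merge, Value.x / Value.y)

# Both approaches should agree on the keys and on the ratios
stopifnot(isTRUE(all.equal(by_order$Value, by_merge$Value)),
          all(by_order$Year == by_merge$Year),
          all(as.character(by_order$Group) == as.character(by_merge$Group)))
```

The check only passes when every 'Year x Group' combination occurs exactly once in both data frames, which is the assumption the order-based approach relies on.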
On 2011-08-18, at 20:41, Marius Hofert wrote:
> [...]