merging or joining 2 dataframes: merge, rbind.fill, etc.?
Hi:

The other day I ran 100K simulations, each of which returned a 20 x 4 data frame. I stored these in a list object. When attempting to rbind them into a single large data frame, my first thought was to try plyr:

library(plyr)
bigD <- ldply(L, rbind) # where L is the list object

I quit at around a half hour. Ditto for do.call(rbind, L). [Sorry, I didn't time it - these are approximate times.]

I then checked to see if the data.table package could do this, and lo and behold, I discovered the rbindlist() function. When applied to my list object, it ran correctly in under a second. Here's the actual example with some names changed to mask the application:

g <- gs[1:100000] # gs is a list of lists
length(g)
[1] 100000
class(g)
[1] "list"
dim(g[[1]])
[1] 20 4
dim(g[[100000]])
[1] 20 4
library(data.table)
system.time(bigD <- rbindlist(g))
   user  system elapsed
   0.45    0.02    0.47
dim(bigD)
[1] 2000000 4
class(bigD)
[1] "data.table" "data.frame"

Dennis
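
For anyone who wants to try the comparison on a smaller scale, here is a minimal sketch. The list `sims` and its column names are made up for illustration and are not the data from the post above; the rbindlist() branch runs only if data.table is installed.

```r
# Hypothetical stand-in for the simulation output: 1000 data frames, each 20 x 4.
set.seed(1)
sims <- lapply(1:1000, function(i) {
  data.frame(sim = i, step = 1:20, x = rnorm(20), y = runif(20))
})

# Base R: works, but gets slow as the list grows.
t_base <- system.time(big_base <- do.call(rbind, sims))

# data.table: rbindlist() takes the whole list in a single call.
if (requireNamespace("data.table", quietly = TRUE)) {
  t_dt <- system.time(big_dt <- data.table::rbindlist(sims))
  print(rbind(base = t_base[1:3], data.table = t_dt[1:3]))
}

dim(big_base)
```

Timings will vary by machine, so none are shown here.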
On Tue, Feb 26, 2013 at 7:05 PM, David Kulp <dkulp at fiksu.com> wrote:
On Feb 26, 2013, at 9:33 PM, Anika Masters <anika.masters at gmail.com> wrote:
Thanks Arun and David. Another issue I am running into is running out of memory when one of the data frames I'm trying to rbind to or merge with is "very large". (This is a recurring problem, as I am trying to merge/rbind thousands of small dataframes into a single "very large" dataframe.) I'm thinking of writing a function that creates an empty dataframe to which I can add data, but I will first need to ensure that each dataframe has exactly the same columns, in exactly the same positions. Before I write any new code, are there any pre-existing functions or code that might solve this problem of merging small or medium-sized dataframes with a "very large" dataframe?
Consider plyr. Memory issues can be a problem, but it's a piece of cake to write a one liner that iterates over a list of data frames and returns them all rbind'd together. Or just: do.call(rbind, list.of.data.frames). If memory is a serious problem then I think it's best to write your own code that appends each row by index - which avoids copying entire data frames in memory.
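
One way to read the "append by index" suggestion: if the total number of rows is known up front, allocate the result columns once and fill them in place, building the data frame only at the end. A minimal sketch, assuming every piece has the same columns (the names `pieces`, `id`, and `value` are hypothetical):

```r
# Hypothetical input: a list of small data frames with identical columns.
pieces <- lapply(1:50, function(i) {
  data.frame(id = rep(i, 4), value = i + (1:4) / 10)
})

# Allocate each result column once, at its final length.
n_total <- sum(vapply(pieces, nrow, integer(1)))
id    <- numeric(n_total)
value <- numeric(n_total)

# Fill by index instead of growing the result with repeated rbind() calls.
pos <- 0L
for (p in pieces) {
  rows <- pos + seq_len(nrow(p))
  id[rows]    <- p$id
  value[rows] <- p$value
  pos <- pos + nrow(p)
}

# Build the data frame a single time at the end.
big <- data.frame(id = id, value = value)
dim(big)
```

Whether this actually beats do.call(rbind, ...) or rbindlist() depends on the data, so it is worth timing on a sample first.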
On Tue, Feb 26, 2013 at 2:00 PM, David L Carlson <dcarlson at tamu.edu> wrote:
Clumsy but it doesn't require any packages:
merge2 <- function(x, y) {
  # Same set of column names (in any order): stack the rows.
  # setequal() avoids comparing two vectors of different lengths with ==.
  if (setequal(names(x), names(y))) {
    rbind(x, y)
  } else merge(x, y, all=TRUE)
}
merge2(df1, df2)
df3 <- df1
merge2(df1, df3)
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of arun
Sent: Tuesday, February 26, 2013 1:14 PM
To: Anika Masters
Cc: R help
Subject: Re: [R] merging or joining 2 dataframes: merge, rbind.fill, etc.?
Hi,
You could also try:
library(gtools)
smartbind(df2,df1)
#   a  b  d
#1  7 99 12
#2  7 99 12
When df1 != df2:
smartbind(df1,df2)
#   a  b  d  x  y  c
#1  7 99 12 NA NA NA
#2 NA 34 88 12 44 56
A.K.
----- Original Message -----
From: Anika Masters <anika.masters at gmail.com>
To: r-help at r-project.org
Cc:
Sent: Tuesday, February 26, 2013 1:55 PM
Subject: [R] merging or joining 2 dataframes: merge, rbind.fill, etc.?
#I want to "merge" or "join" 2 dataframes (df1 & df2) into a 3rd
(mydf). I want the 3rd dataframe to contain 1 row for each row in df1
& df2, and all the columns in both df1 & df2. The solution should
"work" even if the 2 dataframes are identical, and even if the 2
dataframes do not have the same column names. The rbind.fill function
seems to work. For learning purposes, are there other "good" ways to
solve this problem, using merge or other functions other than
rbind.fill?
#e.g. These 3 examples all seem to "work" correctly and as I hoped:
df1 <- data.frame(matrix(data=c(7, 99, 12) , nrow=1 , dimnames =
list( NULL , c('a' , 'b' , 'd') ) ) )
df2 <- data.frame(matrix(data=c(88, 34, 12, 44, 56) , nrow=1 ,
dimnames = list( NULL , c('d' , 'b' , 'x' , 'y', 'c') ) ) )
mydf <- merge(df2, df1, all.y=TRUE, all.x=TRUE)
mydf
#e.g. this works:
library(reshape)
mydf <- rbind.fill(df1, df2)
mydf
#This works:
library(reshape)
mydf <- rbind.fill(df1, df2)
mydf
#But this does not (the 2 dataframes are identical)
df1 <- data.frame(matrix(data=c(7, 99, 12) , nrow=1 , dimnames =
list( NULL , c('a' , 'b' , 'd') ) ) )
df2 <- df1
mydf <- merge(df2, df1, all.y=TRUE, all.x=TRUE)
mydf
#Any way to get "merge" to work for this final example? Any other good
solutions?
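
On the final example: merge() with no `by` argument joins on all the columns the two data frames share, so two identical one-row data frames match on every column and collapse into a single row. For stacking rows, a base R version of the rbind.fill idea is to pad each data frame with NA for the columns it lacks and then rbind. A minimal sketch (the helper name `rbind_fill_base` is made up here and is not part of any package):

```r
# Hypothetical helper: stack two data frames, filling absent columns with NA.
rbind_fill_base <- function(x, y) {
  all_cols <- union(names(x), names(y))
  pad <- function(d) {
    # Add each missing column as NA, then put columns in a common order.
    for (nm in setdiff(all_cols, names(d))) d[[nm]] <- rep(NA, nrow(d))
    d[all_cols]
  }
  rbind(pad(x), pad(y))
}

df1 <- data.frame(a = 7, b = 99, d = 12)
df2 <- data.frame(d = 88, b = 34, x = 12, y = 44, c = 56)

rbind_fill_base(df1, df2)          # 2 rows, columns a b d x y c
rbind_fill_base(df1, df1)          # 2 rows, even though the inputs are identical
nrow(merge(df1, df1, all = TRUE))  # 1: the identical rows are matched, not stacked
```

This only handles two data frames at a time; for a list, it could be applied with Reduce(), at the cost of repeated copying.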
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.