merge a list of data frames
It is not clear to me what result you want, but I wonder why you have so many columns with the same names. Do you really want to merge, which puts all of the non-key columns side by side in one data frame? If so, why not start by renaming the columns so that they make sense in the combined data frame?
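A minimal sketch of the renaming idea, assuming a small made-up list in place of the real one (the toy values and the numeric suffix scheme are illustrative choices, not something from the original post). Each non-key column gets a suffix naming the data frame it came from, so merge() has no duplicate names to disambiguate:

```r
# Toy stand-in for the list in the question: identical column names, shared key V1.
data <- list(
  data.frame(V1 = c("a", "b", "c"), V2 = 0L, V3 = c(0.1, 0.2, 0.3),
             stringsAsFactors = FALSE),
  data.frame(V1 = c("a", "b", "d"), V2 = 0L, V3 = c(0.4, 0.5, 0.6),
             stringsAsFactors = FALSE),
  data.frame(V1 = c("b", "c", "d"), V2 = 1L, V3 = c(0.7, 0.8, 0.9),
             stringsAsFactors = FALSE)
)

# Append the list position to every non-key column name (V2 -> V2.1, V2.2, ...).
renamed <- lapply(seq_along(data), function(i) {
  df <- data[[i]]
  nonkey <- names(df) != "V1"
  names(df)[nonkey] <- paste(names(df)[nonkey], i, sep = ".")
  df
})

# Full outer join on the key, folding over the list one data frame at a time.
merged <- Reduce(function(f1, f2) merge(f1, f2, by = "V1", all = TRUE), renamed)
names(merged)
```

Because the names are already unique, merge() does not need to add its own .x/.y suffixes, and every column can be addressed by name rather than by position.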
If you really want the column names to stay the same, perhaps you want to stack the data frames "vertically" with rbind?
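A rough sketch of the stacking alternative, again on made-up toy data. The extra "source" column is an assumption added here so that rows can still be traced back to their data frame; it is not part of the original question:

```r
# Toy stand-in for the list in the question: identical column names in each element.
data <- list(
  data.frame(V1 = c("a", "b", "c"), V2 = 0L, V3 = c(0.1, 0.2, 0.3),
             stringsAsFactors = FALSE),
  data.frame(V1 = c("a", "b", "d"), V2 = 0L, V3 = c(0.4, 0.5, 0.6),
             stringsAsFactors = FALSE),
  data.frame(V1 = c("b", "c", "d"), V2 = 1L, V3 = c(0.7, 0.8, 0.9),
             stringsAsFactors = FALSE)
)

# Record which list element each row came from (hypothetical helper column).
tagged <- lapply(seq_along(data), function(i) {
  df <- data[[i]]
  df$source <- i
  df
})

# Stack all data frames vertically; the column names stay V1, V2, V3.
stacked <- do.call(rbind, tagged)
str(stacked)
```

The result has one row per original row and keeps the original column names, which may be easier to work with than a wide table of near-identical columns.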
---------------------------------------------------------------------------
Jeff Newmiller
DCN: <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
Sam Steingold <sds at gnu.org> wrote:
I have a list of data frames:
str(data)
List of 4
 $ :'data.frame': 700773 obs. of 3 variables:
  ..$ V1: chr [1:700773] "200130446465779" "200070050127778" "200030633708779" "200010587002779" ...
  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ V3: num [1:700773] 1 1 1 1 1 ...
 $ :'data.frame': 700773 obs. of 3 variables:
  ..$ V1: chr [1:700773] "200130446465779" "200070050127778" "200030633708779" "200010587002779" ...
  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ V3: num [1:700773] 1 1 1 1 1 ...
 $ :'data.frame': 700773 obs. of 3 variables:
  ..$ V1: chr [1:700773] "200130446465779" "200070050127778" "200030633708779" "200010587002779" ...
  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ V3: num [1:700773] 1 1 1 1 1 ...
 $ :'data.frame': 700773 obs. of 3 variables:
  ..$ V1: chr [1:700773] "200160325893778" "200130647544079" "200130446465779" "200120186959078" ...
  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ V3: num [1:700773] 1 1 1 1 1 1 1 1 1 1 ...
I want to merge them. I tried to follow
http://rwiki.sciviews.org/doku.php?id=tips%3adata-frames%3amerge
and did:
data.1 <- Reduce(function(f1,f2) merge(f1,f2,by=c("V1"),all=TRUE),
data)
Warning message:
In merge.data.frame(f1, f2, by = c("V1"), all = TRUE) :
column names 'V2.x', 'V3.x', 'V2.y', 'V3.y' are duplicated in the
result
str(data.1)
'data.frame': 700773 obs. of 9 variables:
$ V1 : chr "100010000099079" "100010000254078" "100010000499078"
"100010000541779" ...
$ V2.x: int 0 0 0 0 0 0 0 0 0 0 ...
$ V3.x: num 0.476 0.748 0.442 0.483 0.577 ...
$ V2.y: int 0 0 0 0 0 0 0 0 0 0 ...
$ V3.y: num 0.476 0.748 0.442 0.483 0.577 ...
$ V2.x: int 0 0 0 0 0 0 0 0 0 0 ...
$ V3.x: num 0.476 0.752 0.443 0.485 0.578 ...
$ V2.y: int 0 0 0 0 0 0 0 0 0 0 ...
$ V3.y: num 0.47 0.733 0.57 0.416 0.616 ...
I don't like the warning and I don't like that I now have to use [n] to
access identically named columns, but, I guess, this is better than
this:
library('reshape')
data.1 <- merge_all(data,by="V1",all=TRUE)
Error in merge.data.frame(dfs[[1]], Recall(dfs[-1]), all = TRUE, sort = FALSE, : formal argument "all" matched by multiple actual arguments
data.1 <- merge_all(data,by="V1",sort=TRUE,all=TRUE)
Error in merge.data.frame(dfs[[1]], Recall(dfs[-1]), all = TRUE, sort = FALSE, : formal argument "all" matched by multiple actual arguments
data.1 <- merge_all(data,by="V1",sort=TRUE)
Error in merge.data.frame(dfs[[1]], Recall(dfs[-1]), all = TRUE, sort = FALSE, : formal argument "sort" matched by multiple actual arguments
data.1 <- merge_all(data,by="V1")
Error in `[.data.frame`(df, , match(names(dfs[[1]]), names(df))) : undefined columns selected
data.1 <- merge_all(data,by=c("V1"))
Error in `[.data.frame`(df, , match(names(dfs[[1]]), names(df))) : undefined columns selected
What does 'formal argument "sort" matched by multiple actual arguments' mean?
Thanks.
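On that last question: judging only from the call printed in the error, merge_all seems to pass all = TRUE and sort = FALSE to merge itself while also forwarding the caller's extra arguments, so the same argument reaches merge twice. The same kind of error can be reproduced in miniature with a hypothetical wrapper (join below is made up for illustration and is not part of reshape):

```r
# Hypothetical wrapper: it hard-codes sep and forwards everything else via `...`.
join <- function(...) paste("a", "b", sep = "-", ...)

# Works: sep is supplied only once, by the wrapper.
ok <- join()

# Fails: paste() now receives sep twice, once from the wrapper and once from
# the caller, so R cannot decide which value the formal argument should take.
msg <- tryCatch(join(sep = "+"), error = function(e) conditionMessage(e))
print(msg)
```

If that reading is right, a wrapper that already fixes an argument internally will reject any attempt to pass the same argument again from outside.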