
merge a list of data frames

8 messages · David Winsemius, Sam Steingold, Jeff Newmiller +1 more

#
I have a list of data frames:
List of 4
 $ :'data.frame':	700773 obs. of  3 variables:
  ..$ V1: chr [1:700773] "200130446465779" "200070050127778" "200030633708779" "200010587002779" ...
  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ V3: num [1:700773] 1 1 1 1 1 ...
 $ :'data.frame':	700773 obs. of  3 variables:
  ..$ V1: chr [1:700773] "200130446465779" "200070050127778" "200030633708779" "200010587002779" ...
  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ V3: num [1:700773] 1 1 1 1 1 ...
 $ :'data.frame':	700773 obs. of  3 variables:
  ..$ V1: chr [1:700773] "200130446465779" "200070050127778" "200030633708779" "200010587002779" ...
  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ V3: num [1:700773] 1 1 1 1 1 ...
 $ :'data.frame':	700773 obs. of  3 variables:
  ..$ V1: chr [1:700773] "200160325893778" "200130647544079" "200130446465779" "200120186959078" ...
  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ V3: num [1:700773] 1 1 1 1 1 1 1 1 1 1 ...

I want to merge them.
I tried to follow
http://rwiki.sciviews.org/doku.php?id=tips%3adata-frames%3amerge
and did:
Warning message:
In merge.data.frame(f1, f2, by = c("V1"), all = TRUE) :
  column names 'V2.x', 'V3.x', 'V2.y', 'V3.y' are duplicated in the result
'data.frame':	700773 obs. of  9 variables:
 $ V1  : chr  "100010000099079" "100010000254078" "100010000499078" "100010000541779" ...
 $ V2.x: int  0 0 0 0 0 0 0 0 0 0 ...
 $ V3.x: num  0.476 0.748 0.442 0.483 0.577 ...
 $ V2.y: int  0 0 0 0 0 0 0 0 0 0 ...
 $ V3.y: num  0.476 0.748 0.442 0.483 0.577 ...
 $ V2.x: int  0 0 0 0 0 0 0 0 0 0 ...
 $ V3.x: num  0.476 0.752 0.443 0.485 0.578 ...
 $ V2.y: int  0 0 0 0 0 0 0 0 0 0 ...
 $ V3.y: num  0.47 0.733 0.57 0.416 0.616 ...

I don't like the warning, and I don't like that I now have to use [n] to
access identically named columns, but I guess this is better than
this:

library('reshape')
Error in merge.data.frame(dfs[[1]], Recall(dfs[-1]), all = TRUE, sort = FALSE,  : 
  formal argument "all" matched by multiple actual arguments
Error in merge.data.frame(dfs[[1]], Recall(dfs[-1]), all = TRUE, sort = FALSE,  : 
  formal argument "all" matched by multiple actual arguments
Error in merge.data.frame(dfs[[1]], Recall(dfs[-1]), all = TRUE, sort = FALSE,  : 
  formal argument "sort" matched by multiple actual arguments
Error in `[.data.frame`(df, , match(names(dfs[[1]]), names(df))) : 
  undefined columns selected
Error in `[.data.frame`(df, , match(names(dfs[[1]]), names(df))) : 
  undefined columns selected

what does 'formal argument "sort" matched by multiple actual arguments' mean?
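It appears to mean that merge() received the same formal argument twice: judging from the error text, merge_recurse() in the old reshape package hard-codes all = TRUE and sort = FALSE while also forwarding its '...', so supplying either of them yourself duplicates it. A minimal sketch reproducing the message (not the reshape code itself; merge2 is an illustrative name):

```r
## A wrapper that hard-codes all = TRUE and also forwards '...' sends
## the formal argument 'all' to merge() twice when the caller supplies
## it as well, triggering the matching error at call time.
merge2 <- function(x, y, ...) merge(x, y, all = TRUE, ...)
d <- data.frame(a = 1:2)
msg <- tryCatch(merge2(d, d, all = TRUE), error = conditionMessage)
## msg: 'formal argument "all" matched by multiple actual arguments'
```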

thanks.
#
On Sep 5, 2012, at 8:51 PM, Sam Steingold wrote:

Why? What are you expecting?
Perhaps it would make more sense to explain what your goal was, rather than showing us two divergent efforts, neither of which is satisfactory? Perhaps?
#
I don't really know what you want, but if you have many columns with the same names I am wondering why this is so.  Do you really want to merge, which puts all of the non-key columns side-by-side in one data frame? If so, why don't you start by renaming the columns so they will make sense in the combined data frame?

If you really want the column names to stay the same, perhaps you want to stack the data frames "vertically" with rbind?
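A minimal sketch of the rename-then-merge route, with toy data standing in for the real score frames (all names illustrative):

```r
## Toy stand-ins with the same shape as the real frames: id, actual, score.
dfs <- list(
  data.frame(V1 = c("a", "b", "c"), V2 = c(0L, 1L, 0L), V3 = c(0.1, 0.2, 0.3)),
  data.frame(V1 = c("a", "b", "c"), V2 = c(0L, 1L, 0L), V3 = c(0.4, 0.5, 0.6))
)
## Make the score column name unique per frame before merging.
for (i in seq_along(dfs)) names(dfs[[i]])[3] <- paste0("score", i)
## Fold merge() over the list; by = c("V1", "V2") keeps one copy of the keys.
combined <- Reduce(function(x, y) merge(x, y, by = c("V1", "V2"), all = TRUE), dfs)
names(combined)  # "V1" "V2" "score1" "score2"
```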
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
Sam Steingold <sds at gnu.org> wrote:

#
these are the results of applying a model to the test data.
the first column is the ID
the second column is the actual value
the third column is the model score

After I merge the frames, I will:
1. check that all the V2 columns are identical and drop all but one
(I guess I could just merge on c("V1","V2") instead, right?)

2. compute the sum (or the mean, whatever is easier) of all the V3
columns

3. sort by the sum/mean of the V3 columns and evaluate the combined
model using the lift quality metric
(http://dl.acm.org/citation.cfm?id=380995.381018)

I have many more score files (not just 4), so it is not practical for me
to rename the column to something unique.
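Assuming the frames have been merged on c("V1", "V2") into one wide frame with uniquely named score columns, steps 2 and 3 reduce to something like this sketch (names illustrative):

```r
## 'combined' stands in for the merged result: keys plus one score per model.
combined <- data.frame(V1 = c("a", "b"), V2 = c(0L, 1L),
                       score1 = c(0.2, 0.8), score2 = c(0.4, 0.6))
score.cols <- grep("^score", names(combined))
## Step 2: mean of the per-model scores (use rowSums for the sum instead).
combined$mean.score <- rowMeans(combined[score.cols], na.rm = TRUE)
## Step 3: sort best-scoring rows first before evaluating lift.
combined <- combined[order(combined$mean.score, decreasing = TRUE), ]
```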
#
"not practical [...] to rename the column to something unique."

On the contrary, since R is a scripting language it is quite practical. Depending on the format of your data, it is probably necessary as well.

If all of the files have exactly the same number of rows corresponding to the same key values, you may be able to load each file and cbind the V3 columns into a matrix. That is the only scenario I see where you don't need to rename the columns.

If you do need the key-alignment capability that merge offers, then it may be most effective to: load the data into a list of data frames; add a column to each data frame containing the desired column name (perhaps derived from the file name the data were loaded from, or just a letter with sequential numbers at the end); stack the data frames into a single frame (rbind, or ldply from the plyr package); and then use the reshape2 package's dcast function to form the wide, combined data frame, with the necessary unique column names, against your key c(V1, V2).
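The stack-then-cast route might look like this sketch (assumes the reshape2 package is installed; toy data, model names illustrative):

```r
library(reshape2)

## Toy stand-ins for the score frames.
dfs <- list(
  data.frame(V1 = c("a", "b"), V2 = c(0L, 1L), V3 = c(0.1, 0.2)),
  data.frame(V1 = c("a", "b"), V2 = c(0L, 1L), V3 = c(0.3, 0.4))
)
## Tag each frame with a model id (could be derived from the file name).
for (i in seq_along(dfs)) dfs[[i]]$model <- paste0("m", i)
## Stack into long format, then cast wide against the key c(V1, V2).
long <- do.call(rbind, dfs)
wide <- dcast(long, V1 + V2 ~ model, value.var = "V3")
## wide has columns V1, V2, m1, m2
```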

Depending on the algorithm you plan to use, you may not need or want to do the dcast step at all. The plyr package or the sqldf package or the ave base function can let you combine computations on groups of rows instead of on columns.
Sam Steingold <sds at gnu.org> wrote:

#
Hi
colSums(apply(do.call(cbind, lapply(data, "[", "V2")), 1, diff) != 0)

should give all zeros if there is no difference.

sapply(lapply(data, "[", "V3"), sum)
sapply(lapply(data, "[", "V3"), mean)

should give you a table of sums or means; sorting them is straightforward.

The most tedious part of my response was preparing toy data, so please be kind enough to provide yours in an appropriate way:

dput(head(data))

Regards
Petr
#
On Sep 6, 2012, at 6:42 AM, Sam Steingold wrote:

In which case you should be using the 'by' argument to `merge` and specifying that it is only the first column that is to be used. In the current situation merge will attempt to use all of the columns, because the three columns have the same names.

Notice the difference in these results:
[1] a b
<0 rows> (or 0-length row.names)
a b.x b.y
1 1   5  10
2 2   6  11
3 3   7  12

(`merge` "by" arguments can be column numbers.)
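A pair of calls that reproduces those two results (toy data):

```r
## Two frames sharing the key 'a' but holding different values in 'b'.
df1 <- data.frame(a = 1:3, b = 5:7)
df2 <- data.frame(a = 1:3, b = 10:12)
## Default merge uses ALL shared column names, c("a", "b"): no row agrees
## in both columns, so the result is empty.
merge(df1, df2)
## Merging by the key alone keeps both b columns, suffixed .x and .y.
merge(df1, df2, by = "a")
```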
Depends what you want. I already suggested you only want to merge on the id column.
`aggregate` should do that without difficulty.
That's going to require more background (or more money, since they want $15.00 for a pdf).
Which column?
David Winsemius, MD
Alameda, CA, USA
#
I already do! see my initial message!
:-)
That I have already implemented; it works just fine:

proficiency <- function (actual, prediction) {
  proficiency1(ea = entropy(table(actual)),
               ep = entropy(table(prediction)),
               ej = entropy(table(actual,prediction)))
}

proficiency1 <- function (ea, ep, ej) {
  mi <- ea + ep - ej
  list(joint = ej, actual = ea, prediction = ep, mutual = mi,
       proficiency = mi / ea, dependency = mi / ej)
}

detector.statistics <- function (tp,fn,fp,tn) {
  observationCount <- tp + fn + fp + tn
  predictedPositive <- tp + fp
  predictedNegative <- fn + tn
  actualPositive <- tp + fn
  actualNegative <- fp + tn
  correct <- tp + tn
  list(baseRate = actualPositive / observationCount,
       precision = if (tp == 0) 0 else tp / predictedPositive,
       specificity = if (tn == 0) 0 else tn / actualNegative,
       recall = if (tp == 0) 0 else tp / actualPositive,
       accuracy = correct / observationCount,
       lift = (tp * observationCount) / (predictedPositive * actualPositive),
       f1score = if (tp == 0) 0 else 2 * tp / (2 * tp + fp + fn),
       proficiency = proficiency1(ej = entropy(c(tp,fn,fp,tn)),
         ea = entropy(c(actualPositive,actualNegative)),
         ep = entropy(c(predictedPositive,predictedNegative))))
}

## v should be vector of 0&1 sorted according to some model
## Gregory Piatetsky-Shapiro, Samuel Steingold
## "Measuring Lift Quality in Database Marketing"
## http://sds.podval.org/data/l-quality.pdf
## http://www.sigkdd.org/explorations/issues/2-2-2000-12/piatetsky-shapiro.pdf
## SIGKDD Explorations, Vol. 2:2, (2000), 81-86
## tests: lift.quality(rbinom(10000,size=1,prob=0.1)) ==> ~0
##        lift.quality(rev(round((1:10000)/12000))) ==> 1
lift.quality <- function (v, plot = TRUE, file = NULL, main = "lift curve", thresholds = NULL) {
  target.count <- sum(v)
  total.count <- length(v)
  base.rate <- target.count / total.count
  target.level <- cumsum(v)/target.count
  lq <- ((2*sum(target.level) - 1)/total.count - 1) / (1 - base.rate)
  if (plot) {
    if (!is.null(file)) {
      pdf(file = file)
      on.exit(dev.off())
    }
    plot(x=(1:total.count)/total.count,y=target.level,type="l",
         main=paste(main,"( lift quality ",lq,")"),
         xlab="% cutoff",ylab="cumulative % hit")
  }
  if (is.null(thresholds)) thresholds = c(base.rate)
  list(lift.quality = lq,
       detector.statistics = sapply(thresholds, function (l) {
         cutoff <- round(l * total.count)
         tp <- round(target.level[cutoff] * target.count) # = sum(v[1:cutoff])
         fn <- target.count - tp
         fp <- cutoff - tp
         tn <- total.count - target.count - cutoff + tp
         detector.statistics(tp, fn, fp, tn)
       }))
}
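The functions above call an entropy() helper that isn't shown in the message; a minimal definition consistent with how it is used on count vectors and tables would be (an assumption, not the author's code):

```r
## Assumed helper: Shannon entropy (in nats) of a vector of counts,
## e.g. the output of table(); zero counts contribute nothing.
entropy <- function (counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}
```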
the 3rd ("score") column.

Meanwhile I realised that the fastest way is actually the shell:
sort+cut+paste produced a csv file which can be loaded into R much
faster than the individual score files, so this issue is now purely
academic. However, I appreciate the replies I got so far; it was quite
educational, thanks!
(I also appreciate comments on the code above)