merge function while obviating duplicate columns XXXX

5 messages · Dan Abner, Ista Zahn, Jeff Newmiller +1 more

Original

1

5

Dan Abner

Mon, Mar 11, 2013 12:17 PM #

Hi everyone,

I have the following call to the merge() function. How does one
prevent duplicate columns in the resulting data frame that the 2
parent data frames have in common but are not true key or "by"
variables?


data3<-merge(data1,data2,by="id")
data3

id total.x total.y balance
1 78  78 90
2 91  91 63
3 74  74 57
4 89  89 58
5 90  90 27


In this example, total is not a true key or "by" variable that
uniquely identifies rows suitable for matching purposes, but instead
just happens to be common to both sets.

In reality, I have hundreds for these "in common" variables, so I need
a solution that is tractable for a large number of "in common"
columns.

Thanks!

Dan

Ista Zahn

Mon, Mar 11, 2013 12:25 PM #

On Mon, Mar 11, 2013 at 3:17 PM, Dan Abner <dan.abner99 at gmail.com> wrote:

Well, which one do you want? Or do you want to exclude total from the result?

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Dan Abner

Mon, Mar 11, 2013 2:02 PM #

Ok, let's say I only want the common columns from data1. Is there a
succinct way of doing this for potentially hundreds of "in common"
columns?

On Mon, Mar 11, 2013 at 3:25 PM, Ista Zahn <istazahn at gmail.com> wrote:

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Mon, Mar 11, 2013 2:19 PM #

intersect(names(data1),names(data2))
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

Dan Abner <dan.abner99 at gmail.com> wrote:

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Mon, Mar 11, 2013 2:24 PM #

You can use the set-oriented functions setdiff(), union(), and intersect().
E.g., setdiff(colnames(data2), colnames(data1)) gives the names of columns
of data2 that are not  names of columns of data1.  The following might be
what you want
    merge(data1, data2[, c("id", setdiff(colnames(data2),colnames(data1)))], by="id")
You didn't give an example of the data nor the desired result so I made some up:
   > data1 <- data.frame(id=c(1,1,2,3), Name=c("Joe","Joe","Ken","Leo"))
   > data2 <- data.frame(id=c(2,3), Name=c("Melody","Nell"), Age=c(45,49))
   > merge(data1, data2, by="id")
     id Name.x Name.y Age
   1  2    Ken Melody  45
   2  3    Leo   Nell  49
   > merge(data1, data2[, c("id", setdiff(colnames(data2),colnames(data1)))], by="id")
     id Name Age
   1  2  Ken  45
   2  3  Leo  49

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.