Merging two data frames with 3 common variables makes duplicated rows

3 messages · Rock Ouimet, Thomas Lumley, Rocko22

#
On Fri, 8 May 2009, Rock Ouimet wrote:

            
Rows 1 and 9 appear to be identical in meil2, as do rows 2 and 10.  If the 16 rows consist of two repeats of the same 8 rows, that would explain why you are getting two copies of each individual in the output. unique(meil2) would keep just the distinct rows.
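
A minimal sketch of the effect described above, using rows 1, 2, 9 and 10 of meil2 from this thread. The second data frame ("runners") and its "name" column are hypothetical stand-ins, since the original frame being merged is not shown here.

```r
# Four rows of 'meil2' as posted: rows 9 and 10 repeat rows 1 and 2.
meil2 <- data.frame(
  dist  = c(38, 38, 38, 38),
  sexe  = c("F", "F", "F", "F"),
  style = c("clas", "free", "clas", "free"),
  meil  = c("02:43:17", "02:24:46", "02:43:17", "02:24:46")
)

# Hypothetical second data frame: one row per individual.
runners <- data.frame(
  name  = c("A", "B"),
  dist  = c(38, 38),
  sexe  = c("F", "F"),
  style = c("clas", "free")
)

# merge() joins on the common columns (dist, sexe, style) by default.
dup_merge   <- merge(runners, meil2)          # every individual matches twice
clean_merge <- merge(runners, unique(meil2))  # every individual matches once

nrow(dup_merge)
nrow(clean_merge)
```

With the repeated rows left in, each individual appears twice in the result; after unique() each appears once.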

      -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
#
Thomas,

You are very clever! The "meil2" data frame contains each combination of the
common variables twice:
   dist sexe style     meil
1    38    F  clas 02:43:17
2    38    F  free 02:24:46
3    38    H  clas 02:37:36
4    38    H  free 01:59:35
5    45    F  clas 03:46:15
6    45    F  free 02:20:15
7    45    H  clas 02:30:07
8    45    H  free 01:59:36
9    38    F  clas 02:43:17
10   38    F  free 02:24:46
11   38    H  clas 02:37:36
12   38    H  free 01:59:35
13   45    F  clas 03:46:15
14   45    F  free 02:20:15
15   45    H  clas 02:30:07
16   45    H  free 01:59:36

After keeping only the unique combinations, the merge with the next data
frame worked correctly. This merge() function is more subtle than I first
thought. When merging two data frames with the default all = FALSE, if the
result has more rows than either of the original data frames, then at least
one of the two contains duplicate combinations of the common variables.
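
One way to catch this before merging is to test the key columns for repeats. The sketch below assumes the key columns are dist, sexe and style, as in the table above; has_dup_keys() is a hypothetical helper, not part of base R.

```r
# Common (key) columns, taken from the table in this message.
keys <- c("dist", "sexe", "style")

# Small stand-in for 'meil2': rows 3 and 4 repeat rows 1 and 2.
meil2 <- data.frame(
  dist  = c(38, 38, 38, 38),
  sexe  = c("F", "F", "F", "F"),
  style = c("clas", "free", "clas", "free"),
  meil  = c("02:43:17", "02:24:46", "02:43:17", "02:24:46")
)

# Hypothetical helper: TRUE if any key combination occurs more than once.
# anyDuplicated() returns 0 when there are no repeats.
has_dup_keys <- function(df, keys) {
  anyDuplicated(df[, keys, drop = FALSE]) > 0
}

has_dup_keys(meil2, keys)          # repeats present
has_dup_keys(unique(meil2), keys)  # repeats removed

# Show the rows whose key combination was already seen earlier.
meil2[duplicated(meil2[, keys]), ]
```

Note that unique() only drops rows that are identical in every column; if two rows shared the same keys but had different meil values, they would both survive and still multiply rows in the merge.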

Thank you very much, I will try to be more careful about this.

Rock