Merging two data frames with 3 common variables makes duplicated rows
On Fri, 8 May 2009, Rock Ouimet wrote:
I am new to R (ex SAS user) , and I cannot merge two data frames without getting duplicated rows in the results. How to avoid this happening without using the unique() function? 1. First data frame is called "tmv" with 6 variables and 239 rows:
tmv[1:10,]
temps nom prenom sexe dist style 1 01:59:36 Cyr Steve H 45 free 2 02:09:55 Gosselin Erick H 45 free 3 02:12:18 Desfosses Sacha H 45 free 4 02:12:23 Lapointe Sebastien H 45 free 5 02:12:52 Labrie Michel H 45 free 6 02:12:54 Leblanc Michel H 45 free 7 02:13:02 Thibeault Sylvain H 45 free 8 02:13:49 Martel Stephane H 45 free 9 02:14:03 Lavoie Jean-Philippe H 45 free 10 02:14:05 Boivin Jean-Claude H 45 free Its structure is:
str(tmv)
'data.frame': 239 obs. of 6 variables: $ temps :Class 'times' atomic [1:239] 0.0831 0.0902 0.0919 0.0919 0.0923 ... .. ..- attr(*, "format")= chr "h:m:s" $ nom : Factor w/ 167 levels "Aubut","Audy",..: 45 84 55 105 98 110 158 117 109 22 ... $ prenom: Factor w/ 135 levels "Alain","Alexandre",..: 128 33 121 122 93 93 130 126 63 59 ... $ sexe : Factor w/ 2 levels "F","H": 2 2 2 2 2 2 2 2 2 2 ... $ dist : int 45 45 45 45 45 45 45 45 45 45 ... $ style : Factor w/ 2 levels "clas","free": 2 2 2 2 2 2 2 2 2 2 ... 2. The second data frame is called "meil2" with 4 variables and 16 rows;
meil2[1:10,]
dist sexe style meil 1 38 F clas 02:43:17 2 38 F free 02:24:46 3 38 H clas 02:37:36 4 38 H free 01:59:35 5 45 F clas 03:46:15 6 45 F free 02:20:15 7 45 H clas 02:30:07 8 45 H free 01:59:36 9 38 F clas 02:43:17 10 38 F free 02:24:46
Lines 9 and 1 appear to be the same in meil2, as do 2 and 10. If the 16 rows consist of two repeats of 8 rows that would explain why you are getting two copies of each individual in the output. unique(meil2) would have just the distinct rows.
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle