Merging two data frames with 3 common variables makes duplicated rows

Thomas,

You are very clever! The "meil2" data frame contains every combination of
the common variables twice:
meil2
   dist sexe style     meil
1    38    F  clas 02:43:17
2    38    F  free 02:24:46
3    38    H  clas 02:37:36
4    38    H  free 01:59:35
5    45    F  clas 03:46:15
6    45    F  free 02:20:15
7    45    H  clas 02:30:07
8    45    H  free 01:59:36
9    38    F  clas 02:43:17
10   38    F  free 02:24:46
11   38    H  clas 02:37:36
12   38    H  free 01:59:35
13   45    F  clas 03:46:15
14   45    F  free 02:20:15
15   45    H  clas 02:30:07
16   45    H  free 01:59:36

Keeping only the unique combinations made the merge with the next data frame
work correctly. The merge() function is more subtle than I first thought.
When merging two data frames, if the result has more rows than either of the
original data frames, it means that there are duplicate combinations of the
common variables in one or both of them.
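A minimal sketch of that behaviour, using small made-up data frames (the
names left, right, right_dup and keys are hypothetical, not the real tmv and
meil2). It checks the key columns with anyDuplicated() before merging and
compares the row counts:

```r
# Toy data (hypothetical, not the poster's real data frames)
keys <- c("dist", "sexe", "style")

left <- data.frame(
  nom   = c("A", "B", "C"),
  dist  = c(45L, 45L, 38L),
  sexe  = c("H", "F", "H"),
  style = c("free", "free", "clas")
)

right <- data.frame(
  dist  = c(45L, 45L, 38L),
  sexe  = c("H", "F", "H"),
  style = c("free", "free", "clas"),
  meil  = c("01:59:36", "02:20:15", "02:37:36")
)

# The same lookup table stacked on itself: every key combination appears twice
right_dup <- rbind(right, right)

# anyDuplicated() returns 0 when no key combination repeats,
# otherwise the index of the first repeated row
anyDuplicated(right[keys])
anyDuplicated(right_dup[keys])

n_clean <- nrow(merge(left, right, by = keys))      # one match per row of 'left'
n_dup   <- nrow(merge(left, right_dup, by = keys))  # each row of 'left' matched twice
```

Running a check like anyDuplicated() on the common variables of each data
frame before calling merge() is one way to catch this early.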

Thank you very much, I will try to be more careful about this.

Rock

On Fri, 8 May 2009, Rock Ouimet wrote:

I am new to R (ex-SAS user), and I cannot merge two data frames without
getting duplicated rows in the result. How can I avoid this without using
the unique() function?

1. The first data frame is called "tmv", with 6 variables and 239 rows:

tmv[1:10,]
      temps       nom        prenom sexe dist style
1  01:59:36       Cyr         Steve    H   45  free
2  02:09:55  Gosselin         Erick    H   45  free
3  02:12:18 Desfosses         Sacha    H   45  free
4  02:12:23  Lapointe     Sebastien    H   45  free
5  02:12:52    Labrie        Michel    H   45  free
6  02:12:54   Leblanc        Michel    H   45  free
7  02:13:02 Thibeault       Sylvain    H   45  free
8  02:13:49    Martel      Stephane    H   45  free
9  02:14:03    Lavoie Jean-Philippe    H   45  free
10 02:14:05    Boivin   Jean-Claude    H   45  free

Its structure is:
str(tmv)
'data.frame':   239 obs. of  6 variables:
$ temps :Class 'times'  atomic [1:239] 0.0831 0.0902 0.0919 0.0919 0.0923 ...
 .. ..- attr(*, "format")= chr "h:m:s"
$ nom   : Factor w/ 167 levels "Aubut","Audy",..: 45 84 55 105 98 110 158 117 109 22 ...
$ prenom: Factor w/ 135 levels "Alain","Alexandre",..: 128 33 121 122 93 93 130 126 63 59 ...
$ sexe  : Factor w/ 2 levels "F","H": 2 2 2 2 2 2 2 2 2 2 ...
$ dist  : int  45 45 45 45 45 45 45 45 45 45 ...
$ style : Factor w/ 2 levels "clas","free": 2 2 2 2 2 2 2 2 2 2 ...

2. The second data frame is called "meil2", with 4 variables and 16 rows:
meil2[1:10,]
   dist sexe style     meil
1    38    F  clas 02:43:17
2    38    F  free 02:24:46
3    38    H  clas 02:37:36
4    38    H  free 01:59:35
5    45    F  clas 03:46:15
6    45    F  free 02:20:15
7    45    H  clas 02:30:07
8    45    H  free 01:59:36
9    38    F  clas 02:43:17
10   38    F  free 02:24:46

Lines 9 and 1 appear to be the same in meil2, as do 2 and 10.  If the 16
rows consist of two repeats of 8 rows that would explain why you are
getting two copies of each individual in the output. unique(meil2) would
have just the distinct rows.
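
As a rough illustration of that suggestion, the sketch below rebuilds meil2
from the rows printed in this thread and takes two rows from the printed tmv.
The times are kept as plain character strings here (an assumption made for
simplicity; the original columns are 'times' objects), and meil2_u and res
are hypothetical names:

```r
# The 8 distinct rows of meil2 as printed in the thread; times kept as strings
meil <- data.frame(
  dist  = rep(c(38L, 45L), each = 4),
  sexe  = rep(c("F", "F", "H", "H"), times = 2),
  style = rep(c("clas", "free"), times = 4),
  meil  = c("02:43:17", "02:24:46", "02:37:36", "01:59:35",
            "03:46:15", "02:20:15", "02:30:07", "01:59:36")
)
meil2 <- rbind(meil, meil)    # 16 rows: the 8 rows repeated

# Two rows taken from the printed tmv (prenom omitted for brevity)
tmv <- data.frame(
  temps = c("01:59:36", "02:09:55"),
  nom   = c("Cyr", "Gosselin"),
  sexe  = c("H", "H"),
  dist  = c(45L, 45L),
  style = c("free", "free")
)

meil2_u <- unique(meil2)      # drops the exact repeated rows
res     <- merge(tmv, meil2_u)  # by default merges on all shared column names
res_dup <- merge(tmv, meil2)    # for comparison: merge against the repeated table
```

Here unique() is applied to the lookup table before merging, not to the
merged result, so each individual in tmv is matched only once.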

      -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

