Skip to content
Prev 351222 / 398502 Next

Problem merging data frames and duplicates

Hello All,

I am attempting to merge two data frames that naturally contain duplicate
entries, however when using either merge or dMerge I get even more
duplicates.

For example:

data.frame(state=c("IL", "IL", "LA","LA", "MS","MS", "AR", "AR"),
locus=c(1,1,2,2,3,4,5,6))->x

data.frame(state=c("IL", "IL", "AR", "AR", "TN","TN"),
locus=c(1,1,2,3,3,4,4))->y

These yield:

 x
  state locus
1    IL     1
2    IL     1
3    LA     2
4    LA     2
5    MS     3
6    MS     4
7    AR     5
8    AR     6

y
  state locus
1    IL     1
2    IL     1
3    AR     2
4    AR     3
5    TN     3
6    TN     4

However, when merged by "state: they produce another second set of AR and
IL:
merge(x,y,by=1,all=T)->z

   state locus.x locus.y
1     AR       5       2
2     AR       5       3
3     AR       6       2
4     AR       6       3
5     IL       1       1
6     IL       1       1
7     IL       1       1
8     IL       1       1
9     LA       2      NA
10    LA       2      NA
11    MS       3      NA
12    MS       4      NA
13    TN      NA       3
14    TN      NA       4

While, the NAs are desired when states are missing, I don't want the extra
duplicated states and values but rather:

   state locus.x locus.y
1     AR       5       2
4     AR       6       3
7     IL       1       1
8     IL       1       1
9     LA       2      NA
10    LA       2      NA
11    MS       3      NA
12    MS       4      NA
13    TN      NA       3
14    TN      NA       4

Any help would be much appreciated.

Thanks!

Frank