Merging Issue

2 messages · Farnoosh Sheikhi, jim holtman

#
Hi all,
I have two data sets similar to the ones below and want to merge them on the variable "deps". As this is sample data with a small sample size, I don't have any problem using the merge command. However, the actual data set has ~60,000 observations with a lot of repeated measures. For example, for a given ID I have 100 different dates and groups. The problem is that the "merge" command gives me a lot of duplicates that I can't even track. I was wondering if there is any other way to merge such data. Any help is appreciated. Thanks.
## Data A
Subject <- c("2", "2", "2", "3", "3", "3", "4", "4", "5", "5", "5", "5")
dates <- seq(as.Date('2011-01-01'), as.Date('2011-01-12'), by = 1)
deps <- c("A", "B", "C", "C", "D", "A", "F", "G", "A", "F", "A", "D")
df <- data.frame(Subject, dates, deps)
## Data B
loc <- c("CA", "NY", "CA", "NY", "WA", "WA")
grp <- c("DE", "OC", "DE", "OT", "DE", "OC")
deps <- c("A", "B", "C", "D", "F", "G")
df2 <- data.frame(loc, grp, deps)
dat <- merge(df, df2, by = "deps")
#
Don't use HTML when sending email; it messes up the data.

What do you mean that you get lots of duplicates?  If you have duplicated
entries in df2 this will lead to dups because of the way merge works (here
is the help file):

 If there is more than one match, all possible matches contribute
     one row each.  For the precise meaning of 'match', see 'match'.
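
To illustrate that sentence from the help file, here is a minimal sketch on made-up toy data (the data frames x and y are hypothetical, not from the post). Key "A" appears twice on each side, so it should contribute 2 * 2 = 4 rows, and "B" should contribute 1 * 1 = 1 row:

```r
# Hypothetical toy data: key "A" occurs twice in both data frames
x <- data.frame(deps = c("A", "A", "B"), Subject = c("2", "3", "2"))
y <- data.frame(deps = c("A", "A", "B"), loc = c("CA", "yy", "NY"))

m <- merge(x, y, by = "deps")

# Count how often each key occurs on each side
n_x <- table(x$deps)
n_y <- table(y$deps)
keys <- intersect(names(n_x), names(n_y))

# Every key contributes (rows in x) * (rows in y) rows to the merge
expected_rows <- sum(as.integer(n_x[keys]) * as.integer(n_y[keys]))

print(nrow(m))
print(expected_rows)
```

Running the same count on the real data before merging should show which keys are responsible for the extra rows.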

So you need to define the problem that you want to solve in doing the
merge.  Here is what happens in your data if I duplicate some entries in
df2.  Is this what you are seeing?
   deps Subject      dates loc grp
1     A       2 2011-01-01  CA  DE
2     A       2 2011-01-01  yy  xx
3     A       3 2011-01-06  CA  DE
4     A       3 2011-01-06  yy  xx
5     A       5 2011-01-11  CA  DE
6     A       5 2011-01-11  yy  xx
7     A       5 2011-01-09  CA  DE
8     A       5 2011-01-09  yy  xx
9     B       2 2011-01-02  NY  OC
10    C       3 2011-01-04  CA  DE
11    C       2 2011-01-03  CA  DE
12    D       5 2011-01-12  NY  OT
13    D       3 2011-01-05  NY  OT
14    F       5 2011-01-10  WA  DE
15    F       4 2011-01-07  WA  DE
16    G       4 2011-01-08  WA  OC
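
A minimal sketch of how output like this can arise and how to track it down. The extra row added to df2 is an assumption, chosen to match the "yy"/"xx" rows shown above. Keeping only the first row per key is just one possible fix; whether it is the right one depends on the problem you are trying to solve:

```r
# Sample data from the original post
Subject <- c("2", "2", "2", "3", "3", "3", "4", "4", "5", "5", "5", "5")
dates <- seq(as.Date("2011-01-01"), as.Date("2011-01-12"), by = 1)
deps <- c("A", "B", "C", "C", "D", "A", "F", "G", "A", "F", "A", "D")
df <- data.frame(Subject, dates, deps)

df2 <- data.frame(
  loc  = c("CA", "NY", "CA", "NY", "WA", "WA"),
  grp  = c("DE", "OC", "DE", "OT", "DE", "OC"),
  deps = c("A", "B", "C", "D", "F", "G")
)

# Assumed duplicate: a second entry for key "A" in the lookup table
df2_dup <- rbind(df2, data.frame(loc = "yy", grp = "xx", deps = "A"))

# Merging against the duplicated lookup doubles every "A" row of df
dat_dup <- merge(df, df2_dup, by = "deps")

# Which keys occur more than once in the lookup table?
dup_keys <- as.character(unique(df2_dup$deps[duplicated(df2_dup$deps)]))

# One possible fix: keep the first row per key, then merge
df2_unique <- df2_dup[!duplicated(df2_dup$deps), ]
dat <- merge(df, df2_unique, by = "deps")

print(dup_keys)
print(c(nrow(df), nrow(dat_dup), nrow(dat)))
```

If the lookup table has one row per key, the merged result should have no more rows than df.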



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Jun 17, 2016 at 8:33 PM, Farnoosh Sheikhi via R-help <
r-help at r-project.org> wrote: