-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of
Cecilia Carmo
Sent: Sunday, August 22, 2010 10:24 AM
To: Erik Iverson
Cc: r-help at r-project.org; Hadley Wickham
Subject: Re: [R] problems with merge() - the output has many repeated lines
I ran
intersect(names(df1), names(df2))
[1] "firm" "year"
This is the key I used to merge:
merge(df1,df2,by=c("firm","year"))
Each firm/year row in df1 matches at most one firm/year row in df2. df1 has more firm/year rows than df2, and the extra ones don't match any row in df2.
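One quick way to test the claim that each df1 row matches at most one df2 row is to compare the size of the merge with the number of df1 rows whose firm/year key appears in df2. The two numbers agree only when no matched key is duplicated in df2. This is a minimal sketch with made-up toy data; the names key1, key2 and expected are illustrative, and it assumes firm never contains the "|" separator.

```r
# Toy stand-ins for df1 and df2 (made-up data, for illustration only)
df1 <- data.frame(firm = c("A", "A", "B", "C"),
                  year = c(2008, 2009, 2008, 2008),
                  x = 1:4)
df2 <- data.frame(firm = c("A", "B", "B"),
                  year = c(2008, 2008, 2008),
                  y = c(10, 20, 21))

m <- merge(df1, df2, by = c("firm", "year"))

# One key string per row so firm/year pairs can be compared directly
# (assumes "|" never occurs in firm)
key1 <- paste(df1$firm, df1$year, sep = "|")
key2 <- paste(df2$firm, df2$year, sep = "|")

# If every df1 row matched at most one df2 row, the merge would have
# exactly this many rows
expected <- sum(key1 %in% key2)

nrow(m)    # actual size of the merge
expected   # size if no matched key were duplicated in df2
```

Here nrow(m) exceeds expected because the key B/2008 occurs twice in df2.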
To get to the bottom of this you may have to show us some of the relevant rows of data (80000 rows per dataset would be a lot to mail out). For starters it would be nice to see the output of
str(df1)
str(df2)
str(m) # where m is merge(df1,df2)
Then it would be nice to see the output of
table(duplicated(df1[, c("firm","year")]))
and the same for df2 and m.
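To show what these diagnostics reveal, here is a small sketch on made-up data in which df2 contains one exactly duplicated firm/year row. In the output of table(duplicated(...)), TRUE counts rows whose firm/year pair already appeared earlier in the same data frame.

```r
# Made-up example data: the B/2008 row appears twice in df2
df1 <- data.frame(firm = c("A", "B", "C"), year = c(2008, 2008, 2009), x = 1:3)
df2 <- data.frame(firm = c("A", "B", "B"), year = c(2008, 2008, 2008),
                  y = c(1, 2, 2))

table(duplicated(df1[, c("firm", "year")]))   # no repeated keys
table(duplicated(df2[, c("firm", "year")]))   # one repeated key

m <- merge(df1, df2, by = c("firm", "year"))
table(duplicated(m[, c("firm", "year")]))     # the repeat carries into m
sum(duplicated(m))                            # and it is identical in all columns
```

A single TRUE in df2 is enough to produce a row in m that is identical in every column to another row.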
You said you saw many repeated rows in the output of merge(df1,df2,...), which I am calling 'm'. Say the i'th row is one of the repeated rows. What are the outputs of
df1[ df1$firm==m$firm[i] & df1$year==m$year[i], , drop=FALSE]
df2[ df2$firm==m$firm[i] & df2$year==m$year[i], , drop=FALSE]
m[ m$firm==m$firm[i] & m$year==m$year[i], , drop=FALSE]
?
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
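The three lookups Bill asks for can be wrapped in a small function and pointed at the first exactly repeated row of m. This is a sketch only: rows_for_key is a hypothetical helper, and the data frames are made up. It uses which() so that NA comparisons (a missing firm or year) do not produce all-NA rows, and as.character() so that factor columns with different level sets can still be compared.

```r
# Hypothetical helper (not from the thread): return the rows of d whose
# firm/year pair equals that of row i of the merged data m.
# which() drops comparisons that evaluate to NA.
rows_for_key <- function(d, m, i) {
  hit <- which(as.character(d$firm) == as.character(m$firm[i]) &
               d$year == m$year[i])
  d[hit, , drop = FALSE]
}

# Made-up example data
df1 <- data.frame(firm = c("A", "B"), year = c(2008, 2008), x = 1:2)
df2 <- data.frame(firm = c("A", "B", "B"), year = c(2008, 2008, 2008),
                  y = c(1, 2, 2))
m <- merge(df1, df2, by = c("firm", "year"))

# Index of the first row of m that exactly repeats an earlier row
i <- which(duplicated(m))[1]

rows_for_key(df1, m, i)   # one matching row in df1
rows_for_key(df2, m, i)   # two matching rows in df2: the source of the repeat
rows_for_key(m, m, i)     # the repeated rows in m
```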
Cecília
On Sun, 22 Aug 2010 12:09:57 -0500, Erik Iverson <eriki at ccbr.umn.edu> wrote:
Cecilia -
Find out which columns you're matching on:
intersect(names(df1), names(df2))
Maybe that will shed some light on the issue.
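This check matters because, when by is not given, merge() joins on every column name the two data frames share (by defaults to intersect(names(x), names(y))). The sketch below uses made-up data frames with a hypothetical extra shared column, name, to show how such a column silently becomes part of the key.

```r
# Made-up data: both frames share firm, year AND name
df1 <- data.frame(firm = c("A", "B"), year = c(2008, 2008),
                  name = c("Alpha", "Beta"), x = 1:2)
df2 <- data.frame(firm = c("A", "B"), year = c(2008, 2008),
                  name = c("Alpha", "BETA"), y = 3:4)

intersect(names(df1), names(df2))   # "firm" "year" "name"

# Default by: joins on firm, year and name, so firm B is dropped
# because its name differs in capitalization
m_default <- merge(df1, df2)

# Explicit by: joins on firm and year only; the non-key shared column
# appears twice, as name.x and name.y
m_key <- merge(df1, df2, by = c("firm", "year"))
```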
On 08/22/2010 12:02 PM, Cecilia Carmo wrote:
Thanks, but I don't have multiple matches, and the repeated rows in the final data frame are exactly equal in all columns.
Cecília
On Sat, 21 Aug 2010 10:58:53 -0500, Hadley Wickham <hadley at rice.edu> wrote:
You may find a close reading of ?merge helpful, particularly this sentence: "If there is more than one match, all matches contribute one row each" (so check that you don't have multiple matches).
Hadley
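The sentence quoted from ?merge implies that a key appearing n1 times in the first data frame and n2 times in the second contributes n1 × n2 rows to the result. If the duplicated input rows are themselves identical, every one of those output rows is identical in all columns, which matches the symptom described in this thread. A toy illustration with made-up data:

```r
# Made-up data: key A/2008 appears twice in df1 and three times in df2,
# and the duplicated rows are identical in every column
df1 <- data.frame(firm = "A", year = 2008, x = c(1, 1))
df2 <- data.frame(firm = "A", year = 2008, y = c(5, 5, 5))

m <- merge(df1, df2, by = c("firm", "year"))

nrow(m)           # 2 * 3 = 6 rows
nrow(unique(m))   # all six are identical: 1 distinct row
```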
On Sat, Aug 21, 2010 at 10:45 AM, Cecilia Carmo <cecilia.carmo at ua.pt> wrote:
Hi everyone,
I have been merging many big data frames (about … rows each) and I have never had this problem, but now it has happened to me and I would like to know if anyone knows what could be happening.
The final data frame has an impossible number of rows. I ran edit(dataframe) and saw that there are many repeated rows (all equal).
Thanks for any help,
Cecília Carmo
Universidade de Aveiro
Portugal
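Rather than scrolling through edit(), the repeated rows can be counted with duplicated(), which flags rows that are identical in all columns to an earlier row. The sketch below uses a made-up stand-in for the merged data frame. Dropping the repeats is shown only for completeness: whether that is appropriate depends on why they appeared, which is what the replies above try to establish.

```r
# Made-up stand-in for the merged data frame
dataframe <- data.frame(firm = c("A", "A", "B", "B", "B"),
                        year = c(2008, 2008, 2009, 2009, 2009),
                        x = c(1, 1, 2, 2, 3))

# Number of rows that exactly repeat an earlier row (all columns equal)
n_repeated <- sum(duplicated(dataframe))

# Keep only the first occurrence of each distinct row
deduped <- dataframe[!duplicated(dataframe), , drop = FALSE]
```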