help with merging 2 data frames
This should do the trick :
colnames(y)[1:2]=c("a","a")
y2=rbind(y[,-1], y[,-2]) #duplicating the "y" matrix so the identifiers
are only in 1 column
merged = merge(x,y2)
merged
a b d e1 e2
1 aa 1 10 100 101
2 aa 2 20 200 201
3 ab 1 30 100 101
4 ab 2 40 200 201
5 ba 1 50 300 301
6 ba 2 60 400 401
7 bb 1 70 300 301
8 bb 2 80 400 401
desired
a b d e1 e2
1 aa 1 10 100 101
2 aa 2 20 200 201
3 ab 1 30 100 101
4 ab 2 40 200 201
5 ba 1 50 300 301
6 ba 2 60 400 401
7 bb 1 70 300 301
8 bb 2 80 400 401
all(merged==desired)
[1] TRUE
Cheers,
Eloi
On 12-07-11 03:50 PM, Dimitri Liakhovitski wrote:
Jorge, thank you! that seems to be working, but unfortunately in real life I have thousands of variables (except for a, a2, a3 and b) so that manually selecting columns (as in c(2:4, 8:9)) would be too difficult... Dimitri On Wed, Jul 11, 2012 at 6:36 PM, Jorge I Velez <jorgeivanvelez at gmail.com> wrote:
Hi Dimitri, Try creating a key for "x" and "y" and then merging the result by that variable: x$key <- with(x, paste(a, b, sep = "/")) y$key <- with(y, paste(a2, b, sep = "/")) merge(x, y, by = 'key')[, c(2:4, 8:9)] HTH, Jorge.- On Wed, Jul 11, 2012 at 6:28 PM, Dimitri Liakhovitski <> wrote:
Dear R-ers,
I feel I am close, but can't get it quite right.
Thanks a lot for your help!
Dimitri
# I have 2 data frames:
x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80))
y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401))
(x);(y)
# I'd like to merge them so that the result looks like this:
desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80),
e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401))
(desired)
# In other words, I want column e1 and e2 entries from data frame y to
be repeated based on matching of column a from x and columns a2 and
then a3 from y.
# I am trying step-by-step - first I am using column a2 from data
frame y for merging:
out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
(out1) # looking good - half of the job is done
# Step2 - does not work
# next line produces columns e1 and e2 twice (in real life I have tons
of columns like e1 and e2):
merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
# next line also doesn't do the job:
merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F)
# Finally, I tried this approach:
out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
(out1); (out2)
# Now I need to merge these 2 - however, the next line doubles the
number of entries:
merge(out1,out2,by=names(out1),all.x=T,all.y=T)
--
Dimitri Liakhovitski
marketfusionanalytics.com
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Eloi Mercier Bioinformatics PhD Student, UBC Paul Pavlidis Lab 2185 East Mall University of British Columbia Vancouver BC V6T1Z4