Merging data frames, or one column/vector with a data frame filling out empty rows with NA's
3 messages · Johannes G. Madsen, Gabor Grothendieck, David Winsemius
Try this (where SNP1x is the same as SNP1 from your post but without the last line). If the merge below does not work on the real data set due to its size, then try the sqldf alternative.
SNP1x <- structure(list(Animal = c(194073197L, 194073197L, 194073197L,
    194073197L, 194073197L), Marker = structure(1:5, .Label = c("P1001",
    "P1002", "P1004", "P1005", "P1006", "P1007"), class = "factor"),
    x = c(2L, 1L, 2L, 0L, 2L)), .Names = c("Animal", "Marker", "x"),
    row.names = c("3213", "1295", "915", "2833", "1487"),
    class = "data.frame")
SNP4 <- structure(list(Animal = c(194073197L, 194073197L, 194073197L,
    194073197L, 194073197L, 194073197L), Marker = structure(1:6,
    .Label = c("P1001", "P1002", "P1004", "P1005", "P1006", "P1007"),
    class = "factor"), Y = c(0.021088, 0.021088, 0.021088, 0.021088,
    0.021088, 0.021088)), .Names = c("Animal", "Marker", "Y"),
    class = "data.frame", row.names = c("3213", "1295", "915", "2833",
    "1487", "1885"))
merge(SNP1x, SNP4, all = TRUE)
     Animal Marker  x        Y
1 194073197  P1001  2 0.021088
2 194073197  P1002  1 0.021088
3 194073197  P1004  2 0.021088
4 194073197  P1005  0 0.021088
5 194073197  P1006  2 0.021088
6 194073197  P1007 NA 0.021088
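A side note on the merge above: all = TRUE performs a full outer join. For what the question asks (keep every row of SNP4 and fill x with NA where SNP1 has no match), a left join with all.x = TRUE, keyed on both Animal and Marker, is probably the closer fit. The following is a minimal base-R sketch on small made-up data mirroring the example in this thread; the values are illustrative only.

```r
# Hypothetical toy data shaped like SNP4 / SNP1x above
SNP4 <- data.frame(
  Animal = rep(194073197L, 6),
  Marker = c("P1001", "P1002", "P1004", "P1005", "P1006", "P1007"),
  Y      = rep(0.021088, 6)
)
SNP1x <- data.frame(
  Animal = rep(194073197L, 5),
  Marker = c("P1001", "P1002", "P1004", "P1005", "P1006"),
  x      = c(2L, 1L, 2L, 0L, 2L)
)

# Left join: every row of SNP4 is kept; x is NA where SNP1x has no match.
# Keying on both Animal and Marker keeps the join one-to-one.
SNP5 <- merge(SNP4, SNP1x, by = c("Animal", "Marker"), all.x = TRUE)
print(SNP5)  # the P1007 row has x = NA
```

merge() sorts the result on the key columns by default; pass sort = FALSE if the original row order does not matter and you want to skip that step.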
library(sqldf)
sqldf("select * from SNP4 left join SNP1x using (Animal, Marker)")
     Animal Marker        Y  x
1 194073197  P1001 0.021088  2
2 194073197  P1002 0.021088  1
3 194073197  P1004 0.021088  2
4 194073197  P1005 0.021088  0
5 194073197  P1006 0.021088  2
6 194073197  P1007 0.021088 NA
# or, if that does not work due to size, force it to create, use
# and destroy an external database
sqldf("select * from SNP4 left join SNP1x using (Animal, Marker)", dbname = "temp.db")
     Animal Marker        Y  x
1 194073197  P1001 0.021088  2
2 194073197  P1002 0.021088  1
3 194073197  P1004 0.021088  2
4 194073197  P1005 0.021088  0
5 194073197  P1006 0.021088  2
6 194073197  P1007 0.021088 NA

On Wed, Apr 22, 2009 at 5:22 AM, Johannes G. Madsen <JGM at dansksvineproduktion.dk> wrote:
Hello, I have two data frames, SNP4 and SNP1:
head(SNP4)
        Animal Marker        Y
3213 194073197  P1001 0.021088
1295 194073197  P1002 0.021088
915  194073197  P1004 0.021088
2833 194073197  P1005 0.021088
1487 194073197  P1006 0.021088
1885 194073197  P1007 0.021088
head(SNP1)
        Animal Marker x
3213 194073197  P1001 2
1295 194073197  P1002 1
915  194073197  P1004 2
2833 194073197  P1005 0
1487 194073197  P1006 2
1885 194073197  P1007 0

I want these two data frames merged by 'Marker', but when I try
SNP5 <- merge(SNP4, SNP1, by = 'Marker', all = TRUE)
Error: cannot allocate vector of size 2.4 Gb
In addition: Warning messages:
1: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)
3: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)
4: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)

an error occurs. What I want is the column SNP1$x merged with SNP4 by Marker, so that some markers will have NAs in the 'x' column of the SNP5 data set. I also tried this:
SNP5 <- merge(SNP4, SNP1$x, by.x = 'Marker', by.y = 'Marker', all = TRUE)
Error in fix.by(by.y, y) : 'by' must specify valid column(s)

It won't work either. Does anyone have any idea how to solve this?

Regards, Johannes.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
On Apr 22, 2009, at 5:22 AM, Johannes G. Madsen wrote:
Hello, I have two data frames, SNP4 and SNP1:
head(SNP4)
        Animal Marker        Y
3213 194073197  P1001 0.021088
1295 194073197  P1002 0.021088
915  194073197  P1004 0.021088
2833 194073197  P1005 0.021088
1487 194073197  P1006 0.021088
1885 194073197  P1007 0.021088
head(SNP1)
        Animal Marker x
3213 194073197  P1001 2
1295 194073197  P1002 1
915  194073197  P1004 2
2833 194073197  P1005 0
1487 194073197  P1006 2
1885 194073197  P1007 0

I want these two data frames merged by 'Marker', but when I try
SNP5 <- merge(SNP4, SNP1, by = 'Marker', all = TRUE)
Error: cannot allocate vector of size 2.4 Gb
In addition: Warning messages:
1: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)
3: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)
4: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)

an error occurs.
So what are the results of:

str(SNP4) ; str(SNP1)  # this will tell us how large these objects are

And are you sure you don't want the merge to occur by Animal as well?
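One plausible reading of why merging by Marker alone exhausts memory (the thread does not confirm this): if each marker occurs once per animal in both data frames, merge() pairs every SNP4 row for a marker with every SNP1 row for that same marker, so the result grows with the square of the number of animals. The sketch below uses hypothetical toy data (3 animals, 2 markers) to show the row counts.

```r
# Hypothetical toy data: 3 animals, each genotyped at the same 2 markers
animals <- c(101L, 102L, 103L)
markers <- c("P1001", "P1002")

SNP4 <- expand.grid(Animal = animals, Marker = markers, stringsAsFactors = FALSE)
SNP4$Y <- 0.5
SNP1 <- expand.grid(Animal = animals, Marker = markers, stringsAsFactors = FALSE)
SNP1$x <- seq_len(nrow(SNP1))

# Keyed on Marker only: many-to-many, every animal paired with every animal
by_marker <- merge(SNP4, SNP1, by = "Marker")
nrow(by_marker)  # 2 markers * 3 * 3 = 18 rows

# Keyed on Animal and Marker: one-to-one
by_both <- merge(SNP4, SNP1, by = c("Animal", "Marker"))
nrow(by_both)    # 6 rows
```

With thousands of animals per marker, the Marker-only result would be thousands of times larger than either input, which is consistent with a multi-gigabyte allocation request.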
What I want is the column SNP1$x merged with SNP4 by Marker, so that some markers will have NAs in the 'x' column of the SNP5 data set. I also tried this:
SNP5 <- merge(SNP4, SNP1$x, by.x = 'Marker', by.y = 'Marker', all = TRUE)
Error in fix.by(by.y, y) : 'by' must specify valid column(s)

It won't work either. Does anyone have any idea how to solve this?
The second error seems pretty obvious. You are trying to merge a vector that no longer has any "Marker" column with a data frame that does.
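Following that point, here is a sketch of two ways to keep the key together with x, using small hypothetical data frames. The first keeps the Animal and Marker columns alongside x so merge() has something to join on. The second is a swapped-in technique not suggested in the thread: a match() lookup on a pasted Animal/Marker key that adds x to SNP4 directly, without building a merged copy.

```r
# Hypothetical toy data in the shape of the thread's SNP4 and SNP1
SNP4 <- data.frame(Animal = 1L, Marker = c("P1001", "P1002", "P1007"), Y = 0.02)
SNP1 <- data.frame(Animal = 1L, Marker = c("P1001", "P1002"), x = c(2L, 1L))

# Option 1: subset SNP1 to its key columns plus x, then left-join
SNP5 <- merge(SNP4, SNP1[, c("Animal", "Marker", "x")],
              by = c("Animal", "Marker"), all.x = TRUE)

# Option 2 (not from the thread): look x up by a combined key.
# match() returns the position of the first match, or NA when there is none,
# so unmatched markers get NA and SNP4 keeps its original row order.
# If SNP1 contains duplicate keys, only the first occurrence is used.
key4 <- paste(SNP4$Animal, SNP4$Marker)
key1 <- paste(SNP1$Animal, SNP1$Marker)
SNP4$x <- SNP1$x[match(key4, key1)]
```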
Regards, Johannes.
David Winsemius, MD Heritage Laboratories West Hartford, CT