Merging data frames, or one column/vector with a data frame filling out empty rows with NA's

3 messages · joe1985, Sarah Goslee, Stephen Bond

#
Hello

I have two data frames, SNP4 and SNP1:

SNP4:
        Animal Marker        Y
3213 194073197  P1001 0.021088
1295 194073197  P1002 0.021088
915  194073197  P1004 0.021088
2833 194073197  P1005 0.021088
1487 194073197  P1006 0.021088
1885 194073197  P1007 0.021088

SNP1:
        Animal Marker x
3213 194073197  P1001 2
1295 194073197  P1002 1
915  194073197  P1004 2
2833 194073197  P1005 0
1487 194073197  P1006 2
1885 194073197  P1007 0

I want these two data frames merged by 'Marker', but when I try merge()
with by = "Marker" and all = TRUE:
Error: cannot allocate vector of size 2.4 Gb
In addition: Warning messages:
1: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)
3: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)
4: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)

an error occurs.

What I want is the column SNP1$x merged together with SNP4 by Marker, so
some markers will have NAs in the 'x' column of the SNP5 dataset.

I also tried this:
Error in fix.by(by.y, y) : 'by' must specify valid column(s)

It won't work either.

Does anyone have any idea how to solve this?

Regards,

Johannes.
#
Hi,

How about this:
  Marker    Animal        Y x
1  P1001 194073197 0.021088 2
2  P1002 194073197 0.021088 1
3  P1004 194073197 0.021088 2
4  P1005 194073197 0.021088 0
5  P1006 194073197 0.021088 2
6  P1007 194073197 0.021088 0

This ignores Animal, and that may or may not be what you want -
it wasn't clear from your question.

But your error is due to memory limitations. That could come from
specifying the wrong merge, or from having files larger than your
computer can handle. This is a good job for a proper database.

If you just include SNP1$x, there is no Marker column to merge on. You
need to include at least two columns: Marker and x.
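
For readers following along, here is a minimal sketch of that kind of merge. It is not necessarily the call used above, since the original code did not survive in this archive. The data frames are rebuilt from the six rows shown in the question, and dropping P1004 from SNP1 (the name SNP1_part is made up here) is purely an assumption to make the NA fill visible.

```r
# Toy copies of the two data frames, rebuilt from the rows in the question.
SNP4 <- data.frame(
  Animal = 194073197,
  Marker = c("P1001", "P1002", "P1004", "P1005", "P1006", "P1007"),
  Y      = 0.021088,
  stringsAsFactors = FALSE
)
SNP1 <- data.frame(
  Animal = 194073197,
  Marker = c("P1001", "P1002", "P1004", "P1005", "P1006", "P1007"),
  x      = c(2, 1, 2, 0, 2, 0),
  stringsAsFactors = FALSE
)

# Assumption for illustration only: pretend SNP1 has no record for P1004,
# so that the merged result has to show an NA there.
SNP1_part <- SNP1[SNP1$Marker != "P1004", ]

# Keep the key column next to x; SNP1$x on its own has nothing to merge on.
# all.x = TRUE keeps every row of SNP4 and fills unmatched x values with NA.
SNP5 <- merge(SNP4, SNP1_part[, c("Marker", "x")],
              by = "Marker", all.x = TRUE)

# If a Marker value occurs for many animals in both tables, merging on
# Marker alone pairs every such row with every other one. Merging on both
# keys keeps the result at one row per Animal/Marker combination.
SNP5_both <- merge(SNP4, SNP1_part,
                   by = c("Animal", "Marker"), all.x = TRUE)
```

If Marker is repeated across animals in the full data, the second form is probably the one to try first, because a many-to-many match on Marker alone is one plausible reason for the result growing far beyond the size of either input.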
On Wed, Apr 22, 2009 at 3:30 AM, joe1985 <johannes at dsr.life.ku.dk> wrote:

  
    
5 days later
#
You are exceeding your maximum memory here, so R will not be able to do
that. Dump both tables into a database such as MySQL and then run the query
either from RMySQL or from MySQL directly. Then output the result and import
it back into R.

That will take care of the merge, but I am not sure what will happen when
you actually try to run some stats on the object. It is very likely the
operation will exceed memory again.
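
A rough sketch of the database route, with one substitution stated plainly: it uses SQLite through the DBI and RSQLite packages instead of MySQL/RMySQL, only so that it runs without a database server. It assumes both packages are installed. The table names snp4 and snp1 are made up for the example, and P1004 is left out of SNP1 as an assumption so the join produces an NA.

```r
library(DBI)  # assumes the DBI and RSQLite packages are installed

# Toy data rebuilt from the rows in the question; SNP1 deliberately lacks
# P1004 (an assumption for illustration) so the join has a gap to fill.
SNP4 <- data.frame(
  Animal = 194073197,
  Marker = c("P1001", "P1002", "P1004", "P1005", "P1006", "P1007"),
  Y      = 0.021088,
  stringsAsFactors = FALSE
)
SNP1 <- data.frame(
  Animal = 194073197,
  Marker = c("P1001", "P1002", "P1005", "P1006", "P1007"),
  x      = c(2, 1, 0, 2, 0),
  stringsAsFactors = FALSE
)

# An in-memory database keeps the sketch self-contained; for data that do
# not fit in RAM, a file path would be used in place of ":memory:".
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "snp4", SNP4)
dbWriteTable(con, "snp1", SNP1)

# LEFT JOIN keeps every row of snp4; x is NULL (NA in R) where snp1 has
# no matching Animal/Marker pair.
SNP5 <- dbGetQuery(con, "
  SELECT a.Animal, a.Marker, a.Y, b.x
  FROM snp4 AS a
  LEFT JOIN snp1 AS b
    ON a.Animal = b.Animal AND a.Marker = b.Marker
  ORDER BY a.Marker
")

dbDisconnect(con)
```

With MySQL the SQL would look much the same; only the connection call and the way the tables are loaded would differ.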

In the end you may have to write your own code which does not attempt to
load everything in memory; it could be in either R or a lower-level language.
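
One way this could look in base R, as a sketch only: read the large table from disk a few rows at a time, look up x with match(), and append each piece to an output file. It assumes the smaller lookup table (SNP1 here) does fit in memory and that SNP4 is available as a tab-separated file. The file names, chunk_size and the other helper names are hypothetical, and P1004 is left out of SNP1 to show the NA fill.

```r
# Toy data rebuilt from the question; SNP1 lacks P1004 for illustration.
SNP4 <- data.frame(
  Animal = 194073197,
  Marker = c("P1001", "P1002", "P1004", "P1005", "P1006", "P1007"),
  Y      = 0.021088,
  stringsAsFactors = FALSE
)
SNP1 <- data.frame(
  Animal = 194073197,
  Marker = c("P1001", "P1002", "P1005", "P1006", "P1007"),
  x      = c(2, 1, 0, 2, 0),
  stringsAsFactors = FALSE
)

# Stand-in for the large file on disk (hypothetical temporary paths).
snp4_file <- tempfile(fileext = ".txt")
out_file  <- tempfile(fileext = ".txt")
write.table(SNP4, snp4_file, sep = "\t", row.names = FALSE, quote = FALSE)

# Lookup key for the table that is assumed to fit in memory.
key_snp1 <- paste(SNP1$Animal, SNP1$Marker)

chunk_size <- 4  # tiny for the toy data; far larger for real data
con <- file(snp4_file, open = "r")
header <- strsplit(readLines(con, n = 1), "\t")[[1]]
first <- TRUE
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break
  chunk <- read.table(text = lines, sep = "\t", col.names = header,
                      stringsAsFactors = FALSE)
  # match() returns NA for keys missing from SNP1, which gives the NA fill.
  chunk$x <- SNP1$x[match(paste(chunk$Animal, chunk$Marker), key_snp1)]
  # Write the header once, then append the later chunks below it.
  write.table(chunk, out_file, sep = "\t", row.names = FALSE, quote = FALSE,
              col.names = first, append = !first)
  first <- FALSE
}
close(con)
```

Only one chunk of SNP4 is held in memory at a time, so the peak usage depends on chunk_size and the size of SNP1, not on the size of the merged result.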

If you have SAS it will probably work, as it deals well with large sets in
long format. Depending on what you do, R may be able to deal with it after a
reshape() to a wide format.
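
As a small illustration of that last point, here is a sketch of reshape() taking the long SNP1 layout (one row per Animal/Marker pair) to a wide one (one row per Animal, one column per Marker). It uses only the six rows shown in the question; whether the wide form suits the intended analysis is an open question.

```r
# Long format, rebuilt from the rows in the question.
SNP1 <- data.frame(
  Animal = 194073197,
  Marker = c("P1001", "P1002", "P1004", "P1005", "P1006", "P1007"),
  x      = c(2, 1, 2, 0, 2, 0),
  stringsAsFactors = FALSE
)

# One row per Animal; each Marker becomes its own column named x.<Marker>.
SNP1_wide <- reshape(SNP1, idvar = "Animal", timevar = "Marker",
                     direction = "wide")
```

In the wide table an animal with no record for a given marker simply gets NA in that marker's column, so no separate merge step is needed to produce the NAs.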