-----Original Message-----
From: Ping-Hsun Hsieh [mailto:hsiehp at ohsu.edu]
Sent: Friday, May 15, 2009 9:58 AM
To: Peter Alspach; William Dunlap; hadley wickham
Cc: r-help at r-project.org
Subject: RE: [R] memory usage grows too fast
Thanks to Peter, William, and Hadley for their help.
Your code is much more concise than mine. :P
William's and Hadley's suggestions are the same. Here is their code.
f <- function(dataMatrix) rowMeans(dataMatrix == "02")
And Peter's code is the following.
apply(yourMatrix, 1, function(x)
length(x[x==yourPattern]))/ncol(yourMatrix)
In terms of running time, the first one ran faster than
the latter on my dataset (2.5 mins vs. 6.4 mins).
The memory consumption of the first one, however, is much
higher than that of the latter (>8G vs. ~3G).
Any thoughts? My guess is that rowMeans creates extra copies
to perform its calculation, but I am not sure.
I am also interested in understanding ways to handle
memory issues. Hope someone could shed light on this for me. :)
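If memory is the main constraint, one option (just a sketch, not benchmarked on the full 1000x1000000 data; rowFreq_chunked and chunkSize are made-up names for illustration) is to apply rowMeans block-by-block, so that only one chunkSize-row slice of the logical comparison matrix is allocated at a time:

```r
# Sketch: per-row frequency of a pattern, computed in row blocks, so the
# chunkSize x ncol(dataMatrix) logical matrix is the largest temporary.
rowFreq_chunked <- function(dataMatrix, pattern = "02", chunkSize = 1000) {
  n <- nrow(dataMatrix)
  out <- numeric(n)
  for (s in seq(1, n, by = chunkSize)) {
    e <- min(s + chunkSize - 1, n)
    out[s:e] <- rowMeans(dataMatrix[s:e, , drop = FALSE] == pattern)
  }
  out
}
```

The result should match the one-liner above; the chunk size trades a little speed (more slicing) for a bounded peak allocation.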
Best,
Mike
-----Original Message-----
From: Peter Alspach [mailto:PAlspach at hortresearch.co.nz]
Sent: Thursday, May 14, 2009 4:47 PM
To: Ping-Hsun Hsieh
Subject: RE: [R] memory usage grows too fast
Tena koe Mike
If I understand you correctly, you should be able to use
something like:
apply(yourMatrix, 1, function(x)
length(x[x==yourPattern]))/ncol(yourMatrix)
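That suggestion can be checked directly on the toy data from the original message (assuming the entries are character strings such as "02"):

```r
# Quick check of the apply() suggestion on the toy data from the
# original message (yourMatrix / yourPattern as named above).
yourMatrix <- matrix(c("01","02","02","00",
                       "02","02","02","01",
                       "00","02","01","01"), nrow = 3, byrow = TRUE)
yourPattern <- "02"
res <- apply(yourMatrix, 1, function(x) length(x[x == yourPattern])) /
  ncol(yourMatrix)
res  # 0.50 0.75 0.25
```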
I see you've divided by nrow(yourMatrix) so perhaps I am missing
something.
HTH ...
Peter Alspach
-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Ping-Hsun Hsieh
Sent: Friday, 15 May 2009 11:22 a.m.
To: r-help at r-project.org
Subject: [R] memory usage grows too fast
Hi All,
I have a 1000x1000000 matrix.
The calculation I would like to do is actually very simple:
for each row, calculate the frequency of a given pattern. For
example, a toy dataset is as follows.
Col1 Col2 Col3 Col4
01 02 02 00 => Freq of "02" is 0.5
02 02 02 01 => Freq of "02" is 0.75
00 02 01 01 ...
My code to find the pattern "02" is quite simple, as follows.
OccurrenceRate_Fun <- function(dataMatrix)
{
  tmp <- NULL
  # one column of match() results per row of dataMatrix; NA where no match
  tmpMatrix <- apply(dataMatrix, 1, match, "02")
  for (i in 1:ncol(tmpMatrix))
  {
    # count of matches in row i, divided by the row length
    tmpRate <- table(tmpMatrix[, i])[[1]] / nrow(tmpMatrix)
    tmp <- c(tmp, tmpRate)
  }
  rm(tmpMatrix)
  rm(tmpRate)
  gc()
  return(tmp)
}
The problem is that the memory usage grows very fast and is
hard to handle on machines with less RAM.
Could anyone please give me some comments on how to reduce
the space complexity in this calculation?
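For comparison, here is a sketch of the same calculation (rowFreq is a hypothetical name, and it assumes character entries as in the toy data) that drops the per-column table() loop in favour of a vectorized count of the match() hits; it also handles rows with zero matches, where table(...)[[1]] would fail:

```r
# Sketch: count match() hits per column with colSums(!is.na(.)) instead of
# growing tmp one element at a time and calling table() per column.
rowFreq <- function(dataMatrix, pattern = "02") {
  tmpMatrix <- apply(dataMatrix, 1, match, pattern)  # NA where no match
  colSums(!is.na(tmpMatrix)) / nrow(tmpMatrix)
}
```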
Thanks,
Mike