-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Alex Ruiz Euler
Sent: Wednesday, August 17, 2011 3:54 PM
To: r-help at r-project.org
Subject: [R] More efficient option to append()?
Dear R community,
I have a 2 million by 2 matrix that looks like this:
x <- sample(1:15, 2000000, replace = TRUE)
y <- sample(1:10 * 1000, 2000000, replace = TRUE)
        x     y
[1,]   10  4000
[2,]    3  1000
[3,]    3  4000
[4,]    8  6000
[5,]    2  9000
[6,]    3  8000
[7,]    2 10000
(...)
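(For completeness, a sketch of how the matrix shown above is presumably built from the two sampled vectors; `cbind()` is an assumption on my part, and the values will differ from run to run since `sample()` is random.)

```r
# Bind the two sampled vectors column-wise into the 2,000,000 x 2 matrix
# described above (values vary per run; no seed is set in the original):
x <- sample(1:15, 2000000, replace = TRUE)
y <- sample(1:10 * 1000, 2000000, replace = TRUE)
m <- cbind(x, y)
dim(m)  # 2000000 rows, 2 columns
```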
The first column is a population expansion factor for the number in the
second column (household income). I want to expand the second column by
the first, so that I end up with a vector beginning with 10
observations of 4000, then 3 observations of 1000, and so on. To my mind
the natural approach is to create a NULL vector and append the
expansions:
myvar <- NULL
myvar <- append(myvar, replicate(x[1], y[1]), 1)
for (i in 2:length(x)) {
  myvar <- append(myvar, replicate(x[i], y[i]), sum(x[1:i]) + 1)
}
to end up with a vector of length sum(x), which in my real database
corresponds to 22 million observations.
This works fine -- if I only run it for the first, say, 1000
observations. If I try to perform it on all 2 million observations it
takes far too long to be useful (I left it running for 11 hours
yesterday to no avail).
I know R performs well with operations on relatively large vectors. Why
is this so inefficient? And what would be the smart way to do this?
Thanks in advance.
Alex
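A sketch of one vectorized alternative: `rep()` accepts a vector-valued `times` argument and performs exactly this expansion in a single call. The loop above is slow because each `append()` reallocates and copies the entire vector built so far, so the total work grows quadratically with the number of rows.

```r
# Each append() call copies the whole vector built so far (O(n^2) total
# work). rep() with a vector `times` argument does the same expansion in
# one vectorized call: each y[i] is repeated x[i] times, in order.
x <- sample(1:15, 2000000, replace = TRUE)
y <- sample(1:10 * 1000, 2000000, replace = TRUE)

myvar <- rep(y, times = x)
length(myvar) == sum(x)  # TRUE
```

On the small example above, `rep(c(4000, 1000, 4000), times = c(10, 3, 3))` yields 10 copies of 4000, then 3 of 1000, then 3 of 4000.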