efficient ways to dynamically grow a dataframe

3 messages · Matteo Richiardi, jim holtman, R. Michael Weylandt

#
Hi,
I'm trying to write a small microsimulation in R: that is, I have a
dataframe with info on N individuals for the base-year and I have to
grow it dynamically for T periods:

df = data.frame(
 id = 1:N,
 x =....
)

The most straightforward way to solve the problem that came to my mind
is to create for every period a new dataframe:

for(t in 1:T){
 for(i in 1:N){
  row = data.frame(
   id = i,
   t = t,
   x = ...
  )
  df = rbind(df,row)
 }
}

This is very inefficient, and my PC gets stuck as soon as N is
raised above a few thousand.
As an alternative, I created an empty dataframe for all the projected
periods, and then filled it:

df1 = data.frame(
 id = rep(1:N,T),
 t = rep(1:T, each = N),
 x = rep(NA,N*T)
)

for(t in 1:T){
 for(i in 1:N){
  x = ...
  df1[df1$id==i & df1$t==t,"x"] = x
 }
}
df = rbind(df,df1)

This is also too slow, and my PC gets stuck. I don't want to go for
a matrix, because I'd lose the column names and everything would
become too error-prone.
Any suggestions on how to do it?
Thanks in advance,
Matteo
#
First, dataframes can be much slower than matrices, for example, if
you are changing/accessing values a lot.  I would suggest that you use
a matrix, since it seems that all your values are numeric.  Allocate a
large empty matrix to start (hopefully as large as you need).  If you
exceed this, you have the option of 'rbind'ing more empty rows on and
continuing.  This might depend on how large your final matrix might be
(you did not state the boundary conditions).
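
A minimal sketch of the preallocation idea above, under stated
assumptions: the sizes, the base-year values and the transition rule
for x (previous value plus noise) are made up for illustration, since
the original post elides them, and T is renamed n_periods so that the
shorthand for TRUE is not masked.

```r
set.seed(1)
N <- 1000L          # number of individuals (assumed)
n_periods <- 10L    # the thread's T (assumed value)

# Allocate all rows once: base year plus n_periods projected periods.
# The columns are named, so they can be addressed as in a data frame.
sim <- matrix(NA_real_, nrow = N * (n_periods + 1L), ncol = 3L,
              dimnames = list(NULL, c("id", "t", "x")))

# Base year (t = 0) occupies the first N rows.
sim[1:N, "id"] <- 1:N
sim[1:N, "t"]  <- 0
sim[1:N, "x"]  <- rnorm(N)   # hypothetical base-year values

for (t in seq_len(n_periods)) {
  prev <- ((t - 1L) * N + 1L):(t * N)    # rows holding period t - 1
  curr <- (t * N + 1L):((t + 1L) * N)    # rows holding period t
  sim[curr, "id"] <- 1:N
  sim[curr, "t"]  <- t
  # Hypothetical transition rule, computed for all individuals at once.
  sim[curr, "x"]  <- sim[prev, "x"] + rnorm(N)
}

# Convert once at the end if a data frame is needed downstream.
df <- as.data.frame(sim)
```

Because the rows are laid out period by period, each block is
addressed by position rather than by a logical scan over the whole
table. If the final size is not known in advance, a block of NA rows
can be appended with rbind() whenever the matrix fills up.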

On Thu, Dec 1, 2011 at 6:34 AM, Matteo Richiardi
<matteo.richiardi at unito.it> wrote:

  
    
#
I'd also suggest you read Circle 2 ("Growing Objects") of "The R
Inferno" (just google it), which has some helpful tips on how to deal
with these sorts of problems.
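
One common way to avoid growing a data frame row by row, sketched here
under assumptions: build one data frame per period with vectorised
code, keep them in a preallocated list, and bind them once at the end.
The sizes, the base-year values and the update rule for x are
hypothetical, and T is again written as n_periods.

```r
set.seed(1)
N <- 1000L          # number of individuals (assumed)
n_periods <- 10L    # the thread's T (assumed value)

# Base-year data frame with hypothetical x values.
base <- data.frame(id = 1:N, t = 0L, x = rnorm(N))

# One list slot per period, allocated up front.
periods <- vector("list", n_periods + 1L)
periods[[1L]] <- base

for (t in seq_len(n_periods)) {
  prev <- periods[[t]]
  periods[[t + 1L]] <- data.frame(
    id = prev$id,
    t  = t,
    x  = prev$x + rnorm(N)   # hypothetical transition rule
  )
}

# A single rbind over the whole list instead of N * n_periods small ones.
df <- do.call(rbind, periods)
```

This keeps the data frame interface and its column names while
replacing the inner loop over individuals with whole-column
operations.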

Also, did you know that matrices can have column names and that
rbind() preserves them? E.g.,

m <- matrix(1:6, 3); colnames(m) <- letters[1:2]

print(m)

print(rbind(m, c(10, 11)))

Michael
On Thu, Dec 1, 2011 at 9:02 AM, jim holtman <jholtman at gmail.com> wrote: