First, data frames can be much slower than matrices, for example if
you are changing or accessing values a lot. I would suggest that you use
a matrix, since it seems that all your values are numeric. Allocate a
large empty matrix to start (ideally as large as you will need). If you
exceed this, you have the option of 'rbind'ing more empty rows on and
continuing. Whether that is needed depends on how large your final
matrix might be (you did not state the boundary conditions).
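
A minimal sketch of what I mean, under assumptions of my own: the sizes,
the starting values and the update rule are placeholders (your post elides
how x is computed), and 'n_periods' stands in for your T so that the
shorthand for TRUE is not masked. Note that a matrix can carry column
names through its dimnames, so you can still index by "id", "t" and "x".

```r
# Placeholder sizes; substitute your own N and number of periods.
N <- 1000          # individuals
n_periods <- 10    # periods to simulate (T in the original post)

# Preallocate one numeric matrix for every (individual, period) row.
# The column names live in the dimnames, so sim[, "x"] works as usual.
sim <- matrix(NA_real_, nrow = N * n_periods, ncol = 3,
              dimnames = list(NULL, c("id", "t", "x")))
sim[, "id"] <- rep(seq_len(N), times = n_periods)
sim[, "t"]  <- rep(seq_len(n_periods), each = N)

set.seed(1)
x_prev <- rnorm(N)   # placeholder base-year values, not from the post

for (t in seq_len(n_periods)) {
  # Rows of period t are contiguous, so compute them by position
  # instead of searching the whole table with id == i & t == t.
  rows <- ((t - 1) * N + 1):(t * N)
  x_new <- x_prev + rnorm(N)   # placeholder update rule, vectorised over i
  sim[rows, "x"] <- x_new
  x_prev <- x_new
}

# Hypothetical helper: append empty rows if the preallocation runs out.
grow <- function(m, extra) {
  rbind(m, matrix(NA_real_, nrow = extra, ncol = ncol(m)))
}

# Convert once at the end if a data frame is wanted downstream.
df <- as.data.frame(sim)
```

The point of the design is that the matrix is allocated once and each
period writes into a block of rows addressed by position, so nothing is
copied or searched inside the loop.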
On Thu, Dec 1, 2011 at 6:34 AM, Matteo Richiardi
<matteo.richiardi at unito.it> wrote:
Hi,
I'm trying to write a small microsimulation in R: that is, I have a
dataframe with info on N individuals for the base-year and I have to
grow it dynamically for T periods:
df = data.frame(
  id = 1:N,
  x =....
)
The most straightforward way to solve the problem that came to my mind
is to create for every period a new dataframe:
for(t in 1:T){
  for(i in 1:N){
    row = data.frame(
      id = i,
      t = t,
      x = ...
    )
    df = rbind(df,row)
  }
}
This is very inefficient and my PC gets stuck immediately as N is
raised above a few thousand.
As an alternative, I created an empty dataframe for all the projected
periods, and then filled it:
df1 = data.frame(
  id = rep(1:N,T),
  t = rep(1:T, each = N),
  x = rep(NA,N*T)
)
for(t in 1:T){
  for(i in 1:N){
    x = ...
    df1[df1$id==i & df1$t==t,"x"] = x
  }
}
df = rbind(df,df1)
This is also too slow, and my PC gets stuck. I don't want to go for
a matrix, because I'd lose the column names and everything would
become too error-prone.
Any suggestions on how to do it?
Thanks in advance,
Matteo
--
Matteo Richiardi
University of Turin
Faculty of Law
Department of Economics "Cognetti De Martiis"
via Po 53, 10124 Torino
Email: matteo.richiardi at unito.it
Tel. +39 011 670 3870
Web page: http://www.personalweb.unito.it/matteo.richiardi/