New Stack

Don't.  This is a classic mistake by newcomers to R that leads to abysmal performance. The best alternative is to compute one column at a time: initialize your data frame with the inputs to your calculations, then compute each output column as a vector expression of the inputs, without looping. For example:

dta <- data.frame( X=1:10 )
dta$Y <- 2*dta$X+3
dta$clip <- dta$Y > 13
dta$Yc1 <- ifelse( dta$clip, 13, dta$Y )
The ifelse function computes both possible answers for every element of the result and then chooses between them. If you would prefer to do those computations only for selected rows, you can use indexed assignment:

dta$Yc2 <- dta$Y
dta$Yc2[dta$clip] <- 13
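
As a quick check, here is a small self-contained sketch that rebuilds the example and confirms that the ifelse version and the indexed-assignment version produce the same column. The Yc3 column using pmin is my own addition for comparison, not part of the example above; it is just another vectorized way to clip at an upper bound.

```r
# Rebuild the example data frame from scratch
dta <- data.frame(X = 1:10)
dta$Y <- 2 * dta$X + 3
dta$clip <- dta$Y > 13

# Approach 1: ifelse evaluates both branches for every row, then picks one
dta$Yc1 <- ifelse(dta$clip, 13, dta$Y)

# Approach 2: copy the column, then overwrite only the selected rows
dta$Yc2 <- dta$Y
dta$Yc2[dta$clip] <- 13

# Assumed extra (not in the original example): pmin clips at an upper bound
dta$Yc3 <- pmin(dta$Y, 13)

# All three columns should agree
same12 <- identical(dta$Yc1, dta$Yc2)
same13 <- identical(dta$Yc1, dta$Yc3)
print(dta)
```

Indexed assignment is usually the better fit when the replacement value is expensive to compute, since only the selected rows are touched.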

If you absolutely must compute your results in little chunks of rows (e.g. one row at a time) then at least store them in a list as you go and collapse them into one data frame all at once using, say, do.call with rbind. This avoids allocating a whole sequence of data frames with sizes from 1 to n, which is a very inefficient use of memory.
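
A minimal sketch of the collect-then-combine pattern, assuming do.call(rbind, ...) as the combining step. The compute_row function is hypothetical; it stands in for whatever per-row calculation you cannot vectorize.

```r
# Hypothetical per-row calculation returning a one-row data frame
compute_row <- function(x) {
  data.frame(X = x, Y = 2 * x + 3)
}

n <- 10
rows <- vector("list", n)        # preallocate the list once

for (i in seq_len(n)) {
  rows[[i]] <- compute_row(i)    # store each chunk; no data frame is grown here
}

# Combine all chunks in a single step at the end
result <- do.call(rbind, rows)
print(result)
```

The loop only fills slots in a preallocated list, so the single rbind at the end is the only place a full-size data frame is built.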
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
Anup khanal <zanup at hotmail.com> wrote: