speeding up functions for large datasets
3 messages · Freja.Vamborg@astrazeneca.com, Brian Ripley, Jean Eid

Dear R-helpers,

I'm dealing with large datasets, say tables of 60,000 rows by 12 columns or so, and some of my functions are too slow, so I'm trying to find ways to speed them up. I've found that, for instance, for-loops are slow in R (both by testing and by searching through the mail archives). Are there any other well-known constructs that are slow in R, maybe at the level of data representation, code-writing, or reading in the data? I've also tried incorporating C code, which works well, but I'd also like to find other, perhaps more "shortcut", ways.

Thanks in advance, Freja!
On Fri, 6 Aug 2004 Freja.Vamborg at astrazeneca.com wrote:
Dear R-helpers, I'm dealing with large datasets, say tables of 60,000 rows by 12 columns or so, and some of my functions are too slow, so I'm trying to find ways to speed them up. I've found that, for instance, for-loops are slow in R (both by testing and by searching through the mail archives).
I don't think that is really true, but it is the case that using row-by-row operations in your situation would be slow *if they are unnecessary*. It is a question of choosing the right algorithmic approach, not whether it is implemented by for-loops or lapply or ....
Are there any other well-known constructs that are slow in R, maybe at the level of data representation, code-writing, or reading in the data? I've also tried incorporating C code, which works well, but I'd also like to find other, perhaps more "shortcut", ways.
`S Programming' (see the R FAQ) has a whole chapter on this sort of thing, with examples. More generally, you want to take a `whole object' view and use indexing and other vectorized operations. Note also that what is slow changes with the version of R, and especially with how much memory you have installed. The first step is to get enough RAM.
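As a minimal sketch of the `whole object' view (my illustration, not from the thread — the column and threshold are made up): the same question, "which rows have a value in column 3 above 1.5?", answered row by row and then with one vectorized comparison over the whole column.

```r
## Hypothetical example on a table of the size Freja describes.
x <- matrix(rnorm(60000 * 12), nrow = 60000, ncol = 12)

## Row-by-row: one loop iteration and one subscripting call per row.
flag <- logical(nrow(x))
for (i in 1:nrow(x)) flag[i] <- x[i, 3] > 1.5

## Whole-object: a single vectorized comparison on the entire column.
flag2 <- x[, 3] > 1.5

identical(flag, flag2)  # TRUE -- same answer, far fewer R-level operations
```

The second form does the same work in one pass through compiled code instead of 60,000 interpreted iterations, which is the algorithmic choice Ripley is pointing at.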
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
You might want to turn your data into a matrix; you get much, much faster for-loops that way.

Jean
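A short sketch of why this helps (my illustration, not from the thread): subscripting a data frame inside a loop dispatches the comparatively heavy data-frame `[` method on every iteration, while matrix subscripting is a cheap primitive operation, so the same loop runs much faster on a matrix.

```r
## Hypothetical timing sketch; absolute times will vary by machine.
df <- as.data.frame(matrix(rnorm(60000 * 12), ncol = 12))
m  <- as.matrix(df)

## Identical row-wise loops; only the data representation differs.
system.time(for (i in 1:nrow(df)) df[i, 1])  # data frame: method dispatch each row
system.time(for (i in 1:nrow(m))  m[i, 1])   # matrix: primitive subscript, much faster
```

The caveat is that a matrix holds a single type, so this conversion only applies when all twelve columns are numeric (or can sensibly be coerced).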