for loop performance
I am running some simulations in R that involve reading in several hundred datasets, computing some statistics on each, and writing those statistics to file. I have noticed that processing a dataset (or, say, a batch of 100 datasets) seems to take longer as the simulation progresses.
Reading data, e.g. with read.table, can be slow because it does a fair bit of work checking the content, guessing data types, etc. So I guess the question is: how is your data stored (files, in what format, a database?) and how do you read it into R? Once we know this, there may be tricks to speed up the data import.
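One such trick, sketched below: if all the files share the same layout, you can pass colClasses (and nrows, if known) to read.table so it skips the type-guessing pass entirely. The file name and two-column layout here are made up for illustration.

```r
# Hypothetical two-column file, created just for this example
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:5, value = rnorm(5)),
            tmp, row.names = FALSE)

# Slower: read.table inspects the file to guess each column's class
d1 <- read.table(tmp, header = TRUE)

# Faster: supply colClasses (and nrows when known) so no guessing is needed
d2 <- read.table(tmp, header = TRUE,
                 colClasses = c("integer", "numeric"),
                 nrows = 5)
```

The gain is small on a toy file but adds up over several hundred datasets.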
I am curious to know if this has to do with how R processes code in loops or if it might be due to memory usage issues (e.g., repeatedly reading data into the same matrix).
Probably not - I would guess it's the parsing of the input data that is slow. cu Philipp
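If parsing is indeed the bottleneck, one standard workaround is scan(), which does less bookkeeping than read.table when you already know the layout. A small illustrative comparison (the two-column numeric file is an assumption for the example):

```r
# Hypothetical file of two numeric columns
tmp <- tempfile()
write.table(matrix(rnorm(20), ncol = 2), tmp,
            row.names = FALSE, col.names = FALSE)

# read.table builds a data frame, guessing column types as it goes
d <- read.table(tmp)

# scan() just parses the values into a list of two numeric vectors
s <- scan(tmp, what = list(numeric(), numeric()), quiet = TRUE)
```

You lose the data-frame conveniences, but for hundreds of uniformly formatted files the lower-level parse can be noticeably faster.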
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/