Performance & capacity characteristics of R?

I won't speak for Karsten, but I will describe my own use of R for
(potentially) large datasets, to give you an idea of what at least
one user with large datasets is attempting to do.

The application is a simulation of radar detection and tracking of
aircraft.  The data collected consist of radar detections and tracks.
There are potentially (though not typically) 200 radars by 200 aircraft
in the simulation.  In this extreme case, I expect to collect
approximately 2 GB of data in a 4-hour simulation run.  Fortunately,
this is not typical; I'm trying to get a better handle on what is
typical.

There are two primary uses.  One is to produce various plots, such as
   a) detections and tracks against time, and
   b) detections and tracks against geographic location, relative to
      the true aircraft position.
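
For plot (a), one simple way to reduce millions of detections to something plottable is to count them per fixed time bin.  The following is a minimal Python sketch, not the R code I use; the record layout (`time_s`, `radar_id`, `aircraft_id`) and the helper `counts_per_bin` are hypothetical, since the actual 20-variable schema isn't described here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    # Hypothetical record layout; the real detection schema has 20 variables.
    time_s: float      # simulation time in seconds
    radar_id: int
    aircraft_id: int


def counts_per_bin(detections, bin_width_s):
    """Count detections in consecutive time bins of width bin_width_s."""
    counts = Counter(int(d.time_s // bin_width_s) for d in detections)
    if not counts:
        return []
    last = max(counts)
    # Return a dense list so that empty bins appear as zero on a time plot.
    return [counts.get(b, 0) for b in range(last + 1)]


dets = [
    Detection(5.0, 1, 7),
    Detection(12.0, 1, 7),
    Detection(14.5, 2, 7),
    Detection(31.0, 2, 9),
]
print(counts_per_bin(dets, 10.0))  # [1, 2, 0, 1]
```

Filtering `dets` by `aircraft_id` before binning gives one curve per aircraft.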

The other is to compute statistical measures across multiple runs.
I don't know all the details of which functions will be needed, but
a paired t-test has been mentioned.  Typical questions to be answered
are: How well did a jammer perform in reducing the number of
detections and tracks?  How much better did this aircraft perform
(in avoiding detection) than that other aircraft?  Which flight path
is better for avoiding detection and tracking?
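
To make the jammer question concrete, here is a minimal Python sketch of the paired t statistic, t = mean(d) / (sd(d) / sqrt(n)), where d is the per-pair difference.  It assumes runs can be paired (e.g. the same scenario run once without and once with the jammer); the detection counts below are made-up illustrative numbers, not simulation results.  In R itself, the equivalent is `t.test(with_jammer, without_jammer, paired = TRUE)`.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for matched samples."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Hypothetical detection counts for four paired runs.
without_jammer = [120, 98, 110, 105]
with_jammer = [90, 85, 95, 80]
t, df = paired_t(without_jammer, with_jammer)
print(round(t, 2), df)  # strongly negative t: fewer detections with the jammer
```

The p-value then comes from a t distribution with `df` degrees of freedom, which R's `t.test` reports directly.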

The largest datasets contain approximately 50 million observations of
20 variables for detections.  To handle these large datasets, I'm
counting on the fact that typical analyses focus on smaller time
ranges and on specific aircraft.  So I plan to preprocess the data,
selecting just the radars, aircraft, and time range of interest,
before loading the data file into R.
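
A rough estimate shows why the preprocessing step matters: assuming all 20 variables are stored as R numeric (8-byte double) vectors, 50 million rows need 50e6 x 20 x 8 = 8e9 bytes, about 7.45 GiB, before R makes any working copies.  The Python sketch below shows that arithmetic plus a streaming row filter; the field names and the `select` helper are hypothetical, and in Oracle the same selection would more naturally be a WHERE clause in the extract query.

```python
BYTES_PER_DOUBLE = 8  # R numeric vectors are 64-bit doubles


def in_memory_bytes(n_obs, n_vars, bytes_per_value=BYTES_PER_DOUBLE):
    """Approximate raw data size, ignoring per-object overhead."""
    return n_obs * n_vars * bytes_per_value


full = in_memory_bytes(50_000_000, 20)
print(full / 2**30)  # about 7.45 GiB for the largest dataset


def select(rows, radars, aircraft, t_start, t_end):
    """Yield only rows for the chosen radars and aircraft within [t_start, t_end)."""
    for row in rows:
        if (row["radar_id"] in radars
                and row["aircraft_id"] in aircraft
                and t_start <= row["time_s"] < t_end):
            yield row


# Hypothetical rows; in practice these would be streamed from the extract file.
rows = [
    {"time_s": 100.0, "radar_id": 1, "aircraft_id": 7},
    {"time_s": 250.0, "radar_id": 1, "aircraft_id": 7},  # outside time range
    {"time_s": 120.0, "radar_id": 3, "aircraft_id": 7},  # radar not selected
    {"time_s": 130.0, "radar_id": 1, "aircraft_id": 9},  # aircraft not selected
]
subset = list(select(rows, {1}, {7}, 0.0, 200.0))
```

Because `select` is a generator, only the matching rows are held in memory, so the subset written out for R stays small.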

Currently, the dataset is kept in Oracle; I hope to transition
to HDF (http://hdf.ncsa.uiuc.edu/).  Oracle has many advantages,
but it is very slow to load this much data.  I have yet to evaluate
HDF for large datasets.

--
Terry J. Westley, Principal Engineer
Veridian Engineering, Calspan Operations
P.O. Box 400, Buffalo, NY 14225
twestley at buffalo.veridian.com    http://www.veridian.com

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._