
Is any database particularly better at "exchanging" large datasets with R?

Some time back, Thomas wrote:
Firstly, as has already been suggested, nothing beats testing whatever
setup you have in mind.
Secondly, assuming this data is just for your use rather than a shared
database which will be updated by many people, almost everything that
has been mentioned so far is basically unsuitable.  SQL Server, MySQL,
Postgres, Oracle etc. devote most of their many megabytes of code to a
vast number of features like access control, transaction rollback,
stored procedures, logging etc. etc., which are almost certainly of no
use to you, will slow things down and cause admin headaches.  If you
really want SQL, look at SQLite.  It is free.  It is just an SQL
storage system without any of the overhead you don't need (it is a
2.3MB library on my computer, and that is for four architectures!) and
will scale to TB-size datasets.
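As a rough illustration of how little ceremony SQLite needs from R, here is a minimal sketch using the DBI and RSQLite packages (the package choice is my assumption, not something named in the post; install with install.packages("RSQLite")):

```r
# Minimal sketch: store a data frame in SQLite and query a subset back.
# Assumes the DBI and RSQLite packages are installed (an assumption).
library(DBI)

# ":memory:" keeps the database in RAM; use a file path for on-disk storage
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Write a small example data frame as a table
dbWriteTable(con, "measurements", data.frame(id = 1:5, value = (1:5) / 10))

# Pull back only the rows you need -- the point of using a database at all
big_ids <- dbGetQuery(con, "SELECT id, value FROM measurements WHERE id > 2")
print(big_ids)

dbDisconnect(con)
```

With a file path instead of ":memory:", the same few lines give you persistent storage with no server process to administer.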
If all you want is to store and access large amounts of data, then you
probably don't want SQL at all.  Someone mentioned BLOBs, but that
could be hard work to program.  You might do well to look at HDF5
(http://hdf.ncsa.uiuc.edu/index.html).  This is a storage format
specifically designed for storing very large amounts of
scientific/engineering data.  Again it is free and open source, and it
has an R interface.
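To give a flavour of what an R interface to HDF5 looks like, here is a minimal sketch assuming the rhdf5 package from Bioconductor (my assumption; the post does not name a specific package, and older CRAN packages offered a similar interface):

```r
# Minimal sketch: write a numeric matrix to an HDF5 file and read it back.
# Assumes the rhdf5 package (Bioconductor) is installed -- an assumption,
# not necessarily the interface the post refers to.
library(rhdf5)

f <- tempfile(fileext = ".h5")
h5createFile(f)

# HDF5 stores named, typed datasets; well suited to large numeric arrays
m <- matrix(rnorm(1e4), ncol = 100)
h5write(m, f, "simulation_matrix")        # write the matrix as a dataset
back <- h5read(f, "simulation_matrix")    # read the whole dataset back

h5closeAll()
```

h5read can also be given an index argument to read just a slab of a dataset, which is what makes HDF5 practical for files far larger than RAM.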
Cheers
Bill Northcott