creating a database

4 messages · dxc13, jim holtman, elijah wright +1 more

#
useR's,

I am writing a program in which the input can be multidimensional.  As of
now, to hold the input, I have created an n by m matrix where n is the
number of observations and m is the number of variables.  The data that I
could potentially use can contain well over 20,000 observations.  

Can a simple matrix be used for this, or would it be better and more
efficient to create an external database to hold the data?  If so, should
the database be created using C, and how would I do this (seeing as I
have never programmed in C)?

Any help would be greatly appreciated.  Thank you

Derek
#
What are you intending to do with the data?  How big is 'm'?  How do
you want to access the data?  You can always put it in a SQL database
that R can access and then pull out the rows that you are interested
in.  If 'm' is 100 and you are just keeping numeric data, this will
only require about 16MB of memory (20,000 x 100 entries at 8 bytes
each), so you can just keep it in memory.
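
The memory estimate above can be checked directly in R. This is only a rough sketch: m = 100 is the hypothetical value used in this reply, and the matrix is filled with zeros as a stand-in for real data.

```r
# Assumed dimensions: n observations by m numeric variables (m = 100 is hypothetical).
n <- 20000
m <- 100

# Back-of-envelope estimate: 8 bytes per double-precision entry.
estimated_bytes <- 8 * n * m

# Measure an actual matrix of that shape; object.size() also counts a small header.
x <- matrix(0, nrow = n, ncol = m)
actual_bytes <- as.numeric(object.size(x))

cat("estimated MB:", estimated_bytes / 1e6, "\n")
cat("actual MB:   ", actual_bytes / 1e6, "\n")
```

The two figures should differ only by the few hundred bytes R uses for the object header and the dim attribute.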

Some more information about the characteristics of the data and what
you want to do with it is required to determine what might be the
appropriate method for storing/accessing it.
On Dec 17, 2007 10:10 PM, dxc13 <dxc13 at health.state.ny.us> wrote:

  
    
#
You don't want to be down at the C level, most likely:  it would be much 
more straightforward and programmer-efficient to use one of the available 
bindings to one of the available open-source databases.

R has useful / usable bindings to PostgreSQL, SQLite, and MySQL, among 
many others.

These are, however, more generally useful when you reach the point that 
you simply can't manage the volume of your data in R objects or in data 
frames. [And, well, you can go a LONG way with intelligently named R 
objects.  :-)]
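
For illustration, here is roughly what going through one of those bindings could look like. It is a sketch only, assuming the DBI and RSQLite packages are installed; the table name, column names, and random data are made up for the example.

```r
library(DBI)

# Hypothetical data: 20,000 observations of three numeric variables.
set.seed(1)
obs <- data.frame(
  id = 1:20000,
  x1 = rnorm(20000),
  x2 = rnorm(20000),
  x3 = rnorm(20000)
)

# An in-memory SQLite database; passing a file path instead would make it persistent.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "observations", obs)

# Pull only the rows of interest back into R as a data frame.
subset_rows <- dbGetQuery(
  con,
  "SELECT * FROM observations WHERE id BETWEEN 101 AND 110"
)

dbDisconnect(con)
```

No C is involved at any point; the query result comes back as an ordinary data frame.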

--elijah
#
If all your entries are double precision then you are
using 8 bytes per entry, so 20,000*m entries are just
160,000*m bytes, i.e. about 160*m KB. If your m is
100 you get 16 MB, which is not that much (especially
if you pre-allocate it only once). So just use the
matrix and don't worry!
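
A minimal sketch of the pre-allocation idea, using n for observations and m for variables as in the original question. The batch size and the rnorm() call are placeholders for however the real input arrives.

```r
n <- 20000  # observations
m <- 100    # variables (hypothetical value)

# Allocate the full matrix once, then fill it in place.
x <- matrix(NA_real_, nrow = n, ncol = m)

chunk_size <- 5000  # hypothetical batch size
for (start in seq(1, n, by = chunk_size)) {
  rows <- start:min(start + chunk_size - 1, n)
  # Stand-in for reading one batch of real input.
  x[rows, ] <- matrix(rnorm(length(rows) * m), ncol = m)
}
```

Filling a matrix that already has its final size avoids the repeated copying that comes from growing it with rbind() inside a loop.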
--- dxc13 <dxc13 at health.state.ny.us> wrote:

            
http://www.nabble.com/creating-a-database-tp14375875p14375875.html