Skip to content
Prev 277 / 1559 Next

reducing RODBC odbcQuery memory use?

New to R, sorry if one or either of these is an inappropriate list for a
question like this below; please let me know if this is a general help
question.

Jill Willie 
Open Seas
Safeco Insurance
jilwil at safeco.com 

-----Original Message-----
From: WILLIE, JILL 
Sent: Thursday, January 25, 2007 2:27 PM
To: r-help at stat.math.ethz.ch
Subject: reducing RODBC odbcQuery memory use?

Basic Questions:

1.  Can I avoid having RODBC use so much memory (35 times the data size
or more) making a data.frame & then .rda file via. sqlQuery/save?  

2.  If not, is there some more appropriate way from w/in R to pull large
data sets (2-5GB) into .rda files from sql? 

Testing details (R transcript below): 1GB CPU, 1GB RAM windows machine.

1.  testing bigger input table (AUTCombinedWA_BILossCost_1per), size is
20MB in sql, 100000 rows, 2 numeric columns, 55 integer columns;
consumes 350000kb of memory & 800000kb of virtual memory to execute the
sqlQuery command.  Memory not released after the step finishes or upon
execution of odbcCloseAll(), or gc().

2.  tested small input table, size is 2MB in sql, 10000 rows, 2 numeric
columns, 55 integer columns; consumes 55000kb of memory & 515000kb (vm
seems oddly high to me) of virtual memory to execute the sqlQuery
command.

3.  concluded the high memory use is isolated to the odbcQuery step w/in
the sqlQuery function as opposed to sqlGetResults or ODBC itself.


Relevant R session transcript:
AUTCombinedWA_BILossCost_1per"))
Additional testing details:
I exited R which released all memory cleanly, then started R again,
loaded the .rda saved in prior step as below.  This confirmed relatively
little of the memory is consumed going from .rda to data frame;
isolating to the RODBC step:
I closed R & opened MS Access & used the same DSN "psmrd", & to import
the AUTCombinedWA_BILossCost_1per into MS Access which required about
30000kb of memory & 20000kb of virtual.

And finally, I have this excerpt from Prof Brian Ripley that seems
potentially relevant (if it's not just confusion because I called them
'byte-size' when really I should have said they're integers just having
values limited to 1-255).  In any case I'm unable to see from the RODBC
help how to specify this:  "...sqlQuery returns a data frame directly.
I think you need to tell RODBC to translate your 'byte-sized factors' to
numeric, as it will be going through character if these are a type it
does not know about."  

Read all the RODBC help, read all the data import guide & searched help
archives...can't find an answer.  Would appreciate advice, experience,
or direction.

Jill Willie 
Open Seas
Safeco Insurance
jilwil at safeco.com 


-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk] 
Sent: Thursday, January 25, 2007 12:05 AM
To: WILLIE, JILL
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Size of data vs. needed memory...rule of thumb?
On Wed, 24 Jan 2007, WILLIE, JILL wrote:

            
very
it
Under most OSes.  Because Windows has no means to limit the amount made 
available, R under Windows does have it own limiting mechanism (which
you 
hit in the examples below).  R under Linux will allow you to run a 4GB 
process on a machine with 2GB RAM, but you are likely to get around 0.7%

usage.  (One of my colleagues did that on a server earlier this week, 
hence the very specific answer.)
Not to run computations in R.  Some parts of R (e.g. GUIs) and some 
libraries (e.g. some BLAS) are multithreaded.  There are multiprocess
packages, e.g. Rmpi, rpvm, snow.
Yes, and has for many years.
yes, because the address space will not get seriously fragmented.  See
the 
appropriate section in R-admin.html (referenced from INSTALL).
I'd say it was a bit low, but it really depends on the analysis you are 
doing, how 1GB of data is made up (many rows?, many cols?, etc) and so
on.
Had you asked me to suggest a ratio I would have said 5.
Well, sqlQuery returns a data frame directly.  I think you need to tell 
RODBC to translate your 'byte-sized factors' to numeric, as it will be 
going through character if these are a type it does not know about.
send
the
http://www.R-project.org/posting-guide.html