
ggwr and memory problems

6 messages · Luca Moiana, Roger Bivand

On Mon, 21 Jan 2008, Luca Moiana wrote:

Have you tuned Windows memory use as discussed in the R for Windows FAQ,
section 2.9? The binaries are 32-bit, and need to be told how much memory
to use when carrying out memory-intensive work.

It isn't strictly necessary to use all the observations to find the
bandwidth: take a couple of 5% samples and see whether the results differ
much.
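The subsample check might be sketched as below; the data set is synthetic, and spdf and fm are stand-ins for your own SpatialPointsDataFrame and model formula:

```r
## Sketch: run the ggwr.sel() CV bandwidth search on two 5% subsamples
## and compare the results. All object names here are made up.
library(sp)
library(spgwr)

set.seed(1)                                   # make the samples repeatable
xy   <- cbind(runif(400), runif(400))
spdf <- SpatialPointsDataFrame(xy, data.frame(y = rnorm(400), x1 = rnorm(400)))
fm   <- y ~ x1

bws <- sapply(1:2, function(i) {
  o <- sample(nrow(spdf), ceiling(0.05 * nrow(spdf)))  # 5% of the rows
  ggwr.sel(fm, data = spdf[o, ])              # CV search on the subsample only
})
bws  # if the two bandwidths are close, a subsample is enough for the search
```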
You can try that, but consider dividing the fit.points up into chunks, and 
running several R processes when actually fitting the ggwr model. The data 
points stay the same, but fit subsets of the fit.points in separate 
processes.
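One way the chunking could look, again with hypothetical object names (spdf, fm, bw) and an arbitrary four-way split; each process sets its own k:

```r
## Sketch: the data= argument is the same in every process; only the
## fit.points chunk differs. Object names are stand-ins.
library(sp)
library(spgwr)

set.seed(1)
xy   <- cbind(runif(200), runif(200))
spdf <- SpatialPointsDataFrame(xy, data.frame(y = rnorm(200), x1 = rnorm(200)))
fm   <- y ~ x1
bw   <- 0.3                                   # assume a bandwidth already chosen

fp     <- coordinates(spdf)                   # data points reused as fit points
chunks <- split(seq_len(nrow(fp)), cut(seq_len(nrow(fp)), 4, labels = FALSE))

k     <- 1                                    # set k = 1, ..., 4 per R process
res_k <- ggwr(fm, data = spdf, bandwidth = bw, fit.points = fp[chunks[[k]], ])
saveRDS(res_k, sprintf("ggwr_chunk_%d.rds", k))  # collect the pieces afterwards
```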

ggwr() has not (yet) been adapted for using a cluster, but gwr() has, and a
snow socket cluster will run happily on Linux there; since the cluster is
used within the function, it concatenates the results before returning. If
this would be useful for ggwr(), consider taking a look at the code.
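For gwr() itself, the cluster use might look like the sketch below; this assumes the cl= argument of current spgwr versions (at the time of the thread this went through the snow package directly), and the data are synthetic stand-ins:

```r
## Sketch: gwr() on a socket cluster; fit-point subsets are handled on the
## workers and concatenated inside gwr() before the object is returned.
library(sp)
library(spgwr)
library(parallel)

set.seed(1)
xy   <- cbind(runif(200), runif(200))
spdf <- SpatialPointsDataFrame(xy, data.frame(y = rnorm(200), x1 = rnorm(200)))

cl  <- makeCluster(2)                         # snow-style socket cluster
res <- gwr(y ~ x1, data = spdf, bandwidth = 0.3, cl = cl)
stopCluster(cl)
```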

Roger

On Mon, 21 Jan 2008, Luca Moiana wrote:

OK. It may run on Linux, because memory allocation there can use many
small free patches, whereas Windows wants a single free chunk the size of
the request.
Try subsetting the data= argument object: df[o, ] with the output of o <-
sample(). Remember to call set.seed(whatever) so that the draw can be
repeated if need be.
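In full, with df a toy stand-in for the real data= object:

```r
## Minimal, self-contained version of the subsetting step.
df <- data.frame(x = rnorm(100), y = rnorm(100))

set.seed(42)                                    # record the seed to repeat the draw
o   <- sample(nrow(df), size = ceiling(0.05 * nrow(df)))  # 5% of the rows
sub <- df[o, ]                                  # pass sub as the data= argument
nrow(sub)  # 5
```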
If no fit.points are given, the data points are copied across as fit
points internally. You are free to subset the data points into many sets
of fit.points, and concatenate the output objects afterwards. This should
remove the difficulty.
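The concatenation step amounts to stacking the SpatialPointsDataFrame objects held in the $SDF slot of each chunk's output; a toy demonstration with made-up pieces (in practice the parts would be read back from the per-chunk result files):

```r
## Sketch: rbind() stacks SpatialPointsDataFrame objects, which is what
## each chunk's $SDF slot is; the pieces here are synthetic stand-ins.
library(sp)

set.seed(1)
xy  <- cbind(1:10, 1:10)
sdf <- SpatialPointsDataFrame(xy, data.frame(coef = rnorm(10)))

parts   <- list(sdf[1:5, ], sdf[6:10, ])   # stand-ins for per-chunk $SDF slots
sdf_all <- do.call(rbind, parts)           # one object covering all fit points
```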

Roger

On Tue, 22 Jan 2008, Luca Moiana wrote:

(Please use a better email client, one that only uses plain text, does not 
use HTML, and does not break lines in the wrong places or add empty 
lines).
Do not assign to data; there is a function of that name.
I don't follow: what are Sdati14400 and Sdati14400.coords? Please try
with fewer variables; simplify until you understand what is happening.
You set longlat = FALSE here, but below it is TRUE?
Warnings in the CV search for bandwidths are not a problem: the search
algorithm will occasionally try unsuitable values, which get trapped, and
the search is restarted from the last valid value.
This places a Gaussian kernel over each point, but includes all points. In 
addition, you did want to fit over all your points, didn't you? You can do 
this if you like, but why?
I think the onus is on you to answer this; what is "correct" depends on
what you need. I doubt whether this tells you very much. Also, plot
pairs() of the local coefficients to see whether you have induced local
collinearity; see Wheeler & Tiefelsdorf (2005), referenced in the package
help pages.
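The collinearity check could be sketched as follows; res stands in for the object returned by gwr() or ggwr(), the data are synthetic, and the assumption is that the coefficient columns of the $SDF slot are named after the model terms:

```r
## Sketch: scatterplot matrix of the local coefficient surfaces; strongly
## linear point clouds suggest induced local collinearity.
library(sp)
library(spgwr)

set.seed(1)
xy   <- cbind(runif(100), runif(100))
spdf <- SpatialPointsDataFrame(xy, data.frame(y = rnorm(100), x1 = rnorm(100)))

res <- gwr(y ~ x1, data = spdf, bandwidth = 0.3)
cf  <- as.data.frame(res$SDF)
pairs(cf[, c("(Intercept)", "x1")])   # column names follow the model terms
```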
A formula is a formula, choose as you wish, but best with a substantive 
reasoning behind the choice of variable and its functional form.

Roger