ggwr and memory problems
6 messages · Luca Moiana, Roger Bivand
On Mon, 21 Jan 2008, Luca Moiana wrote:
Dear List, here is my problem: I want to run ggwr() on a SpatialPointsDataFrame with 9000 records, using R on a Windows machine (dual processor, 4 GB RAM).
Have you tuned Windows memory use as discussed in the R for Windows FAQ - section 2.9? The binaries are 32-bit, and need to be told how much memory to use when trying to carry out memory intensive work.
When I try to calculate the bandwidth using:
Sdati14400test.sel <- ggwr.sel(E14400 ~ V211 + V213 + V240 + V313 + V321 + V322 + V331511 + LnMPI25l.max + B:A,
    family = poisson(link = log), data = Sdati14400test, coords = Sdati14400test.coords,
    adapt = FALSE, gweight = gwr.gauss, verbose = TRUE, longlat = FALSE)
I get a memory allocation error saying that R is not able to allocate 749 Mb of memory. Any suggestions?
It isn't strictly necessary to use all the observations to find the bandwidth - take a couple of 5% samples and see if the results differ much.
I can also switch to a 64-bit Ubuntu OS on the same machine.
You can try that, but consider dividing the fit.points into chunks and running several R processes when actually fitting the ggwr model. The data points stay the same, but subsets of the fit.points are fitted in separate processes. ggwr() has not (yet) been adapted to use a cluster, but gwr() has, and a snow socket cluster will run happily on Linux there; since the cluster is run within the function, it concatenates the results before returning. If this would be useful for ggwr(), consider taking a look at the code. Roger
THANKS A LOT Luca Moiana
_________________________________________________________________ [[alternative HTML version deleted]] _______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no
On Mon, 21 Jan 2008, Luca Moiana wrote:
Date: Mon, 21 Jan 2008 14:38:18 +0100 From: Roger.Bivand at nhh.no To: luca_moiana at hotmail.com CC: r-sig-geo at stat.math.ethz.ch Subject: Re: [R-sig-Geo] ggwr and memory problems On Mon, 21 Jan 2008, Luca Moiana wrote:
We tried this (the memory settings from the R for Windows FAQ), but it didn't change anything.
OK. It may run on Linux, because the memory allocation there accepts many small free patches but Windows wants a single free chunk the size of the request.
I didn't know that the bandwidth could be found from a sample, and I will try it, but won't I then have memory problems when I run ggwr() itself? Is there a command to obtain a random 5% sample?
Try subsetting the data= argument object: df[o,] with the output of o <- sample(). Remember to say set.seed(whatever) to be able to repeat if need be.
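The advice above could be sketched roughly as follows. This is only an illustration in base R: the data frame df is a toy stand-in for the real data object, and the seed value and the names n_sub, o1, o2, sub1 and sub2 are hypothetical.

```r
# Draw two independent 5% random samples of the rows of a data frame.
# `df` is a toy stand-in here; in practice it would be the object passed to data=.
set.seed(20080121)                      # any fixed seed makes the draws repeatable
df <- data.frame(id = 1:9000, value = rnorm(9000))

n_sub <- round(0.05 * nrow(df))         # 5% of the rows
o1 <- sample(nrow(df), n_sub)           # row indices, sampled without replacement
o2 <- sample(nrow(df), n_sub)           # a second, different draw

sub1 <- df[o1, ]
sub2 <- df[o2, ]
```

Each subsample could then be passed as the data= argument of ggwr.sel(), and the two resulting bandwidths compared to see whether they differ much.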
About splitting the fit.points into chunks: I don't have fit.points because I'm working on the entire Lombardy Region (Northern Italy), and I'd like to compare the ggwr model with the models a colleague obtained from a regular glm.
If no fit.points are given, the data points are copied across as fit points internally. You are free to subset the data.points into many fit.points, and concatenate the output objects afterwards. This should remove the difficulty. Roger
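A minimal sketch of the chunk-and-concatenate idea, in base R only. It does not call spgwr: local_fit() is a hypothetical stand-in that fits a Gaussian-kernel-weighted Poisson glm at each fit point, and the toy data, the bandwidth of 0.2 and the choice of 4 chunks are all assumptions made for illustration.

```r
# Toy data standing in for the real point data (all names are hypothetical).
set.seed(1)
n <- 200
toy <- data.frame(x = runif(n), y = runif(n), v = rnorm(n))
toy$count <- rpois(n, lambda = exp(0.5 + 0.3 * toy$v))
coords <- as.matrix(toy[, c("x", "y")])

# Stand-in for a local fit: one weighted Poisson glm per fit point.
# Every call uses ALL data points; only the set of fit points changes.
local_fit <- function(fit_xy, bandwidth = 0.2) {
  t(apply(fit_xy, 1, function(p) {
    d2 <- (coords[, 1] - p[1])^2 + (coords[, 2] - p[2])^2
    d <- toy
    d$w <- exp(-0.5 * d2 / bandwidth^2)       # Gaussian kernel weights
    coef(glm(count ~ v, family = poisson(link = log), data = d, weights = w))
  }))
}

# Split the data point indices into 4 chunks of fit points.
chunks <- split(seq_len(n), cut(seq_len(n), breaks = 4, labels = FALSE))

# Fit each chunk separately (these could be separate R processes),
# then concatenate the per-chunk outputs in the original order.
pieces <- lapply(chunks, function(idx) local_fit(coords[idx, , drop = FALSE]))
local_coefs <- do.call(rbind, pieces)
```

Because each chunk still sees all the data points, the concatenated result matches a single run over all fit points; only the peak memory per process is smaller. With ggwr() itself, the chunks would be supplied through its fit.points argument.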
MANY THANKS
On Tue, 22 Jan 2008, Luca Moiana wrote:
Hello Everyone,
(Please use a better email client, one that only uses plain text, does not use HTML, and does not break lines in the wrong places or add empty lines).
Following yesterday's suggestions I wrote this code:
##Creation of Spatial Points Data Frame
x <- as.matrix(subsample$E)
y <- as.matrix(subsample$N)
S <- SpatialPoints(cbind(x,y))
S <- SpatialPoints(list(x,y))
S <- SpatialPoints(data.frame(x,y))
data <- (subsample)
Do not assign to data; there is already a function with that name.
Sdati14400 <- SpatialPointsDataFrame(S, data)
##Random sample for bandwidth (5%)
subsample <- dati14400[sample(1:nrow(dati14400), 488, replace=F),]
##Bandwidth value
Sdati14400test.sel <- ggwr.sel(E14400 ~ V211 + V213 + V240 + V313 + V321 + V322 + V331511 + LnMPI25l_max + B:A,
    family = poisson(link = log), data = Sdati14400, coords = Sdati14400.coords,
    adapt = FALSE, gweight = gwr.gauss, verbose = TRUE, longlat = FALSE)
I don't follow: what are Sdati14400 and Sdati14400.coords? Please try with fewer variables, and simplify until you understand what is happening. Here longlat = FALSE, but below it is TRUE?
##GGwr
Sdati14400.ggwr <- ggwr(E14400 ~ V211 + V213 + V313 + V321 + V322 + V331511 + LnMPI25l_max + B:A,
    data = Sdati14400, coords = Sdati14400@coords, bandwidth = Sdati14400test.sel,
    gweight = gwr.gauss, adapt = 1, family = poisson(link = log), longlat = TRUE)
From the bandwidth calculation I got this message:
Warning in glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, :
  fitted rates numerically 0 occurred
Warnings in CV search for bandwidths are not a problem, because the search algorithm will occasionally try unsuitable values, which get trapped, and the search restarted from the last valid value.
I skipped past that and ran ggwr, which gave these results:
Call:
ggwr(formula = E14400 ~ V211 + V213 + V313 + V321 + V322 + V331511 + LnMPI25l_max + B:A, data = Sdati14400, coords = Sdati14400@coords, bandwidth = Sdati14400test.sel, gweight = gwr.gauss, adapt = 1,
This places a Gaussian kernel over each point, but includes all points. In addition, you did want to fit over all your points, didn't you? You can do this if you like, but why?
family = poisson(link = log), longlat = TRUE)
Kernel function: gwr.gauss
Adaptive quantile: 1 (about 488 of 488)
Summary of GWR coefficient estimates:
                   Min.   1st Qu.    Median   3rd Qu.      Max.    Global
X.Intercept.    -8.0040   -6.8270   -6.5200   -6.3300   -5.9980   -6.6016
V211            -3.5370   -2.9440   -2.6340   -2.3250   -1.9590   -2.6024
V213          -212.0000 -203.8000 -199.3000 -193.1000 -177.1000 -198.6228
V313             0.1216    0.2915    0.3675    0.4515    0.6626    0.3766
V321            -5.3780   -4.7580   -4.3820   -4.0840   -3.4480   -4.3489
V322           -24.1100  -22.7300  -22.0400  -21.4800  -20.8800  -21.9145
V331511       -110.8000  -92.7700  -70.7300  -56.5300  -49.0700  -68.8769
LnMPI25l_max     0.3357    0.3532    0.3673    0.3850    0.4546    0.3709
B.A              5.3070    5.8140    6.2040    6.4940    6.9850    6.1363
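To illustrate why "Adaptive quantile: 1 (about 488 of 488)" means that every point falls inside every kernel, here is a rough base R sketch of one common way to define an adaptive bandwidth: the distance to the k-th nearest point, with k set by the quantile. This is an assumption for illustration, not necessarily how spgwr computes it; adaptive_bw(), xy and the quantile values are hypothetical.

```r
# Toy coordinates standing in for the real points.
set.seed(7)
xy <- cbind(runif(50), runif(50))
d <- as.matrix(dist(xy))                 # pairwise distance matrix

# Adaptive bandwidth per point: distance to the k-th nearest point,
# where k = ceiling(q * n) and the point itself counts as the nearest.
adaptive_bw <- function(d, q) {
  k <- max(1, ceiling(q * nrow(d)))
  apply(d, 1, function(row) sort(row)[k])
}

bw_all  <- adaptive_bw(d, 1)             # q = 1: reaches the farthest point
bw_half <- adaptive_bw(d, 0.5)           # q = 0.5: reaches half of the points
```

With q = 1 each bandwidth is the distance to the farthest point, so the local fits all draw on the whole data set and can be expected to vary little from place to place.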
Is that correct, or do you have other suggestions?
I think the onus is on you to answer this, correct depends on what you need. I doubt whether this tells you very much. Also, plot pairs() of the local coefficients to see if you have induced local collinearity - see Wheeler & Tiefelsdorf (2005) referenced in the package help pages.
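The pairs() check suggested above might look like the following sketch. The matrix local_coefs is a toy stand-in with one row per fit point and hypothetical column names; in a real run it would hold the local coefficient estimates extracted from the fitted object.

```r
# Toy local coefficients: V2 is built to be strongly (negatively) related
# to V1, mimicking what induced local collinearity can look like.
set.seed(42)
b1 <- rnorm(100)
local_coefs <- cbind(Intercept = rnorm(100),
                     V1 = b1,
                     V2 = -0.9 * b1 + rnorm(100, sd = 0.2))

pairs(local_coefs)                       # scatterplot matrix of the coefficients
coef_cor <- round(cor(local_coefs), 2)   # numeric companion to the plot
coef_cor
```

Panels where the points fall along a tight line, or correlations close to 1 or -1, are the pattern to look for.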
Another question: I used variables that came from a colleague's GLM analysis. Any suggestions on how to choose the variables when using ggwr directly?
A formula is a formula, choose as you wish, but best with a substantive reasoning behind the choice of variable and its functional form. Roger
THANKS A LOT Luca Moiana, PhD Candidate, Environmental Science Department, University of Milan-Bicocca