Missing local R-squared and residuals in gwr output
On Mon, 7 May 2012, "Sproß, Johann" wrote:
--
Mag. J. Maximilian Sproß
Institute of Geography, University of Innsbruck
Innrain 52
A-6020 INNSBRUCK
Tel. +43 (0)512 507 5413
web: http://www.uibk.ac.at/geographie/projects/lidar/

-----Original Message-----
From: Roger Bivand [mailto:Roger.Bivand at nhh.no]
Sent: Mon 07.05.2012 14:48
To: Maximilian Sproß
Cc: r-sig-geo
Subject: Re: [R-sig-Geo] Missing local R-squared and residuals in gwr output

On Mon, 7 May 2012, Maximilian Sproß wrote:
Dear Roger! Thank you very much for your fast reply and work! I'm not really an expert in HPC computing, but I will try to report as well as I can. I updated spgwr and started a job on the cluster that normally takes 1.5 h. So far it has run for 5 hours, which indicates that the parallelization no longer works efficiently. The function makeCluster(64, type="MPI") worked fine. Our cluster runs openMPI.
Correct. I'll try to add back an option to use snow instead of parallel.

I tried out the new version, but it still seems to be using parallel.

code:
gwr_50 <-
gwr(hef@data$DIF~hef@data$ELEVATION+hef@data$SKY+hef@data$SLOPE+hef@data$SOLAR+factor(asp_fac),
data=hef, bandwidth=50, gweight=gwr.Gauss, fit.points=coords,
hatmatrix=FALSE, cl=cl)
Add use_snow=TRUE to the command to switch to snow. Roger
Loading required package: parallel
Attaching package: 'parallel'
The following object(s) are masked from 'package:snow':
    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, clusterSplit, makeCluster, parApply,
    parCapply, parLapply, parRapply, parSapply, splitIndices,
    stopCluster

Max

When it reaches R-forge, its revision number will be > 1252.

Roger
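The masking message above means that, with parallel attached after snow, a bare call such as makeCluster() resolves to the parallel version. The following is only a minimal sketch of how one might check which package a name resolves to and how to qualify calls explicitly. It assumes two local socket workers started through the parallel package, not the MPI cluster used in this thread, and the toy task (squaring four numbers) is purely illustrative.

```r
library(parallel)

# Which package does the bare name `makeCluster` currently resolve to?
resolved_pkg <- environmentName(environment(makeCluster))
print(resolved_pkg)

# Qualifying the call with `::` makes it independent of the attach order.
# Assumption: two local socket workers, not an MPI cluster.
cl <- parallel::makeCluster(2)
squares <- unlist(parallel::parLapply(cl, 1:4, function(i) i^2))
parallel::stopCluster(cl)

print(squares)
```

The same pattern applies in the other direction: writing snow::makeCluster(...) explicitly would select the snow version even while parallel masks it.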
In that context, I found the following in the CRAN Task View "High-Performance and Parallel Computing with R": "Direct support in R is starting with release 2.14.0 which includes a new package parallel incorporating (slightly revised) copies of packages multicore and snow (*but excluding MPI, PVM and NWS clusters*)." Does the new parallel support still work in the openMPI environment?

regards,
Max

fyi: sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US
 [4] LC_COLLATE=en_US     LC_MONETARY=en_US    LC_MESSAGES=en_US
 [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] spgwr_0.6-15    spdep_0.5-45    coda_0.14-6     deldir_0.0-16
 [5] maptools_0.8-10 foreign_0.8-46  nlme_3.1-102    MASS_7.3-16
 [9] Matrix_1.0-1    lattice_0.20-0  boot_1.3-3      gstat_1.0-10
[13] spacetime_0.5-7 xts_0.8-2       zoo_1.7-6       sp_0.9-98
[17] snow_0.3-8      Rmpi_0.5-9

loaded via a namespace (and not attached):
[1] grid_2.14.0

On 05/05/2012 04:24 PM, Roger Bivand wrote:
On Fri, 4 May 2012, Maximilian Sproß wrote:
Dear r-sig-geo list! I run gwr on a multi-node cluster (on 64 slots). In the gwr output (slot "SDF"), the gwr residuals and the local R-squared are missing. When fitting the same model on the local machine, these components are included. Unfortunately, the calculation then takes about 5 days instead of a few hours on the cluster. Perhaps the problem arises from the argument "fit.points", which has to be passed if the local coefficient estimates are to be computed on a multi-node cluster. Does anyone have an idea how to solve the problem of the missing local R-squared and residuals when gwr is calculated on a cluster?
The understanding for use on a cluster was that the data points and the fit points are different, so there is no observed dependent variable at the fit point, hence no local R2. I've added logic in the code that checks for equality between the fit and data points, and this for me resolves the problem, but may break other things.

I've committed to R-forge, project rspatial, module spgwr. The source tarball and binary packages should be available later this evening European time from:

https://r-forge.r-project.org/R/?group_id=1014

Could you please try it out, and report back? I should also migrate spgwr from snow to parallel before I release it.

Best wishes,

Roger
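The check for equality between fit and data points is described above only in words. The following is a rough sketch of what such a check could look like in base R. It is not the code committed to spgwr: the helper name same_points, the tolerance, and the toy coordinates are all assumptions made for illustration.

```r
# Hypothetical helper (not spgwr's internal code): TRUE when the fit points
# coincide with the data points, row for row, within a numeric tolerance.
same_points <- function(data_coords, fit_coords, tol = 1e-8) {
  data_coords <- unname(as.matrix(data_coords))
  fit_coords <- unname(as.matrix(fit_coords))
  identical(dim(data_coords), dim(fit_coords)) &&
    isTRUE(all.equal(data_coords, fit_coords, tolerance = tol))
}

# Toy coordinates, invented for this example.
data_coords <- cbind(x = c(0, 10, 20), y = c(0, 5, 10))
fit_same <- data_coords                       # fit points are the data points
fit_grid <- cbind(x = c(5, 15), y = c(2, 8))  # fit points on a separate grid

same_points(data_coords, fit_same)  # TRUE: residuals and local R2 are defined
same_points(data_coords, fit_grid)  # FALSE: no observed response at fit points
```

In the posted cluster call, fit.points is coordinates(hef), that is, the data points themselves, which is the case such a check is meant to detect.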
Thank you very much in advance!
Kind regards,
Max
selected R-code:
### gwr on local machine:
gwr_50 <-
gwr(hef@data$DIF~hef@data$ELEVATION+hef@data$SKY+hef@data$SLOPE+hef@data$SOLAR,
data=hef, bandwidth=50, gweight=gwr.Gauss)
# part of the str(gwr_50) output...
List of 11
 $ SDF :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
 .. ..@ data :'data.frame': 286288 obs. of 9 variables:
 .. .. ..$ sum.w      : num [1:286288] 2009 2003 2091 2089 2086 ...
 .. .. ..$ (Intercept): num [1:286288] -28.7 -28.5 -29.9 -29.7 -29.5 ...
 .. .. ..$ elevation  : num [1:286288] 0.0139 0.0138 0.014 0.014 0.014 ...
 .. .. ..$ sky        : num [1:286288] -0.153 -0.155 -0.146 -0.148 -0.149 ...
 .. .. ..$ slope      : num [1:286288] -2.58 -2.61 -2.42 -2.45 -2.48 ...
 .. .. ..$ solar      : num [1:286288] -0.00139 -0.00136 -0.0015 -0.00147 -0.00144 ...
 .. .. ..$ gwr.e      : num [1:286288] -0.461 -0.683 -0.5987 -0.2692 0.0406 ...
 .. .. ..$ pred       : num [1:286288] 0.806 0.833 0.507 0.514 0.576 ...
 .. .. ..$ localR2    : num [1:286288] 0.621 0.618 0.638 0.635 0.632 ...
### gwr on cluster :
cl <- makeCluster(32, type="MPI")
coords <- coordinates(hef)
gw <-
gwr(hef@data$DIF~hef@data$ELEVATION+hef@data$SKY+hef@data$SLOPE+hef@data$SOLAR,
data=hef, bandwidth=50, gweight=gwr.Gauss, fit.points=coords,
hatmatrix=FALSE, cl=cl)
# part of the str(gw) output...
List of 11
 $ SDF :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
 .. ..@ data :'data.frame': 286288 obs. of 6 variables:
 .. .. ..$ sum.w      : num [1:286288] 1 1 1 1 1 ...
 .. .. ..$ (Intercept): num [1:286288] 12541 1970 2057 -1505 -1030 ...
 .. .. ..$ elevation  : num [1:286288] -3.891 -0.602 -0.738 0.465 0.309 ...
 .. .. ..$ sky        : num [1:286288] -0.954 -0.425 3.714 0.159 0.152 ...
 .. .. ..$ slope      : num [1:286288] 62.19 NA -27.21 1.95 16.03 ...
 .. .. ..$ solar      : num [1:286288] NA NA NA NA 0.042 ...
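To see at a glance which components the cluster run lacks, the column names of the two results can be compared. This is a small sketch in base R with the names typed in from the two str() listings above. With the fitted objects at hand, names(gwr_50$SDF@data) and the equivalent for the cluster result would supply these vectors instead.

```r
# Column names copied from the two str() listings in this post.
local_cols <- c("sum.w", "(Intercept)", "elevation", "sky", "slope",
                "solar", "gwr.e", "pred", "localR2")
cluster_cols <- c("sum.w", "(Intercept)", "elevation", "sky", "slope",
                  "solar")

# Columns present in the local fit but absent from the cluster fit.
missing_cols <- setdiff(local_cols, cluster_cols)
print(missing_cols)
# [1] "gwr.e"   "pred"    "localR2"
```

The difference is exactly the residuals, the predictions, and the local R-squared, which matches the 9 versus 6 variables reported by str().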
--
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no