Missing local R-squared and residuals in gwr output
10 messages · Maximilian Sproß, "Sproß, Johann", Roger Bivand
On Fri, 4 May 2012, Maximilian Sproß wrote:
Dear r-sig-geo list! I run gwr on a multi-node cluster (on 64 slots). In the gwr output (slot "SDF"), the gwr residuals and the local R-squared are missing. When I fit the same model on the local machine, these components are included. Unfortunately, the calculation on the local machine takes about 5 days, instead of a few hours on the cluster. Perhaps the problem arises from the argument "fit.points", which has to be passed if the local coefficient estimates are to be computed on a multi-node cluster. Does anyone have an idea how to solve the problem of the missing local R-squared and residuals when gwr is calculated on a cluster?
The understanding for use on a cluster was that the data points and the fit points are different, so there is no observed dependent variable at the fit point, hence no local R2. I've added logic in the code that checks for equality between the fit and data points, and this for me resolves the problem, but may break other things. I've committed to R-forge, project rspatial, module spgwr. The source tarball and binary packages should be available later this evening European time from: https://r-forge.r-project.org/R/?group_id=1014 Could you please try it out, and report back? I should also migrate spgwr from snow to parallel before I release it. Best wishes, Roger
Thank you very much in advance!

Kind regards,
Max

selected R-code:

### gwr on local machine:
gwr_50 <- gwr(hef@data$DIF ~ hef@data$ELEVATION + hef@data$SKY + hef@data$SLOPE + hef@data$SOLAR,
              data=hef, bandwidth=50, gweight=gwr.Gauss)

# part of the str(gwr_50) output...
List of 11
 $ SDF :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  .. ..@ data :'data.frame': 286288 obs. of 9 variables:
  .. .. ..$ sum.w      : num [1:286288] 2009 2003 2091 2089 2086 ...
  .. .. ..$ (Intercept): num [1:286288] -28.7 -28.5 -29.9 -29.7 -29.5 ...
  .. .. ..$ elevation  : num [1:286288] 0.0139 0.0138 0.014 0.014 0.014 ...
  .. .. ..$ sky        : num [1:286288] -0.153 -0.155 -0.146 -0.148 -0.149 ...
  .. .. ..$ slope      : num [1:286288] -2.58 -2.61 -2.42 -2.45 -2.48 ...
  .. .. ..$ solar      : num [1:286288] -0.00139 -0.00136 -0.0015 -0.00147 -0.00144 ...
  .. .. ..$ gwr.e      : num [1:286288] -0.461 -0.683 -0.5987 -0.2692 0.0406 ...
  .. .. ..$ pred       : num [1:286288] 0.806 0.833 0.507 0.514 0.576 ...
  .. .. ..$ localR2    : num [1:286288] 0.621 0.618 0.638 0.635 0.632 ...

### gwr on cluster:
cl <- makeCluster(32, type="MPI")
coords <- coordinates(hef)
gw <- gwr(hef@data$DIF ~ hef@data$ELEVATION + hef@data$SKY + hef@data$SLOPE + hef@data$SOLAR,
          data=hef, bandwidth=50, gweight=gwr.Gauss, fit.points=coords,
          hatmatrix=FALSE, cl=cl)

# part of the str(gwr_50) output...
List of 11
 $ SDF :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  .. ..@ data :'data.frame': 286288 obs. of 6 variables:
  .. .. ..$ sum.w      : num [1:286288] 1 1 1 1 1 ...
  .. .. ..$ (Intercept): num [1:286288] 12541 1970 2057 -1505 -1030 ...
  .. .. ..$ elevation  : num [1:286288] -3.891 -0.602 -0.738 0.465 0.309 ...
  .. .. ..$ sky        : num [1:286288] -0.954 -0.425 3.714 0.159 0.152 ...
  .. .. ..$ slope      : num [1:286288] 62.19 NA -27.21 1.95 16.03 ...
  .. .. ..$ solar      : num [1:286288] NA NA NA NA 0.042 ...
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Department of Economics, NHH Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no
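Roger's reply above describes a check for whether the fit points coincide with the data points, so that residuals and local R2 can still be computed when fit.points is supplied. The actual spgwr code is not shown in this thread. The following base-R sketch only illustrates the idea under that assumption; the helper name same_points and the toy coordinates are made up for illustration.

```r
# Hypothetical helper (not part of spgwr): TRUE when the fit-point coordinates
# match the data-point coordinates in size and (numerically) in value.
same_points <- function(fit_coords, data_coords) {
  isTRUE(all.equal(fit_coords, data_coords, check.attributes = FALSE))
}

# Toy coordinates, invented for this sketch.
data_coords <- cbind(x = c(0, 1, 2), y = c(0, 1, 2))
fit_same    <- data_coords                           # e.g. fit.points = coordinates(data)
fit_other   <- cbind(x = c(0.5, 1.5), y = c(0.5, 1.5))

same_points(fit_same, data_coords)    # fit points are the data points
same_points(fit_other, data_coords)   # fit points differ: no observed response there
```

Only in the first case is there an observed dependent variable at every fit point, which is what residuals and a local R2 need.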
1 day later
On Mon, 7 May 2012, Maximilian Sproß wrote:
Dear Roger! Thank you very much for your fast reply and work! I'm not really an expert in HPC computing, but I will try to report as well as I can. I updated spgwr and started a job on the cluster which normally takes 1.5 h. So far, it has been running for 5 hours, which indicates that the parallelization no longer works efficiently. The function makeCluster(64, type="MPI") worked fine. Our cluster runs with openMPI.
Correct. I'll try to add back an option to use snow instead of parallel. When it reaches R-forge, its revision number will be > 1252. Roger
In that context, I found the following in the CRAN Task View "High-Performance and Parallel Computing with R": "Direct support in R is starting with release 2.14.0 which includes a new package parallel incorporating (slightly revised) copies of packages multicore and snow (*but excluding MPI, PVM and NWS clusters*)." Does the new parallel support still work in the openMPI environment?

regards,
Max

fyi:
sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US
 [4] LC_COLLATE=en_US     LC_MONETARY=en_US    LC_MESSAGES=en_US
 [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] spgwr_0.6-15    spdep_0.5-45    coda_0.14-6     deldir_0.0-16
 [5] maptools_0.8-10 foreign_0.8-46  nlme_3.1-102    MASS_7.3-16
 [9] Matrix_1.0-1    lattice_0.20-0  boot_1.3-3      gstat_1.0-10
[13] spacetime_0.5-7 xts_0.8-2       zoo_1.7-6       sp_0.9-98
[17] snow_0.3-8      Rmpi_0.5-9

loaded via a namespace (and not attached):
[1] grid_2.14.0
On Mon, 7 May 2012, "Sproß, Johann" wrote:
--
Mag. J. Maximilian Sproß
Institute of Geography, University of Innsbruck
Innrain 52, A-6020 INNSBRUCK
Tel. +43 (0)512 507 5413
web: http://www.uibk.ac.at/geographie/projects/lidar/

-----Original Message-----
From: Roger Bivand [mailto:Roger.Bivand at nhh.no]
Sent: Mon 07.05.2012 14:48
To: Maximilian Sproß
Cc: r-sig-geo
Subject: Re: [R-sig-Geo] Missing local R-squared and residuals in gwr output

Correct. I'll try to add back an option to use snow instead of parallel.

I tried out the new version, but it still seems to be using parallel.

code:
gwr_50 <- gwr(hef@data$DIF ~ hef@data$ELEVATION + hef@data$SKY + hef@data$SLOPE + hef@data$SOLAR + factor(asp_fac),
              data=hef, bandwidth=50, gweight=gwr.Gauss, fit.points=coords,
              hatmatrix=FALSE, cl=cl)

Add use_snow=TRUE to the command to switch to snow.

Roger

Loading required package: parallel

Attaching package: 'parallel'

The following object(s) are masked from 'package:snow':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, clusterSplit, makeCluster, parApply,
    parCapply, parLapply, parRapply, parSapply, splitIndices,
    stopCluster

Max
1 day later
Thank you Roger! The gwr on the MPI cluster works fine. However, the output object now includes the three initially missing data columns: "gwr.e", "pred" and "localR2". Unfortunately, the latter contains only NAs. Sorry for any inconvenience, but do you think you can solve that? Thanks in advance and all the best, Max
On Wed, 9 May 2012, Maximilian Sproß wrote:
Thank you Roger! The gwr on the MPI cluster works fine. However, the output object now includes the three initially missing data columns: "gwr.e", "pred" and "localR2". Unfortunately, the latter contains only NAs. Sorry for any inconvenience, but do you think you can solve that?
I do not see any problem there, and indeed it is after the results have been returned from the cluster. You can tell whether you have been into the code block starting on line 261 in spgwr/R/gwr.R if there is no line beginning with "postprocess_localR2" in the timings component of the output object. The conditions are: ((!fp.given || fit_are_data) && is.null(fittedGWRobject)) where the first is FALSE, the second TRUE and the third TRUE in your case. If the "pred" column in your output contains values that are not finite, this may happen in this code block. If you cannot see what is going on, we need a smaller test data set that replicates the problem. Roger
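The condition Roger quotes can be evaluated by hand for the case he describes. This is a minimal base-R sketch: the three variable names come from his message, but the values assigned here and the example pred vector are assumptions for illustration, not output from the actual run.

```r
# Values as described for this case (assumed, not read from spgwr itself):
fp.given        <- TRUE    # fit.points was supplied, so !fp.given is FALSE
fit_are_data    <- TRUE    # fit points are the data points
fittedGWRobject <- NULL    # no fitted GWR object was passed

# The condition guarding the post-processing block, as quoted in the message.
enter_block <- (!fp.given || fit_are_data) && is.null(fittedGWRobject)
enter_block

# Counting non-finite predictions in a made-up "pred" vector.
pred <- c(0.806, NA, 0.507, Inf, 0.576)
n_bad <- sum(!is.finite(pred))
n_bad
which(!is.finite(pred))

# On a real result, something like sum(!is.finite(gw$SDF$pred)) should give
# the same count, assuming gw is the object returned by gwr().
```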
Dear Roger! You are right, there are no problems anymore. I did some comparative tests with a small subset of the dataset. The number of non-finite values in the "pred" column depends on the bandwidth: with increasing bandwidth, the NAs disappear. Unfortunately, I cannot run gwr.sel because the dataset is too large. In any case, all problems are solved, and using the cluster is a really nice way to cut processing time. Thank you very much for your help! Max
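One plausible reason why the NAs disappear with a larger bandwidth, not confirmed in this thread, is that a very small bandwidth leaves each local regression with effectively a single weighted observation. The base-R sketch below illustrates this with a made-up Gaussian kernel function (gauss_weights is a hypothetical name, not the spgwr implementation) and simulated data.

```r
# Hypothetical Gaussian kernel on distances d with bandwidth h (illustration only).
gauss_weights <- function(d, h) exp(-0.5 * (d / h)^2)

set.seed(1)
x <- seq(0, 1000, by = 100)                   # 11 points, 100 units apart
z <- rnorm(length(x))                         # one simulated covariate
y <- 2 + 3 * z + rnorm(length(x), sd = 0.1)   # simulated response

d <- abs(x - x[1])                            # distances from the first fit point

w_small <- gauss_weights(d, h = 1)            # bandwidth far below the point spacing
w_large <- gauss_weights(d, h = 500)          # bandwidth spanning several points

fit_small <- lm(y ~ z, weights = w_small)
fit_large <- lm(y ~ z, weights = w_large)

sum(w_small)      # all neighbours underflow to weight 0; only the point itself counts
coef(fit_small)   # the slope cannot be estimated from a single observation
coef(fit_large)   # both coefficients are estimated
```

With the small bandwidth the weights sum to 1 and the slope is NA, while the larger bandwidth gives finite estimates for both coefficients.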
On 05/09/2012 01:19 PM, Roger Bivand wrote:
On Wed, 9 May 2012, Maximilian Spro? wrote:
Thank you Roger! The gwr on the MPI cluster works fine. However, now the output object includes the intially missing three data slots: "gwr.e","pred" and "localR2". Unfortunately, the latter contains only NA's. Sorry for of any inconvenience, but do you think you can solve that?
I do not see any problem there, and indeed it is after the results have been returned from the cluster. You can tell whether you have been into the code block starting on line 261 in spgwr/R/gwr.R if there is no line beginning with "postprocess_localR2" in the timings component of the output object. The conditions are: ((!fp.given || fit_are_data) && is.null(fittedGWRobject)) where the first is FALSE, the second TRUE and the third TRUE in your case. If the "pred" column in your output contains values that are not finite, this may happen in this code block. If you cannot see what is going on, we need a smaller test data set that replicates the problem. Roger
Thanks in advance and all the best, Max On 05/07/2012 08:45 PM, Roger Bivand wrote:
On Mon, 7 May 2012, "Sproß, Johann" wrote:
-- Mag. J. Maximilian Sproß Institute of Geography, University of Innsbruck Innrain 52 A-6020 INNSBRUCK Tel. +43 (0)512 507 5413 web: http://www.uibk.ac.at/geographie/projects/lidar/ -----Original Message----- From: Roger Bivand [mailto:Roger.Bivand at nhh.no] Sent: Mon 07.05.2012 14:48 To: Maximilian Sproß Cc: r-sig-geo Subject: Re: [R-sig-Geo] Missing local R-squared and residuals in gwr output On Mon, 7 May 2012, Maximilian Sproß wrote:
Dear Roger! Thank you very much for your fast reply and work! I'm not really an expert in HPC computing, but I will try to report as well as I can. I updated spgwr and started a job on the cluster which normally takes 1.5 h. So far it has run for 5 hours, which indicates that the parallelization no longer works efficiently. The function makeCluster(64, type="MPI") worked fine. Our cluster runs Open MPI.
Correct. I'll try to add back an option to use snow instead of parallel.
I tried out the new version, but it still seems to be using parallel. Code:
gwr_50 <- gwr(hef@data$DIF ~ hef@data$ELEVATION + hef@data$SKY + hef@data$SLOPE + hef@data$SOLAR + factor(asp_fac), data=hef, bandwidth=50, gweight=gwr.Gauss, fit.points=coords, hatmatrix=FALSE, cl=cl)
Add use_snow=TRUE to the command to switch to snow. Roger
Loading required package: parallel
Attaching package: 'parallel'
The following object(s) are masked from 'package:snow':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, clusterSplit, makeCluster, parApply, parCapply, parLapply, parRapply, parSapply, splitIndices, stopCluster
Max
When it reaches R-forge, its revision number will be > 1252. Roger
In that context, I found the following in the CRAN Task View "High-Performance and Parallel Computing with R": "Direct support in R is starting with release 2.14.0 which includes a new package parallel incorporating (slightly revised) copies of packages multicore and snow (*but excluding MPI, PVM and NWS clusters*)." Does the new parallel support still work in the Open MPI environment? regards, Max
fyi: sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US LC_NUMERIC=C LC_TIME=en_US
[4] LC_COLLATE=en_US LC_MONETARY=en_US LC_MESSAGES=en_US
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] spgwr_0.6-15 spdep_0.5-45 coda_0.14-6 deldir_0.0-16
[5] maptools_0.8-10 foreign_0.8-46 nlme_3.1-102 MASS_7.3-16
[9] Matrix_1.0-1 lattice_0.20-0 boot_1.3-3 gstat_1.0-10
[13] spacetime_0.5-7 xts_0.8-2 zoo_1.7-6 sp_0.9-98
[17] snow_0.3-8 Rmpi_0.5-9
loaded via a namespace (and not attached):
[1] grid_2.14.0
On 05/05/2012 04:24 PM, Roger Bivand wrote:
On Fri, 4 May 2012, Maximilian Sproß wrote:
Dear r-sig-geo list! I run gwr on a multi-node cluster (on 64 slots). In the gwr output (slot "SDF"), the gwr residuals and the local R-squared are missing. When I fit the same model on the local machine, these components are included. Unfortunately, the local calculation takes about 5 days, compared with a few hours on the cluster. Perhaps the problem arises from the argument "fit.points", which has to be passed if the local coefficient estimates are to be computed on a multi-node cluster. Does anyone have an idea how to get the missing local R-squared and residuals when the gwr is calculated on a cluster?
The understanding for use on a cluster was that the data points and the fit points are different, so there is no observed dependent variable at the fit point, hence no local R2. I've added logic in the code that checks for equality between the fit and data points, and this for me resolves the problem, but may break other things. I've committed to R-forge, project rspatial, module spgwr. The source tarball and binary packages should be available later this evening European time from: https://r-forge.r-project.org/R/?group_id=1014 Could you please try it out, and report back? I should also migrate spgwr from snow to parallel before I release it. Best wishes, Roger
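The idea of checking whether the fit points are the data points can be sketched in base R. This is only an illustration of the concept, not the logic committed to spgwr; the helper `coords_match` and the coordinates are hypothetical.

```r
# Made-up data coordinates (two columns: x and y)
data_coords <- cbind(x = c(1, 2, 3), y = c(10, 20, 30))

# Case 1: the fit points are the data coordinates themselves
fit_same <- data_coords
# Case 2: the fit points lie on a different set of locations
fit_other <- cbind(x = c(1.5, 2.5), y = c(15, 25))

# Hypothetical helper: TRUE only when both matrices have the same
# dimensions and numerically equal values
coords_match <- function(fit, dat) {
  identical(dim(fit), dim(dat)) &&
    isTRUE(all.equal(fit, dat, check.attributes = FALSE))
}

print(coords_match(fit_same, data_coords))
print(coords_match(fit_other, data_coords))
```

Only in the first case is there an observed dependent variable at every fit point, which is what residuals and a local R-squared require.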
Thank you very much in advance!
Kind regards,
Max
selected R-code:
### gwr on local machine:
gwr_50 <- gwr(hef@data$DIF ~ hef@data$ELEVATION + hef@data$SKY + hef@data$SLOPE + hef@data$SOLAR,
              data=hef, bandwidth=50, gweight=gwr.Gauss)
# part of the str(gwr_50) output...
List of 11
 $ SDF :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
 .. ..@ data :'data.frame': 286288 obs. of 9 variables:
 .. .. ..$ sum.w      : num [1:286288] 2009 2003 2091 2089 2086 ...
 .. .. ..$ (Intercept): num [1:286288] -28.7 -28.5 -29.9 -29.7 -29.5 ...
 .. .. ..$ elevation  : num [1:286288] 0.0139 0.0138 0.014 0.014 0.014 ...
 .. .. ..$ sky        : num [1:286288] -0.153 -0.155 -0.146 -0.148 -0.149 ...
 .. .. ..$ slope      : num [1:286288] -2.58 -2.61 -2.42 -2.45 -2.48 ...
 .. .. ..$ solar      : num [1:286288] -0.00139 -0.00136 -0.0015 -0.00147 -0.00144 ...
 .. .. ..$ gwr.e      : num [1:286288] -0.461 -0.683 -0.5987 -0.2692 0.0406 ...
 .. .. ..$ pred       : num [1:286288] 0.806 0.833 0.507 0.514 0.576 ...
 .. .. ..$ localR2    : num [1:286288] 0.621 0.618 0.638 0.635 0.632 ...
### gwr on cluster:
cl <- makeCluster(32, type="MPI")
coords <- coordinates(hef)
gw <- gwr(hef@data$DIF ~ hef@data$ELEVATION + hef@data$SKY + hef@data$SLOPE + hef@data$SOLAR,
          data=hef, bandwidth=50, gweight=gwr.Gauss, fit.points=coords,
          hatmatrix=FALSE, cl=cl)
# part of the str(gw) output...
List of 11
 $ SDF :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
 .. ..@ data :'data.frame': 286288 obs. of 6 variables:
 .. .. ..$ sum.w      : num [1:286288] 1 1 1 1 1 ...
 .. .. ..$ (Intercept): num [1:286288] 12541 1970 2057 -1505 -1030 ...
 .. .. ..$ elevation  : num [1:286288] -3.891 -0.602 -0.738 0.465 0.309 ...
 .. .. ..$ sky        : num [1:286288] -0.954 -0.425 3.714 0.159 0.152 ...
 .. .. ..$ slope      : num [1:286288] 62.19 NA -27.21 1.95 16.03 ...
 .. .. ..$ solar      : num [1:286288] NA NA NA NA 0.042 ...
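To make the difference between the two listings above explicit, the column names can be compared with `setdiff()`. This is a small base R sketch; the two name vectors are copied by hand from the str() output rather than taken from real fitted objects.

```r
# Column names copied from the two str() listings above
local_cols <- c("sum.w", "(Intercept)", "elevation", "sky", "slope",
                "solar", "gwr.e", "pred", "localR2")
cluster_cols <- c("sum.w", "(Intercept)", "elevation", "sky", "slope",
                  "solar")

# Columns present in the local result but absent from the cluster result
missing_cols <- setdiff(local_cols, cluster_cols)
print(missing_cols)
```

With the fitted objects at hand, the same comparison would presumably be `setdiff(names(gwr_50$SDF), names(gw$SDF))`, assuming the object names used in the thread.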
On Thu, 10 May 2012, Maximilian Sproß wrote:
Dear Roger! You are right, there are no problems anymore. I ran some comparative tests with a small subset of the dataset. The number of non-finite values in the "pred" column depends on the bandwidth: as the bandwidth increases, the NAs disappear. Unfortunately, I cannot run gwr.sel because the dataset is too large. In any case, all problems are solved, and cluster support is a really nice feature for cutting processing time.
Thanks for checking and reporting back. I'll release to CRAN shortly. Best wishes, Roger