Missing local R-squared and residuals in gwr output

On Fri, 4 May 2012, Maximilian Sproß wrote:

Dear r-sig-geo list!

I am running gwr on a multi-node cluster (64 slots). In the gwr output (slot
"SDF"), the gwr residuals and the local R-squared are missing. When I fit
the same model on my local machine, these components are included.
Unfortunately, the local calculation takes about 5 days, compared with a
few hours on the cluster.

Perhaps the problem arises from the argument "fit.points", which has
to be passed if the local coefficient estimates are to be computed on a
multi-node cluster.

Does anyone have an idea how to solve that problem with the missing
local R-squared and residuals if the gwr is calculated on a cluster?

On 05/05/2012 04:24 PM, Roger Bivand wrote:

The understanding for use on a cluster was that the data points and the
fit points are different, so there is no observed dependent variable at 
the fit point, hence no local R2. I've added logic in the code that checks 
for equality between the fit and data points, and this for me resolves the 
problem, but may break other things. I've committed to R-forge, project 
rspatial, module spgwr. The source tarball and binary packages should be 
available later this evening European time from:

https://r-forge.r-project.org/R/?group_id=1014

Could you please try it out, and report back? I should also migrate spgwr 
from snow to parallel before I release it.

Best wishes,

Roger
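
The equality check described above can be pictured with a small base-R sketch. The helper name same_points and the toy coordinates are hypothetical, and this is only an illustration of the idea, not the code committed to spgwr:

```r
# Hypothetical helper: TRUE only when the fit points and the data points
# are the same locations, in the same order.
same_points <- function(fit_coords, data_coords) {
  a <- unname(as.matrix(fit_coords))
  b <- unname(as.matrix(data_coords))
  # Compare shapes first so that == is only applied to conformable matrices.
  identical(dim(a), dim(b)) && isTRUE(all(a == b))
}

# Toy coordinates, made up for illustration.
set.seed(1)
data_coords <- cbind(x = runif(5), y = runif(5))

same_points(data_coords, data_coords)         # fit points are the data points
same_points(data_coords[1:3, ], data_coords)  # fewer fit points than data points
same_points(data_coords + 0.5, data_coords)   # same count, shifted locations
```

Only in the first case is an observed dependent variable available at every fit point, which is what residuals and a local R2 require.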

Thank you very much in advance!

Kind regards,

Max

selected R-code:

### gwr on local machine:

library(spgwr)  # provides gwr() and gwr.Gauss
gwr_50 <- gwr(hef@data$DIF ~ hef@data$ELEVATION + hef@data$SKY +
                hef@data$SLOPE + hef@data$SOLAR,
              data=hef, bandwidth=50, gweight=gwr.Gauss)

# part of the  str(gwr_50) output...

List of 11
 $ SDF      :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  .. ..@ data       :'data.frame':	286288 obs. of  9 variables:
  .. .. ..$ sum.w      : num [1:286288] 2009 2003 2091 2089 2086 ...
  .. .. ..$ (Intercept): num [1:286288] -28.7 -28.5 -29.9 -29.7 -29.5 ...
  .. .. ..$ elevation  : num [1:286288] 0.0139 0.0138 0.014 0.014 0.014 ...
  .. .. ..$ sky        : num [1:286288] -0.153 -0.155 -0.146 -0.148 -0.149 ...
  .. .. ..$ slope      : num [1:286288] -2.58 -2.61 -2.42 -2.45 -2.48 ...
  .. .. ..$ solar      : num [1:286288] -0.00139 -0.00136 -0.0015 -0.00147 -0.00144 ...
  .. .. ..$ gwr.e      : num [1:286288] -0.461 -0.683 -0.5987 -0.2692 0.0406 ...
  .. .. ..$ pred       : num [1:286288] 0.806 0.833 0.507 0.514 0.576 ...
  .. .. ..$ localR2    : num [1:286288] 0.621 0.618 0.638 0.635 0.632 ...

### gwr on cluster :

library(snow)  # makeCluster(); type="MPI" additionally requires Rmpi
cl <- makeCluster(32, type="MPI")

coords <- coordinates(hef)

gw <- gwr(hef@data$DIF ~ hef@data$ELEVATION + hef@data$SKY +
            hef@data$SLOPE + hef@data$SOLAR,
          data=hef, bandwidth=50, gweight=gwr.Gauss, fit.points=coords,
          hatmatrix=FALSE, cl=cl)

# part of the str(gw) output...

List of 11
 $ SDF      :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  .. ..@ data       :'data.frame':	286288 obs. of  6 variables:
  .. .. ..$ sum.w      : num [1:286288] 1 1 1 1 1 ...
  .. .. ..$ (Intercept): num [1:286288] 12541 1970 2057 -1505 -1030 ...
  .. .. ..$ elevation  : num [1:286288] -3.891 -0.602 -0.738 0.465 0.309 ...
  .. .. ..$ sky        : num [1:286288] -0.954 -0.425 3.714 0.159 0.152 ...
  .. .. ..$ slope      : num [1:286288] 62.19 NA -27.21 1.95 16.03 ...
  .. .. ..$ solar      : num [1:286288] NA NA NA NA 0.042 ...

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no

On Mon, 7 May 2012, Maximilian Sproß wrote:

Dear Roger!

Thank you very much for your fast reply and work!

I'm not really an expert in HPC computing, but I will try to report as well
as I can.

I updated spgwr and started a job on the cluster that normally takes 1.5 h.
So far it has run for 5 hours, which indicates that the parallelization no
longer works efficiently. The function makeCluster(64, type="MPI") worked fine.
Our cluster runs Open MPI.

On 07.05.2012 14:48, Roger Bivand wrote:

Correct. I'll try to add back an option to use snow instead of parallel.
When it reaches R-forge, its revision number will be > 1252.

Roger

In that context, I found the following in the CRAN Task View "High-Performance
and Parallel Computing with R":
"Direct support in R is starting with release 2.14.0 which includes a new
package parallel incorporating (slightly revised) copies of packages multicore
and snow (*but excluding MPI, PVM and NWS clusters*)." Does the new parallel
support still work in the Open MPI environment?

regards,

Max

fyi:

sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US
[4] LC_COLLATE=en_US     LC_MONETARY=en_US    LC_MESSAGES=en_US
[7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] spgwr_0.6-15    spdep_0.5-45    coda_0.14-6     deldir_0.0-16
[5] maptools_0.8-10 foreign_0.8-46  nlme_3.1-102    MASS_7.3-16
[9] Matrix_1.0-1    lattice_0.20-0  boot_1.3-3      gstat_1.0-10
[13] spacetime_0.5-7 xts_0.8-2       zoo_1.7-6       sp_0.9-98
[17] snow_0.3-8      Rmpi_0.5-9

loaded via a namespace (and not attached):
[1] grid_2.14.0

--
Mag. J. Maximilian Sproß
Institute of Geography, University of Innsbruck
Innrain 52
A-6020 INNSBRUCK

Tel. +43 (0)512 507 5413
web: http://www.uibk.ac.at/geographie/projects/lidar/

-----Original Message-----
From: Roger Bivand [mailto:Roger.Bivand at nhh.no]
Sent: Mon 07.05.2012 14:48
To: Maximilian Sproß
Cc: r-sig-geo
Subject: Re: [R-sig-Geo] Missing local R-squared and residuals in gwr output

Correct. I'll try to add back an option to use snow instead of parallel.

I tried out the new version, but it still seems to be using parallel.

code:

gwr_50 <- gwr(hef@data$DIF ~ hef@data$ELEVATION + hef@data$SKY +
                hef@data$SLOPE + hef@data$SOLAR + factor(asp_fac),
              data=hef, bandwidth=50, gweight=gwr.Gauss, fit.points=coords,
              hatmatrix=FALSE, cl=cl)

On 05/07/2012 08:45 PM, Roger Bivand wrote:

Add use_snow=TRUE to the command to switch to snow.

Roger

Loading required package: parallel

Attaching package: 'parallel'

The following object(s) are masked from 'package:snow':

   clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
   clusterExport, clusterMap, clusterSplit, makeCluster, parApply,
   parCapply, parLapply, parRapply, parSapply, splitIndices,
   stopCluster

Max
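
The masking message above means that, with both packages attached, an unqualified call such as makeCluster() resolves to the package attached last. A minimal sketch, assuming only the parallel package that ships with R, shows how an explicit namespace prefix removes that dependence on attach order. It uses splitIndices() to show one way a set of fit points might be divided among workers. The point and worker counts are made-up values, and nothing here is taken from the spgwr source:

```r
# Illustrative values only: 10 fit points shared among 4 workers.
n_fit_points <- 10
n_workers <- 4

# The parallel:: prefix selects the function from parallel explicitly,
# whether or not snow is also attached and in whichever order.
chunks <- parallel::splitIndices(n_fit_points, n_workers)

# Each list element holds the row indices one worker would process.
print(chunks)

# Confirm which namespace the qualified function comes from.
print(environmentName(environment(parallel::splitIndices)))
```

The same prefix style (snow::makeCluster or parallel::makeCluster) makes it explicit which package a cluster object is created by.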

On Wed, 9 May 2012, Maximilian Sproß wrote:

Thank you Roger! The gwr on the MPI cluster works fine.

However, the output object now includes the three initially missing data
columns: "gwr.e", "pred" and "localR2". Unfortunately, the last of these
contains only NAs.
Sorry for any inconvenience, but do you think you can solve that?

Thanks in advance and all the best,

Max

Roger Bivand replied:

I do not see any problem there, and indeed it is after the results have
been returned from the cluster. You can tell whether the code block
starting on line 261 of spgwr/R/gwr.R was entered by looking at the timings
component of the output object: if there is no line beginning with
"postprocess_localR2", the block was not run. The conditions are:

((!fp.given || fit_are_data) && is.null(fittedGWRobject))

where, in your case, !fp.given is FALSE, fit_are_data is TRUE and
is.null(fittedGWRobject) is TRUE. If the "pred" column in your output
contains values that are not finite, the NAs may arise in this code block.

If you cannot see what is going on, we need a smaller test data set that 
replicates the problem.

Roger
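
As a rough illustration of the two checks suggested above, the following base-R sketch evaluates the quoted condition for the values described and counts non-finite predictions. The data frame sdf_data is a made-up stand-in for the data slot of the SDF component, with invented values. It is not output from the real model:

```r
# Made-up stand-in for gw$SDF@data; the values are invented for illustration.
sdf_data <- data.frame(
  pred    = c(0.806, NA, 0.507, Inf, 0.576),
  localR2 = c(0.621, NA, 0.638, NA, 0.632)
)

# The condition quoted above, with the values described for this case:
# fit points were supplied, they equal the data points, and no fitted
# GWR object was passed in.
fp.given <- TRUE
fit_are_data <- TRUE
fittedGWRobject <- NULL
enter_block <- (!fp.given || fit_are_data) && is.null(fittedGWRobject)
print(enter_block)

# Count and locate predictions that are NA, NaN or infinite.
bad_rows <- which(!is.finite(sdf_data$pred))
n_bad <- length(bad_rows)
print(n_bad)
print(bad_rows)
```

On the real object, the same is.finite() test applied to the "pred" column would show how many fit points are affected.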

On 05/07/2012 08:45 PM, Roger Bivand wrote:
On Mon, 7 May 2012, "Spro?, Johann" wrote:

-- 
Mag. J. Maximilian Spro?
Institute of Geography, University of Innsbruck
Innrain 52
A-6020 INNSBRUCK

Tel. +43 (0)512 507 5413
web: http://www.uibk.ac.at/geographie/projects/lidar/

-----Urspr?ngliche Nachricht-----
Von: Roger Bivand [mailto:Roger.Bivand at nhh.no]
Gesendet: Mo 07.05.2012 14:48
An: Maximilian Spro?
Cc: r-sig-geo
Betreff: Re: [R-sig-Geo] Missing local R-squared and residuals in gwr 
output

On Mon, 7 May 2012, Maximilian Spro? wrote:

Dear Roger!

Thank you very much for your fast reply and work!

I'm not really an expert in HPC-computing, but i will try to report as 
goog
as i can.

I updated spgwr and started a job on the cluster which takes normally 1,5 
h.
So far, it run for 5 hours, which indicates that the parallelization does 
not
work efficient anymore. The function makeCluster(64, type="MPI") worked 
fine.
Our cluster runs with openMPI.
Correct. I'll try to add back an option to use snow instead of parallel.

I tried out the new version but it seems still using parallel.

code:

gwr_50 <- 
gwr(hef at data$DIF~hef at data$ELEVATION+hef at data$SKY+hef at data$SLOPE+hef at data$SOLAR+factor(asp_fac), 
data=hef, bandwidth=50, gweight=gwr.Gauss,fit.points=coords, 
hatmatrix=FALSE, cl=cl)
Add use_snow=TRUE to the command to switch to snow.

Roger

Loading required package: parallel

Attaching package: 'parallel'

The following object(s) are masked from 'package:snow':

   clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
   clusterExport, clusterMap, clusterSplit, makeCluster, parApply,
   parCapply, parLapply, parRapply, parSapply, splitIndices,
   stopCluster

Max

When it reaches R-forge, its revision number will be > 1252.

Roger

In that context, i found on the CRAN Task view: High-Performance and 
Parallel
Computing with R the following:
"<http://www.dict.cc/englisch-deutsch/parallelization.html>Direct support 
in
R is starting with release 2.14.0 which includes a new package parallel
incorporating (slightly revised) copies of packages multicore and snow 
(*but
excluding MPI, PVM and NWS clusters*). Does the new parallel support 
works
still in the openMPI environment?

regards,

Max

fyi:

sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US
[4] LC_COLLATE=en_US     LC_MONETARY=en_US    LC_MESSAGES=en_US
[7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] spgwr_0.6-15    spdep_0.5-45    coda_0.14-6     deldir_0.0-16
[5] maptools_0.8-10 foreign_0.8-46  nlme_3.1-102    MASS_7.3-16
[9] Matrix_1.0-1    lattice_0.20-0  boot_1.3-3      gstat_1.0-10
[13] spacetime_0.5-7 xts_0.8-2       zoo_1.7-6       sp_0.9-98
[17] snow_0.3-8      Rmpi_0.5-9

loaded via a namespace (and not attached):
[1] grid_2.14.0

On 05/05/2012 04:24 PM, Roger Bivand wrote:
On Fri, 4 May 2012, Maximilian Spro? wrote:

Dear r-sig-geo list!

I run gwr on a multi-node cluster(on 64 slots). In the gwr output (slot
"SDF"), the gwr residuals and the local R-squared are missing. When
performing the same model on the local machine, these components are
included. Unfortunately, the calculation in this way takes about 5 days
instead of few hours when using the cluster.

Perhaps, that problem arises due to the argument "fit.points", which 
has
to be passed if the local coefficient estimates should be made on a
multi node cluster.

Does anyone have an idea how to solve that problem with the missing
local R-squared and residuals if the gwr is calculated on a cluster?
The understanding for use on a cluster was that the data points and the 
fit
points are different, so there is no observed dependent variable at the 
fit
point, hence no local R2. I've added logic in the code that checks for
equality between the fit and data points, and this for me resolves the
problem, but may break other things. I've committed to R-forge, project
rspatial, module spgwr. The source tarball and binary packages should be
available later this evening European time from:

https://r-forge.r-project.org/R/?group_id=1014

Could you please try it out, and report back? I should also migrate 
spgwr
from snow to parallel before I release it.

Best wishes,

Roger

Thank you very much in advance!

Kind regards,

Max

selected R-code:

### gwr on local machine:

gwr_50 <-
gwr(hef at data$DIF~hef at data$ELEVATION+hef at data$SKY+hef at data$SLOPE+hef at data$SOLAR, 
data=hef, bandwidth=50, gweight=gwr.Gauss)

# part of the  str(gwr_50) output...

List of 11
 $ SDF      :Formal class 'SpatialPointsDataFrame' [package "sp"] with 
5
slots
  .. ..@ data       :'data.frame':    286288 obs. of  9 variables:
  .. .. ..$ sum.w      : num [1:286288] 2009 2003 2091 2089 2086 ...
  .. .. ..$ (Intercept): num [1:286288] -28.7 -28.5 -29.9 -29.7 -29.5 
...
  .. .. ..$ elevation  : num [1:286288] 0.0139 0.0138 0.014 0.014 0.014
...
  .. .. ..$ sky        : num [1:286288] -0.153 -0.155 -0.146 -0.148 
-0.149
...
  .. .. ..$ slope      : num [1:286288] -2.58 -2.61 -2.42 -2.45 -2.48 
...
  .. .. ..$ solar      : num [1:286288] -0.00139 -0.00136 -0.0015 
-0.00147
-0.00144 ...
  .. .. ..$ gwr.e      : num [1:286288] -0.461 -0.683 -0.5987 -0.2692
0.0406 ...
  .. .. ..$ pred       : num [1:286288] 0.806 0.833 0.507 0.514 0.576 
...
  .. .. ..$ localR2    : num [1:286288] 0.621 0.618 0.638 0.635 0.632 
...

### gwr on cluster :

cl <- makeCluster(32, type="MPI")

coords <- coordinates(hef)

gw <-
gwr(hef at data$DIF~hef at data$ELEVATION+hef at data$SKY+hef at data$SLOPE+hef at data$SOLAR, 
data=hef, bandwidth=50, gweight=gwr.Gauss,fit.points=coords,
hatmatrix=FALSE, cl=cl)

# part of the  str(gwr_50) output...

List of 11
 $ SDF      :Formal class 'SpatialPointsDataFrame' [package "sp"] with 
5
slots
  .. ..@ data       :'data.frame':    286288 obs. of  6 variables:
  .. .. ..$ sum.w      : num [1:286288] 1 1 1 1 1 ...
  .. .. ..$ (Intercept): num [1:286288] 12541 1970 2057 -1505 -1030 ...
  .. .. ..$ elevation  : num [1:286288] -3.891 -0.602 -0.738 0.465 
0.309
...
  .. .. ..$ sky        : num [1:286288] -0.954 -0.425 3.714 0.159 0.152
...
  .. .. ..$ slope      : num [1:286288] 62.19 NA -27.21 1.95 16.03 ...
  .. .. ..$ solar      : num [1:286288] NA NA NA NA 0.042 ...

    [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no

Dear Roger!

You are right, there are no problems anymore. I ran some comparative
tests with a small subset of the dataset. The number of non-finite
values in the "pred" column depends on the bandwidth: as the bandwidth
increases, the NAs disappear.

Unfortunately, I cannot run gwr.sel because the dataset is too large.

In any case, all problems are solved, and using the cluster is a
really nice way to cut processing time.

Thank you very much for your help!

Max
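
One plausible reading of the bandwidth dependence, sketched in base R
under stated assumptions: with a Gaussian kernel the weights fall off as
exp(-0.5 * d^2 / bandwidth^2), so at a small bandwidth almost all weight
sits on the fit point itself and the local regression may have too
little information to estimate every coefficient. The kernel form is
assumed to match gwr.Gauss; the helper gauss_w and the transect are
hypothetical and not taken from the data in this thread.

```r
# Gaussian kernel on squared distances; assumed to mirror the form of gwr.Gauss.
gauss_w <- function(dist2, bandwidth) exp(-0.5 * dist2 / bandwidth^2)

# Hypothetical 1-D transect: 11 points spaced 100 units apart.
x <- seq(0, 1000, by = 100)
dist2 <- (x - x[6])^2          # squared distances from the middle point

# Sum of weights seen from the middle point for three bandwidths.
sum_w <- sapply(c(10, 50, 500), function(bw) sum(gauss_w(dist2, bw)))
names(sum_w) <- c("bw10", "bw50", "bw500")
print(round(sum_w, 3))
```

A summed weight close to 1 means that only the fit point itself
contributes, which is the pattern visible in the sum.w column of the
cluster output posted earlier in the thread.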
On 05/09/2012 01:19 PM, Roger Bivand wrote:
On Wed, 9 May 2012, Maximilian Sproß wrote:

Thank you Roger! The gwr on the MPI cluster works fine.

However, the output object now includes the three initially missing
columns of the data slot: "gwr.e", "pred" and "localR2". Unfortunately,
the last one contains only NAs.
Sorry for any inconvenience, but do you think you can solve that?

I do not see any problem there, and indeed it is after the results 
have been returned from the cluster. You can tell whether you have 
been into the code block starting on line 261 in spgwr/R/gwr.R if 
there is no line beginning with "postprocess_localR2" in the timings 
component of the output object. The conditions are:

((!fp.given || fit_are_data) && is.null(fittedGWRobject))

where the first is FALSE, the second TRUE and the third TRUE in your 
case. If the "pred" column in your output contains values that are not 
finite, this may happen in this code block.

If you cannot see what is going on, we need a smaller test data set 
that replicates the problem.

Roger
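
A minimal base-R sketch of the check Roger describes, assuming the three
values he states for this run. The data frame sdf_data is a hypothetical
stand-in for the data slot of the SDF component, with made-up values.

```r
# Values as Roger describes them for the cluster run in this thread.
fp.given        <- TRUE   # fit.points was supplied, so !fp.given is FALSE
fit_are_data    <- TRUE   # the fit points coincide with the data points
fittedGWRobject <- NULL   # no previously fitted GWR object was passed

# The condition quoted from spgwr/R/gwr.R.
enter_block <- (!fp.given || fit_are_data) && is.null(fittedGWRobject)
print(enter_block)

# Toy stand-in for the data slot of the SDF component (hypothetical values).
sdf_data <- data.frame(pred    = c(0.806, NA, 0.507, Inf, 0.576),
                       localR2 = c(0.621, NA, 0.638, NA, 0.632))

# Count non-finite entries per column; is.finite() is FALSE for NA, NaN and Inf.
n_bad <- sapply(sdf_data, function(x) sum(!is.finite(x)))
print(n_bad)
```

On a real result the same count could be applied to gw$SDF@data, and the
timings component Roger mentions inspected alongside it.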

Thanks in advance and all the best,

Max

On 05/07/2012 08:45 PM, Roger Bivand wrote:
On Mon, 7 May 2012, "Sproß, Johann" wrote:

-- 
Mag. J. Maximilian Sproß
Institute of Geography, University of Innsbruck
Innrain 52
A-6020 INNSBRUCK

Tel. +43 (0)512 507 5413
web: http://www.uibk.ac.at/geographie/projects/lidar/

-----Original Message-----
From: Roger Bivand [mailto:Roger.Bivand at nhh.no]
Sent: Mon 07.05.2012 14:48
To: Maximilian Sproß
Cc: r-sig-geo
Subject: Re: [R-sig-Geo] Missing local R-squared and residuals in gwr output

On Mon, 7 May 2012, Maximilian Sproß wrote:

Dear Roger!

Thank you very much for your fast reply and work!

I'm not really an expert in HPC computing, but I will try to report as
well as I can.

I updated spgwr and started a job on the cluster which normally takes
1.5 h. So far, it has run for 5 hours, which indicates that the
parallelization no longer works efficiently. The function
makeCluster(64, type="MPI") worked fine. Our cluster runs with openMPI.

Correct. I'll try to add back an option to use snow instead of
parallel.

I tried out the new version, but it still seems to be using parallel.

code:

gwr_50 <-
gwr(hef@data$DIF~hef@data$ELEVATION+hef@data$SKY+hef@data$SLOPE+hef@data$SOLAR+factor(asp_fac),
data=hef, bandwidth=50, gweight=gwr.Gauss, fit.points=coords,
hatmatrix=FALSE, cl=cl)

Add use_snow=TRUE to the command to switch to snow.

Roger

Loading required package: parallel

Attaching package: 'parallel'

The following object(s) are masked from 'package:snow':

   clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
   clusterExport, clusterMap, clusterSplit, makeCluster, parApply,
   parCapply, parLapply, parRapply, parSapply, splitIndices,
   stopCluster

Max
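
For readers unfamiliar with how per-point work can be divided across
workers, here is a rough sketch of the chunking idea using splitIndices
from the parallel package (one of the functions named in the masking
message above). It is an illustration only, not spgwr's actual code: the
per-point function local_stat and the coordinates are hypothetical, and
lapply stands in for parLapply so that the sketch runs without a cluster.

```r
library(parallel)

# Hypothetical fit-point coordinates (10 points in 2-D).
set.seed(1)
coords <- cbind(x = runif(10), y = runif(10))

# Hypothetical per-point computation standing in for one local fit:
# the sum of Gaussian weights of all points as seen from point i.
local_stat <- function(i, coords, bandwidth = 0.3) {
  d2 <- (coords[, 1] - coords[i, 1])^2 + (coords[, 2] - coords[i, 2])^2
  sum(exp(-0.5 * d2 / bandwidth^2))
}

# Split the point indices into 3 chunks, one per (imagined) worker.
chunks <- splitIndices(nrow(coords), 3)

# lapply stands in for parLapply(cl, chunks, ...) on a real cluster.
by_chunk <- lapply(chunks, function(idx) sapply(idx, local_stat, coords = coords))
chunked <- unlist(by_chunk)

# Serial reference over all points in order.
serial <- sapply(seq_len(nrow(coords)), local_stat, coords = coords)
print(all.equal(chunked, serial))
```

Because each point's result depends only on the full coordinate set and
its own index, the chunks can be computed independently and
concatenated afterwards.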

When it reaches R-forge, its revision number will be > 1252.

Roger

In that context, I found the following in the CRAN Task View
"High-Performance and Parallel Computing with R":
"Direct support in R is starting with release 2.14.0 which includes a
new package parallel incorporating (slightly revised) copies of
packages multicore and snow (*but excluding MPI, PVM and NWS
clusters*)". Does the new parallel support still work in the openMPI
environment?

regards,

Max

fyi:

sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US
[4] LC_COLLATE=en_US     LC_MONETARY=en_US    LC_MESSAGES=en_US
[7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] spgwr_0.6-15    spdep_0.5-45    coda_0.14-6     deldir_0.0-16
[5] maptools_0.8-10 foreign_0.8-46  nlme_3.1-102    MASS_7.3-16
[9] Matrix_1.0-1    lattice_0.20-0  boot_1.3-3      gstat_1.0-10
[13] spacetime_0.5-7 xts_0.8-2       zoo_1.7-6       sp_0.9-98
[17] snow_0.3-8      Rmpi_0.5-9

loaded via a namespace (and not attached):
[1] grid_2.14.0

On 05/05/2012 04:24 PM, Roger Bivand wrote:
On Fri, 4 May 2012, Maximilian Sproß wrote:

[...]

Thanks for checking and reporting back. I'll release to CRAN shortly.

Best wishes,

Roger