Skip to content

problem with markov random field smooths in mgcv

4 messages · Wilcox, Chris (O&A, Hobart), David Winsemius, Simon Wood

#
Hi all,
    
    I am trying to fit a model with a markov random field smooth in mgcv.  I am having some trouble with getting it to run, and in particular I am getting the message 
    
    Error in initial.sp(w * x, S, off) : S[[1]] matrix is not +ve definite.
    
    After reading everything I could find on mrf, it sounds like there was a bug that was brought up with Simon Wood in 2012, due to differences between windows and linux, with the linus machine stopping due to this error, while windows was not.  I have not been able to find much else on it.  Any suggestions would be much appreciated.
    
    There is reproducible code below.
    
    Thanks 
    
    Chris
    
    
    library(mgcv)
    
    #create data
    Country <- as.factor(c("Australia","Australia","Australia","Australia","Australia","Australia","Bangladesh","Bangladesh","Bangladesh",
    "Bangladesh","Bangladesh","Bangladesh","Cambodia","Cambodia","Cambodia","Cambodia","Cambodia","Cambodia",
    "China","China","China","China","China","China","East Timor","East Timor","East Timor",
    "East Timor","East Timor","East Timor","HighSeas1","HighSeas1","HighSeas1","HighSeas1","HighSeas1","HighSeas1",
    "HighSeas2","HighSeas2","HighSeas2","HighSeas2","HighSeas2","HighSeas2","China","China","China","China","China","China",
    "India","India","India","India","India","India","Indonesia","Indonesia","Indonesia","Indonesia","Indonesia","Indonesia",
    "Malaysia","Malaysia","Malaysia","Malaysia","Malaysia","Malaysia","Myanmar","Myanmar","Myanmar","Myanmar","Myanmar",
    "Myanmar","Philippines","Philippines","Philippines","Philippines","Philippines","Philippines","South Korea","South Korea",
    "South Korea","South Korea","South Korea","South Korea","China","China","China","China","China","China",
    "Sri Lanka","Sri Lanka","Sri Lanka","Sri Lanka","Sri Lanka","Sri Lanka","Taiwan","Taiwan","Taiwan","Taiwan",
    "Taiwan","Taiwan","Thailand","Thailand","Thailand","Thailand","Thailand","Thailand","Vietnam","Vietnam","Vietnam","Vietnam",
    "Vietnam","Vietnam"))
    
    Count <- c(0,0,3,5,1,5,0,0,0,0,0,1,0,0,0,0,0,3,0,0,2,1,0,6,0,0,0,1,0,0,0,1,0,0,0,0
    ,0,0,20,0,1,0,0,0,0,0,0,2,0,0,6,3,3,10,1,1,18,11,8,11,0,1,2,2,1,14,0,0,0,1,0,0
    ,0,0,4,3,9,16,0,0,3,0,0,1,0,0,1,0,0,0,0,0,33,18,8,16,0,0,0,0,0,2,0,1,14,6,8,2
    ,0,0,0,0,1,1)
    
    Data <- data.frame(Count,Country)
    
    #create neighbour matrix
    NB <- list()
    NB$'East Timor' <- c(1,2,15)
    NB$Australia <- c(1,2,15)
    NB$'Sri Lanka' <-c(3,12,16)
    NB$Bangladesh <-c(4,12,13,16)
    NB$Philippines <- c(5,6,11,14,15,17)
    NB$Taiwan <- c(5,6,11)
    NB$Thailand <- c(7,8,10,12,13,14,15)
    NB$Vietnam <- c(7,8,10,11,14,15)
    NB$`South Korea` <- c(9,11)
    NB$Cambodia <- c(7,8,10)
    NB$China <- c(5,6,8,9,11,14)
    NB$India <- c(3,4,7,12,13,15,16)
    NB$Myanmar <- c(4,7,12,13,16)
    NB$Malaysia <- c(5,7,8,11,14,15)
    NB$Indonesia <- c(1,2,5,7,8,12,14,15)
    NB$HighSeas2 <- c(3,4,12,13,16)
    NB$HighSeas1 <- c(5,17)
    
    #check levels and names match
    all.equal(sort(names(NB)), sort(levels(Data$Country)))
    
    #try fitting GAM
    m1 <- gam(Data$Count ~ s(Data$Country, bs = 'mrf', xt = list(nb = NB)))
#
On 3/18/20 12:44 AM, Wilcox, Chris (O&A, Hobart) wrote:
I'm not seeing any spatial data being defined, so I'm puzzled by the 
expectation that this is yet a markov random field problem. You appear 
to be following the last part of the example code in 
?smooth.construct.mrf.smooth.spec {mgcv} without constructing your data 
set to match the structure of the `columb` example dataset.

str(columb)

#------------------

'data.frame': 49 obs. of? 8 variables:
 ?$ area????? : num? 0.3094 0.2593 0.1925 0.0838 0.4889 ...
 ?$ home.value: num? 80.5 44.6 26.4 33.2 23.2 ...
 ?$ income??? : num? 19.53 21.23 15.96 4.48 11.25 ...
 ?$ crime???? : num? 15.7 18.8 30.6 32.4 50.7 ...
 ?$ open.space: num? 2.851 5.297 4.535 0.394 0.406 ...
 ?$ district? : Factor w/ 49 levels "0","1","2","3",..: 1 2 3 4 5 6 7 8 
9 10 ...
 ?$ x???????? : num? 8.83 8.33 9.01 8.46 9.01 ...
 ?$ y???????? : num? 14.4 14 13.8 13.7 13.3 ...


You are also committing a common R-beginner error in accessing columns 
of a data object directly in a formula while failing to use a data 
argument for a regression call.
#
Hi David,

Thanks for the comments.  I am running the analysis on a mac OS 10.12.6, using R R 3.5.3 GUI 1.70 El Capitan build (7632), and mgcv 1.8-31.  

I am aware of the procedure of using data = "xx" in a call to gam.  I am having a strange issue locally, in that gam does not see to be able to find the response variable (Count) if I use the standard way of handing gam the data.  That is the reason I wrote the gam function as I did.  I wanted to ignore that issue for the moment.  I have not had it happen with mgcv before, despite having used it for 10 years or so.

In reference to your question about spatial data, see the section of the code that starts with #create neighbourhood matrix.  This should say list, not matrix.  But it is the indices of the levels in the Country variable that are neighbours.  This list defines the parameter nb in the call to the function that constructs the mrf.

Thanks for spending the time to look at the problem.  I did not use the columb structure, but if you read down in the help for the smooth constructor function you can see that Simon has an example using nb that is like the one I submitted.

Cheers,

Chris
?On 19/3/20, 3:41 am, "David Winsemius" <dwinsemius at comcast.net> wrote:

        
On 3/18/20 12:44 AM, Wilcox, Chris (O&A, Hobart) wrote:
> Hi all,
    >      
    >      I am trying to fit a model with a markov random field smooth in mgcv.  I am having some trouble with getting it to run, and in particular I am getting the message
    >      
    >      Error in initial.sp(w * x, S, off) : S[[1]] matrix is not +ve definite.
    >      
    >      After reading everything I could find on mrf, it sounds like there was a bug that was brought up with Simon Wood in 2012, due to differences between windows and linux, with the linus machine stopping due to this error, while windows was not.  I have not been able to find much else on it.  Any suggestions would be much appreciated.
    >      
    >      There is reproducible code below.
    >      
    >      Thanks
    >      
    >      Chris
    >      
    >      
    >      library(mgcv)
    >      
    >      #create data
    >      Country <- as.factor(c("Australia","Australia","Australia","Australia","Australia","Australia","Bangladesh","Bangladesh","Bangladesh",
    >      "Bangladesh","Bangladesh","Bangladesh","Cambodia","Cambodia","Cambodia","Cambodia","Cambodia","Cambodia",
    >      "China","China","China","China","China","China","East Timor","East Timor","East Timor",
    >      "East Timor","East Timor","East Timor","HighSeas1","HighSeas1","HighSeas1","HighSeas1","HighSeas1","HighSeas1",
    >      "HighSeas2","HighSeas2","HighSeas2","HighSeas2","HighSeas2","HighSeas2","China","China","China","China","China","China",
    >      "India","India","India","India","India","India","Indonesia","Indonesia","Indonesia","Indonesia","Indonesia","Indonesia",
    >      "Malaysia","Malaysia","Malaysia","Malaysia","Malaysia","Malaysia","Myanmar","Myanmar","Myanmar","Myanmar","Myanmar",
    >      "Myanmar","Philippines","Philippines","Philippines","Philippines","Philippines","Philippines","South Korea","South Korea",
    >      "South Korea","South Korea","South Korea","South Korea","China","China","China","China","China","China",
    >      "Sri Lanka","Sri Lanka","Sri Lanka","Sri Lanka","Sri Lanka","Sri Lanka","Taiwan","Taiwan","Taiwan","Taiwan",
    >      "Taiwan","Taiwan","Thailand","Thailand","Thailand","Thailand","Thailand","Thailand","Vietnam","Vietnam","Vietnam","Vietnam",
    >      "Vietnam","Vietnam"))
    >      
    >      Count <- c(0,0,3,5,1,5,0,0,0,0,0,1,0,0,0,0,0,3,0,0,2,1,0,6,0,0,0,1,0,0,0,1,0,0,0,0
    >      ,0,0,20,0,1,0,0,0,0,0,0,2,0,0,6,3,3,10,1,1,18,11,8,11,0,1,2,2,1,14,0,0,0,1,0,0
    >      ,0,0,4,3,9,16,0,0,3,0,0,1,0,0,1,0,0,0,0,0,33,18,8,16,0,0,0,0,0,2,0,1,14,6,8,2
    >      ,0,0,0,0,1,1)
    >      
    >      Data <- data.frame(Count,Country)
    
    
    I'm not seeing any spatial data being defined, so I'm puzzled by the 
    expectation that this is yet a markov random field problem. You appear 
    to be following the last part of the example code in 
    ?smooth.construct.mrf.smooth.spec {mgcv} without constructing your data 
    set to match the structure of the `columb` example dataset.
    
    str(columb)
    
    #------------------
    
    'data.frame': 49 obs. of  8 variables:
      $ area      : num  0.3094 0.2593 0.1925 0.0838 0.4889 ...
      $ home.value: num  80.5 44.6 26.4 33.2 23.2 ...
      $ income    : num  19.53 21.23 15.96 4.48 11.25 ...
      $ crime     : num  15.7 18.8 30.6 32.4 50.7 ...
      $ open.space: num  2.851 5.297 4.535 0.394 0.406 ...
      $ district  : Factor w/ 49 levels "0","1","2","3",..: 1 2 3 4 5 6 7 8 
    9 10 ...
      $ x         : num  8.83 8.33 9.01 8.46 9.01 ...
      $ y         : num  14.4 14 13.8 13.7 13.3 ...
    
    
    You are also committing a common R-beginner error in accessing columns 
    of a data object directly in a formula while failing to use a data 
    argument for a regression call.
    
    -- 
    
    David.
    
    >      
    >      #create neighbour matrix
    >      NB <- list()
    >      NB$'East Timor' <- c(1,2,15)
    >      NB$Australia <- c(1,2,15)
    >      NB$'Sri Lanka' <-c(3,12,16)
    >      NB$Bangladesh <-c(4,12,13,16)
    >      NB$Philippines <- c(5,6,11,14,15,17)
    >      NB$Taiwan <- c(5,6,11)
    >      NB$Thailand <- c(7,8,10,12,13,14,15)
    >      NB$Vietnam <- c(7,8,10,11,14,15)
    >      NB$`South Korea` <- c(9,11)
    >      NB$Cambodia <- c(7,8,10)
    >      NB$China <- c(5,6,8,9,11,14)
    >      NB$India <- c(3,4,7,12,13,15,16)
    >      NB$Myanmar <- c(4,7,12,13,16)
    >      NB$Malaysia <- c(5,7,8,11,14,15)
    >      NB$Indonesia <- c(1,2,5,7,8,12,14,15)
    >      NB$HighSeas2 <- c(3,4,12,13,16)
    >      NB$HighSeas1 <- c(5,17)
    >      
    >      #check levels and names match
    >      all.equal(sort(names(NB)), sort(levels(Data$Country)))
    >      
    >      #try fitting GAM
    >      m1 <- gam(Data$Count ~ s(Data$Country, bs = 'mrf', xt = list(nb = NB)))
    >      
    >      
    >
    > ______________________________________________
    > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
    > https://stat.ethz.ch/mailman/listinfo/r-help
    > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    > and provide commented, minimal, self-contained, reproducible code.
6 days later
#
Hi Chris,

It's kind of a documentation glitch, a node is not supposed to be listed 
as its own neighbour (it causes the diagonal entries in the penalty 
matrix to be over-written by the wrong value). i.e. the neighbour list 
should be.

 ?NB <- list()
 ??? NB$'East Timor' <- c(2,15)
 ??? NB$Australia <- c(1,15)
 ??? NB$'Sri Lanka' <-c(12,16)
 ??? NB$Bangladesh <-c(12,13,16)
 ??? NB$Philippines <- c(6,11,14,15,17)
 ??? NB$Taiwan <- c(5,11)
 ??? NB$Thailand <- c(8,10,12,13,14,15)
 ??? NB$Vietnam <- c(7,10,11,14,15)
 ??? NB$`South Korea` <- c(11)
 ??? NB$Cambodia <- c(7,8)
 ??? NB$China <- c(5,6,8,9,14)
 ??? NB$India <- c(3,4,7,13,15,16)
 ??? NB$Myanmar <- c(4,7,12,16)
 ??? NB$Malaysia <- c(5,7,8,11,15)
 ??? NB$Indonesia <- c(1,2,5,7,8,12,14)
 ??? NB$HighSeas2 <- c(3,4,12,13)
 ??? NB$HighSeas1 <- c(5)

I've fixed the help page and had the smooth constructor ignore 
auto-neighbours for the next release.

best,
Simon
? On 18/03/2020 07:44, Wilcox, Chris (O&A, Hobart) wrote: