This seems to be a recurrent topic, but I don't remember hearing a definitive answer. I also apologies for cross-posting. Say I have a numerical response variable and a bunch of multi-level factors I want to use for modeling. I don't expect factor interaction to be important so there will be no interactions in the model. All this would be a perfect job for ANOVA except for one additional requirement: all factors are ordered, i.e. for any two levels in a factor one is bigger than the other, and I want the regression to preserve the ordering, that is to be monotonic. I can think of two definitions of monotonicity, weak and strong, and in case it matters I am more interested in the strong one, which I define as follows: If L1 < L2 within some factor F than prediction(L1, other factors) < prediction(L2, other factors) for ANY SELECTION of 'other factors' levels. (A weak monotonicity could be defined as prediction(L1) < prediction(L2) on average). Any reference to S, R, C code / papers which address this topic will be highly appreciated. Similar question could be asked about additive models, i.e. estimate an additive model with each (or some) functions being monotonic. Thanks, Vadim -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Regressions with monotonicity constraints
4 messages · Vadim Ogranovich, Thomas Lumley, Frank E Harrell Jr +1 more
On Mon, 12 Mar 2001, Vadim Ogranovich wrote:
This seems to be a recurrent topic, but I don't remember hearing a definitive answer. I also apologies for cross-posting. Say I have a numerical response variable and a bunch of multi-level factors I want to use for modeling. I don't expect factor interaction to be important so there will be no interactions in the model. All this would be a perfect job for ANOVA except for one additional requirement: all factors are ordered, i.e. for any two levels in a factor one is bigger than the other, and I want the regression to preserve the ordering, that is to be monotonic. I can think of two definitions of monotonicity, weak and strong, and in case it matters I am more interested in the strong one, which I define as follows: If L1 < L2 within some factor F than prediction(L1, other factors) < prediction(L2, other factors) for ANY SELECTION of 'other factors' levels. (A weak monotonicity could be defined as prediction(L1) < prediction(L2) on average).
in the absence of interactions these are equivalent, surely?
One-dimensional monotonic regression is easy using the
Pool-Adjacent-Violators algorithm. I think an additive monotonic model
could be estimated by backfitting as for generalised linear models: update
the function for each predictor in turn by using the one-dimensional
algorithm on the partial residuals from the previous iteration.
Backfitting is described in "Generalized Additive Models" by Hastie &
Tibshirani (Chapman & Hall) and probably other places.
The more difficult case is a non-additive model (ie interactions). For
discrete factors there were two papers in the Journal of Computational &
Graphical Statistics by Qian and colleagues. One of them was
@article{qian:eddy:1996,
Author = {Qian, Shixian and Eddy, William F.},
Title = {An Algorithm for Isotonic Regression on Ordered Rectangular
Grids},
Year = 1996,
Journal = {Journal of Computational and Graphical Statistics},
Volume = 5,
Pages = {225--235},
Keywords = {[Block class algorithm]; [Monotone regression]; [Recursive
partitioning]; [Sandwich algorithm]}
}
Any reference to S, R, C code / papers which address this topic will be highly appreciated.
The following does one-dimensional isotonic regression. I believe similar
code has been posted before. It is recursive but seems fast enough for
many uses. It returns a compressed form of the result where the first
component has the distinct fitted values and the second indicates the
number of repeats.
-thomas
Thomas Lumley Asst. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle
pava.blocks<-function(x,w=rep(1,length(x)),b=rep(1,length(x)),up=T){
lasti<-1
if (length(x)==1) return(list(x=x,blocks=b,increasing=up))
for (i in 2:length(x)){
if (x[i]<=x[lasti] & up){
wtotal<-w[lasti]+w[i]
x[lasti]<-(x[lasti]*w[lasti]+x[i]*w[i])/wtotal
w[lasti]<-wtotal
b[lasti]<-b[i]+b[lasti]
b[i]<-0
}else if (x[i]<=x[lasti] & !up){
lasti<-i
}else if (x[i]>x[lasti] & !up){
wtotal<-w[lasti]+w[i]
x[lasti]<-(x[lasti]*w[lasti]+x[i]*w[i])/wtotal
w[lasti]<-wtotal
b[lasti]<-b[i]+b[lasti]
b[i]<-0
}else if(x[i]>x[lasti] & up){
lasti<-i
}
}
if (any(b==0)) {
subset<-(b>0)
return(pava.blocks(x[subset],w[subset],b[subset],up))
}
else return(list(x=x,blocks=b,increasing=up))
}
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Also take a look at Rob Tibshirani's "bootstrap bumping" in
which you select the optimum (best log-likelihood) parameter
estimates (over say 100 bootstrap fits) that satisified the
constraints. Bumping probably works better in smooth situations
such as placing monotonicity restrictions on splines, but
it may work in your case. -Frank Harrell
@TECHREPORT{tib97mod,
author = {Tibshirani, Robert and Knight, Keith},
year = 1997,
title = {Model search and inference by bootstrap ``bumping''},
note = {Presented at the Joint Statistical Meetings, Chicago, August
1996},
address = {{\tt http://www-stat.stanford.edu/~tibs/}},
institution = {Department of Statistics, University of Toronto},
annote = {simultaneous confidence regions; confidence sets;
constrained
optimization; CART; improving performance of estimators via
the
bootstrap}
}
Thomas Lumley wrote:
On Mon, 12 Mar 2001, Vadim Ogranovich wrote:
This seems to be a recurrent topic, but I don't remember hearing a definitive answer. I also apologies for cross-posting. Say I have a numerical response variable and a bunch of multi-level factors I want to use for modeling. I don't expect factor interaction to be important so there will be no interactions in the model. All this would be a perfect job for ANOVA except for one additional requirement: all factors are ordered, i.e. for any two levels in a factor one is bigger than the other, and I want the regression to preserve the ordering, that is to be monotonic. I can think of two definitions of monotonicity, weak and strong, and in case it matters I am more interested in the strong one, which I define as follows: If L1 < L2 within some factor F than prediction(L1, other factors) < prediction(L2, other factors) for ANY SELECTION of 'other factors' levels. (A weak monotonicity could be defined as prediction(L1) < prediction(L2) on average).
in the absence of interactions these are equivalent, surely?
One-dimensional monotonic regression is easy using the
Pool-Adjacent-Violators algorithm. I think an additive monotonic model
could be estimated by backfitting as for generalised linear models: update
the function for each predictor in turn by using the one-dimensional
algorithm on the partial residuals from the previous iteration.
Backfitting is described in "Generalized Additive Models" by Hastie &
Tibshirani (Chapman & Hall) and probably other places.
The more difficult case is a non-additive model (ie interactions). For
discrete factors there were two papers in the Journal of Computational &
Graphical Statistics by Qian and colleagues. One of them was
@article{qian:eddy:1996,
Author = {Qian, Shixian and Eddy, William F.},
Title = {An Algorithm for Isotonic Regression on Ordered Rectangular
Grids},
Year = 1996,
Journal = {Journal of Computational and Graphical Statistics},
Volume = 5,
Pages = {225--235},
Keywords = {[Block class algorithm]; [Monotone regression]; [Recursive
partitioning]; [Sandwich algorithm]}
}
Any reference to S, R, C code / papers which address this topic will be highly appreciated.
The following does one-dimensional isotonic regression. I believe similar
code has been posted before. It is recursive but seems fast enough for
many uses. It returns a compressed form of the result where the first
component has the distinct fitted values and the second indicates the
number of repeats.
-thomas
Thomas Lumley Asst. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle
pava.blocks<-function(x,w=rep(1,length(x)),b=rep(1,length(x)),up=T){
lasti<-1
if (length(x)==1) return(list(x=x,blocks=b,increasing=up))
for (i in 2:length(x)){
if (x[i]<=x[lasti] & up){
wtotal<-w[lasti]+w[i]
x[lasti]<-(x[lasti]*w[lasti]+x[i]*w[i])/wtotal
w[lasti]<-wtotal
b[lasti]<-b[i]+b[lasti]
b[i]<-0
}else if (x[i]<=x[lasti] & !up){
lasti<-i
}else if (x[i]>x[lasti] & !up){
wtotal<-w[lasti]+w[i]
x[lasti]<-(x[lasti]*w[lasti]+x[i]*w[i])/wtotal
w[lasti]<-wtotal
b[lasti]<-b[i]+b[lasti]
b[i]<-0
}else if(x[i]>x[lasti] & up){
lasti<-i
}
}
if (any(b==0)) {
subset<-(b>0)
return(pava.blocks(x[subset],w[subset],b[subset],up))
}
else return(list(x=x,blocks=b,increasing=up))
}
---------------------------------------------------------------------
This message was distributed by s-news at lists.biostat.wustl.edu. To
unsubscribe send e-mail to s-news-request at lists.biostat.wustl.edu with
the BODY of the message: unsubscribe s-news
Frank E Harrell Jr Prof. of Biostatistics & Statistics Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
3 days later
On Mon, 12 Mar 2001, Vadim Ogranovich wrote:
This seems to be a recurrent topic, but I don't remember hearing a definitive answer. I also apologies for cross-posting. Say I have a numerical response variable and a bunch of multi-level factors I want to use for modeling. I don't expect factor interaction to be important so there will be no interactions in the model. All this would be a perfect job for ANOVA except for one additional requirement: all factors are ordered, i.e. for any two levels in a factor one is bigger than the other, and I want the regression to preserve the ordering, that is to be monotonic.
The answers that Thomas gave are probably more theoretically sound than
the below, but from a simple computational standpoint here are a couple of
ideas:
ACE and AVAS both allow monotone constraints, just constrain y to be
linear and replace your odered factors with integers (codes) and constrain
them to be monotone in the call to ace or avas, you'll need to take the
output and do a final regression on it and change the degrees of freedom,
but it should be a decent exploratory tool.
A second option would be to specify a contrasts matrix that is lower
triangular, i.e. for 4 levels:
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 1 0 0
[3,] 1 1 0
[4,] 1 1 1
then each beta is the measure of the difference in 2 levels. The S-PLUS
function nnls.fit will fit the linear regression constraining all
parameter estimates to be non-negative which will give you monotone
increasing in your ordered factors (though possibly with adjacent levels
equal). A contrast matrix like:
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 1 0 0
[3,] 2 1 0
[4,] 3 2 1
would also work (using nnls.fit again), but this time it says that the
increase from one level to the next, must be greater than or equal to the
previous increase (positive second derivative).
hope this helps,
Greg Snow, PhD Office: 223A TMCB Department of Statistics Phone: (801) 378-7049 Brigham Young University Dept.: (801) 378-4505 Provo, UT 84602 email: gls at byu.edu -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._