Hi there,

This is not strictly an R question; I apologize for that.

I'm interested in simulating the sampling distribution of the LRT for trend among the levels of a categorical variable X in a regression model. (In practice, this is done by assigning scores to the levels of X and fitting the resulting numeric variable.)

To simulate the sampling distribution, the steps are:
1. Simulate from the model *without* trend.
2. For every sample, compare the models with and without the "X by score" variable.

My question is: which scores should I use? It is well known that the scores affect the test, so which scores do I have to use to get the LRT? Can different scores lead to different null distributions?

Any suggestions are welcome.

Many thanks,
vito

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe" (in the "body", not the subject!) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
score in LRT testing for trend
2 messages · vito muggeo, Thomas Lumley
On Thu, 6 Jun 2002, vito muggeo wrote:
> Hi there,
>
> This is not strictly an R question; I apologize for that.
>
> I'm interested in simulating the sampling distribution of the LRT for trend among the levels of a categorical variable X in a regression model. (In practice, this is done by assigning scores to the levels of X and fitting the resulting numeric variable.)
>
> To simulate the sampling distribution, the steps are:
> 1. Simulate from the model *without* trend.
> 2. For every sample, compare the models with and without the "X by score" variable.
>
> My question is: which scores should I use? It is well known that the scores affect the test, so which scores do I have to use to get the LRT? Can different scores lead to different null distributions?
>
> Any suggestions are welcome.
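The two simulation steps in the question can be sketched as follows. This is only an illustration under assumptions that are not in the original post: a normal linear model with an intercept only under the null, four levels of X, and the statistic n * log(RSS0 / RSS1) for the fixed-score trend test. The function names (`trend_lrt`, `simulate_null`) and the two score sets are hypothetical. In this particular normal-errors setting the statistic depends on the data only through the squared correlation between the response and the scores, so the null distribution should be the same for any non-constant set of fixed scores; the choice of scores changes the power, not the null distribution.

```python
import math
import random

def trend_lrt(y, x_scores):
    """LRT statistic n * log(RSS0 / RSS1) for adding a linear score term
    to an intercept-only normal linear model (closed-form least squares)."""
    n = len(y)
    ybar = sum(y) / n
    sbar = sum(x_scores) / n
    sxx = sum((s - sbar) ** 2 for s in x_scores)
    sxy = sum((s - sbar) * (v - ybar) for s, v in zip(x_scores, y))
    rss0 = sum((v - ybar) ** 2 for v in y)  # model without trend
    rss1 = rss0 - sxy ** 2 / sxx            # model with the score term
    return n * math.log(rss0 / rss1)

def simulate_null(scores, n_per_level=20, n_sim=2000, seed=1):
    """Step 1: simulate y with no trend. Step 2: compute the LRT each time."""
    rng = random.Random(seed)
    # one score per observation, n_per_level observations per level of X
    x = [s for s in scores for _ in range(n_per_level)]
    stats = []
    for _ in range(n_sim):
        y = [rng.gauss(0.0, 1.0) for _ in x]  # null model: constant mean
        stats.append(trend_lrt(y, x))
    return stats

CHISQ1_95 = 3.841  # 95% quantile of chi-squared with 1 degree of freedom

# Hypothetical score sets: equally spaced versus strongly unequal spacing.
for name, scores in [("equally spaced", [1, 2, 3, 4]),
                     ("unequal", [0, 1, 2, 10])]:
    stats = simulate_null(scores)
    rejection_rate = sum(s > CHISQ1_95 for s in stats) / len(stats)
    print(name, round(rejection_rate, 3))
```

Both rejection rates should sit near the nominal 5% level, up to simulation error and a small-sample correction. For a non-normal regression model the same two steps apply, but each fit would need an iterative routine rather than the closed form used here.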
The maximum likelihood ratio test of constant vs non-decreasing trend is *not* equivalent to any set of scores and has a different null distribution (asymptotically a mixture of chi-squared variables with different degrees of freedom). That's why most people instead assign scores, which is much simpler and has good power against most interesting alternatives.

If you want to use a test with scores then use the scores you want to use.

The LRT is equivalent to using the best non-decreasing set of scores (found by isotonic regression) for each dataset, and the reason for the strange limiting distribution is to take account of this adaptive choice of scores.

-thomas

Thomas Lumley
Asst. Professor, Biostatistics
University of Washington, Seattle
tlumley at u.washington.edu
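The adaptive choice of scores described in the reply can be illustrated with a small simulation. This is a sketch under assumptions not stated in the thread: normal responses, k = 4 levels with equal group sizes, and a hand-written pool-adjacent-violators routine standing in for isotonic regression. The names `pava`, `isotonic_lrt`, and `simulate_null_isotonic` are hypothetical. Each simulated dataset gets its own best non-decreasing fit, and the number of distinct fitted levels varies from dataset to dataset, which is where the mixture over degrees of freedom comes from.

```python
import math
import random

def pava(means, weights):
    """Weighted pool-adjacent-violators: non-decreasing fit to group means.
    Returns the fitted value per group and the number of pooled blocks."""
    blocks = []  # each block: [weighted mean, total weight, number of groups]
    for m, w in zip(means, weights):
        blocks.append([m, w, 1])
        # merge backwards while the ordering is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for m, _, c in blocks:
        fitted.extend([m] * c)
    return fitted, len(blocks)

def isotonic_lrt(groups):
    """LRT of constant mean vs non-decreasing means: n * log(RSS0 / RSS1)."""
    n = sum(len(g) for g in groups)
    ybar = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    fitted, n_blocks = pava(means, [len(g) for g in groups])
    rss0 = sum((v - ybar) ** 2 for g in groups for v in g)
    rss1 = sum((v - m) ** 2 for g, m in zip(groups, fitted) for v in g)
    # guard against tiny negative values from rounding when the fit is constant
    return max(0.0, n * math.log(rss0 / rss1)), n_blocks

def simulate_null_isotonic(k=4, n_per_level=20, n_sim=2000, seed=1):
    """Simulate (statistic, number of fitted levels) pairs under no trend."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sim):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n_per_level)]
                  for _ in range(k)]
        results.append(isotonic_lrt(groups))
    return results

results = simulate_null_isotonic()
frac_constant = sum(nb == 1 for _, nb in results) / len(results)
naive_rate = sum(s > 3.841 for s, _ in results) / len(results)
print("fraction with a constant fit:", round(frac_constant, 3))
print("rejection rate with the chi-squared(1) cutoff:", round(naive_rate, 3))
```

With equal group sizes, the fitted trend should collapse to a single level in roughly 1/k of the null datasets, giving a statistic of zero, while the remaining datasets contribute components with one or more degrees of freedom. Comparing the statistic with an ordinary chi-squared(1) cutoff therefore should not give the nominal level; the simulated null distribution itself is the appropriate reference.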