changes in coxph in "survival" from older version?
Hi Tao,

For your situation (and even with a MUCH larger number of events), multivariable modeling will be unreliable unless you use shrinkage, variable selection will select the wrong variables, and univariable screening leads to massive bias in later stages.

Terry converted me from SAS to S-Plus in 1991 when I visited Mayo Clinic and he showed me how natural it was in that language to put a loop around the kind of stepwise analyses requested by users. The bootstrap showed that the list of predictors selected was very random. Another demonstration of this is to bootstrap the ranks of the predictors, ranked by any measure you want (adjusted chi-square, univariable chi-square, ROC area). The confidence intervals for the ranks will be extremely wide.

Frank
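The rank bootstrap described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the analysis Frank or Terry ran: it is written in standard-library Python rather than R, the data are simulated (80 subjects, 17 predictors, heavy censoring, and only the first predictor truly related to the hazard), and the ranking measure is a concordance index, used here as a censored-data stand-in for ROC area. The function names `concordance`, `rank_predictors`, `simulate`, and `bootstrap_rank_intervals` are hypothetical helpers introduced for this example.

```python
import math
import random


def concordance(x, time, event):
    """Concordance between predictor x and survival time.

    A pair is usable when the subject with the shorter time had an event;
    it is concordant when that subject also has the larger x.
    """
    concordant = tied = usable = 0
    n = len(x)
    for i in range(n):
        if not event[i]:
            continue
        for j in range(n):
            if time[i] < time[j]:
                usable += 1
                if x[i] > x[j]:
                    concordant += 1
                elif x[i] == x[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / usable if usable else 0.5


def rank_predictors(X, time, event):
    """Rank the columns of X by |C - 0.5| (rank 1 = apparently strongest)."""
    p = len(X[0])
    strength = [abs(concordance([row[k] for row in X], time, event) - 0.5)
                for k in range(p)]
    order = sorted(range(p), key=lambda k: -strength[k])
    ranks = [0] * p
    for r, k in enumerate(order, start=1):
        ranks[k] = r
    return ranks


def simulate(n, p, rng):
    """Assumed data: only predictor 0 affects the hazard; heavy censoring."""
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    time, event = [], []
    for row in X:
        t = rng.expovariate(math.exp(0.7 * row[0]))  # event time
        c = rng.expovariate(3.0)                     # censoring time
        time.append(min(t, c))
        event.append(t <= c)
    return X, time, event


def bootstrap_rank_intervals(X, time, event, n_boot, rng):
    """Percentile 95% intervals for each predictor's rank."""
    n, p = len(X), len(X[0])
    all_ranks = [[] for _ in range(p)]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample subjects
        ranks = rank_predictors([X[i] for i in idx],
                                [time[i] for i in idx],
                                [event[i] for i in idx])
        for k, r in enumerate(ranks):
            all_ranks[k].append(r)
    intervals = []
    for rs in all_ranks:
        rs.sort()
        intervals.append((rs[int(0.025 * (n_boot - 1))],
                          rs[int(0.975 * (n_boot - 1))]))
    return intervals


rng = random.Random(2011)
X, time, event = simulate(n=80, p=17, rng=rng)
observed = rank_predictors(X, time, event)
intervals = bootstrap_rank_intervals(X, time, event, n_boot=200, rng=rng)
for k in range(17):
    print(f"x{k}: observed rank {observed[k]:2d}, "
          f"95% bootstrap interval {intervals[k]}")
```

Each printed line gives one predictor's rank in the original sample next to the 2.5th and 97.5th percentiles of its rank over the resamples. Swapping in a different measure only requires replacing `concordance` inside `rank_predictors`.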
Shi, Tao wrote:
Thank you, Frank and Terry, for all your answers! I'll upgrade my "survival" package for sure!

It seems to me that you two are pointing to two different issues:
1) Is stepwise model selection a good approach (for any data)?
2) Does the data I have contain enough information to even be worth modeling?

For #1, I'm not in a good position to judge and need to read up on it. For #2, I'm still a bit confused about Terry's last comment. If we forget about multivariate model building and just look at the variables one by one and select the best predictor (let's say it's highly significant, e.g. p<0.0001), can the resulting univariate model still be wrong? What if I use this data as a validation set to validate an existing model? Would anything be different?

Many thanks!

...Tao

----- Original Message ----
From: Frank Harrell <f.harrell at vanderbilt.edu>
To: r-help at r-project.org
Sent: Tue, May 17, 2011 10:51:02 AM
Subject: Re: [R] changes in coxph in "survival" from older version?

It's worse if the model does converge because then you don't have a warning about the result being nonsense.

Frank

Terry Therneau-2 wrote:
-- begin included message ---
I did realize that there are way more predictors in the model. My initial thinking was to use that as an initial model for stepwise model selection. Now I wonder if the model selection result is still valid if the initial model didn't even converge?
--- end inclusion ---

You have 17 predictors with only 22 events. All methods of "variable selection" in such a scenario will give essentially random results. There is simply not enough information present to determine a best predictor or best subset of predictors.

Terry Therneau
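One way to probe the claim about 17 predictors and few events is a small simulation in which the truth is known. The sketch below is an assumption-laden illustration, not something from the original posts: it uses standard-library Python rather than R, simulates 80 subjects with censoring chosen to give roughly 20 events on average, lets only predictor `x0` affect the hazard (assumed log hazard ratio of 0.4 per standard deviation), and picks the "best" predictor by its univariable Cox score chi-square. The helpers `score_chisq` and `simulate` are hypothetical names for this example.

```python
import math
import random


def score_chisq(x, time, event):
    """Univariable Cox score chi-square for x at beta = 0.

    Assumes no tied event times (true for the continuous times simulated here).
    """
    order = sorted(range(len(x)), key=lambda i: time[i], reverse=True)
    n_risk = sum_x = sum_x2 = 0.0
    u = v = 0.0
    for i in order:                  # longest follow-up first
        n_risk += 1                  # subject i joins the risk set
        sum_x += x[i]
        sum_x2 += x[i] ** 2
        if event[i]:
            mean = sum_x / n_risk
            u += x[i] - mean                      # observed minus expected
            v += sum_x2 / n_risk - mean ** 2      # risk-set variance of x
    return u * u / v if v > 0 else 0.0


def simulate(n, p, beta, rng):
    """Assumed data: only predictor 0 affects the hazard; heavy censoring."""
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    time, event = [], []
    for row in X:
        t = rng.expovariate(math.exp(beta * row[0]))  # event time
        c = rng.expovariate(3.0)                      # censoring time
        time.append(min(t, c))
        event.append(t <= c)
    return X, time, event


rng = random.Random(22)
n_sim, p = 500, 17
wins = [0] * p                       # how often each predictor is selected
for _ in range(n_sim):
    X, time, event = simulate(n=80, p=p, beta=0.4, rng=rng)
    chisq = [score_chisq([row[k] for row in X], time, event)
             for k in range(p)]
    wins[chisq.index(max(chisq))] += 1

print(f"true predictor x0 selected in {wins[0]} of {n_sim} simulated datasets")
print("times each noise predictor was selected:", wins[1:])
```

The counts depend heavily on the assumed effect size and event count, so the numbers are only meaningful relative to those settings. Changing `beta` or `n` shows how much information is needed before the true predictor is selected reliably.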
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
----- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/changes-in-coxph-in-survival-from-older-version-tp3516101p3530024.html Sent from the R help mailing list archive at Nabble.com.