Issues with drop.terms
Small follow-up: (1) in order for lm() to actually work you need keep.response=TRUE in the drop.terms() call (I realize that this is *not* the problem in your example) test4 <- terms(mpg ~ hp + I(cyl==4) + disp + wt ) check4 <- drop.terms(test4, 3, keep.response = TRUE) formula(check4) lm( check4, data=mtcars) (2) I'm ambivalent about your "We can argue that the user should have used I(cyl==4), but very many won't." argument. This is the ever-present "document precisely and require users to know and follow the documentation" vs. "try to protect users from themselves" debate - taking either side to an extreme is (IMO) unproductive. I don't know how hard it would be to make drop.terms() **not** drop parentheses, but it seems like it may be very hard/low-level. My vote would be to see if there is a reasonably robust way to detect these constructions and **warn** about them. I have probably asked about this before, but if anyone knows of useful materials that go into more details about the definitions and implementation of model matrix/terms/etc. machinery, *beyond* the appropriate chapter of "Statistical Models in S" (Becker/Chambers white book), *or* the source code itself, I would love some pointers ... Ben Bolker
On 8/23/21 10:36 AM, Therneau, Terry M., Ph.D. via R-devel wrote:
This is a follow-up to my earlier note on [.terms.?? Based on a couple days' work getting the survival package to work around? issues, this will hopefully be more concise and better expressed than the prior note. 1. test1 <- terms( y ~ x1:x2 + x3) check <- drop.terms(termobj =test1, dropx = 1) formula(check) ## ~x1:x2 The documentation for the dropx argument is "vector of positions of variables to drop from the right hand side of the model", but it is not clear what "positions" is.?? I originally assumed "the order in the formula as typed", but was wrong.?? I suggest adding a line "Position refers to the order of terms in the term.labels attribute of the terms object, which is also the order they will appear in a coefficient vector (not counting the intercept). 2. library(splines) test2 <- terms(model.frame(mpg ~? offset(cyl) + ns(hp, df=3) + disp + wt, data=mtcars)) check2 <- drop.terms(test2,? dropx = 2) formula(check2) ## ~ns(hp, df=3) + wt One side effect of how drop.terms is implemented, and one that I suspect was not intended, is that offsets are completly ignored.??? The above drops both the offset and the disp term from the formula ? The dataClasses and predvars attributes of the result are also incorrect: they have lost the ns() term rather than the disp term; the results of predict will be incorrect. attr(check2, "predvars") ## ?? list(offset(cyl), disp, wt) Question: should the function be updated to not drop offsets? If not a line needs to be added to the help file. ? The handling of predvars needs to be fixed regardless. 3. test3 <- terms(mpg ~ hp + (cyl==4) + disp + wt ) check3 <- drop.terms(test3, 3) formula(check3) lm( check3, data=mtcars)?? # fails The drop.terms action has lost the () around the logical expression, which leads to an invalid formula.? We can argue that the user should have used I(cyc==4), but very many won't. 4. As a footnote, more confusion (for me) is generated by the fact that the "specials" attribute of a formula does not use the numbering discussed in 1 above.?? I had solved this issue long ago in the untangle.specials function; long enough ago that I forgot I had solved it, and just wasted a day rediscovering that fact. --- I can create a patch for 1 and 2 (once we answer my question), but a fix for 3 is not clear to me.? It currently leads to failure in a coxph call that includes a strata so I am directly interested in a solution; e.g.,? coxph(Surv(time, status) ~ age + (ph.ecog==2) + strata(inst), data=lung) Terry T
Dr. Benjamin Bolker Professor, Mathematics & Statistics and Biology, McMaster University Director, School of Computational Science and Engineering Graduate chair, Mathematics & Statistics