Hello,
I'm working on a survreg model where the full data are subset for modeling
individual parts of the data separately. When subsetting, the fit variable
("treatment" in the example below) has levels that are not in the data. A
work-around for this is to drop the levels, but it seems inaccurate to have
the `coef()` method provide zero as the coefficient for the level without
data.
Why does coef(model) provide zero as the coefficient for treatment instead
of NA? Is this a bug?
Thanks,
Bill
``` r
library(survival)
library(emmeans)
my_data <-
data.frame(
value=c(rep(1, 5), 6:10),
treatment=factor(rep(c("A", "B"), each=5), levels=c("A", "B", "C"))
)
my_data$cens <- c(0, 1)[(my_data$value == 1) + 1]
model <- survreg(Surv(time=value, event=cens)~treatment, data=my_data)
#> Warning in survreg.fit(X, Y, weights, offset, init = init, controlvals =
#> control, : Ran out of iterations and did not converge
coef(model)
#> (Intercept) treatmentB treatmentC
#> 0.08588218 2.40341893 0.00000000
model$coef
#> (Intercept) treatmentB treatmentC
#> 0.08588218 2.40341893 NA
model$coefficients
#> (Intercept) treatmentB treatmentC
#> 0.08588218 2.40341893 0.00000000
print(model)
#> Call:
#> survreg(formula = Surv(time = value, event = cens) ~ treatment,
#> data = my_data)
#>
#> Coefficients: (1 not defined because of singularities)
#> (Intercept) treatmentB treatmentC
#> 0.08588218 2.40341893 NA
#>
#> Scale= 0.09832254
#>
#> Loglik(model)= 4.9 Loglik(intercept only)= -15
#> Chisq= 39.92 on 2 degrees of freedom, p= 2.15e-09
#> n= 10
summary(model)
#>
#> Call:
#> survreg(formula = Surv(time = value, event = cens) ~ treatment,
#> data = my_data)
#> Value Std. Error z p
#> (Intercept) 0.0859 0.0681 1.26 0.21
#> treatmentB 2.4034 0.2198 10.93 <2e-16
#> treatmentC 0.0000 0.0000 NA NA
#> Log(scale) -2.3195 0.0000 -Inf <2e-16
#>
#> Scale= 0.0983
#>
#> Weibull distribution
#> Loglik(model)= 4.9 Loglik(intercept only)= -15
#> Chisq= 39.92 on 2 degrees of freedom, p= 2.1e-09
#> Number of Newton-Raphson Iterations: 30
#> n= 10
ref_grid(model)
#> Error in ref_grid(model): Something went wrong:
#> Non-conformable elements in reference grid.
my_data_correct_levels <- my_data
my_data_correct_levels$treatment <-
droplevels(my_data_correct_levels$treatment)
model_correct <- survreg(Surv(time=value, event=cens)~treatment,
data=my_data_correct_levels)
#> Warning in survreg.fit(X, Y, weights, offset, init = init, controlvals =
#> control, : Ran out of iterations and did not converge
coef(model_correct)
#> (Intercept) treatmentB
#> 0.08588218 2.40341893
print(model_correct)
#> Call:
#> survreg(formula = Surv(time = value, event = cens) ~ treatment,
#> data = my_data_correct_levels)
#>
#> Coefficients:
#> (Intercept) treatmentB
#> 0.08588218 2.40341893
#>
#> Scale= 0.09832254
#>
#> Loglik(model)= 4.9 Loglik(intercept only)= -15
#> Chisq= 39.92 on 1 degrees of freedom, p= 2.65e-10
#> n= 10
summary(model_correct)
#>
#> Call:
#> survreg(formula = Surv(time = value, event = cens) ~ treatment,
#> data = my_data_correct_levels)
#> Value Std. Error z p
#> (Intercept) 0.0859 0.0681 1.26 0.21
#> treatmentB 2.4034 0.2198 10.93 <2e-16
#> Log(scale) -2.3195 0.0000 -Inf <2e-16
#>
#> Scale= 0.0983
#>
#> Weibull distribution
#> Loglik(model)= 4.9 Loglik(intercept only)= -15
#> Chisq= 39.92 on 1 degrees of freedom, p= 2.6e-10
#> Number of Newton-Raphson Iterations: 30
#> n= 10
ref_grid(model_correct)
#> 'emmGrid' object with variables:
#> treatment = A, B
#> Transformation: "log"
```
<sup>Created on 2021-02-23 by the [reprex
package](https://reprex.tidyverse.org) (v1.0.0)</sup>
print and coef Methods for survreg Differ
4 messages · biii m@iii@g oii de@@ey@ws, Jeff Newmiller
Model equations do not normally have conditional forms dependent on whether specific coefficients are NA or not. If you assign NA to a coefficient then you will not be able to predict outputs for input cases that you should be able to. Zero allows these expected cases to work... NA would prevent any useful prediction output.
On February 23, 2021 6:45:53 AM PST, bill at denney.ws wrote:
Hello,
I'm working on a survreg model where the full data are subset for
modeling
individual parts of the data separately. When subsetting, the fit
variable
("treatment" in the example below) has levels that are not in the data.
A
work-around for this is to drop the levels, but it seems inaccurate to
have
the `coef()` method provide zero as the coefficient for the level
without
data.
Why does coef(model) provide zero as the coefficient for treatment
instead
of NA? Is this a bug?
Thanks,
Bill
``` r
library(survival)
library(emmeans)
my_data <-
data.frame(
value=c(rep(1, 5), 6:10),
treatment=factor(rep(c("A", "B"), each=5), levels=c("A", "B", "C"))
)
my_data$cens <- c(0, 1)[(my_data$value == 1) + 1]
model <- survreg(Surv(time=value, event=cens)~treatment, data=my_data)
#> Warning in survreg.fit(X, Y, weights, offset, init = init,
controlvals =
#> control, : Ran out of iterations and did not converge
coef(model)
#> (Intercept) treatmentB treatmentC
#> 0.08588218 2.40341893 0.00000000
model$coef
#> (Intercept) treatmentB treatmentC
#> 0.08588218 2.40341893 NA
model$coefficients
#> (Intercept) treatmentB treatmentC
#> 0.08588218 2.40341893 0.00000000
print(model)
#> Call:
#> survreg(formula = Surv(time = value, event = cens) ~ treatment,
#> data = my_data)
#>
#> Coefficients: (1 not defined because of singularities)
#> (Intercept) treatmentB treatmentC
#> 0.08588218 2.40341893 NA
#>
#> Scale= 0.09832254
#>
#> Loglik(model)= 4.9 Loglik(intercept only)= -15
#> Chisq= 39.92 on 2 degrees of freedom, p= 2.15e-09
#> n= 10
summary(model)
#>
#> Call:
#> survreg(formula = Surv(time = value, event = cens) ~ treatment,
#> data = my_data)
#> Value Std. Error z p
#> (Intercept) 0.0859 0.0681 1.26 0.21
#> treatmentB 2.4034 0.2198 10.93 <2e-16
#> treatmentC 0.0000 0.0000 NA NA
#> Log(scale) -2.3195 0.0000 -Inf <2e-16
#>
#> Scale= 0.0983
#>
#> Weibull distribution
#> Loglik(model)= 4.9 Loglik(intercept only)= -15
#> Chisq= 39.92 on 2 degrees of freedom, p= 2.1e-09
#> Number of Newton-Raphson Iterations: 30
#> n= 10
ref_grid(model)
#> Error in ref_grid(model): Something went wrong:
#> Non-conformable elements in reference grid.
my_data_correct_levels <- my_data
my_data_correct_levels$treatment <-
droplevels(my_data_correct_levels$treatment)
model_correct <- survreg(Surv(time=value, event=cens)~treatment,
data=my_data_correct_levels)
#> Warning in survreg.fit(X, Y, weights, offset, init = init,
controlvals =
#> control, : Ran out of iterations and did not converge
coef(model_correct)
#> (Intercept) treatmentB
#> 0.08588218 2.40341893
print(model_correct)
#> Call:
#> survreg(formula = Surv(time = value, event = cens) ~ treatment,
#> data = my_data_correct_levels)
#>
#> Coefficients:
#> (Intercept) treatmentB
#> 0.08588218 2.40341893
#>
#> Scale= 0.09832254
#>
#> Loglik(model)= 4.9 Loglik(intercept only)= -15
#> Chisq= 39.92 on 1 degrees of freedom, p= 2.65e-10
#> n= 10
summary(model_correct)
#>
#> Call:
#> survreg(formula = Surv(time = value, event = cens) ~ treatment,
#> data = my_data_correct_levels)
#> Value Std. Error z p
#> (Intercept) 0.0859 0.0681 1.26 0.21
#> treatmentB 2.4034 0.2198 10.93 <2e-16
#> Log(scale) -2.3195 0.0000 -Inf <2e-16
#>
#> Scale= 0.0983
#>
#> Weibull distribution
#> Loglik(model)= 4.9 Loglik(intercept only)= -15
#> Chisq= 39.92 on 1 degrees of freedom, p= 2.6e-10
#> Number of Newton-Raphson Iterations: 30
#> n= 10
ref_grid(model_correct)
#> 'emmGrid' object with variables:
#> treatment = A, B
#> Transformation: "log"
```
<sup>Created on 2021-02-23 by the [reprex
package](https://reprex.tidyverse.org) (v1.0.0)</sup>
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Sent from my phone. Please excuse my brevity.
How should you be able to make a prediction (using this type of model) from a state where there is no data such as treatment="C" in my example? -----Original Message----- From: Jeff Newmiller <jdnewmil at dcn.davis.ca.us> Sent: Tuesday, February 23, 2021 11:10 AM To: r-help at r-project.org; bill at denney.ws Subject: Re: [R] print and coef Methods for survreg Differ Model equations do not normally have conditional forms dependent on whether specific coefficients are NA or not. If you assign NA to a coefficient then you will not be able to predict outputs for input cases that you should be able to. Zero allows these expected cases to work... NA would prevent any useful prediction output.
On February 23, 2021 6:45:53 AM PST, bill at denney.ws wrote:
Hello,
I'm working on a survreg model where the full data are subset for
modeling individual parts of the data separately. When subsetting, the
fit variable ("treatment" in the example below) has levels that are not
in the data.
A
work-around for this is to drop the levels, but it seems inaccurate to
have the `coef()` method provide zero as the coefficient for the level
without data.
Why does coef(model) provide zero as the coefficient for treatment
instead of NA? Is this a bug?
Thanks,
Bill
``` r
library(survival)
library(emmeans)
my_data <-
data.frame(
value=c(rep(1, 5), 6:10),
treatment=factor(rep(c("A", "B"), each=5), levels=c("A", "B", "C"))
)
my_data$cens <- c(0, 1)[(my_data$value == 1) + 1]
model <- survreg(Surv(time=value, event=cens)~treatment, data=my_data)
#> Warning in survreg.fit(X, Y, weights, offset, init = init,
controlvals =
#> control, : Ran out of iterations and did not converge
coef(model)
#> (Intercept) treatmentB treatmentC
#> 0.08588218 2.40341893 0.00000000
model$coef
#> (Intercept) treatmentB treatmentC
#> 0.08588218 2.40341893 NA
model$coefficients
#> (Intercept) treatmentB treatmentC
#> 0.08588218 2.40341893 0.00000000
print(model)
#> Call:
#> survreg(formula = Surv(time = value, event = cens) ~ treatment,
#> data = my_data)
#>
#> Coefficients: (1 not defined because of singularities)
#> (Intercept) treatmentB treatmentC
#> 0.08588218 2.40341893 NA
#>
#> Scale= 0.09832254
#>
#> Loglik(model)= 4.9 Loglik(intercept only)= -15
#> Chisq= 39.92 on 2 degrees of freedom, p= 2.15e-09
#> n= 10
summary(model)
#>
#> Call:
#> survreg(formula = Surv(time = value, event = cens) ~ treatment,
#> data = my_data)
#> Value Std. Error z p
#> (Intercept) 0.0859 0.0681 1.26 0.21
#> treatmentB 2.4034 0.2198 10.93 <2e-16
#> treatmentC 0.0000 0.0000 NA NA
#> Log(scale) -2.3195 0.0000 -Inf <2e-16
#>
#> Scale= 0.0983
#>
#> Weibull distribution
#> Loglik(model)= 4.9 Loglik(intercept only)= -15
#> Chisq= 39.92 on 2 degrees of freedom, p= 2.1e-09
#> Number of Newton-Raphson Iterations: 30
#> n= 10
ref_grid(model)
#> Error in ref_grid(model): Something went wrong:
#> Non-conformable elements in reference grid.
my_data_correct_levels <- my_data
my_data_correct_levels$treatment <-
droplevels(my_data_correct_levels$treatment)
model_correct <- survreg(Surv(time=value, event=cens)~treatment,
data=my_data_correct_levels)
#> Warning in survreg.fit(X, Y, weights, offset, init = init,
controlvals =
#> control, : Ran out of iterations and did not converge
coef(model_correct)
#> (Intercept) treatmentB
#> 0.08588218 2.40341893
print(model_correct)
#> Call:
#> survreg(formula = Surv(time = value, event = cens) ~ treatment,
#> data = my_data_correct_levels)
#>
#> Coefficients:
#> (Intercept) treatmentB
#> 0.08588218 2.40341893
#>
#> Scale= 0.09832254
#>
#> Loglik(model)= 4.9 Loglik(intercept only)= -15
#> Chisq= 39.92 on 1 degrees of freedom, p= 2.65e-10
#> n= 10
summary(model_correct)
#>
#> Call:
#> survreg(formula = Surv(time = value, event = cens) ~ treatment,
#> data = my_data_correct_levels)
#> Value Std. Error z p
#> (Intercept) 0.0859 0.0681 1.26 0.21
#> treatmentB 2.4034 0.2198 10.93 <2e-16
#> Log(scale) -2.3195 0.0000 -Inf <2e-16
#>
#> Scale= 0.0983
#>
#> Weibull distribution
#> Loglik(model)= 4.9 Loglik(intercept only)= -15
#> Chisq= 39.92 on 1 degrees of freedom, p= 2.6e-10
#> Number of Newton-Raphson Iterations: 30
#> n= 10
ref_grid(model_correct)
#> 'emmGrid' object with variables:
#> treatment = A, B
#> Transformation: "log"
```
<sup>Created on 2021-02-23 by the [reprex
package](https://reprex.tidyverse.org) (v1.0.0)</sup>
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Sent from my phone. Please excuse my brevity.
You can't. But in order to be able to predict for states that _were_ in the training data, the coefficient cannot be NA.
On February 23, 2021 8:40:43 AM PST, bill at denney.ws wrote:
How should you be able to make a prediction (using this type of model) from a state where there is no data such as treatment="C" in my example? -----Original Message----- From: Jeff Newmiller <jdnewmil at dcn.davis.ca.us> Sent: Tuesday, February 23, 2021 11:10 AM To: r-help at r-project.org; bill at denney.ws Subject: Re: [R] print and coef Methods for survreg Differ Model equations do not normally have conditional forms dependent on whether specific coefficients are NA or not. If you assign NA to a coefficient then you will not be able to predict outputs for input cases that you should be able to. Zero allows these expected cases to work... NA would prevent any useful prediction output. On February 23, 2021 6:45:53 AM PST, bill at denney.ws wrote:
Hello, I'm working on a survreg model where the full data are subset for modeling individual parts of the data separately. When subsetting,
the
fit variable ("treatment" in the example below) has levels that are
not
in the data. A work-around for this is to drop the levels, but it seems inaccurate to
have the `coef()` method provide zero as the coefficient for the level
without data.
Why does coef(model) provide zero as the coefficient for treatment
instead of NA? Is this a bug?
Thanks,
Bill
``` r
library(survival)
library(emmeans)
my_data <-
data.frame(
value=c(rep(1, 5), 6:10),
treatment=factor(rep(c("A", "B"), each=5), levels=c("A", "B",
"C"))
) my_data$cens <- c(0, 1)[(my_data$value == 1) + 1] model <- survreg(Surv(time=value, event=cens)~treatment, data=my_data) #> Warning in survreg.fit(X, Y, weights, offset, init = init, controlvals = #> control, : Ran out of iterations and did not converge coef(model) #> (Intercept) treatmentB treatmentC #> 0.08588218 2.40341893 0.00000000 model$coef #> (Intercept) treatmentB treatmentC #> 0.08588218 2.40341893 NA model$coefficients #> (Intercept) treatmentB treatmentC #> 0.08588218 2.40341893 0.00000000 print(model) #> Call: #> survreg(formula = Surv(time = value, event = cens) ~ treatment, #> data = my_data) #> #> Coefficients: (1 not defined because of singularities) #> (Intercept) treatmentB treatmentC #> 0.08588218 2.40341893 NA #> #> Scale= 0.09832254 #> #> Loglik(model)= 4.9 Loglik(intercept only)= -15 #> Chisq= 39.92 on 2 degrees of freedom, p= 2.15e-09 #> n= 10 summary(model) #> #> Call: #> survreg(formula = Surv(time = value, event = cens) ~ treatment, #> data = my_data) #> Value Std. Error z p #> (Intercept) 0.0859 0.0681 1.26 0.21 #> treatmentB 2.4034 0.2198 10.93 <2e-16 #> treatmentC 0.0000 0.0000 NA NA #> Log(scale) -2.3195 0.0000 -Inf <2e-16 #> #> Scale= 0.0983 #> #> Weibull distribution #> Loglik(model)= 4.9 Loglik(intercept only)= -15 #> Chisq= 39.92 on 2 degrees of freedom, p= 2.1e-09 #> Number of Newton-Raphson Iterations: 30 #> n= 10 ref_grid(model) #> Error in ref_grid(model): Something went wrong: #> Non-conformable elements in reference grid. my_data_correct_levels <- my_data my_data_correct_levels$treatment <- droplevels(my_data_correct_levels$treatment) model_correct <- survreg(Surv(time=value, event=cens)~treatment, data=my_data_correct_levels) #> Warning in survreg.fit(X, Y, weights, offset, init = init, controlvals = #> control, : Ran out of iterations and did not converge coef(model_correct) #> (Intercept) treatmentB #> 0.08588218 2.40341893 print(model_correct) #> Call: #> survreg(formula = Surv(time = value, event = cens) ~ treatment, #> data = my_data_correct_levels) #> #> Coefficients: #> (Intercept) treatmentB #> 0.08588218 2.40341893 #> #> Scale= 0.09832254 #> #> Loglik(model)= 4.9 Loglik(intercept only)= -15 #> Chisq= 39.92 on 1 degrees of freedom, p= 2.65e-10 #> n= 10 summary(model_correct) #> #> Call: #> survreg(formula = Surv(time = value, event = cens) ~ treatment, #> data = my_data_correct_levels) #> Value Std. Error z p #> (Intercept) 0.0859 0.0681 1.26 0.21 #> treatmentB 2.4034 0.2198 10.93 <2e-16 #> Log(scale) -2.3195 0.0000 -Inf <2e-16 #> #> Scale= 0.0983 #> #> Weibull distribution #> Loglik(model)= 4.9 Loglik(intercept only)= -15 #> Chisq= 39.92 on 1 degrees of freedom, p= 2.6e-10 #> Number of Newton-Raphson Iterations: 30 #> n= 10 ref_grid(model_correct) #> 'emmGrid' object with variables: #> treatment = A, B #> Transformation: "log" ``` <sup>Created on 2021-02-23 by the [reprex package](https://reprex.tidyverse.org) (v1.0.0)</sup> [[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Sent from my phone. Please excuse my brevity.
Sent from my phone. Please excuse my brevity.