Thank you very much, Martin.
Below is a patch implementing that.
Two newbie questions:
- Should I add row.names = NULL, optional = FALSE to match the arguments of the generic? (This is not the case for e.g. as.data.frame.table, but I thought it was needed: https://cloud.r-project.org/doc/manuals/r-devel/R-exts.html#Generic-functions-and-methods)
- Shouldn't we use match.fun(transFUN), so that transFUN can also be given as a function name?
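To make the two questions concrete, here is a minimal sketch. It first lists the formals of the as.data.frame() generic, then shows what match.fun() would add: it accepts either a function or a function name. The helper resolve_trans() is hypothetical and only for illustration; it is not part of the patch below.

```r
# Formals of the generic; a method is expected to accept these arguments.
names(formals(as.data.frame))

# Hypothetical helper (not part of the patch): resolve transFUN via
# match.fun(), so both a function and a function name are accepted.
resolve_trans <- function(transFUN) {
    if (is.null(transFUN)) return(identity)  # NULL means "no transformation"
    match.fun(transFUN)
}

f1 <- resolve_trans(exp)        # a function object
f2 <- resolve_trans("plogis")   # a function name given as a string
f1(0)
f2(0)
```

With stopifnot(is.function(transFUN)) as in the patch, the string form would be rejected; match.fun() would look it up instead.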
diff --git a/src/library/stats/R/lm.R b/src/library/stats/R/lm.R
index 13a458797b..2ce6b16f6e 100644
--- a/src/library/stats/R/lm.R
+++ b/src/library/stats/R/lm.R
@@ -982,3 +982,18 @@ labels.lm <- function(object, ...)
     asgn <- object$assign[qr.lm(object)$pivot[1L:object$rank]]
     tl[unique(asgn)]
 }
+
+as.data.frame.lm <- function(x, ..., level = 0.95, transFUN = NULL)
+{
+    cf <- x |> summary() |> coef()
+    ci <- confint(x, level = level)
+    if(!is.null(transFUN)) {
+        stopifnot(is.function(transFUN))
+        cf[, "Estimate"] <- transFUN(cf[, "Estimate"])
+        ci <- transFUN(ci)
+    }
+    df <- data.frame(row.names(cf), cf, ci, row.names = NULL)
+    names(df) <- c("term", "estimate", "std.error", "statistic", "p.value",
+                   "conf.low", "conf.high")
+    df
+}
diff --git a/src/library/stats/man/lm.Rd b/src/library/stats/man/lm.Rd
index ff05afabff..b54373dff4 100644
--- a/src/library/stats/man/lm.Rd
+++ b/src/library/stats/man/lm.Rd
@@ -21,6 +21,8 @@ lm(formula, data, subset, weights, na.action,
singular.ok = TRUE, contrasts = NULL, offset, \dots)
\S3method{print}{lm}(x, digits = max(3L, getOption("digits") - 3L), \dots)
+
+\S3method{as.data.frame}{lm}(x, \dots, level = 0.95, transFUN = NULL)
}
\arguments{
\item{formula}{an object of class \code{"\link{formula}"} (or one that
@@ -81,6 +83,10 @@ lm(formula, data, subset, weights, na.action,
\item{digits}{the number of \emph{significant} digits to be
passed to \code{\link{format}(\link{coef}(x), .)} when
\I{\code{\link{print}()}ing}.}
+ %% as.data.frame.lm():
+ \item{level}{the confidence level required.}
+ \item{transFUN}{a function to transform \code{estimate}, \code{conf.low} and
+ \code{conf.high}.}
}
\details{
Models for \code{lm} are specified symbolically. A typical model has
@@ -168,6 +174,10 @@ lm(formula, data, subset, weights, na.action,
\code{effects} and (unless not requested) \code{qr} relating to the linear
fit, for use by extractor functions such as \code{summary} and
\code{\link{effects}}.
+
+ \code{as.data.frame} returns a data frame with statistics as provided by
+ \code{coef(summary(.))} and confidence intervals for model
+ estimates.
}
\section{Using time series}{
Considerable care is needed when using \code{lm} with time series.
From: Martin Maechler [mailto:maechler at stat.math.ethz.ch]
Sent: Friday, 17 January 2025 17:04
To: SOEIRO Thomas
Cc: r-devel at r-project.org
Subject: Re: [Rd] as.data.frame() methods for model objects
SOEIRO Thomas via R-devel
on Fri, 17 Jan 2025 14:19:31 +0000 writes:
Following Duncan Murdoch's off-list comments (thanks again!), here is a more complete/flexible version:
as.data.frame.lm <- function(x, ..., level = 0.95, exp = FALSE) {
    cf <- x |> summary() |> stats::coef()
    ci <- stats::confint(x, level = level)
    if (exp) {
        cf[, "Estimate"] <- exp(cf[, "Estimate"])
        ci <- exp(ci)
    }
    df <- data.frame(row.names(cf), cf, ci, row.names = NULL)
    names(df) <- c("term", "estimate", "std.error", "statistic", "p.value",
                   "conf.low", "conf.high")
    df
}
Indeed, using level is much better already.
Instead of  exp = FALSE,
I'd use     transFUN = NULL
and then

    if(!is.null(transFUN)) {
        stopifnot(is.function(transFUN))
        cf[, "Estimate"] <- transFUN(cf[, "Estimate"])
        ci <- transFUN(ci)
    }

Noting that I'd want "inverse-logit" (*) in some cases, but also
different things for different link functions, hence just
exp = T/F is not enough.
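As a sketch of how the transFUN variant behaves, the block below restates the method with that argument and applies it to a model fitted on the log scale. The model lm(log(mpg) ~ wt, data = mtcars) is an assumption chosen only for illustration and does not come from this thread. The method is called directly by name so the example does not depend on S3 registration.

```r
# Restatement of the proposed method with transFUN instead of exp = TRUE/FALSE.
as.data.frame.lm <- function(x, ..., level = 0.95, transFUN = NULL) {
    cf <- coef(summary(x))
    ci <- confint(x, level = level)
    if (!is.null(transFUN)) {
        stopifnot(is.function(transFUN))
        # Only the estimate and the interval bounds are transformed.
        cf[, "Estimate"] <- transFUN(cf[, "Estimate"])
        ci <- transFUN(ci)
    }
    df <- data.frame(row.names(cf), cf, ci, row.names = NULL)
    names(df) <- c("term", "estimate", "std.error", "statistic", "p.value",
                   "conf.low", "conf.high")
    df
}

# Illustrative model only (assumption): response on the log scale.
fit <- lm(log(mpg) ~ wt, data = mtcars)

res_raw <- as.data.frame.lm(fit)                  # untransformed
res_exp <- as.data.frame.lm(fit, transFUN = exp)  # back-transformed
res_exp
```

Note that std.error, statistic and p.value are left on the original scale; only estimate, conf.low and conf.high go through transFUN.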
Martin
--
*) "inverse-logit" is simply R's plogis() function; quite a
few people have been re-inventing it, also in their packages ...
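A quick check of the footnote, as a small sketch: plogis() agrees with the hand-written inverse-logit formula, and qlogis() (the logit) undoes it. The values in eta are arbitrary.

```r
eta <- c(-2, 0, 1.5)                  # arbitrary values on the logit scale
p_builtin <- plogis(eta)              # R's inverse logit
p_manual  <- 1 / (1 + exp(-eta))      # the formula people tend to re-implement

same_values <- isTRUE(all.equal(p_builtin, p_manual))
round_trip  <- isTRUE(all.equal(qlogis(p_builtin), eta))
same_values
round_trip
```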
Waiting for profiling to be done...
         term estimate std.error statistic    p.value  conf.low conf.high
1 (Intercept) 1.076887 0.1226144 0.6041221 0.54849393 0.8468381  1.369429
2       woolB 1.076887 0.1226144 0.6041221 0.54849393 0.8468381  1.369429
3    tensionM 1.248849 0.1501714 1.4797909 0.14520270 0.9304302  1.676239
4    tensionH 1.395612 0.1501714 2.2196863 0.03100435 1.0397735  1.873229
Thank you.
Best regards,
Thomas
-----Original Message-----
From: SOEIRO Thomas
Sent: Thursday, 16 January 2025 14:36
To: r-devel at r-project.org
Subject: as.data.frame() methods for model objects
Hello all,
Would there be any interest in adding as.data.frame() methods for model objects?
Of course there are packages for this (e.g. broom), but I think providing methods would be more discoverable (and the patch would be small).
Such methods are really useful for exporting model results or for plotting.
e.g.:
# could get other arguments, e.g. exp = TRUE/FALSE to exponentiate
# estimate, conf.low, conf.high
as.data.frame.lm <- function(x) {
    cf <- x |> summary() |> stats::coef()
    ci <- stats::confint(x)
    data.frame(
        term = row.names(cf),
        estimate = cf[, "Estimate"],
        # magic number because name changes between lm() and glm(*, family = *)
        p.value = cf[, 4],
        conf.low = ci[, "2.5 %"],
        conf.high = ci[, "97.5 %"],
        row.names = NULL
    )
}