Dear R-devel,
There seems to be a discrepancy in the order in which lm and rlm evaluate their arguments. This causes rlm to sometimes produce an error where lm is just fine.
Here is a little script that illustrate the issue:
library(MASS)
## create data
n <- 100
dat <- data.frame(x=rep(c(-1,0,1), n), y=rnorm(3*n))
## call lm, works fine
summary(lm(y ~ as.factor(x), data=dat, subset=x!=0))
Call:
lm(formula = y ~ as.factor(x), data = dat, subset = x != 0)
Residuals:
Min 1Q Median 3Q Max
-2.60619 -0.82160 0.06307 0.65501 2.56677
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.061010 0.100027 0.610 0.543
as.factor(x)1 0.001332 0.141459 0.009 0.992
Residual standard error: 1 on 198 degrees of freedom
Multiple R-squared: 4.479e-07, Adjusted R-squared: -0.00505
F-statistic: 8.868e-05 on 1 and 198 DF, p-value: 0.9925
Error in rlm.default(x, y, weights, method = method, wt.method = wt.method, :
'x' is singular: singular fits are not implemented in rlm
My guess is that rlm first converts x to a factor, which becomes a three-level factor, then subsets on x!=0, which effectively eliminates a level, and then creates a "regression" matrix, which becomes singular due to the absence of data for a level.
Is there a simple way to work around it. The simplest I could think of is
with(subset(dat, x!=0), rlm(y ~ as.factor(x))
which is ok, but most of my scripts make use of data arg to regressions and I'd like to stay consistent as much as practical.
Thanks,
Vadim
Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. Jump Trading, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product.
-----Original Message-----
From: r-devel-bounces at r-project.org
[mailto:r-devel-bounces at r-project.org] On Behalf Of Vadim Ogranovich
Sent: Monday, March 14, 2011 10:37 AM
To: 'r-devel at r-project.org'
Subject: [Rd] discrepancy between lm and MASS:rlm
Dear R-devel,
There seems to be a discrepancy in the order in which lm and
rlm evaluate their arguments. This causes rlm to sometimes
produce an error where lm is just fine.
It may not be a problem with the order of evaluation. rlm()
might not be calling model.frame() with drop.unused.levels=TRUE.
I've made that mistake before with similar symptoms.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
Here is a little script that illustrate the issue:
library(MASS)
## create data
n <- 100
dat <- data.frame(x=rep(c(-1,0,1), n), y=rnorm(3*n))
## call lm, works fine
summary(lm(y ~ as.factor(x), data=dat, subset=x!=0))
Call:
lm(formula = y ~ as.factor(x), data = dat, subset = x != 0)
Residuals:
Min 1Q Median 3Q Max
-2.60619 -0.82160 0.06307 0.65501 2.56677
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.061010 0.100027 0.610 0.543
as.factor(x)1 0.001332 0.141459 0.009 0.992
Residual standard error: 1 on 198 degrees of freedom
Multiple R-squared: 4.479e-07, Adjusted R-squared: -0.00505
F-statistic: 8.868e-05 on 1 and 198 DF, p-value: 0.9925
Error in rlm.default(x, y, weights, method = method,
wt.method = wt.method, :
'x' is singular: singular fits are not implemented in rlm
My guess is that rlm first converts x to a factor, which
becomes a three-level factor, then subsets on x!=0, which
effectively eliminates a level, and then creates a
"regression" matrix, which becomes singular due to the
absence of data for a level.
Is there a simple way to work around it. The simplest I could
think of is
with(subset(dat, x!=0), rlm(y ~ as.factor(x))
which is ok, but most of my scripts make use of data arg to
regressions and I'd like to stay consistent as much as practical.
Thanks,
Vadim
Note: This email is for the confidential use of the named
addressee(s) only and may contain proprietary, confidential
or privileged information. If you are not the intended
recipient, you are hereby notified that any review,
dissemination or copying of this email is strictly
prohibited, and to please notify the sender immediately and
destroy this email and any attachments. Email transmission
cannot be guaranteed to be secure or error-free. Jump
Trading, therefore, does not make any guarantees as to the
completeness or accuracy of this email or any attachments.
This email is for informational purposes only and does not
constitute a recommendation, offer, request or solicitation
of any kind to buy, sell, subscribe, redeem or perform any
type of transaction of a financial product.
Indeed! I added the drop.unused.levels argument and it now works. Thank you Bill!
Here is a patch which one should use at one's own risk as it redefines rlm.formula as global object, i.e. rlm.formula is no longer in the MASS namespace and this is certainly not a good idea:
rlm.formula <- function (formula, data, weights, ..., subset, na.action, method = c("M",
"MM", "model.frame"), wt.method = c("inv.var", "case"), model = TRUE,
x.ret = TRUE, y.ret = FALSE, contrasts = NULL)
{
mf <- match.call(expand.dots = FALSE)
mf$method <- mf$wt.method <- mf$model <- mf$x.ret <- mf$y.ret <- mf$contrasts <- mf$... <- NULL
mf$drop.unused.levels <- TRUE
mf[[1L]] <- as.name("model.frame")
mf <- eval.parent(mf)
method <- match.arg(method)
wt.method <- match.arg(wt.method)
if (method == "model.frame")
return(mf)
mt <- attr(mf, "terms")
y <- model.response(mf)
offset <- model.offset(mf)
if (!is.null(offset))
y <- y - offset
x <- model.matrix(mt, mf, contrasts)
xvars <- as.character(attr(mt, "variables"))[-1L]
if ((yvar <- attr(mt, "response")) > 0L)
xvars <- xvars[-yvar]
xlev <- if (length(xvars) > 0L) {
xlev <- lapply(mf[xvars], levels)
xlev[!sapply(xlev, is.null)]
}
weights <- model.weights(mf)
if (!length(weights))
weights <- rep(1, nrow(x))
fit <- MASS:::rlm.default(x, y, weights, method = method, wt.method = wt.method,
...)
fit$terms <- mt
cl <- match.call()
cl[[1L]] <- as.name("rlm")
fit$call <- cl
fit$contrasts <- attr(x, "contrasts")
fit$xlevels <- .getXlevels(mt, mf)
fit$na.action <- attr(mf, "na.action")
if (model)
fit$model <- mf
if (!x.ret)
fit$x <- NULL
if (y.ret)
fit$y <- y
fit
}
-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: Monday, March 14, 2011 1:39 PM
To: Vadim Ogranovich; r-devel at r-project.org
Subject: RE: [Rd] discrepancy between lm and MASS:rlm
-----Original Message-----
From: r-devel-bounces at r-project.org
[mailto:r-devel-bounces at r-project.org] On Behalf Of Vadim Ogranovich
Sent: Monday, March 14, 2011 10:37 AM
To: 'r-devel at r-project.org'
Subject: [Rd] discrepancy between lm and MASS:rlm
Dear R-devel,
There seems to be a discrepancy in the order in which lm and
rlm evaluate their arguments. This causes rlm to sometimes
produce an error where lm is just fine.
It may not be a problem with the order of evaluation. rlm()
might not be calling model.frame() with drop.unused.levels=TRUE.
I've made that mistake before with similar symptoms.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
Here is a little script that illustrate the issue:
library(MASS)
## create data
n <- 100
dat <- data.frame(x=rep(c(-1,0,1), n), y=rnorm(3*n))
## call lm, works fine
summary(lm(y ~ as.factor(x), data=dat, subset=x!=0))
Call:
lm(formula = y ~ as.factor(x), data = dat, subset = x != 0)
Residuals:
Min 1Q Median 3Q Max
-2.60619 -0.82160 0.06307 0.65501 2.56677
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.061010 0.100027 0.610 0.543
as.factor(x)1 0.001332 0.141459 0.009 0.992
Residual standard error: 1 on 198 degrees of freedom
Multiple R-squared: 4.479e-07, Adjusted R-squared: -0.00505
F-statistic: 8.868e-05 on 1 and 198 DF, p-value: 0.9925
Error in rlm.default(x, y, weights, method = method,
wt.method = wt.method, :
'x' is singular: singular fits are not implemented in rlm
My guess is that rlm first converts x to a factor, which
becomes a three-level factor, then subsets on x!=0, which
effectively eliminates a level, and then creates a
"regression" matrix, which becomes singular due to the
absence of data for a level.
Is there a simple way to work around it. The simplest I could
think of is
with(subset(dat, x!=0), rlm(y ~ as.factor(x))
which is ok, but most of my scripts make use of data arg to
regressions and I'd like to stay consistent as much as practical.
Thanks,
Vadim
Note: This email is for the confidential use of the named
addressee(s) only and may contain proprietary, confidential
or privileged information. If you are not the intended
recipient, you are hereby notified that any review,
dissemination or copying of this email is strictly
prohibited, and to please notify the sender immediately and
destroy this email and any attachments. Email transmission
cannot be guaranteed to be secure or error-free. Jump
Trading, therefore, does not make any guarantees as to the
completeness or accuracy of this email or any attachments.
This email is for informational purposes only and does not
constitute a recommendation, offer, request or solicitation
of any kind to buy, sell, subscribe, redeem or perform any
type of transaction of a financial product.
Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. Jump Trading, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product.