-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Peter Dalgaard
Sent: Friday, September 23, 2005 10:23 AM
To: Douglas Bates
Cc: Felipe; R-help at stat.math.ethz.ch
Subject: Re: [R] Are least-squares means useful or appropriate?
Douglas Bates <dmbates at gmail.com> writes:
On 9/20/05, Felipe <felipe at unileon.es> wrote:
Hi.
My question was just theoretical. I was wondering if someone who uses
both SAS and R could give me their opinion on the topic. I was
trying to use least-squares means for comparisons in R, but then I
found some recommendations against them, and I wanted to know if they
had a good basis (as I said earlier, they were not very detailed).
Greetings.
Felipe
As Deepayan said in his reply, the concept of least squares means is
associated with SAS and is not generally part of the theory of linear
models in statistics. My vague understanding of these (I am not a
SAS user) is that they are an attempt to estimate the mean response
for a particular level of a factor in a model in which that factor has
a non-ignorable interaction with another factor. There is no
acceptable definition of such a thing.
(PD goes and fetches the SAS manual....)
Well, yes. It'll do that too, although only if you ask for
the lsmeans of A when an interaction like A*B is present in
the model. This is related to testing main effects in the
presence of an interaction using type III sums of squares,
a topic that has been beaten to death repeatedly on the list. In
both cases, there seems to be an implicit assumption that
categorical variables by nature come from an underlying
fully balanced design.
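To make the "fully balanced design" point concrete, here is a small sketch in Python rather than R. The data are invented and the function names are hypothetical. In a model with the full A*B interaction, the fitted values are simply the cell means. The lsmean of a level of A is then the unweighted average of its cell means over the levels of B, which is what its raw mean would be if every cell held the same number of observations.

```python
from statistics import mean

# Invented, deliberately unbalanced data: (level of A, level of B, response).
DATA = [
    ("a1", "b1", 10.0), ("a1", "b1", 12.0), ("a1", "b1", 11.0), ("a1", "b1", 13.0),
    ("a1", "b2", 20.0),
    ("a2", "b1", 14.0),
    ("a2", "b2", 24.0), ("a2", "b2", 26.0), ("a2", "b2", 25.0),
]


def cell_means(rows):
    """Fitted values of the saturated A*B model: the observed cell means."""
    cells = {}
    for a, b, y in rows:
        cells.setdefault((a, b), []).append(y)
    return {key: mean(ys) for key, ys in cells.items()}


def lsmean_a(rows, a):
    """LS-mean of level `a` of A: unweighted average of its cell means over B.

    Raises KeyError if a cell is empty, i.e. the lsmean is not estimable.
    """
    cm = cell_means(rows)
    b_levels = sorted({b for _, b, _ in rows})
    return mean(cm[(a, b)] for b in b_levels)


def raw_mean_a(rows, a):
    """Ordinary mean of level `a`: each cell weighted by its number of observations."""
    return mean(y for level, _, y in rows if level == a)


for a in ("a1", "a2"):
    print(a, "lsmean:", lsmean_a(DATA, a), "raw mean:", raw_mean_a(DATA, a))
```

With these numbers the lsmeans are 15.75 and 19.5 (a difference of 3.75), while the raw group means are 13.2 and 22.25. The lsmeans describe a hypothetical balanced population, not the one that was sampled.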
If the interaction is absent from the model, the lsmeans are
somewhat more sensible in that they at least reproduce the
parameter estimates as contrasts between different groups.
All continuous variables in the design will be set to their
mean, but the levels of categorical design variables are weighted
equally, each by the inverse of the number of groups. So if you're
doing an lsmeans of lung function by smoking adjusted for age
and sex, you get estimates for the mean of a population in
which everyone has the same age and half are male and half
are female. This makes some sense, but if you do it for sex
adjusting for smoking and age, you are not only forcing the
sexes to smoke equally, but actually adjusting to a
smoking rate of 50%, which could be quite far from reality.
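The adjustment described above can be sketched as plain arithmetic on the coefficients of an additive model. This is an illustration, not SAS's implementation. The coefficients, the mean age and the "observed" proportions below are all invented, and the model is assumed to use treatment coding. Because the model is linear, averaging the predictions over the two levels of a binary factor gives the same result as plugging 0.5 into its dummy variable.

```python
# Invented coefficients for an additive model
#   lung_function ~ smoking + age + sex
# with treatment coding (reference levels: non-smoker, female).
COEF = {"intercept": 5.0, "smoker": -0.4, "age": -0.03, "male": 0.9}

MEAN_AGE = 50.0          # continuous covariate is set to its (invented) sample mean
P_SMOKER_OBSERVED = 0.2  # hypothetical observed proportion of smokers


def predict(smoker, male, age=MEAN_AGE):
    """Linear predictor. `smoker`/`male` are 0/1 dummies, or a proportion,
    which for a linear model equals averaging predictions over the two levels."""
    return (COEF["intercept"] + COEF["smoker"] * smoker
            + COEF["age"] * age + COEF["male"] * male)


# LS-means of smoking: sex averaged with weight 1/2 per level.
ls_nonsmoker = predict(smoker=0, male=0.5)
ls_smoker = predict(smoker=1, male=0.5)

# LS-means of sex: smoking averaged with weight 1/2 per level,
# i.e. everybody is adjusted to a 50% smoking rate.
ls_female = predict(smoker=0.5, male=0)
ls_male = predict(smoker=0.5, male=1)

# Same comparison, adjusted to the (hypothetical) observed smoking rate instead.
obs_female = predict(smoker=P_SMOKER_OBSERVED, male=0)
obs_male = predict(smoker=P_SMOKER_OBSERVED, male=1)

print(f"smoking lsmeans: non-smoker {ls_nonsmoker:.2f}, smoker {ls_smoker:.2f}")
print(f"sex lsmeans (50% smokers): female {ls_female:.2f}, male {ls_male:.2f}")
print(f"sex means (observed smoking rate): female {obs_female:.2f}, male {obs_male:.2f}")
```

The differences (smoker minus non-smoker, male minus female) equal the corresponding coefficients whichever weights are used. This is the sense in which lsmeans from an additive model reproduce the parameter estimates. Only the levels move: for females, 3.30 at a 50% smoking rate versus 3.42 at the invented observed rate of 20%.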
The whole operation really seems to revolve around two things:
(1) pairwise comparisons between factor levels. This can alternatively
be done fairly easily using parameter estimates for the relevant
variable and associated covariances. You don't really need all the
mumbo-jumbo of adjusting to particular values of other variables.
(2) plotting effects of a factor with error bars as if they were
simple group means. This has some merit since the standard
parametrizations are misleading at times (e.g. if you choose the
group with the least data as the reference level, std. err. for
the other groups will seem high). However, it seems to me that
concepts like floating variances (see float() in the Epi package)
are more to the point.
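Point (1) needs nothing more than the estimates and their covariance matrix. Here is a minimal Python sketch, using invented numbers for a three-level factor under treatment coding. In R the inputs would typically come from coef() and vcov() on the fitted model. The dictionaries and the `pairwise` helper are illustrative assumptions.

```python
import math
from itertools import combinations

# Invented estimates for a 3-level factor under treatment coding;
# the reference level "A" has coefficient 0 by construction.
LEVELS = ["A", "B", "C"]
EST = {"A": 0.0, "B": 1.2, "C": 0.5}

# Invented covariance matrix of the estimates; the reference level's row and
# column are zero because its coefficient is fixed, not estimated.
COV = {
    ("A", "A"): 0.0, ("A", "B"): 0.0, ("A", "C"): 0.0,
    ("B", "A"): 0.0, ("B", "B"): 0.09, ("B", "C"): 0.04,
    ("C", "A"): 0.0, ("C", "B"): 0.04, ("C", "C"): 0.16,
}


def pairwise(i, j):
    """Estimate and standard error of level i minus level j, using
    Var(b_i - b_j) = Var(b_i) + Var(b_j) - 2 Cov(b_i, b_j)."""
    diff = EST[i] - EST[j]
    var = COV[(i, i)] + COV[(j, j)] - 2.0 * COV[(i, j)]
    return diff, math.sqrt(var)


for i, j in combinations(LEVELS, 2):
    d, se = pairwise(i, j)
    print(f"{i} - {j}: estimate {d:+.3f}, s.e. {se:.3f}")
```

For instance, B minus C has estimate 0.7 and standard error sqrt(0.09 + 0.16 - 0.08), about 0.412. No other variable in the model has to be fixed at any particular value to obtain it.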
R is an interactive language where it is a simple matter to fit a
series of models and base your analysis on a model that is
appropriate. An approach of "give me the answer to any possible
question about this model, whether or not it makes sense" is
unnecessary.
In many ways statistical theory and practice have not caught up with
statistical computing. There are concepts that are regarded as part
of established statistical theory when they are, in fact,
approximations or compromises motivated by the fact that you can't
compute the answer you want - except now you can compute it, but
that won't stop people who were trained in the old system from
assuming that things *must* be done in that way.
In short, I agree with Deepayan - the best thing to do is to ask
someone who uses SAS and least squares means to explain to you what
they are.