
Are least-squares means useful or appropriate?

7 messages · Spencer Graves, Felipe, Deepayan Sarkar +3 more

#

Hi.
I have been reading about the convenience of using least-squares means
(a.k.a. adjusted means) for multiple comparisons (I used to resort to
them when using SAS). I even read a post on this list warning against
them, but it did not give much detail. What do you think about them?
Greetings.

Felipe
3 days later
#
Estimado Felipe:

	  If you provide a very simple example (as suggested in the posting 
guide, www.R-project.org/posting-guide.html), it would allow those of 
us who rarely use SAS to respond.  Try to think of the simplest 
possible toy data set and analysis that shows the difference between the 
SAS answer and the answer you get from a given R function.  If you 
post something simple of that nature, which someone can copy from your 
email into R and try in a minute or two, it will likely 
increase the speed and utility of a reply.

	  Buena Suerte,
	  spencer graves
Felipe wrote:

#

Hi.
My question was just theoretical. I was wondering whether someone who
has used both SAS and R could give me their opinion on the topic. I was
trying to use least-squares means for comparisons in R, but then I found
some advice against them, and I wanted to know whether it had a good
basis (as I said earlier, it was not very detailed).
Greetings.

Felipe
Spencer Graves wrote:
| Estimado Felipe:
|
|       If you provide a very simple example (as suggested in the posting
| guide, www.R-project.org/posting-guide.html), it would allow those of
| us who rarely use SAS to respond.  Try to think of the simplest
| possible toy data set and analysis that shows the difference between the
| SAS answer and the answer you get from a certain R function.  If you
| post something simple of that nature that someone can copy from your
| email into R and try other things in a minute or two, it will likely
| increase the speed and utility of a reply.
|
|       Buena Suerte,
|       spencer graves
|
2 days later
#
On 9/20/05, Felipe <felipe at unileon.es> wrote:
As a non-'SAS user', I'm not a good person to respond to this, but
aren't you asking the wrong crowd? I have never come across this
concept in a proper statistics course, and in my very brief encounter
with it, it made absolutely no sense to me. But of course this does
not automatically mean that it's nonsense or anything. So rather than
asking R users who do not use it why they do not use something that
there is no obvious reason to use to begin with, why don't you ask
those who do (like whoever taught you to use them when you worked with
SAS) to explain why they do?

Deepayan
#
On 9/20/05, Felipe <felipe at unileon.es> wrote:
As Deepayan said in his reply, the concept of least squares means is
associated with SAS and is not generally part of the theory of linear
models in statistics.  My vague understanding of these (I too am not a
SAS user) is that they are an attempt to estimate the "mean" response
for a particular level of a factor in a model in which that factor has
a non-ignorable interaction with another factor.  There is no clearly
acceptable definition of such a thing.
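The lack of a clearly acceptable definition is easy to see in a few lines of R (a toy example of my own, not from the thread): once A and B interact, "the mean response at A = a1" depends entirely on how you choose to weight the levels of B.

```r
## Toy illustration (my own, not from the thread): with a strong A:B
## interaction, "the mean response at A = a1" is ill-defined -- it
## depends on the weights you assign to the levels of B.
d <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"))
d$y <- c(1, 2, 5, 2)                      # cell means; note the interaction
fit <- lm(y ~ A * B, data = d)

## predictions for a1 at each level of B
p <- predict(fit, newdata = data.frame(A = "a1", B = c("b1", "b2")))

mean(p)                  # equal 50/50 weights on B give 3
sum(c(0.8, 0.2) * p)     # 80/20 weights on B give 1.8
```

Neither weighting is more "correct" than the other; SAS's lsmeans simply picks equal weights by convention.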

To understand why there should be an attempt to answer a question that
doesn't make sense, remember the history of SAS, which was developed
in the era of punched cards and magnetic tape.  Beneath the surface of
SAS with its GUI, etc. is the fundamental assumption that your data
are on a reel of magnetic tape over in the "Computer Center" that
houses an IBM System/360 computer and that the way you are going to use
this program is by keypunching a deck of punched cards, putting some
mysterious JCL (the IBM Job Control Language which no one understood
and you learned only by imitation) cards at the beginning and end, and
submitting them at the I/O Window.  The next day you will go to the
computer center to pick up your output only to discover that you had a
JCL error.  You will spend most of the morning tracking down the one
person on campus who can tell you that "ERROR IEH92345" was caused by
the blank between the "DD" and the "*" in the card that reads //SYSIN
DD * so you change that and submit again.  After two or three days of
this you get the JCL right but discover that you have a syntax error
in your SAS code.  Another two or three cycles finally gets you to the
point where you have a card deck that runs and produces output.  At
that point you don't really care if the output makes sense or not -
all you want is some numbers for the report that is now a week
overdue.  You also want all the numbers that you might possibly need,
which is why SAS PROCs always have the potential to produce tons of
output if you ask for it.

R is an interactive language where it is a simple matter to fit a
series of models and base your analysis on a model that is
appropriate.  An approach of "give me the answer to any possible
question about this model, whether or not it makes sense" is
unnecessary.

In many ways statistical theory and practice have not caught up with
statistical computing.  There are concepts that are regarded as part
of established statistical theory when they are, in fact, 
approximations or compromises motivated by the fact that you can't
compute the answer you want - except now you can compute it.  However,
that won't stop people who were trained in the old system from
assuming that things *must* be done in that way.

In short, I agree with Deepayan - the best thing to do is to ask
someone who uses SAS and least squares means to explain to you what
they are.
#
Douglas Bates <dmbates at gmail.com> writes:
(PD goes and fetches the SAS manual....)

Well, yes, it'll do that too, although only if you ask for the lsmeans
of A when an interaction like A*B is present in the model. This is
related to the tests of main effects when an interaction is present
using type III sums of squares, which has been beaten to death
repeatedly on the list. In both cases, there seems to be an implicit
assumption that categorical variables by nature come from an
underlying fully balanced design.

If the interaction is absent from the model, the lsmeans are somewhat
more sensible in that they at least reproduce the parameter estimates
as contrasts between different groups. All continuous variables in the
design will be set to their means, but the levels of the other categorical
design variables are weighted equally (one over the number of levels). So if you're
doing an lsmeans of lung function by smoking adjusted for age and sex
you get estimates for the mean of a population of which everyone has
the same age and half are male and half are female. This makes some
sense, but if you do it for sex, adjusting for smoking and age, you are
not only forcing the sexes to smoke equal amounts, but actually
adjusting to a smoking rate of 50%, which could be quite far from
reality.
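This adjustment can be sketched in a few lines of base R (simulated data of my own; the variable names are made up, not from the thread): the "lsmean" for smokers sets age to its sample mean and weights the sexes 50/50, even when the sample is far from an even split.

```r
## Sketch on simulated data (my own, not from the thread): an
## "lsmean" of lung function for smokers sets age to its mean and
## weights the sexes 50/50, even though this sample is 70% female.
set.seed(1)
n <- 200
d <- data.frame(
  age     = rnorm(n, 50, 10),
  sex     = factor(sample(c("F", "M"), n, replace = TRUE, prob = c(0.7, 0.3))),
  smoking = factor(sample(c("no", "yes"), n, replace = TRUE))
)
d$fev <- 4 - 0.02 * d$age + 0.3 * (d$sex == "M") -
  0.4 * (d$smoking == "yes") + rnorm(n, sd = 0.3)

fit <- lm(fev ~ smoking + age + sex, data = d)

## predict for smokers at mean age, then average over sex with equal weights
grid <- expand.grid(smoking = "yes", age = mean(d$age), sex = c("F", "M"))
lsmean_yes <- mean(predict(fit, newdata = grid))

lsmean_yes                               # adjusted "mean" for smokers
mean(d$fev[d$smoking == "yes"])          # raw group mean, for contrast
```

The gap between the two numbers comes precisely from the adjustment: the raw mean reflects the actual 70/30 sex mix, the lsmean a hypothetical 50/50 one.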

The whole operation really seems to revolve around two things: 

(1) pairwise comparisons between factor levels. This can alternatively
    be done fairly easily using parameter estimates for the relevant
    variable and associated covariances. You don't really need all the
    mumbo-jumbo of adjusting to particular values of other variables.

(2) plotting effects of a factor with error bars as if they were
    simple group means. This has some merit since the standard
    parametrizations are misleading at times (e.g. if you choose the
    group with the least data as the reference level, std. err. for
    the other groups will seem high). However, it seems to me that
    concepts like floating variances (see float() in the Epi package)
    are more to the point.
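Point (1) can be illustrated in base R with the built-in PlantGrowth data (my own example, not from the thread): the trt1-vs-trt2 comparison and its standard error come straight from the coefficients and their covariance matrix, with no adjustment machinery at all.

```r
## Pairwise comparison from parameter estimates and covariances alone
## (built-in PlantGrowth data; default treatment contrasts, ctrl is
## the reference level).
fit <- lm(weight ~ group, data = PlantGrowth)
b <- coef(fit)                 # (Intercept), grouptrt1, grouptrt2
V <- vcov(fit)

## contrast vector for trt1 - trt2 on the coefficient scale
cc  <- c(0, 1, -1)
est <- sum(cc * b)                       # equals mean(trt1) - mean(trt2)
se  <- sqrt(drop(t(cc) %*% V %*% cc))
c(estimate = est, std.error = se, t = est / se)
```

Any pairwise difference between factor levels is a linear contrast of this form, so the covariance matrix is all you need.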

#
Dear Peter, Doug, and Felipe,

My effects package (on CRAN, also see the article at
http://www.jstatsoft.org/counter.php?id=75&url=v08/i15/effect-displays-revised.pdf)
will compute and graph adjusted effects of various kinds for linear
and generalized linear models -- generalizing so-called "least-squares
means" (or "population marginal means" or "adjusted means").

A couple of comments: 

By default, the all.effects() function in the effects package computes
effects for high-order terms in the model, absorbing terms marginal to them.
You can ask the effect() function to compute an effect for a term that's
marginal to a higher-order term, and it will do so with a warning, but this
is rarely sensible.

Peter's mention of floating variances (or quasi-variances) in this context
is interesting, but what I would most like to see, I think, are the
quasi-variances for the adjusted effects, that is, for terms merged with
their lower-order relatives. These, for example, are unaffected by contrast
coding. How to define reasonable quasi-variances in this context has been
puzzling me for a while.

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
--------------------------------