Hi. I have been reading about the convenience of using least-squares means (a.k.a. adjusted means) in multiple comparisons (I used to resort to them when using SAS). I even read a post in this list warning against them, but not giving much detail. What do you think about this? Greetings. Felipe
Are least-squares means useful or appropriate?
7 messages · Spencer Graves, Felipe, Deepayan Sarkar +3 more
3 days later
Estimado Felipe: If you provide a very simple example (as suggested in the posting guide, www.R-project.org/posting-guide.html), it would allow those of us who rarely use SAS to respond. Try to think of the simplest possible toy data set and analysis that shows the difference between the SAS answer and the answer you get from a certain R function. If you post something simple of that nature that someone can copy from your email into R and try other things in a minute or two, it will likely increase the speed and utility of a reply. Buena Suerte, spencer graves
Felipe wrote:
Hi. I have been reading about the convenience of using least-squares means (a.k.a. adjusted means) in multiple comparisons (I used to resort to them when using SAS). I even read a post in this list warning against them, but not giving much detail. What do you think about this? Greetings. Felipe
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Spencer Graves, PhD Senior Development Engineer PDF Solutions, Inc. 333 West San Carlos Street Suite 700 San Jose, CA 95110, USA spencer.graves at pdf.com www.pdf.com Tel: 408-938-4420 Fax: 408-280-7915
Hi. My question was just theoretical. I was wondering if someone who has used both SAS and R could give me their opinion on the topic. I was trying to use least-squares means for comparisons in R, but then I found some indications against them, and I wanted to know whether those had a good basis (as I said earlier, they were not very detailed). Greetings. Felipe
Spencer Graves wrote:
| Estimado Felipe: | | If you provide a very simple example (as suggested in the posting | guide, www.R-project.org/posting-guide.html), it would allow those of | us who rarely use SAS to respond. Try to think of the simplest | possible toy data set and analysis that shows the difference between the | SAS answer and the answer you get from a certain R function. If you | post something simple of that nature that someone can copy from your | email into R and try other things in a minute or two, it will likely | increase the speed and utility of a reply. | | Buena Suerte, | spencer graves
2 days later
On 9/20/05, Felipe <felipe at unileon.es> wrote:
Hi. My question was just theoretical. I was wondering if someone who has used both SAS and R could give me their opinion on the topic. I was trying to use least-squares means for comparisons in R, but then I found some indications against them, and I wanted to know whether those had a good basis (as I said earlier, they were not very detailed).
As a non-'SAS user', I'm not a good person to respond to this, but aren't you asking the wrong crowd? I have never come across this concept in a proper statistics course, and in my very brief encounter with it, it made absolutely no sense to me. But of course this does not automatically mean that it's nonsense or anything. So rather than asking R users who do not use it why they do not use something that there is no obvious reason to use to begin with, why don't you ask those who do (like whoever taught you to use them when you worked with SAS) to explain why they do? Deepayan
On 9/20/05, Felipe <felipe at unileon.es> wrote:
Hi. My question was just theoretical. I was wondering if someone who has used both SAS and R could give me their opinion on the topic. I was trying to use least-squares means for comparisons in R, but then I found some indications against them, and I wanted to know whether those had a good basis (as I said earlier, they were not very detailed). Greetings. Felipe
As Deepayan said in his reply, the concept of least squares means is associated with SAS and is not generally part of the theory of linear models in statistics. My vague understanding of these (I too am not a SAS user) is that they are an attempt to estimate the "mean" response for a particular level of a factor in a model in which that factor has a non-ignorable interaction with another factor. There is no clearly acceptable definition of such a thing.

To understand why there should be an attempt to answer a question that doesn't make sense, remember the history of SAS, which was developed in the era of punched cards and magnetic tape. Beneath the surface of SAS with its GUI, etc. is the fundamental assumption that your data are on a reel of magnetic tape over in the "Computer Center" that houses an IBM System/360 computer, and that the way you are going to use this program is by keypunching a deck of punched cards, putting some mysterious JCL (the IBM Job Control Language, which no one understood and you learned only by imitation) cards at the beginning and end, and submitting them at the I/O Window. The next day you will go to the computer center to pick up your output only to discover that you had a JCL error. You will spend most of the morning tracking down the one person on campus who can tell you that "ERROR IEH92345" was caused by the blank between the "DD" and the "*" in the card that reads //SYSIN DD * so you change that and submit again. After two or three days of this you get the JCL right but discover that you have a syntax error in your SAS code. Another two or three cycles finally gets you to the point where you have a card deck that runs and produces output. At that point you don't really care if the output makes sense or not - all you want is some numbers for the report that is now a week overdue. You also want all the numbers that you might possibly need, which is why SAS PROCs always have the potential to produce tons of output if you ask for it.
R is an interactive language where it is a simple matter to fit a series of models and base your analysis on a model that is appropriate. An approach of "give me the answer to any possible question about this model, whether or not it makes sense" is unnecessary. In many ways statistical theory and practice have not caught up with statistical computing. There are concepts that are regarded as part of established statistical theory when they are, in fact, approximations or compromises motivated by the fact that you couldn't compute the answer you wanted - except now you can compute it. However, that won't stop people who were trained in the old system from assuming that things *must* be done in that way. In short, I agree with Deepayan - the best thing to do is to ask someone who uses SAS and least squares means to explain to you what they are.
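The ambiguity Doug describes can be made concrete in a few lines of base R. This is a sketch with entirely made-up data and names, illustrating the idea rather than SAS's exact computation: when a factor A interacts with a factor B, any single "mean for a level of A" is a weighted average of cell means, and the weights are a modelling choice, not a fact about the data.

```r
## Why "the mean for a level of A" is ill-defined when A interacts with B.
set.seed(1)
d <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"))
d <- d[rep(1:4, c(5, 5, 5, 15)), ]   # deliberately unbalanced in the a2:b2 cell
d$y <- rnorm(nrow(d), mean = ifelse(d$A == "a2" & d$B == "b2", 3, 0))

fit <- lm(y ~ A * B, data = d)

## Predicted cell means for A = "a2" at each level of B:
cells <- predict(fit, newdata = data.frame(A = "a2", B = c("b1", "b2")))

## An "lsmean" for a2 weights the two cells equally; the raw group mean
## uses the observed (unbalanced) weights. With a real interaction and
## real imbalance, the two answers differ.
lsmean  <- mean(cells)
rawmean <- mean(d$y[d$A == "a2"])
c(lsmean = lsmean, rawmean = rawmean)
```

Neither number is "the" mean for a2; each answers a different question about a different hypothetical population.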
Douglas Bates <dmbates at gmail.com> writes:
On 9/20/05, Felipe <felipe at unileon.es> wrote:
Hi. My question was just theoretical. I was wondering if someone who has used both SAS and R could give me their opinion on the topic. I was trying to use least-squares means for comparisons in R, but then I found some indications against them, and I wanted to know whether those had a good basis (as I said earlier, they were not very detailed). Greetings. Felipe
As Deepayan said in his reply, the concept of least squares means is associated with SAS and is not generally part of the theory of linear models in statistics. My vague understanding of these (I too am not a SAS user) is that they are an attempt to estimate the "mean" response for a particular level of a factor in a model in which that factor has a non-ignorable interaction with another factor. There is no clearly acceptable definition of such a thing.
(PD goes and fetches the SAS manual....)
Well, yes, it'll do that too, although only if you ask for the lsmeans of A when an interaction like A*B is present in the model. This is related to the tests of main effects when an interaction is present using type III sums of squares, which has been beaten to death repeatedly on the list. In both cases, there seems to be an implicit assumption that categorical variables by nature come from an underlying fully balanced design.
If the interaction is absent from the model, the lsmeans are somewhat more sensible in that they at least reproduce the parameter estimates as contrasts between different groups. All continuous variables in the design will be set to their mean, but the levels of categorical design variables are weighted inversely by the number of levels. So if you're doing an lsmeans of lung function by smoking adjusted for age and sex, you get estimates for the mean of a population in which everyone has the same age and half are male and half are female. This makes some sense, but if you do it for sex adjusting for smoking and age, you are not only forcing the sexes to smoke equally, but actually adjusting to smoking rates of 50%, which could be quite far from reality.
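The adjustment Peter describes -- continuous covariates held at their sample mean, each other factor averaged with equal weight over its levels regardless of the observed frequencies -- can be reproduced by hand. A hedged base-R sketch with simulated data (the variable names, effect sizes, and the 70/30 sex ratio below are all invented for illustration):

```r
## Hand-rolled "lsmean" of lung function (fev) for smokers, adjusted for
## age and sex: age is fixed at its sample mean, while sex is averaged
## 50/50 over its levels, whatever the actual sex ratio in the data.
set.seed(2)
n   <- 200
dat <- data.frame(
  age   = rnorm(n, 50, 10),
  sex   = sample(c("F", "M"), n, replace = TRUE, prob = c(0.7, 0.3)),
  smoke = sample(c("no", "yes"), n, replace = TRUE)
)
dat$fev <- 4 - 0.02 * dat$age + 0.6 * (dat$sex == "M") -
  0.4 * (dat$smoke == "yes") + rnorm(n, sd = 0.3)

fit <- lm(fev ~ age + sex + smoke, data = dat)   # no interactions

## Average the model's predictions over both sexes at the mean age:
grid <- expand.grid(age = mean(dat$age), sex = c("F", "M"), smoke = "yes")
lsmean_smokers <- mean(predict(fit, newdata = grid))

## Compare with an adjustment to the *observed* sex ratio instead:
obs_wts  <- prop.table(table(dat$sex))             # roughly 0.7 / 0.3 here
obs_mean <- sum(obs_wts * predict(fit, newdata = grid))
```

The gap between `lsmean_smokers` and `obs_mean` is exactly the "adjusting to 50%" issue: the equal-weight average describes a hypothetical half-and-half population, not the sampled one.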
The whole operation really seems to revolve around two things:
(1) Pairwise comparisons between factor levels. This can alternatively be done fairly easily using the parameter estimates for the relevant variable and their associated covariances. You don't really need all the mumbo-jumbo of adjusting to particular values of other variables.

(2) Plotting effects of a factor with error bars as if they were simple group means. This has some merit, since the standard parametrizations are misleading at times (e.g. if you choose the group with the least data as the reference level, the standard errors for the other groups will seem high). However, it seems to me that concepts like floating variances (see float() in the Epi package) are more to the point.
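Point (1) -- a pairwise comparison built straight from the coefficients and their covariance matrix -- takes only a few lines of base R. A sketch with a made-up three-level factor (the levels, means, and sample size are invented):

```r
## Compare levels "b" and "c" of a factor without any "adjusting to
## particular values of other variables": form the contrast directly
## from coef() and vcov().
set.seed(3)
g <- factor(sample(c("a", "b", "c"), 90, replace = TRUE))
y <- rnorm(90, mean = c(a = 0, b = 1, c = 2)[as.character(g)])
fit <- lm(y ~ g)                 # default treatment contrasts, baseline "a"

cvec <- c(0, -1, 1)              # coefficients: (Intercept), gb, gc
est  <- sum(cvec * coef(fit))    # estimates mu_c - mu_b
se   <- sqrt(drop(t(cvec) %*% vcov(fit) %*% cvec))
tval <- est / se                 # Wald/t statistic for the comparison
```

The same contrast-vector recipe extends to models with covariates: the comparison between two factor levels never requires choosing reference values for the other terms.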
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
Dear Peter, Doug, and Felipe,

My effects package (on CRAN; also see the article at http://www.jstatsoft.org/counter.php?id=75&url=v08/i15/effect-displays-revised.pdf) will compute and graph adjusted effects of various kinds for linear and generalized linear models -- generalizing so-called "least-squares means" (or "population marginal means" or "adjusted means"). A couple of comments:

By default, the all.effects() function in the effects package computes effects for high-order terms in the model, absorbing terms marginal to them. You can ask the effect() function to compute an effect for a term that's marginal to a higher-order term, and it will do so with a warning, but this is rarely sensible.

Peter's mention of floating variances (or quasi-variances) in this context is interesting, but what I would most like to see, I think, are the quasi-variances for the adjusted effects, that is, for terms merged with their lower-order relatives. These, for example, are unaffected by contrast coding. How to define reasonable quasi-variances in this context has been puzzling me for a while.

Regards, John

-------------------------------- John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox --------------------------------