Unexplained behavior of level names when using ordered factors in lm?
6 messages · Tal Galili, David Winsemius, Bert Gunter +1 more
On Dec 2, 2011, at 9:51 AM, Tal Galili wrote:
Hello dear all,

I am unable to understand why when I run the following three lines:

set.seed(4254)
a <- data.frame(y = rnorm(40), x=ordered(sample(1:5, 40, T)))
summary(lm(y ~ x, a))

The output I get includes factor levels which are not relevant to what I am actually using:

Call:
lm(formula = y ~ x, data = a)
Residuals:
    Min      1Q  Median      3Q     Max
-1.4096 -0.6400 -0.1244  0.5886  2.1891

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.03276    0.15169  -0.216    0.830
x.L         -0.28968    0.33866  -0.855    0.398
x.Q         -0.38813    0.33851  -1.147    0.259
x.C         -0.27183    0.34027  -0.799    0.430
x^4          0.25993    0.33935   0.766    0.449

Those are polynomial contrasts: linear, quadratic, cubic and quartic. If you don't want contrasts based on ordered factors then just use regular factors.

You should probably be looking at:

?"C"

(...yet another function whose name should be avoided in naming data objects.)
David.

>> Residual standard error: 0.9564 on 35 degrees of freedom
>> Multiple R-squared: 0.08571, Adjusted R-squared: -0.01878
>> F-statistic: 0.8202 on 4 and 35 DF, p-value: 0.5211
>
> I am guessing that this is having something to do with the contrast matrix
> that is used, but this is not clear to me.
> Can anyone suggest a good read, or an explanation?
>
> Thanks.
>
> ----------------Contact Details:----------------
> Contact me: Tal.Galili at gmail.com | 972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
> ------------------------------------------------
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
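
The point about ordered versus regular factors can be checked directly. The following is a minimal sketch, assuming R's default `options("contrasts")` (treatment contrasts for unordered factors, polynomial contrasts for ordered ones). The column name `x_unordered` and the object names `fit_ord` and `fit_fac` are made up for illustration, and because R's sampling algorithm has changed since 2011, the random draws may not match the output quoted in the thread.

```r
set.seed(4254)
a <- data.frame(y = rnorm(40), x = ordered(sample(1:5, 40, TRUE)))

# Ordered factor: lm() codes it with polynomial contrasts (contr.poly),
# so the coefficient names carry the suffixes .L, .Q, .C, ^4.
fit_ord <- lm(y ~ x, data = a)
names(coef(fit_ord))

# Same values with the ordering dropped: lm() now uses treatment contrasts,
# giving one coefficient per non-reference level.
a$x_unordered <- factor(a$x, ordered = FALSE)
fit_fac <- lm(y ~ x_unordered, data = a)
names(coef(fit_fac))

# Only the parameterization differs; the fitted values are the same.
all.equal(unname(fitted(fit_ord)), unname(fitted(fit_fac)))
```

Both fits span the same model space, so residuals, R-squared and the overall F-test agree; only the individual coefficients and their interpretation change.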
?ordered
?C
?contr.poly

If you don't know what polynomial contrasts are, consult any good linear models text. MASS has a good, though a bit terse, section on this.

-- Bert
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
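
As a companion to the `?contr.poly` pointer, here is a small sketch of what that function returns for five levels. The name `P` is only for illustration. The column names of this matrix are what `lm()` appends to the variable name, which is where labels such as `x.L` and `x^4` come from.

```r
# Contrast matrix used for a 5-level ordered factor.
P <- contr.poly(5)
colnames(P)     # ".L" ".Q" ".C" "^4"
round(P, 3)

# The columns are orthonormal, and each sums to zero,
# so each is orthogonal to the intercept column.
round(crossprod(P), 10)   # identity matrix, up to rounding
round(colSums(P), 10)     # zeros, up to rounding
```

Each column is a polynomial trend of increasing degree evaluated at equally spaced level scores, so the coefficients describe linear, quadratic, cubic and quartic components across the ordered levels, not differences between individual levels.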
Maybe should have explicitly said:
C(ordered(1:5))
[1] 1 2 3 4 5
attr(,"contrasts")
   ordered
contr.poly
Levels: 1 < 2 < 3 < 4 < 5

-- Bert
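
The `C()` call above can also be used to request a different coding. This is a sketch only, again assuming the default `options("contrasts")`; the names `x` and `x_trt` are illustrative.

```r
x <- ordered(1:5)

# With no 'contr' argument, C() attaches the default for the factor's type,
# looked up in options("contrasts"); for ordered factors that is "contr.poly".
attr(C(x), "contrasts")
getOption("contrasts")

# Requesting treatment contrasts keeps the factor ordered
# but changes how a model formula would code it.
x_trt <- C(x, "contr.treatment")
is.ordered(x_trt)
contrasts(x_trt)
```

The same idea works inside a formula, e.g. `lm(y ~ C(x, "contr.treatment"), a)`, if you want to keep the factor ordered in the data but get level-versus-reference coefficients in one particular model.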
Hi Bert, Since you opened the door ...
On Fri, Dec 2, 2011 at 10:06 AM, Bert Gunter <gunter.berton at gene.com> wrote:
> ?ordered
> ?C
> ?contr.poly
>
> If you don't know what polynomial contrasts are, consult any good linear models text. MASS has a good, though a bit terse, section on this.
Do you have a "favorite" linear model text with a bit more exposition than MASS?

Even though this list isn't for teaching stats, whenever I can catch some of the tried and true statisticians talking about texts on specific subject matter, I like to take advantage of it to see what I need to add to my amazon wish list to help sharpen the old saw :-)

Thanks,
-steve
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact