
[NB] lm problems

6 messages · Matej Cepl, Brian Ripley, John Fox +1 more

#
Hi,

I have probably overlooked something obvious, but could anybody
help me with the following, please?

I am trying to do a regression analysis. I have a huge data frame with
results from the National Opinion Survey on Crime and Justice
(www.abacon.com/fox/), with two variables, G5 and N3, which are
imported into R as ordered factors:

	> levels(noscj$G5)
	[1] "Strongly agree"    "Agree"             "Neither"
	[4] "Disagree"          "Strongly disagree"
	> levels(noscj$N3)
	[1] "Serious problem"  "Somewhat problem" "Minor problem"    "Not a problem"
	>

(Missing values are duly recoded as NA.) When I try a linear
regression, I get a lot of warnings which I have not managed to
parse successfully:

	> lm(G5 ~ N3,data=noscj)
	
	Call:
	lm(formula = G5 ~ N3, data = noscj)
	
	Coefficients:
	(Intercept)         N3.L         N3.Q         N3.C
	    3.38087     -0.05821     -0.15364      0.04367
	
	Warning message:
	"-" not meaningful for ordered factors in: Ops.ordered(y,
	z$residuals)
	> summary(lm(G5 ~ N3,data=noscj))
	
	Call:
	lm(formula = G5 ~ N3, data = noscj)
	
	Residuals:
	[1] <NA> <NA> <NA> <NA> <NA>
	Levels:  Strongly agree Agree Neither Disagree Strongly disagree
	
	Coefficients:
	            Estimate Std. Error t value Pr(>|t|)
	(Intercept)  3.38087
	N3.L        -0.05821
	N3.Q        -0.15364
	N3.C         0.04367
	
	Residual standard error: NA on 980 degrees of freedom
	Multiple R-Squared:    NA,      Adjusted R-squared:    NA
	F-statistic:    NA on 3 and 980 DF,  p-value: NA
	
	Warning messages:
	1: "-" not meaningful for ordered factors in: Ops.ordered(y,
	z$residuals)
	2: "^" not meaningful for ordered factors in: Ops.ordered(r, 2)
	3: ">" not meaningful for factors in: Ops.factor(qs[i], -Inf)
	4: "+" not meaningful for factors in: Ops.factor(qs[i],
	.minus(x[hi[i]], x[lo[i]]) * (index[i] - lo[i]))
	>

Could anybody tell me what's going on, please? I have no clue
what "^", ">", etc. mean.

	Thanks a lot (and thanks for your patience)

		Matej
#
You can't do linear regression with an ordered factor as a response.
If you mean to code the levels, you need to do so explicitly by
codes(G5).
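A minimal sketch of coding the levels explicitly. `G5` below is a small hypothetical vector, not the survey data. The sketch uses `as.integer()`, which returns a factor's internal level codes (1 for the first level, 2 for the second, and so on); for an ordered factor these should be the same numbers `codes()` gives here. Whether `codes()` is available depends on the R version.

```r
# Hypothetical stand-in for noscj$G5: an ordered factor with the five
# agreement levels shown in the original post.
g5_levels <- c("Strongly agree", "Agree", "Neither",
               "Disagree", "Strongly disagree")
G5 <- factor(c("Agree", "Neither", "Strongly disagree", NA, "Agree"),
             levels = g5_levels, ordered = TRUE)

# Explicit numeric coding: 1 = "Strongly agree", ..., 5 = "Strongly disagree".
# Missing entries stay NA.
G5_num <- as.integer(G5)
print(G5_num)
```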
On Tue, 26 Nov 2002, Matej Cepl wrote:

#
Dear Matej,

The response variable in a linear model fit by lm has to be a numeric 
variable. (The warnings are produced when lm tries to perform arithmetic 
operations on an ordered factor.) You could use as.numeric(G5) on the
left-hand side of the model, but you should probably think about whether you
really want to fit a linear model to categorical data.
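A sketch of the as.numeric(G5) suggestion on made-up data. The data frame `toy` and its random values are hypothetical and only show the mechanics; they do not reproduce the survey results. The response becomes the level codes 1..5, while N3 stays an ordered factor, so it is still coded with polynomial contrasts (N3.L, N3.Q, N3.C), as in the output from the original post.

```r
# Hypothetical toy data standing in for the noscj data frame.
g5_levels <- c("Strongly agree", "Agree", "Neither",
               "Disagree", "Strongly disagree")
n3_levels <- c("Serious problem", "Somewhat problem",
               "Minor problem", "Not a problem")
set.seed(1)
toy <- data.frame(
  G5 = factor(sample(g5_levels, 40, replace = TRUE),
              levels = g5_levels, ordered = TRUE),
  N3 = factor(rep(n3_levels, each = 10),
              levels = n3_levels, ordered = TRUE)
)

# Numeric response (level codes), ordered-factor predictor.
# Residuals can now be computed, so summary() reports real values
# instead of NA.
fit <- lm(as.numeric(G5) ~ N3, data = toy)
summary(fit)
```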

I hope that this helps,
  John
At 03:40 PM 11/26/2002 -0500, Matej Cepl wrote:
-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Tue, 26 Nov 2002, Matej Cepl wrote:

Well, all the warnings are of the form "  not meaningful for ordered
factors".  The problem is that lm is not meaningful for ordered factors.
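A small reproduction of where those warnings come from, using a hypothetical three-level ordered factor. Arithmetic operators such as "-" and "^" are not defined for ordered factors: R warns and returns NA, which is consistent with the NA residuals, residual standard error and R-squared in the original summary() output. Comparisons such as ">" are defined for ordered factors.

```r
# A small ordered factor with hypothetical values.
f <- factor(c("low", "high", "mid"), levels = c("low", "mid", "high"),
            ordered = TRUE)

# Arithmetic is not meaningful: R issues a warning and returns NA
# for every element.
res <- f - 1
print(res)

# Comparisons use the level order and work as expected.
print(f > "low")
```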

You appear to want a linear regression model where the response is
1,2,3,4,5 according to the levels of G5. You need to define a variable
like that.  You can probably just use

numG5<-unclass(noscj$G5)
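A short comparison of unclass() and as.integer() on a hypothetical stand-in for noscj$G5. Both give the integer codes 1..5. unclass() removes the class but keeps the "levels" attribute, while as.integer() returns a plain integer vector with no attributes.

```r
# Hypothetical ordered factor standing in for noscj$G5.
g5_levels <- c("Strongly agree", "Agree", "Neither",
               "Disagree", "Strongly disagree")
G5 <- factor(c("Agree", "Disagree", "Strongly agree"),
             levels = g5_levels, ordered = TRUE)

# unclass() drops the "ordered"/"factor" class but keeps the "levels"
# attribute, so the result is an integer vector that still carries labels.
numG5 <- unclass(G5)
str(numG5)

# as.integer() returns the same codes with all attributes dropped.
numG5_plain <- as.integer(G5)
str(numG5_plain)
```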


	-thomas

#
Well, I do think it IS a silly idea (moreover, in another case we
do a regression analysis where the dependent variable is sex, which
seems really ugly to me), but it is my homework assignment for a
statistics class. Not to make my instructor look too silly: the
problem is that the example dataset for the textbook has just one
interval variable. On the other hand, the instructor (James Fox, your
namesake BTW; School of Criminal Justice at Northeastern
University, Boston, MA) is the author of the textbook, so I think
there really should be TWO datasets (one for tests of
significance and variance, and another one for regression
calculations) and that it is his fault after all.

Thanks for the help,

Matej
On Tue, 26 Nov 2002, John Fox wrote:

#
On Tue, 26 Nov 2002 ripley at stats.ox.ac.uk wrote:

Thanks to everybody who answered my query. Just to justify
myself in your eyes, let me tell you that the silly idea of doing
regression analysis on nominal variables is not mine; it comes
from my instructor at school.

Thanks again,

Matej
