Can't understand error message :-{ - R-help

Tue, Mar 2, 1999 8:10 AM

I'm sorry if this is a basic question, but I'm stumped. I'm just trying to plot
the residuals from a linear model against another variable in the data frame.
Here are the lines I'm trying to execute:

size <- read.table(file="/u67/abasl70/surveys/annenberg/mega/smschl.dat",
                   header=TRUE)
sizef <- data.frame(size, row.names=size$unit)
attach(sizef)
mschmod <- lm(mavgres ~ crimesch + socstat + povnojob + ploinc94 + aa94 +
              hisp94 + minty94 + mixed94, data=sizef)
plot(mschmod$residuals ~ size94)

The last line gives this error message:
Error in model.frame(formula, rownames, variables, varnames, extras,
extranames,  : variable lengths differ

In fact, the lengths are different:

[1] 379

[1] 384

but I'm sure I don't know 1) how this happened since it all came from the same
data frame, or 2) why it should prevent me from plotting the data. If someone
can tell me how to fix this (before I have to present the results at this
Thursday's meeting!) I'd be very appreciative. 
______________________________________________________________________
Stuart Luppescu         -=-=-  University of Chicago
                        -=-=-  s-luppescu at uchicago.edu
http://www.consortium-chicago.org/people/sl/sl.html
ICQ #21172047  AIM: psycho7070
Bare feet magnetize sharp metal objects so they point upward from the
floor -- especially in the dark.

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Peter Dalgaard

Tue, Mar 2, 1999 8:43 AM

Stuart Luppescu <s-luppescu at uchicago.edu> writes:

(1) Missing values in response and/or regressors cause cases to be
    discarded. 
(2) Plotting which of the y's against which of the x's?
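To make point (1) concrete, here is a minimal sketch in current R. The data frame `d`, its columns `y`, `x1`, `x2` and its row names are hypothetical stand-ins for `sizef`, `mavgres`, the regressors and `size94`. The default na.action of lm() drops the two incomplete rows, so the residual vector comes out shorter than the column it is plotted against.

```r
# Hypothetical toy data standing in for sizef: 8 cases, with an NA in the
# response (row 3) and an NA in a regressor (row 5).
d <- data.frame(y  = c(1.2, 2.3, NA, 4.1, 5.0, 6.2, 6.9, 8.4),
                x1 = c(1, 2, 3, 4, NA, 6, 7, 8),
                x2 = c(2, 1, 4, 3, 6, 5, 8, 7),  # plays the role of size94
                row.names = paste0("unit", 1:8))

# With the default na.action (na.omit), rows 3 and 5 are silently dropped.
fit <- lm(y ~ x1, data = d)
r <- residuals(fit)

length(r)     # 6: one residual per case actually used in the fit
length(d$x2)  # 8: every row of the data frame
names(r)      # row names of the cases that were kept

# Plotting the 6 residuals against all 8 values of x2 fails inside
# model.frame(), as in the original post.
err <- tryCatch(plot(r ~ d$x2), error = function(e) e)
inherits(err, "error")  # TRUE
```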

plot(mschmod$residuals ~ size94[complete.cases(mavgres, crimesch,
     socstat, povnojob, ploinc94, aa94, hisp94, minty94, mixed94)])

should do the trick. Or, simpler but sneakier:

attach(sizef[rownames(mschmod$model),])
plot(residuals(mschmod) ~ size94)
detach()

It should also work with:

evalq(plot(residuals(mschmod) ~ size94), sizef[rownames(mschmod$model),])

(none of the above is tested, since I don't have your data of course)
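Since the suggestions above are untested, here is a sketch that checks them on hypothetical toy data (`d` and `fit` are made up; `x2` plays the role of `size94`). All three ways of picking out the rows actually used by the fit should give the same vector, one value per residual.

```r
# Hypothetical toy data; rows 3 and 5 are incomplete.
d <- data.frame(y  = c(1.2, 2.3, NA, 4.1, 5.0, 6.2, 6.9, 8.4),
                x1 = c(1, 2, 3, 4, NA, 6, 7, 8),
                x2 = c(2, 1, 4, 3, 6, 5, 8, 7),
                row.names = paste0("unit", 1:8))
fit <- lm(y ~ x1, data = d)

# 1. complete.cases() over the variables that appear in the model
x2_cc <- d$x2[complete.cases(d$y, d$x1)]

# 2. rows selected by the row names of the model frame
x2_mf <- d[rownames(fit$model), "x2"]

# 3. evalq() evaluates the expression inside the subsetted data frame
x2_ev <- evalq(x2, d[rownames(fit$model), ])

# All three agree and have one value per residual.
stopifnot(identical(x2_cc, x2_mf), identical(x2_mf, x2_ev),
          length(x2_mf) == length(residuals(fit)))

# The evalq() form of the plot itself:
evalq(plot(residuals(fit) ~ x2), d[rownames(fit$model), ])
```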

O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

John Logsdon

Thu, Mar 4, 1999 11:53 AM

On 2 Mar 1999, Peter Dalgaard BSA wrote:

The problem of plotting residuals against fitted values or covariates when
there are NAs caught me out a little while ago.  Would it not be better if
the fitting functions (lm, glm, etc.) and plot() were consistent?  That is,
either (a) plot() would omit cases where the X or the Y is NA before
checking that the lengths agree, or (b) residuals() etc. would include NA
in the appropriate places.
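For what it's worth, current versions of R provide something close to option (b) through `na.action = na.exclude`: the incomplete rows are still left out of the fit, but residuals() and fitted() put NA back in their places. A minimal sketch, on a hypothetical data frame `d`:

```r
# Hypothetical toy data: rows 3 and 5 have an NA in a model variable.
d <- data.frame(y  = c(1.2, 2.3, NA, 4.1, 5.0, 6.2, 6.9, 8.4),
                x1 = c(1, 2, 3, 4, NA, 6, 7, 8),
                x2 = c(2, 1, 4, 3, 6, 5, 8, 7),
                row.names = paste0("unit", 1:8))

# Rows 3 and 5 are still dropped for fitting, but residuals() and fitted()
# are padded with NA so that they line up with the rows of d.
fit_ex <- lm(y ~ x1, data = d, na.action = na.exclude)
r_ex <- residuals(fit_ex)

length(r_ex)        # 8, the same as nrow(d)
which(is.na(r_ex))  # the two excluded cases

# The lengths now agree; the NA residuals are simply not drawn.
plot(r_ex ~ d$x2)
```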

Another consequence of the present inconsistency is that it *is* possible
for the shortened vector to happen to have the same length as the one it
is plotted against.  This happened to me for a while without my realising
it: I had 625 items and took a subset of 620 but, through ignorance of the
correct commands and a good measure of stupidity, it was the wrong subset!
Fortunately I checked, and the plots made it obvious that something was
wrong, but it makes me rather nervous when handling big data sets.  Mea
culpa, I know, but it is an easy trap to fall into.
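The same-length trap can be sketched on hypothetical toy data: a positional subset of the wrong rows has exactly as many elements as the residuals, so a plot would run without complaint, whereas comparing the residual names with the row names catches the mismatch. `d`, `wrong` and `right` are illustrative names only.

```r
# Hypothetical toy data; lm() drops rows 3 and 5.
d <- data.frame(y  = c(1.2, 2.3, NA, 4.1, 5.0, 6.2, 6.9, 8.4),
                x1 = c(1, 2, 3, 4, NA, 6, 7, 8),
                x2 = c(2, 1, 4, 3, 6, 5, 8, 7),
                row.names = paste0("unit", 1:8))
fit <- lm(y ~ x1, data = d)
r <- residuals(fit)

# A wrong subset that just happens to have the right length: the first 6 rows.
wrong <- d[1:6, ]
length(wrong$x2) == length(r)          # TRUE, so plot(r ~ wrong$x2) would not complain

# Comparing residual names with row names exposes the mismatch.
identical(names(r), rownames(wrong))   # FALSE

# Subsetting by name gives the rows that were really used in the fit.
right <- d[names(r), ]
identical(names(r), rownames(right))   # TRUE
```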

Where there are lots of NAs and lots of subsets of data, I also find
myself making too many similar copies of the data to avoid very long plot()
commands, and then forgetting which is which.  That is another case for an
extended form of ls() (ls -l, maybe), as recently suggested, which would
include the size of each structure and perhaps its class, etc.
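A rough sketch of such an extended listing; `ls_long()` is a hypothetical helper, not a base R function, built only from ls(), get(), class(), dim() and object.size(). (Current R also ships utils::ls.str(), which prints a str() summary of each object.)

```r
# Hypothetical helper, not part of base R: one row per object in `envir`,
# giving its class, its dimensions (or length) and its approximate size.
# Returns NULL for an empty environment.
ls_long <- function(envir = globalenv()) {
  nms <- ls(envir = envir)
  rows <- lapply(nms, function(n) {
    obj <- get(n, envir = envir)
    dm <- dim(obj)
    data.frame(name  = n,
               class = paste(class(obj), collapse = "/"),
               dim   = if (is.null(dm)) as.character(length(obj))
                       else paste(dm, collapse = "x"),
               bytes = as.numeric(object.size(obj)),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Usage: two similar copies of a data set, easy to confuse in a plain ls().
e <- new.env()
assign("full", data.frame(a = 1:10, b = (1:10) / 2), envir = e)
assign("sub",  data.frame(a = 1:4,  b = (1:4) / 2),  envir = e)
ls_long(e)
```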

John

Peter Dalgaard

Thu, Mar 4, 1999 4:10 PM

John Logsdon <j.logsdon at lancaster.ac.uk> writes:

(a) won't work if you think about it more closely. (b) might. I wouldn't
be surprised if there's a rationale for the way things are now, but I
can't seem to reconstruct it. Well, there's space saving of course,
but given the waste in other areas, that is hardly a crucial point.
Possibly consistent behaviour of drop(), etc. has something to do
with it.


Brian Ripley

Thu, Mar 4, 1999 9:06 PM

On 5 Mar 1999, Peter Dalgaard BSA wrote:

I hope that (b) does work, as that is the direction S-PLUS is taking,
prompted by passionate advocacy from Terry Therneau, whose survival code
does this. But you do have to be very careful: you are implicitly assuming
(as Terry does, explicitly) that na.action=na.omit. That is by no means the
only possibility (it is not even the default), and na.action could also
increase the number of cases (multiple imputation). It isn't just
residuals: the issue over predict is subtler, and you may want to handle
fitted, residuals and predict separately. And when you start doing this,
you may break a lot of code.
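A small sketch, in current R, of how the choice of na.action changes what comes back, on a hypothetical data frame `d`: na.fail refuses to fit, while na.omit and na.exclude give the same coefficients but residuals of different lengths. The last lines touch on the predict() point: with `newdata`, predict.lm() defaults to na.pass, so a row with a missing regressor yields an NA prediction rather than being dropped.

```r
# Hypothetical toy data: rows 3 and 5 are incomplete.
d <- data.frame(y  = c(1.2, 2.3, NA, 4.1, 5.0, 6.2, 6.9, 8.4),
                x1 = c(1, 2, 3, 4, NA, 6, 7, 8),
                row.names = paste0("unit", 1:8))

# na.fail: refuse to fit when any model variable is NA.
res_fail <- tryCatch(lm(y ~ x1, data = d, na.action = na.fail),
                     error = function(e) e)
inherits(res_fail, "error")   # TRUE

# na.omit and na.exclude drop the same rows for the fit ...
fit_omit <- lm(y ~ x1, data = d, na.action = na.omit)
fit_excl <- lm(y ~ x1, data = d, na.action = na.exclude)
all.equal(coef(fit_omit), coef(fit_excl))   # TRUE

# ... but differ in what residuals() (and fitted()) return.
length(residuals(fit_omit))   # 6
length(residuals(fit_excl))   # 8

# predict() with newdata: predict.lm() uses na.action = na.pass by default,
# so the row with a missing regressor gets an NA prediction.
new <- data.frame(x1 = c(2.5, NA))
predict(fit_omit, newdata = new)
```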

The best way to avoid trouble is to use the row names/vector names, which
tell you which of the original cases you have. Now that they are passed
down correctly in R (I hope), you can just match the sets of names. (What,
I hear, you want the software to do that? Oh well, one day, for some plot
methods.)
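Matching the sets of names can be sketched as follows (hypothetical data frame `d` and fit `fit`): match() traces each residual back to the row of the data frame it came from, so the covariate is picked out by name rather than by position.

```r
# Hypothetical toy data; lm() drops rows 3 and 5.
d <- data.frame(y  = c(1.2, 2.3, NA, 4.1, 5.0, 6.2, 6.9, 8.4),
                x1 = c(1, 2, 3, 4, NA, 6, 7, 8),
                x2 = c(2, 1, 4, 3, 6, 5, 8, 7),
                row.names = paste0("unit", 1:8))
fit <- lm(y ~ x1, data = d)
r <- residuals(fit)

# For each residual, find the row of d with the same name.
idx <- match(names(r), rownames(d))
stopifnot(!anyNA(idx))   # every residual was traced back to a row
x2_for_r <- d$x2[idx]

plot(r ~ x2_for_r)
```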

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Martin Maechler

Fri, Mar 5, 1999 12:23 AM

PD> John Logsdon <j.logsdon at lancaster.ac.uk> writes:

    >> The problems of plotting residuals vs fitted data/covariates where
    >> there are NAs caught me out a little while ago.  Would it not be
    >> better if the fitting functions lm, glm etc and plot were
    >> consistent?  Thus either (a) plot() omitted cases in the X or the Y
    >> which were NA before checking for length consistency or (b)
    >> residuals() etc included NA in the appropriate places.

    PD> (a) won't work if you think closer about it.
Yes, agreed.

    PD> (b) might. I wouldn't
    PD> be surprised if there's a rationale for the way things are now, but
    PD> I can't seem to reconstruct it. Well, there's space saving of
    PD> course, but given the waste in other areas, that is hardly a
    PD> crucial point.  Possibly, consistent behaviour of drop(), etc. has
    PD> something to do with it.

Werner Stahel (in our stat group) has been using hacked versions of lm
and of some lm methods that address exactly this, i.e. they follow
approach (b); however, I think it's still a hack that only works in some
(the most used) cases. One would probably have to change quite a few
lm/glm/... methods.

I do think it'd be a worthwhile route, though incompatible with S.

Would one want a global options() setting to toggle this behavior?
It seems dangerous and undesirable (à la Octave) to have functions
return different results depending on options().
The "contrasts" option is a half step in that direction, and it has had
all kinds of adverse consequences.
Ideally, options() should only affect the way results are *displayed*,
not the way they are computed (and stored).
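In current R the global na.action option already behaves like such a toggle, which illustrates the concern: the same lm() call returns residuals of different lengths depending only on options(). A minimal sketch on a hypothetical data frame `d`, assuming the option starts at its default value "na.omit":

```r
# Hypothetical toy data: rows 3 and 5 are incomplete.
d <- data.frame(y  = c(1.2, 2.3, NA, 4.1, 5.0, 6.2, 6.9, 8.4),
                x1 = c(1, 2, 3, 4, NA, 6, 7, 8),
                row.names = paste0("unit", 1:8))

# Same call, no na.action argument: the result depends on the global option.
old <- options(na.action = "na.exclude")
fit_a <- lm(y ~ x1, data = d)
options(old)   # back to the previous setting (assumed to be the default "na.omit")
fit_b <- lm(y ~ x1, data = d)

length(residuals(fit_a))   # 8: padded with NA, as under na.exclude
length(residuals(fit_b))   # 6: incomplete rows dropped, as under na.omit
```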

Other opinions?