Basic question: why does a scatter plot of a variable against itself work like this?

Interestingly, fitting an LM with x on both sides gives a warning and
then drops x from the RHS, leaving you with just an intercept:

 > lm(x~x,data=d)
Call:
lm(formula = x ~ x, data = d)

Coefficients:
(Intercept)
          4

Warning messages:
1: In model.matrix.default(mt, mf, contrasts) :
  the response appeared on the right-hand side and was dropped
2: In model.matrix.default(mt, mf, contrasts) :
  problem with term 1 in model.matrix: no columns are assigned

But there's no numerical problem fitting a line through the points when
the same values go in under a different name:

 > d$xx=d$x
 > lm(x~xx,data=d)

Call:
lm(formula = x ~ xx, data = d)

Coefficients:
(Intercept)           xx
  5.128e-16    1.000e+00

It seems to be R saying "Ummm did you really mean to do this? It's kinda dumb".

I suppose this could occur if you had a nested loop over all columns
in a data frame, fitting an LM with every pair of columns, and didn't
skip the case i==j.

Except of course it doesn't:

 - fit with two index variables, both set to one:
i=1;j=1
lm(d[,i]~d[,j])
Call:
lm(formula = d[, i] ~ d[, j])

Coefficients:
(Intercept)       d[, j]
  5.128e-16    1.000e+00

 - fit with two literal ones:
lm(d[,1]~d[,1])
Call:
lm(formula = d[, 1] ~ d[, 1])

Coefficients:
(Intercept)
          4

Warning messages:
1: In model.matrix.default(mt, mf, contrasts) :
  the response appeared on the right-hand side and was dropped
2: In model.matrix.default(mt, mf, contrasts) :
  problem with term 1 in model.matrix: no columns are assigned

Obviously this can all be explained in terms of R's (or lm's, or
model.matrix's) evaluation schemes, but it seems far from intuitive.
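
A minimal sketch of where the difference seems to come from. It assumes d
is a one-column data frame holding 1, 2, 9; d is never shown above, so that
choice is only a guess that matches the intercept of 4. The two sides of a
formula are stored unevaluated, and the output above is consistent with
them being compared as expressions rather than by value:

```r
# Assumed data: not shown in the thread, chosen to match the intercept of 4.
d <- data.frame(x = c(1, 2, 9))
i <- 1; j <- 1

f_idx <- d[, i] ~ d[, j]   # same values, different expressions
f_lit <- d[, 1] ~ d[, 1]   # same values, same expression

# Compare the unevaluated left- and right-hand sides as text.
same_idx <- identical(deparse(f_idx[[2]]), deparse(f_idx[[3]]))
same_lit <- identical(deparse(f_lit[[2]]), deparse(f_lit[[3]]))
print(c(indexed = same_idx, literal = same_lit))

# The values themselves are identical in both cases.
print(identical(d[, i], d[, j]))
```

So in the loop case the response and the predictor are different
expressions that happen to evaluate to the same column, which would explain
why nothing gets dropped there.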

Barry

It probably happens because plot(formula) makes one call to terms(formula)
to analyze the formula.  terms() says there is one variable in the formula,
the response, so plot(x~x) is the same as plot(seq_along(x), x).
If you give it plot(~x), terms() also says there is one variable, but
no response, so you get the same plot as plot(x, rep(1,length(x))).
This is also the reason that plot(y1+y2 ~ x1+x2) makes one plot of the sum
of y1 and y2 for each term on the right side instead of 4 plots:
plot(x1,y1), plot(x1,y2), plot(x2,y1), and plot(x2,y2).
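
A quick way to look at what terms() reports for these formulas; this only
inspects the terms object and is not a trace of what plot() does
internally:

```r
# terms() does not evaluate the formula, so x need not exist here.
tt_both <- terms(x ~ x)
tt_rhs  <- terms(~ x)

print(attr(tt_both, "variables"))  # the variables terms() found
print(attr(tt_both, "response"))   # 1: the first variable is the response
print(attr(tt_rhs, "response"))    # 0: no response
print(all.vars(x ~ x))             # "x", listed once
```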

One could write a plot function that called terms separately on the left and
right sides of the formula.
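
A rough sketch of that idea, under some simplifying assumptions:
plot_sides is a made-up name, not an existing function, and instead of
calling terms() on each side it simply evaluates each side of the formula
as a single expression. That means x1+x2 on the right would be plotted as
a sum, not expanded into separate plots.

```r
# Hypothetical helper, not part of base R: evaluate each side of a
# two-sided formula on its own, so x ~ x gives a true x-versus-x plot.
plot_sides <- function(formula, data = NULL, ...) {
  stopifnot(inherits(formula, "formula"), length(formula) == 3L)
  lhs <- formula[[2L]]             # unevaluated left-hand side
  rhs <- formula[[3L]]             # unevaluated right-hand side
  env <- environment(formula)
  if (is.null(data)) data <- env   # no data given: use the formula's environment
  y <- eval(lhs, data, env)
  x <- eval(rhs, data, env)
  plot(x, y, xlab = deparse(rhs), ylab = deparse(lhs), ...)
  invisible(list(x = x, y = y))    # return the plotted coordinates
}

x <- c(1, 2, 9)
pts <- plot_sides(x ~ x)           # points fall on the diagonal
```

With this, plot_sides(x ~ x) should draw the same points as plot(x ~ I(x))
from the original post.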

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
Of Tal Galili
Sent: Wednesday, November 06, 2013 8:40 AM
To: r-help at r-project.org
Subject: [R] Basic question: why does a scatter plot of a variable against itself works like
this?

Hello all,

I just noticed the following behavior of plot:
x <- c(1,2,9)
plot(x ~ x) # this is just like doing:
plot(x)
# when maybe we would like it to give this:
plot(x ~ c(x))
# the same as:
plot(x ~ I(x))

I was wondering if there is some reason for this behavior.

Thanks,
Tal
