The problem or actual R implementation relies on an assumption
that median(x[i] | x[i] <= quantile(x, 1/3)) == quantile(x, 1/6)
which reveals not to be true despite very trustful appearance.
If we continue with the example of x=y=1:9
then quantile(x, 1/6)=2.5 (here quantile() is taken in C-code sens, not R's one)
while median(y[i] | x[i] <= quantile(x, 1/3))=2
On the other sample's side we've got 7.5 and 8 for x and y respectively.
Hence the slope (8-2)/(7.5-2.5)=1.2
To get a correct version of this, one should calculate x robust points in the same way as the y's,
i.e. xb=median(x[i] | x[i] <= quantile(x, 1/3)) and xt=median(x[i] | x[i] >= quantile(x, 2/3))
Best,
Serguei.
Le 29/05/2017 ? 10:02, peter dalgaard a ?crit :
line(1:9+rnorm(9,,1e-1),1:9+rnorm(9,,1e-1))
Call:
line(1:9 + rnorm(9, , 0.1), 1:9 + rnorm(9, , 0.1))
Coefficients:
[1] -0.9407 1.1948
I.e., it is not likely an issue with exact integers or perfect fit.
-pd
On 29 May 2017, at 07:21 , GlenB <glnbrntt at gmail.com> wrote:
Tukey divides the points into three groups, not the x and y values
I'll try to get hold of the book for a direct quote, might take a couple
of days.
Ah well, I can't get it for a week. But the fact that it's often called
Tukey's three group line (try a search on *tukey three group line* and
you'll get plenty of hits) is pretty much a giveaway.
On Mon, May 29, 2017 at 2:19 PM, GlenB <glnbrntt at gmail.com> wrote:
Tukey divides the points into three groups, not the x and y values
separately.
I'll try to get hold of the book for a direct quote, might take a couple
of days.
On Mon, May 29, 2017 at 8:40 AM, Duncan Murdoch <murdoch.duncan at gmail.com>
wrote:
On 27/05/2017 9:28 PM, GlenB wrote:
Bug: stats::line() does not produce correct Tukey line when n mod 6 is 2
or
3
Example: line(1:9,1:9) should have intercept 0 and slope 1 but it gives
intercept -1 and slope 1.2
Trying line(1:i,1:i) across a range of i makes it clear there's a cycle
of
length 6, with four of every six correct.
Bug has been present across many versions.
The machine I just tried it on just now has R3.2.3:
If you look at the source (in src/library/stats/src/line.c), the
explanation is clear: the x value is chosen as the 1/6 quantile (according
to a particular definition of quantile), and the y value is chosen as the
median of the y values where x is less than or equal to the 1/3 quantile.
Those are different definitions (though I think they would be
asymptotically equivalent under pretty weak assumptions), so it's not
surprising the x value doesn't correspond perfectly to the y value, and the
line ends up "wrong".
So is it a bug? Well, that depends on Tukey's definition. I don't have
a copy of his book handy so I can't really say. Maybe the R function is
doing exactly what Tukey said it should, and that's not a bug. Or maybe R
is wrong.
Duncan Murdoch
[[alternative HTML version deleted]]