
[Package car/data.ellipse]: confidence intervals off by factor sqrt(2)??? (PR#2584)

6 messages · John Fox, volker.franz@tuebingen.mpg.de, Deepayan Sarkar

#
Full_Name: Volker Franz
Version: Version 1.6.2  (2003-01-10)
OS: Debian
Submission from: (NULL) (192.124.28.104)


Hi there, 

it seems to me that data.ellipse of package "car" (Version 1.0-1)
produces confidence intervals that are too big. To see this, do:

library(car)
plot(c(-2,2),c(-2,2),pch=0)
data.ellipse(rnorm(10000),rnorm(10000),levels=0.68,plot.points=F)
abline(v=+1)
abline(v=-1)
abline(h=+1)
abline(h=-1)

To my knowledge, this should result in a circle with radius
1. However, the circle is larger. It seems that the problem is due to
an erroneous specification of the degrees of freedom and can be fixed
with the following patch:

======================================================================
--- Ellipse.orig	Thu Sep 19 18:20:41 2002
+++ Ellipse.R	Wed Feb 26 17:49:43 2003
@@ -34,7 +34,7 @@
         stop("x and y must be vectors of the same length")
     if (plot.points & !add) plot(x, y, xlab=xlab, ylab=ylab, col=col, pch=pch, las=las, ...)
     if (plot.points & add) points(x, y, col=col, pch=pch, ...)
-    dfn<-2
+    dfn<-1
     dfd<-length(x)-1
     if (robust) {
         require(MASS)
======================================================================

Or --- am I totally on the wrong track here?

Best 
Volker
#
Dear Volker,

If the data ellipse (or, in this case, circle) is scaled so that its 
shadows (projections) on the axes each include 68% of the data (that is, of 
the marginal distribution of each variable), then the ellipse will include 
less than 68% of the data (i.e., of the joint distribution of the two 
variables). Conversely, to include 68% of the data in the ellipse, the 
shadows of the ellipse have to be larger.
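John's point can be made concrete for two independent N(0, 1) variables, where X^2 + Y^2 follows a chi-squared distribution with 2 df. The R sketch below is an illustration added here, not code from car; `joint_coverage` and `marginal_coverage` are hypothetical helper names. It compares the circle whose shadow holds 68% of each marginal distribution with the circle that holds 68% of the joint distribution. The first should contain only about 39% of the joint distribution, and the second casts a shadow holding about 87% of each marginal distribution.

```r
# Assumes X, Y independent N(0, 1), so X^2 + Y^2 ~ chi-squared with 2 df.
# Share of the joint distribution inside a circle of radius r at the origin:
joint_coverage    <- function(r) pchisq(r^2, df = 2)
# Share of one marginal distribution inside the circle's shadow [-r, r]:
marginal_coverage <- function(r) 2 * pnorm(r) - 1

# Circle whose shadow holds 68% of each marginal distribution:
r_marg  <- qnorm((1 + 0.68) / 2)
# Circle that holds 68% of the joint distribution:
r_joint <- sqrt(qchisq(0.68, df = 2))

round(rbind(
  shadow_68 = c(radius = r_marg,  joint = joint_coverage(r_marg),
                marginal = marginal_coverage(r_marg)),
  joint_68  = c(radius = r_joint, joint = joint_coverage(r_joint),
                marginal = marginal_coverage(r_joint))
), 3)
```

The two rows show the trade-off: fixing the shadow at 68% shrinks the joint coverage, and fixing the joint coverage at 68% widens the shadow.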

Did I understand your point correctly?

John
At 09:40 PM 2/26/2003 +0100, volker.franz@tuebingen.mpg.de wrote:
____________________________
John Fox
Department of Sociology
McMaster University
email: jfox@mcmaster.ca
web: http://www.socsci.mcmaster.ca/jfox
#
Hi John,
    JF> Dear Volker, If the data ellipse (or, in this case, circle) is
    JF> scaled so that its shadows (projections) on the axes each
    JF> includes 68% of the data (that is of the marginal distribution
    JF> of each variable), then the ellipse will include less than 68%
    JF> of the data (i.e., of the joint distribution of the two
    JF> variables). Conversely, to include 68% of the data in the
    JF> ellipse, the shadows of the ellipse have to be larger.
    JF> Did I understand your point correctly?

I am not sure. I will try to rephrase my initial request:

Let X be a one-dimensional random variable (standard normal
distribution; mean=0; std=1). The 68% confidence interval of X will
be approximately [-1,1]. Now, if I combine X with a stochastically
independent second random variable Y, the marginal distribution of X
should not change. Therefore, the projection of the error ellipse on
the X-axis should still be [-1,1].

If I do this with the function data.ellipse: 

   data.ellipse(rnorm(10000),rnorm(10000),levels=0.68,plot.points=F)

I get a projection on the X-axis which is larger than [-1,1]. In fact,
it is a little bit larger than [-sqrt(2),+sqrt(2)].

My interpretation is that this is due to the construction of the
radius in data.ellipse:

   dfn<-2
   radius <- sqrt(dfn * qf(level, dfn, dfd))

I would expect dfn<-1 here (so that the radius would correspond to
the t-distribution).
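A quick numerical check of the radius formula quoted above, as a sketch added for illustration: `ellipse_radius` is a hypothetical helper, not a function in car, and dfd = n - 1 with n = 10000 is taken from the example in the report. For large dfd, dfn * qf(level, dfn, dfd) approaches qchisq(level, dfn), so dfn = 1 should give a radius near qnorm(0.84), about 0.99, while dfn = 2 should give about 1.51. That is slightly more than sqrt(2), which matches the projection reported above.

```r
# Hypothetical helper (not part of car): the radius formula quoted above,
# sqrt(dfn * qf(level, dfn, dfd)), written as a function of dfn.
ellipse_radius <- function(level, dfn, dfd) {
  sqrt(dfn * qf(level, dfn, dfd))
}

n     <- 10000   # sample size used in the bug report
level <- 0.68

r1 <- ellipse_radius(level, dfn = 1, dfd = n - 1)  # 1-D (t-like) radius
r2 <- ellipse_radius(level, dfn = 2, dfd = n - 1)  # 2-D radius, as in car

# For large dfd, dfn * F(dfn, dfd) is approximately chi-squared with dfn df,
# so r1 and r2 should be close to these large-sample limits.
c(dfn1 = r1, normal_limit = qnorm((1 + level) / 2),
  dfn2 = r2, chisq_limit  = sqrt(qchisq(level, df = 2)),
  sqrt2 = sqrt(2))
```

The two radii answer different questions: in the large-sample limit, dfn = 1 targets 68% of one marginal distribution, and dfn = 2 targets 68% of the joint distribution.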

Does this make sense?

Volker
#
On Wednesday 26 February 2003 04:23 pm, Volker Franz wrote:
Why so? Let Y be an independent copy of X (i.e., Y ~ N(0,1) too, independent 
of X). Then P(Y is in [-Inf, Inf]) = 1. Now, think of the 2-D confidence 
region [-1, 1] x [-Inf, Inf]. This will have (by independence of X and Y) 
probability 0.68.

Now, how can you expect an ellipse with the same X-range, which is a 
strict subset of this region, to still have joint probability 0.68?
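Deepayan's argument can also be checked by simulation. A minimal sketch in R, assuming the same setup as the original report (10000 independent N(0, 1) pairs; the seed is arbitrary and only makes the run reproducible): about 68% of the points should fall in the strip [-1, 1] x [-Inf, Inf], but only about 39% in the unit disk, which has the same X-range and lies entirely inside the strip.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n <- 10000
x <- rnorm(n); y <- rnorm(n)      # independent N(0, 1) samples, as in the report

in_strip <- abs(x) <= 1           # region [-1, 1] x [-Inf, Inf]
in_disk  <- x^2 + y^2 <= 1        # unit disk: same X-range, strict subset of strip

c(strip = mean(in_strip),
  disk  = mean(in_disk),
  disk_outside_strip = sum(in_disk & !in_strip))  # always 0 by construction
```

Every point in the disk is also in the strip, so the disk can never hold more probability than the strip's 0.68.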

Hope that helps,

Deepayan
#
Dear Volker,
At 11:23 PM 2/26/2003 +0100, Volker Franz wrote:
This is a data ellipse, not a confidence ellipse, but the same point arises 
in both cases: For the ellipse to enclose 68% of the joint 
distribution of the two variables, its projections on the axes must include 
more than 68% of each marginal distribution. Just think about projecting 
the individual points onto the axes -- there are points outside of the 
ellipse that are inside its shadow on an individual axis.

I hope that this helps,
  John

#
Hi John and Deepayan, 

OK, I see your points and agree. You are right --- and I am sorry for
sending this report too hastily.

Thank you for the help!!!
Volker
--