an interesting qqnorm question

If I understand your problem, you are computing the difference between
your data and the quantiles of a standard gaussian variable -- in
other words, the difference between the data and the red line, in the
following picture.

  N <- 100  # Sample size
  m <- 1    # Mean
  s <- 2    # dispersion
  x <- m + s * rt(N, df=2)  # Non-gaussian data

  qqnorm(x)
  abline(0,1, col="red") 

The differences are then

  y <- sort(x) - qnorm(ppoints(N))
  hist(y)

This is probably not the right line: the intercept is wrong (your mean
is 1, not 0) and so is the slope (if the data were gaussian, you could
estimate it with the standard deviation).
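
To illustrate the remark about the slope, here is a small sketch with
gaussian data this time (the seed and the names q, y_naive and y_fit
are mine, chosen only for the example). The line with intercept
mean(x) and slope sd(x) follows the points, the line y = x does not.

```r
set.seed(1)              # arbitrary seed, only for reproducibility
N <- 100
x <- 1 + 2 * rnorm(N)    # gaussian data, mean 1, standard deviation 2

q <- qnorm(ppoints(N))   # theoretical quantiles used by qqnorm
y_naive <- sort(x) - q                       # differences from the line y = x
y_fit   <- sort(x) - (mean(x) + sd(x) * q)   # differences from the fitted line

qqnorm(x)
abline(0, 1, col="red")                  # the line y = x
abline(mean(x), sd(x), col="darkgreen")  # intercept = mean, slope = sd

# Spread of the two sets of differences
c(naive = sd(y_naive), fitted = sd(y_fit))
```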

You can use the "qqline" function to get the line passing through the
first and third quartiles, which is probably closer to what you have
in mind.

  qqnorm(x)
  abline(0,1, col="red") 
  qqline(x, col="blue")

The differences from that line are

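  x1 <- quantile(x, .25)   # First quartile of the data
  x2 <- quantile(x, .75)   # Third quartile of the data
  b <- (x2-x1) / (qnorm(.75)-qnorm(.25))  # Slope of the line
  a <- x1 - b * qnorm(.25)                # Intercept of the line
  y <- sort(x) - (a + b * qnorm(ppoints(N)))
  hist(y)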

And you want to know when the differences cease to be "significantly"
different from zero.

  plot(y)
  abline(h=0, lty=3)

You can use the plot to fix a threshold, but unless you have a model
describing how non-gaussian your data are, this will be empirical.
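
One way to make the threshold less arbitrary is to simulate what the
differences look like when the data really are gaussian. This is only
a sketch under an assumption of mine: the null model is gaussian, with
a scale matched to the slope of the quartile line. The helper qq_diff,
the number of simulations B and the 95% level are all hypothetical
choices, not something from your problem.

```r
set.seed(2)                  # arbitrary seed, only for reproducibility
N <- 100
x <- 1 + 2 * rt(N, df=2)     # Non-gaussian data, as above

# Differences from the line through the first and third quartiles
qq_diff <- function(x) {
  q  <- qnorm(ppoints(length(x)))
  x1 <- quantile(x, .25, names=FALSE)
  x2 <- quantile(x, .75, names=FALSE)
  b  <- (x2 - x1) / (qnorm(.75) - qnorm(.25))
  a  <- x1 - b * qnorm(.25)
  sort(x) - (a + b * q)
}
y <- qq_diff(x)

# Assumed null model: gaussian data with the same quartile-based scale.
# qq_diff does not depend on the location, so only the scale matters.
b <- IQR(x) / (qnorm(.75) - qnorm(.25))
B <- 1000
sim <- b * replicate(B, qq_diff(rnorm(N)))   # N rows, B columns

# Pointwise 95% band for each ordered observation
lo <- apply(sim, 1, quantile, probs=.025)
hi <- apply(sim, 1, quantile, probs=.975)

plot(y)
abline(h=0, lty=3)
lines(lo, lty=2)
lines(hi, lty=2)

outside <- y < lo | y > hi   # points outside the simulated band
which(outside)
```

The band is pointwise, so a few points can fall outside it by chance
even for gaussian data; it is a guide for the eye, not a formal test.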

You will note that, in those simulations, the differences (either
yours or those from the line through the first and third quartiles)
are not gaussian at all.
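
A quick way to see this is to draw a quantile-quantile plot of the
differences themselves. The sketch below recomputes both sets of
differences (y1 and y2 are names I made up for the example) and plots
them side by side.

```r
set.seed(3)                  # arbitrary seed, only for reproducibility
N <- 100
x <- 1 + 2 * rt(N, df=2)     # Non-gaussian data, as above
q <- qnorm(ppoints(N))

# Differences from the line y = x
y1 <- sort(x) - q

# Differences from the line through the first and third quartiles
x1 <- quantile(x, .25, names=FALSE)
x2 <- quantile(x, .75, names=FALSE)
b  <- (x2 - x1) / (qnorm(.75) - qnorm(.25))
a  <- x1 - b * qnorm(.25)
y2 <- sort(x) - (a + b * q)

op <- par(mfrow=c(1,2))
qqnorm(y1, main="Differences from y = x")
qqline(y1)
qqnorm(y2, main="Differences from qqline")
qqline(y2)
par(op)
```

Since the differences come from sorted data, they are not independent,
so I would read these plots as a description rather than as a test.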

-- Vincent
