an interesting qqnorm question
If I understand your problem, you are computing the difference between your data and the quantiles of a standard gaussian variable; in other words, the difference between the data and the red line in the following picture.

    N <- 100                    # Sample size
    m <- 1                      # Mean
    s <- 2                      # Dispersion
    x <- m + s * rt(N, df=2)    # Non-gaussian data
    qqnorm(x)
    abline(0,1, col="red")

And you get

    y <- sort(x) - qnorm(ppoints(N))
    hist(y)

This is probably not the right line: not only is the intercept wrong because your mean is 1, the slope is wrong as well (if the data were gaussian, you could estimate it with the standard deviation). You can use the "qqline" function to get the line passing through the first and third quartiles, which is probably closer to what you have in mind.

    qqnorm(x)
    abline(0,1, col="red")
    qqline(x, col="blue")

The differences are

    x1 <- quantile(x, .25)
    x2 <- quantile(x, .75)
    b <- (x2-x1) / (qnorm(.75)-qnorm(.25))
    a <- x1 - b * qnorm(.25)
    y <- sort(x) - (a + b * qnorm(ppoints(N)))
    hist(y)

And you want to know when the differences cease to be "significantly" different from zero.

    plot(y)
    abline(h=0, lty=3)

You can use the plot to fix a threshold, but unless you have a model describing how non-gaussian your data are, this will be empirical.

You will note that, in those simulations, the differences (either yours or those from the line through the first and third quartiles) are not gaussian at all.

-- Vincent
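One way the empirical threshold could be chosen is sketched below; it is an illustration, not a recommended test. It assumes the same kind of simulated data as in the reply, takes the residuals from the line through the first and third quartiles, and walks outwards from the middle of the sample until a residual first exceeds a tolerance. The tolerance (twice the largest absolute residual in the central half) and the names `tol`, `lo`, `hi` and `group` are hypothetical choices made for this example only.

```r
set.seed(1)                         # arbitrary seed, for reproducibility
N <- 100
x <- 1 + 2 * rt(N, df = 2)          # simulated non-gaussian data, as in the reply

xs <- sort(x)
q  <- qnorm(ppoints(N))             # theoretical gaussian quantiles

# Line through the first and third quartiles (the line drawn by qqline)
x1 <- unname(quantile(x, .25))
x2 <- unname(quantile(x, .75))
b  <- (x2 - x1) / (qnorm(.75) - qnorm(.25))
a  <- x1 - b * qnorm(.25)
y  <- xs - (a + b * q)              # residuals from that line

# Hypothetical tolerance: twice the largest residual seen in the central half
central <- q > qnorm(.25) & q < qnorm(.75)
tol <- 2 * max(abs(y[central]))
ok  <- abs(y) <= tol

# Walk outwards from the middle until the tolerance is first exceeded
mid <- N %/% 2
lo <- mid
while (lo > 1 && ok[lo - 1]) lo <- lo - 1
hi <- mid
while (hi < N && ok[hi + 1]) hi <- hi + 1

# Label the sorted observations: two tail groups and one main group
idx <- seq_len(N)
group <- factor(ifelse(idx < lo, "lower tail",
                ifelse(idx > hi, "upper tail", "main")),
                levels = c("lower tail", "main", "upper tail"))
table(group)
c(lower = xs[lo], upper = xs[hi])   # empirical cut points on the data scale
```

Calling plot(q, xs, col = as.integer(group)) afterwards colours the QQ plot by group, which makes it easy to see whether the chosen tolerance is too strict or too loose for the data at hand.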
On 4/22/05, WeiWei Shi <helprhelp at gmail.com> wrote:
I hope it is not because of some central limit theorem, otherwise my initial plan will fail :)

On 4/22/05, WeiWei Shi <helprhelp at gmail.com> wrote:
Hi, r-gurus:
I happened to have a question in my work:
I have a dataset, which has only one dimension, like
0.99037297527605
0.991179836732708
0.995635340631367
0.997186769599305
0.991632565640424
0.984047197106486
0.99225943762649
1.00555642128421
0.993725402926564
....
the data is saved in a file called f392.txt.
I used the following code to play around :)
k <- read.table("f392.txt", header=FALSE)  # read into k
kk <- k[[1]]
l <- qqnorm(kk)
# save the difference between the sample and the theoretical quantiles
# (remember, my sample mean is around 1 while the theoretical one is 0)
diff <- l$y - l$x
hist(diff, breaks=300)  # analyze the distribution of these differences
qqnorm(diff)
my question is:
from l<-qqnorm(kk), I wanted to know from which point (or cut) the
sample points start to move away from the theoretical ones. That's the
reason I played around with the "diff" list, which gives me the difference.
To my surprise, the diff is perfectly normal. I tried some other
vectors, like kk<-c(1, 2, 5, -1 , ...), as a test, and I concluded that
this finding must come from the distribution my sample follows.
So, any suggestion on the distribution of my sample? I think there
might be some mathematical argument which leads to this observation,
but I am not quite sure.
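A possible explanation, offered as a sketch rather than a confirmed diagnosis: the sample varies on a scale of about 0.01 around 1, while the standard normal quantiles range over several units, so l$y - l$x is dominated by -l$x, which is gaussian by construction. The code below illustrates this on simulated data, since f392.txt is not available here; the location, scale and degrees of freedom are rounded from the fitdistr fit reported in this message, and the sample size and seed are arbitrary assumptions.

```r
set.seed(1)                            # arbitrary seed, for reproducibility
n  <- 1000                             # assumed sample size
kk <- 1 + 0.0076 * rt(n, df = 3.74)    # simulated stand-in for the real data

l <- qqnorm(kk, plot.it = FALSE)       # x: theoretical, y: sample quantiles
d <- l$y - l$x                         # same quantity as "diff" in the post

# The sample quantiles barely vary compared with the theoretical ones,
# so d is essentially a constant minus the theoretical quantiles.
sd(l$y)
sd(l$x)
cor(d, -l$x)
```

If this is what happens with the real data, the normality of the differences says very little about the sample itself; rescaling first, or taking residuals from the line drawn by qqline, would give differences that actually reflect the shape of the data.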
btw,
fitdistr(kk, 't')
        m              s              df
  9.999965e-01   7.630770e-03   3.742244e+00
 (5.317674e-05) (5.373884e-05) (8.584725e-02)

btw2, can anyone suggest a way to find the "cut" or "threshold" in my sample, to discretize it into 3 groups: two tail groups and one main group? This is my main focus.

Thanks,
Ed
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html