Sampling in R

4 messages · Seyit Ali Kayis, Uwe Ligges, PIKAL Petr +1 more

#
Seyit Ali Kayis wrote:
Well, if you are permuting Xvar and Yvar separately or sorting them 
(separately), then you cannot expect to get the same correlation again. 
Look at the formula and make an example for yourself with just, say, 3 
data points ...
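
A three-point example along these lines might look like the sketch below. The values of `x` and `y` are made up for illustration and are not taken from the posted data; the point is only that reordering one variable independently of the other changes which values are paired, and therefore changes the correlation.

```r
# Toy data (hypothetical values, not from the thread)
x <- c(1, 2, 3)
y <- c(1, 3, 2)

r_original <- cor(x, y)               # pairs: (1,1), (2,3), (3,2)
r_sorted   <- cor(sort(x), sort(y))   # pairs: (1,1), (2,2), (3,3)
r_shuffled <- cor(x, y[c(2, 1, 3)])   # pairs: (1,3), (2,1), (3,2)

print(c(original = r_original, sorted = r_sorted, shuffled = r_shuffled))
# original 0.5, sorted 1, shuffled -0.5
```

Sorting both vectors forces a perfect monotone pairing, which is why the sorted correlation is 1 regardless of the original relationship.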

Uwe Ligges
#
Hi

r-help-bounces at r-project.org wrote on 21.04.2009 12:25:01:
Xvar<-c(0.1818182,0.5384615,0.5535714,0.4680851,0.4545455,0.4385965,0.5185185,
1914894,0.1489362,0.1363636,0.2244898,0.2325581,0.1333333,0.1818182,0.1702128,
0.4545455,0.4310345,0.4237288,0.4814815,0.4912281,0.4333333,0.4,0.4285714,
0.3809524,0.5272727,0.4814815,0.5254237,0.627451,0.5,0.5471698,0.5454545,
5098039,0.4385965,0.5283019,0.5471698,0.625,0.4310345,0.4912281,0.5283019,
Yvar<-c(0.2553191,0.4107143,0.5660377,0.3888889,0.3606557,0.2898551,0.3818182,
2258065,0.2321429,0.2,0.2264151,0.22,0.2115385,0.2459016,0.1166667,0.1785714,
6181818,0.4827586,0.3709677,0.3965517,0.4821429,0.4545455,0.359375,0.4576271,
0.3636364,0.3823529,0.2816901,0.4722222,0.5,0.3521127,0.4393939,0.3787879,
4067797,0.3666667,0.3928571,0.4285714,0.5,0.2923077,0.4561404,0.45,0.5538462,
4571429,0.4,0.3846154,0.3870968,0.4915254,0.530303,0.4375,0.4918033,0.4179104,
3666667,0.4,0.4477612,0.2571429,0.4032258,0.3382353,0.4814815,0.4090909,0.3548387,
AFAICU, you are not sampling your data; you are shuffling them. You then 
compute cor on the shuffled data (X and Y are shuffled independently), which 
results in a low correlation (it is like shuffling cards).
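
The difference between shuffling the two variables independently and reordering them with one shared index vector can be sketched as follows. The data here are simulated (an assumption for illustration, since the posted vectors are incomplete), and the names `x`, `y`, `r_pairs`, and `r_indep` are hypothetical.

```r
set.seed(1)                       # for reproducibility of the sketch
n <- 50
x <- rnorm(n)                     # simulated data, not the thread's Xvar
y <- x + rnorm(n, sd = 0.5)       # y is strongly related to x

# One shared permutation of the row indices keeps every (x, y) pair intact,
# so the correlation is unchanged.
ind <- sample(n)
r_pairs <- cor(x[ind], y[ind])

# Shuffling each variable on its own breaks the pairing,
# so the correlation is typically close to zero.
r_indep <- cor(sample(x), sample(y))

print(c(original = cor(x, y), same_index = r_pairs, independent = r_indep))
```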

Maybe you could use a smaller size and sample not the original data but a 
vector of indices:

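nperm <- 49999                      # number of resamples
perm.cor <- rep(NA_real_, nperm)    # preallocate the results

for (iperm in 1:nperm) {
  # Draw the indices once so that each X value stays paired with its Y value.
  # size must not exceed length(Xvar) when replace = FALSE.
  ind <- sample(1:length(Xvar), size = 100, replace = FALSE)
  perm.cor[iperm] <- cor(Xvar[ind], Yvar[ind])
}
max(perm.cor)
hist(perm.cor)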

The result seems to be quite reasonable.

Regards
Petr
#
When you shuffle the observations independently, you are performing a
permutation test (though for this you only need to shuffle one side of
the pairs). When you sort the observations, you are doing something
ridiculous that has no statistical meaning that I know of.
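
A permutation test of this kind might be sketched as below, shuffling only one side of the pairs. The function name `perm_cor_test`, its arguments, and the simulated data are assumptions for illustration; the two-sided p-value with the "+1" correction is one common convention, not necessarily what the original poster intended.

```r
# Permutation test for a correlation: shuffle y only, keep x fixed.
perm_cor_test <- function(x, y, nperm = 9999) {
  r_obs  <- cor(x, y)
  # Each replicate breaks the pairing by permuting y.
  r_perm <- replicate(nperm, cor(x, sample(y)))
  # Two-sided p-value; the +1 counts the observed arrangement itself.
  p <- (sum(abs(r_perm) >= abs(r_obs)) + 1) / (nperm + 1)
  list(r_obs = r_obs, r_perm = r_perm, p_value = p)
}

set.seed(42)
x <- rnorm(30)                    # simulated data, not from the thread
y <- 0.6 * x + rnorm(30)
res <- perm_cor_test(x, y, nperm = 999)
print(res$r_obs)
print(res$p_value)
```

The histogram of `res$r_perm` is centered near zero, which matches the low correlations seen when the variables are shuffled independently.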

I'm not very familiar with bootstrap CIs, but I think the idea is to
sample the PAIRS of data WITH replacement:
http://lmgtfy.com/?q=bootstrap+correlation

(first link is to a good overview by David Howell)
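
Resampling pairs with replacement could look roughly like the sketch below, which computes a simple percentile interval. The function name `boot_cor_ci`, its defaults, and the simulated data are hypothetical; other bootstrap intervals (for example BCa) exist and may be preferable, so treat this as one possible approach rather than the recommended one.

```r
# Percentile bootstrap CI for a correlation, resampling (x, y) pairs.
boot_cor_ci <- function(x, y, nboot = 2000, level = 0.95) {
  n <- length(x)
  r_boot <- replicate(nboot, {
    # The same indices are used for x and y, so pairs stay together.
    ind <- sample(n, size = n, replace = TRUE)
    cor(x[ind], y[ind])
  })
  alpha <- 1 - level
  # na.rm guards against a degenerate resample with zero variance.
  quantile(r_boot, probs = c(alpha / 2, 1 - alpha / 2),
           names = FALSE, na.rm = TRUE)
}

set.seed(123)
x <- rnorm(40)                    # simulated data, not from the thread
y <- 0.5 * x + rnorm(40)
ci <- boot_cor_ci(x, y, nboot = 1000)
print(ci)
```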
On Tue, Apr 21, 2009 at 7:25 AM, Seyit Ali Kayis <s_a_kayis at hotmail.com> wrote: