Sampling in R

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090421/7d4ec48e/attachment-0001.pl>
Dear R users, 

I need to do sampling without replacement (bootstraps). I have two variables (Xvar, Yvar). 
I have a correlation from original data set cor(Xvar, Yvar)=0.6174221. I am doing 50000 sampling, 
and in each sampling  calculating correlations, saving, sorting and  getting 95% cutt off point (0.1351877). 
I am getting maximum value as 0.3507219 (much smaller than correlation of my original data). 
I repeated the sampling a  couple of time and none of them produced a correlation 
coefficient higher than my original data set. However, if I sort out my Xvar and Yvar and 
obtain correlation it is 0.9657125 which is much higher than correlation for my original data. 
I am doing sampling in another program and getting at least 1% higher correlation than mine. 
Now I am getting confused with sampling(random data) in R. My data and codes for the scenario above are below

Xvar<-c(0.1818182,0.5384615,0.5535714,0.4680851,0.4545455,0.4385965,0.5185185,0.4035088,0.4901961,0.3650794,0.462963,0.4,0.56,0.3965517,0.4909091,
        0.4716981,0.4310345,0.2,0.1509434,0.2647059,0.173913,0.1914894,0.1914894,0.1489362,0.1363636,0.2244898,0.2325581,0.1333333,0.1818182,0.1702128,
        0.2173913,0.2380952,0.1632653,0.5614035,0.3396226,0.4909091,0.3770492,0.5,0.5185185,0.5,0.4666667,0.4464286,0.362069,0.4285714,0.4561404,
        0.4736842,0.4545455,0.4166667,0.4181818,0.4590164,0.5166667,0.5423729,0.4833333,0.5454545,0.4393939,0.5172414,0.4098361,0.4745763,0.4754098,
        0.5166667,0.5,0.4603175,0.42,0.4038462,0.4897959,0.3148148,0.3673469,0.4,0.4583333,0.3877551,0.4375,0.4117647,0.4313725,0.5333333,0.3962264,
        0.3548387,0.5272727,0.4137931,0.3928571,0.4666667,0.4210526,0.4363636,0.4545455,0.4310345,0.4237288,0.4814815,0.4912281,0.4333333,0.4,0.4285714,
        0.4516129,0.5090909,0.4464286,0.4642857,0.4166667,0.4098361,0.4909091,0.3809524,0.5272727,0.4814815,0.5254237,0.627451,0.5,0.5471698,0.5454545,
        0.5925926,0.5769231,0.5818182,0.4444444,0.4915254,0.4727273,0.4107143,0.4285714,0.4310345,0.4237288,0.4285714,0.440678,0.4237288,0.4807692,
        0.4150943,0.4615385,0.4107143,0.4814815,0.4074074,0.4210526,0.5263158,0.440678,0.4576271,0.5344828,0.5,0.5636364,0.4677419,0.5,0.5192308,
        0.4642857,0.5090909,0.58,0.4482759,0.5098039,0.4035088,0.4210526,0.5098039,0.4385965,0.5283019,0.5471698,0.625,0.4310345,0.4912281,0.5283019,
        0.4576271,0.5471698,0.4745763,0.4821429)

Yvar<-c(0.2553191,0.4107143,0.5660377,0.3888889,0.3606557,0.2898551,0.3818182,0.4,0.4,0.3278689,0.2903226,0.4074074,0.4181818,0.3,0.2238806,0.3728814,
        0.3709677,0.2307692,0.2830189,0.2244898,0.2142857,0.2131148,0.22,0.2258065,0.2321429,0.2,0.2264151,0.22,0.2115385,0.2459016,0.1166667,0.1785714,
        0.2068966,0.6,0.4285714,0.3134328,0.4461538,0.3965517,0.4769231,0.6181818,0.4827586,0.3709677,0.3965517,0.4821429,0.4545455,0.359375,0.4576271,
        0.4516129,0.5272727,0.4603175,0.4,0.4912281,0.5384615,0.5,0.4516129,0.4126984,0.4655172,0.5263158,0.4925373,0.358209,0.4285714,0.4920635,
        0.4482759,0.3235294,0.4,0.4375,0.440678,0.3898305,0.35,0.4528302,0.58,0.4153846,0.3174603,0.5185185,0.3870968,0.2894737,0.3709677,0.369863,
        0.3676471,0.3636364,0.3088235,0.328125,0.4032258,0.4084507,0.3188406,0.3636364,0.3823529,0.2816901,0.4722222,0.5,0.3521127,0.4393939,0.3787879,
        0.453125,0.4324324,0.4057971,0.4545455,0.4492754,0.5,0.4098361,0.4067797,0.3666667,0.3928571,0.4285714,0.5,0.2923077,0.4561404,0.45,0.5538462,
        0.4626866,0.4057971,0.3676471,0.5322581,0.5428571,0.375,0.4411765,0.4571429,0.4,0.3846154,0.3870968,0.4915254,0.530303,0.4375,0.4918033,0.4179104,
        0.4032258,0.3606557,0.5178571,0.4848485,0.390625,0.375,0.4375,0.3666667,0.4,0.4477612,0.2571429,0.4032258,0.3382353,0.4814815,0.4090909,0.3548387,
        0.4821429,0.5,0.557377,0.4333333,0.5454545,0.4590164,0.3943662,0.5076923,0.5,0.3283582,0.3676471,0.559322)

my.cor<-cor(Xvar, Yvar)
print(my.cor)

nperm<-49999
Perm.Cor<-NULL

for (iperm in 1:nperm)  {
XvarNew<-sample(Xvar, size=length(Xvar), replace=FALSE)    
YvarNew<-sample(Yvar, size=length(Yvar), replace=FALSE)     
perm.cor<-cor(XvarNew, YvarNew)
Perm.Cor<-c(Perm.Cor, perm.cor)
                        }
print(max(Perm.Cor))
XvarSorted<-sort(Xvar, decreasing=TRUE)
YvarSorted<-sort(Yvar, decreasing=TRUE)
max.cor<-cor(XvarSorted, YvarSorted)
print(max.cor)
if(mat.cor>0) Perm.Cor.Sorted<-sort(Perm.Cor, decreasing=TRUE)                        
if(mat.cor<0) Perm.Cor.Sorted<-sort(Perm.Cor, decreasing=FALSE)                        
T95<-Perm.Cor.Sorted[(nperm+1)*0.05]    # 95% treshold value
T99<-Perm.Cor.Sorted[(nperm+1)*0.01]    # 99% treshold value

I want to understand where I am making a mistake. Any comment is deeply appreciated.
Well, if you are permuting Xvar and Yvar separately or sorting them 
(separately), then you cannot expect to get the same correlation again. 
Look at the formula and make an example for yourself with just, say, 3 
data points ...

Uwe Ligges
Kind Regards

Seyit Ali

------------------------------------------------------------------------------------------------------------------ 
Dr. Seyit Ali KAYIS
Selcuk University
Faculty of Agriculture
Kampus, Konya, TURKEY

            s_a_kayis at yahoo.com,    s_a_kayis at hotmail.com
Tell: +90 332 223 2830  Mobile: +90 535 587 1139  Fax: +90 332 241 0108

                   Greetings from Konya, TURKEY
                http://www.ziraat.selcuk.edu.tr/skayis/
---------------------------------------------------------------------------------------------------------------------- 

_________________________________________________________________
Earning enough? Find out with SEEK Salary Survey

%2Eco%2Enz%2F%3Ftracking%3Dsk%3Atl%3Asknzsal%3Amsnnz%3A0%3Ahottag%3Aearn%5Fenough&_t=757263783&_r=Seek_NZ_tagline&_m=EXT
	[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi

r-help-bounces at r-project.org napsal dne 21.04.2009 12:25:01:
Dear R users, 

I need to do sampling without replacement (bootstraps). I have two 
variables
(Xvar, Yvar). 
I have a correlation from original data set cor(Xvar, Yvar)=0.6174221. I 
am
doing 50000 sampling, 
and in each sampling  calculating correlations, saving, sorting and 
getting
95% cutt off point (0.1351877). 
I am getting maximum value as 0.3507219 (much smaller than correlation 
of my
original data). 
I repeated the sampling a  couple of time and none of them produced a 
correlation
coefficient higher than my original data set. However, if I sort out my 
Xvar
and Yvar and 
obtain correlation it is 0.9657125 which is much higher than correlation 
for
my original data. 
I am doing sampling in another program and getting at least 1% higher 
correlation than mine. 
Now I am getting confused with sampling(random data) in R. My data and 
codes
for the scenario above are below

Xvar<-c(0.1818182,0.5384615,0.5535714,0.4680851,0.4545455,0.4385965,0.5185185,
0.4035088,0.4901961,0.3650794,0.462963,0.4,0.56,0.3965517,0.4909091,
0.4716981,0.4310345,0.2,0.1509434,0.2647059,0.173913,0.1914894,0.

1914894,0.1489362,0.1363636,0.2244898,0.2325581,0.1333333,0.1818182,0.1702128,
0.2173913,0.2380952,0.1632653,0.5614035,0.3396226,0.4909091,0.3770492,
0.5,0.5185185,0.5,0.4666667,0.4464286,0.362069,0.4285714,0.4561404,
0.4736842,0.4545455,0.4166667,0.4181818,0.4590164,0.5166667,0.5423729,
0.4833333,0.5454545,0.4393939,0.5172414,0.4098361,0.4745763,0.4754098,
0.5166667,0.5,0.4603175,0.42,0.4038462,0.4897959,0.3148148,0.3673469,
0.4,0.4583333,0.3877551,0.4375,0.4117647,0.4313725,0.5333333,0.3962264,
0.3548387,0.5272727,0.4137931,0.3928571,0.4666667,0.4210526,0.4363636,

0.4545455,0.4310345,0.4237288,0.4814815,0.4912281,0.4333333,0.4,0.4285714,
0.4516129,0.5090909,0.4464286,0.4642857,0.4166667,0.4098361,0.4909091,

0.3809524,0.5272727,0.4814815,0.5254237,0.627451,0.5,0.5471698,0.5454545,
0.5925926,0.5769231,0.5818182,0.4444444,0.4915254,0.4727273,0.4107143,
0.4285714,0.4310345,0.4237288,0.4285714,0.440678,0.4237288,0.4807692,
0.4150943,0.4615385,0.4107143,0.4814815,0.4074074,0.4210526,0.5263158,
0.440678,0.4576271,0.5344828,0.5,0.5636364,0.4677419,0.5,0.5192308,
0.4642857,0.5090909,0.58,0.4482759,0.5098039,0.4035088,0.4210526,0.

5098039,0.4385965,0.5283019,0.5471698,0.625,0.4310345,0.4912281,0.5283019,
        0.4576271,0.5471698,0.4745763,0.4821429)

Yvar<-c(0.2553191,0.4107143,0.5660377,0.3888889,0.3606557,0.2898551,0.3818182,
0.4,0.4,0.3278689,0.2903226,0.4074074,0.4181818,0.3,0.2238806,0.3728814,
0.3709677,0.2307692,0.2830189,0.2244898,0.2142857,0.2131148,0.22,0.

2258065,0.2321429,0.2,0.2264151,0.22,0.2115385,0.2459016,0.1166667,0.1785714,
0.2068966,0.6,0.4285714,0.3134328,0.4461538,0.3965517,0.4769231,0.

6181818,0.4827586,0.3709677,0.3965517,0.4821429,0.4545455,0.359375,0.4576271,
0.4516129,0.5272727,0.4603175,0.4,0.4912281,0.5384615,0.5,0.4516129,0.
4126984,0.4655172,0.5263158,0.4925373,0.358209,0.4285714,0.4920635,
0.4482759,0.3235294,0.4,0.4375,0.440678,0.3898305,0.35,0.4528302,0.58,
0.4153846,0.3174603,0.5185185,0.3870968,0.2894737,0.3709677,0.369863,
0.3676471,0.3636364,0.3088235,0.328125,0.4032258,0.4084507,0.3188406,

0.3636364,0.3823529,0.2816901,0.4722222,0.5,0.3521127,0.4393939,0.3787879,
0.453125,0.4324324,0.4057971,0.4545455,0.4492754,0.5,0.4098361,0.

4067797,0.3666667,0.3928571,0.4285714,0.5,0.2923077,0.4561404,0.45,0.5538462,
0.4626866,0.4057971,0.3676471,0.5322581,0.5428571,0.375,0.4411765,0.

4571429,0.4,0.3846154,0.3870968,0.4915254,0.530303,0.4375,0.4918033,0.4179104,
        0.4032258,0.3606557,0.5178571,0.4848485,0.390625,0.375,0.4375,0.

3666667,0.4,0.4477612,0.2571429,0.4032258,0.3382353,0.4814815,0.4090909,0.3548387,
0.4821429,0.5,0.557377,0.4333333,0.5454545,0.4590164,0.3943662,0.
5076923,0.5,0.3283582,0.3676471,0.559322)

my.cor<-cor(Xvar, Yvar)
print(my.cor)

nperm<-49999
Perm.Cor<-NULL

for (iperm in 1:nperm)  {
XvarNew<-sample(Xvar, size=length(Xvar), replace=FALSE) 
YvarNew<-sample(Yvar, size=length(Yvar), replace=FALSE) 
perm.cor<-cor(XvarNew, YvarNew)
Perm.Cor<-c(Perm.Cor, perm.cor)
                        }
AFAICU you do not sample your data you shuffle them. Then you compute cor 
with shuffled data (X and Y are shuffled independently) which results in 
low correlation (it is like shuffling cards).

Maybe you could use smaller size and sample not original data but a vector 
of indices

perm.cor<-rep(NA, 49999)

for (iperm in 1:nperm)  {
ind <- sample(1:length(Xvar), size = 100, replace=FALSE)
perm.cor[iperm] <- cor(Xvar[ind], Yvar[ind])
perm.cor
}
max(perm.cor)
hist(perm.cor)

The result seems to be quite reasonable.

Regards
Petr
print(max(Perm.Cor))
XvarSorted<-sort(Xvar, decreasing=TRUE)
YvarSorted<-sort(Yvar, decreasing=TRUE)
max.cor<-cor(XvarSorted, YvarSorted)
print(max.cor)
if(mat.cor>0) Perm.Cor.Sorted<-sort(Perm.Cor, decreasing=TRUE)   
if(mat.cor<0) Perm.Cor.Sorted<-sort(Perm.Cor, decreasing=FALSE)   
T95<-Perm.Cor.Sorted[(nperm+1)*0.05]    # 95% treshold value
T99<-Perm.Cor.Sorted[(nperm+1)*0.01]    # 99% treshold value

I want to understand where I am making a mistake. Any comment is deeply 
appreciated.
Kind Regards

Seyit Ali

------------------------------------------------------------------------------------------------------------------
Dr. Seyit Ali KAYIS
Selcuk University
Faculty of Agriculture
Kampus, Konya, TURKEY

            s_a_kayis at yahoo.com,    s_a_kayis at hotmail.com
Tell: +90 332 223 2830  Mobile: +90 535 587 1139  Fax: +90 332 241 0108

                   Greetings from Konya, TURKEY
                http://www.ziraat.selcuk.edu.tr/skayis/

----------------------------------------------------------------------------------------------------------------------

_________________________________________________________________
Earning enough? Find out with SEEK Salary Survey

%2Eco%2Enz%2F%3Ftracking%3Dsk%3Atl%3Asknzsal%3Amsnnz%3A0%3Ahottag%3Aearn%
5Fenough&_t=757263783&_r=Seek_NZ_tagline&_m=EXT
   [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
When you shuffle the observations independently, you are performing a
permutation test (though for this you only need to shuffle one side of
the pairs). When you sort the observations you are doing something
ridiculous that has no statistical meaning that I know.

I'm not very familiar with bootstrap CI's, but I think the idea is to
sample the PAIRS of data WITH replacement:
http://lmgtfy.com/?q=bootstrap+correlation

(first link is to a good overview by David Howell)
Dear R users,

I need to do sampling without replacement (bootstraps). I have two variables (Xvar, Yvar).
I have a correlation from original data set cor(Xvar, Yvar)=0.6174221. I am doing 50000 sampling,
and in each sampling ?calculating correlations, saving, sorting and ?getting 95% cutt off point (0.1351877).
I am getting maximum value as 0.3507219 (much smaller than correlation of my original data).
I repeated the sampling a ?couple of time and none of them produced a correlation
coefficient higher than my original data set. However, if I sort out my Xvar and Yvar and
obtain correlation it is 0.9657125 which is much higher than correlation for my original data.
I am doing sampling in another program and getting at least 1% higher correlation than mine.
Now I am getting confused with sampling(random data) in R. My data and codes for the scenario above are below

Xvar<-c(0.1818182,0.5384615,0.5535714,0.4680851,0.4545455,0.4385965,0.5185185,0.4035088,0.4901961,0.3650794,0.462963,0.4,0.56,0.3965517,0.4909091,
? ? ? ?0.4716981,0.4310345,0.2,0.1509434,0.2647059,0.173913,0.1914894,0.1914894,0.1489362,0.1363636,0.2244898,0.2325581,0.1333333,0.1818182,0.1702128,
? ? ? ?0.2173913,0.2380952,0.1632653,0.5614035,0.3396226,0.4909091,0.3770492,0.5,0.5185185,0.5,0.4666667,0.4464286,0.362069,0.4285714,0.4561404,
? ? ? ?0.4736842,0.4545455,0.4166667,0.4181818,0.4590164,0.5166667,0.5423729,0.4833333,0.5454545,0.4393939,0.5172414,0.4098361,0.4745763,0.4754098,
? ? ? ?0.5166667,0.5,0.4603175,0.42,0.4038462,0.4897959,0.3148148,0.3673469,0.4,0.4583333,0.3877551,0.4375,0.4117647,0.4313725,0.5333333,0.3962264,
? ? ? ?0.3548387,0.5272727,0.4137931,0.3928571,0.4666667,0.4210526,0.4363636,0.4545455,0.4310345,0.4237288,0.4814815,0.4912281,0.4333333,0.4,0.4285714,
? ? ? ?0.4516129,0.5090909,0.4464286,0.4642857,0.4166667,0.4098361,0.4909091,0.3809524,0.5272727,0.4814815,0.5254237,0.627451,0.5,0.5471698,0.5454545,
? ? ? ?0.5925926,0.5769231,0.5818182,0.4444444,0.4915254,0.4727273,0.4107143,0.4285714,0.4310345,0.4237288,0.4285714,0.440678,0.4237288,0.4807692,
? ? ? ?0.4150943,0.4615385,0.4107143,0.4814815,0.4074074,0.4210526,0.5263158,0.440678,0.4576271,0.5344828,0.5,0.5636364,0.4677419,0.5,0.5192308,
? ? ? ?0.4642857,0.5090909,0.58,0.4482759,0.5098039,0.4035088,0.4210526,0.5098039,0.4385965,0.5283019,0.5471698,0.625,0.4310345,0.4912281,0.5283019,
? ? ? ?0.4576271,0.5471698,0.4745763,0.4821429)

Yvar<-c(0.2553191,0.4107143,0.5660377,0.3888889,0.3606557,0.2898551,0.3818182,0.4,0.4,0.3278689,0.2903226,0.4074074,0.4181818,0.3,0.2238806,0.3728814,
? ? ? ?0.3709677,0.2307692,0.2830189,0.2244898,0.2142857,0.2131148,0.22,0.2258065,0.2321429,0.2,0.2264151,0.22,0.2115385,0.2459016,0.1166667,0.1785714,
? ? ? ?0.2068966,0.6,0.4285714,0.3134328,0.4461538,0.3965517,0.4769231,0.6181818,0.4827586,0.3709677,0.3965517,0.4821429,0.4545455,0.359375,0.4576271,
? ? ? ?0.4516129,0.5272727,0.4603175,0.4,0.4912281,0.5384615,0.5,0.4516129,0.4126984,0.4655172,0.5263158,0.4925373,0.358209,0.4285714,0.4920635,
? ? ? ?0.4482759,0.3235294,0.4,0.4375,0.440678,0.3898305,0.35,0.4528302,0.58,0.4153846,0.3174603,0.5185185,0.3870968,0.2894737,0.3709677,0.369863,
? ? ? ?0.3676471,0.3636364,0.3088235,0.328125,0.4032258,0.4084507,0.3188406,0.3636364,0.3823529,0.2816901,0.4722222,0.5,0.3521127,0.4393939,0.3787879,
? ? ? ?0.453125,0.4324324,0.4057971,0.4545455,0.4492754,0.5,0.4098361,0.4067797,0.3666667,0.3928571,0.4285714,0.5,0.2923077,0.4561404,0.45,0.5538462,
? ? ? ?0.4626866,0.4057971,0.3676471,0.5322581,0.5428571,0.375,0.4411765,0.4571429,0.4,0.3846154,0.3870968,0.4915254,0.530303,0.4375,0.4918033,0.4179104,
? ? ? ?0.4032258,0.3606557,0.5178571,0.4848485,0.390625,0.375,0.4375,0.3666667,0.4,0.4477612,0.2571429,0.4032258,0.3382353,0.4814815,0.4090909,0.3548387,
? ? ? ?0.4821429,0.5,0.557377,0.4333333,0.5454545,0.4590164,0.3943662,0.5076923,0.5,0.3283582,0.3676471,0.559322)

my.cor<-cor(Xvar, Yvar)
print(my.cor)

nperm<-49999
Perm.Cor<-NULL

for (iperm in 1:nperm) ?{
XvarNew<-sample(Xvar, size=length(Xvar), replace=FALSE)
YvarNew<-sample(Yvar, size=length(Yvar), replace=FALSE)
perm.cor<-cor(XvarNew, YvarNew)
Perm.Cor<-c(Perm.Cor, perm.cor)
? ? ? ? ? ? ? ? ? ? ? ?}
print(max(Perm.Cor))
XvarSorted<-sort(Xvar, decreasing=TRUE)
YvarSorted<-sort(Yvar, decreasing=TRUE)
max.cor<-cor(XvarSorted, YvarSorted)
print(max.cor)
if(mat.cor>0) Perm.Cor.Sorted<-sort(Perm.Cor, decreasing=TRUE)
if(mat.cor<0) Perm.Cor.Sorted<-sort(Perm.Cor, decreasing=FALSE)
T95<-Perm.Cor.Sorted[(nperm+1)*0.05] ? ?# 95% treshold value
T99<-Perm.Cor.Sorted[(nperm+1)*0.01] ? ?# 99% treshold value

I want to understand where I am making a mistake. Any comment is deeply appreciated.

Kind Regards

Seyit Ali

------------------------------------------------------------------------------------------------------------------
Dr. Seyit Ali KAYIS
Selcuk University
Faculty of Agriculture
Kampus, Konya, TURKEY

? ? ? ? ? ?s_a_kayis at yahoo.com, ? ?s_a_kayis at hotmail.com
Tell: +90 332 223 2830 ?Mobile: +90 535 587 1139 ?Fax: +90 332 241 0108

? ? ? ? ? ? ? ? ? Greetings from Konya, TURKEY
? ? ? ? ? ? ? ?http://www.ziraat.selcuk.edu.tr/skayis/
----------------------------------------------------------------------------------------------------------------------

_________________________________________________________________
Earning enough? Find out with SEEK Salary Survey

%2Eco%2Enz%2F%3Ftracking%3Dsk%3Atl%3Asknzsal%3Amsnnz%3A0%3Ahottag%3Aearn%5Fenough&_t=757263783&_r=Seek_NZ_tagline&_m=EXT
? ? ? ?[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tr.im/mikes_public_calendar

~ Certainty is folly... I think. ~