Skip to content

Genstat into R - Randomisation test

10 messages · Anne Kempel, Michael Lawrence, Peter Dalgaard +3 more

#
Hello everybody,
I have a question. I would like to get a correlation between 
constitutive and induced plant defence which I messured on 30 plant 
species. So I have table with Species, Induced defence (ID), and 
constitutive defence (CD). Since Induced and constitutive defence are 
not independant (so called spurious correlation) I should do a 
randomisation test. I have a syntax of my supervisor in Genstat, but I 
would really like to try this in R.


"data from trade-off.IDCD"
list
variate [nval=1000] slope
calc ID1=ID

graph ID; CD
calc b=corr(ID; CD)
calc slope$[1]=b

"slope$[1] is the correlation before permutating the data"

for i=2...1000
     randomize ID1
     calc b=corr(CD1; ID1)
     calc slope$[i]=b
endfor

hist slope
describe slope
quantile [proportion=!(0.0005,0.005, 0.025, 0.975, 0.995,0.9995)]slope
print slope$[1]
corr[p=corr] ID,CD


DHISTOGRAM [WINDOW=1; ORIENTATION=vertical; KEY=0; BARCOVERING=1.0] 
slope; PEN=2

Does anybody have done something similar and has any idea how to make 
the randomisation part?
I would be very grateful for any help!!
Thanks in advance,
Anne
#
Looks like that code implements a non-exhaustive variant of the
randomization test, sometimes called a permutation test. Assuming you
have a data frame with columns ID & CD, this should do it:

n.obs = 100
ID=rnorm(n.obs)
CD=rnorm(n.obs)

obs.cor = cor(ID,CD)

num.permutations = 1e4
perm.cor = rep(NA,num.permutations)
start.time=proc.time()[1]
index = 1:n.obs
for(i in 1:num.permutations){
	IDorder = sample(index)
	CDorder = sample(index)
	perm.cor[i] = .Internal(cor(ID[IDorder], CD[CDorder], 4, FALSE))
}
cat('Elapsed time:',start.time-proc.time(1))
sum(perm.cor>obs.cor)/num.permutations
On Wed, Apr 8, 2009 at 10:02 AM, Anne Kempel <kempel at ips.unibe.ch> wrote:

  
    
#
"Assuming you have a data frame with columns ID & CD, this should do it"

Oops, the code I sent doesn't assume this. It assumes that you have
two vectors, ID & CD, as generated in the first 3 lines.
On Wed, Apr 8, 2009 at 11:31 AM, Mike Lawrence <Mike.Lawrence at dal.ca> wrote:

  
    
#
*Hits head*
Of course, the approach taken by your Genstat code of only shuffling
one variable is sufficient and faster:

n.obs = 100
ID=rnorm(n.obs)
CD=rnorm(n.obs)

obs.cor = cor(ID,CD)

num.permutations = 1e4
perm.cor = rep(NA,num.permutations)
start.time=proc.time()[1]
index = 1:n.obs
for(i in 1:num.permutations){
	IDorder = sample(index)
	perm.cor[i] = .Internal(cor(ID[IDorder], CD, 4, FALSE))
}
cat('Elapsed time:',start.time-proc.time(1))
sum(perm.cor>obs.cor)/num.permutations
On Wed, Apr 8, 2009 at 11:33 AM, Mike Lawrence <Mike.Lawrence at dal.ca> wrote:

  
    
#
*sigh*

and the elapsed time line should be:

cat('Elapsed time:',proc.time()[1]-start.time)
On Wed, Apr 8, 2009 at 11:35 AM, Mike Lawrence <Mike.Lawrence at dal.ca> wrote:

  
    
#
Mike Lawrence wrote:
Isn't it the other way around? (Permutation tests can be exhaustive by 
looking at all permutations, if a randomization test did that, then it 
wouldn't be random.)
#
I was taught that Fisher proposed the F-test as a computationally
simpler approximation to what he called a "Randomization test",
consisting of exhaustive permutations. I never looked at the original
Fisher reference myself, so this may be false.

However, I haven't observed a consistent nomenclature when I have seen
these tests discussed, so I typically ensure to mention whether what
I'm doing is exhaustive or non-exhaustive.

I do see the value in your interpretation, and think it makes sense to
drop "randomization"  as a name (despite it's possible historical
significance) and start using "exhaustive permutation test" (to
contrast with "non-exhaustive permutation test").
On Wed, Apr 8, 2009 at 1:18 PM, Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:

  
    
#
Peter Dalgaard wrote:
> Mike Lawrence wrote:
>> Looks like that code implements a non-exhaustive variant of the
 >> randomization test, sometimes called a permutation test.
 >
 > Isn't it the other way around? (Permutation tests can be exhaustive 
by looking at all permutations, if a randomization test did that, then 
it wouldn't be random.)

Eugene Edgington wrote an early book (1980) on this subject called 
"Randomization tests", published by Dekker.  As far as I remember, he 
differentiates between "Systematic permutation" tests where one looks at 
all possible permutations.  This is of course prohibitive for anything 
beyond trivially small samples.  For larger samples he uses what he 
calls "Random permutations", where a random sample of the possible 
permutations is used.

Tom
Peter Dalgaard wrote:

  
    
#
At 04:43 AM 4/9/2009, Tom Backer Johnsen wrote:
Edginton and Onghena make a similar distinction in their book, but I 
think such a distinction is without merit.

Do we distinguish between "exact" definite integrals and 
"approximate" ones obtained by numerical integration, of which Monte 
Carlo sampling is just one class of algorithms? Don't we just say: 
"The integral was evaluated numerically by the [whatever] method to 
an accuracy of [whatever], and the value was found to be [whatever]." 
Ditto for optimization problems.

A randomization test has one correct answer based upon theory. We are 
simply trying to calculate that answer's value when it is difficult 
to do so. Any approximate method that is used must be performed such 
that the error of approximation is trivial with respect to the 
contemplated use.

Doing Monte Carlo sampling to find an approximate answer to a 
randomization test, or to an optimization problem, or to a bootstrap 
distribution should be carried out with enough realizations to make 
sure the residual error is trivial.

As Monte Carlo sampling is a "random" sampling-based approximate 
method. The name does create confusion in terminology for 
"randomization" tests for bootstrapping.

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
6 days later
#
Robert, Tom, Peter and all, 

If I remember correctly (don't have my copy at hand right now),
Edgington and Onghena differentiate between randomization tests and
permutation tests along the following lines:

Randomization test: Apply only to randomized experiments, for which we
consider the theoretically possible re-randomizations given the
constraints for the original randomization. In this setting, inference
is valid for the observed population, and does not require random
sampling (while, of course, this assumption is needed if we want to
generalize the findings). I think they also call it re-randomization
tests at least once, while the prefix is usually omitted for simplicity,
but this makes it clearer that an initial randomization is required.

Permutation test: Can be applied even to data from non-randomized
experiments, requires, however, random sampling. Hence, one additional
assumption is needed (one that is common to inference using classical
tests). This is hardly addressed in Edgington & Onghena. 

In both cases, the "answer" can be approximated by random draws if full
randomizations/permutations are difficult or impossible to perform, but
this applies to both methods. The distinction in Edginton (1980) as
mentioned by Tom is not used in the latest edition. 

In this regard, the comparison with "exact vs. approximate integral"
seems inappropriate to me, as the distinction is conceptual, not
computational. Neither is one the (non-)exhaustive variant of the other.
I doubt that "randomization" test is named according to "_random_ choice
of permutations" as implied by Peter, it's rather based on
"randomization" (of an experiment) in its best statistical meaning. 

Not sure whether the distinction is needed, but it might be helpful in
some instances, at least if used consistently. I have never seen another
distinction between these two, and the literature is inconsistent in its
use as well.

Just my two cents worth.
Michael

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Robert A LaBudde
Sent: Donnerstag, 9. April 2009 17:38
To: Tom Backer Johnsen
Cc: r-help at r-project.org
Subject: Re: [R] Genstat into R - Randomisation test
At 04:43 AM 4/9/2009, Tom Backer Johnsen wrote:
Edginton and Onghena make a similar distinction in their book, but I
think such a distinction is without merit.

Do we distinguish between "exact" definite integrals and "approximate"
ones obtained by numerical integration, of which Monte Carlo sampling is
just one class of algorithms? Don't we just say: 
"The integral was evaluated numerically by the [whatever] method to an
accuracy of [whatever], and the value was found to be [whatever]." 
Ditto for optimization problems.

A randomization test has one correct answer based upon theory. We are
simply trying to calculate that answer's value when it is difficult to
do so. Any approximate method that is used must be performed such that
the error of approximation is trivial with respect to the contemplated
use.

Doing Monte Carlo sampling to find an approximate answer to a
randomization test, or to an optimization problem, or to a bootstrap
distribution should be carried out with enough realizations to make sure
the residual error is trivial.

As Monte Carlo sampling is a "random" sampling-based approximate method.
The name does create confusion in terminology for "randomization" tests
for bootstrapping.

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.