Hierfstat: g-statistics permutation

Tue, Nov 1, 2016 9:25 PM

Hi
Using some of the permutation functions in hierfstat (test.g; test.between;
gstat.randtest), I've encountered the same issue raised in this recent post:
https://stat.ethz.ch/pipermail/r-sig-genetics/2016-August/000099.html

In my case (6 pops x 8 ind; ~8000 loci), testing for significant
differentiation among pops with a small number of loci (e.g. 20) appears
normal, but as the number of loci increases beyond 100, the permuted
g-statistics mostly converge on a single value or 0.

This appears to be related to missing data, as demonstrated below on a small
simulated dataset.

library(hierfstat)
#Simulate 100 loci
dat.sim <- sim.genot(size=8, nbal=2, nbloc=100, nbpop=6, N=1000,
                     mig=0.001, mut=0.0001, f=0)

#perform permutation test with no missing data
g <- test.g(dat.sim[,-1], level = dat.sim$Pop, nperm = 100)

#are all 100 permuted g-statistics unique? yes
length(unique(g$g.star))

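#add one missing genotype per locus (1 of 48 individuals, i.e. ~2% missing data)
dat.sim[,-1] <- apply(dat.sim[,-1], 2, function(x){x[sample(1:48,1)] <- NA; x})

#re-run the permutation test on the data with missing genotypes
g <- test.g(dat.sim[,-1], level = dat.sim$Pop, nperm = 100)

#how many permuted g-statistics are unique? generally less than 20.
length(unique(g$g.star))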
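
For reference, here is a minimal sketch of the only workaround I have so far, which is to drop every locus that has a missing genotype before running the test. The helper name drop_incomplete_loci and the toy data frame are made up for illustration and are not part of hierfstat; it assumes the usual layout with the population in column 1 and one locus per remaining column. Note that in the simulated example above every locus was given one missing genotype, so this filter would remove all of them there; it is only useful when some loci are complete.

```r
# Hypothetical helper (not a hierfstat function): keep column 1 (population)
# plus only those loci that have no missing genotypes.
drop_incomplete_loci <- function(dat) {
  loci <- dat[, -1, drop = FALSE]
  complete <- colSums(is.na(loci)) == 0      # TRUE for loci without any NA
  cbind(dat[, 1, drop = FALSE], loci[, complete, drop = FALSE])
}

# Toy data: 2 pops x 3 ind, 3 loci; locus L2 has one missing genotype
toy <- data.frame(Pop = rep(1:2, each = 3),
                  L1  = c(11, 12, 22, 11, 11, 12),
                  L2  = c(11, NA, 22, 12, 12, 22),
                  L3  = c(22, 22, 12, 11, 12, 12))

toy.complete <- drop_incomplete_loci(toy)
names(toy.complete)   # L2 is dropped; Pop, L1 and L3 remain

# On a real dataset (placeholder name 'dat'), the filtered data would then be
# passed to the test as before, e.g.:
# dat.complete <- drop_incomplete_loci(dat)
# g <- test.g(dat.complete[,-1], level = dat.complete$Pop, nperm = 100)
```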

Is there any way to use these functions to calculate p-values when missing
data is present? The functions work fine when I remove loci with missing
data, so it's not the end of the world.
Thanks for any advice.
Regards, Dan

Dr. Daniel J. Schmidt
Research Fellow, Australian Rivers Institute
Griffith University 170 Kessels Road, Nathan
Brisbane QLD 4111 Australia

d.schmidt at griffith.edu.au
Office: +61 7 37354165
http://www.rivers.edu.au
